Gemma 4 26B A4B

Coding-tuned · Anthropic- & OpenAI-compatible · Hugging Face Spaces

Model ready · IQ3_XXS quant · 11.2 GB
Hardware: 2 vCPU · 16 GB RAM · ctx 4096
Sampling defaults: temp 0.3 · top-k 20 · min-p 0.1
Claude Code setup
export ANTHROPIC_BASE_URL=https://YOUR-USER-space-name.hf.space
export ANTHROPIC_API_KEY=gemma4-local

claude --model gemma-4-26b
OpenAI Python client
from openai import OpenAI
client = OpenAI(
  base_url="https://YOUR-SPACE.hf.space/v1",
  api_key="gemma4-local",
)
r = client.chat.completions.create(
  model="gemma-4-26b",
  messages=[{"role": "user",
    "content": "write binary search"}],
)
print(r.choices[0].message.content)
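Streamed responses arrive as standard OpenAI-style server-sent events. A minimal sketch of consuming them without the SDK, assuming the usual `data: {...}` / `data: [DONE]` framing (the `stream_deltas` helper name is ours):

```python
import json

def stream_deltas(sse_lines):
    # Yield the text deltas from OpenAI-style SSE lines, stopping at [DONE].
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip keep-alives and blank separator lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        delta = json.loads(data)["choices"][0]["delta"]
        if delta.get("content"):
            yield delta["content"]
```

Feed it the response body line by line and join the deltas to recover the full completion.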
curl quick test
curl https://YOUR-SPACE.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-26b",
    "messages": [
      {"role":"user","content":"hello"}
    ]
  }'
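The same request can be made from Python's standard library alone. A sketch under the endpoint shapes shown above (the `chat` helper and its injectable `post` transport are ours, not part of the Space):

```python
import json
import urllib.request

def chat(base_url, prompt, model="gemma-4-26b",
         api_key="gemma4-local", post=None):
    # Build the same JSON body as the curl example above.
    body = {"model": model,
            "messages": [{"role": "user", "content": prompt}]}
    headers = {"Content-Type": "application/json",
               "Authorization": f"Bearer {api_key}"}
    if post is None:  # default transport: stdlib HTTP POST
        def post(url, payload, hdrs):
            req = urllib.request.Request(
                url, data=json.dumps(payload).encode("utf-8"),
                headers=hdrs, method="POST")
            with urllib.request.urlopen(req) as resp:
                return json.loads(resp.read())
    reply = post(f"{base_url}/v1/chat/completions", body, headers)
    return reply["choices"][0]["message"]["content"]
```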
First boot: The model (~11.2 GB) downloads on first start — allow 5–10 min. Watch the container logs for a live progress bar. /health returns model_loaded: false until ready. Subsequent restarts load from disk in ~60 s.
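Since `/health` reports `model_loaded: false` until the download finishes, a script can simply poll it before sending traffic. A sketch (the `wait_until_ready` helper and its injectable `fetch` hook are ours):

```python
import json
import time
import urllib.request

def wait_until_ready(base_url, timeout=600, interval=5, fetch=None):
    # Poll GET /health until it reports model_loaded: true, or time out.
    if fetch is None:  # default transport: stdlib HTTP GET
        def fetch(url):
            with urllib.request.urlopen(url) as resp:
                return json.loads(resp.read())
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if fetch(f"{base_url}/health").get("model_loaded"):
                return True
        except OSError:
            pass  # container still booting; keep waiting
        time.sleep(interval)
    return False
```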
Method  Path                  Notes
GET     /health               Status + model_loaded
GET     /v1/models            Model list (OpenAI)
POST    /v1/chat/completions  OpenAI-compatible · streaming supported
POST    /v1/messages          Anthropic-compatible · used by Claude Code
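Anthropic-style `/v1/messages` responses carry text in a list of content blocks rather than a single string. A small sketch for pulling the text out, assuming the standard Anthropic response shape (the `anthropic_text` helper name is ours):

```python
def anthropic_text(response):
    # Concatenate the text of every "text" content block in an
    # Anthropic-style /v1/messages response body.
    return "".join(block["text"]
                   for block in response.get("content", [])
                   if block.get("type") == "text")
```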