Gemma 4 26B A4B

Coding-tuned · Anthropic- & OpenAI-compatible · Hugging Face Spaces

Model ready · IQ3_XXS quant · 11.2 GB
Hardware: 2 vCPU · 16 GB RAM · ctx 4096
Sampling defaults: temp 0.3 · top-k 20 · min-p 0.1
Claude Code setup
export ANTHROPIC_BASE_URL=https://YOUR-USER-space-name.hf.space
export ANTHROPIC_API_KEY=gemma4-local

claude --model gemma-4-26b
OpenAI Python client
from openai import OpenAI
client = OpenAI(
  base_url="https://YOUR-SPACE.hf.space/v1",
  api_key="gemma4-local",
)
r = client.chat.completions.create(
  model="gemma-4-26b",
  messages=[{"role": "user",
    "content": "write binary search"}],
)
print(r.choices[0].message.content)
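Streamed responses arrive as standard OpenAI-style server-sent events. A minimal sketch of consuming them without the SDK, assuming the usual `data: {...}` / `data: [DONE]` framing (the `stream_deltas` helper name is ours):

```python
import json

def stream_deltas(sse_lines):
    # Yield the text deltas from OpenAI-style SSE lines, stopping at [DONE].
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip keep-alives and blank separator lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break
        delta = json.loads(data)["choices"][0]["delta"]
        if delta.get("content"):
            yield delta["content"]
```

Feed it the response body line by line and join the deltas to recover the full completion.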
curl quick test
curl https://YOUR-SPACE.hf.space/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma-4-26b",
    "messages": [
      {"role":"user","content":"hello"}
    ]
  }'
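The same request can be made from Python's standard library alone. A sketch under the endpoint shapes shown above (the `chat` helper and its injectable `post` transport are ours, not part of the Space):

```python
import json
import urllib.request

def chat(base_url, prompt, model="gemma-4-26b",
         api_key="gemma4-local", post=None):
    # Build the same JSON body as the curl example above.
    body = {"model": model,
            "messages": [{"role": "user", "content": prompt}]}
    headers = {"Content-Type": "application/json",
               "Authorization": f"Bearer {api_key}"}
    if post is None:  # default transport: stdlib HTTP POST
        def post(url, payload, hdrs):
            req = urllib.request.Request(
                url, data=json.dumps(payload).encode("utf-8"),
                headers=hdrs, method="POST")
            with urllib.request.urlopen(req) as resp:
                return json.loads(resp.read())
    reply = post(f"{base_url}/v1/chat/completions", body, headers)
    return reply["choices"][0]["message"]["content"]
```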
First boot: The model (~11.2 GB) downloads on first start — allow 5–10 min. Watch the container logs for a live progress bar. /health returns model_loaded: false until ready. Subsequent restarts load from disk in ~60 s.
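Since `/health` reports `model_loaded: false` until the download finishes, a script can simply poll it before sending traffic. A sketch (the `wait_until_ready` helper and its injectable `fetch` hook are ours):

```python
import json
import time
import urllib.request

def wait_until_ready(base_url, timeout=600, interval=5, fetch=None):
    # Poll GET /health until it reports model_loaded: true, or time out.
    if fetch is None:  # default transport: stdlib HTTP GET
        def fetch(url):
            with urllib.request.urlopen(url) as resp:
                return json.loads(resp.read())
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if fetch(f"{base_url}/health").get("model_loaded"):
                return True
        except OSError:
            pass  # container still booting; keep waiting
        time.sleep(interval)
    return False
```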
Method  Path                  Notes
GET     /health               Status + model_loaded
GET     /v1/models            Model list (OpenAI)
POST    /v1/chat/completions  OpenAI-compatible · streaming supported
POST    /v1/messages          Anthropic-compatible · used by Claude Code
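Anthropic-style `/v1/messages` responses carry text in a list of content blocks rather than a single string. A small sketch for pulling the text out, assuming the standard Anthropic response shape (the `anthropic_text` helper name is ours):

```python
def anthropic_text(response):
    # Concatenate the text of every "text" content block in an
    # Anthropic-style /v1/messages response body.
    return "".join(block["text"]
                   for block in response.get("content", [])
                   if block.get("type") == "text")
```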