Using Claude Code with Open Models: A Complete Beginner's Guide
Imagine you have a powerful coding assistant called Claude Code that helps you write and debug code right from your terminal (command line). Normally, Claude Code connects to Anthropic's servers to use their AI models. But what if you could use the same tool with free, open-source AI models instead? That's exactly what this guide will teach you!
Why would you want to do this?
- Save money: Open models can be cheaper or even free to run
- More control: You can host models on your own servers
- Privacy: Keep your code and data within your own infrastructure
- Experimentation: Try different AI models to see which works best for your needs
Understanding the Key Concepts
Before we dive in, let's break down some important terms:
What is Claude Code?
Claude Code is a command-line tool that lets you interact with AI models to help with coding tasks. Think of it as having a coding buddy that lives in your terminal and can help you write, review, and debug code.
What are Open Models?
Recently, OpenAI released its GPT-OSS models - AI models similar to ChatGPT but with open weights, meaning anyone can download and run them. There's also Qwen3-Coder, a powerful coding-focused model from Alibaba's Qwen team. These are called "open" because unlike proprietary models, you can host them yourself!
The Magic Trick: API Compatibility
Here's the clever part: Claude Code expects to talk to Anthropic's servers, but we can trick it into talking to other AI models instead! It's like putting a different engine in your car while keeping the same dashboard - everything looks the same from the driver's seat, but under the hood, it's completely different.
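To make this concrete, here is a minimal sketch of the trick; the URL is a placeholder, and the steps below fill in real values:

```bash
# By default, Claude Code sends its requests to Anthropic's servers.
# One environment variable reroutes them to any server that speaks a
# compatible API - same dashboard, different engine.
export ANTHROPIC_BASE_URL="https://your-openai-compatible-server.example"
```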
Prerequisites: What You'll Need
Think of these as your ingredients before cooking:
- Claude Code version 0.5.3 or higher - check by typing `claude --version` in your terminal (see the quick check after this list)
- A Hugging Face account - this is like GitHub but for AI models (free to create)
- Either:
- Hugging Face credits to run models on their servers, OR
- An OpenRouter account (a service that gives you access to many models with one API key)
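Before going further, you can verify the first two ingredients from your terminal. A quick sketch (the `huggingface-cli` tool is a separate install, not something this guide strictly requires):

```bash
# Check that Claude Code is installed and at least version 0.5.3
claude --version

# Optional: confirm your Hugging Face login works
# (pip install -U huggingface_hub)
huggingface-cli whoami
```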
Method 1: Using Hugging Face (Self-Hosting)
This method is like renting a powerful computer in the cloud to run your AI model.
Step 1: Choose Your Model
First, pick which AI model you want to use:
- GPT-OSS-20B: A smaller, faster model (20 billion parameters)
- GPT-OSS-120B: A larger, more powerful model (120 billion parameters)
- Qwen3-Coder: Specifically optimized for coding tasks
Think of parameters like the model's "brain cells" - more parameters usually means smarter responses but also costs more to run.
Step 2: Accept the License
- Go to the model's page on Hugging Face (like visiting a software download page)
- Click "Accept" on the Apache-2.0 license (this is a very permissive open-source license)
Step 3: Create Your Endpoint
An endpoint is like a phone number for your AI model - it's how Claude Code will know where to send requests.
- On the model page, click Deploy → Inference Endpoint
- Select Text Generation Inference (TGI) template
- Important: Check the box for "Enable OpenAI compatibility" - this is what makes the magic work!
- Choose your hardware:
- CPU: Cheapest but slowest (like using a regular computer)
- A10G GPU: Faster, moderate cost (like using a gaming computer)
- A100 GPU: Fastest but most expensive (like using a supercomputer)
Step 4: Configure Claude Code
Now we need to tell Claude Code where to find your model. We do this using environment variables - think of these as sticky notes that tell programs important information.
In your terminal, type:
```bash
export ANTHROPIC_BASE_URL="https://your-endpoint-url.huggingface.cloud"
export ANTHROPIC_AUTH_TOKEN="hf_yourtoken123456"
export ANTHROPIC_MODEL="gpt-oss-20b"
```
What's happening here?
- `ANTHROPIC_BASE_URL`: This is like changing the delivery address - instead of Anthropic, packages go to your Hugging Face endpoint
- `ANTHROPIC_AUTH_TOKEN`: Your password to access the endpoint
- `ANTHROPIC_MODEL`: Which specific model to use
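One caveat: `export` only affects the current terminal session. To make the settings stick, a common approach is appending them to your shell profile (a sketch assuming bash; zsh users would edit `~/.zshrc` instead):

```bash
# Persist the three variables for every future session (bash assumed)
cat >> ~/.bashrc <<'EOF'
export ANTHROPIC_BASE_URL="https://your-endpoint-url.huggingface.cloud"
export ANTHROPIC_AUTH_TOKEN="hf_yourtoken123456"
export ANTHROPIC_MODEL="gpt-oss-20b"
EOF
source ~/.bashrc
```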
Step 5: Test It Out!
Run this command:
```bash
claude --model gpt-oss-20b
```
If everything worked, Claude Code is now using your open model instead of Anthropic's!
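If Claude Code can't connect, test the endpoint directly before digging deeper. A TGI endpoint with OpenAI compatibility enabled should answer a plain chat request; here is a sketch reusing the variables from Step 4:

```bash
# Send a raw chat request straight to the endpoint, bypassing Claude Code.
# A JSON reply here means the endpoint is up and OpenAI-compatible.
curl "$ANTHROPIC_BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer $ANTHROPIC_AUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}]
  }'
```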
Method 2: Using OpenRouter (The Easy Way)
If Method 1 seemed complicated, OpenRouter is like ordering from a restaurant instead of cooking yourself - someone else handles all the complex stuff.
Step 1: Get an OpenRouter Account
- Sign up at openrouter.ai
- Copy your API key (it starts with `sk-or-`)
Step 2: Configure Claude Code
Set these environment variables:
```bash
export ANTHROPIC_BASE_URL="https://openrouter.ai/api/v1"
export ANTHROPIC_AUTH_TOKEN="sk-or-yourkey123456"
export ANTHROPIC_MODEL="openai/gpt-oss-20b"
```
Step 3: Start Using It!
```bash
claude --model openai/gpt-oss-20b
```
That's it! OpenRouter handles all the complex server stuff for you.
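Curious what other model IDs are available? OpenRouter publishes its catalog at a public endpoint, so you can browse it from the terminal (this sketch assumes `jq` is installed for filtering; plain `curl` works too, just noisier):

```bash
# List OpenRouter's model IDs, filtered to the gpt-oss family
curl -s https://openrouter.ai/api/v1/models | jq -r '.data[].id' | grep -i oss
```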
Advanced: Using LiteLLM (For Power Users)
If you want to use multiple different models and switch between them easily, LiteLLM acts like a smart switchboard operator.
What is LiteLLM?
LiteLLM is a proxy - imagine it as a receptionist that takes your calls and forwards them to the right department. You can tell it "send coding questions to Model A, but send writing tasks to Model B."
Setting Up LiteLLM
- Create a configuration file (like writing a phone directory):
```yaml
model_list:
  - model_name: gpt-oss-20b
    litellm_params:
      # the openrouter/ prefix tells LiteLLM to route through OpenRouter
      model: openrouter/openai/gpt-oss-20b
      api_key: your_openrouter_key
  - model_name: qwen3-coder
    litellm_params:
      model: openrouter/qwen/qwen3-coder-480b
      api_key: your_openrouter_key
```
- Start the LiteLLM proxy (like turning on the switchboard)
- Point Claude Code to LiteLLM instead of directly to the models (both sketched below)
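A rough sketch of those last two steps - the port and dummy token are assumptions, and with no master key configured, LiteLLM doesn't check the token by default:

```bash
# Install LiteLLM's proxy flavor and start it with the config above
pip install 'litellm[proxy]'
litellm --config config.yaml --port 4000

# In another terminal: point Claude Code at the proxy
export ANTHROPIC_BASE_URL="http://localhost:4000"
export ANTHROPIC_AUTH_TOKEN="dummy-key"
claude --model gpt-oss-20b
```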
Bonus: Using Claude Flow for Advanced Features
Claude Flow is like adding a turbo boost to your setup. It can:
- Coordinate multiple AI agents working together (like a team of coders)
- Track costs across different models
- Save conversation history between sessions
- Distribute complex tasks across multiple model instances
To enable it:
```bash
# Register Claude Flow as an MCP server (npx fetches it on first use)
claude mcp add claude-flow npx claude-flow@alpha mcp start

# Enable advanced features
export CLAUDE_FLOW_HOOKS_ENABLED="true"
export CLAUDE_FLOW_TELEMETRY_ENABLED="true"
```
Common Problems and Solutions
"I get a 404 error!"
Problem: The server doesn't recognize the URL path Claude Code is calling
Solution: Make sure you enabled "OpenAI compatibility" when setting up your endpoint
"The responses are empty"
Problem: Claude Code is asking for a model name the server doesn't recognize
Solution: Double-check that your `ANTHROPIC_MODEL` environment variable exactly matches the name the server expects
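A quick way to compare both sides (many OpenAI-compatible servers expose their model list at `/v1/models`, though not all do):

```bash
# What Claude Code will ask for...
echo "$ANTHROPIC_MODEL"

# ...versus what the server says it serves
curl -s "$ANTHROPIC_BASE_URL/v1/models" \
  -H "Authorization: Bearer $ANTHROPIC_AUTH_TOKEN"
```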
"It's really slow the first time"
Problem: The model needs to "wake up" if it's been idle (Hugging Face endpoints can scale down when unused)
Solution: This is normal - the first request takes longer, then it speeds up
"It's costing too much!"
Problem: Large models on powerful hardware burn through credits quickly
Solution:
- Use smaller models for simple tasks
- Set up auto-scaling limits in Hugging Face
- Consider OpenRouter's pay-per-use pricing
Tips for Success
- Start Small: Begin with the 20B model before trying larger ones
- Monitor Costs: Both Hugging Face and OpenRouter charge by usage - set spending limits!
- Test Thoroughly: Different models have different strengths - experiment to find what works
- Keep Keys Secret: Never share your API tokens - treat them like passwords
Resources and Links
- Claude Code Documentation
- Hugging Face Inference Endpoints
- OpenRouter Documentation
- LiteLLM Documentation
- Claude Flow GitHub
Happy coding with your new AI assistants!