Using Claude Code with Open Models: A Complete Beginner's Guide

Imagine you have a powerful coding assistant called Claude Code that helps you write and debug code right from your terminal (command line). Normally, Claude Code connects to Anthropic's servers to use their AI models. But what if you could use the same tool with free, open-source AI models instead? That's exactly what this guide will teach you!

Why would you want to do this?

  • Save money: Open models can be cheaper or even free to run
  • More control: You can host models on your own servers
  • Privacy: Keep your code and data within your own infrastructure
  • Experimentation: Try different AI models to see which works best for your needs

Understanding the Key Concepts

Before we dive in, let's break down some important terms:

What is Claude Code?

Claude Code is a command-line tool that lets you interact with AI models to help with coding tasks. Think of it as having a coding buddy that lives in your terminal and can help you write, review, and debug code.

What are Open Models?

Recently, OpenAI released its GPT-OSS models ("OSS" as in open-source software) - these are AI models similar to ChatGPT but with open weights, meaning anyone can download and run them. There's also Qwen3-Coder, a powerful coding-focused model from Alibaba's Qwen team. These are called "open" because, unlike proprietary models, you can host them yourself!

The Magic Trick: API Compatibility

Here's the clever part: Claude Code expects to talk to Anthropic's servers, but we can trick it into talking to other AI models instead! It's like putting a different engine in your car while keeping the same dashboard - everything looks the same from the driver's seat, but under the hood, it's completely different.
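To make this concrete, here's roughly what an OpenAI-compatible request looks like under the hood. This is only a sketch - the URL, token, and model name are placeholders you'll replace with your own values later in this guide:

bash

# Placeholder URL and token - substitute your own endpoint details.
# Any server that answers this OpenAI-style request can act as the
# "engine" while Claude Code remains the familiar "dashboard".
curl https://your-endpoint-url.huggingface.cloud/v1/chat/completions \
  -H "Authorization: Bearer hf_yourtoken123456" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-oss-20b",
    "messages": [{"role": "user", "content": "Say hello"}]
  }'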

Prerequisites: What You'll Need

Think of these as your ingredients before cooking:

  1. Claude Code version 0.5.3 or higher - Check by typing claude --version in your terminal (see the quick check below)
  2. A Hugging Face account - This is like GitHub but for AI models (free to create)
  3. Either:
    • Hugging Face credits to run models on their servers, OR
    • An OpenRouter account (a service that gives you access to many models with one API key)
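Before going further, it's worth confirming that first ingredient:

bash

# Prints the installed version - you need 0.5.3 or higher
claude --version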

Method 1: Using Hugging Face (Self-Hosting)

This method is like renting a powerful computer in the cloud to run your AI model.

Step 1: Choose Your Model

First, pick which AI model you want to use:

  • GPT-OSS-20B: A smaller, faster model (20 billion parameters)
  • GPT-OSS-120B: A larger, more powerful model (120 billion parameters)
  • Qwen3-Coder: Specifically optimized for coding tasks

Think of parameters like the model's "brain cells" - more parameters usually means smarter responses but also costs more to run.

Step 2: Accept the License

  1. Go to the model's page on Hugging Face (like visiting a software download page)
  2. Click "Accept" on the Apache-2.0 license (this is a very permissive open-source license)

Step 3: Create Your Endpoint

An endpoint is like a phone number for your AI model - it's how Claude Code will know where to send requests.

  1. On the model page, click Deploy → Inference Endpoint
  2. Select Text Generation Inference (TGI) template
  3. Important: Check the box for "Enable OpenAI compatibility" - this is what makes the magic work!
  4. Choose your hardware:
    • CPU: Cheapest but slowest (like using a regular computer)
    • A10G GPU: Faster, moderate cost (like using a gaming computer)
    • A100 GPU: Fastest but most expensive (like using a supercomputer)

Step 4: Configure Claude Code

Now we need to tell Claude Code where to find your model. We do this using environment variables - think of these as sticky notes that tell programs important information.

In your terminal, type:

bash

export ANTHROPIC_BASE_URL="https://your-endpoint-url.huggingface.cloud"
export ANTHROPIC_AUTH_TOKEN="hf_yourtoken123456"
export ANTHROPIC_MODEL="gpt-oss-20b"

What's happening here?

  • ANTHROPIC_BASE_URL: This is like changing the delivery address - instead of Anthropic, packages go to your Hugging Face endpoint
  • ANTHROPIC_AUTH_TOKEN: Your password to access the endpoint
  • ANTHROPIC_MODEL: Which specific model to use
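One caveat: export only lasts for your current terminal session. If you want these settings to stick around, you can append them to your shell profile (assuming bash here - zsh users would use ~/.zshrc instead):

bash

# Persist the settings so every new terminal picks them up
cat >> ~/.bashrc << 'EOF'
export ANTHROPIC_BASE_URL="https://your-endpoint-url.huggingface.cloud"
export ANTHROPIC_AUTH_TOKEN="hf_yourtoken123456"
export ANTHROPIC_MODEL="gpt-oss-20b"
EOF
source ~/.bashrc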

Step 5: Test It Out!

Run this command:

bash

claude --model gpt-oss-20b

If everything worked, Claude Code is now using your open model instead of Anthropic's!
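For a quick smoke test without opening the full interactive session, Claude Code also has a print mode (the -p flag) that sends one prompt and exits:

bash

# One-shot prompt: prints the model's answer and returns to the shell
claude -p "Write a Python function that reverses a string"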

Method 2: Using OpenRouter (The Easy Way)

If Method 1 seemed complicated, OpenRouter is like ordering from a restaurant instead of cooking yourself - someone else handles all the complex stuff.

Step 1: Get an OpenRouter Account

  1. Sign up at openrouter.ai
  2. Copy your API key (it starts with sk-or-) - you can sanity-check it with the request below
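To confirm the key works before wiring it into Claude Code, you can send one request directly to OpenRouter's OpenAI-compatible endpoint (the key shown is a placeholder):

bash

# Placeholder key - a valid JSON response here means the key is good
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer sk-or-yourkey123456" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-oss-20b", "messages": [{"role": "user", "content": "Hello"}]}'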

Step 2: Configure Claude Code

Set these environment variables:

bash

export ANTHROPIC_BASE_URL="https://openrouter.ai/api/v1"
export ANTHROPIC_AUTH_TOKEN="sk-or-yourkey123456"
export ANTHROPIC_MODEL="openai/gpt-oss-20b"

Step 3: Start Using It!

bash

claude --model openai/gpt-oss-20b

That's it! OpenRouter handles all the complex server stuff for you.

Advanced: Using LiteLLM (For Power Users)

If you want to use multiple different models and switch between them easily, LiteLLM acts like a smart switchboard operator.

What is LiteLLM?

LiteLLM is a proxy - imagine it as a receptionist that takes your calls and forwards them to the right department. You can tell it "send coding questions to Model A, but send writing tasks to Model B."

Setting Up LiteLLM

  1. Create a configuration file (like writing a phone directory):

yaml

model_list:
  - model_name: gpt-oss-20b
    litellm_params:
      # The openrouter/ prefix tells LiteLLM to route requests through OpenRouter
      model: openrouter/openai/gpt-oss-20b
      api_key: your_openrouter_key
  - model_name: qwen3-coder
    litellm_params:
      model: openrouter/qwen/qwen3-coder-480b
      api_key: your_openrouter_key
  2. Start the LiteLLM proxy (like turning on the switchboard)
  3. Point Claude Code to LiteLLM instead of directly at the models - both sketched below
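Here's what steps 2 and 3 might look like in practice. The filename config.yaml and port 4000 are just assumptions - adjust them to match your setup:

bash

# Step 2: start the proxy (reads the model directory from step 1)
litellm --config config.yaml --port 4000

# Step 3: in another terminal, point Claude Code at the proxy
export ANTHROPIC_BASE_URL="http://localhost:4000"
export ANTHROPIC_AUTH_TOKEN="anything"  # LiteLLM ignores this unless you configure a master key
claude --model gpt-oss-20b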

Bonus: Using Claude Flow for Advanced Features

Claude Flow is like adding a turbo boost to your setup. It can:

  • Coordinate multiple AI agents working together (like a team of coders)
  • Track costs across different models
  • Save conversation history between sessions
  • Distribute complex tasks across multiple model instances

To enable it:

bash

# Register Claude Flow as an MCP server (runs via npx)
claude mcp add claude-flow npx claude-flow@alpha mcp start

# Enable advanced features
export CLAUDE_FLOW_HOOKS_ENABLED="true"
export CLAUDE_FLOW_TELEMETRY_ENABLED="true"

Common Problems and Solutions

"I get a 404 error!"

Problem: The endpoint isn't serving the route Claude Code is requesting
Solution: Make sure you enabled "OpenAI compatibility" when setting up your endpoint, and double-check the endpoint URL
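If you're unsure whether the URL or the compatibility setting is the culprit, one quick check is to hit the endpoint directly, using the variables you exported earlier:

bash

# Any HTTP response proves the URL resolves; a 404 on every path
# suggests the endpoint isn't serving the routes Claude Code expects
curl -i "$ANTHROPIC_BASE_URL" \
  -H "Authorization: Bearer $ANTHROPIC_AUTH_TOKEN"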

"The responses are empty"

Problem: Claude Code is asking for the wrong model
Solution: Double-check that your ANTHROPIC_MODEL environment variable matches the deployed model name exactly

"It's really slow the first time"

Problem: The model needs to "wake up" if it's been idle
Solution: This is normal - the first request takes longer, then it speeds up

"It's costing too much!"

Problem: Large models on powerful hardware burn through credits quickly
Solution:

  • Use smaller models for simple tasks
  • Set up auto-scaling limits in Hugging Face
  • Consider OpenRouter's pay-per-use pricing

Tips for Success

  1. Start Small: Begin with the 20B model before trying larger ones
  2. Monitor Costs: Both Hugging Face and OpenRouter charge by usage - set spending limits!
  3. Test Thoroughly: Different models have different strengths - experiment to find what works
  4. Keep Keys Secret: Never share your API tokens - treat them like passwords (one approach is shown below)
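On that last tip: one simple approach is to keep the token in a file only you can read, rather than hardcoding it in scripts or dotfiles that might get shared (the file path here is just a suggestion):

bash

# Store the token in a private file
echo "hf_yourtoken123456" > ~/.hf_token
chmod 600 ~/.hf_token

# Load it into the environment when needed instead of hardcoding it
export ANTHROPIC_AUTH_TOKEN="$(cat ~/.hf_token)"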

Happy coding with your new AI assistants!