# Available Models
## Subscriptions

Always-on, LoRA, and embedding models are included in every subscription.
## Always-On Models

These models are included in all Standard and Pro subscriptions. Per-token, usage-based billing is also available.
## LoRA Models

### What's a LoRA?

Low-rank adapters ("LoRAs") are small, efficient fine-tunes that run on top of an existing base model. They can make a model much more effective at specific tasks without retraining its full weights.
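Conceptually, a LoRA freezes the base weight matrix and trains only a low-rank update: the adapted weight is the base weight plus a scaled product of two small matrices. The sketch below illustrates the idea with NumPy; the dimensions, rank, and `alpha` scaling factor are hypothetical, not values used by any model listed here.

```python
import numpy as np

# Hypothetical dimensions for illustration; real model layers are larger.
d_out, d_in, rank = 512, 512, 8
alpha = 16  # common LoRA scaling hyperparameter

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))        # frozen base weight (not trained)
A = rng.standard_normal((rank, d_in)) * 0.01  # small trainable matrix
B = np.zeros((d_out, rank))                   # conventionally initialized to zero

# Adapted weight = base weight + scaled low-rank update.
W_adapted = W + (alpha / rank) * (B @ A)

full_params = d_out * d_in          # parameters in a full fine-tune of W
lora_params = rank * (d_out + d_in) # trainable parameters in the adapter
print(f"trainable params: {lora_params} vs full fine-tune: {full_params}")
```

Because only `A` and `B` are trained, the adapter is a tiny fraction of the base model's size, which is why many LoRAs can run cheaply on top of one shared base model.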
We support LoRAs for the following base models:
| Model | Context length | Status |
| --- | --- | --- |
| meta-llama/Llama-3.2-1B-Instruct | 128k tokens | ✓ Included |
| meta-llama/Llama-3.2-3B-Instruct | 128k tokens | ✓ Included |
| meta-llama/Meta-Llama-3.1-8B-Instruct | 128k tokens | ✓ Included |
| meta-llama/Meta-Llama-3.1-70B-Instruct | 128k tokens | ✓ Included |
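A request against one of these base models can be sketched as a standard chat/completions call. This is a minimal example assuming an OpenAI-compatible endpoint; the base URL and API key below are placeholders, and the exact identifier format for addressing your own LoRA fine-tune is an assumption to confirm against the API reference.

```python
import json
import os
import urllib.request

# Placeholders -- substitute your real endpoint and key.
API_BASE = os.environ.get("API_BASE", "https://api.example.com/v1")
API_KEY = os.environ.get("API_KEY", "sk-placeholder")

payload = {
    # One of the supported base models from the table above.
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "user", "content": "Summarize what a LoRA is in one sentence."}
    ],
}

req = urllib.request.Request(
    f"{API_BASE}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# resp = urllib.request.urlopen(req)  # uncomment once real credentials are set
```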
## Embedding Models
Embedding models convert text into numerical vectors for search, clustering, and other applications.
There's no additional charge for using embeddings, and embedding requests don't count against your subscription rate limit.
| Model | Context length | Status |
| --- | --- | --- |
| hf:nomic-ai/nomic-embed-text-v1.5 | 8k tokens | ✓ Included |
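An embeddings request body can be sketched as follows, assuming an OpenAI-compatible `/embeddings` endpoint that takes a model name and a list of input strings; the endpoint URL and authentication are omitted here, and the example queries are illustrative.

```python
import json

# Request body for the included embedding model; each input string
# is converted into one numerical vector in the response.
payload = {
    "model": "hf:nomic-ai/nomic-embed-text-v1.5",
    "input": [
        "How do I cancel my subscription?",
        "What models are included in the Pro plan?",
    ],
}
body = json.dumps(payload)
```

The resulting vectors can then be compared (e.g. by cosine similarity) for search or clustering.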
## On-Demand Models
Beyond our always-on models, you can run (almost) any model from Hugging Face on-demand.
Simply provide the Hugging Face model name in your API request, and we'll automatically boot up a GPU cluster and run it for you.
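The only change from a regular request is the model name. In the sketch below, the particular Hugging Face model id is a hypothetical example, and the `hf:` prefix is an assumption based on the naming used in the embeddings table above; confirm the expected format in the API reference.

```python
import json

# Hypothetical on-demand request body: the model name is any Hugging Face
# model id (example name chosen for illustration only).
payload = {
    "model": "hf:mistralai/Mistral-7B-Instruct-v0.3",
    "messages": [{"role": "user", "content": "Give me one fun fact about GPUs."}],
}
body = json.dumps(payload)
```

The first request for a model not currently running may take longer while the GPU cluster boots.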
For GPU pricing details, see our pricing page.
## Getting Started
Ready to start using our models? Check out:
- Getting Started Guide - Your first API call
- chat/completions - Most popular endpoint for conversations
Need help choosing the right model? Join our Discord community for recommendations!