    Available Models

    Subscriptions

    Always-On, LoRA, and Embedding models are included in every subscription.

    Always-On Models

    These models are included in all Standard and Pro subscriptions. Per-token pricing is also available with usage-based billing.

    | Model | Provider | Context length | Status |
    | --- | --- | --- | --- |
    | hf:MiniMaxAI/MiniMax-M2.1 | Synthetic | 192k tokens | ✓ Included |
    | hf:moonshotai/Kimi-K2-Thinking | Synthetic | 256k tokens | ✓ Included |
    | hf:zai-org/GLM-4.7 | Synthetic | 198k tokens | ✓ Included |
    | hf:deepseek-ai/DeepSeek-R1-0528 | Fireworks | 128k tokens | ✓ Included |
    | hf:deepseek-ai/DeepSeek-V3-0324 | Fireworks | 128k tokens | ✓ Included |
    | hf:deepseek-ai/DeepSeek-V3.1 | Fireworks | 128k tokens | ✓ Included |
    | hf:deepseek-ai/DeepSeek-V3.1-Terminus | Fireworks | 128k tokens | ✓ Included |
    | hf:deepseek-ai/DeepSeek-V3.2 | Fireworks | 159k tokens | ✓ Included |
    | hf:meta-llama/Llama-3.3-70B-Instruct | Fireworks | 128k tokens | ✓ Included |
    | hf:meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | Fireworks | 524k tokens | ✓ Included |
    | hf:MiniMaxAI/MiniMax-M2 | Fireworks | 192k tokens | ✓ Included |
    | hf:moonshotai/Kimi-K2-Instruct-0905 | Fireworks | 256k tokens | ✓ Included |
    | hf:openai/gpt-oss-120b | Fireworks | 128k tokens | ✓ Included |
    | hf:Qwen/Qwen3-235B-A22B-Instruct-2507 | Fireworks | 256k tokens | ✓ Included |
    | hf:Qwen/Qwen3-Coder-480B-A35B-Instruct | Fireworks | 256k tokens | ✓ Included |
    | hf:Qwen/Qwen3-VL-235B-A22B-Instruct | Fireworks | 250k tokens | ✓ Included |
    | hf:zai-org/GLM-4.5 | Fireworks | 128k tokens | ✓ Included |
    | hf:zai-org/GLM-4.6 | Fireworks | 198k tokens | ✓ Included |
    | hf:deepseek-ai/DeepSeek-V3 | Together AI | 128k tokens | ✓ Included |
    | hf:Qwen/Qwen3-235B-A22B-Thinking-2507 | Together AI | 256k tokens | ✓ Included |
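    As a sketch of how these model IDs are used, the payload below targets the OpenAI-compatible /chat/completions endpoint documented on this site. The base URL and the `SYNTHETIC_API_KEY` environment-variable name are placeholders, not confirmed values; substitute the real ones from your dashboard.

```python
def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completions payload."""
    return {
        "model": model,  # any model ID from the tables on this page
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_chat_request("hf:MiniMaxAI/MiniMax-M2.1", "Hello!")
# Sending it is an ordinary HTTPS POST (base URL below is a placeholder):
#   POST https://api.example.com/chat/completions
#   Authorization: Bearer $SYNTHETIC_API_KEY
#   Content-Type: application/json
```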

    LoRA Models

    What's a LoRA?

    Low-rank adapters — called "LoRAs" — are small, efficient fine-tunes that run on top of existing models. They can modify a model to be much more effective at specific tasks.
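    To make the "small and efficient" claim concrete, here is a toy parameter count: a full fine-tune updates an entire d_out × d_in weight matrix, while a rank-r adapter trains only two thin factors B (d_out × r) and A (r × d_in) that are added to the frozen weights as W + BA. The layer dimensions below are illustrative, not tied to any specific model.

```python
def lora_param_ratio(d_out: int, d_in: int, r: int) -> float:
    """Fraction of a full weight matrix that a rank-r adapter must train."""
    full = d_out * d_in            # parameters in one full weight matrix
    adapter = r * (d_out + d_in)   # parameters in the two low-rank factors
    return adapter / full

# e.g. a 4096x4096 layer with rank 16:
print(f"{lora_param_ratio(4096, 4096, 16):.2%} of the full matrix")  # 0.78%
```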

    We support LoRAs for the following base models:

    | Model | Provider | Context length | Status |
    | --- | --- | --- | --- |
    | meta-llama/Llama-3.2-1B-Instruct | Together AI | 128k tokens | ✓ Included |
    | meta-llama/Llama-3.2-3B-Instruct | Together AI | 128k tokens | ✓ Included |
    | meta-llama/Meta-Llama-3.1-8B-Instruct | Together AI | 128k tokens | ✓ Included |
    | meta-llama/Meta-Llama-3.1-70B-Instruct | Together AI | 128k tokens | ✓ Included |

    Embedding Models

    Embedding models convert text into numerical vectors for search, clustering, and other applications.

    There's no additional charge for using embeddings, and embedding requests don't count against your subscription rate limit.

    | Model | Provider | Context length | Status |
    | --- | --- | --- | --- |
    | hf:nomic-ai/nomic-embed-text-v1.5 | Fireworks | 8k tokens | ✓ Included |
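    A common use of the vectors returned by the /embeddings endpoint is ranking documents by cosine similarity to a query. The sketch below uses tiny hand-written vectors as stand-ins for real API responses (actual embeddings have hundreds of dimensions).

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

query = [0.1, 0.9, 0.2]  # hypothetical embedding of the search query
docs = {"a": [0.1, 0.8, 0.3], "b": [0.9, 0.1, 0.0]}

# Rank documents by similarity to the query; "a" points the same way.
best = max(docs, key=lambda name: cosine_similarity(query, docs[name]))
print(best)  # a
```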

    On-Demand Models

    Beyond our always-on models, you can run (almost) any model from Hugging Face on-demand.

    Simply provide the Hugging Face model name in your API request, and we'll automatically boot up a GPU cluster and run it for you.

    For GPU pricing details, see our pricing page.
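    An on-demand request has the same shape as an always-on one; only the model field changes. The `hf:` prefix below mirrors the model IDs listed in the tables above but is an assumption about the on-demand format, and the repo name is purely illustrative.

```python
def on_demand_request(hf_repo: str, prompt: str) -> dict:
    """Build a chat payload naming an arbitrary Hugging Face repo."""
    return {
        "model": f"hf:{hf_repo}",  # assumed format, mirroring the tables above
        "messages": [{"role": "user", "content": prompt}],
    }

payload = on_demand_request("bigcode/starcoder2-15b", "Write a haiku about GPUs.")
```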

    Getting Started

    Ready to start using our models? Check out:

    • Getting Started Guide - Your first API call
    • chat/completions - Most popular endpoint for conversations

    Need help choosing the right model? Join our Discord community for recommendations!