# Available Models
## Subscriptions

Always-on, LoRA, and embedding models are included in every subscription.
## Always-On Models

These models are included in all Standard and Pro subscriptions. Per-token, usage-based billing is also available.
## LoRA Models

### What's a LoRA?

Low-rank adapters ("LoRAs") are small, efficient fine-tunes that run on top of an existing base model. They can make a model much more effective at specific tasks without retraining its full weights.
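Conceptually, a LoRA freezes the base weight matrix and trains only a low-rank update: the adapted weight is the base weight plus a scaled product of two small matrices. The sketch below illustrates the idea with NumPy; the dimensions, rank, and `alpha` scaling factor are hypothetical, not values used by any model listed here.

```python
import numpy as np

# Hypothetical dimensions for illustration; real model layers are larger.
d_out, d_in, rank = 512, 512, 8
alpha = 16  # common LoRA scaling hyperparameter

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))        # frozen base weight (not trained)
A = rng.standard_normal((rank, d_in)) * 0.01  # small trainable matrix
B = np.zeros((d_out, rank))                   # conventionally initialized to zero

# Adapted weight = base weight + scaled low-rank update.
W_adapted = W + (alpha / rank) * (B @ A)

full_params = d_out * d_in          # parameters in a full fine-tune of W
lora_params = rank * (d_out + d_in) # trainable parameters in the adapter
print(f"trainable params: {lora_params} vs full fine-tune: {full_params}")
```

Because only `A` and `B` are trained, the adapter is a tiny fraction of the base model's size, which is why many LoRAs can run cheaply on top of one shared base model.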
We support LoRAs for the following base models:
| Model | Context length | Status |
| --- | --- | --- |
| meta-llama/Llama-3.2-1B-Instruct | 128k tokens | ✓ Included |
| meta-llama/Llama-3.2-3B-Instruct | 128k tokens | ✓ Included |
| meta-llama/Meta-Llama-3.1-8B-Instruct | 128k tokens | ✓ Included |
| meta-llama/Meta-Llama-3.1-70B-Instruct | 128k tokens | ✓ Included |
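A request against one of these base models can be sketched as a standard chat/completions call. This is a minimal example assuming an OpenAI-compatible endpoint; the base URL and API key below are placeholders, and the exact identifier format for addressing your own LoRA fine-tune is an assumption to confirm against the API reference.

```python
import json
import os
import urllib.request

# Placeholders -- substitute your real endpoint and key.
API_BASE = os.environ.get("API_BASE", "https://api.example.com/v1")
API_KEY = os.environ.get("API_KEY", "sk-placeholder")

payload = {
    # One of the supported base models from the table above.
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "user", "content": "Summarize what a LoRA is in one sentence."}
    ],
}

req = urllib.request.Request(
    f"{API_BASE}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
)
# resp = urllib.request.urlopen(req)  # uncomment once real credentials are set
```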
## Embedding Models
Embedding models convert text into numerical vectors for search, clustering, and other applications.
There's no additional charge for using embeddings, and embedding requests don't count against your subscription rate limit.
| Model | Context length | Status |
| --- | --- | --- |
| hf:nomic-ai/nomic-embed-text-v1.5 | 8k tokens | ✓ Included |
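An embeddings request body can be sketched as follows, assuming an OpenAI-compatible `/embeddings` endpoint that takes a model name and a list of input strings; the endpoint URL and authentication are omitted here, and the example queries are illustrative.

```python
import json

# Request body for the included embedding model; each input string
# is converted into one numerical vector in the response.
payload = {
    "model": "hf:nomic-ai/nomic-embed-text-v1.5",
    "input": [
        "How do I cancel my subscription?",
        "What models are included in the Pro plan?",
    ],
}
body = json.dumps(payload)
```

The resulting vectors can then be compared (e.g. by cosine similarity) for search or clustering.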
## On-Demand Models
Beyond our always-on models, you can run (almost) any model from Hugging Face on-demand.
Simply provide the Hugging Face model name in your API request, and we'll automatically boot up a GPU cluster and run it for you.
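The only change from a regular request is the model name. In the sketch below, the particular Hugging Face model id is a hypothetical example, and the `hf:` prefix is an assumption based on the naming used in the embeddings table above; confirm the expected format in the API reference.

```python
import json

# Hypothetical on-demand request body: the model name is any Hugging Face
# model id (example name chosen for illustration only).
payload = {
    "model": "hf:mistralai/Mistral-7B-Instruct-v0.3",
    "messages": [{"role": "user", "content": "Give me one fun fact about GPUs."}],
}
body = json.dumps(payload)
```

The first request for a model not currently running may take longer while the GPU cluster boots.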
For GPU pricing details, see our pricing page.
## Getting Started
Ready to start using our models? Check out:
- Getting Started Guide - Your first API call
- chat/completions - Most popular endpoint for conversations
Need help choosing the right model? Join our Discord community for recommendations!