Together AI · Pricing Plans

Together AI offers transparent, usage-based pricing across serverless inference (chat, embeddings, rerank, image, video, and audio generation), managed fine-tuning, asynchronous batch inference at up to a 50% discount, and hourly dedicated or reserved GPU compute. Per-model serverless rates vary widely; the ranges below are representative, so verify current rates on the pricing page.


Tags: AI, LLM, Inference, Open Source, Fine-tuning, GPU, Plans

Plans

Serverless Inference (Pay-as-you-go) usage

On-demand per-token and per-asset pricing for chat, embeddings, rerank, image, video, and audio across the Together model catalog. Start free with promotional credits, then scale on demand without commitments.

Chat (small) (tokens · month) from $0.10/$0.15 per 1M (e.g., Qwen3.5 9B input/output) USD

Chat (mid) (tokens · month) ~$0.88/$0.88 per 1M (e.g., Llama 3.3 70B) USD

Chat (premium) (tokens · month) $1.40-$7.00 per 1M (e.g., GLM-5.1, DeepSeek-R1) USD

Embeddings (tokens · month) from $0.02 per 1M (multilingual-e5-large-instruct) USD

Image Generation (images · month) $0.0006-$0.134 per image USD

Video Generation (videos · month) $0.14-$3.20 per video USD

Audio (TTS / STT) (characters · month) $0.0015-$65.00 per 1M characters USD

Chat Completions
Embeddings
Rerank
Images
Video
Audio
Vision
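
Chat rates are quoted per 1M tokens, with separate input and output prices. A minimal sketch of that arithmetic, using the representative rates listed above; the function name and the example token counts are illustrative assumptions, not part of any Together AI SDK:

```python
def chat_cost_usd(input_tokens: int, output_tokens: int,
                  input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Cost in USD for a workload, given rates quoted per 1M tokens."""
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000

# Hypothetical workload: 2M input tokens and 0.5M output tokens.
# Small-model example rates from the listing: $0.10 in / $0.15 out per 1M.
small = chat_cost_usd(2_000_000, 500_000, 0.10, 0.15)
# Mid-tier example rates from the listing: ~$0.88 in / $0.88 out per 1M.
mid = chat_cost_usd(2_000_000, 500_000, 0.88, 0.88)

print(f"small: ${small:.3f}  mid: ${mid:.3f}")
```

Because the listed figures are representative ranges, substitute the current per-model rates from the pricing page before relying on the result.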

Batch API usage

Asynchronous batch inference at up to 50% discount over the equivalent serverless rate.

Batch Tokens (tokens · usage) up to 50% off serverless rates USD

Batch Chat
Batch Embeddings
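
Since the batch discount is stated as "up to 50%", the actual discount for a given model may be smaller. The sketch below treats the discount as a parameter capped at 50%; the function name and the default value are assumptions made for illustration:

```python
def batch_cost_usd(serverless_cost_usd: float, discount: float = 0.50) -> float:
    """Apply a batch discount (0.0 to 0.50) to an equivalent serverless cost."""
    if not 0.0 <= discount <= 0.50:
        raise ValueError("discount must be between 0.0 and 0.50")
    return serverless_cost_usd * (1.0 - discount)

# A workload that would cost $2.20 at serverless rates,
# priced at the maximum advertised batch discount.
best_case = batch_cost_usd(2.20)
# The same workload if a given model carried no batch discount.
worst_case = batch_cost_usd(2.20, discount=0.0)

print(f"batch cost between ${best_case:.2f} and ${worst_case:.2f}")
```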

Fine-Tuning usage

Supervised fine-tuning (LoRA and full) priced per 1M training tokens, scaled by base-model size.

Up to 16B (LoRA / Full) (tokens · usage) $0.48-$1.35 per 1M USD

17B-69B (tokens · usage) $1.50-$4.12 per 1M USD

70B-100B (tokens · usage) $2.90-$8.00 per 1M USD

Specialized (DeepSeek-R1, GLM-5, Kimi) (tokens · usage) $9-$100 per 1M USD

Supervised Fine-Tuning
LoRA
Full Fine-Tuning
DPO
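
Fine-tuning is priced per 1M training tokens, so a rough budget can be bracketed from the tier's low and high rates. The sketch below assumes that billed training tokens equal dataset tokens multiplied by the number of epochs; that billing rule is an assumption not stated in this listing, and the function name is hypothetical:

```python
def fine_tune_cost_range_usd(dataset_tokens: int, epochs: int,
                             low_rate_per_m: float,
                             high_rate_per_m: float) -> tuple[float, float]:
    """Return (low, high) USD estimates for one fine-tuning job."""
    # Assumption: every epoch re-processes the full dataset and is billed.
    billed_tokens = dataset_tokens * epochs
    low = billed_tokens * low_rate_per_m / 1_000_000
    high = billed_tokens * high_rate_per_m / 1_000_000
    return low, high

# Hypothetical job: 10M-token dataset, 3 epochs, on the "Up to 16B" tier
# ($0.48-$1.35 per 1M training tokens in the listing above).
low, high = fine_tune_cost_range_usd(10_000_000, 3, 0.48, 1.35)

print(f"estimated cost: ${low:.2f} to ${high:.2f}")
```

Where a job falls within the range presumably depends on the method (LoRA versus full) and the base model, so treat both ends as bounds rather than quotes.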

Dedicated Inference Endpoints usage

Reserved single-tenant GPU-backed inference endpoints billed hourly.

1x H100 - 1x B200 (hourly) (hours · usage) $3.99-$9.95 per hour USD

H100
H200
B200
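
Hourly billing makes a dedicated endpoint a fixed cost, so it is worth comparing against per-token serverless pricing. The sketch below computes a monthly cost for an always-on endpoint and the hourly token volume at which the two pricing models cost the same. It assumes a 730-hour month and a single blended serverless rate, and it ignores whether one endpoint can actually sustain that throughput; all names are illustrative:

```python
HOURS_PER_MONTH = 730  # assumption: average month (8,760 hours / 12)

def endpoint_monthly_cost_usd(hourly_rate_usd: float,
                              hours: float = HOURS_PER_MONTH) -> float:
    """Cost of keeping one dedicated endpoint running for `hours`."""
    return hourly_rate_usd * hours

def break_even_tokens_per_hour(hourly_rate_usd: float,
                               serverless_rate_per_m: float) -> float:
    """Tokens per hour at which serverless spend equals the hourly rate."""
    return hourly_rate_usd / serverless_rate_per_m * 1_000_000

# Low and high ends of the listed hourly range ($3.99-$9.95).
low_month = endpoint_monthly_cost_usd(3.99)
high_month = endpoint_monthly_cost_usd(9.95)
# Compared against the mid-tier serverless example rate of $0.88 per 1M tokens.
tokens = break_even_tokens_per_hour(3.99, 0.88)

print(f"monthly: ${low_month:,.2f} to ${high_month:,.2f}")
print(f"break-even: {tokens:,.0f} tokens/hour")
```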

GPU Clusters usage

On-demand and reserved bare-metal GPU clusters for training and self-managed inference.

On-demand (hourly) (hours · usage) $3.49-$7.49 per hour USD

Reserved (7-30+ days) (hours · usage) $2.99-$7.15 per hour (deeper discounts at 181+ days) USD

H100 Cluster
H200 Cluster
B200 Cluster
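
The gap between on-demand and reserved rates can be estimated for a fixed cluster size and duration. The sketch below assumes the listed rates are per GPU per hour and that the low end of each range refers to the same GPU type; neither is stated in this listing, so confirm both before using the numbers:

```python
def cluster_cost_usd(gpus: int, days: int, rate_per_gpu_hour: float) -> float:
    """Total cost for `gpus` GPUs running continuously for `days` days."""
    return gpus * days * 24 * rate_per_gpu_hour

# Hypothetical 8-GPU cluster held for 30 days, at the low end of each range.
on_demand = cluster_cost_usd(8, 30, 3.49)
reserved = cluster_cost_usd(8, 30, 2.99)
savings = on_demand - reserved
savings_pct = savings / on_demand * 100

print(f"on-demand ${on_demand:,.2f}, reserved ${reserved:,.2f}, "
      f"savings ${savings:,.2f} ({savings_pct:.1f}%)")
```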

Enterprise enterprise

Volume commitments, custom capacity, dedicated regions, and procurement-friendly contracts. Contact Together AI sales.

Enterprise Agreement (contract · year) contact sales USD

Custom Volume Pricing
SLA
VPC
