Together AI Plans and Pricing
Together AI offers transparent, usage-based pricing for serverless inference (chat, vision, embeddings, rerank, and image, video, and audio generation), managed fine-tuning, asynchronous batch inference at up to a 50% discount, and hourly dedicated or reserved GPU compute. Per-model serverless rates vary widely, so verify current rates on the Together AI pricing page.
Plans
Serverless inference: on-demand per-token or per-asset pricing for chat, vision, embeddings, rerank, image, video, and audio models across the Together model catalog. Start with promotional credits and scale on demand without commitments.
- Chat Completions
- Embeddings
- Rerank
- Images
- Video
- Audio
- Vision
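A minimal sketch of how per-token pricing adds up for a single chat request, assuming the model lists separate input and output rates per 1M tokens (pass the same value twice if it lists a single rate). The function name `chat_cost` and the $0.25 / $0.75 rates are hypothetical placeholders, not Together AI prices.

```python
def chat_cost(input_tokens: int, output_tokens: int,
              input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Return the USD cost of one request under per-1M-token pricing.

    Rates are hypothetical inputs; look up real per-model rates on the
    Together AI pricing page.
    """
    # Each token class is billed at its own rate per 1,000,000 tokens.
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000


# 2,000 prompt tokens + 500 completion tokens at hypothetical
# rates of $0.25 (input) and $0.75 (output) per 1M tokens.
print(f"${chat_cost(2_000, 500, 0.25, 0.75):.6f}")  # $0.000875
```

Summing this per request over expected traffic gives a rough monthly serverless estimate for one model.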
Batch inference: asynchronous jobs priced at up to 50% off the equivalent serverless rate.
- Batch Chat
- Batch Embeddings
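The batch discount is a multiplier on the equivalent serverless cost. A minimal sketch, assuming the discount is a single fraction capped at the listed 50% maximum (the discount for a specific model may be lower); `batch_cost`, `MAX_BATCH_DISCOUNT`, and the token rates are hypothetical.

```python
MAX_BATCH_DISCOUNT = 0.5  # "up to 50%" per the pricing summary


def batch_cost(serverless_cost: float, discount: float = MAX_BATCH_DISCOUNT) -> float:
    """Return the batch price (USD) for work costing `serverless_cost` serverlessly.

    `discount` is a fraction between 0 and MAX_BATCH_DISCOUNT; the discount
    that actually applies to a given model may be smaller than the maximum.
    """
    if not 0.0 <= discount <= MAX_BATCH_DISCOUNT:
        raise ValueError(f"discount must be between 0 and {MAX_BATCH_DISCOUNT}")
    return serverless_cost * (1.0 - discount)


# Hypothetical job: 10M input tokens at $0.25/1M and 2M output tokens at $0.75/1M.
serverless = 10 * 0.25 + 2 * 0.75    # 4.0 USD at serverless rates
print(batch_cost(serverless))        # 2.0 (best case, 50% off)
print(batch_cost(serverless, 0.25))  # 3.0 (a hypothetical 25% discount)
```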
Fine-tuning: supervised fine-tuning (LoRA and full) is priced per 1M training tokens and scaled by base-model size; DPO is also available.
- Supervised Fine-Tuning
- LoRA
- Full Fine-Tuning
- DPO
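Per-token fine-tuning cost depends on how many tokens are trained and on the base model's size tier. A minimal sketch, assuming billed training tokens equal dataset tokens times epochs and using made-up size tiers; `HYPOTHETICAL_TIERS`, `rate_for_model`, and `fine_tune_cost` are illustrative names, and real rates may also differ by method (LoRA, full, DPO).

```python
# Hypothetical size tiers: (max base-model parameters in billions,
# USD per 1M training tokens). Placeholder values, not Together AI prices.
HYPOTHETICAL_TIERS = [
    (20.0, 0.50),
    (80.0, 1.50),
    (float("inf"), 3.00),
]


def rate_for_model(params_billions: float) -> float:
    """Pick the per-1M-token rate for the smallest tier that fits the model."""
    for max_params, rate in HYPOTHETICAL_TIERS:
        if params_billions <= max_params:
            return rate
    raise ValueError("no tier matched")  # unreachable with an inf upper tier


def fine_tune_cost(dataset_tokens: int, epochs: int, params_billions: float) -> float:
    """Estimate fine-tuning cost in USD.

    Assumes billed training tokens = dataset tokens x epochs.
    """
    training_tokens = dataset_tokens * epochs
    return training_tokens / 1_000_000 * rate_for_model(params_billions)


# A 5M-token dataset trained for 3 epochs (15M training tokens).
print(fine_tune_cost(5_000_000, 3, 8))   # 7.5 on the hypothetical <=20B tier
print(fine_tune_cost(5_000_000, 3, 70))  # 22.5 on the hypothetical <=80B tier
```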
Dedicated endpoints: reserved, single-tenant inference endpoints backed by dedicated GPUs and billed hourly.
- H100
- H200
- B200
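Hourly billing can be translated into an effective per-token cost for comparison with serverless rates. A minimal sketch, assuming the hourly rate is charged per GPU for every hour the endpoint is up regardless of traffic; the function name, the $3.60/GPU-hour rate, and the 1,000 tokens/s throughput are hypothetical.

```python
def effective_cost_per_m_tokens(gpu_hour_rate: float, num_gpus: int,
                                tokens_per_second: float,
                                utilization: float = 1.0) -> float:
    """USD per 1M tokens actually served by an hourly-billed dedicated endpoint.

    Assumes the hourly rate is charged per GPU for every hour the endpoint is
    up, whether or not it is serving traffic; `utilization` is the fraction of
    that time spent at the sustained `tokens_per_second` throughput.
    """
    if tokens_per_second <= 0 or not 0.0 < utilization <= 1.0:
        raise ValueError("need tokens_per_second > 0 and 0 < utilization <= 1")
    hourly_cost = gpu_hour_rate * num_gpus
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return hourly_cost / tokens_per_hour * 1_000_000


# Hypothetical: one GPU at $3.60/hour sustaining 1,000 tokens/s.
print(effective_cost_per_m_tokens(3.60, 1, 1_000))       # about $1.00 per 1M tokens
print(effective_cost_per_m_tokens(3.60, 1, 1_000, 0.5))  # about $2.00 when busy half the time
```

Comparing the result with a model's serverless per-1M-token rate gives a rough sense of when an hourly endpoint pays off; real throughput depends on the model, batch size, and prompt lengths.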
GPU clusters: on-demand and reserved bare-metal GPU capacity for training and self-managed inference.
- H100 Cluster
- H200 Cluster
- B200 Cluster
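Cluster costs scale with GPU count and hours. A minimal sketch, assuming a flat hypothetical per-GPU-hour rate with no minimum billing increment, plus a hypothetical `scaling_efficiency` factor for multi-GPU overhead: adding GPUs shortens wall-clock time, but any efficiency loss raises the total GPU-hours billed. `training_run_estimate` and all numbers are illustrative.

```python
def training_run_estimate(gpu_hours_at_perfect_scaling: float, num_gpus: int,
                          gpu_hour_rate: float, scaling_efficiency: float = 1.0):
    """Return (wall_clock_hours, cost_usd) for a run on an hourly-billed cluster.

    A scaling_efficiency below 1.0 models communication overhead: the job
    consumes gpu_hours / efficiency GPU-hours in total.
    """
    if num_gpus <= 0 or not 0.0 < scaling_efficiency <= 1.0:
        raise ValueError("need num_gpus > 0 and 0 < scaling_efficiency <= 1")
    billed_gpu_hours = gpu_hours_at_perfect_scaling / scaling_efficiency
    wall_clock_hours = billed_gpu_hours / num_gpus
    return wall_clock_hours, billed_gpu_hours * gpu_hour_rate


# Hypothetical 1,024 GPU-hour job at $2.50 per GPU-hour.
for gpus, eff in [(64, 1.0), (128, 0.8)]:
    hours, cost = training_run_estimate(1_024, gpus, 2.50, eff)
    print(f"{gpus} GPUs: {hours:.1f} h wall clock, ${cost:,.2f}")
```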
Enterprise: volume commitments, custom capacity, dedicated regions, and procurement-friendly contracts, arranged through Together AI sales.
- Custom Volume Pricing
- SLA
- VPC