Groq Plans and Pricing
GroqCloud uses transparent pay-as-you-go pricing: each model has its own per-token rate, so cost scales linearly and predictably with usage. Audio is billed per 1M characters (TTS) or per hour transcribed (STT). Tools are priced per call or per compute hour. The Batch API offers a 50% discount, and Prompt Caching offers a 50% discount on cached input tokens. There are no formal Free / Developer / Enterprise tiers; users start with free access and pay for usage.
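As a rough illustration of the arithmetic described above, the following Python sketch estimates the cost of one text request. The function name `estimate_cost`, its parameters, and the example rates are hypothetical and are not taken from Groq's price list. It also assumes rates are quoted per 1M tokens and, because the text above does not say how the Batch and Prompt Caching discounts combine, it refuses to apply both at once.

```python
BATCH_DISCOUNT = 0.5   # Batch API: 50% off synchronous rates (from the text above)
CACHE_DISCOUNT = 0.5   # Prompt Caching: 50% off cached input tokens (from the text above)
TOKENS_PER_UNIT = 1_000_000  # assumption: rates are quoted per 1M tokens


def estimate_cost(input_tokens, output_tokens,
                  input_rate, output_rate,
                  cached_input_tokens=0, batch=False):
    """Estimate the cost of one request in the currency of the given rates.

    input_rate / output_rate are hypothetical per-1M-token prices.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    if batch and cached_input_tokens:
        # The source does not state whether the two discounts stack.
        raise ValueError("combining batch and caching discounts is not modeled")

    uncached = input_tokens - cached_input_tokens
    cost = (
        uncached / TOKENS_PER_UNIT * input_rate
        + cached_input_tokens / TOKENS_PER_UNIT * input_rate * (1 - CACHE_DISCOUNT)
        + output_tokens / TOKENS_PER_UNIT * output_rate
    )
    if batch:
        cost *= 1 - BATCH_DISCOUNT
    return cost


if __name__ == "__main__":
    # Placeholder rates: 1.00 per 1M input tokens, 2.00 per 1M output tokens.
    standard = estimate_cost(1_000_000, 500_000, 1.00, 2.00)
    batched = estimate_cost(1_000_000, 500_000, 1.00, 2.00, batch=True)
    cached = estimate_cost(1_000_000, 500_000, 1.00, 2.00,
                           cached_input_tokens=400_000)
    print(f"standard={standard:.2f} batch={batched:.2f} cached={cached:.2f}")
```

Because the rates are linear, doubling the token counts doubles the estimate. Substitute the actual per-model rates from the current price list before relying on any figure.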
Plans
Token-, character-, and call-metered usage across all GroqCloud APIs with no monthly minimum.
- Chat Completions
- Vision
- Reasoning
- Speech-to-Text
- Text-to-Speech
- Content Moderation
- Tools
- LoRA Inference
Asynchronous batch inference at 50% off synchronous rates.
- Batch Chat Completions
Flexible service tier that trades relaxed latency targets for higher throughput at a lower cost than the standard tier.
- Chat Completions (Flex tier)
Volume commitments, on-prem / private deployments (GroqRack / GroqCloud Enterprise), dedicated support, and negotiated terms. Contact Groq sales for details.
- Custom Volume Pricing
- GroqRack
- GroqCloud Enterprise