Groq · Pricing Plans

GroqCloud uses a transparent pay-as-you-go pricing model: each model has its own per-token rate, and cost scales linearly with usage. Audio is billed per 1M characters (TTS) or per hour transcribed (STT). Tools are priced per call or per compute hour. The Batch API offers a 50% discount, and Prompt Caching offers a 50% discount on cached input tokens. There are no formal Free / Developer / Enterprise tiers; users start with free access and pay for usage.
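
Because every non-token meter follows the same "quantity times rate per unit block" pattern, the arithmetic can be sketched with one small helper. This is only an illustration: `metered_cost` is a hypothetical name, and the example rates are the low end of the ranges in the plan list further down, on the assumption that the exact rate depends on the specific model or tool variant.

```python
from decimal import Decimal


def metered_cost(quantity, rate_usd, per_units=1):
    """Cost in USD of `quantity` units when `rate_usd` is charged per `per_units` units."""
    q = Decimal(str(quantity))
    if q < 0:
        raise ValueError("quantity must be non-negative")
    return q * Decimal(str(rate_usd)) / Decimal(str(per_units))


# Illustrative inputs only (low end of each listed range).
tts = metered_cost(250_000, "22", per_units=1_000_000)  # 250K characters of TTS
stt = metered_cost("1.5", "0.04")                        # 90 minutes transcribed
search = metered_cost(300, "5", per_units=1_000)         # 300 web search requests
```

`Decimal` is used instead of floats so that small per-unit rates do not pick up binary rounding noise when summed into a bill.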

Tags: AI, LLM, Inference, LPU, Low Latency, Plans

Plans

Pay-as-you-go (usage)

Token-, character-, and call-metered usage across all GroqCloud APIs with no monthly minimum.

Llama 3.1 8B Instant (tokens · month) $0.05 input / $0.08 output per 1M tokens (USD)
GPT OSS 20B (tokens · month) $0.075 input / $0.30 output per 1M tokens (USD)
Llama 3.3 70B Versatile (tokens · month) $0.59 input / $0.79 output per 1M tokens (USD)
GPT OSS 120B (tokens · month) $0.15 input / $0.60 output per 1M tokens (USD)
TTS (Orpheus family) (characters · month) $22-$40 per 1M characters (USD)
STT (Whisper family) (hours · month) $0.04-$0.111 per hour transcribed (USD)
Web Search Tool (requests · month) $5-$8 per 1,000 requests (USD)
Code Execution Tool (hours · month) $0.18 per hour (USD)
Prompt Caching (tokens · month) 50% off the standard input rate on cached tokens
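
As a rough illustration of how the per-1M-token rates above turn into a bill, the sketch below estimates the cost of a workload on one model. The rates are copied from the list above; the names `RATES_PER_1M` and `estimate_token_cost` are hypothetical, and the code assumes cached input tokens are charged at exactly half the standard input rate, as the Prompt Caching line states. It is not Groq's billing implementation.

```python
from decimal import Decimal

# USD per 1M tokens, copied from the price list above: (input, output).
RATES_PER_1M = {
    "Llama 3.1 8B Instant": (Decimal("0.05"), Decimal("0.08")),
    "GPT OSS 20B": (Decimal("0.075"), Decimal("0.30")),
    "Llama 3.3 70B Versatile": (Decimal("0.59"), Decimal("0.79")),
    "GPT OSS 120B": (Decimal("0.15"), Decimal("0.60")),
}

CACHED_INPUT_DISCOUNT = Decimal("0.5")  # 50% off the input rate on cached tokens
ONE_MILLION = Decimal(1_000_000)


def estimate_token_cost(model, input_tokens, output_tokens, cached_input_tokens=0):
    """Estimate USD cost; cached_input_tokens is the cached subset of input_tokens."""
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    input_rate, output_rate = RATES_PER_1M[model]
    uncached = input_tokens - cached_input_tokens
    total = (
        uncached * input_rate
        + cached_input_tokens * input_rate * (1 - CACHED_INPUT_DISCOUNT)
        + output_tokens * output_rate
    )
    return total / ONE_MILLION
```

Under these assumptions, 2M input tokens (1M of them cached) plus 500K output tokens on Llama 3.3 70B Versatile come to $0.59 + $0.295 + $0.395 = $1.28.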

Batch API (usage)

Asynchronous batch inference at 50% off synchronous rates.

Batch Tokens (tokens · usage) 50% off synchronous rates
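
The batch discount amounts to a flat multiplier on what the same work would cost synchronously. A minimal sketch, assuming the 50% applies uniformly to input and output tokens; how it interacts with Prompt Caching is not stated here, so that case is left out. `batch_cost` is a hypothetical helper name.

```python
from decimal import Decimal

BATCH_DISCOUNT = Decimal("0.5")  # 50% off synchronous rates, per the plan above


def batch_cost(synchronous_cost_usd):
    """Estimated Batch API cost for work that would cost
    `synchronous_cost_usd` at standard synchronous rates."""
    cost = Decimal(str(synchronous_cost_usd))
    if cost < 0:
        raise ValueError("cost must be non-negative")
    return cost * (1 - BATCH_DISCOUNT)
```

For instance, a job estimated at $1.28 synchronously would come to $0.64 when submitted through the Batch API under this assumption.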

Flex Processing (usage)

A flexible service tier that offers higher throughput with relaxed latency targets, at a lower cost than the standard tier.

Flex Tokens (tokens · usage) discounted vs. standard rates; see the Groq pricing page for current figures (USD)

Enterprise (enterprise)

Volume commitments, on-prem / private deployments (GroqRack / GroqCloud Enterprise), dedicated support and negotiated terms. Contact Groq sales.

Enterprise Agreement (contract · year) negotiated pricing; contact sales (USD)