Fireworks AI offers serverless pay-per-token inference, on-demand dedicated GPU deployments billed per GPU-second, batch inference at 50% of serverless rates, cached input tokens at 50% of the standard input rate, and managed fine-tuning. New users get $1 in free credits, with postpaid billing as usage grows.
Serverless Inference
On-demand per-token inference with high rate limits, postpaid billing, and zero cold starts.
Chat / Vision Tokens (tokens · month): per 1M tokens, varies by model (see pricing page), USD
Cached Input Tokens (tokens · month): 50% of the standard input rate, USD
Embeddings, up to 150M params (tokens · month): $0.008 per 1M, USD
Embeddings, 150M-350M params (tokens · month): $0.016 per 1M, USD
Embeddings, Qwen3 8B (tokens · month): $0.10 per 1M, USD
Chat Completions
Vision
Embeddings
Rerank
Images
Audio
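A serverless bill is simple arithmetic over these rates. The sketch below (a rough estimator, not an official calculator) applies the embeddings tiers listed above and the 50% cached-input discount; the chat input/output rates passed in are hypothetical placeholders, since real per-token chat rates vary by model.

```python
# Rough serverless cost estimator. Embeddings rates are taken from the
# pricing lines above; chat rates vary by model, so callers supply them.

EMBEDDING_RATES_PER_1M = {            # USD per 1M tokens
    "small (<=150M params)": 0.008,
    "medium (150M-350M params)": 0.016,
    "qwen3-8b": 0.10,
}

def embeddings_cost(tokens: int, tier: str) -> float:
    """Cost in USD at the tier's per-1M-token rate."""
    return tokens / 1_000_000 * EMBEDDING_RATES_PER_1M[tier]

def chat_cost(input_tokens: int, cached_tokens: int, output_tokens: int,
              input_rate: float, output_rate: float) -> float:
    """Cached input tokens bill at 50% of the standard input rate."""
    uncached = input_tokens - cached_tokens
    return (uncached * input_rate
            + cached_tokens * input_rate * 0.5
            + output_tokens * output_rate) / 1_000_000

# 10M tokens through the small embeddings tier: $0.08
print(embeddings_cost(10_000_000, "small (<=150M params)"))
```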
Batch Inference (usage)
Asynchronous batch jobs are priced at 50% of the serverless input and output rates.
Batch Tokens (tokens · usage): 50% of serverless rates (input and output), USD
Batch Chat Completions
Batch Embeddings
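Because batch jobs bill at 50% of serverless per-token rates, a batch estimate is just the serverless cost halved. A minimal sketch, where the $0.20/$0.80 per-1M rates in the example are hypothetical placeholders for a model's real serverless rates:

```python
# Serverless cost at given per-1M-token rates; batch is half of it.

def serverless_cost(input_tokens: int, output_tokens: int,
                    input_rate: float, output_rate: float) -> float:
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

def batch_cost(input_tokens: int, output_tokens: int,
               input_rate: float, output_rate: float) -> float:
    # Batch inference is priced at 50% of the serverless rates.
    return 0.5 * serverless_cost(input_tokens, output_tokens,
                                 input_rate, output_rate)

# 100M input + 20M output tokens at $0.20/$0.80 per 1M: $18 as a batch job
print(batch_cost(100_000_000, 20_000_000, 0.20, 0.80))
```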
Fine-Tuning (usage)
Supervised fine-tuning (LoRA and full) is priced per 1M training tokens by model size and method; reinforcement fine-tuning is billed per GPU-hour at on-demand deployment rates.
Supervised Fine-Tuning Tokens (tokens · usage): $0.50-$40.00 per 1M training tokens (varies by model size and method), USD
Reinforcement Fine-Tuning (hours · usage): per GPU-hour at on-demand rates ($7-$12 per hour), USD
Fine-Tuned Model Serving (tokens · month): same per-token price as the base model, USD
SFT (LoRA)
SFT (Full)
Reinforcement Fine-Tuning
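A back-of-envelope fine-tuning budget combines the two meters above: SFT bills per 1M training tokens ($0.50-$40.00 depending on model size and method) and reinforcement fine-tuning bills per GPU-hour ($7-$12). The specific $2.00/1M and $9/hr figures in the example are illustrative picks from those ranges, not rates for any particular model.

```python
# Fine-tuning budget sketch using the two billing meters described above.

def sft_cost(training_tokens: int, rate_per_1m: float) -> float:
    """Supervised fine-tuning: billed per 1M training tokens."""
    return training_tokens / 1_000_000 * rate_per_1m

def rft_cost(gpu_hours: float, rate_per_hour: float) -> float:
    """Reinforcement fine-tuning: billed per GPU-hour at on-demand rates."""
    return gpu_hours * rate_per_hour

# 50M training tokens at $2.00/1M plus 3 GPU-hours at $9/hr
total = sft_cost(50_000_000, 2.00) + rft_cost(3, 9.0)
print(total)  # → 127.0
```

Serving the resulting model then bills at the base model's per-token price, so no extra serving rate enters the estimate.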
On-Demand Deployments (Dedicated GPUs) (usage)
Pay-per-GPU-second dedicated GPU deployments with autoscaling and no cold-start charge.