Parasail Plans Pricing
Parasail offers four commercial surfaces — Serverless, Dedicated Serverless, Dedicated, and Batch — billed on a pay-per-token or GPU-hour basis with no long-term contracts. Tiers (Free, User, Dedicated Serverless, Dedicated Serverless Pro, Enterprise) gate request-per-minute capacity. Free credits are provided for new accounts.
Parasail Plans Pricing is the machine-readable pricing-plan profile for Parasail on the APIs.io network, conforming to the API Commons Plans specification.
It defines 7 plans, with named plans including Free, User, Dedicated Serverless, Dedicated Serverless Pro, Dedicated, and 2 more.
Tagged areas include AI, Artificial Intelligence, GPU, Inference, and Large Language Models.
Plans
Free tier with starter credits for evaluating Parasail's serverless inference.
- Pay-per-token serverless inference (after free credits exhausted)
- Access to all serverless models exposed on /v1/models
- OpenAI-compatible /v1/chat/completions, /v1/completions, /v1/embeddings
- 5 RPM rate limit
- Free credits for new users
Standard pay-per-token serverless tier for individual developers and small teams.
- 500 RPM
- All serverless models
- Batch API access (50% off serverless, +30% off cached tokens)
- No quotas on monthly token volume
Guaranteed throughput against a chosen model on isolated capacity, still billed per token but with reserved GPUs behind the endpoint.
- 1,000 RPM
- Isolated capacity for a chosen model
- Pay-per-token billing on reserved pool
- Control-plane API for pause/resume/scale
Higher-throughput dedicated serverless tier for production workloads.
- 4,000 RPM
- Production-grade SLOs
- All Dedicated Serverless features
Fully reserved GPU deployments billed on GPU-hours. Bring any Hugging Face or custom model and choose the device SKU and replica count.
- Bring-your-own model (any Hugging Face / custom)
- Choose GPU SKU (H100, A100, H200, etc.)
- Autoscaling between min/max replicas
- Pause and resume to control cost
Asynchronous batch inference for offline workloads at 50% off serverless rates.
- 24-hour completion window
- Supports /v1/chat/completions and /v1/embeddings
- OpenAI-compatible JSONL batch file format
- 80-90% cheaper than real-time for large offline jobs (combined with caching)
Custom contracts for large-scale tokenmaxxing customers.
- Unlimited RPM
- Custom model onboarding and dedicated capacity
- Premium support and SLOs
- Volume discounts