Parasail · Pricing Plans

Parasail Plans Pricing

Name: Parasail Plans Pricing
Creator: Parasail
Keywords: AI, Artificial Intelligence, GPU, Inference, Large Language Models, Open Source Models, Hugging Face, Batch, Embeddings, Tokenmaxxing, Supercloud

Parasail offers four commercial surfaces — Serverless, Dedicated Serverless, Dedicated, and Batch — billed on a pay-per-token or GPU-hour basis with no long-term contracts. Tiers (Free, User, Dedicated Serverless, Dedicated Serverless Pro, Enterprise) gate request-per-minute capacity. Free credits are provided for new accounts.

Parasail Plans Pricing is the machine-readable pricing-plan profile for Parasail on the APIs.io network, conforming to the API Commons Plans specification.

It defines 7 plans, with named plans including Free, User, Dedicated Serverless, Dedicated Serverless Pro, Dedicated, and 2 more.

Tagged areas include AI, Artificial Intelligence, GPU, Inference, and Large Language Models.

7 Plans

View Source

AIArtificial IntelligenceGPUInferenceLarge Language ModelsOpen Source ModelsHugging FaceBatchEmbeddingsTokenmaxxingSupercloud

Plans

Free

Free tier with starter credits for evaluating Parasail's serverless inference.

User (user · month) 0 USD

Pay-per-token serverless inference (after free credits exhausted)
Access to all serverless models exposed on /v1/models
OpenAI-compatible /v1/chat/completions, /v1/completions, /v1/embeddings
5 RPM rate limit
Free credits for new users

User

Standard pay-per-token serverless tier for individual developers and small teams.

User (user · month) PayPerToken USD

500 RPM
All serverless models
Batch API access (50% off serverless, +30% off cached tokens)
No quotas on monthly token volume

Dedicated Serverless

Guaranteed throughput against a chosen model on isolated capacity, still billed per token but with reserved GPUs behind the endpoint.

Deployment (deployment · month) Reserved USD

1,000 RPM
Isolated capacity for a chosen model
Pay-per-token billing on reserved pool
Control-plane API for pause/resume/scale

Dedicated Serverless Pro

Higher-throughput dedicated serverless tier for production workloads.

Deployment (deployment · month) Reserved USD

4,000 RPM
Production-grade SLOs
All Dedicated Serverless features

Dedicated

Fully reserved GPU deployments billed on GPU-hours. Bring any Hugging Face or custom model and choose the device SKU and replica count.

GPU (gpu-hour · hour) GPUHour USD

Bring-your-own model (any Hugging Face / custom)
Choose GPU SKU (H100, A100, H200, etc.)
Autoscaling between min/max replicas
Pause and resume to control cost

Batch

Asynchronous batch inference for offline workloads at 50% off serverless rates.

Tokens (Input + Output) (token · usage) 50PctOffServerless USD

24-hour completion window
Supports /v1/chat/completions and /v1/embeddings
OpenAI-compatible JSONL batch file format
80-90% cheaper than real-time for large offline jobs (combined with caching)

Enterprise

Custom contracts for large-scale tokenmaxxing customers.

Contract (contract · year) Call USD

Unlimited RPM
Custom model onboarding and dedicated capacity
Premium support and SLOs
Volume discounts