Prime Intellect · Pricing Plans

Prime Intellect Plans Pricing

Reconciled plans and pricing for Prime Intellect — GPU compute marketplace (on-demand hourly), hosted RL training (per-million-token input/output/training), inference (per-million-token input/output), sandboxes, and persistent disks. Source: primeintellect.ai and docs.primeintellect.ai.

Prime Intellect Plans Pricing is the machine-readable pricing-plan profile for Prime Intellect on the APIs.io network, conforming to the API Commons Plans specification.

It defines 6 plans, covering usage-based and quote-based tiers: Compute Marketplace — On-Demand GPUs, Compute Marketplace — Liquid Reserved Clusters, Hosted Training — Qwen3.5 0.8B, Hosted Training — Qwen3.5 397B-A17B, Hosted Training — Additional Models, and Inference API.

Tagged areas include AI, GPU Compute, Reinforcement Learning, Inference, and Sandboxes.

Plans

Compute Marketplace — On-Demand GPUs usage-based

Hourly on-demand GPU instances across 50+ providers. Single-node and multi-node configurations (1-256 GPUs) are available with InfiniBand networking and SLURM and Kubernetes orchestration.

H100 GPU-hour (representative) (gpu_hour · usage) 2.43 USD
H200 GPU-hour (representative low) (gpu_hour · usage) 0.47 USD
H200 GPU-hour (representative high) (gpu_hour · usage) 1.99 USD
B300 GPU-hour (representative) (gpu_hour · usage) 4.99 USD
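
As a rough illustration of how hourly on-demand billing adds up, the sketch below multiplies a representative rate by GPU count and hours. The helper name `on_demand_cost` and the rate table keys are hypothetical, and the sketch assumes cost scales linearly with GPU count and hours; actual marketplace prices vary by provider.

```python
from decimal import Decimal

# Representative on-demand rates in USD per GPU-hour, copied from the list above.
RATES_USD_PER_GPU_HOUR = {
    "H100": Decimal("2.43"),
    "H200_low": Decimal("0.47"),
    "H200_high": Decimal("1.99"),
    "B300": Decimal("4.99"),
}


def on_demand_cost(gpu: str, gpu_count: int, hours: Decimal) -> Decimal:
    """Estimate cost as rate x GPU count x hours (hypothetical helper)."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    if hours < 0:
        raise ValueError("hours must not be negative")
    return RATES_USD_PER_GPU_HOUR[gpu] * gpu_count * hours


# Example: one 8-GPU H100 node running for 24 hours.
print(on_demand_cost("H100", 8, Decimal("24")))
```

At the representative H100 rate, 8 GPUs for 24 hours comes to 466.56 USD. Decimal arithmetic is used so that currency values do not pick up binary floating-point rounding.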

Compute Marketplace — Liquid Reserved Clusters quote-based

Custom-quoted reserved GPU clusters built by gathering competitive quotes from 50+ providers. Pricing is quoted per engagement; contact sales.

Hosted Training — Qwen3.5 0.8B usage-based

Hosted RL post-training for the smallest Qwen3.5 model. Billed per million tokens across input, output, and training.

Input tokens (tokens · usage) 0.02 USD per million tokens
Output tokens (tokens · usage) 0.06 USD per million tokens
Training tokens (tokens · usage) 0.06 USD per million tokens
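
To make the per-million-token billing concrete, here is a minimal sketch that totals a run across the three billed categories. The function name `hosted_training_cost` and the token counts are hypothetical; only the three rates come from the list above.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

# USD per million tokens for Hosted Training on Qwen3.5 0.8B, from the list above.
QWEN35_0_8B_RATES = {
    "input": Decimal("0.02"),
    "output": Decimal("0.06"),
    "training": Decimal("0.06"),
}


def hosted_training_cost(tokens: dict[str, int], rates: dict[str, Decimal]) -> Decimal:
    """Sum (tokens / 1,000,000) x rate over each billed token category."""
    return sum(
        (Decimal(count) / MILLION * rates[kind] for kind, count in tokens.items()),
        Decimal(0),
    )


# Hypothetical run: 50M input, 20M output, and 20M training tokens.
run = {"input": 50_000_000, "output": 20_000_000, "training": 20_000_000}
print(hosted_training_cost(run, QWEN35_0_8B_RATES))
```

For this invented workload the total is 1.00 + 1.20 + 1.20 = 3.40 USD.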

Hosted Training — Qwen3.5 397B-A17B usage-based

Hosted RL post-training for the largest available Qwen3.5 MoE. Billed per million tokens across input, output, and training.

Input tokens (tokens · usage) 1.00 USD per million tokens
Output tokens (tokens · usage) 3.00 USD per million tokens
Training tokens (tokens · usage) 4.00 USD per million tokens
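
For budgeting at these rates, it can help to invert the calculation and ask how many rollouts a fixed budget covers. The sketch below is illustrative only: the per-rollout token counts and both helper names are hypothetical, and it assumes every rollout consumes the same number of tokens.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

# USD per million tokens for Hosted Training on Qwen3.5 397B-A17B, from the list above.
RATES = {
    "input": Decimal("1.00"),
    "output": Decimal("3.00"),
    "training": Decimal("4.00"),
}

# Hypothetical token counts for a single rollout; real values depend on the workload.
TOKENS_PER_ROLLOUT = {"input": 2_000, "output": 1_000, "training": 3_000}


def cost_per_rollout(tokens: dict[str, int], rates: dict[str, Decimal]) -> Decimal:
    """Cost of one rollout: sum of (tokens / 1,000,000) x rate per category."""
    return sum(
        (Decimal(n) / MILLION * rates[kind] for kind, n in tokens.items()),
        Decimal(0),
    )


def rollouts_within_budget(
    budget_usd: Decimal, tokens: dict[str, int], rates: dict[str, Decimal]
) -> int:
    """Largest whole number of rollouts whose total cost fits the budget."""
    unit_cost = cost_per_rollout(tokens, rates)
    if unit_cost == 0:
        raise ValueError("rollout cost is zero; budget imposes no limit")
    return int(budget_usd // unit_cost)


print(cost_per_rollout(TOKENS_PER_ROLLOUT, RATES))
print(rollouts_within_budget(Decimal("100"), TOKENS_PER_ROLLOUT, RATES))
```

With these assumed token counts, one rollout costs 0.017 USD, so a 100 USD budget covers 5882 rollouts.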

Hosted Training — Additional Models usage-based

Other supported base models include Qwen3.5 sizes 2B, 4B, 9B, 35B-A3B, 122B-A10B; Llama 1B/3B Instruct; NVIDIA Nemotron 30B/120B; and OpenAI gpt-oss 20B/120B. All are billed per million tokens (input, output, training).

Inference API usage-based

OpenAI-compatible inference at api.pinference.ai. Per-million-token pricing with usage and cost returned in every response.

Per-request usage metadata (cost field) (tokens · usage): priced per model in USD; see the models endpoint
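
Because usage and cost come back with every response, a client can track spend without a separate billing call. The sketch below parses a sample response body offline. The JSON shape is an assumption: `prompt_tokens` and `completion_tokens` follow the usual OpenAI-compatible `usage` object, while the location and name of `usage.cost`, the model name, and all sample values are hypothetical and not taken from the Prime Intellect documentation.

```python
import json
from decimal import Decimal

# Hypothetical response body. The "usage.cost" field name and location are
# assumptions, and the numbers are invented for illustration.
SAMPLE_RESPONSE = """
{
  "model": "example-model",
  "usage": {"prompt_tokens": 1200, "completion_tokens": 300, "cost": 0.00045}
}
"""


def extract_usage(body: str) -> tuple[int, int, Decimal]:
    """Return (prompt tokens, completion tokens, cost in USD) from a response body."""
    # parse_float=Decimal keeps the cost exact instead of converting it to a float.
    data = json.loads(body, parse_float=Decimal)
    usage = data["usage"]
    return (
        usage["prompt_tokens"],
        usage["completion_tokens"],
        Decimal(usage["cost"]),
    )


prompt_tokens, completion_tokens, cost = extract_usage(SAMPLE_RESPONSE)
print(prompt_tokens, completion_tokens, cost)
```

Summing the returned cost values across requests gives a running total; check the actual response schema before relying on these field names.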

Sources