Ollama Plans & Pricing
Ollama is free and open-source for local inference (no auth, no charge, no rate limit on http://localhost:11434). Ollama Cloud is a hosted-inference add-on with three tiers (Free, Pro, Max), metered by GPU utilization rather than tokens. Cloud usage resets on two cycles: a 5-hour session window and a 7-day weekly window. All tiers permit unlimited use of open models on the user's own hardware; tier differences apply only to ollama.com cloud-model concurrency and weekly cloud usage.
Plans
Local
Run Ollama on your own hardware. Open source. No account or API key required for the local server at http://localhost:11434.
- Unlimited local inference
- Open-source models
- No authentication required for localhost
- Ollama API
- Ollama OpenAI Compatibility API
- Ollama Anthropic Compatibility API
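The local APIs above are plain HTTP on port 11434 and need no API key. A minimal sketch of building requests for the native generate endpoint and the OpenAI-compatible chat endpoint (the model name `llama3.2` is an assumption — substitute any model you have pulled):

```python
import json
import urllib.request

OLLAMA_LOCAL = "http://localhost:11434"  # local server; no authentication required


def native_generate_request(model: str, prompt: str):
    """Request spec for Ollama's native /api/generate endpoint."""
    url = f"{OLLAMA_LOCAL}/api/generate"
    payload = {"model": model, "prompt": prompt, "stream": False}
    return url, payload


def openai_chat_request(model: str, content: str):
    """Request spec for the OpenAI-compatible /v1/chat/completions endpoint."""
    url = f"{OLLAMA_LOCAL}/v1/chat/completions"
    payload = {"model": model, "messages": [{"role": "user", "content": content}]}
    return url, payload


def post_json(url: str, payload: dict) -> dict:
    """POST a JSON payload and decode the JSON reply (needs `ollama serve` running)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Example, with the server running and the model pulled:
# url, payload = native_generate_request("llama3.2", "Why is the sky blue?")
# print(post_json(url, payload)["response"])
```

The same payload shapes work from any HTTP client; pointing an existing OpenAI SDK at `http://localhost:11434/v1` is the usual way to use the compatibility API.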
Free
Free hosted inference at ollama.com. Requires an ollama.com account and API key. Cloud usage is metered by GPU time, not tokens.
- Run cloud models from CLI/API
- Unlimited public models
- Run models on your own hardware
- Ollama Cloud API
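Calling a cloud model looks like calling the local API, except the request goes to ollama.com and carries the account's API key. A sketch under stated assumptions — the `https://ollama.com/api/chat` endpoint path, the `Authorization: Bearer` header scheme, the `OLLAMA_API_KEY` environment variable, and the model name are all illustrative; check your account's documentation for the exact values:

```python
import json
import os
import urllib.request

# Hosted endpoint; path and auth scheme are assumptions, not confirmed by this page.
OLLAMA_CLOUD = "https://ollama.com"


def cloud_chat_request(model: str, content: str, api_key: str):
    """Request spec for hosted chat: same JSON body shape as the local API,
    plus a bearer token (local calls on :11434 need no key)."""
    url = f"{OLLAMA_CLOUD}/api/chat"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "stream": False,
    }
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",  # key from your ollama.com account
    }
    return url, payload, headers


# Usage (hypothetical model name; key read from the environment):
# url, payload, headers = cloud_chat_request(
#     "gpt-oss:120b", "Hello", os.environ["OLLAMA_API_KEY"])
# req = urllib.request.Request(url, json.dumps(payload).encode(), headers)
# print(json.loads(urllib.request.urlopen(req).read()))
```

Because the body shape matches the local API, switching a script between local and cloud inference is mostly a matter of changing the base URL and adding the key.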
Pro
$20/month or $200/year. Run up to 3 cloud models at a time, with 50x the weekly cloud usage of the Free tier.
- 3 concurrent cloud models
- 50x cloud usage of Free
- Ollama Cloud API
Max
$100/month. Run up to 10 cloud models at a time, with 5x the weekly cloud usage of Pro.
- 10 concurrent cloud models
- 5x cloud usage of Pro
- Ollama Cloud API