| Creator | Model name | Price ($ per 1M tokens) | Context size (tokens) | Description | Vision enabled |
|---|---|---|---|---|---|
| OpenAI | GPT-3.5 | in: $0.50 / out: $1.50 | 16k | The model used in the free version of ChatGPT: fast and effective for most needs | |
| | GPT-4 | in: $30.00 / out: $60.00 | 8k | A more powerful but slower model, useful when complex reasoning is required | |
| | GPT-4 Turbo | in: $10.00 / out: $30.00 | 128k | The model used in the paid version of ChatGPT: a powerful GPT-4-level reasoning model with speed similar to GPT-3.5 and vision capabilities | Yes (see OpenAI’s image token pricing; approx. $0.011 per 1080p image) |
| | GPT-4o | in: $5.00 / out: $15.00 | 128k | OpenAI’s flagship multimodal model: GPT-4 Turbo-level capability at half the price and higher speed | Yes (approx. $0.0055 per 1080p image) |
| | GPT-4o mini | in: $0.15 / out: $0.60 | 128k | The most cost-efficient model available from OpenAI | Yes (approx. $0.0055 per 1080p image) |
| | o1 | in: $15.00 / out: $60.00 | 200k | A reasoning model designed to solve hard problems across domains, now with vision capabilities and structured generations | Yes (approx. $0.015 per 1080p image) |
| | o1-mini | in: $1.10 / out: $4.40 | 128k | A faster, cheaper reasoning model, particularly good at coding, math and science | |
| | o3-mini | in: $1.10 / out: $4.40 | 200k | OpenAI’s newest reasoning model | |
| Google | Gemini 2.0 Flash (experimental) | in: $0 / out: $0 | 1M | A very fast multi-modal LLM with a huge 1 million token context window. Limited to 15 requests per minute; not for production use | Yes |
| | Gemini 1.5 Flash | in: $0.35 / out: $0.53 (≤128k tokens); in: $0.70 / out: $1.05 (>128k tokens) | 1M | A very good price-performance model with a huge 1 million token context length and high speeds | Yes (~$0.00265 per image) |
| | Gemini 1.5 Pro | in: $3.50 / out: $10.50 (≤128k tokens); in: $7.00 / out: $21.00 (>128k tokens) | 1M | Google’s latest multi-modal LLM with a huge 1 million token context window | Yes (~$0.00265 per image) |
| | Gemma 2B | $0.10 | 8k | An older-generation, fast and cheap open-source LLM from Google | |
| | Gemma 2 9B | $0.30 | 8k | A next-generation, fast and cheap open-source LLM from Google | |
| | Gemma 2 27B | $0.80 | 8k | A larger next-generation open-source LLM from Google | |
| Mistral AI | Mistral 7B | $0.20 | 8k | A fast and small (7B) model | |
| | Mixtral 8x7B | $0.60 ($0.24 superfast) | 32k | A mixture-of-experts (8x7B) model that outperforms LLaMA-2 70B but with the inference time and cost of a 13B model | |
| | Mixtral 8x22B | $1.20 (in: $2.00 / out: $6.00 structured) | 64k | A mixture-of-experts model fluent in five languages that excels at math and coding and outperforms nearly all open models | |
| | Mistral NeMo | $0.30 | 128k | Mistral’s best small model, with a 128k context length | |
| | Mistral Medium | in: $2.70 / out: $8.10 | 32k | Another Mistral model, fairly performant | |
| | Mistral Large | in: $8.00 / out: $24.00 | 32k | A good reasoning model with strong multilingual capabilities | |
| | Mistral Small | in: $0.10 / out: $0.30 | 32k | A latency-optimised 24B-parameter model from Mistral AI that is fast and cheap | |
| Anthropic | Claude 2 | in: $8.00 / out: $24.00 | 200k | Claude 2.1 is a powerful model with a 200k token context, capable of complex reasoning | |
| | Claude 3 Haiku | in: $0.25 / out: $1.25 | 200k | A fast and cheap model with a big context | Yes (see Anthropic’s image token pricing; approx. $0.008 per 1080p image) |
| | Claude 3.5 Haiku | in: $1.00 / out: $5.00 | 200k | An upgraded fast and cheap model with a big context | |
| | Claude 3 Sonnet | in: $3.00 / out: $15.00 | 200k | An ideal balance of intelligence and speed for enterprise workloads | Yes (approx. $0.008 per 1080p image) |
| | Claude 3.5 Sonnet | in: $3.00 / out: $15.00 | 200k | Anthropic’s current best model: faster, cheaper and more capable than Claude 3 Opus | Yes (approx. $0.008 per 1080p image) |
| | Claude 3 Opus | in: $15.00 / out: $75.00 | 200k | A powerful model for highly complex tasks, though likely a worse choice than Claude 3.5 Sonnet nowadays | Yes (approx. $0.008 per 1080p image) |
| Meta | Llama 3 8B | $0.20 ($0.05/$0.08 superfast) | 8k | The best small open model currently available | |
| | Llama 3 70B | $0.90 ($0.59/$0.79 superfast) | 8k | A powerful open model from Meta | |
| | Llama 3.1 8B | $0.18 | 128k | A small, fast and effective model from Meta | |
| | Llama 3.1 70B | $0.54 | 128k | A medium-sized model from Meta | |
| | Llama 3.1 405B | $3.50 | 128k | A heavyweight model from Meta that’s one of the most capable open models | |
| | Llama 3.2 3B | $0.06 | 128k | A super small and fast model from Meta, useful for low-latency inference | |
| | Llama 3.2 11B | $0.18 | 128k | A medium-sized model from Meta that can also perform visual understanding | Yes (~$0.00115 per image) |
| | Llama 3.2 90B | $1.20 | 128k | Meta’s most advanced model with vision capabilities | Yes (~$0.0077 per image) |
| | Llama 3.3 70B | $0.88 ($0.59/$0.79 superfast) | 128k | A highly performant, text-only, cost-effective model from Meta | |
| Databricks | DBRX | $1.20 | 32k | An open-source mixture-of-experts model from Databricks that outperforms GPT-3.5 and all other open-source models | |
| Cohere | Command R | in: $0.15 / out: $0.60 | 128k | Cohere’s latest mid-weight model, with an optional ‘online’ mode for web search before answering | |
| | Command R+ | in: $2.50 / out: $10.00 | 128k | Cohere’s latest and best model, optimised for RAG and tool use, with an optional ‘online’ mode for web search before answering | |
| Perplexity | Sonar | in: $3.00 / out: $15.00 (+ $0.005 per request) | 127k | Perplexity AI’s lightweight ‘online’ model with up-to-date information | |
| | Sonar Pro | $1.00 (+ $0.005 per request) | 200k | Perplexity AI’s powerful ‘online’ model with up-to-date information | |
| | Sonar Small (deprecated; maps to Sonar) | in: $3.00 / out: $15.00 (+ $0.005 per request) | 127k | Perplexity AI’s lightweight ‘online’ model with up-to-date information | |
| | Sonar Large (deprecated; maps to Sonar) | in: $3.00 / out: $15.00 (+ $0.005 per request) | 127k | Perplexity AI’s lightweight ‘online’ model with up-to-date information | |
| | Sonar Huge (deprecated; maps to Sonar Pro) | $1.00 (+ $0.005 per request) | 200k | Perplexity AI’s powerful ‘online’ model with up-to-date information | |
| OpenChat | OpenChat 3.5 | $0.20 | 8k | A fast and small (7B) model that outperforms GPT-3.5 | |
| Amazon | Nova Micro | in: $0.035 / out: $0.14 | 128k | A small and fast text-only model from Amazon | |
| | Nova Lite | in: $0.06 / out: $0.24 | 300k | A low-cost multimodal model from Amazon | |
| | Nova Pro | in: $0.80 / out: $3.20 | 300k | Amazon’s largest multimodal model in the Nova family yet | |
| DeepSeek | DeepSeek V3 | in: $1.25 / out: $1.25 | 64k | DeepSeek’s chat model | |
| | DeepSeek R1 | in: $7.00 / out: $7.00 | 64k | DeepSeek’s reasoning model | |

Note: prices marked ‘superfast’ refer to an alternative mode that uses Groq for model inference. It’s currently not recommended for production deployments but is worth experimenting with for low-latency applications.
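Because the table mixes flat per-token rates, tiered rates (Gemini’s ≤128k / >128k split), and per-request surcharges (Perplexity), a small worked example can help when budgeting. The sketch below is a minimal, illustrative cost estimator: `request_cost` and `gemini_flash_cost` are hypothetical helpers, not part of any provider SDK, and the hard-coded rates are simply copied from the table above, so they will drift as providers update pricing.

```python
# Minimal cost estimator for the per-token prices in the table above.
# Hypothetical helpers, not a provider SDK; rates are $ per 1M tokens,
# copied from the table, and will drift as providers change pricing.

def request_cost(input_tokens: int, output_tokens: int,
                 in_rate: float, out_rate: float,
                 per_request: float = 0.0) -> float:
    """Dollar cost of one call: per-token charges plus any flat per-request fee."""
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000 + per_request

# GPT-4o mini (in: $0.15 / out: $0.60): a 10k-token prompt with a 1k-token reply
print(request_cost(10_000, 1_000, 0.15, 0.60))                      # ~ $0.0021

# Perplexity Sonar adds a flat $0.005 surcharge on top of token costs
print(request_cost(10_000, 1_000, 3.00, 15.00, per_request=0.005))  # ~ $0.05

# Gemini 1.5 Flash is tiered; this assumes the tier is selected by prompt
# length, which is one reading of the table's <=128k / >128k notation
def gemini_flash_cost(input_tokens: int, output_tokens: int) -> float:
    in_rate, out_rate = (0.70, 1.05) if input_tokens > 128_000 else (0.35, 0.53)
    return request_cost(input_tokens, output_tokens, in_rate, out_rate)

print(gemini_flash_cost(200_000, 2_000))                            # ~ $0.1421
```

For vision-enabled models, add the per-image figures from the ‘Vision enabled’ column on top of the token cost.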
<aside> 💡 To request a specific model, or to enquire about using a fine-tuned model, reach out to [[email protected]](mailto:[email protected]?subject=Model%20request)
</aside>