Language models

| Creator | Model name | Price ($ per 1M tokens) | Context size (tokens) | Description | Vision enabled |
| --- | --- | --- | --- | --- | --- |
| OpenAI | GPT-3.5 | in: $0.50, out: $1.50 | 16k | The model used in the free version of ChatGPT - fast and effective for most needs | |
| | GPT-4 | in: $30.00, out: $60.00 | 8k | A more powerful but slower model - useful when complex reasoning is required | |
| | GPT-4 Turbo | in: $10.00, out: $30.00 | 128k | The model used in the paid version of ChatGPT - a powerful GPT-4-level reasoning model with speed similar to GPT-3.5 and vision capabilities | Yes (see OpenAI's image token pricing; approx. $0.011 per 1080p image) |
| | GPT-4o | in: $5.00, out: $15.00 | 128k | OpenAI's flagship multi-modal model, faster and cheaper than GPT-4 Turbo | Yes (see OpenAI's image token pricing; approx. $0.0055 per 1080p image) |
| | GPT-4o mini | in: $0.15, out: $0.60 | 128k | The most cost-efficient model available from OpenAI | Yes (see OpenAI's image token pricing; approx. $0.0055 per 1080p image) |
| | o1 | in: $15.00, out: $60.00 | 128k | Reasoning model designed to solve hard problems across domains | |
| | o1-mini | in: $3.00, out: $12.00 | 128k | Faster and cheaper reasoning model, particularly good at coding, math and science | |
| Google | Gemini Pro | in: $0.50, out: $1.50 | 32k | A multi-modal LLM from Google | Yes ($0.0025/image) |
| | Gemini 1.5 Flash | in: $0.35/$0.70, out: $0.53/$1.05 (≤128k / >128k tokens) | 1M | A very good value price-performance model with a huge 1 million token context length and high speeds | Yes (~$0.00265/image) |
| | Gemini 1.5 Pro | in: $3.50/$7.00, out: $10.50/$21.00 (≤128k / >128k tokens) | 1M | Google's latest multi-modal LLM with a huge 1 million token context window | Yes (~$0.00265/image) |
| | Gemma 2B | $0.10 | 8k | An older-generation fast and cheap open-source LLM from Google | |
| | Gemma 2 9B | $0.20 | 8k | A next-generation fast and cheap open-source LLM from Google | |
| | Gemma 2 27B | $0.80 | 8k | A larger next-generation open-source LLM from Google | |
| Mistral AI | Mistral 7B | $0.20 | 8k | Fast and small (7B) model | |
| | Mixtral 8x7B | $0.60 ($0.24 superfast) | 32k | A mixture-of-experts (8x7B) model that outperforms LLaMA-2 70B with inference time and cost equivalent to a 13B model. A 'superfast' mode that uses Groq for inference is available; it is currently not recommended for production deployments but is worth experimenting with for low-latency applications | |
| | Mixtral 8x22B | $1.20 (in: $2.00, out: $6.00 structured) | 64k | A mixture-of-experts model fluent in five languages that excels in math and coding and outperforms nearly all open models | |
| | Mistral NeMo | $0.30 | 128k | Mistral's best small model, with a 128k context length | |
| | Mistral Medium | in: $2.70, out: $8.10 | 32k | Another Mistral model, fairly performant | |
| | Mistral Large | in: $8.00, out: $24.00 | 32k | A good reasoning model with strong multilingual capabilities | |
| Anthropic | Claude Instant | in: $0.80, out: $2.40 | 100k | A low-latency model with 100k token context | |
| | Claude 2 | in: $8.00, out: $24.00 | 200k | Claude 2.1 is a powerful model with 200k token context, capable of complex reasoning | |
| | Claude 3 Haiku | in: $0.25, out: $1.25 | 200k | Fast and cheap model with a big context | Yes (see Anthropic's image token pricing; approx. $0.008 per 1080p image) |
| | Claude 3 Sonnet | in: $3.00, out: $15.00 | 200k | Ideal balance of intelligence and speed for enterprise workloads | Yes (see Anthropic's image token pricing; approx. $0.008 per 1080p image) |
| | Claude 3.5 Sonnet | in: $3.00, out: $15.00 | 200k | Anthropic's current best model: faster, cheaper and more capable than Claude 3 Opus | Yes (see Anthropic's image token pricing; approx. $0.008 per 1080p image) |
| | Claude 3 Opus | in: $15.00, out: $75.00 | 200k | A powerful model for highly complex tasks, though likely a worse choice than Claude 3.5 Sonnet nowadays | Yes (see Anthropic's image token pricing; approx. $0.008 per 1080p image) |
| Meta | Llama 3 70B | $0.90 ($0.59/$0.79 superfast) | 8k | A powerful open model from Meta. A 'superfast' mode that uses Groq for inference is available; it is currently not recommended for production deployments but is worth experimenting with for low-latency applications | |
| | Llama 3 8B | $0.20 ($0.05/$0.08 superfast) | 8k | The best small open model currently available. A 'superfast' mode that uses Groq for inference is available; it is currently not recommended for production deployments but is worth experimenting with for low-latency applications | |
| Databricks | DBRX | $1.20 | 32k | An open-source mixture-of-experts model from Databricks that outperforms GPT-3.5 and all other open-source models | |
| Cohere | Command R | in: $0.15, out: $0.60 | 128k | Cohere's latest mid-weight model, with the ability to enable 'online' mode for web search before answering | |
| | Command R+ | in: $2.50, out: $10.00 | 128k | Cohere's latest and best model, optimised for RAG and tool use, with the ability to enable 'online' mode for web search before answering | |
| Perplexity | Sonar Small | $0.20 (+ $0.005 per request) | 128k | Perplexity AI's small 'online' model with up-to-date information, based on Llama 3.1 8B | |
| | Sonar Large | $1.00 (+ $0.005 per request) | 128k | Perplexity AI's large 'online' model with up-to-date information, based on Llama 3.1 70B | |
| | Sonar Huge | $5.00 (+ $0.005 per request) | 128k | Perplexity AI's huge mixture-of-experts 'online' model with up-to-date information, based on Llama 3.1 405B | |
| OpenChat | OpenChat 3.5 | $0.20 | 8k | A fast and small (7B) model that outperforms GPT-3.5 | |
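All prices above are quoted per 1M tokens, so the cost of a request is simply `input_tokens × in_price + output_tokens × out_price`, divided by 1,000,000, plus any per-image or per-request surcharge. The Python sketch below illustrates that arithmetic; the `PRICES` dict, the model keys, and the `estimate_cost` and `gemini_15_flash_rates` helpers are illustrative assumptions drawn from the table, not part of any provider's API.

```python
# Minimal sketch of the per-token cost arithmetic behind the table above.
# PRICES and the helper functions are illustrative assumptions, not a
# provider API; the rates are copied from the table and may be out of date.

PRICES = {  # model: ($ per 1M input tokens, $ per 1M output tokens)
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3.5-sonnet": (3.00, 15.00),
    "mistral-large": (8.00, 24.00),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Approximate dollar cost of one request at the table's rates."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

def gemini_15_flash_rates(prompt_tokens: int) -> tuple[float, float]:
    """Tiered pricing (e.g. Gemini 1.5 Flash): the per-token rate doubles
    once the prompt exceeds 128k tokens."""
    return (0.35, 0.53) if prompt_tokens <= 128_000 else (0.70, 1.05)

# Example: a 2,000-token prompt with a 500-token completion on GPT-4o mini:
# 2,000 * $0.15/1M + 500 * $0.60/1M = $0.0003 + $0.0003 = $0.0006
print(f"${estimate_cost('gpt-4o-mini', 2_000, 500):.4f}")  # -> $0.0006
```

For vision-enabled models, add the approximate per-image figure from the last column; for Perplexity's Sonar models, add the flat $0.005 per request.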

<aside> 💡 To request a specific model or enquire about using a fine-tuned model, reach out to [[email protected]](mailto:[email protected]?subject=Model%20request)

</aside>