Free LLM API resources

This list covers services that provide free access to, or credits for, API-based LLM usage.

[!NOTE]
Please don’t abuse these services, else we might lose them.

[!WARNING]
This list explicitly excludes any services that are not legitimate (e.g. services that reverse-engineer an existing chatbot).

Free Providers

| Provider | Provider Limits/Notes | Model Name | Model Limits |
| --- | --- | --- | --- |
| Groq | | Distil Whisper Large v3 | 7,200 audio-seconds/minute; 2,000 requests/day |
| | | Gemma 2 9B Instruct | 14,400 requests/day; 15,000 tokens/minute |
| | | Gemma 7B Instruct | 14,400 requests/day; 15,000 tokens/minute |
| | | LLaVA 1.5 7B | 14,400 requests/day; 30,000 tokens/minute |
| | | Llama 3 70B | 14,400 requests/day; 6,000 tokens/minute |
| | | Llama 3 70B - Groq Tool Use Preview | 14,400 requests/day; 15,000 tokens/minute |
| | | Llama 3 8B | 14,400 requests/day; 30,000 tokens/minute |
| | | Llama 3 8B - Groq Tool Use Preview | 14,400 requests/day; 15,000 tokens/minute |
| | | Llama 3.1 70B | 14,400 requests/day; 6,000 tokens/minute |
| | | Llama 3.1 8B | 14,400 requests/day; 20,000 tokens/minute |
| | | Llama 3.2 11B (Text Only) | 7,000 requests/day; 7,000 tokens/minute |
| | | Llama 3.2 11B Vision | 7,000 requests/day; 7,000 tokens/minute |
| | | Llama 3.2 1B | 7,000 requests/day; 7,000 tokens/minute |
| | | Llama 3.2 3B | 7,000 requests/day; 7,000 tokens/minute |
| | | Llama 3.2 90B (Text Only) | 7,000 requests/day; 7,000 tokens/minute |
| | | Llama 3.2 90B Vision | 3,500 requests/day; 7,000 tokens/minute |
| | | Llama Guard 3 8B | 14,400 requests/day; 15,000 tokens/minute |
| | | Mixtral 8x7B | 14,400 requests/day; 5,000 tokens/minute |
| | | Whisper Large v3 | 7,200 audio-seconds/minute; 2,000 requests/day |
| | | Whisper Large v3 Turbo | 7,200 audio-seconds/minute; 2,000 requests/day |
| OpenRouter | 20 requests/minute; 200 requests/day | Gemma 2 9B Instruct | |
| | | Hermes 3 Llama 3.1 405B | |
| | | Liquid LFM 40B | |
| | | Llama 3 8B Instruct | |
| | | Llama 3.1 405B Instruct | |
| | | Llama 3.1 70B Instruct | |
| | | Llama 3.1 8B Instruct | |
| | | Llama 3.2 11B Vision Instruct | |
| | | Llama 3.2 1B Instruct | |
| | | Llama 3.2 3B Instruct | |
| | | Llama 3.2 90B Vision Instruct | |
| | | Mistral 7B Instruct | |
| | | Mythomax L2 13B | |
| | | OpenChat 7B | |
| | | Phi-3 Medium 128k Instruct | |
| | | Phi-3 Mini 128k Instruct | |
| | | Qwen 2 7B Instruct | |
| | | Toppy M 7B | |
| | | Zephyr 7B Beta | |
| Google AI Studio | Data is used for training (when used outside of the UK/CH/EEA/EU). | Gemini 1.5 Flash | 1,000,000 tokens/minute; 1,500 requests/day; 15 requests/minute |
| | | Gemini 1.5 Flash (Experimental) | 1,000,000 tokens/minute; 1,500 requests/day; 5 requests/minute |
| | | Gemini 1.5 Flash-8B | 1,000,000 tokens/minute; 1,500 requests/day; 15 requests/minute |
| | | Gemini 1.5 Flash-8B (Experimental) | 1,000,000 tokens/minute; 1,500 requests/day; 15 requests/minute |
| | | Gemini 1.5 Pro | 32,000 tokens/minute; 50 requests/day; 2 requests/minute |
| | | Gemini 1.5 Pro (Experimental) | 1,000,000 tokens/minute; 50 requests/day; 2 requests/minute |
| | | Gemini 1.0 Pro | 32,000 tokens/minute; 1,500 requests/day; 15 requests/minute |
| | | text-embedding-004 | 150 batch requests/minute; 1,500 requests/minute; 100 content/batch |
| | | embedding-001 | |
| Lambda Labs (Free Preview) | Requires credit card verification. | Hermes 3 405B | |
| | | Hermes 3 70B | |
| | | Hermes 3 8B | |
| | | Liquid LFM 40B | |
| | | Llama 3.1 405B Instruct (FP8) | |
| | | Llama 3.1 70B Instruct (FP8) | |
| | | Llama 3.1 8B Instruct | |
| | | Llama 3.1 Nemotron 70B Instruct | |
| | | Llama 3.2 3B Instruct | |
| Mistral (La Plateforme) | Free tier (Experiment plan) requires opting into data training, requires phone number verification. | Open and Proprietary Mistral models | 1 request/second; 500,000 tokens/minute; 1,000,000,000 tokens/month |
| Mistral (Codestral) | Currently free to use, monthly subscription based, requires phone number verification. | Codestral | 30 requests/minute; 2,000 requests/day |
| HuggingFace Serverless Inference | Limited to models smaller than 10GB. Some popular models are supported even if they exceed 10GB. | Various open models | 1,000 requests/day (with an account) |
| SambaNova Cloud | | Llama 3.1 405B | 10 requests/minute |
| | | Llama 3.2 90B | 1 request/minute |
| | | Llama 3.1 70B | 20 requests/minute |
| | | Llama 3.2 11B | 10 requests/minute |
| | | Llama 3.1 8B | 30 requests/minute |
| | | Llama 3.2 3B | 30 requests/minute |
| | | Llama 3.2 1B | 30 requests/minute |
| Cerebras | Waitlist; Free tier restricted to 8K context | Llama 3.1 8B | 30 requests/minute; 60,000 tokens/minute; 900 requests/hour; 1,000,000 tokens/hour; 14,400 requests/day; 1,000,000 tokens/day |
| | | Llama 3.1 70B | 30 requests/minute; 60,000 tokens/minute; 900 requests/hour; 1,000,000 tokens/hour; 14,400 requests/day; 1,000,000 tokens/day |
| GitHub Models | Waitlist; Rate limits dependent on Copilot subscription tier | AI21-Jamba-Instruct | |
| | | Cohere Command R | |
| | | Cohere Command R+ | |
| | | Cohere Embed v3 English | |
| | | Cohere Embed v3 Multilingual | |
| | | Meta-Llama-3-70B-Instruct | |
| | | Meta-Llama-3-8B-Instruct | |
| | | Meta-Llama-3.1-405B-Instruct | |
| | | Meta-Llama-3.1-70B-Instruct | |
| | | Meta-Llama-3.1-8B-Instruct | |
| | | Mistral Large | |
| | | Mistral Large (2407) | |
| | | Mistral Nemo | |
| | | Mistral Small | |
| | | OpenAI GPT-4o | |
| | | OpenAI GPT-4o mini | |
| | | OpenAI Text Embedding 3 (large) | |
| | | OpenAI Text Embedding 3 (small) | |
| | | Phi-3-medium instruct (128k) | |
| | | Phi-3-medium instruct (4k) | |
| | | Phi-3-mini instruct (128k) | |
| | | Phi-3-mini instruct (4k) | |
| | | Phi-3-small instruct (128k) | |
| | | Phi-3-small instruct (8k) | |
| | | Phi-3.5-mini instruct (128k) | |
| OVH AI Endpoints (Free Alpha) | Token expires every week. | CodeLlama 13B Instruct | 12 requests/minute |
| | | Codestral Mamba 7B v0.1 | 12 requests/minute |
| | | Llama 2 13B Chat | 12 requests/minute |
| | | Llama 3 70B Instruct | 12 requests/minute |
| | | Llama 3 8B Instruct | 12 requests/minute |
| | | Llama 3.1 70B Instruct | 12 requests/minute |
| | | Mathstral 7B v0.1 | 12 requests/minute |
| | | Mistral 7B Instruct | 12 requests/minute |
| | | Mixtral 8x22B Instruct | 12 requests/minute |
| | | Mixtral 8x7B Instruct | 12 requests/minute |
| Cloudflare Workers AI | 10,000 tokens/day | Deepseek Coder 6.7B Base (AWQ) | |
| | | Deepseek Coder 6.7B Instruct (AWQ) | |
| | | Deepseek Math 7B Instruct | |
| | | Discolm German 7B v1 (AWQ) | |
| | | Falcon 7B Instruct | |
| | | Gemma 2B Instruct (LoRA) | |
| | | Gemma 7B Instruct | |
| | | Gemma 7B Instruct (LoRA) | |
| | | Hermes 2 Pro Mistral 7B | |
| | | Llama 2 13B Chat (AWQ) | |
| | | Llama 2 7B Chat (FP16) | |
| | | Llama 2 7B Chat (INT8) | |
| | | Llama 2 7B Chat (LoRA) | |
| | | Llama 3 8B Instruct | |
| | | Llama 3 8B Instruct | |
| | | Llama 3 8B Instruct (AWQ) | |
| | | Llama 3.1 8B Instruct | |
| | | Llama 3.1 8B Instruct (AWQ) | |
| | | Llama 3.1 8B Instruct (FP8) | |
| | | Llama 3.2 11B Vision Instruct | |
| | | Llama 3.2 1B Instruct | |
| | | Llama 3.2 3B Instruct | |
| | | LlamaGuard 7B (AWQ) | |
| | | Mistral 7B Instruct v0.1 | |
| | | Mistral 7B Instruct v0.1 (AWQ) | |
| | | Mistral 7B Instruct v0.2 | |
| | | Mistral 7B Instruct v0.2 (LoRA) | |
| | | Neural Chat 7B v3.1 (AWQ) | |
| | | OpenChat 3.5 0106 | |
| | | OpenHermes 2.5 Mistral 7B (AWQ) | |
| | | Phi-2 | |
| | | Qwen 1.5 0.5B Chat | |
| | | Qwen 1.5 1.8B Chat | |
| | | Qwen 1.5 14B Chat (AWQ) | |
| | | Qwen 1.5 7B Chat (AWQ) | |
| | | SQLCoder 7B 2 | |
| | | Starling LM 7B Beta | |
| | | TinyLlama 1.1B Chat v1.0 | |
| | | Una Cybertron 7B v2 (BF16) | |
| | | Zephyr 7B Beta (AWQ) | |
| Together | | Llama 3.2 11B Vision Instruct | Free for 2024 |
| Cohere | 20 requests/min; 1,000 requests/month | Command-R | Shared Limit |
| | | Command-R+ | |
| Google Cloud Vertex AI | Very stringent payment verification for Google Cloud. | Llama 3.1 70B Instruct | Llama 3.1 API Service free during preview. 60 requests/minute |
| | | Llama 3.1 8B Instruct | Llama 3.1 API Service free during preview. 60 requests/minute |
| | | Llama 3.2 90B Vision Instruct | Llama 3.2 API Service free during preview. 30 requests/minute |
| | | Gemini Flash Experimental | Experimental Gemini model. 10 requests/minute |
| | | Gemini Pro Experimental | |
| glhf.chat (Free Beta) | Email for API access | Any model on Hugging Face that is runnable on vLLM and fits on an A100 node (~640GB VRAM), including Llama 3.1 405B at FP8 | |
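
Most of the limits above combine a per-minute cap with a per-day (or per-hour) cap. As a minimal sketch of staying within such limits on the client side, the following sliding-window limiter tracks several caps at once. The class name `SlidingWindowLimiter` and its methods are hypothetical and not part of any provider's SDK, and the example figures are the OpenRouter limits listed in the table (20 requests/minute, 200 requests/day). Providers enforce limits server-side and may count them differently, so treat this as a courtesy throttle rather than a guarantee.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical client-side throttle for several request caps at once.

    `limits` is an iterable of (max_requests, window_seconds) pairs with
    positive values, e.g. [(20, 60), (200, 86400)] for
    20 requests/minute and 200 requests/day.
    """

    def __init__(self, limits, clock=time.monotonic, sleep=time.sleep):
        self._limits = [(int(n), float(w)) for n, w in limits]
        self._longest = max(w for _, w in self._limits)
        self._clock = clock    # injectable so the limiter can be tested
        self._sleep = sleep
        self._sent = deque()   # timestamps of recorded requests, oldest first

    def wait_time(self):
        """Return seconds to wait before the next request (0.0 if allowed now)."""
        now = self._clock()
        # Forget requests that are older than the longest window.
        while self._sent and now - self._sent[0] >= self._longest:
            self._sent.popleft()
        wait = 0.0
        for max_requests, window in self._limits:
            recent = [t for t in self._sent if now - t < window]
            if len(recent) >= max_requests:
                # This request must leave the window before a new one fits.
                blocking = recent[len(recent) - max_requests]
                wait = max(wait, blocking + window - now)
        return wait

    def try_acquire(self):
        """Record a request and return True if allowed now, else return False."""
        if self.wait_time() > 0:
            return False
        self._sent.append(self._clock())
        return True

    def acquire(self):
        """Block until a request is allowed, then record it."""
        while True:
            wait = self.wait_time()
            if wait <= 0:
                break
            self._sleep(wait)
        self._sent.append(self._clock())
```

In use, one limiter per provider (or per model, where the table lists per-model limits) would be created once, for example `limiter = SlidingWindowLimiter([(20, 60), (200, 86400)])`, and `limiter.acquire()` called before each API request. Token-based limits such as tokens/minute are not covered by this sketch.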

Providers with trial credits

| Provider | Credits | Requirements | Models |
| --- | --- | --- | --- |
| Together | $5 | | Various open models |
| Fireworks | $1 | | Various open models |
| Unify | $10 (+$40 for getting in contact) | | Routes to other providers, various open models and proprietary models (OpenAI, Gemini, Anthropic, Mistral, Perplexity, etc.) |
| DeepInfra | $1.80 | | Various open models |
| NVIDIA NIM | 1,000 API calls for 1 month | | Various open models |
| AI21 | $10 for 3 months | | Jamba/Jurassic-2 |
| NLP Cloud | $15 | Phone number verification | Various open models |
| Upstage | $10 for 3 months | | Solar Pro/Mini |
| Baseten | $30 | | Any supported model - pay by compute time |
| xAI | $25/month until end of 2024 | | Grok |
| Hyperbolic | $10 | | DeepSeek V2.5 |
| | | | Hermes 3 Llama 3.1 70B |
| | | | Llama 3 70B Instruct |
| | | | Llama 3.1 405B Base |
| | | | Llama 3.1 405B Base (FP8) |
| | | | Llama 3.1 405B Instruct |
| | | | Llama 3.1 70B Instruct |
| | | | Llama 3.1 8B Instruct |
| | | | Llama 3.2 3B Instruct |
| | | | Pixtral 12B (2409) |
| | | | Qwen2-VL 72B Instruct |
| | | | Qwen2-VL 7B Instruct |
| | | | Qwen2.5 72B Instruct |
| | | | Qwen2.5 Coder 32B Instruct |