This lists various services that provide free access or credits towards API-based LLM usage.

> [!NOTE]
> Please don’t abuse these services, or we might lose them.

> [!WARNING]
> This list explicitly excludes any services that are not legitimate (e.g., services that reverse-engineer an existing chatbot).

| Provider | Provider Limits/Notes | Model Name | Model Limits |
|---|---|---|---|
| OpenRouter | 20 requests/minute<br>200 requests/day | Gemma 2 9B Instruct | |
| | | Llama 3 8B Instruct | |
| | | Llama 3.1 405B Instruct | |
| | | Llama 3.1 70B Instruct | |
| | | Llama 3.1 8B Instruct | |
| | | Llama 3.2 11B Vision Instruct | |
| | | Llama 3.2 1B Instruct | |
| | | Llama 3.2 3B Instruct | |
| | | Llama 3.2 90B Vision Instruct | |
| | | Mistral 7B Instruct | |
| | | Mythomax L2 13B | |
| | | OpenChat 7B | |
| | | Phi-3 Medium 128k Instruct | |
| | | Phi-3 Mini 128k Instruct | |
| | | Qwen 2 7B Instruct | |
| | | Toppy M 7B | |
| | | Zephyr 7B Beta | |
| Google AI Studio | Data is used for training (when used outside of the UK/CH/EEA/EU). | Gemini 2.0 Flash Experimental | 4,000,000 tokens/minute<br>10 requests/minute |
| | | Gemini 1.5 Flash | 1,000,000 tokens/minute<br>1,500 requests/day<br>15 requests/minute |
| | | Gemini 1.5 Flash (Experimental) | 1,000,000 tokens/minute<br>1,500 requests/day<br>5 requests/minute |
| | | Gemini 1.5 Flash-8B | 1,000,000 tokens/minute<br>1,500 requests/day<br>15 requests/minute |
| | | Gemini 1.5 Flash-8B (Experimental) | 1,000,000 tokens/minute<br>1,500 requests/day<br>15 requests/minute |
| | | Gemini 1.5 Pro | 32,000 tokens/minute<br>50 requests/day<br>2 requests/minute |
| | | Gemini 1.5 Pro (Experimental) | 1,000,000 tokens/minute<br>100 requests/day<br>5 requests/minute |
| | | LearnLM 1.5 Pro (Experimental) | 1,500 requests/day<br>15 requests/minute |
| | | Gemini 1.0 Pro | 32,000 tokens/minute<br>1,500 requests/day<br>15 requests/minute |
| | | text-embedding-004 | 150 batch requests/minute<br>1,500 requests/minute<br>100 content/batch |
| | | embedding-001 | |
| Mistral (La Plateforme) | Free tier (Experiment plan) requires opting into data training and phone number verification. | Open and Proprietary Mistral models | 1 request/second<br>500,000 tokens/minute<br>1,000,000,000 tokens/month |
| Mistral (Codestral) | Currently free to use; monthly subscription-based; requires phone number verification. | Codestral | 30 requests/minute<br>2,000 requests/day |
| HuggingFace Serverless Inference | Limited to models smaller than 10GB. Some popular models are supported even if they exceed 10GB. | Various open models | 1,000 requests/day (with an account) |
| SambaNova Cloud | | Llama 3.1 8B | 30 requests/minute |
| | | Llama 3.1 70B | 20 requests/minute |
| | | Llama 3.1 405B | 10 requests/minute |
| | | Llama 3.2 1B | 30 requests/minute |
| | | Llama 3.2 3B | 30 requests/minute |
| | | Llama 3.2 11B | 10 requests/minute |
| | | Llama 3.2 90B | 1 request/minute |
| | | Llama 3.3 70B | 20 requests/minute |
| | | Llama Guard 3 8B | 30 requests/minute |
| | | Qwen 2.5 72B | 20 requests/minute |
| | | Qwen 2.5 Coder 32B | 20 requests/minute |
| | | QwQ 32B Preview | 10 requests/minute |
| Cerebras | Free tier restricted to 8K context. | Llama 3.1 8B | 30 requests/minute<br>60,000 tokens/minute<br>900 requests/hour<br>1,000,000 tokens/hour<br>14,400 requests/day<br>1,000,000 tokens/day |
| | | Llama 3.3 70B | 30 requests/minute<br>60,000 tokens/minute<br>900 requests/hour<br>1,000,000 tokens/hour<br>14,400 requests/day<br>1,000,000 tokens/day |
| Groq | | Distil Whisper Large v3 | 7,200 audio-seconds/minute<br>2,000 requests/day |
| | | Gemma 2 9B Instruct | 14,400 requests/day<br>15,000 tokens/minute |
| | | Llama 3 70B | 14,400 requests/day<br>6,000 tokens/minute |
| | | Llama 3 8B | 14,400 requests/day<br>30,000 tokens/minute |
| | | Llama 3.1 70B | 14,400 requests/day<br>6,000 tokens/minute |
| | | Llama 3.1 8B | 14,400 requests/day<br>20,000 tokens/minute |
| | | Llama 3.2 11B Vision | 7,000 requests/day<br>7,000 tokens/minute |
| | | Llama 3.2 1B | 7,000 requests/day<br>7,000 tokens/minute |
| | | Llama 3.2 3B | 7,000 requests/day<br>7,000 tokens/minute |
| | | Llama 3.2 90B Vision | 3,500 requests/day<br>7,000 tokens/minute |
| | | Llama 3.3 70B | 1,000 requests/day<br>6,000 tokens/minute |
| | | Llama 3.3 70B (Speculative Decoding) | 1,000 requests/day<br>6,000 tokens/minute |
| | | Llama Guard 3 8B | 14,400 requests/day<br>15,000 tokens/minute |
| | | Mixtral 8x7B | 14,400 requests/day<br>5,000 tokens/minute |
| | | Whisper Large v3 | 7,200 audio-seconds/minute<br>2,000 requests/day |
| | | Whisper Large v3 Turbo | 7,200 audio-seconds/minute<br>2,000 requests/day |
| Scaleway Generative APIs (Free Beta) | | BGE-Multilingual-Gemma2 | 100 requests/minute<br>200,000 tokens/minute |
| | | Llama 3.1 70B Instruct | 300 requests/minute<br>100,000 tokens/minute |
| | | Llama 3.1 8B Instruct | 300 requests/minute<br>100,000 tokens/minute |
| | | Llama 3.3 70B Instruct | |
| | | Mistral Nemo 2407 | 300 requests/minute<br>100,000 tokens/minute |
| | | Pixtral 12B (2409) | 300 requests/minute<br>100,000 tokens/minute |
| | | Qwen2.5 Coder 32B Instruct | |
| | | sentence-t5-xxl | 100 requests/minute<br>200,000 tokens/minute |
| OVH AI Endpoints (Free Beta) | | CodeLlama 13B Instruct | 12 requests/minute |
| | | Codestral Mamba 7B v0.1 | 12 requests/minute |
| | | Llama 2 13B Chat | 12 requests/minute |
| | | Llama 3 70B Instruct | 12 requests/minute |
| | | Llama 3 8B Instruct | 12 requests/minute |
| | | Llama 3.1 70B Instruct | 12 requests/minute |
| | | Llava Next Mistral 7B | 12 requests/minute |
| | | Mathstral 7B v0.1 | 12 requests/minute |
| | | Mistral 7B Instruct | 12 requests/minute |
| | | Mistral Nemo 2407 | 12 requests/minute |
| | | Mixtral 8x22B Instruct | 12 requests/minute |
| | | Mixtral 8x7B Instruct | 12 requests/minute |
| Together | | Llama 3.2 11B Vision Instruct | |
| | | Llama 3.3 70B Instruct | |
| Cohere | 20 requests/minute<br>1,000 requests/month | Command-R | Shared limit |
| | | Command-R+ | Shared limit |
| GitHub Models | Extremely restrictive input/output token limits. Rate limits depend on Copilot subscription tier (Free/Pro/Business/Enterprise). | AI21 Jamba 1.5 Large | |
| | | AI21 Jamba 1.5 Mini | |
| | | Codestral 25.01 | |
| | | Cohere Command R | |
| | | Cohere Command R 08-2024 | |
| | | Cohere Command R+ | |
| | | Cohere Command R+ 08-2024 | |
| | | Cohere Embed v3 English | |
| | | Cohere Embed v3 Multilingual | |
| | | JAIS 30b Chat | |
| | | Llama-3.2-11B-Vision-Instruct | |
| | | Llama-3.2-90B-Vision-Instruct | |
| | | Llama-3.3-70B-Instruct | |
| | | Meta-Llama-3-70B-Instruct | |
| | | Meta-Llama-3-8B-Instruct | |
| | | Meta-Llama-3.1-405B-Instruct | |
| | | Meta-Llama-3.1-70B-Instruct | |
| | | Meta-Llama-3.1-8B-Instruct | |
| | | Ministral 3B | |
| | | Mistral Large | |
| | | Mistral Large (2407) | |
| | | Mistral Large 24.11 | |
| | | Mistral Nemo | |
| | | Mistral Small | |
| | | OpenAI GPT-4o | |
| | | OpenAI GPT-4o mini | |
| | | OpenAI Text Embedding 3 (large) | |
| | | OpenAI Text Embedding 3 (small) | |
| | | OpenAI o1 | |
| | | OpenAI o1-mini | |
| | | OpenAI o1-preview | |
| | | Phi-3-medium instruct (128k) | |
| | | Phi-3-medium instruct (4k) | |
| | | Phi-3-mini instruct (128k) | |
| | | Phi-3-mini instruct (4k) | |
| | | Phi-3-small instruct (128k) | |
| | | Phi-3-small instruct (8k) | |
| | | Phi-3.5-MoE instruct (128k) | |
| | | Phi-3.5-mini instruct (128k) | |
| | | Phi-3.5-vision instruct (128k) | |
| | | Phi-4 | |
| Cloudflare Workers AI | 10,000 neurons/day | Deepseek Coder 6.7B Base (AWQ) | |
| | | Deepseek Coder 6.7B Instruct (AWQ) | |
| | | Deepseek Math 7B Instruct | |
| | | DiscoLM German 7B v1 (AWQ) | |
| | | Falcon 7B Instruct | |
| | | Gemma 2B Instruct (LoRA) | |
| | | Gemma 7B Instruct | |
| | | Gemma 7B Instruct (LoRA) | |
| | | Hermes 2 Pro Mistral 7B | |
| | | Llama 2 13B Chat (AWQ) | |
| | | Llama 2 7B Chat (FP16) | |
| | | Llama 2 7B Chat (INT8) | |
| | | Llama 2 7B Chat (LoRA) | |
| | | Llama 3 8B Instruct | |
| | | Llama 3 8B Instruct | |
| | | Llama 3 8B Instruct (AWQ) | |
| | | Llama 3.1 8B Instruct | |
| | | Llama 3.1 8B Instruct (AWQ) | |
| | | Llama 3.1 8B Instruct (FP8) | |
| | | Llama 3.2 11B Vision Instruct | |
| | | Llama 3.2 1B Instruct | |
| | | Llama 3.2 3B Instruct | |
| | | Llama 3.3 70B Instruct (FP8) | |
| | | LlamaGuard 7B (AWQ) | |
| | | Mistral 7B Instruct v0.1 | |
| | | Mistral 7B Instruct v0.1 (AWQ) | |
| | | Mistral 7B Instruct v0.2 | |
| | | Mistral 7B Instruct v0.2 (LoRA) | |
| | | Neural Chat 7B v3.1 (AWQ) | |
| | | OpenChat 3.5 0106 | |
| | | OpenHermes 2.5 Mistral 7B (AWQ) | |
| | | Phi-2 | |
| | | Qwen 1.5 0.5B Chat | |
| | | Qwen 1.5 1.8B Chat | |
| | | Qwen 1.5 14B Chat (AWQ) | |
| | | Qwen 1.5 7B Chat (AWQ) | |
| | | SQLCoder 7B 2 | |
| | | Starling LM 7B Beta | |
| | | TinyLlama 1.1B Chat v1.0 | |
| | | Una Cybertron 7B v2 (BF16) | |
| | | Zephyr 7B Beta (AWQ) | |
| Google Cloud Vertex AI | Very stringent payment verification for Google Cloud. | Llama 3.1 70B Instruct | Llama 3.1 API Service free during preview.<br>60 requests/minute |
| | | Llama 3.1 8B Instruct | Llama 3.1 API Service free during preview.<br>60 requests/minute |
| | | Llama 3.2 90B Vision Instruct | Llama 3.2 API Service free during preview.<br>30 requests/minute |
| | | Gemini 2.0 Flash Experimental | Experimental Gemini model.<br>10 requests/minute |
| | | Gemini Flash Experimental | |
| | | Gemini Pro Experimental | |
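
Many of the limits above stack several windows at once (for example, OpenRouter's 20 requests/minute plus 200 requests/day, or Groq's requests/day cap alongside a tokens/minute cap). One way to stay inside them, rather than hammering a provider until it refuses, is to throttle on the client side. Below is a minimal Python sketch of a sliding-window limiter, assuming each provider counts usage over rolling windows; providers may instead reset daily quotas at a fixed time, and the token count passed in is an estimate supplied by the caller. The names `Window`, `MultiWindowLimiter`, `try_acquire` and `acquire` are hypothetical and not part of any provider SDK.

```python
import time
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Window:
    """One limit, e.g. 20 requests per 60 s or 6,000 tokens per 60 s."""
    limit: int      # maximum units (requests or tokens) allowed in the window
    seconds: float  # window length in seconds
    events: deque = field(default_factory=deque)  # (timestamp, units) pairs

    def used(self, now: float) -> int:
        # Drop events that have slid out of the window, then sum the rest.
        while self.events and now - self.events[0][0] >= self.seconds:
            self.events.popleft()
        return sum(units for _, units in self.events)


class MultiWindowLimiter:
    """Client-side limiter: a request is allowed only if every window has room.

    Assumption: rolling windows; a provider that resets quotas at fixed
    times may allow slightly more than this sketch does.
    """

    def __init__(self, request_windows, token_windows=(), clock=time.monotonic):
        self.request_windows = [Window(l, s) for l, s in request_windows]
        self.token_windows = [Window(l, s) for l, s in token_windows]
        self.clock = clock

    def try_acquire(self, tokens: int = 0) -> bool:
        """Record and allow the request if it fits; otherwise record nothing."""
        now = self.clock()
        if any(w.used(now) + 1 > w.limit for w in self.request_windows):
            return False
        if any(w.used(now) + tokens > w.limit for w in self.token_windows):
            return False
        for w in self.request_windows:
            w.events.append((now, 1))
        for w in self.token_windows:
            w.events.append((now, tokens))
        return True

    def acquire(self, tokens: int = 0, poll_seconds: float = 1.0) -> None:
        """Block, polling, until the request fits in every window."""
        if any(tokens > w.limit for w in self.token_windows):
            raise ValueError("request alone exceeds a token window")
        while not self.try_acquire(tokens):
            time.sleep(poll_seconds)
```

For example, `MultiWindowLimiter([(20, 60), (200, 86_400)])` mirrors the OpenRouter row, and `MultiWindowLimiter([(14_400, 86_400)], [(6_000, 60)])` mirrors the Groq Llama 3 70B row. The limiter only knows what it has granted itself, so an HTTP 429 (Too Many Requests) response from the provider should still be treated as authoritative. Summing the deque on every call keeps the sketch simple; a running total per window would avoid the linear scan.
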

| Provider | Credits | Requirements | Models |
|---|---|---|---|
| Together | $1 when you add a payment method | | Various open models |
| Fireworks | $1 | | Various open models |
| Unify | $5 when you add a payment method | | Routes to other providers: various open models and proprietary models (OpenAI, Gemini, Anthropic, Mistral, Perplexity, etc.) |
| NVIDIA NIM | 1,000 API calls for 1 month | | Various open models |
| Baseten | $30 | | Any supported model; pay by compute time |
| Nebius | $1 | | Various open models |
| Novita | $0.50 | | Various open models |
| Hyperbolic | $10 | | DeepSeek V2.5 |
| | | | DeepSeek V3 |
| | | | Hermes 3 Llama 3.1 70B |
| | | | Llama 3 70B Instruct |
| | | | Llama 3.1 405B Base |
| | | | Llama 3.1 405B Base (FP8) |
| | | | Llama 3.1 405B Instruct |
| | | | Llama 3.1 405B Instruct Virtuals |
| | | | Llama 3.1 70B Instruct |
| | | | Llama 3.1 8B Instruct |
| | | | Llama 3.2 3B Instruct |
| | | | Llama 3.3 70B Instruct |
| | | | Pixtral 12B (2409) |
| | | | Qwen QwQ 32B Preview |
| | | | Qwen2-VL 72B Instruct |
| | | | Qwen2-VL 7B Instruct |
| | | | Qwen2.5 72B Instruct |
| | | | Qwen2.5 Coder 32B Instruct |
| AI21 | $10 for 3 months | | Jamba/Jurassic-2 |
| Upstage | $10 for 3 months | | Solar Pro/Mini |
NLP Cloud | $15 | Phone number verification | Various open models |
| Alibaba Cloud (International) Model Studio | Token/time-limited trials on a per-model basis | | Various open and proprietary Qwen models |
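
Most credits above are dollar amounts, so how far they go depends on each provider's per-token prices, which this list does not include. As a rough sketch under that assumption, the helper below estimates how many identical requests a credit covers at flat input/output prices per million tokens. The function name and the example prices are hypothetical; check the provider's own pricing page. It does not apply to providers that bill by compute time (Baseten) or by API call count (NVIDIA NIM).

```python
import math


def requests_covered(
    credit_usd: float,
    input_tokens: int,
    output_tokens: int,
    usd_per_m_input: float,
    usd_per_m_output: float,
) -> int:
    """Estimate how many identical requests a trial credit pays for.

    Assumes flat per-million-token prices for input and output tokens;
    real pricing may differ per model, tier, or region.
    """
    cost_per_request = (
        input_tokens * usd_per_m_input + output_tokens * usd_per_m_output
    ) / 1_000_000
    if cost_per_request <= 0:
        raise ValueError("per-request cost must be positive")
    return math.floor(credit_usd / cost_per_request)


# Hypothetical: a $1 credit, 1,000 input + 500 output tokens per request,
# priced at $0.20 per million tokens for both input and output.
print(requests_covered(1.00, 1_000, 500, 0.20, 0.20))
```
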