This is a list of services that provide free access to, or credits towards, API-based LLM usage.

> [!NOTE]
> Please don't abuse these services, or we might lose them.

> [!WARNING]
> This list explicitly excludes any services that are not legitimate (e.g. services that reverse-engineer an existing chatbot).

Provider | Provider Limits/Notes | Model Name | Model Limits |
---|---|---|---|
Groq | | Distil Whisper Large v3 | 7,200 audio-seconds/minute, 2,000 requests/day |
| | | Gemma 2 9B Instruct | 14,400 requests/day, 15,000 tokens/minute |
| | | Gemma 7B Instruct | 14,400 requests/day, 15,000 tokens/minute |
| | | LLaVA 1.5 7B | 14,400 requests/day, 30,000 tokens/minute |
| | | Llama 3 70B | 14,400 requests/day, 6,000 tokens/minute |
| | | Llama 3 70B - Groq Tool Use Preview | 14,400 requests/day, 15,000 tokens/minute |
| | | Llama 3 8B | 14,400 requests/day, 30,000 tokens/minute |
| | | Llama 3 8B - Groq Tool Use Preview | 14,400 requests/day, 15,000 tokens/minute |
| | | Llama 3.1 70B | 14,400 requests/day, 6,000 tokens/minute |
| | | Llama 3.1 8B | 14,400 requests/day, 20,000 tokens/minute |
| | | Llama 3.2 11B (Text Only) | 7,000 requests/day, 7,000 tokens/minute |
| | | Llama 3.2 11B Vision | 7,000 requests/day, 7,000 tokens/minute |
| | | Llama 3.2 1B | 7,000 requests/day, 7,000 tokens/minute |
| | | Llama 3.2 3B | 7,000 requests/day, 7,000 tokens/minute |
| | | Llama 3.2 90B (Text Only) | 7,000 requests/day, 7,000 tokens/minute |
| | | Llama 3.2 90B Vision | 3,500 requests/day, 7,000 tokens/minute |
| | | Llama Guard 3 8B | 14,400 requests/day, 15,000 tokens/minute |
| | | Mixtral 8x7B | 14,400 requests/day, 5,000 tokens/minute |
| | | Whisper Large v3 | 7,200 audio-seconds/minute, 2,000 requests/day |
| | | Whisper Large v3 Turbo | 7,200 audio-seconds/minute, 2,000 requests/day |
OpenRouter | 20 requests/minute, 200 requests/day | Gemma 2 9B Instruct | |
| | | Hermes 3 Llama 3.1 405B | |
| | | Liquid LFM 40B | |
| | | Llama 3 8B Instruct | |
| | | Llama 3.1 405B Instruct | |
| | | Llama 3.1 70B Instruct | |
| | | Llama 3.1 8B Instruct | |
| | | Llama 3.2 11B Vision Instruct | |
| | | Llama 3.2 1B Instruct | |
| | | Llama 3.2 3B Instruct | |
| | | Llama 3.2 90B Vision Instruct | |
| | | Mistral 7B Instruct | |
| | | Mythomax L2 13B | |
| | | OpenChat 7B | |
| | | Phi-3 Medium 128k Instruct | |
| | | Phi-3 Mini 128k Instruct | |
| | | Qwen 2 7B Instruct | |
| | | Toppy M 7B | |
| | | Zephyr 7B Beta | |
Google AI Studio | Data is used for training (when used outside of the UK/CH/EEA/EU). | Gemini 1.5 Flash | 1,000,000 tokens/minute, 1,500 requests/day, 15 requests/minute |
| | | Gemini 1.5 Flash (Experimental) | 1,000,000 tokens/minute, 1,500 requests/day, 5 requests/minute |
| | | Gemini 1.5 Flash-8B | 1,000,000 tokens/minute, 1,500 requests/day, 15 requests/minute |
| | | Gemini 1.5 Flash-8B (Experimental) | 1,000,000 tokens/minute, 1,500 requests/day, 15 requests/minute |
| | | Gemini 1.5 Pro | 32,000 tokens/minute, 50 requests/day, 2 requests/minute |
| | | Gemini 1.5 Pro (Experimental) | 1,000,000 tokens/minute, 50 requests/day, 2 requests/minute |
| | | Gemini 1.0 Pro | 32,000 tokens/minute, 1,500 requests/day, 15 requests/minute |
| | | text-embedding-004 | 150 batch requests/minute, 1,500 requests/minute, 100 content/batch |
| | | embedding-001 | |
Lambda Labs (Free Preview) | Requires credit card verification. | Hermes 3 405B | |
| | | Hermes 3 70B | |
| | | Hermes 3 8B | |
| | | Liquid LFM 40B | |
| | | Llama 3.1 405B Instruct (FP8) | |
| | | Llama 3.1 70B Instruct (FP8) | |
| | | Llama 3.1 8B Instruct | |
| | | Llama 3.1 Nemotron 70B Instruct | |
| | | Llama 3.2 3B Instruct | |
Mistral (La Plateforme) | Free tier (Experiment plan) requires opting into data training and phone number verification. | Open and Proprietary Mistral models | 1 request/second, 500,000 tokens/minute, 1,000,000,000 tokens/month |
Mistral (Codestral) | Currently free to use, monthly subscription based, requires phone number verification. | Codestral | 30 requests/minute, 2,000 requests/day |
HuggingFace Serverless Inference | Limited to models smaller than 10GB. Some popular models are supported even if they exceed 10GB. | Various open models | 1,000 requests/day (with an account) |
SambaNova Cloud | | Llama 3.1 405B | 10 requests/minute |
| | | Llama 3.2 90B | 1 request/minute |
| | | Llama 3.1 70B | 20 requests/minute |
| | | Llama 3.2 11B | 10 requests/minute |
| | | Llama 3.1 8B | 30 requests/minute |
| | | Llama 3.2 3B | 30 requests/minute |
| | | Llama 3.2 1B | 30 requests/minute |
Cerebras | Waitlist. Free tier restricted to 8K context. | Llama 3.1 8B | 30 requests/minute, 60,000 tokens/minute, 900 requests/hour, 1,000,000 tokens/hour, 14,400 requests/day, 1,000,000 tokens/day |
| | | Llama 3.1 70B | 30 requests/minute, 60,000 tokens/minute, 900 requests/hour, 1,000,000 tokens/hour, 14,400 requests/day, 1,000,000 tokens/day |
GitHub Models | Waitlist. Rate limits dependent on Copilot subscription tier. | AI21-Jamba-Instruct | |
| | | Cohere Command R | |
| | | Cohere Command R+ | |
| | | Cohere Embed v3 English | |
| | | Cohere Embed v3 Multilingual | |
| | | Meta-Llama-3-70B-Instruct | |
| | | Meta-Llama-3-8B-Instruct | |
| | | Meta-Llama-3.1-405B-Instruct | |
| | | Meta-Llama-3.1-70B-Instruct | |
| | | Meta-Llama-3.1-8B-Instruct | |
| | | Mistral Large | |
| | | Mistral Large (2407) | |
| | | Mistral Nemo | |
| | | Mistral Small | |
| | | OpenAI GPT-4o | |
| | | OpenAI GPT-4o mini | |
| | | OpenAI Text Embedding 3 (large) | |
| | | OpenAI Text Embedding 3 (small) | |
| | | Phi-3-medium instruct (128k) | |
| | | Phi-3-medium instruct (4k) | |
| | | Phi-3-mini instruct (128k) | |
| | | Phi-3-mini instruct (4k) | |
| | | Phi-3-small instruct (128k) | |
| | | Phi-3-small instruct (8k) | |
| | | Phi-3.5-mini instruct (128k) | |
OVH AI Endpoints (Free Alpha) | Token expires every week. | CodeLlama 13B Instruct | 12 requests/minute |
| | | Codestral Mamba 7B v0.1 | 12 requests/minute |
| | | Llama 2 13B Chat | 12 requests/minute |
| | | Llama 3 70B Instruct | 12 requests/minute |
| | | Llama 3 8B Instruct | 12 requests/minute |
| | | Llama 3.1 70B Instruct | 12 requests/minute |
| | | Mathstral 7B v0.1 | 12 requests/minute |
| | | Mistral 7B Instruct | 12 requests/minute |
| | | Mixtral 8x22B Instruct | 12 requests/minute |
| | | Mixtral 8x7B Instruct | 12 requests/minute |
Cloudflare Workers AI | 10,000 tokens/day | Deepseek Coder 6.7B Base (AWQ) | |
| | | Deepseek Coder 6.7B Instruct (AWQ) | |
| | | Deepseek Math 7B Instruct | |
| | | Discolm German 7B v1 (AWQ) | |
| | | Falcon 7B Instruct | |
| | | Gemma 2B Instruct (LoRA) | |
| | | Gemma 7B Instruct | |
| | | Gemma 7B Instruct (LoRA) | |
| | | Hermes 2 Pro Mistral 7B | |
| | | Llama 2 13B Chat (AWQ) | |
| | | Llama 2 7B Chat (FP16) | |
| | | Llama 2 7B Chat (INT8) | |
| | | Llama 2 7B Chat (LoRA) | |
| | | Llama 3 8B Instruct | |
| | | Llama 3 8B Instruct | |
| | | Llama 3 8B Instruct (AWQ) | |
| | | Llama 3.1 8B Instruct | |
| | | Llama 3.1 8B Instruct (AWQ) | |
| | | Llama 3.1 8B Instruct (FP8) | |
| | | Llama 3.2 11B Vision Instruct | |
| | | Llama 3.2 1B Instruct | |
| | | Llama 3.2 3B Instruct | |
| | | LlamaGuard 7B (AWQ) | |
| | | Mistral 7B Instruct v0.1 | |
| | | Mistral 7B Instruct v0.1 (AWQ) | |
| | | Mistral 7B Instruct v0.2 | |
| | | Mistral 7B Instruct v0.2 (LoRA) | |
| | | Neural Chat 7B v3.1 (AWQ) | |
| | | OpenChat 3.5 0106 | |
| | | OpenHermes 2.5 Mistral 7B (AWQ) | |
| | | Phi-2 | |
| | | Qwen 1.5 0.5B Chat | |
| | | Qwen 1.5 1.8B Chat | |
| | | Qwen 1.5 14B Chat (AWQ) | |
| | | Qwen 1.5 7B Chat (AWQ) | |
| | | SQLCoder 7B 2 | |
| | | Starling LM 7B Beta | |
| | | TinyLlama 1.1B Chat v1.0 | |
| | | Una Cybertron 7B v2 (BF16) | |
| | | Zephyr 7B Beta (AWQ) | |
Together | | Llama 3.2 11B Vision Instruct | Free for 2024 |
Cohere | 20 requests/minute, 1,000 requests/month | Command-R | Shared limit |
| | | Command-R+ | Shared limit |
Google Cloud Vertex AI | Very stringent payment verification for Google Cloud. | Llama 3.1 70B Instruct | Llama 3.1 API Service free during preview. 60 requests/minute |
| | | Llama 3.1 8B Instruct | Llama 3.1 API Service free during preview. 60 requests/minute |
| | | Llama 3.2 90B Vision Instruct | Llama 3.2 API Service free during preview. 30 requests/minute |
| | | Gemini Flash Experimental | Experimental Gemini model. 10 requests/minute |
| | | Gemini Pro Experimental | |
glhf.chat (Free Beta) | Email for API access | Any model on Hugging Face that runs on vLLM and fits on an A100 node (~640GB VRAM), including Llama 3.1 405B at FP8 | |

Providers with trial credits:

Provider | Credits | Requirements | Models |
---|---|---|---|
Together | $5 | | Various open models |
Fireworks | $1 | | Various open models |
Unify | $10 (+$40 for getting in contact) | | Routes to other providers, various open models and proprietary models (OpenAI, Gemini, Anthropic, Mistral, Perplexity, etc.) |
DeepInfra | $1.80 | | Various open models |
NVIDIA NIM | 1,000 API calls for 1 month | | Various open models |
AI21 | $10 for 3 months | | Jamba/Jurassic-2 |
NLP Cloud | $15 | Phone number verification | Various open models |
Upstage | $10 for 3 months | | Solar Pro/Mini |
Baseten | $30 | | Any supported model (pay by compute time) |
xAI | $25/month until end of 2024 | | Grok |
Hyperbolic | $10 | | DeepSeek V2.5 |
| | | | Hermes 3 Llama 3.1 70B |
| | | | Llama 3 70B Instruct |
| | | | Llama 3.1 405B Base |
| | | | Llama 3.1 405B Base (FP8) |
| | | | Llama 3.1 405B Instruct |
| | | | Llama 3.1 70B Instruct |
| | | | Llama 3.1 8B Instruct |
| | | | Llama 3.2 3B Instruct |
| | | | Pixtral 12B (2409) |
| | | | Qwen2-VL 72B Instruct |
| | | | Qwen2-VL 7B Instruct |
| | | | Qwen2.5 72B Instruct |
| | | | Qwen2.5 Coder 32B Instruct |
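
Many of the free tiers above combine a per-minute and a per-day request cap, and staying under both on the client side is one way to avoid abusing a service. The following is a minimal Python sketch of a sliding-window limiter, not any provider's official client: `RequestLimiter` and its method names are hypothetical, introduced here for illustration only. Providers enforce their limits server-side and may change them, so treat the numbers in the tables as the source of truth at the time of writing.

```python
import time
from collections import deque


class RequestLimiter:
    """Client-side sliding-window limiter that enforces several request caps at once.

    Hypothetical helper for illustration; not part of any provider SDK.
    """

    def __init__(self, limits, clock=time.monotonic):
        # limits maps a window length in seconds to the max requests allowed
        # inside that window, e.g. {60: 20, 86400: 200} for
        # "20 requests/minute, 200 requests/day".
        self.limits = dict(limits)
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.sent = deque()  # timestamps of accepted requests, oldest first

    def wait_time(self):
        """Return the seconds to wait before another request is allowed (0.0 if none)."""
        now = self.clock()
        longest = max(self.limits)
        # Forget timestamps that fall outside even the longest window.
        while self.sent and now - self.sent[0] >= longest:
            self.sent.popleft()
        wait = 0.0
        for window, cap in self.limits.items():
            recent = [t for t in self.sent if now - t < window]
            if len(recent) >= cap:
                # A slot frees up once the cap-th most recent request leaves the window.
                wait = max(wait, recent[-cap] + window - now)
        return wait

    def try_acquire(self):
        """Record a request and return True if every cap allows it right now."""
        if self.wait_time() > 0:
            return False
        self.sent.append(self.clock())
        return True
```

For example, `RequestLimiter({60: 20, 86400: 200})` models the OpenRouter row in the first table. Call `try_acquire()` before each API request, and when it returns `False`, sleep for `wait_time()` seconds before retrying. Token-based limits (tokens/minute) are not covered by this sketch.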