
Free LLM API resources

This list covers services that provide free access to, or credits toward, API-based LLM usage.

[!NOTE]
Please don’t abuse these services, or we might lose them.

[!WARNING]
This list explicitly excludes any services that are not legitimate (e.g. those that reverse-engineer an existing chatbot).

Free Providers

OpenRouter

Limits:

20 requests/minute
50 requests/day
1,000 requests/day with a $10 credit balance

Models share a common quota.
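
Staying under both limits at once is easiest with a client-side throttle. The following is a minimal sketch, not an official client: the class and function names are hypothetical, and it assumes both limits behave as rolling windows, which may not match how the provider actually resets its daily quota.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls within any `period` seconds."""

    def __init__(self, max_calls, period, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.calls = deque()  # timestamps of calls still inside the window

    def _evict(self, now):
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()

    def wait_time(self):
        """Seconds to wait before another call is allowed (0.0 if none)."""
        now = self.clock()
        self._evict(now)
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def record(self):
        self.calls.append(self.clock())


def acquire(limiters, sleep=time.sleep):
    """Block until every limiter allows a call, then record the call."""
    while True:
        delay = max(limiter.wait_time() for limiter in limiters)
        if delay <= 0:
            break
        sleep(delay)
    for limiter in limiters:
        limiter.record()


# Limits taken from the list above (rolling windows are an assumption).
per_minute = SlidingWindowLimiter(20, 60)
per_day = SlidingWindowLimiter(50, 24 * 60 * 60)
```

Calling `acquire([per_minute, per_day])` before each request keeps a single process under both limits. Because models share a common quota, one pair of limiters would cover every model used through the same account.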

Google AI Studio

Data is used for training when the service is used outside the UK/CH/EEA/EU.

| Model Name | Model Limits |
| --- | --- |
| Gemini 2.5 Pro (Experimental) | 1,000,000 tokens/day; 250,000 tokens/minute; 25 requests/day; 5 requests/minute |
| Gemini 2.5 Flash (Preview) | 250,000 tokens/minute; 500 requests/day; 10 requests/minute |
| Gemini 2.0 Flash | 1,000,000 tokens/minute; 1,500 requests/day; 15 requests/minute |
| Gemini 2.0 Flash-Lite | 1,000,000 tokens/minute; 1,500 requests/day; 30 requests/minute |
| Gemini 2.0 Flash (Experimental) | 4,000,000 tokens/minute; 1,500 requests/day; 10 requests/minute |
| Gemini 1.5 Flash | 1,000,000 tokens/minute; 1,500 requests/day; 15 requests/minute |
| Gemini 1.5 Flash-8B | 1,000,000 tokens/minute; 1,500 requests/day; 15 requests/minute |
| Gemini 1.5 Pro | 32,000 tokens/minute; 50 requests/day; 2 requests/minute |
| LearnLM 1.5 Pro (Experimental) | 1,500 requests/day; 15 requests/minute |
| Gemma 3 27B Instruct | 15,000 tokens/minute; 14,400 requests/day; 30 requests/minute |
| Gemma 3 12B Instruct | 15,000 tokens/minute; 14,400 requests/day; 30 requests/minute |
| Gemma 3 4B Instruct | 15,000 tokens/minute; 14,400 requests/day; 30 requests/minute |
| Gemma 3 1B Instruct | 15,000 tokens/minute; 14,400 requests/day; 30 requests/minute |
| text-embedding-004 | 150 batch requests/minute; 1,500 requests/minute; 100 content/batch (quota shared with embedding-001) |
| embedding-001 | Shares the text-embedding-004 quota |
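
Each limit above is written as a count, a unit, and a period (for example `1,500 requests/day`). If you want to feed these limits into your own tooling, they can be parsed into structured values. This is only a sketch: the `Limit` type and `parse_limit` function are hypothetical names, and the pattern assumes every limit follows the `<count> <unit>/<period>` shape used in this list.

```python
import re
from typing import NamedTuple


class Limit(NamedTuple):
    amount: int
    unit: str    # e.g. "requests", "tokens", "batch requests"
    period: str  # e.g. "minute", "day", "batch"


# A count with optional thousands separators, a unit, a slash, and a period.
_LIMIT_RE = re.compile(r"^\s*([\d,]+)\s+(.+?)/(\w+)\s*$")


def parse_limit(text):
    """Parse a string such as '1,500 requests/day' into a Limit."""
    match = _LIMIT_RE.match(text)
    if match is None:
        raise ValueError(f"not a recognised limit: {text!r}")
    amount = int(match.group(1).replace(",", ""))
    return Limit(amount, match.group(2), match.group(3))


gemini_flash_limits = [
    parse_limit("1,000,000 tokens/minute"),
    parse_limit("1,500 requests/day"),
    parse_limit("15 requests/minute"),
]
```

Notes such as "Shared Quota" are not limits and raise `ValueError`, so they would need to be handled separately.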

NVIDIA NIM

Phone number verification required. Models tend to have limited context windows.

Limits: 40 requests/minute

Mistral (La Plateforme)

Limits (per-model): 1 request/second, 500,000 tokens/minute, 1,000,000,000 tokens/month
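
To see how these three limits interact, it helps to put them on a common scale. The arithmetic below is a rough sketch; the 30-day month is an assumption, since the list does not say how the monthly window is measured.

```python
REQUESTS_PER_SECOND = 1
TOKENS_PER_MINUTE = 500_000
TOKENS_PER_MONTH = 1_000_000_000

# The request limit expressed per minute: 60.
requests_per_minute = REQUESTS_PER_SECOND * 60

# Average tokens per request at which the request and token
# per-minute limits are reached together: about 8,333.
tokens_per_request_at_cap = TOKENS_PER_MINUTE / requests_per_minute

# Time at full token throughput before the monthly budget is spent:
# 2,000 minutes, or about 33.3 hours.
minutes_to_exhaust = TOKENS_PER_MONTH / TOKENS_PER_MINUTE
hours_to_exhaust = minutes_to_exhaust / 60

# Token rate that spreads the monthly budget evenly over a 30-day
# month (assumed length): about 23,148 tokens/minute.
sustained_tokens_per_minute = TOKENS_PER_MONTH / (30 * 24 * 60)

print(f"{requests_per_minute} requests/minute")
print(f"{tokens_per_request_at_cap:.0f} tokens/request at both caps")
print(f"{hours_to_exhaust:.1f} hours to exhaust the monthly budget")
print(f"{sustained_tokens_per_minute:.0f} tokens/minute sustained")
```

In other words, the per-minute token limit only matters for bursts: a workload running at that rate continuously would use up the monthly allowance in under a day and a half.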

Mistral (Codestral)

Limits: 30 requests/minute, 2,000 requests/day

HuggingFace Inference Providers

HuggingFace Serverless Inference is limited to models smaller than 10GB, although some popular models are supported even if they exceed that size.

Limits: $0.10/month in credits

Cerebras

Free tier restricted to 8K context.

| Model Name | Model Limits |
| --- | --- |
| Llama 4 Scout | 30 requests/minute; 60,000 tokens/minute; 900 requests/hour; 1,000,000 tokens/hour; 14,400 requests/day; 1,000,000 tokens/day |
| Llama 3.1 8B | 30 requests/minute; 60,000 tokens/minute; 900 requests/hour; 1,000,000 tokens/hour; 14,400 requests/day; 1,000,000 tokens/day |
| Llama 3.3 70B | 30 requests/minute; 60,000 tokens/minute; 900 requests/hour; 1,000,000 tokens/hour; 14,400 requests/day; 1,000,000 tokens/day |
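
Because these limits are stated per minute, per hour, and per day, it is worth checking which one actually binds over a full day. The sketch below does that arithmetic; the helper name `daily_ceiling` is hypothetical, and it assumes the limits apply independently per model as listed.

```python
MINUTES_PER_DAY = 24 * 60
HOURS_PER_DAY = 24


def daily_ceiling(per_minute, per_hour, per_day):
    """Largest daily total permitted by all three limits together."""
    return min(
        per_minute * MINUTES_PER_DAY,
        per_hour * HOURS_PER_DAY,
        per_day,
    )


# Per-model limits taken from the list above.
requests_per_day = daily_ceiling(30, 900, 14_400)          # 14,400
tokens_per_day = daily_ceiling(60_000, 1_000_000, 1_000_000)  # 1,000,000

# Minutes of full-rate token use before the daily budget is gone: about 16.7.
minutes_at_full_token_rate = tokens_per_day / 60_000

# Average request size if every daily request were used: about 69.4 tokens.
avg_tokens_per_request = tokens_per_day / requests_per_day

print(requests_per_day, tokens_per_day)
print(f"{minutes_at_full_token_rate:.1f} minutes at full token rate")
print(f"{avg_tokens_per_request:.1f} tokens per request on average")
```

The daily token budget equals the hourly one, so for most workloads tokens per day is the binding limit rather than any of the request counts.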

Groq

| Model Name | Model Limits |
| --- | --- |
| Allam 2 7B | 7,000 requests/day; 6,000 tokens/minute |
| DeepSeek R1 Distill Llama 70B | 1,000 requests/day; 6,000 tokens/minute |
| Distil Whisper Large v3 | 7,200 audio-seconds/minute; 2,000 requests/day |
| Gemma 2 9B Instruct | 14,400 requests/day; 15,000 tokens/minute |
| Groq compound-beta | 200 requests/day; 70,000 tokens/minute |
| Groq compound-beta-mini | 200 requests/day; 70,000 tokens/minute |
| Llama 3 70B | 14,400 requests/day; 6,000 tokens/minute |
| Llama 3 8B | 14,400 requests/day; 6,000 tokens/minute |
| Llama 3.1 8B | 14,400 requests/day; 6,000 tokens/minute |
| Llama 3.3 70B | 1,000 requests/day; 12,000 tokens/minute |
| Llama 4 Maverick 17B 128E Instruct | 1,000 requests/day; 6,000 tokens/minute |
| Llama 4 Scout Instruct | 1,000 requests/day; 30,000 tokens/minute |
| Llama Guard 3 8B | 14,400 requests/day; 15,000 tokens/minute |
| Mistral Saba 24B | 1,000 requests/day; 6,000 tokens/minute |
| Qwen QwQ 32B | 1,000 requests/day; 6,000 tokens/minute |
| Whisper Large v3 | 7,200 audio-seconds/minute; 2,000 requests/day |
| Whisper Large v3 Turbo | 7,200 audio-seconds/minute; 2,000 requests/day |

OVH AI Endpoints (Free Beta)

| Model Name | Model Limits |
| --- | --- |
| DeepSeek R1 Distill Llama 70B | 12 requests/minute |
| Llama 3.1 70B Instruct | 12 requests/minute |
| Llama 3.1 8B Instruct | 12 requests/minute |
| Llama 3.3 70B Instruct | 12 requests/minute |
| Llava Next Mistral 7B | 12 requests/minute |
| Mamba Codestral 7B v0.1 | 12 requests/minute |
| Mistral 7B Instruct v0.3 | 12 requests/minute |
| Mistral Nemo 2407 | 12 requests/minute |
| Mixtral 8x7B Instruct v0.1 | 12 requests/minute |
| Qwen 2.5 VL 72B Instruct | 12 requests/minute |
| Qwen2.5 Coder 32B Instruct | 12 requests/minute |

Together (Free)

Limits: Up to 60 requests/minute

Cohere

Limits:

20 requests/minute
1,000 requests/month

Models share a common quota.

GitHub Models

Extremely restrictive input/output token limits.

Limits: Dependent on Copilot subscription tier (Free/Pro/Pro+/Business/Enterprise)

Chutes

Distributed, decentralized crypto-based compute. Data is sent to individual hosts.

Cloudflare Workers AI

Limits: 10,000 neurons/day

Google Cloud Vertex AI

Very stringent payment verification for Google Cloud.

| Model Name | Model Limits |
| --- | --- |
| Gemini 2.5 Pro (Experimental) | 10 requests/minute (shared quota) |
| Gemini 2.0 Flash (Experimental) | 10 requests/minute (shared quota) |
| Gemini 2.0 Flash Thinking (Experimental) | 10 requests/minute (shared quota) |
| Gemini 2.0 Pro (Experimental) | 10 requests/minute (shared quota) |
| Llama 4 Maverick Instruct | 60 requests/minute; free during preview |
| Llama 4 Scout Instruct | 60 requests/minute; free during preview |
| Llama 3.3 70B Instruct | 30 requests/minute; free during preview |
| Llama 3.2 90B Vision Instruct | 30 requests/minute; free during preview |
| Llama 3.1 70B Instruct | 60 requests/minute; free during preview |
| Llama 3.1 8B Instruct | 60 requests/minute; free during preview |

Providers with trial credits

Together

Credits: $1 when you add a payment method

Models: Various open models

Fireworks

Credits: $1

Models: Various open models

Unify

Credits: $5 when you add a payment method

Models: Routes to other providers, covering various open models and proprietary models (OpenAI, Gemini, Anthropic, Mistral, Perplexity, etc.)

Baseten

Credits: $30

Models: Any supported model - pay by compute time

Nebius

Credits: $1

Models: Various open models

Novita

Credits: $0.50 for 1 year; $10 for 3 months for LLMs with a referral code and a connected GitHub account

Models: Various open models

AI21

Credits: $10 for 3 months

Models: Jamba family of models

Upstage

Credits: $10 for 3 months

Models: Solar Pro/Mini

NLP Cloud

Credits: $15

Requirements: Phone number verification

Models: Various open models

Alibaba Cloud (International) Model Studio

Credits: Token/time-limited trials on a per-model basis

Models: Various open and proprietary Qwen models

Credits: $5/month upon sign up, $30/month with payment method added

Models: Any supported model - pay by compute time

Hyperbolic

Credits: $1

Models:

SambaNova Cloud

Credits: $5 for 3 months

Models:

Scaleway Generative APIs

Credits: 1,000,000 free tokens

Models: