How many free requests per day does OpenRouter allow?

For models whose ID ends in :free, an account that has never bought credits gets 50 requests per day. Once you have purchased at least $10 in credits (lifetime), the cap rises to 1,000 requests per day, with a per-minute limit on top (checked against OpenRouter docs, Jul 4, 2026).

Why do OpenRouter free models return 429 when my quota is unused?

The 429 that says temporarily rate-limited upstream comes from the provider pool every account shares, not from your quota. In our test, llama-3.3-70b:free returned 429 on all 9 attempts while the day's 1,000-request quota was untouched.

Are OpenRouter free models suitable for production?

No. Availability depends on a shared pool you cannot control, and it can stay saturated for stretches at a time. They are great for evaluating models and for retry-friendly scripts. Anything a user waits on should run on owned quota such as Workers AI, or a pay-per-token API.

Which free models does OpenRouter have?

23 as of Jul 4, 2026, including nemotron-3-ultra at 550B, hermes-405b, OpenAI's gpt-oss-120b reasoning model, qwen3-next-80b, and llama-3.3-70b, the same model Cloudflare Workers AI serves on its free tier.

OpenRouter Free Models Tested: Daily Limits, 429s, and When to Use Them

I have a personal investing agent running every day, and its brain has been wired through OpenRouter from day one. When I opened the usage page while writing this post, the lifetime bill read $0.47. Not a typo. Forty-seven cents, for a job that runs daily.

In the previous post, about putting an AI box on a static site, I briefly explained why I did not pick OpenRouter's free models, quoting numbers from their docs. This time, with a real key in hand, I did not have to trust the docs. I measured everything myself: the free catalog, the quota rules, and the question that matters more than anything else: when you call it, do you actually get an answer?

Part 123 free models, and the rule hiding in one number

OpenRouter lists hundreds of models, and 23 of them are free (counted through their own API on Jul 4, 2026). They are not small ones either. The current highlights:

nemotron-3-ultra at 550B, the largest in the free catalog
hermes-405b at 405B
gpt-oss-120b OpenAI's downloadable open-weight model, a reasoning type (it thinks in steps before answering)
qwen3-next-80b the newer Qwen line
llama-3.3-70b the same model we use on Workers AI in the previous post

The free-usage rules fit on one docs page (checked the same day). Three lines:

Free models are the variants whose ID ends in :free, with a per-minute cap on top
An account that has never bought credits gets 50 requests a day
Once your lifetime top-up reaches $10, the cap rises to 1,000 requests a day

My account topped up once (the $0.47 bill drew from that credit), and you can check the status on their /key endpoint: my is_free_tier says false, so my cap is 1,000 a day. Reading this far, it looks like a great deal. Pay $10 once, call 550B-class models a thousand times a day.

But there is something the number 1,000 never promised. The quota belongs to your account; the machines belong to everyone. And this is exactly where the docs and reality part ways.

Part 2The real rate limit: full quota, nine 429s

On the afternoon of Jul 4, I fired at three free models with a real key, on a day when none of the quota had been used. Here is what came back:

Model (:free variant)	Attempts	Result
llama-3.3-70b	9 in ~5 minutes	429 on all 9, no answer at all
qwen3-next-80b	3	429 on all 3
gpt-oss-120b	2	answer on attempt 2, in 34.7 seconds

The error message is more interesting than the numbers. It reads temporarily rate-limited upstream. The one saying "can't take it" is not OpenRouter but the downstream provider donating machines to run that free model. We got 429 nine times while the day's 1,000-request quota sat untouched, because the queue that filled up was not our account's queue. It was the shared queue of everyone on the internet calling that same free model that minute.

As for gpt-oss-120b, the one that answered, two things are worth telling. First, Thai output worked. Second, the 34.7 seconds were not queue time: this is a reasoning model that thinks before it answers, and the 1,380 output tokens include the thinking (input was only 123). If you plan to use this class of model, remember that the thinking is something you wait for, and on paid tiers it is also something you pay for.

Part 3Hosted free vs shared-queue free

The clearest picture comes from llama-3.3-70b, because we hold the same model in two places: the :free variant on OpenRouter that just returned nine 429s, and Workers AI, which we measured in the previous post. Same day, same model.

Path	Measured result (Jul 4, 2026)
OpenRouter llama-3.3-70b:free	9 attempts, zero answers (429 every time)
Workers AI llama-3.3-70b (fast)	5 attempts, 5 answers, 9.8s average, no 429s

The difference is not the model. It is who owns the queue. Seen from that angle, free AI today comes in two kinds:

Hosted free a free quota tied to your account, on infrastructure the provider fully controls. Workers AI gives 10,000 neurons a day (their usage-metering unit). Slow-ish, but it shows up.
Shared-queue free a central pool everyone shares, like OpenRouter's :free variants. Sometimes open, sometimes saturated all afternoon. You cannot predict it and cannot control it.

The first kind suits anything with a person waiting at the screen. The second suits work that can wait, can retry, or just wants a taste.

Part 4Which job, which path

Your job	Best path	Why
Trying new models, comparing several	OpenRouter :free	One catalog, 23 models up to 550B, one key, switching is a one-line change
Background scripts that can wait and retry	OpenRouter :free + retry logic	429s are the nature of a free pool; design for them and it works
A web endpoint someone waits on	Workers AI or any hosted quota	The queue is yours. Slow is fine, absent is not. The full how-to is in the previous post
Traffic outgrowing the free quotas	OpenRouter pay-per-token	Almost no code change, and the first $10 top-up also unlocks the 1,000-a-day :free cap

If you keep one principle from this post, make it this: before you use anything free, ask who owns the queue. If the queue is yours, free means cheap. If the queue is everyone's, free means unpredictable, and the job has to be designed to live with that.

Where to start

Sign up for OpenRouter, create a key, and fire a :free model at your real workload. No card required.
If you hit 429, do not conclude it is unusable. That is the nature of a free queue. Write a retry and let it run in the background.
The day you want AI sitting on a public page for other people, move to a hosted path. The previous post walks through the whole build.

That investing agent still runs every day, and the bill still climbs a few cents at a time. The 23 free models stay in my toolbox as a practice field, not the real game. The best free tier is not the biggest one. It is the one where you know exactly what part is free.

Sources & references

Free-model rules and daily caps: API Rate Limits (OpenRouter Docs), checked Jul 4, 2026: :free variants have a per-minute cap; accounts under $10 lifetime credits get 50 requests/day, at $10 the cap is 1,000/day
429 results and timings: measured with our own key, Jul 4, 2026 (llama-3.3-70b:free 9 attempts, qwen3-next-80b:free 3 attempts, gpt-oss-120b:free success on attempt 2 at 34.7s, 123 tokens in / 1,380 out). Account status checked via the /key endpoint (is_free_tier: false)
Free model count (23): counted via the OpenRouter model catalog API the same day
Workers AI numbers (5/5 answered, 9.8s average): from our previous post, measured the same day

23 Free Models on OpenRouter
The Quota Is Real, the Queue Is Not Yours