productize.life

23 Free Models on OpenRouter
The Quota Is Real, the Queue Is Not Yours

I already had an OpenRouter key running a personal investing agent, so I put the free models through a real test: what is actually free, where it breaks, and which jobs each path fits. Every number here is first-hand.

Yim · written with Dobby (AI Oracle) / Jul 4, 2026

I have a personal investing agent running every day, and its brain has been wired through OpenRouter from day one. When I opened the usage page while writing this post, the lifetime bill read $0.47. Not a typo. Forty-seven cents, for a job that runs daily.

In the previous post, about putting an AI box on a static site, I briefly explained why I did not pick OpenRouter's free models, quoting numbers from their docs. This time, with a real key in hand, I did not have to trust the docs. I measured everything myself: the free catalog, the quota rules, and the question that matters more than anything else: when you call it, do you actually get an answer?

Part 1
23 free models, and the rule hiding in one number

OpenRouter lists hundreds of models, and 23 of them are free (counted through their own API on Jul 4, 2026). They are not small ones either: the free catalog runs up to 550B-class models.

The free-usage rules fit on one docs page (checked the same day). Three lines:

  1. Free models are the variants whose ID ends in :free, with a per-minute cap on top
  2. An account that has never bought credits gets 50 requests a day
  3. Once your lifetime top-up reaches $10, the cap rises to 1,000 requests a day
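
Rule 1 is also how the count of free models can be reproduced. A minimal sketch, assuming the catalog is served as JSON at `https://openrouter.ai/api/v1/models` with the shape `{"data": [{"id": ...}, ...]}`; the URL, the response shape, and both helper names are assumptions for illustration, not something quoted from the docs page:

```python
import json
import urllib.request

# Assumed catalog endpoint; adjust if the API docs say otherwise.
MODELS_URL = "https://openrouter.ai/api/v1/models"


def free_model_ids(models):
    """Return the sorted IDs that end in ':free' from a list of model dicts."""
    return sorted(m["id"] for m in models if m.get("id", "").endswith(":free"))


def fetch_models(url=MODELS_URL):
    """Download the catalog. Assumes a body shaped like {"data": [...]}."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)["data"]
```

Calling `len(free_model_ids(fetch_models()))` gives the free-model count for the day you run it, which will drift from 23 as the catalog changes.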

I topped up my account once (the $0.47 bill drew from that credit), and you can check your status on their /key endpoint: my is_free_tier says false, so my cap is 1,000 a day. Reading this far, it looks like a great deal. Pay $10 once, call 550B-class models a thousand times a day.
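
The status check can be scripted. This is a sketch under assumptions: the full URL `https://openrouter.ai/api/v1/key`, Bearer-token authentication, and a response wrapped in a `data` object are all assumed here, and `daily_free_cap` is a hypothetical helper that simply follows this post's reading of `is_free_tier` (true means 50 a day, false means 1,000).

```python
import json
import urllib.request

# Assumed full URL for the /key endpoint mentioned above.
KEY_URL = "https://openrouter.ai/api/v1/key"


def daily_free_cap(key_info):
    """Map the is_free_tier flag to the daily :free request cap.

    A missing flag is treated as free tier, the more cautious reading.
    """
    return 50 if key_info.get("is_free_tier", True) else 1000


def fetch_key_info(api_key, url=KEY_URL):
    """Fetch key status. Assumes a JSON body shaped like {"data": {...}}."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["data"]
```

With a real key, `daily_free_cap(fetch_key_info(my_key))` would return 1000 for an account like the one described here.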

But there is something the number 1,000 never promised. The quota belongs to your account; the machines belong to everyone. And this is exactly where the docs and reality part ways.

Part 2
The real rate limit: full quota, nine 429s

On the afternoon of Jul 4, I fired at three free models with a real key, on a day when none of the quota had been used. Here is what came back:

| Model (:free variant) | Attempts | Result |
|---|---|---|
| llama-3.3-70b | 9 in ~5 minutes | 429 on all 9, no answer at all |
| qwen3-next-80b | 3 | 429 on all 3 |
| gpt-oss-120b | 2 | answer on attempt 2, in 34.7 seconds |

The error message is more interesting than the numbers. It reads temporarily rate-limited upstream. The one saying "can't take it" is not OpenRouter but the upstream provider donating machines to run that free model. We got 429 nine times while the day's 1,000-request quota sat untouched, because the queue that filled up was not our account's queue. It was the shared queue of everyone on the internet calling that same free model that minute.
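
Telling the two kinds of 429 apart is worth automating, because only one of them is about your account. A small sketch that keys on the message text quoted above; `classify_429` and its labels are hypothetical, and matching on a substring is an assumption that breaks if the wording of the error changes:

```python
def classify_429(status, body_text):
    """Label a response by whether the shared free pool is what refused it.

    Returns "not_rate_limited", "shared_pool", or "other_429".
    """
    if status != 429:
        return "not_rate_limited"
    # The phrase seen in the measured errors; matched case-insensitively.
    if "rate-limited upstream" in body_text.lower():
        return "shared_pool"
    # Any other 429 (possibly an account-level cap) needs a closer look.
    return "other_429"
```

A "shared_pool" result says nothing about how much of your own daily quota is left, which is the point of this section.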

As for gpt-oss-120b, the one that answered, two things are worth noting. First, Thai output worked. Second, the 34.7 seconds were not queue time: this is a reasoning model that thinks before it answers, and the 1,380 output tokens include the thinking (input was only 123). If you plan to use this class of model, remember that the thinking is something you wait for, and on paid tiers it is also something you pay for.
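
The arithmetic behind that wait is simple: 1,380 output tokens over 34.7 seconds is roughly 40 tokens per second, thinking included. A sketch of the calculation; the price passed to `output_cost_usd` is a placeholder you supply yourself, not a quoted rate for any model:

```python
def tokens_per_second(output_tokens, seconds):
    """Average generation speed over the whole response, thinking included."""
    return output_tokens / seconds


def output_cost_usd(output_tokens, usd_per_million_tokens):
    """Cost of the output side only, at a price you supply (hypothetical)."""
    return output_tokens * usd_per_million_tokens / 1_000_000


# The measured run from this post: 1,380 output tokens in 34.7 seconds.
measured_rate = tokens_per_second(1380, 34.7)
print(f"{measured_rate:.1f} tokens/s")
```

On a paid tier the same 1,380 tokens are billed as output, so a reasoning model costs more per answer than its visible reply suggests.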

Part 3
Hosted free vs shared-queue free

The clearest picture comes from llama-3.3-70b, because we hold the same model in two places: the :free variant on OpenRouter that just returned nine 429s, and Workers AI, which we measured in the previous post. Same day, same model.

| Path | Measured result (Jul 4, 2026) |
|---|---|
| OpenRouter llama-3.3-70b:free | 9 attempts, zero answers (429 every time) |
| Workers AI llama-3.3-70b (fast) | 5 attempts, 5 answers, 9.8s average, no 429s |

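The difference is not the model. It is who owns the queue. Seen from that angle, free AI today comes in two kinds: hosted free, where the quota and the queue behind it are yours, and shared-queue free, where the quota is yours but the machines are shared with everyone calling that model.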

The first kind suits anything with a person waiting at the screen. The second suits work that can wait, can retry, or just wants a taste.

Part 4
Which job, which path

| Your job | Best path | Why |
|---|---|---|
| Trying new models, comparing several | OpenRouter :free | One catalog, 23 models up to 550B, one key, switching is a one-line change |
| Background scripts that can wait and retry | OpenRouter :free + retry logic | 429s are the nature of a free pool; design for them and it works |
| A web endpoint someone waits on | Workers AI or any hosted quota | The queue is yours. Slow is fine, absent is not. The full how-to is in the previous post |
| Traffic outgrowing the free quotas | OpenRouter pay-per-token | Almost no code change, and the first $10 top-up also unlocks the 1,000-a-day :free cap |

If you keep one principle from this post, make it this: before you use anything free, ask who owns the queue. If the queue is yours, free means cheap. If the queue is everyone's, free means unpredictable, and the job has to be designed to live with that.

Where to start

  1. Sign up for OpenRouter, create a key, and fire a :free model at your real workload. No card required.
  2. If you hit 429, do not conclude it is unusable. That is the nature of a free queue. Write a retry and let it run in the background.
  3. The day you want AI sitting on a public page for other people, move to a hosted path. The previous post walks through the whole build.
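
Step 2 can be sketched as a small wrapper with exponential backoff and jitter. Everything here is illustrative: `RateLimited` and `call_with_retry` are hypothetical names, the delays are arbitrary starting points, and `call` stands for whatever function of yours sends the request and raises `RateLimited` when it gets a 429.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the caller-supplied function when the API answers 429."""


def call_with_retry(call, max_attempts=6, base_delay=2.0, max_delay=60.0,
                    sleep=time.sleep):
    """Run call() until it succeeds or max_attempts is exhausted.

    Waits base_delay, then 2x, 4x, ... (capped at max_delay) between tries,
    scaled by a random factor so many clients do not retry in lockstep.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller decide what to do
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))
```

Because each of your retries also counts against the daily cap, a background job is better off with a small `max_attempts` and a longer pause than with a tight loop.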

That investing agent still runs every day, and the bill still climbs a few cents at a time. The 23 free models stay in my toolbox as a practice field, not the real game. The best free tier is not the biggest one. It is the one where you know exactly what part is free.
