Is Cloudflare Workers AI free?

Every account gets 10,000 free neurons a day. Measured on real traffic, one ~550-token answer from Llama 3.3 70B costs about 124 neurons, so the free tier covers roughly 80 answers a day. Beyond that it is $0.011 per 1,000 neurons, about $0.0014 per answer.

Do I need an API key to use Workers AI?

No. Model access is bound to your Cloudflare account at the platform level through a binding named AI, declared in two lines of wrangler.toml and exposed to your code as env.AI. There is no secret in your code to store, rotate, or leak.

How do I add AI to a static site?

Through the Cloudflare Worker that already serves the site: add one POST endpoint that calls env.AI.run() with your prompt, and have the page call it with a plain fetch. The HTML files themselves stay fully static.

How does Workers AI compare to OpenRouter free models?

OpenRouter’s flagship free model is the same Llama 3.3 70B, but it requires an account and an API key, and accounts that have never topped up get 50 free calls a day, fewer than the ~80 answers Workers AI covers free with no key at all. A one-time $10 top-up raises OpenRouter to 1,000 calls a day, which makes it the cheapest expansion path once traffic outgrows the free quota.

How fast is Llama 3.3 70B on Workers AI?

Measured over 5 real calls with ~550-token answers: 9.8 seconds average inference time, about 15 seconds end to end at the page. Good for long answers people will wait for, not for snappy short-turn chat.

Cloudflare Workers AI: Add a Free LLM to a Static Site, No Backend Needed

Quick answer

A static site behind Cloudflare can get AI without a backend and without storing an API key: Workers AI makes models like Llama 3.3 70B callable from your worker through a two-line binding in wrangler.toml, exposed in code as env.AI. The free tier is 10,000 neurons a day. Measured on real traffic, that is about 80 answers a day, at ~124 neurons and ~15 seconds per answer.

This happened in a single day. In the morning, our PRD consulting landing page was an ordinary static site: text, images, a mailto button. By the afternoon, that same page had a box where a reader can paste their app idea and get back an eight-section PRD skeleton with the risks called out, in about fifteen seconds.

Here is what we did NOT add: a server. There is still no backend of our own, no VM, no container, and not a single API key anywhere in the code. The site itself is still plain static HTML.

One thing makes this possible: Cloudflare Workers AI. This post explains how it works, plus the thing posts like this usually skip: numbers measured from the real thing. Neurons per answer, latency, and the actual bill, pulled fresh right before writing.

Part 1. The real thing that just shipped: an AI box on a landing page

Our goal was concrete. The landing page sells product-requirement consulting, and we wanted readers to try the thinking before reaching out. So we built two things. The first is a seven-question quiz that scores how ready your requirements are; that one is pure JavaScript, no AI. The second is the star of this post: a box that takes an idea and answers back with a PRD skeleton. The reader describes their idea in a few sentences, and the system returns an eight-section outline, from problem and users through scope to acceptance criteria, closing with the risks worth answering before telling an AI to build.

The model answering is Llama 3.3 70B (the instruct fp8 fast variant), running on Cloudflare's network, not our machine. You can try the real thing at productize.life/services/prd-en.

Figure: The path of one question. Everything lives on Cloudflare's network, not on any server of ours.

Part 2. Why no backend and no API key

Sites like this usually get stuck on one question: where does the AI live? Call a model straight from the page and you have to embed an API key in the HTML, which means handing your key to the whole internet. Avoid that and you need a backend in the middle, which means a machine to run, maintain, and pay for monthly. For a static site that wants to stay light, neither option is pretty.

Workers AI cuts the knot with one idea: the model lives where the worker lives, and access is bound to the account, not to a key. If your site is already served through a Cloudflare Worker (ours already used one as a reverse proxy and membership gate), adding AI is a two-line binding in wrangler.toml:

[ai]
binding = "AI"

With that, your worker code gets an env.AI variable it can call directly. One endpoint and one call to env.AI.run("@cf/meta/llama-3.3-70b-instruct-fp8-fast", ...) returns an answer. There is no key to store, which means no key to leak, nothing to rotate, no secret manager to set up. The page side is a plain fetch to an endpoint on your own domain.
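A minimal sketch of such a worker, assuming module-worker syntax, a hypothetical /api/prd route, and a JSON body shaped like {"idea": "..."}. AiBinding is a deliberately narrow stand-in for the binding's real type, the one-line system prompt is a placeholder, and max_tokens: 700 is an illustrative value rather than the live page's setting.

```typescript
// Narrow stand-in for the Workers AI binding; in a real project the full
// type comes from @cloudflare/workers-types.
interface AiBinding {
  run(model: string, inputs: Record<string, unknown>): Promise<unknown>;
}

interface Env {
  AI: AiBinding; // provided by `[ai] binding = "AI"` in wrangler.toml
}

const MODEL = "@cf/meta/llama-3.3-70b-instruct-fp8-fast";

function json(data: unknown, status = 200): Response {
  return new Response(JSON.stringify(data), {
    status,
    headers: { "content-type": "application/json" },
  });
}

const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url);
    // "/api/prd" is a hypothetical route; any path on your own domain works.
    if (url.pathname !== "/api/prd" || request.method !== "POST") {
      return new Response("Not found", { status: 404 });
    }
    // Malformed JSON is treated the same as a missing idea.
    const parsed: unknown = await request.json().catch(() => null);
    const body = (parsed ?? {}) as { idea?: unknown };
    const idea = typeof body.idea === "string" ? body.idea.trim() : "";
    if (!idea) return json({ error: "missing idea" }, 400);

    // No API key: the binding itself is authorized by the Cloudflare account.
    const result = await env.AI.run(MODEL, {
      messages: [
        { role: "system", content: "Answer with an eight-section PRD skeleton. Never invent details." },
        { role: "user", content: idea },
      ],
      max_tokens: 700, // illustrative cap on answer length
    });
    return json(result);
  },
};

export default worker;
```

Because the model call is just a method on env, the endpoint can be exercised locally by passing a fake env, which is also how the no-key claim shows up in code: nothing secret is read anywhere.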

The rest of the work is not AI work at all; it is the same old web work: validate input, cap usage, and write a system prompt that answers the way you want. We distilled ours from the eight-section PRD template used in real consulting engagements, with one rule we would urge anyone to include: never invent details the user did not give; where information is missing, say what is missing instead of guessing. Otherwise the model will happily fill in things nobody said. We wrote about that failure mode in "Why AI agents lie."
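One way to express the input validation and the no-invention rule, as a sketch: the 20- and 2,000-character limits, the ChatMessage type, and the prompt wording are illustrative assumptions, not the prompt behind the live box.

```typescript
type ChatMessage = { role: "system" | "user"; content: string };

// Hypothetical limits; tune them to your own form.
const MIN_IDEA_CHARS = 20;
const MAX_IDEA_CHARS = 2000;

// Illustrative system prompt, not the one used on the live page.
const SYSTEM_PROMPT = [
  "You turn a product idea into a PRD skeleton with eight sections,",
  "from problem and users through scope to acceptance criteria,",
  "and end with the risks to resolve before building.",
  "Never invent details the user did not give.",
  "Where information is missing, say what is missing instead of guessing.",
].join(" ");

// Returns the messages to send to the model, or an error for the page to show.
function buildMessages(rawIdea: unknown): ChatMessage[] | { error: string } {
  if (typeof rawIdea !== "string") return { error: "idea must be text" };
  const idea = rawIdea.trim();
  if (idea.length < MIN_IDEA_CHARS) return { error: "describe the idea in a few sentences" };
  if (idea.length > MAX_IDEA_CHARS) return { error: `keep it under ${MAX_IDEA_CHARS} characters` };
  return [
    { role: "system", content: SYSTEM_PROMPT },
    { role: "user", content: idea },
  ];
}
```

Keeping the rule in the system message rather than appending it to the user's text means a reader cannot override it simply by pasting something that looks like an instruction at the end of their idea, though it is no guarantee against that.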

Part 3. Measured numbers: neurons, speed, and the bill

Workers AI bills in a unit Cloudflare calls the neuron. Every account gets 10,000 free neurons a day; beyond that it is $0.011 per 1,000 neurons. What the docs cannot tell you is how many neurons one of YOUR answers actually costs. So we pulled today's real usage from Cloudflare's own analytics.
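The free-tier arithmetic is simple enough to write down. A sketch using the two published numbers above and the ~124.3 neurons per answer measured below (621.5 neurons over 5 calls); the function names are ours, for illustration only.

```typescript
// Published Workers AI pricing, checked Jul 4, 2026.
const FREE_NEURONS_PER_DAY = 10_000;
const USD_PER_1000_NEURONS = 0.011;

// What one day of usage costs: only neurons above the free allowance are billed.
function dailyBillUsd(neuronsUsedToday: number): number {
  const billable = Math.max(0, neuronsUsedToday - FREE_NEURONS_PER_DAY);
  return (billable / 1000) * USD_PER_1000_NEURONS;
}

// How many answers of a given size fit inside the free allowance.
function answersCoveredFree(neuronsPerAnswer: number): number {
  return Math.floor(FREE_NEURONS_PER_DAY / neuronsPerAnswer);
}

// Marginal price of one answer once the free allowance is used up.
function usdPerPaidAnswer(neuronsPerAnswer: number): number {
  return (neuronsPerAnswer / 1000) * USD_PER_1000_NEURONS;
}
```

With the measured 124.3 neurons per answer, answersCoveredFree returns 80 and usdPerPaidAnswer comes to about $0.00137, which is where the "~80 a day" and "~$0.0014" figures come from.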

| Numbers from 5 real calls (Jul 4, 2026) | Measured value |
|---|---|
| Total neurons | 621.5 (average ~124 per call) |
| Input tokens (system prompt + idea) | 2,202 total (~440 per call) |
| Output tokens (the PRD skeleton returned) | 2,748 total (~550 per call) |
| Average inference time | 9.8 seconds per answer |
| End-to-end at the page (measured with curl) | ~15 seconds |
| Answers covered by the free tier of 10,000 neurons/day | ~80 a day |
| Price per answer beyond the free tier | ~$0.0014 |
| Today's bill | $0 |

These numbers cross-check, too. The pricing page lists Llama 3.3 70B fast at 26,668 neurons per million input tokens and 204,805 per million output tokens. Multiply back: 2,202 input tokens × 26,668/M = 58.7 neurons, plus 2,748 output tokens × 204,805/M = 562.8 neurons, for a total of 621.5, exactly what the dashboard reports.
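The same cross-check as a few lines of TypeScript, a sketch that hard-codes the two rates quoted above (they may change, so re-check the pricing page before reusing them):

```typescript
// Rates for @cf/meta/llama-3.3-70b-instruct-fp8-fast, as listed on the
// Workers AI pricing page on Jul 4, 2026.
const NEURONS_PER_M_INPUT = 26_668;
const NEURONS_PER_M_OUTPUT = 204_805;

function neuronsFor(inputTokens: number, outputTokens: number) {
  const input = (inputTokens * NEURONS_PER_M_INPUT) / 1_000_000;
  const output = (outputTokens * NEURONS_PER_M_OUTPUT) / 1_000_000;
  const total = input + output;
  return { input, output, total, outputShare: output / total };
}

// Token totals from the 5 measured calls.
const measured = neuronsFor(2_202, 2_748);
console.log(measured.input.toFixed(1), measured.output.toFixed(1), measured.total.toFixed(1));
// prints: 58.7 562.8 621.5
```

outputShare comes out at about 0.91: roughly nine of every ten neurons paid for the answer, not the prompt.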

There is a lesson hiding in that pair of numbers: almost all the cost sits in output tokens (562.8 of 621.5), because the output rate is nearly eight times the input rate. If you want to control cost, cap answer length with max_tokens before you bother squeezing the prompt.
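Capping max_tokens also gives you a hard ceiling on what one call can cost, which turns the global daily cap into a calculation. A sketch, assuming prompts stay near the measured ~440 input tokens (a long pasted idea raises this) and using the rates quoted above; the function names are illustrative.

```typescript
const FREE_NEURONS_PER_DAY = 10_000;
const NEURONS_PER_M_INPUT = 26_668;
const NEURONS_PER_M_OUTPUT = 204_805;

// Upper bound on one call: the prompt you send plus the most output max_tokens allows.
function worstCaseNeurons(inputTokens: number, maxTokens: number): number {
  return (inputTokens * NEURONS_PER_M_INPUT + maxTokens * NEURONS_PER_M_OUTPUT) / 1_000_000;
}

// Largest global daily cap that stays inside the free tier even if
// every single answer runs all the way to max_tokens.
function safeGlobalCap(inputTokens: number, maxTokens: number): number {
  return Math.floor(FREE_NEURONS_PER_DAY / worstCaseNeurons(inputTokens, maxTokens));
}
```

With 440-token prompts, safeGlobalCap(440, 700) returns 64 and safeGlobalCap(440, 1024) returns 45. A global cap of 60 stays free for any max_tokens up to 756; past that, the cap has to come down.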

As for speed: nearly ten seconds for a ~550-token answer means this suits long answers people are willing to wait for, like turning an idea into a document outline. It does not suit short snappy chat where people expect a reply in a second or two. Set expectations right on the page; ours says plainly to expect 10-20 seconds.

What about other free options, like OpenRouter?

The question we ran into ourselves right after shipping: why not use OpenRouter's free models instead? The answer sits in three numbers (checked against OpenRouter's docs, Jul 4, 2026):

OpenRouter's flagship free model is llama-3.3-70b, the same model Workers AI runs, but it needs an account plus one more API key to store.
An account that has never topped up gets 50 free-model calls a day, less than the ~80 answers the Workers AI free tier covers. A one-time $10 top-up raises it to 1,000 calls a day.
We picked Workers AI because there is no key to leak and the model runs on the same network as the worker. The $10 OpenRouter path is the cheapest next step once traffic outgrows the free quota, before moving to a fully pay-per-token API.

Part 4. What you need before putting AI on a public page

An endpoint that can call a model is free only up to 10,000 neurons a day. Leave it open to unlimited calls and a single script can burn the whole day's quota in minutes, then start climbing into paid territory. Before shipping, we set up three layers:

An email gate before use. The AI box unlocks after the reader takes the quiz and leaves an email. The server verifies a signed cookie; it does not just hide the button with JavaScript, because an endpoint can always be hit directly. No cookie, 401. (We already use this gate pattern across the site.)
A per-IP daily cap. Ours is 5 calls: enough to genuinely try it, not enough to poke at it all day.
A global daily cap for the whole system. Ours is 60 calls, comfortably under the ~80 the free tier covers. This layer is the guarantee that even under fire from a thousand IPs, the bill stays zero.
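The first layer can be sketched with the Web Crypto API that Workers provide: sign the email with HMAC-SHA256 when the quiz is completed, and re-check the signature on every AI request. The cookie format (email, a dot, then the hex signature), the function names, and the signing secret are all hypothetical. That secret only signs cookies and has nothing to do with model access. A production gate would likely also sign an expiry time.

```typescript
const encoder = new TextEncoder();

// HMAC-SHA256 of `message`, hex-encoded. `secret` is a hypothetical worker
// secret used only for signing gate cookies.
async function hmacHex(message: string, secret: string): Promise<string> {
  const key = await crypto.subtle.importKey(
    "raw",
    encoder.encode(secret),
    { name: "HMAC", hash: "SHA-256" },
    false,
    ["sign"],
  );
  const sig = await crypto.subtle.sign("HMAC", key, encoder.encode(message));
  return Array.from(new Uint8Array(sig), (b) => b.toString(16).padStart(2, "0")).join("");
}

// Compares two strings without stopping at the first differing character.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  return diff === 0;
}

// Issued once the reader finishes the quiz: "<email>.<hex HMAC of email>".
async function signGate(email: string, secret: string): Promise<string> {
  return `${email}.${await hmacHex(email, secret)}`;
}

// Run on every AI request. Returns the email, or null (answer 401) when the
// cookie is missing, malformed, or was not signed with this secret.
async function verifyGate(cookieValue: string | null, secret: string): Promise<string | null> {
  if (!cookieValue) return null;
  const dot = cookieValue.lastIndexOf("."); // emails contain dots; the signature does not
  if (dot <= 0) return null;
  const email = cookieValue.slice(0, dot);
  const expected = await hmacHex(email, secret);
  return constantTimeEqual(expected, cookieValue.slice(dot + 1)) ? email : null;
}
```

The check runs on the server, so hiding or showing the button in JavaScript is only cosmetic: a direct POST without a valid cookie still ends at verifyGate returning null.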

The counters live in KV (Cloudflare's key-value store), which has one trait worth knowing: it is eventually consistent, so a value you read can lag reality by tens of seconds. That makes these counters a soft cap that can miscount a little under rapid fire. We know because it caught us during latency testing: we deleted our own IP's counter, fired again immediately, the system still saw the stale number, and we got served our own 429. That is good news in one respect: it proves the limiter works in production. In that case the staleness erred toward blocking early, but it can cut the other way too, since two requests that read the same stale count can both get through. For budget protection that is acceptable because the global cap of 60 sits well below the ~80 answers the free quota covers, leaving room for a few miscounted calls.
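A sketch of the two counters, assuming a KV namespace bound to the worker (KvLike mirrors only the get and put methods used here) and UTC calendar days as the reset boundary. The key names and the two-day expiry are illustrative. Because it reads and then writes, two requests arriving together can see the same count, which is exactly the soft-cap behavior described above.

```typescript
// The two methods of a Workers KV namespace this sketch relies on.
interface KvLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

const PER_IP_DAILY_CAP = 5;
const GLOBAL_DAILY_CAP = 60; // kept below the ~80 answers the free tier covers
const COUNTER_TTL_SECONDS = 2 * 24 * 60 * 60; // old counters expire on their own

async function readCount(kv: KvLike, key: string): Promise<number> {
  return Number((await kv.get(key)) ?? "0");
}

// Returns true if this call may reach the model, and counts it.
// Not atomic: concurrent requests or stale reads can let a few extra through.
async function allowCall(kv: KvLike, ip: string, now: Date = new Date()): Promise<boolean> {
  const day = now.toISOString().slice(0, 10); // UTC day, e.g. "2026-07-04"
  const ipKey = `rl:ip:${ip}:${day}`;
  const globalKey = `rl:global:${day}`;
  const [ipCount, globalCount] = await Promise.all([readCount(kv, ipKey), readCount(kv, globalKey)]);
  if (ipCount >= PER_IP_DAILY_CAP || globalCount >= GLOBAL_DAILY_CAP) return false;
  await Promise.all([
    kv.put(ipKey, String(ipCount + 1), { expirationTtl: COUNTER_TTL_SECONDS }),
    kv.put(globalKey, String(globalCount + 1), { expirationTtl: COUNTER_TTL_SECONDS }),
  ]);
  return true;
}
```

In the endpoint, this would run after the gate check, with the visitor's address taken from the CF-Connecting-IP request header, and a false result turned into a 429. Both caps are checked before either counter is written, so a refused call does not consume anyone's quota.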

Part 5. Do it yourself, step by step

If your site is already on Cloudflare (or you can move the DNS), the whole thing is:

Declare the binding. Add [ai] + binding = "AI" to the wrangler.toml of the worker serving your site.
Add one endpoint. Accept a JSON POST, validate input length, and pass it to env.AI.run() with a system prompt that defines the answer structure and forbids inventing details.
Put the gate in front of the model. Check your gate (email, login, whatever fits), then count a per-IP cap and a global cap in KV. Keep the global cap below the free quota, always. If you need exact counting later, move to Durable Objects.
On the page: one textarea, one button, one fetch. Show a clear "thinking" state, because the wait is around ten seconds.
Measure before you talk about it. Fire five real calls, open analytics, see what one answer costs in neurons, then work out whether the free tier covers your traffic.
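The page side of step 4, as a sketch: the /api/prd path, the reply shape (assuming the worker relays Workers AI's {"response": "..."} object unchanged), and the error messages are assumptions. The fetch function is a parameter only so the logic can be tested without a network; in the browser it defaults to the real fetch.

```typescript
// Hypothetical endpoint path; match whatever route your worker exposes.
const ENDPOINT = "/api/prd";

type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

async function askForSkeleton(idea: string, fetchImpl: FetchLike = fetch): Promise<string> {
  const res = await fetchImpl(ENDPOINT, {
    method: "POST",
    headers: { "content-type": "application/json" },
    credentials: "same-origin", // send the gate cookie along
    body: JSON.stringify({ idea }),
    signal: AbortSignal.timeout(30_000), // answers take ~10-20 s; give up at 30
  });
  if (res.status === 401) throw new Error("Take the quiz first to unlock the AI box.");
  if (res.status === 429) throw new Error("Daily limit reached, please try again tomorrow.");
  if (!res.ok) throw new Error(`Request failed (${res.status})`);
  const data = (await res.json()) as { response?: string };
  return data.response ?? "";
}

// One textarea, one button, one visible "thinking" state.
function wireAiBox(textarea: HTMLTextAreaElement, button: HTMLButtonElement, output: HTMLElement): void {
  button.addEventListener("click", async () => {
    button.disabled = true; // prevents double submits while waiting
    output.textContent = "Thinking... this usually takes 10-20 seconds.";
    try {
      output.textContent = await askForSkeleton(textarea.value);
    } catch (err) {
      output.textContent = err instanceof Error ? err.message : String(err);
    } finally {
      button.disabled = false;
    }
  });
}
```

Mapping 401 and 429 to plain sentences matters more than it looks: with a ten-second wait, a reader who gets a silent failure will assume the whole box is broken.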

The full worker code, including the signed-cookie gate and the rate-limit counters, is being written up as a method page for blog members in an upcoming post. The concepts are all here though; you can follow the five steps without waiting.

The shortest possible summary

Static site + Cloudflare Worker + an env.AI binding = a site with AI, no server, no key. And if the global daily cap sits under the free quota, the bill is zero by proof, not by hope.

Want to see it live before building your own? Try the AI box at productize.life/services/prd-en: paste your idea and see what PRD skeleton comes back.

Sources & references

Neurons, tokens, and inference time: measured ourselves from our Cloudflare account's analytics (GraphQL dataset aiInferenceAdaptiveGroups) over 5 real calls on Jul 4, 2026 · end-to-end latency measured with curl
Pricing and free tier: Workers AI Pricing (Cloudflare Docs), checked Jul 4, 2026: 10,000 free neurons/day, $0.011 per 1,000 neurons, Llama 3.3 70B fast = 26,668 neurons/M input tokens + 204,805 neurons/M output tokens
The demo described: the AI box that turns an idea into a PRD skeleton (the real thing, running exactly this setup)

Your static site can have AI.
No backend. No API key.