AI · Static Sites

Your static site can have AI.
No backend. No API key.

Yesterday our page was a plain HTML file. Today it takes a reader's app idea and answers back with a draft PRD skeleton. No server of our own, no API key anywhere in the code, and the bill is still zero. Here is how, with every number actually measured.

Yim · written with Dobby (AI Oracle) · Jul 4, 2026

This happened in a single day. In the morning, our PRD consulting landing page was an ordinary static site: text, images, a mailto button. By the afternoon, that same page had a box where a reader can paste their app idea and get back an eight-section PRD skeleton with the risks called out, in about fifteen seconds.

Here is what we did NOT add: a server. There is still no backend of our own, no VM, no container, and not a single API key anywhere in the code. The site itself is still plain static HTML.

One thing makes this possible: Cloudflare Workers AI. This post explains how it works, plus the thing posts like this usually skip: numbers measured from the real thing. Neurons per answer, latency, and the actual bill, pulled fresh right before writing.

Part 1 · The real thing that just shipped: an AI box on a landing page

Our goal was concrete. The landing page sells product-requirement consulting, and we wanted readers to try the thinking before reaching out. So we built two things. The first is a seven-question quiz that scores how ready your requirements are; that one is pure JavaScript, no AI. The second is the star of this post: a box that takes an idea and answers back with a PRD skeleton. The reader describes their idea in a few sentences, and the system returns an eight-section outline, from problem and users through scope to acceptance criteria, closing with the risks worth answering before telling an AI to build.

The model answering is Llama 3.3 70B (the instruct fp8 fast variant), running on Cloudflare's network, not our machine. You can try the real thing at productize.life/services/prd-en.

Browser (static page + fetch) → Cloudflare Worker (checks the email gate, counts rate limits in KV) → env.AI.run() → Workers AI (Llama 3.3 70B)
Figure: the path of one question. Everything lives on Cloudflare's network, not a single server of ours.

Part 2 · Why no backend and no API key

Sites like this usually get stuck on one question: where does the AI live? Call a model straight from the page and you have to embed an API key in the HTML, which means handing your key to the whole internet. Avoid that and you need a backend in the middle, which means a machine to run, maintain, and pay for monthly. For a static site that wants to stay light, neither option is pretty.

Workers AI cuts the knot with one idea: the model lives where the worker lives, and access is bound to the account, not to a key. If your site is already served through a Cloudflare Worker (ours already used one as a reverse proxy and membership gate), adding AI is a two-line binding in wrangler.toml:

[ai]
binding = "AI"

With that, your worker code gets an env.AI variable it can call directly. One endpoint and one call to env.AI.run("@cf/meta/llama-3.3-70b-instruct-fp8-fast", ...) returns an answer. There is no key to store, which means no key to leak, nothing to rotate, no secret manager to set up. The page side is a plain fetch to an endpoint on your own domain.
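A minimal sketch of that one endpoint, assuming the binding above. The "/api/prd" path, the "idea" and "skeleton" field names, and the one-line system prompt are illustrative assumptions, not our production code. The only thing taken from the text is the env.AI.run() call and the model name.

```javascript
// Minimal Worker sketch. Assumes the [ai] binding exposes env.AI.
// Hypothetical: the "/api/prd" path, the field names, and SYSTEM_PROMPT.
const MODEL = "@cf/meta/llama-3.3-70b-instruct-fp8-fast";
const SYSTEM_PROMPT = "Turn the idea into an eight-section PRD skeleton.";

const worker = {
  async fetch(request, env) {
    const url = new URL(request.url);
    if (request.method !== "POST" || url.pathname !== "/api/prd") {
      return new Response("Not found", { status: 404 });
    }
    const { idea } = await request.json();

    // No API key anywhere: access comes from the account-level binding.
    const result = await env.AI.run(MODEL, {
      messages: [
        { role: "system", content: SYSTEM_PROMPT },
        { role: "user", content: String(idea) },
      ],
    });

    // Text-generation models return the answer text in `response`.
    return new Response(JSON.stringify({ skeleton: result.response }), {
      headers: { "Content-Type": "application/json" },
    });
  },
};

// In a real Worker module this object would be the default export:
// export default worker;
```

This sketch deliberately leaves out input validation, the gate, and the rate limits, so it should never be deployed as-is on a public page.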

The rest of the work is not AI work at all; it is the same old web work: validate input, cap usage, and write a system prompt that answers the way you want. We distilled ours from the eight-section PRD template used in real consulting engagements, with one rule we would urge anyone to include: never invent details the user did not give; where information is missing, say what is missing instead of guessing. Otherwise the model will happily fill in things nobody said. We wrote about that failure mode in Why AI agents lie.
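The "same old web work" can be sketched roughly like this. The length limits and the prompt wording below are hypothetical stand-ins (our real prompt is longer and comes from the consulting template); only the no-invention rule is taken directly from the text.

```javascript
// Hypothetical limits; tune them for your own page.
const MIN_IDEA_CHARS = 20;
const MAX_IDEA_CHARS = 2000;

// Hypothetical prompt wording. The last two lines carry the rule from the text.
const SYSTEM_PROMPT = [
  "You draft PRD skeletons with eight sections.",
  "Never invent details the user did not give.",
  "Where information is missing, say what is missing instead of guessing.",
].join("\n");

// Returns { ok: true, messages } or { ok: false, status, error }.
function buildMessages(body) {
  const idea = typeof body?.idea === "string" ? body.idea.trim() : "";
  if (idea.length < MIN_IDEA_CHARS) {
    return { ok: false, status: 400, error: "Idea is too short." };
  }
  if (idea.length > MAX_IDEA_CHARS) {
    return { ok: false, status: 413, error: "Idea is too long." };
  }
  return {
    ok: true,
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      { role: "user", content: idea },
    ],
  };
}
```

Rejecting oversized input before the model call matters for cost as well as safety: every input token is billed, even if the answer is useless.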

Part 3 · Measured numbers: neurons, speed, and the bill

Workers AI bills in a unit Cloudflare calls the neuron. Every account gets 10,000 free neurons a day; beyond that it is $0.011 per 1,000 neurons. What the docs cannot tell you is how many neurons one of YOUR answers actually costs. So we pulled today's real usage from Cloudflare's own analytics.

| Numbers from 5 real calls (Jul 4, 2026) | Measured value |
|---|---|
| Total neurons | 621.5 (average ~124 per call) |
| Input tokens (system prompt + idea) | 2,202 total (~440 per call) |
| Output tokens (the PRD skeleton returned) | 2,748 total (~550 per call) |
| Average inference time | 9.8 seconds per answer |
| End-to-end at the page (measured with curl) | ~15 seconds |
| Free tier of 10,000 neurons/day covers | ~80 answers a day |
| Price per answer beyond the free tier | ~$0.0014 |
| Today's bill | $0 |

These numbers cross-check, too. The pricing page lists Llama 3.3 70B fast at 26,668 neurons per million input tokens and 204,805 per million output tokens. Multiply back: 2,202 input tokens gives 58.7, plus 2,748 output tokens gives 562.8, total 621.5, exactly matching what the dashboard reports.
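The cross-check is easy to reproduce. This small sketch uses only the two rates and the two token totals quoted above; the function name is ours.

```javascript
// Rates quoted above: neurons per million tokens for Llama 3.3 70B fast.
const INPUT_NEURONS_PER_M = 26668;
const OUTPUT_NEURONS_PER_M = 204805;

// Convert token counts into neurons, split by input and output.
function neuronsFor(inputTokens, outputTokens) {
  const input = (inputTokens * INPUT_NEURONS_PER_M) / 1e6;
  const output = (outputTokens * OUTPUT_NEURONS_PER_M) / 1e6;
  return { input, output, total: input + output };
}

const measured = neuronsFor(2202, 2748);
console.log(measured.input.toFixed(1));  // 58.7
console.log(measured.output.toFixed(1)); // 562.8
console.log(measured.total.toFixed(1));  // 621.5
```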

There is a lesson hiding in that pair of numbers: almost all the cost sits in output tokens (562.8 of 621.5), because the output rate is nearly eight times the input rate. If you want to control cost, cap answer length with max_tokens before you bother squeezing the prompt.
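To see what a max_tokens cap buys, here is a rough worst-case calculation. The 700-token cap is a hypothetical value chosen for illustration, not our setting; the ~440 input tokens, the rates, and the prices are the ones quoted above.

```javascript
const INPUT_NEURONS_PER_M = 26668;
const OUTPUT_NEURONS_PER_M = 204805;
const FREE_NEURONS_PER_DAY = 10000;
const USD_PER_1000_NEURONS = 0.011;

// Worst-case neurons for one answer if the model uses every allowed output token.
function neuronCeiling(inputTokens, maxTokens) {
  return (inputTokens * INPUT_NEURONS_PER_M + maxTokens * OUTPUT_NEURONS_PER_M) / 1e6;
}

const rateRatio = OUTPUT_NEURONS_PER_M / INPUT_NEURONS_PER_M;
const ceiling = neuronCeiling(440, 700); // 700 is a hypothetical max_tokens
const usdCeiling = (ceiling / 1000) * USD_PER_1000_NEURONS;
const worstCaseFreeAnswers = Math.floor(FREE_NEURONS_PER_DAY / ceiling);

console.log(rateRatio.toFixed(1));  // 7.7
console.log(ceiling.toFixed(1));    // 155.1
console.log(usdCeiling.toFixed(4)); // 0.0017
console.log(worstCaseFreeAnswers);  // 64
```

With a cap in place, the worst case per answer is a known number instead of whatever the model decides to write.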

As for speed: nearly ten seconds for a ~550-token answer means this suits long answers people are willing to wait for, like turning an idea into a document outline. It does not suit short snappy chat where people expect a reply in a second or two. Set expectations right on the page; ours says plainly to expect 10-20 seconds.

What about other free options, like OpenRouter?

The question we ran into ourselves right after shipping: why not use OpenRouter's free models instead? The answer sits in three numbers (checked against OpenRouter's docs, Jul 4, 2026):

Part 4 · What you need before putting AI on a public page

An endpoint that can call a model is free only up to 10,000 neurons a day. Leave it open to unlimited calls and a single script can burn the whole day's quota in minutes, then start climbing into paid territory. Before shipping, we set up three layers:

  1. An email gate before use. The AI box unlocks after the reader takes the quiz and leaves an email. The server verifies a signed cookie; it does not just hide the button with JavaScript, because an endpoint can always be hit directly. No cookie, 401. (We already use this gate pattern across the site.)
  2. A per-IP daily cap. Ours is 5 calls: enough to genuinely try it, not enough to poke at it all day.
  3. A global daily cap for the whole system. Ours is 60 calls, comfortably under the ~80 the free tier covers. This layer is the guarantee that even under fire from a thousand IPs, the bill stays zero.
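Layers 2 and 3 can be sketched as one function. This is an assumption-laden outline, not our production code: the KV binding name LIMITS, the key format, and the function names are hypothetical. The caps (5 per IP, 60 global) are the ones from the list above.

```javascript
// Assumes a KV namespace bound as env.LIMITS (hypothetical binding name).
const PER_IP_DAILY_CAP = 5;
const GLOBAL_DAILY_CAP = 60;
const ONE_DAY_SECONDS = 86400;

// Read a counter; KV returns null for a missing key, which counts as zero.
async function readCount(kv, key) {
  const raw = await kv.get(key);
  return raw === null ? 0 : parseInt(raw, 10);
}

// Returns { allowed: true } or { allowed: false, status: 429, reason }.
// `today` is a date string such as "2026-07-04", so keys roll over daily.
async function checkAndCount(env, ip, today) {
  const ipKey = `ip:${today}:${ip}`;
  const globalKey = `global:${today}`;
  const [ipCount, globalCount] = await Promise.all([
    readCount(env.LIMITS, ipKey),
    readCount(env.LIMITS, globalKey),
  ]);

  if (globalCount >= GLOBAL_DAILY_CAP) {
    return { allowed: false, status: 429, reason: "global" };
  }
  if (ipCount >= PER_IP_DAILY_CAP) {
    return { allowed: false, status: 429, reason: "ip" };
  }

  // Read-then-write is not atomic, so this is a soft cap, not an exact one.
  const options = { expirationTtl: ONE_DAY_SECONDS };
  await Promise.all([
    env.LIMITS.put(ipKey, String(ipCount + 1), options),
    env.LIMITS.put(globalKey, String(globalCount + 1), options),
  ]);
  return { allowed: true };
}
```

In a Worker, the caller's address is available as request.headers.get("CF-Connecting-IP"). The check should run after the cookie gate and before env.AI.run(), so a blocked call costs no neurons.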

The counters live in KV (Cloudflare's key-value store), which has one trait worth knowing: it is eventually consistent, so a value you read can lag reality by tens of seconds. That makes these counters a soft cap that can miscount a little under rapid fire.

We know because it caught us during latency testing: we deleted our own IP's counter and immediately fired again, the system still saw the stale number, and we got served our own 429. Which is actually good news twice over: it proves the limiter works in production, and it shows the looseness can err toward blocking early. A burst of simultaneous calls can err the other way and let a few extra through, since each one may read the same stale count. For budget protection that is tolerable, because the global cap sits well below the free quota anyway.
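How much slack does the soft cap have? A rough estimate using the measured average from Part 3. It assumes answers keep averaging about 124 neurons, which will not hold if your prompts or answers get longer.

```javascript
const FREE_NEURONS_PER_DAY = 10000;
const GLOBAL_DAILY_CAP = 60;
const AVG_NEURONS_PER_ANSWER = 621.5 / 5; // measured average, about 124.3

// Neurons used if the global cap is hit exactly.
const plannedNeurons = Math.round(GLOBAL_DAILY_CAP * AVG_NEURONS_PER_ANSWER);
// What is left of the free tier after that.
const headroomNeurons = FREE_NEURONS_PER_DAY - plannedNeurons;
// How many miscounted extra answers fit in the headroom.
const extraAnswers = Math.floor(headroomNeurons / AVG_NEURONS_PER_ANSWER);

console.log(plannedNeurons);  // 7458
console.log(headroomNeurons); // 2542
console.log(extraAnswers);    // 20
```

In other words, the counters could leak about twenty extra average-sized answers in a day before the first paid neuron.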

Part 5 · Do it yourself, step by step

If your site is already on Cloudflare (or you can move the DNS), the whole thing is:

  1. Declare the binding. Add [ai] + binding = "AI" to the wrangler.toml of the worker serving your site.
  2. Add one endpoint. Accept a JSON POST, validate input length, and pass it to env.AI.run() with a system prompt that defines the answer structure and forbids inventing details.
  3. Put the gate in front of the model. Check your gate (email, login, whatever fits), then count a per-IP cap and a global cap in KV. Keep the global cap below the free quota, always. If you need exact counting later, move to Durable Objects.
  4. On the page: one textarea, one button, one fetch. Show a clear "thinking" state, because the wait is around ten seconds.
  5. Measure before you talk about it. Fire five real calls, open analytics, see what one answer costs in neurons, then work out whether the free tier covers your traffic.
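Step 4 can be sketched as a single function on the page. The "/api/prd" endpoint, the "idea" and "skeleton" field names, and the status messages are hypothetical; the 401 and 429 codes match the gate and the caps described in Part 4.

```javascript
// Page-side sketch. onStatus is a callback that updates the visible
// "thinking" text; fetchImpl is injectable so the function can be tested.
async function requestSkeleton(idea, onStatus, fetchImpl = fetch) {
  onStatus("Thinking... this usually takes 10-20 seconds.");
  try {
    const res = await fetchImpl("/api/prd", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ idea }),
    });
    if (res.status === 401) {
      onStatus("Take the quiz and leave an email to unlock this.");
      return null;
    }
    if (res.status === 429) {
      onStatus("Daily limit reached. Please try again tomorrow.");
      return null;
    }
    if (!res.ok) {
      onStatus("Something went wrong. Please try again.");
      return null;
    }
    const { skeleton } = await res.json();
    onStatus("");
    return skeleton;
  } catch (err) {
    onStatus("Network error. Please try again.");
    return null;
  }
}

// Wiring on the page would look roughly like this (element ids are made up):
// button.addEventListener("click", async () => {
//   const text = await requestSkeleton(textarea.value, (s) => (status.textContent = s));
//   if (text) output.textContent = text;
// });
```

Setting the result with textContent rather than innerHTML keeps model output from being interpreted as markup.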

The full worker code, including the signed-cookie gate and the rate-limit counters, is being written up as a method page for blog members in an upcoming post. The concepts are all here though; you can follow the five steps without waiting.

The shortest possible summary

Static site + Cloudflare Worker + an env.AI binding = a site with AI, no server, no key. And if the global daily cap sits under the free quota, the bill is zero by proof, not by hope.

Want to see it live before building your own? Try the AI box at productize.life/services/prd-en: paste your idea and see what PRD skeleton comes back.
