There are serious limits to the shale-boom comparison. Notably, there were contractual reasons why companies had to drill when acquiring land rights: the land grab drove excess capacity, which drove down prices. There is limited demand elasticity for natural gas; the weather is the weather, and utilities contract for price anyway. Again, limited value in the comparison.
Brilliant. My book collection is Jevons in action.
Everyone’s trying to decide whether AI is the new oil or the new shale. It’s neither. It’s cognition priced like corn.
Efficiency industrializes it and lowers the marginal cost of pretending to think.
“Is the rate and pace of token-consumption growth slowing down a bit? Yeah, maybe:”
To which I say: ABSOLUTELY NOT.
The chart reflects Google deliberately engineering a slowdown in token consumption because real, unsuppressed demand was on a trajectory to 3-5 quadrillion tokens a month by year-end.
From mid-June to late July 2025, Google shipped 1M-2M token context windows to both paid Gemini Advanced users and the entire free tier, while keeping Gemini 1.5 Flash API pricing at $0.35/M tokens.
By late July, TPU v5/early v6 clusters were fully spoken for, new buildings in Texas and Oklahoma were still 12-24 months from full power, and long-context inference at $0.35/M tokens was running negative gross margins once electricity and depreciation were counted (on a cash basis it was probably fine).
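To make that margin claim concrete, here's a minimal back-of-envelope sketch. Every cost figure is an assumption invented for illustration (cash serving cost and depreciation per million tokens); only the $0.35/M list price comes from the comment above.

```python
# Hypothetical unit economics for long-context inference at $0.35/M tokens.
# All cost figures are assumptions for illustration; the point is the shape
# of the claim (cash-positive, fully-loaded-negative), not the numbers.
price_per_m = 0.35         # list price cited above, $ per million tokens

cash_cost_per_m = 0.25     # ASSUMED cash serving cost: electricity, network, ops
depreciation_per_m = 0.20  # ASSUMED TPU + data-center depreciation per M tokens

cash_margin = price_per_m - cash_cost_per_m
fully_loaded_margin = price_per_m - (cash_cost_per_m + depreciation_per_m)

print(f"cash gross margin:         ${cash_margin:+.2f} / M tokens")          # positive
print(f"fully loaded gross margin: ${fully_loaded_margin:+.2f} / M tokens")  # negative
```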
On August 14, Google rolled out a string of demand-destroying policies back to back: free-tier context slashed from 1M to 128k tokens, API prices hiked 60-100% for prompts over 128k tokens, Workspace/Gmail side-panel AI context capped at 200-300k tokens, and hard per-conversation limits enforced, followed by strict rate-limit cuts in October.
Daily compound token growth halved, from +0.78% (4/30 to 7/31) to +0.4% (7/31 to 10/9), because Google forcibly rationed tokens; they didn't have the silicon or power to keep servicing such explosive demand.
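For anyone who wants to sanity-check those rates and the year-end trajectory, here's a minimal sketch. The monthly totals are placeholders chosen only to reproduce the cited daily rates, not figures read off the chart, and the projection simply assumes the pre-throttle rate continues unchanged.

```python
from datetime import date

def implied_daily_rate(start_total, end_total, start_day, end_day):
    """Daily compound growth rate implied by two monthly token totals."""
    days = (end_day - start_day).days
    return (end_total / start_total) ** (1 / days) - 1

def project(total, daily_rate, start_day, end_day):
    """Compound a monthly token total forward at a constant daily rate."""
    days = (end_day - start_day).days
    return total * (1 + daily_rate) ** days

# Placeholder monthly totals (quadrillions of tokens per month), chosen only
# to reproduce the ~+0.78%/day and ~+0.4%/day rates cited above -- NOT the
# actual figures from the chart.
apr_total, jul_total, oct_total = 0.49, 1.0, 1.32

pre  = implied_daily_rate(apr_total, jul_total, date(2025, 4, 30), date(2025, 7, 31))
post = implied_daily_rate(jul_total, oct_total, date(2025, 7, 31), date(2025, 10, 9))
print(f"pre-throttle:  {pre:+.2%} per day")   # ~ +0.78%
print(f"post-throttle: {post:+.2%} per day")  # ~ +0.40%

# If the pre-throttle rate had simply continued through year-end:
unsuppressed = project(jul_total, pre, date(2025, 7, 31), date(2025, 12, 31))
print(f"unsuppressed year-end run rate: ~{unsuppressed:.1f} quadrillion tokens/month")
```

Under those placeholder assumptions the unsuppressed run rate lands around ~3.3 quadrillion tokens a month at year-end, i.e. the low end of the 3-5 quadrillion range above.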
The underlying hunger for million-token reasoning prompts is still there. It’s just been priced and quota-limited out of existence for now. When the latest TPUs and the 2026-27 data-center expansions/new builds in TX and OK come online, the line will hockey-stick again. Probably even before then, if we get a bunch of efficiencies/optimizations in the meantime.
I want my goddamn 2M context window, Google!
Thank you for coming to my TED talk
Your piece captures the Jevons dynamic beautifully—how efficiency fuels demand, not restraint. It’s a reminder that technology doesn’t just scale output; it rewrites appetites.
If I could add one small angle: Jevons isn’t fate. We can control our destiny—that’s the role of democratic governance. The Industrial Revolution made energy production more efficient and cheaper, and consumption exploded—classic Jevons. By 1900, London’s coal-fueled “pea-soupers” were already a chronic public-health disaster, with thousands dying each year from respiratory disease. Efficiency wasn’t the problem; scaling the wrong architecture was.
We didn’t stop burning—we learned to measure, regulate, and optimize it. Efficiency became disciplined rather than explosive. AI now needs that same evolution. Token throughput is the new BTU. We’ll get more power from it when we treat meaning, data, and inference as measurable commodities—auditable, standardized, reusable—rather than endless fuel to burn.
Thanks for grounding the debate in real numbers.
This is super interesting. Why are a16z graphs and images so blurry? Seems like they were extremely compressed