“Tokens are the new commodity,” said Nvidia CEO Jensen Huang, clad in his iconic leather jacket, at the company’s flagship GTC conference in San Jose.
While Nvidia is shaping the rules of a new token economy, a parallel debate is emerging globally around “token exports.” AI-generated intelligence is becoming a tradable good measured in tokens—and nations are positioning themselves across the stack: energy, compute, models, and output.
What’s Actually True
The per-token narrative is both true and completely misleading.
Enterprise data shows the average cost per million tokens has dropped from roughly $10 to $2.50 in a year. The longer trend is steeper: capabilities that cost $60 per million tokens in 2023 are now available from models like Gemini 2.0 Flash for as little as $0.10 per million tokens, and since GPT-3 prices have fallen nearly 600-fold.
Then the invoice arrives. And it’s higher than last quarter.
This is not theoretical. It is happening across enterprises. The reason: consumption patterns have changed more dramatically than prices have fallen.
Cheaper tokens didn’t just reduce cost—they unlocked entirely new usage. Instead of selective, high-value queries, organizations now route entire workflows through AI: emails, summaries, code review, documentation, customer support—running continuously.
Total token consumption has increased by factors that overwhelm price reductions.
The Core Mechanism: Token Deflation ≠ Cost Deflation
Cost per token ↓
Tokens per task ↑↑↑
Total cost ↑
True cost is not unit price—it is total consumption per outcome. Hidden reasoning tokens, multi-step inference, and context expansion make per-token comparisons misleading.
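The arithmetic behind this can be sketched in a few lines of Python. The two prices ($10 and $2.50 per million tokens) are the enterprise averages cited earlier; the tokens-per-task values are illustrative assumptions, and `cost_per_task` is a hypothetical helper rather than any provider's billing formula.

```python
def cost_per_task(price_per_million_usd: float, tokens_per_task: int) -> float:
    """Cost of one task in USD, given a per-million-token price."""
    return price_per_million_usd * tokens_per_task / 1_000_000

# Before: higher unit price, one selective query (assumed 1,000 tokens).
before = cost_per_task(price_per_million_usd=10.00, tokens_per_task=1_000)

# After: tokens are 4x cheaper, but the task now runs through reasoning
# and agent steps (assumed 50,000 tokens).
after = cost_per_task(price_per_million_usd=2.50, tokens_per_task=50_000)

print(f"before: ${before:.4f} per task")
print(f"after:  ${after:.4f} per task")
print(f"change: {after / before:.1f}x")
```

Under these assumptions the unit price falls by a factor of four while the cost of completing one task rises by roughly 12.5 times, because token volume per task grew 50-fold.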
A “simple” response that looks like a $0.01 call may internally consume tens of thousands of reasoning tokens, so the computation hidden behind a one-cent interface can cost many times its visible price.
Meanwhile, inference has grown to over 60% of total compute across major providers. Capital expenditure is exploding: Alphabet alone is projected to spend $175–185 billion in 2026. These are not signs of a solved cost equation—they signal demand outpacing efficiency.
Token Explosion (Hidden Compute Growth)
- Chain-of-thought reasoning
- Agents and tool use
- RAG pipelines and retrieval loops
A single request now:
- calls multiple models
- generates hidden reasoning tokens
- loops across tools and agents
One request is no longer one request.
Jevons Paradox: Efficiency Drives Consumption
When something becomes cheaper, people use more of it.
Applied to AI:
- Cheaper tokens → more features
- More features → more queries
- More queries → higher total spend
This is already visible: massive cost reductions have led to multiplicative increases in usage, not savings.
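One way to make the paradox concrete is a constant-elasticity demand curve. This is a standard textbook simplification, not a measured model of AI demand, and every number below (the elasticities, the baseline usage, the two prices) is an assumption chosen for illustration. The point it shows: when price falls, total spend rises only if demand elasticity is greater than 1.

```python
def tokens_demanded(price: float, baseline_price: float,
                    baseline_tokens: float, elasticity: float) -> float:
    """Constant-elasticity demand: usage scales as (price ratio) ** -elasticity."""
    return baseline_tokens * (price / baseline_price) ** (-elasticity)

def total_spend(price: float, baseline_price: float,
                baseline_tokens: float, elasticity: float) -> float:
    """Spend in USD, with price quoted per million tokens."""
    tokens = tokens_demanded(price, baseline_price, baseline_tokens, elasticity)
    return price * tokens / 1_000_000

BASELINE_PRICE = 10.00            # USD per million tokens (assumed)
NEW_PRICE = 2.50                  # USD per million tokens, 4x cheaper (assumed)
BASELINE_TOKENS = 1_000_000_000   # assumed monthly usage at the old price

results = {}
for elasticity in (0.5, 1.0, 1.5):
    old = total_spend(BASELINE_PRICE, BASELINE_PRICE, BASELINE_TOKENS, elasticity)
    new = total_spend(NEW_PRICE, BASELINE_PRICE, BASELINE_TOKENS, elasticity)
    results[elasticity] = (old, new)
    print(f"elasticity {elasticity}: ${old:,.0f} -> ${new:,.0f}")
```

With an assumed elasticity of 1.5, a fourfold price cut leads to eight times the usage and double the bill. With an elasticity of 0.5, the same price cut would halve the bill, which is the outcome most budgets implicitly assume.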
Capability-Driven Demand Inflation
Better models don’t reduce usage—they create new workloads:
- Agents → autonomous loops
- Copilots → continuous background execution
- Enterprise AI → embedded everywhere
In some cases, AI usage bills now rival or exceed employee costs. Token consumption is scaling toward trillions per day.
Cost per Token vs Cost per Outcome
Most organizations think:
“Model is cheaper → costs go down”
Reality:
“Model is more capable → it does more work → cost per task increases”
Even cheaper models can cost more due to unpredictable reasoning token usage and execution paths.
The Specific Mechanism
- Jevons Paradox applied to compute: efficiency expands the use-case universe faster than cost declines.
- Hidden token multiplier: reasoning, planning, and verification consume tokens invisible to users.
- Multi-agent cascade: one task triggers chains of model calls, multiplying token usage.
A task that once used 1,000 tokens can now consume 50,000+ tokens in an agent-based system.
Agents Compound Token Usage
- Planning
- Verification
- Retries
- Tool calls
One query can become 10–100 model calls.
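A rough sketch of how the stages above multiply calls follows. Every number in it (steps, calls per step, retry rate, tokens per call) is a hypothetical placeholder, and `AgentRun` is an illustrative name, not part of any real agent framework.

```python
from dataclasses import dataclass

@dataclass
class AgentRun:
    """Hypothetical shape of one agent-handled query."""
    steps: int                  # plan steps the agent executes
    tool_calls_per_step: int    # model calls that drive tools, per step
    verify_calls_per_step: int  # model calls that check each step's result
    retry_rate: float           # fraction of step calls repeated after a failure
    tokens_per_call: int        # average tokens (prompt + output) per model call

    def model_calls(self) -> float:
        planning = 1  # one up-front planning call
        per_step = self.tool_calls_per_step + self.verify_calls_per_step
        step_calls = self.steps * per_step
        retries = step_calls * self.retry_rate
        return planning + step_calls + retries

    def total_tokens(self) -> float:
        return self.model_calls() * self.tokens_per_call

run = AgentRun(steps=8, tool_calls_per_step=2, verify_calls_per_step=1,
               retry_rate=0.25, tokens_per_call=1_500)
print(f"model calls: {run.model_calls():.0f}")
print(f"total tokens: {run.total_tokens():,.0f}")
```

With these assumed values a single user query expands to 31 model calls and 46,500 tokens, which sits inside the 10–100 call range described above.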
The Reasoning Premium
- Advanced reasoning models cost significantly more
- They also consume more tokens per response
Better does not mean cheaper.
Infrastructure Sets a Cost Floor
Even with falling token prices:
- GPU costs
- Energy consumption
- Data center CAPEX
…prevent costs from approaching zero.
Pricing ≠ Reality
Real cost = tokens × model behavior
And behavior is stochastic, variable, and difficult to predict.
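Because behavior varies from request to request, a budget is better expressed as a distribution than as a single number. The sketch below simulates that idea under purely assumed inputs: a log-normal spread of tokens per request and a flat price of $2.50 per million tokens. Neither the distribution nor its parameters come from real traffic.

```python
import random
import statistics

PRICE_PER_MILLION_USD = 2.50  # assumed flat price

def simulate_request_costs(n_requests: int, seed: int = 0) -> list:
    """Draw per-request costs (USD) from an assumed log-normal token distribution."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    costs = []
    for _ in range(n_requests):
        # Illustrative parameters: median near e**8 (about 3,000 tokens),
        # with a heavy right tail standing in for long reasoning runs.
        tokens = rng.lognormvariate(8.0, 1.2)
        costs.append(tokens * PRICE_PER_MILLION_USD / 1_000_000)
    return costs

costs = simulate_request_costs(10_000)
mean_cost = statistics.mean(costs)
median_cost = statistics.median(costs)
p95_cost = statistics.quantiles(costs, n=100)[94]  # 95th percentile cut point

print(f"median: ${median_cost:.5f}")
print(f"mean:   ${mean_cost:.5f}")
print(f"p95:    ${p95_cost:.5f}")
```

In a heavy-tailed distribution like this one, the mean sits well above the median and the 95th percentile sits well above both, so planning around the "typical" request understates the bill.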
The Deeper Shift: AI as a Consumption Economy
Old software model:
- Fixed subscription cost
- Predictable usage
New AI model:
- Variable cost (per token)
- Unbounded usage
AI is becoming closer to:
- Cloud compute
- Electricity
- API consumption at scale
What Needs to Exist
Total Cost of AI Ownership (TCOAI) frameworks must emerge.
These should include:
- Visible tokens
- Hidden reasoning tokens
- Agent orchestration overhead
- Human review costs
- Error and failure costs
- Infrastructure amortization
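As a starting point, the components listed above could be captured in a simple ledger. This is a hypothetical structure, not an established standard: the class name, field names, and sample monthly figures are all placeholders.

```python
from dataclasses import dataclass, fields

@dataclass
class MonthlyAICost:
    """Hypothetical TCOAI ledger; all amounts in USD per month."""
    visible_tokens: float
    hidden_reasoning_tokens: float
    agent_orchestration: float
    human_review: float
    errors_and_failures: float
    infrastructure_amortization: float

    def total(self) -> float:
        # Sum every component field declared on the dataclass.
        return sum(getattr(self, f.name) for f in fields(self))

    def cost_per_outcome(self, successful_outcomes: int) -> float:
        if successful_outcomes <= 0:
            raise ValueError("successful_outcomes must be positive")
        return self.total() / successful_outcomes

    def visible_share(self) -> float:
        """Fraction of total cost that shows up as visible token spend."""
        return self.visible_tokens / self.total()

# Placeholder figures for illustration only.
ledger = MonthlyAICost(
    visible_tokens=4_000,
    hidden_reasoning_tokens=6_000,
    agent_orchestration=3_000,
    human_review=5_000,
    errors_and_failures=1_000,
    infrastructure_amortization=1_000,
)
print(f"total: ${ledger.total():,.0f}")
print(f"cost per outcome: ${ledger.cost_per_outcome(10_000):.2f}")
print(f"visible token share: {ledger.visible_share():.0%}")
```

In this made-up ledger, visible token spend is only a fifth of the total, which is the gap a per-token price comparison would miss.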
The same discipline that created FinOps for cloud must now exist for AI.
The organizations that understand this early will have a structural advantage. The rest will keep discovering the same thing—AI is getting cheaper, and their bills are still rising.