Insights | May 4, 2026

Token Prices Are Falling. Your AI Bill Is Rising. Both Are True.

Per-token costs are collapsing, but enterprise AI bills keep rising. This piece explores why token deflation does not translate to cost deflation—and the hidden mechanics driving AI spending upward.

Neural Research // Field Entry 10M

“Tokens are the new commodity,” said Nvidia CEO Jensen Huang, clad in his iconic leather jacket, at the company’s flagship GTC conference in San Jose.

While Nvidia is shaping the rules of a new token economy, a parallel debate is emerging globally around “token exports.” AI-generated intelligence is becoming a tradable good measured in tokens—and nations are positioning themselves across the stack: energy, compute, models, and output.

What’s Actually True

The per-token narrative is both true and completely misleading.

Enterprise data shows the average cost per million tokens has dropped from roughly $10 to $2.50 in a year. Since GPT-3, prices have fallen nearly 600-fold. Models like Gemini 2.0 Flash now deliver, for as little as $0.10 per million tokens, capabilities that cost $60 per million tokens in 2023.

Then the invoice arrives. And it’s higher than last quarter.

This is not theoretical. It is happening across enterprises. The reason: consumption patterns have changed more dramatically than prices have fallen.

Cheaper tokens didn’t just reduce cost—they unlocked entirely new usage. Instead of selective, high-value queries, organizations now route entire workflows through AI: emails, summaries, code review, documentation, customer support—running continuously.

Total token consumption has increased by factors that overwhelm price reductions.

The Core Mechanism: Token Deflation ≠ Cost Deflation

Cost per token ↓
Tokens per task ↑↑↑
Total cost ↑

True cost is not unit price—it is total consumption per outcome. Hidden reasoning tokens, multi-step inference, and context expansion make per-token comparisons misleading.
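
The arithmetic behind this can be sketched in a few lines. The example below reuses the price drop quoted earlier ($10 to $2.50 per million tokens) and the 1,000 to 50,000 tokens-per-task jump cited for agent-based systems. The task volume of 100,000 per month and the function name `monthly_cost` are illustrative assumptions, not measured data.

```python
def monthly_cost(price_per_million: float, tokens_per_task: int, tasks: int) -> float:
    """Total spend in dollars = unit price x tokens per task x number of tasks."""
    return price_per_million * tokens_per_task * tasks / 1_000_000

# Assumption: the same 100,000 tasks per month in both periods.
before = monthly_cost(price_per_million=10.00, tokens_per_task=1_000, tasks=100_000)
after = monthly_cost(price_per_million=2.50, tokens_per_task=50_000, tasks=100_000)

print(f"before: ${before:,.2f}")                   # unit price is 4x higher here
print(f"after:  ${after:,.2f}")                    # tokens per task are 50x higher here
print(f"bill multiplier: {after / before:.1f}x")
```

Under these assumptions the unit price falls by a factor of 4 while the bill rises by a factor of 12.5, because tokens per task grew faster than the price fell.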

A “simple” response that looks like a one-cent call may internally consume tens of thousands of reasoning tokens, so the real computation can cost many times what the interface suggests.

Meanwhile, inference has grown to over 60% of total compute across major providers. Capital expenditure is exploding: Alphabet alone is projected to spend $175–185 billion in 2026. These are not signs of a solved cost equation—they signal demand outpacing efficiency.

Token Explosion (Hidden Compute Growth)

  • Chain-of-thought reasoning
  • Agents and tool use
  • RAG pipelines and retrieval loops

A single request now:

  • calls multiple models
  • generates hidden reasoning tokens
  • loops across tools and agents

One request is no longer one request.


Jevons Paradox: Efficiency Drives Consumption

When something becomes cheaper → people use more of it

Applied to AI:

  • Cheaper tokens → more features
  • More features → more queries
  • More queries → higher total spend

This is already visible: massive cost reductions have led to multiplicative increases in usage, not savings.
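
One way to make the paradox concrete is to ask how much usage has to grow before a price cut stops saving money. The sketch below uses the $10 to $2.50 price drop quoted earlier. The usage multipliers (2x, 4x, 10x) and both function names are hypothetical and chosen only for illustration.

```python
def breakeven_usage_growth(old_price: float, new_price: float) -> float:
    """Usage multiplier at which total spend stays flat after a price cut."""
    return old_price / new_price

def spend_ratio(old_price: float, new_price: float, usage_growth: float) -> float:
    """New spend divided by old spend, given a usage multiplier."""
    return (new_price / old_price) * usage_growth

old_price, new_price = 10.00, 2.50   # dollars per million tokens
breakeven = breakeven_usage_growth(old_price, new_price)
print(f"break-even usage growth: {breakeven:.1f}x")

for growth in (2, 4, 10):            # hypothetical usage multipliers
    ratio = spend_ratio(old_price, new_price, growth)
    print(f"usage {growth}x -> spend {ratio:.2f}x of the old bill")
```

Any usage growth above the break-even multiplier turns a price cut into a higher bill.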

Capability-Driven Demand Inflation

Better models don’t reduce usage—they create new workloads:

  • Agents → autonomous loops
  • Copilots → continuous background execution
  • Enterprise AI → embedded everywhere

In some cases, AI usage bills now rival or exceed employee costs. Token consumption is scaling toward trillions per day.

Cost per Token vs Cost per Outcome

Most organizations think:

“Model is cheaper → costs go down”

Reality:

“Model is more capable → it does more work → cost per task increases”

Even a model with a lower per-token price can cost more per task, because reasoning-token usage and execution paths are unpredictable.

The Specific Mechanism

Jevons Paradox applied to compute: efficiency expands the use-case universe faster than cost declines.

Hidden token multiplier: reasoning, planning, and verification consume tokens invisible to users.

Multi-agent cascade: one task triggers chains of model calls, multiplying token usage.

A task that once used 1,000 tokens can now consume 50,000+ tokens in an agent-based system.

Agents Compound Token Usage

  • Planning
  • Verification
  • Retries
  • Tool calls

One query can become 10–100 model calls.
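
A rough token ledger for a single agent run shows how the phases listed above add up. Every number below (calls per phase, tokens per call) is an illustrative assumption, and `Step` and `summarize` are hypothetical names rather than part of any agent framework.

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    calls: int            # model calls made in this phase
    tokens_per_call: int  # assumed average input + output tokens per call

# Hypothetical agent run for one user query.
AGENT_RUN = [
    Step("planning", calls=2, tokens_per_call=1_500),
    Step("tool calls", calls=6, tokens_per_call=3_000),
    Step("verification", calls=3, tokens_per_call=4_000),
    Step("retries", calls=2, tokens_per_call=3_000),
]

def summarize(steps: list[Step]) -> tuple[int, int]:
    """Return (total model calls, total tokens) for one agent run."""
    total_calls = sum(s.calls for s in steps)
    total_tokens = sum(s.calls * s.tokens_per_call for s in steps)
    return total_calls, total_tokens

calls, tokens = summarize(AGENT_RUN)
print(f"1 query -> {calls} model calls, {tokens:,} tokens")
```

Logging a ledger like this per request is one way to see where the multiplier comes from before trying to reduce it.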

The Reasoning Premium

  • Advanced reasoning models cost significantly more
  • They also consume more tokens per response

Better does not mean cheaper.

Infrastructure Sets a Cost Floor

Even with falling token prices:

  • GPU costs
  • Energy consumption
  • Data center CAPEX

…prevent costs from approaching zero.

Pricing ≠ Reality

Real cost = price per token × tokens consumed, and tokens consumed depend on model behavior

And behavior is stochastic, variable, and difficult to predict.
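
The effect of that variability can be illustrated with a small simulation. The sketch below assumes hidden reasoning tokens follow a heavy-tailed lognormal distribution. That distribution, its parameters, the 500 visible tokens, and the function name are all assumptions made for illustration, not observed provider behavior.

```python
import random
import statistics

def simulate_request_costs(n: int, price_per_million: float,
                           visible_tokens: int, seed: int = 0) -> list[float]:
    """Simulate per-request cost in dollars when hidden token usage varies."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    costs = []
    for _ in range(n):
        # Assumption: hidden reasoning tokens are lognormally distributed.
        hidden_tokens = rng.lognormvariate(8.0, 1.0)
        total_tokens = visible_tokens + hidden_tokens
        costs.append(total_tokens * price_per_million / 1_000_000)
    return costs

costs = simulate_request_costs(n=10_000, price_per_million=2.50, visible_tokens=500)
median = statistics.median(costs)
p95 = statistics.quantiles(costs, n=20)[-1]  # last of 19 cut points = 95th percentile

print(f"median cost per request: ${median:.4f}")
print(f"95th percentile:         ${p95:.4f}")
print(f"tail / median:           {p95 / median:.1f}x")
```

The point is the gap between the median and the tail: a budget built on the typical request understates what the expensive requests cost.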

The Deeper Shift: AI as a Consumption Economy

Old software model:

  • Fixed subscription cost
  • Predictable usage

New AI model:

  • Variable cost (per token)
  • Unbounded usage

AI spending is starting to behave like:

  • Cloud compute
  • Electricity
  • API consumption at scale

What Needs to Exist

Total Cost of AI Ownership (TCOAI) frameworks must emerge.

These should include:

  • Visible tokens
  • Hidden reasoning tokens
  • Agent orchestration overhead
  • Human review costs
  • Error and failure costs
  • Infrastructure amortization
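
As a starting point, the components above could be tracked in a simple structure and rolled up into a cost per successful outcome. This is a minimal sketch, not an established framework: the class name `TcoaiMonthly`, its field names, and every dollar figure are hypothetical.

```python
from dataclasses import dataclass, fields

@dataclass
class TcoaiMonthly:
    """Monthly cost components in dollars; fields mirror the list above."""
    visible_tokens: float
    hidden_reasoning_tokens: float
    agent_orchestration: float
    human_review: float
    errors_and_failures: float
    infrastructure_amortization: float

    def total(self) -> float:
        """Sum every cost component."""
        return sum(getattr(self, f.name) for f in fields(self))

    def cost_per_outcome(self, successful_outcomes: int) -> float:
        """Total cost divided by the number of tasks that actually succeeded."""
        if successful_outcomes <= 0:
            raise ValueError("successful_outcomes must be positive")
        return self.total() / successful_outcomes

# Illustrative figures only.
example = TcoaiMonthly(
    visible_tokens=1_000.0,
    hidden_reasoning_tokens=4_000.0,
    agent_orchestration=1_500.0,
    human_review=6_000.0,
    errors_and_failures=1_500.0,
    infrastructure_amortization=2_000.0,
)

print(f"total: ${example.total():,.2f}")
print(f"cost per successful outcome: ${example.cost_per_outcome(8_000):.2f}")
print(f"visible-token share of total: {example.visible_tokens / example.total():.2%}")
```

Dividing by successful outcomes rather than by requests keeps failed and retried work inside the number being managed.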

The same discipline that created FinOps for cloud must now exist for AI.

The organizations that understand this early will have a structural advantage. The rest will keep discovering the same thing—AI is getting cheaper, and their bills are still rising.

Author: Neural Research Lab
Reading Time: 10 Minutes