A $7 Million Problem: Why Every AI Request Needs an Intelligent Router

Written by Starseer | May 3, 2026 11:16:55 PM

In April 2026, Uber's CTO Praveen Neppalli Naga said the quiet part out loud: his company had already burned through its entire annual AI budget, with three quarters of the year still ahead. He wasn't describing a failure of technology. He was describing a failure of governance. And Uber is far from alone.

 

The Runaway Train Nobody Budgeted For

Uber rolled out Anthropic's Claude Code to its 5,000-person engineering team in December 2025. By February, adoption had jumped from 32% to 63%. By March, 84% of engineers were classified as agentic coding users. The tool worked exactly as promised, and that was precisely the problem. Every token processed drew from a consumption-based billing model, and at that scale, the compounding was devastating. Individual engineers reported monthly API costs between $500 and $2,000, and Uber's R&D budget of $3.4 billion couldn't absorb the acceleration.

But this isn't an Uber story. It's an industry story.

- 60.2T: tokens consumed at Meta in a single 30-day period
- $113K: monthly AI spend by a 4-person startup (Swan AI)
- 99%: of companies reporting AI financial losses (EY)

Meta employees consumed over 60 trillion tokens in a single month, competing on an internal “Claudeonomics” leaderboard for titles like “Token Legend.” That leaderboard was dismantled within 48 hours of being leaked, but the damage, potentially hundreds of millions of dollars in wasted compute, was already done. At Visa, token consumption doubled month-over-month, reaching 1.9 trillion tokens in March alone. At Microsoft, engineers admitted to deliberately inflating their AI usage metrics, running queries against documentation they could have read directly, and prototyping features they never intended to build.

The phenomenon has been dubbed “tokenmaxxing,” and it's spreading across the industry. Companies like Meta, Microsoft, and Salesforce have made AI usage a visible performance metric, and employees have responded by gaming the numbers. Some run multiple parallel agents overnight. Others submit deliberately verbose prompts or ask AI to re-derive information that already exists in internal documentation. The result is a perverse incentive loop: the more tokens burned, the better the performance signal, regardless of whether any of that compute produced meaningful output. JPMorgan is now using internal dashboards to track usage of tools like GitHub Copilot and Anthropic models across engineering groups. Nvidia's Jensen Huang has suggested that engineers earning $500,000 should consume at least $250,000 in tokens annually. The industry is normalizing massive AI expenditure without any standard mechanism to evaluate whether that spending is productive.

Meanwhile, a Goldman Sachs survey found that large enterprises are overrunning their AI budgets by orders of magnitude. An Ernst & Young study reported that 99% of surveyed companies experienced financial losses from AI, averaging $4.4 million per organization. The FinOps Foundation's 2026 report identifies AI as the fastest-growing enterprise spend category, with average annual AI budgets ballooning from $1.2 million in 2024 to $7 million in 2026.

“The biggest budgeting risk in 2026 isn't overspending; it's spending invisibly, without monitoring or attribution.”

Zylo, 2026 SaaS Management Report

Why Traditional Controls Don't Work

The instinct is to reach for familiar levers: usage caps, seat-based licensing, approval workflows. But AI token consumption doesn't behave like traditional SaaS. It's variable, unpredictable, and tightly coupled to the value of the work being done. A developer doing a complex multi-step refactor may legitimately burn 50x the tokens of a colleague writing a simple utility function. Cap them both equally and you throttle your most productive engineers while leaving inefficiency unchecked.

This is the cloud compute sprawl problem all over again, but with a critical difference: cloud resources were at least provisioned with intent. Someone requested a VM or a cluster for a specific workload. Token consumption, by contrast, is granular and continuous. It happens inside every conversation, every code review, every agent loop. There is no provisioning step, no approval gate, no natural moment where cost becomes visible before it's incurred. By the time a monthly invoice arrives, the spending is weeks old and the patterns are already entrenched.

The fundamental problem is that organizations are treating every AI request as equal. A simple lookup question gets routed to the same premium model as a complex architectural analysis. A prompt that could be handled by an internally hosted model hits an expensive external API. A malicious or off-policy prompt consumes the same resources as a legitimate business request. Without an intelligent layer that understands what is being asked, why it's being asked, and where it should go, every token is a coin toss between value and waste.

Deloitte has flagged AI as the fastest-growing IT expense, consuming up to half the IT budget at some firms. And as Goldman Sachs noted in its latest research, the absence of an orchestration layer that routes simple requests to cheaper models while reserving advanced LLMs for complex work is a primary driver of enterprise AI budget overruns.

The Case for an Intelligent Router

What organizations need isn't less AI. It's smarter AI governance: an intelligent intermediary that sits between every user request and every model endpoint, making real-time decisions about classification, security, and routing. This is the architecture that Starseer Intelligent Router was built to provide.

[Diagram: Intelligent Router Architecture. A user request flows into the Starseer Intelligent Router, which classifies, scans, and routes it under AI-EDR monitoring before dispatching it to a configurable set of model endpoints such as Claude (Anthropic), GPT-4o (OpenAI), Llama (Meta), and Granite (IBM).]

Pillar 1: Intent-Based Auto-Classification

Intelligent Router classifies every request by its actual intent, regardless of how the prompt is worded. A simple data lookup phrased as an elaborate query is still a simple data lookup. A knowledge retrieval task dressed up in complex language still gets identified for what it is. This means organizations stop paying premium-model prices for commodity-level work, and engineers get the right model for the right task without having to think about it.
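
As a rough sketch of where this control point sits, the snippet below labels each prompt by intent before any model is invoked. Starseer hasn't published its classifier internals, so the intent labels, tiers, and keyword heuristic here are illustrative assumptions; a production classifier would more likely be a small, inexpensive model than pattern matching.

```python
# Minimal sketch of intent-based classification (illustrative, not
# Starseer's actual implementation): a cheap classification step labels
# each request before any expensive model is invoked.
from dataclasses import dataclass

INTENT_TIERS = {
    "simple_lookup": "commodity",          # fact or documentation retrieval
    "knowledge_retrieval": "commodity",
    "code_generation": "standard",
    "architectural_analysis": "premium",   # complex, multi-step reasoning
}

@dataclass
class ClassifiedRequest:
    prompt: str
    intent: str
    tier: str

def classify(prompt: str) -> ClassifiedRequest:
    """Label a prompt by what it actually asks for, not how it is worded."""
    text = prompt.lower()
    if any(k in text for k in ("what is", "look up", "definition of")):
        intent = "simple_lookup"
    elif any(k in text for k in ("refactor", "architecture", "trade-off", "design a")):
        intent = "architectural_analysis"
    else:
        intent = "code_generation"
    return ClassifiedRequest(prompt, intent, INTENT_TIERS[intent])
```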

Pillar 2: Malicious & Out-of-Policy Prompt Scanning

When employees across an organization use AI tools without centralized oversight, the risks go beyond cost. Shadow AI introduces data leakage, compliance violations, and security vulnerabilities. The Intelligent Router scans every prompt against organizational policy in real time, catching prompt injection attempts, sensitive data exposure, and usage patterns that fall outside acceptable boundaries, before they ever reach a model.
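
Here is a minimal illustration of what a pre-model policy scan can look like. The injection phrases and sensitive-data patterns are placeholder examples rather than Starseer's actual rule set, which would be driven by organizational policy and far more robust detection.

```python
# Illustrative pre-model policy scan. The injection phrases and
# sensitive-data patterns below are placeholder examples, not a real rule set.
import re

INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.I),
    re.compile(r"disregard your system prompt", re.I),
]

SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

def scan(prompt: str) -> list[str]:
    """Return policy violations found in a prompt; an empty list means it may proceed."""
    violations = []
    if any(p.search(prompt) for p in INJECTION_PATTERNS):
        violations.append("possible prompt injection")
    for label, pattern in SENSITIVE_PATTERNS.items():
        if pattern.search(prompt):
            violations.append(f"sensitive data: {label}")
    return violations
```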

Pillar 3: Intelligent Model & Service Routing

This is where the cost calculus transforms. The Intelligent Router automatically routes each request to the optimal model or service based on both performance requirements and cost efficiency. Simple queries go to lighter, cheaper models. Routine tasks get directed to internally hosted models at a fraction of external API costs. Only genuinely complex, high-value requests reach premium endpoints. For organizations running thousands of engineers, this single capability can translate to millions of dollars in annual savings.
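
The routing decision itself can be sketched as a tier-to-endpoint table. The endpoint names and per-token prices below are invented placeholders chosen to show the cost asymmetry, not quoted rates or supported models.

```python
# Sketch of cost-aware routing: each classified tier maps to the cheapest
# endpoint judged capable of that class of work. Endpoint names and per-token
# prices are invented placeholders.
ROUTES = {
    #  tier:      (endpoint,               $ per 1M output tokens, internally hosted?)
    "commodity": ("internal-small-model",   0.20, True),
    "standard":  ("mid-tier-hosted-model",  3.00, False),
    "premium":   ("frontier-external-api", 15.00, False),
}

def route(tier: str) -> tuple[str, float, bool]:
    """Pick the endpoint for a classified request; unknown tiers fail safe to premium."""
    return ROUTES.get(tier, ROUTES["premium"])

if __name__ == "__main__":
    endpoint, price_per_m, internal = route("commodity")
    print(f"commodity tier -> {endpoint} (${price_per_m}/1M tokens, internal={internal})")
```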

The Math That Changes Everything

Consider the numbers that are already public. Uber's 5,000 engineers, averaging $500 to $2,000 per month in API costs, represent a potential annual AI spend of $30 million to $120 million, on coding tools alone. If intelligent routing redirected even 40% of those requests to internally hosted models or lighter-weight alternatives, the savings would be measured in tens of millions per year. And that calculation only covers one tool at one company. Across the industry, the FinOps Foundation reports that AI inference now represents 85% of the enterprise AI budget, up from a negligible share just two years ago.
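
The back-of-envelope arithmetic is easy to verify. In the sketch below, only the engineer count, the per-engineer monthly spend, and the 40% redirection share come from the figures above; the assumption that redirected requests cost roughly a tenth as much on internal models is purely illustrative.

```python
# Back-of-envelope check of the savings estimate quoted above. The engineer
# count, per-engineer spend, and 40% redirection share come from the article;
# the ~10% relative cost of redirected requests is an illustrative assumption.
ENGINEERS = 5_000
MONTHLY_SPEND_PER_ENGINEER = (500, 2_000)   # $ per month, low and high bounds
REDIRECTED_SHARE = 0.40
INTERNAL_COST_FACTOR = 0.10                 # assumed relative cost after rerouting

for per_engineer in MONTHLY_SPEND_PER_ENGINEER:
    annual_spend = ENGINEERS * per_engineer * 12
    estimated_savings = annual_spend * REDIRECTED_SHARE * (1 - INTERNAL_COST_FACTOR)
    print(f"${per_engineer}/engineer/month: annual spend ${annual_spend:,}; "
          f"estimated savings ${estimated_savings:,.0f}")
```

Under those assumptions, the savings land between roughly $11 million and $43 million a year, squarely in the "tens of millions" range cited above.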

The same arithmetic applies across every organization in this position. Jason Calacanis, a prominent Silicon Valley investor, publicly described agent costs of $300 per day for work that delivered a fraction of a single employee's output. Swan AI's four-person team spent $217,000 in four months. GitHub is shifting to usage-based pricing in mid-2026 precisely because the subscription model collapsed when users began burning $11 in compute on a single premium request. The era of flat-rate AI is ending, and organizations without cost intelligence will be left exposed.

The pattern is consistent: organizations are spending aggressively on AI with no intelligent layer governing allocation. The ones that solve this problem first will operate at a structural cost advantage over every competitor that doesn't.

From Cost Center to Strategic Advantage

The enterprises that will win the AI era aren't the ones that spend the most on tokens. They're the ones that extract the most value per token. That requires moving beyond blunt instruments like usage caps and leaderboards toward a system that understands intent, enforces policy, and optimizes routing at the request level.

This is the architectural shift that Starseer Intelligent Router enables. Not by restricting AI adoption, but by making every AI interaction smarter, safer, and more cost-effective. Classify the intent. Scan for risk. Route to the right model. That's how organizations turn a $7 million problem into a strategic advantage.

Stop Burning Tokens. Start Routing Intelligently.

See how Starseer Intelligent Router gives your engineering and AI teams the classification, security, and routing layer they need to control AI costs without limiting innovation.

Request a Demo →