The Leaderboard That Backfired
Amazon wanted proof its workforce was embracing AI. What it got instead was a masterclass in Goodhart's Law — the moment a measure becomes a target, it ceases to be a good measure.
According to reporting from the Financial Times, Amazon employees have been deliberately automating pointless tasks using the company's internal agentic tool, "MeshClaw," not because those tasks needed doing, but because the activity inflates their token consumption scores. Token usage, the raw count of the units of text that AI models read and generate, has become a de facto performance signal at Amazon, tracked on internal leaderboards visible to management.
Amazon says the numbers won't factor into performance reviews. Its employees don't believe that. And honestly, why would they?
What MeshClaw Actually Does (and What It's Being Used For)
MeshClaw is Amazon's in-house agent platform, rolled out broadly in recent weeks. On paper, it's a serious productivity tool — employees can build AI agents that handle code deployments, triage email, interact with Slack, and generally act as software-layer assistants for repetitive work.
That's the pitch. The reality, according to multiple Amazon insiders, is that some colleagues are using MeshClaw to fire off unnecessary AI activity in a loop — consuming tokens for the sake of consuming tokens. It's the enterprise equivalent of leaving a YouTube video running to inflate view counts.
One employee told the FT directly: "Some people are just using MeshClaw to maximise their token usage." Another flagged the structural problem even more bluntly: tracking creates competitive pressure, and competitive pressure creates perverse incentives.
This matters because Amazon has set a hard adoption target — over 80 percent of developers should be using AI tools weekly. That target, combined with visible leaderboards, created exactly the conditions where gaming becomes rational behavior for any career-conscious employee.

A $200 Billion Bet That Needs a Better Scorecard
Here's the uncomfortable context: Amazon is expected to spend roughly $200 billion in capital expenditure this year, the overwhelming majority directed at AI infrastructure and data centers. That is a number that demands justification — to investors, to boards, to the market.
When you're betting capital on that scale on AI transformation, you need evidence the workforce is actually using the tools. Token consumption is easy to capture and easy to display on a leaderboard, but almost completely useless as a signal of genuine productivity improvement. It measures activity, not output. It's the AI-era equivalent of tracking lines of code written, a metric the industry largely discredited decades ago precisely because it rewards bloat over quality.
The deeper issue is that Amazon isn't alone. The Decoder notes that Meta employees have engaged in similar "tokenmaxxing," the practice of running up token counts for their own sake. This is becoming an industry pattern, not a single company's management failure. Wherever enterprises tie numerical AI adoption targets to employee evaluations, formally or informally, the same dynamic will emerge.
The Measurement Problem Nobody Wants to Admit
Measuring AI-driven productivity is genuinely hard. Does a developer who uses AI to write 30 percent more code count as more productive if half that code needs rework? Does an analyst who prompts an AI agent to summarize 50 reports actually extract useful signal, or just process words? These questions don't resolve cleanly into a number.
Token consumption sidesteps all of that complexity — which is exactly why it's appealing to leadership and exactly why it fails as a metric. It turns a question about value creation into a question about usage volume. Employees, being rational actors, optimize for the number they're being watched on.
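To make the gap between usage volume and value creation concrete, here is a minimal Python sketch. Everything in it is a hypothetical illustration: the employee labels, the token figures, and the "tasks_accepted" outcome measure are invented for the example and do not come from Amazon, MeshClaw, or the FT's reporting. It only shows how the same three people can rank in opposite order depending on which number the leaderboard sorts by.

```python
from dataclasses import dataclass


@dataclass
class EmployeeWeek:
    name: str            # hypothetical label, not a real person
    tokens: int          # tokens consumed that week (invented figure)
    tasks_accepted: int  # reviewed work items that shipped (assumed outcome measure)


def rank_by_tokens(rows):
    """Order names the way a token leaderboard would: most tokens first."""
    return [r.name for r in sorted(rows, key=lambda r: r.tokens, reverse=True)]


def rank_by_outcomes(rows):
    """Order names by the assumed outcome measure: most accepted tasks first."""
    return [r.name for r in sorted(rows, key=lambda r: r.tasks_accepted, reverse=True)]


# Invented sample data: one person runs an agent in a loop, two use it normally.
rows = [
    EmployeeWeek("looping_agent_user", tokens=9_000_000, tasks_accepted=2),
    EmployeeWeek("steady_user", tokens=1_200_000, tasks_accepted=11),
    EmployeeWeek("light_user", tokens=300_000, tasks_accepted=7),
]

print("By tokens:  ", rank_by_tokens(rows))
print("By outcomes:", rank_by_outcomes(rows))
```

In this made-up data, the person running an agent in a loop tops the token ranking and comes last on accepted work. A real outcome measure would be far harder to define than a single counter, which is the point of the paragraph above.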
The irony is that MeshClaw, used legitimately, could be a genuinely powerful tool for Amazon's engineering culture. Agentic AI that handles real operational overhead — deployment pipelines, triage workflows, cross-system coordination — is exactly where enterprise AI starts generating measurable return. That value gets buried when the tool's reputation becomes synonymous with gaming scores.
What This Means
This story is a leading indicator for every enterprise AI rollout happening right now, not a quirky Amazon anecdote.
- For developers: If your company starts tracking AI usage metrics and making them visible to management, expect tokenmaxxing to follow. Push back early on what the metrics actually measure — and what they incentivize.
- For founders building enterprise AI tools: Usage volume is a vanity metric. The companies that will win long-term vendor relationships are those that help customers measure genuine outcomes — time saved on specific workflows, reduction in defect rates, revenue impact — not raw consumption numbers.
- For engineering and product leaders: Amazon's leaderboard approach is a cautionary tale, but the instinct behind it is understandable. The fix isn't to stop measuring; it's to measure things that can't be easily gamed. Qualitative peer surveys, output quality reviews, and time-to-completion benchmarks on specific task types are harder to fake than token counts.
- For the broader AI industry: The tokenmaxxing phenomenon puts pressure on every company positioning AI adoption metrics as proof of ROI. If the numbers are inflated by artificial activity, the productivity narrative that's currently justifying hundreds of billions in enterprise AI infrastructure spending starts to look shaky. Investors and analysts should be asking much harder questions about what "AI adoption" actually means in the data they're seeing.
The real question Amazon — and every large organization racing to demonstrate AI returns — needs to answer isn't "how many tokens did our employees consume?" It's whether any of that consumption made the work better. Right now, the leaderboard can't tell you that. And until it can, the gap between AI investment and AI value will keep getting papered over with automated busywork.