Developers Can’t Work Without AI Anymore. That Might Be the Problem.

May 30, 2026

Developers have become so dependent on AI coding tools that a leading AI research lab could not run a controlled experiment that asked them to work without the tools. Meanwhile, a convergence of recent research and corporate disclosures suggests that the productivity gains developers believe they are getting may be largely illusory.

In February 2026, METR, a machine-learning evaluation and research lab, attempted to update a landmark 2025 study that measured how long open-source developers took to complete tasks with and without AI assistance. The update never happened. Developers refused to participate “because they do not wish to work without AI,” the researchers acknowledged, which made a controlled comparison effectively impossible.

The assumption: AI makes developers more productive. The complication: the researchers who found that AI slowed some developers down can no longer recruit enough developers to test that finding again, because participants won’t turn the tools off.

What the Data Actually Shows

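METR’s original 2025 study produced a result that surprised even its authors: experienced open-source developers who believed AI was accelerating their work were, in measured terms, slower. They took about 19 percent longer to complete tasks with AI than without it, even though they estimated afterward that AI had sped them up by about 20 percent. Whatever time the model saved by generating code quickly was more than offset by time spent steering it, waiting on completions, and, critically, hunting down and correcting its errors. The net effect was negative.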

Unable to replicate those conditions in 2026, METR pivoted to a self-reported survey published in May. Technical employees perceived that AI made them roughly twice as valuable to their organizations. Self-reported productivity surveys, however, are among the weakest instruments in empirical research; perception and measurement routinely diverge, particularly when workers are enthusiastic adopters of the technology being evaluated.

Corporate budget data tells a more sobering story. Uber exhausted its entire 2026 AI budget within the first four months of the year, according to reporting by The Information. COO Andrew Macdonald said on a recent podcast that the expenditure had not produced a measurable increase in completed projects or overall productivity. Separately, Amazon shut down an internal token-tracking leaderboard called Kirorank after employees gamed it by running AI agents excessively and driving up costs, the Financial Times reported. Both cases illustrate the same dynamic: AI use does not automatically translate into output.

The practice behind both episodes has acquired a name in 2026: tokenmaxxing, or treating token consumption as a proxy for productivity. According to the same reporting, it may already be in retreat as finance teams scrutinize the bills.

Code-quality metrics compound the budget concern. CodeRabbit, which makes an AI-powered code-review tool, analyzed open-source pull requests and found that AI-generated code introduced 1.7 times as many issues as human-written code. Aiswarya Sankar, founder and CEO of Entelligence AI, has claimed that companies spend 44 percent of their tokens fixing bugs that AI itself generated. Both figures come from vendors with a commercial interest in AI code review and should be read with that in mind, but they are directionally consistent with independent academic findings.

Researchers from Singapore Management University published a report in April 2026 warning that “AI-generated code can introduce long-term maintenance costs into real software projects.” The SMU paper adds institutional weight to arguments that had previously circulated mainly in blog posts and developer forums. One such post, by programmer and author James Shore, went viral on Hacker News. “You write code twice as quick now?” Shore wrote. “Better hope you’ve halved your maintenance costs. Otherwise, you’re screwed. You’re trading a temporary speed boost for permanent indenture.”

Taken together, METR’s abandoned replication attempt and the corporate budget overruns point to a structural shift that the productivity debate has largely missed. The question is no longer whether AI speeds up individual code generation (it clearly does on narrow, task-level benchmarks) but whether the downstream maintenance burden accrues faster than the upstream speed gain. If developers generate code at twice the rate and each unit of that code carries 1.7 times as many issues, the volume of issues to fix grows roughly 3.4-fold, and the net effect on engineering capacity could be flat or negative even as token spend climbs. This is a systems-level accounting problem, not a tool-evaluation problem, and it is unlikely to be solved by adding another AI layer on top.
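The compounding argument can be made concrete with a deliberately simple toy model. This is a sketch under stated assumptions, not a measurement: it assumes, hypothetically, that a team has a fixed number of hours per week, that every unit of shipped code owes a fixed number of maintenance hours each week, and that the 2× and 1.7× figures translate directly into a doubled generation rate and 1.7 times the per-unit upkeep. The function `simulate` and all parameter values (`HOURS`, `BASE_RATE`, `BASE_UPKEEP`) are invented for illustration.

```python
# Toy model (hypothetical assumptions only, not measured data):
# each week a team has a fixed budget of engineering hours. Every unit of
# code already shipped demands a fixed number of maintenance hours per week;
# whatever time is left over goes into writing new code.

def simulate(weeks: int, hours: float, rate: float, upkeep: float) -> list[float]:
    """Return the cumulative codebase size at the end of each week.

    hours  -- engineering hours available per week
    rate   -- units of new code produced per free (non-maintenance) hour
    upkeep -- maintenance hours per week owed by each unit of existing code
    """
    code = 0.0
    history = []
    for _ in range(weeks):
        maintenance = upkeep * code                # hours eaten by existing code
        free_hours = max(hours - maintenance, 0.0)  # never negative
        code += rate * free_hours                  # new code from leftover time
        history.append(code)
    return history


HOURS = 40.0        # hypothetical weekly hours for one developer
BASE_RATE = 1.0     # hypothetical units of code per free hour, without AI
BASE_UPKEEP = 0.05  # hypothetical maintenance hours/week per unit of code

baseline = simulate(52, HOURS, BASE_RATE, BASE_UPKEEP)
# Assumption: AI doubles generation speed, and each unit carries 1.7x the upkeep.
with_ai = simulate(52, HOURS, 2 * BASE_RATE, 1.7 * BASE_UPKEEP)

for week in (1, 10, 26, 52):
    print(f"week {week:2d}: baseline {baseline[week - 1]:6.1f}  with AI {with_ai[week - 1]:6.1f}")
```

Under these made-up numbers, the AI-assisted team is ahead at week 10 but behind by week 26. Each codebase levels off where maintenance absorbs every available hour, at hours divided by per-unit upkeep: 800 units for the baseline versus roughly 470 with AI. In this model the generation rate only changes how quickly a team reaches that point; the per-unit upkeep decides where it is. That is one way to read Shore’s warning about trading a temporary speed boost for permanent indenture.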

The pattern echoes earlier debates about how AI is reshaping hiring and engineering workflows: adoption outpaces measurement, and organizations discover the real costs only after budgets have been committed.

The Strongest Counterargument

The most credible objection to this framing comes from within the AI coding industry itself, most explicitly from Cognition founder and CEO Scott Wu, whose company makes Devin, an autonomous AI coding agent. Wu’s position — shared by others in the agent space — is that the maintenance burden created by AI code generation is itself automatable: AI coding agents can fix AI-generated bugs as fast as they are produced, effectively closing the loop without additional human labor.

It is a coherent argument, but Wu himself immediately qualifies it. He rates Devin’s current capability at somewhere between a junior and a mid-level programmer, depending on the task. Delegating code review and maintenance to a system operating at junior-developer proficiency does not obviously reduce risk; it may redistribute it in ways that are harder to observe. The SMU researchers and Wu agree on one point: humans should retain ownership of high-level decisions, such as software architecture, security design, and system-level reasoning, because these are precisely the areas where current AI models perform least reliably. That consensus limits how much of the maintenance loop can be safely automated today, however quickly the tooling is improving.

Research into how machine learning models can introduce privacy and security vulnerabilities reinforces the case for keeping humans in the architectural loop, particularly when AI-generated code touches sensitive data paths.

The SMU team’s practical guidance points in the same direction: developers need to understand, at a granular level, which tasks AI handles reliably and which it does not — analogous to knowing a programming language’s edge cases. They also need quality-assurance pipelines explicitly designed for AI output, and they should review AI-generated code with the same scrutiny applied to a junior developer’s pull request. None of that is a rejection of AI tooling; it is a framework for using it without accumulating hidden technical debt.

The broader pattern, in which enthusiasm for AI adoption precedes rigorous measurement, has been documented across domains, from NASA’s use of machine learning for wildfire prediction to academic efforts to apply deep learning to scientific classification problems. In each case, performance gains on narrow benchmarks coexist with unresolved questions about reliability at scale.

Where This Ends Up

The most probable near-term outcome is that enterprise engineering teams begin treating AI-generated code the way regulated industries treat third-party dependencies: with mandatory review gates, automated quality checks, and explicit accounting of maintenance liability before code is merged. Tokenmaxxing will fade as a performance metric as CFOs demand output-based justification for AI infrastructure spend, and the tooling market will consolidate around products that can demonstrate measurable defect-rate improvements rather than raw generation speed.

The second-most-likely scenario is that agentic systems — AI agents that autonomously write, test, and fix code — mature fast enough to genuinely close the maintenance gap before the debt becomes unmanageable. That outcome depends on whether models at the frontier can move from junior-developer-equivalent capability to reliable mid-senior-level reasoning on open-ended engineering tasks within the next 12 to 18 months. If the current generation of AI agents is indeed a practice run for more capable successors, that timeline is plausible — but it has not yet been demonstrated in production at scale, and the organizations running up budget overruns today are not waiting for it.
