A New AI Coding Contest Has Produced Surprising Results

The first winner of a new AI coding contest has been announced, and the winning score sets a humbling new bar for AI-assisted software development.

The nonprofit Laude Institute announced the inaugural winner of the K Prize on Wednesday at 5 p.m. PST. The multi-round AI coding competition was launched by Andy Konwinski, co-founder of Databricks and Perplexity. The winner, a Brazilian prompt engineer named Eduardo Rocha de Andrade, will receive $50,000 for the victory. More surprising than the win itself was his final score: he answered just 7.5% of the test’s questions correctly.

Konwinski said he was pleased to have created a genuinely challenging benchmark: if benchmarks are to be meaningful, he argued, they have to be hard. He has also pledged $1 million to the first open-source model that scores 90% or above on the test.

Like the popular SWE-Bench benchmark, the K Prize tests models against flagged issues from GitHub as a gauge of how well they handle real-world programming problems. But where SWE-Bench is based on a fixed set of problems that models can train against, the K Prize is designed to be a “contamination-free version of SWE-Bench,” using a timed entry process to prevent any benchmark-specific training. For round one, models had to be submitted by March 12th; the K Prize organizers then built the test using only GitHub issues flagged after that date.
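To make the timed-entry idea concrete, here is a minimal sketch of how a post-deadline issue set could be collected. It is an illustration only, not the K Prize’s actual pipeline: the repository, the bug-label filter, and the deadline year are all assumptions.

```python
import requests

# Illustrative sketch of a timed-entry benchmark: gather GitHub issues
# created only AFTER a fixed submission deadline, so no model submitted
# before that date could have trained on them.
# NOTE: the deadline year, repo, and label filter are assumptions, not
# details confirmed by the K Prize organizers.
DEADLINE = "2025-03-12"  # the article says only "March 12th"

def fetch_post_deadline_issues(repo: str, per_page: int = 20) -> list[dict]:
    """Search a repository for bug issues filed after the deadline."""
    query = f"repo:{repo} is:issue label:bug created:>{DEADLINE}"
    resp = requests.get(
        "https://api.github.com/search/issues",
        params={"q": query, "per_page": per_page},
        headers={"Accept": "application/vnd.github+json"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["items"]

if __name__ == "__main__":
    # Placeholder repository, chosen only for illustration.
    for issue in fetch_post_deadline_issues("psf/requests"):
        print(issue["number"], issue["title"])
```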

That 7.5% top score stands in stark contrast to SWE-Bench itself, which currently shows a top score of 75% on its easier ‘Verified’ test and 34% on its harder ‘Full’ test. Konwinski still isn’t sure whether the gap reflects contamination on SWE-Bench or simply the difficulty of collecting fresh issues from GitHub, but he expects the K Prize project to answer that question soon.

“As we get more runs of the thing, we’ll have a better sense,” he said, “because we expect people to adapt to the dynamics of competing on this every few months.”

Given the wide range of AI coding tools already available to the public, it may seem like an odd place for models to fall short. But with existing benchmarks becoming too easy, many critics see efforts like the K Prize as a necessary step toward solving AI’s growing evaluation problem.

Princeton researcher Sayash Kapoor, who recently proposed a similar approach, is supportive: “I’m quite bullish about building new tests for existing benchmarks. Without such experiments, we can’t really tell if the problem is contamination, or even just aiming at the SWE-Bench leaderboard with a human in the loop.”

Konwinski sees the K Prize as both a better benchmark and an open challenge to the rest of the industry. “If you follow the hype, it’s like we should be seeing AI software engineers, AI doctors, and AI lawyers, and that’s just not true,” he said. “If we can’t even achieve more than 10% on a contamination-free SWE-Bench, that’s a reality check for me.”
