AlphaCode can match programming prowess of average coders

Artificial intelligence software programs are becoming astonishingly skilled at conversing, winning board games, and creating artwork — but what about writing software? In a recently published paper, Google DeepMind researchers claim that their AlphaCode program can compete with the average human coder in standardized programming contests.

The researchers report their findings in this week’s issue of the journal Science. This result represents the first time an artificial intelligence system has performed competitively in programming contests.

In simulated evaluations of recent programming competitions on the Codeforces platform, DeepMind’s code-generating system earned an average ranking in the top 54.3%, which is a very “average” average, so there’s no need to panic about Skynet just yet.

According to Yujia Li, a research scientist at DeepMind and one of the paper’s lead authors, competitive programming is an extremely difficult challenge: there is a significant gap between where AlphaCode currently stands (solving roughly 30% of problems within 10 submissions) and top programmers (solving more than 90% of problems in a single submission). The remaining problems are also much harder than the ones the system has already solved.

However, the experiment suggests a fresh field for AI applications. Microsoft’s Copilot, a code-suggesting tool available through GitHub, is also exploring this frontier, as is Amazon’s comparable CodeWhisperer.

The newly released research, according to Oren Etzioni, technical director of the AI2 Incubator and founding CEO of Seattle’s Allen Institute for Artificial Intelligence, emphasizes DeepMind’s position as a major player in the use of AI tools known as large language models, or LLMs.

“This is an impressive reminder that OpenAI and Microsoft don’t have a monopoly on the impressive feats of LLMs,” Etzioni wrote in an email. “In fact, AlphaCode outperforms both Microsoft’s GitHub Copilot and GPT-3.”


It can be argued that AlphaCode is notable both for how well it programs and for how it programs. What may surprise people most is what the system lacks: AlphaCode has no explicit built-in knowledge of how computer code is structured. Instead, it takes a purely “data-driven” approach to writing code, learning the structure of computer programs simply by looking at a ton of existing code, wrote J. Zico Kolter, a computer scientist at Carnegie Mellon University, in a commentary on the study in Science.

When a problem is described in natural language, AlphaCode uses a large language model to create code. The program makes use of a sizable data set of programming challenges and answers, as well as a collection of unstructured code from GitHub. In order to solve the given problem, AlphaCode generates thousands of potential solutions, filters them to eliminate the invalid ones, groups the viable solutions, and then chooses one example from each group to submit.
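The generate-filter-cluster-select pipeline described above can be sketched in a few lines of Python. This is a toy illustration only: `select_submissions`, the callable candidates, and the probe inputs are hypothetical stand-ins, whereas the real AlphaCode samples whole programs from a large transformer and executes them against the contest’s example tests.

```python
from collections import defaultdict

def select_submissions(candidates, example_tests, probe_inputs, k=1):
    """Filter-and-cluster selection, loosely modeled on AlphaCode's pipeline.

    candidates    : candidate solutions (callables, standing in for generated programs)
    example_tests : (input, expected_output) pairs from the problem statement
    probe_inputs  : extra inputs used to group candidates by behavior
    k             : how many submissions to return (one per largest cluster)
    """
    # 1. Filter: keep only candidates that pass every example test;
    #    candidates that crash are discarded.
    viable = []
    for cand in candidates:
        try:
            if all(cand(x) == y for x, y in example_tests):
                viable.append(cand)
        except Exception:
            pass
    # 2. Cluster: group survivors by their outputs on the probe inputs,
    #    so behaviorally identical programs fall into one group.
    clusters = defaultdict(list)
    for cand in viable:
        try:
            signature = tuple(cand(x) for x in probe_inputs)
        except Exception:
            continue
        clusters[signature].append(cand)
    # 3. Select: submit one representative from each of the largest clusters.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [group[0] for group in ranked[:k]]

# Toy problem: "double a number". Two correct candidates, one wrong, one crashing.
candidates = [
    lambda x: x * 2,   # correct
    lambda x: x + x,   # correct, same behavior as above
    lambda x: x ** 2,  # wrong (fails the example test 3 -> 6)
    lambda x: x // 0,  # crashes, filtered out
]
picked = select_submissions(candidates, example_tests=[(2, 4), (3, 6)],
                            probe_inputs=[5, 7])
# picked contains one representative of the single surviving cluster
```

Picking one representative per behavioral cluster, rather than submitting every survivor, is what lets the system stay within a contest’s strict submission limits.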

It might seem strange that this process has any chance of producing accurate code, according to Kolter.

Kolter suggested integrating AlphaCode’s method with more structured machine learning techniques to boost the system’s capabilities.

If “hybrid” machine learning methods that combine data-driven learning with engineered knowledge can perform better on these tasks, let them try, he wrote. AlphaCode has cast the die.

Li told GeekWire that DeepMind is still improving AlphaCode. Although AlphaCode represents a significant jump from 0% to 30%, he acknowledged that much work remains.

Etzioni concurred that the effort to develop code-generating software has plenty of headroom. He anticipates quick iterations and improvements.

“The generative AI ‘big bang’ is just 10 seconds away,” Etzioni said. Many more impressive products, trained on a wider variety of data, both textual and structured, are coming soon, he predicted, as researchers frantically attempt to determine the limits of this technology.

As the project develops, AlphaCode may ignite the ongoing discussion about the benefits and dangers of artificial intelligence, much like DeepMind’s AlphaGo program did when it showed machine-based mastery of the age-old game of Go. And programming isn’t the only domain where AI’s advancement is causing controversy:

  • The ability of an OpenAI program called ChatGPT to respond to information requests with detailed answers and documents ranging from term papers to out-of-this-world resignation letters has generated a flood of buzz in the tech sector.
  • AI-based art generation programs like Lensa, DALL-E, and Stable Diffusion have sparked a debate over whether they unfairly exploit the millions of archived works of art made by human hands, and whether they might eliminate future markets for living, breathing artists.
  • Recently, AI programs have competed against human players in strategy games that, unlike chess or checkers, rely on judgments based on incomplete information about the other players. DeepMind’s DeepNash program focuses on the board game Stratego, while Meta’s Cicero program plays Diplomacy. Some wonder whether these developments will allow AI to assist real-world policymakers (or scammers).

When we asked Li if DeepMind had any doubts about the work it was producing, he thoughtfully responded:

“AI has the potential to help with some of humanity’s biggest problems, but it must be developed responsibly, safely, and for everyone’s benefit. Depending on how we deploy it, how we use it, and the types of things we decide to use it for, it will either be helpful or harmful to us and society.”

DeepMind develops AI with care, he said, inviting others to critique its work and delaying the release of new technology until risks and potential effects have been carefully considered. The company’s culture of responsible pioneering is driven by its values and focused on responsible governance, responsible research, and responsible impact.

The Allen Institute for Artificial Intelligence’s Sam Skjonsberg, a principal engineer who oversees the creation of Beaker, AI2’s internal platform for AI experimentation, offered his thoughts on AlphaCode:

It is not shocking that LLMs are used in code synthesis. With initiatives like DALL-E, OpenAI Codex, Unified-IO, and of course ChatGPT, the generalizability of these large-scale models is increasingly becoming apparent.

An intriguing feature of AlphaCode is the post-processing step that filters the solution space to weed out any that crash or are blatantly incorrect. This highlights a crucial point: these models work best when they complement our abilities rather than attempt to replace them.

He is curious to see how AlphaCode stacks up against ChatGPT as a source of coding advice. AlphaCode was assessed on a competitive coding exercise, which is an objective performance indicator but says nothing about the readability of the generated code. The results ChatGPT has produced have impressed him: the code is readable and simple to modify, but it frequently contains minor flaws and errors. This is a difficult but crucial aspect of these models that we will need to figure out how to measure, he said.

Separately, he commended Google and the AlphaCode research team for making the paper’s dataset and energy requirements public, and said the makers of ChatGPT ought to do the same. Because of the high cost of training and operating them, these LLMs already favor large organizations. Open publishing offsets this by promoting scientific collaboration and further evaluation, both of which are crucial for progress and equity.
