By making it watch 70,000 hours of video of people playing the popular video game, OpenAI has built the best Minecraft-playing bot yet. The result demonstrates a powerful new technique that could be used to teach machines to perform a wide range of tasks by binge-watching content on sites like YouTube, a vast and largely untapped source of training data.
The Minecraft AI learned to perform intricate sequences of keyboard and mouse inputs to complete in-game tasks such as chopping down trees and crafting tools. It’s the first bot that can craft so-called diamond tools, a task that typically takes good human players 20 minutes of high-speed clicking, or roughly 24,000 actions.
The result is a milestone for imitation learning, a technique that trains neural networks to perform tasks by watching humans do them. Imitation learning can be used to teach AI to control robot arms, drive cars, or navigate websites.
There are plenty of videos online of people performing all kinds of tasks. By tapping into this resource, the researchers hope to do for imitation learning what GPT-3 did for large language models. According to Bowen Baker at OpenAI, a member of the team behind the new Minecraft bot, the last few years have seen the rise of a GPT-3 paradigm in which extraordinary capabilities emerge from huge models trained on enormous swathes of the internet. A large part of that, he said, is because those models capture what people do when they go online.
The problem with existing imitation learning methods is that every step of a video demonstration needs to be labeled: taking this action makes this happen, taking that action makes that happen, and so on. Because this kind of manual annotation is so labor-intensive, such datasets tend to be small. Baker and his team wanted to find a way to turn the millions of videos available online into a new dataset.
The team’s method, known as Video Pre-Training (VPT), gets around this bottleneck by training a separate neural network to label videos automatically. To begin with, the researchers paid crowd workers to play Minecraft and recorded their keyboard and mouse inputs alongside the video from their screens. This gave them 2,000 hours of annotated Minecraft play, which they used to train a model to match actions to their onscreen results. For instance, clicking the mouse button in a particular situation makes the character swing its axe.
The next stage was to use this model to generate action labels for 70,000 hours of unlabeled online video, and then to train the Minecraft bot on this much larger dataset.
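The three stages described above (learn to label from a small annotated set, label a large unannotated set, then imitate) can be sketched in a few lines of Python. This is a toy illustration under heavy assumptions, not OpenAI's implementation: a "frame" is just an integer position, the three action names are made up, and simple counting stands in for the neural networks. All function names are hypothetical.

```python
from collections import Counter, defaultdict


def train_inverse_dynamics(labeled_clips):
    """Learn which action most often explains each observed frame change."""
    counts = defaultdict(Counter)
    for frames, actions in labeled_clips:
        for before, after, action in zip(frames, frames[1:], actions):
            counts[after - before][action] += 1
    return {delta: c.most_common(1)[0][0] for delta, c in counts.items()}


def pseudo_label(idm, frames):
    """Infer the action between each consecutive pair of unlabeled frames."""
    return [idm[after - before] for before, after in zip(frames, frames[1:])]


def behavioral_cloning(clips):
    """Fit a policy that copies the most common action seen in each state."""
    counts = defaultdict(Counter)
    for frames, actions in clips:
        # zip stops at the shorter list, so the final frame (no action) is skipped.
        for state, action in zip(frames, actions):
            counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


# Small annotated set (toy stand-in for the 2,000 hours of crowd-worker play).
labeled = [([0, 1, 2, 1], ["right", "right", "left"]),
           ([3, 3, 2], ["stay", "left"])]
# Larger unannotated set (toy stand-in for the 70,000 hours of online video).
unlabeled = [[0, 1, 2, 3, 4], [4, 4, 5], [1, 2, 3, 4, 5]]

idm = train_inverse_dynamics(labeled)
pseudo_labeled = [(frames, pseudo_label(idm, frames)) for frames in unlabeled]
policy = behavioral_cloning(pseudo_labeled)
```

The point of the design is that the labeling model only has to explain what happened between two frames it can already see, which is an easier problem than deciding what to do next, so a relatively small annotated set can be enough to label a much larger one.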
Peter Stone, executive director of Sony AI America, who has previously worked on imitation learning, believes that video is a training resource with a lot of potential.
Imitation learning is an alternative to reinforcement learning, in which a neural network learns to perform a task from scratch through trial and error. Reinforcement learning is behind many of the most significant advances in AI over the past few years. It has been used to train models that can outperform people at video games, control a fusion reactor, and discover a faster way to do basic math operations.
The problem is that reinforcement learning works best for tasks with a well-defined goal, where random actions can lead to accidental success. Reinforcement learning algorithms reward these accidental successes to make them more likely to happen again.
But Minecraft has no defined objective. Players are free to do as they like, exploring a virtual world, mining different resources, and combining them to craft different items.
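The reward mechanism described above can be illustrated with a minimal sketch. It assumes a toy setting with three made-up actions, exactly one of which earns a reward, and a simple weight-bumping update. It is not any particular published reinforcement learning algorithm, and the `train` function and action names are hypothetical.

```python
import random


def train(actions, rewarded_action, steps=200, seed=0):
    """Toy trial-and-error loop: reinforce whichever action earned a reward."""
    rng = random.Random(seed)
    weights = {a: 1.0 for a in actions}  # start with no preference
    for _ in range(steps):
        # Sample an action in proportion to its current weight.
        action = rng.choices(actions, weights=[weights[a] for a in actions])[0]
        reward = 1.0 if action == rewarded_action else 0.0
        weights[action] += reward  # rewarded actions become more likely
    return weights


actions = ["walk", "jump", "chop_tree"]
weights = train(actions, rewarded_action="chop_tree")
total = sum(weights.values())
probabilities = {a: w / total for a, w in weights.items()}
```

The loop only works because a random choice stumbles on the rewarded action early and often. When a reward sits thousands of steps away, random actions almost never reach it and there is nothing to reinforce.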
The open-endedness of Minecraft makes it a good environment for training AI. Baker was one of the researchers behind the Hide & Seek project, which let bots loose in a virtual playground where they learned to collaborate and use tools to win simple games. The bots, however, quickly outgrew their environment. According to Baker, the agents more or less took over their universe because there was nothing else for them to do. The team wanted to expand it, and thought Minecraft would be a good domain to work in.
It’s not just them. New AI methods are increasingly being tested in Minecraft. One of the biggest AI conferences, NeurIPS, presented an award to MineDojo, a Minecraft environment featuring a variety of premade challenges.
Using VPT, the OpenAI bot was able to carry out tasks that would not have been achievable with reinforcement learning alone, such as crafting planks and turning them into a table, which requires about 970 consecutive actions. However, the team found that combining imitation learning and reinforcement learning produced the best results. A VPT-trained bot that was then fine-tuned with reinforcement learning was able to complete tasks requiring more than 20,000 consecutive actions.
According to the researchers, their approach could be used to train AI to perform other tasks. To begin with, it could be applied to bots that use a keyboard and mouse to browse websites, book flights, or buy groceries online. In theory, however, it could also be used to train robots to carry out physical, real-world tasks by copying first-person video of people doing those things. Stone considers this plausible.
However, according to Matthew Guzdial of the University of Alberta in Canada, who has used videos to teach AI to play games like Super Mario Bros., that is unlikely to happen any time soon. In video games like Super Mario Bros. and Minecraft, actions are performed by pressing buttons. Actions in the physical world are far more complicated and harder for a machine to learn. It “unlocks a whole mess of new research concerns,” Guzdial says.
Natasha Jaques, who works on multi-agent reinforcement learning at Google and the University of California, Berkeley, says, “This work is another testament to the ability to scale up models and training on enormous datasets to attain good performance.”
According to Jaques, internet-scale datasets will undoubtedly unlock new AI capabilities. “That’s something we’ve seen time and time again, and it’s a wonderful strategy,” she says. But she is more cautious about OpenAI’s faith in the power of massive datasets alone: “Personally, I’m a little more skeptical that data can fix any problem.”
Still, Baker and his colleagues believe that collecting more than a million hours of Minecraft video will make their AI even better. According to Baker, it’s currently the best Minecraft-playing bot available. “However, with more data and more sophisticated models, I would anticipate that the experience would resemble watching a person play the game rather than a young AI attempting to imitate a person.”