AI is not a good team player

A study shows that people find AI a frustrating teammate when playing a cooperative game together, posing challenges for "teaming intelligence."

When it comes to games like chess or Go, artificial intelligence (AI) programs have far surpassed the best players in the world. These "superhuman" AIs are unmatched competitors, but perhaps harder than competing against humans is collaborating with them. Can the same technology get along with people?

In a new study, researchers at MIT's Lincoln Laboratory set out to find out how well humans could play the cooperative card game Hanabi with an advanced AI model trained to play alongside teammates it had never met before. Participants played two series of the game: one with the AI agent as their teammate, and the other with a rule-based agent, a bot manually programmed to play in a predefined way.

The results surprised the researchers. Not only were scores no better with the AI teammate than with the rule-based agent, but humans consistently hated playing with their AI teammate. They found it unpredictable, unreliable, and untrustworthy, and felt negatively about it even when the team scored well. A paper detailing the study was accepted to the 2021 Conference on Neural Information Processing Systems (NeurIPS).

"It really highlights the nuanced distinction between creating AI that performs objectively well and creating AI that is subjectively trusted or preferred," says Ross Allen, co-author of the paper and a researcher in the Artificial Intelligence Technology Group. "It may seem those things are so close that there's really no daylight between them, but this study showed that they are actually two separate problems. We need to work on disentangling them."

Humans hating their AI teammates could be a concern for researchers designing this technology to one day work with people on real challenges, like defending against missiles or performing complex operations. That kind of collaboration relies on a specific type of AI called reinforcement learning.

A reinforcement learning AI isn't told which actions to take; instead, it discovers through repeated trials of scenarios which actions yield the greatest numerical "reward." It is this technology that has produced superhuman chess and Go players. Unlike rule-based algorithms, these AIs aren't programmed to follow "if/then" statements, because the possible outcomes of the human tasks they're meant to tackle, like driving a car, are far too many to encode.
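To make that trial-and-error loop concrete, here is a minimal, hypothetical sketch: a tabular Q-learning agent on a toy five-cell corridor. The environment, states, and reward values below are invented for illustration and are unrelated to the study's Hanabi agents, which use far larger learned policies.

# Minimal sketch of reinforcement learning (tabular Q-learning) on a toy
# 5-cell corridor. The agent is never told the rules; it only observes rewards.
import random

N_STATES = 5          # cells 0..4; reaching cell 4 ends the episode
ACTIONS = [-1, +1]    # move left or right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

# Q[state][action_index] = learned estimate of long-term payoff
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(state, action):
    """Environment dynamics: reward 1 only when the goal cell is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit the best-known action, sometimes explore
        if random.random() < EPSILON:
            a = random.randrange(2)
        else:
            a = 0 if Q[state][0] >= Q[state][1] else 1
        next_state, reward, done = step(state, ACTIONS[a])
        # Nudge the estimate toward reward plus discounted future value
        best_next = max(Q[next_state])
        Q[state][a] += ALPHA * (reward + GAMMA * best_next - Q[state][a])
        state = next_state

print("Learned Q-values:", [[round(q, 2) for q in row] for row in Q])

After enough episodes, the "move right" values dominate, illustrating how an agent can settle on high-payoff actions without ever being handed explicit rules.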

"Reinforcement learning is a much more general-purpose way of developing AI. If you can train it to learn how to play chess, that agent won't necessarily go drive a car. But you can use the same algorithms to train a different agent to drive a car, given the right data," says Allen. "The sky's the limit in what it could, in theory, do."

Bad hints, bad plays

Today researchers use Hanabi to test the performance of reinforcement learning models designed for collaboration, just as chess has served as a benchmark for testing competitive AI for decades.

The game of Hanabi is akin to a multiplayer form of solitaire. Players work together to stack cards of the same suit in order. However, players cannot see their own cards, only the cards their teammates hold. Each player is strictly limited in what they can communicate to their teammates to get them to pick the best card from their own hand to stack next.
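As a rough illustration of that communication limit, the sketch below (hypothetical, not the study's code) models a Hanabi-style hint: a player names a single color or rank, and the only information conveyed is which positions in a teammate's hand match it.

# Hypothetical sketch of the Hanabi hint constraint:
# a hint names exactly one color or one rank, and reveals every matching
# card position in a teammate's hand -- nothing more.
from dataclasses import dataclass
from typing import List

@dataclass
class Card:
    color: str   # e.g. "red", "blue"
    rank: int    # 1..5

def give_hint(hand: List[Card], attribute: str, value) -> List[int]:
    """Return the positions in a teammate's hand that match the hinted
    color or rank; this is the only information a player may convey."""
    if attribute == "color":
        return [i for i, c in enumerate(hand) if c.color == value]
    if attribute == "rank":
        return [i for i, c in enumerate(hand) if c.rank == value]
    raise ValueError("a hint must name a single color or rank")

# Example: hinting "red" to a teammate holding these cards
teammate_hand = [Card("red", 1), Card("blue", 3), Card("red", 4)]
print(give_hint(teammate_hand, "color", "red"))  # -> [0, 2]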

The Lincoln Laboratory researchers did not develop either the AI or the rule-based agents used in this experiment. Both agents represent the best in their field for Hanabi performance. In fact, when the AI model was previously paired with an AI teammate it had never played with before, the team achieved the highest score ever for Hanabi play between two unfamiliar AI agents.

"That was an important result," says Allen. "We thought, if these AI agents that have never met before can come together and play really well, then we should be able to bring in humans that also know how to play very well alongside the AI, and they'd also do very well. That's why we thought the AI team would objectively play better, and also why we thought people would prefer it, because generally we like something better when we do well at it."

Neither expectation came true. Objectively, there was no statistical difference in scores between the AI and the rule-based agent. Subjectively, all 29 participants reported in surveys a clear preference for the rule-based teammate. The participants were not told which agent they were playing with in which games.

"One participant said that they were so stressed out by the AI agent's bad play that they actually got a headache," says Jaime Peña, a researcher in the AI Systems and Technology Group and an author on the paper. "Another said that they thought the rule-based agent was dumb but workable, whereas the AI agent showed that it understood the rules, but that its moves didn't fit with what a team looks like. To them, it was giving bad hints, making bad plays."

Inhuman creativity

This perception of AI making “bad plays” links to surprising behavior researchers have observed previously in reinforcement learning work. For example, in 2016, when DeepMind’s AlphaGo first defeated one of the world’s best Go players, one of the most widely praised moves made by AlphaGo was move 37 in game 2, a move so unusual that human commentators thought it was a mistake. Later analysis revealed that the move was actually extremely well-calculated, and was described as “genius.”

Such moves might be praised when made by an AI opponent, but they are less likely to be celebrated in a team setting. The Lincoln Laboratory researchers found that strange or seemingly illogical moves were the worst offenders at breaking human trust, diminishing not only players' perception of how well they and their AI teammate worked together, but also how much they wanted to work with the AI at all, especially when any potential payoff wasn't immediately obvious.

"There was a lot of commentary about giving up, comments like 'I hate working with this thing,'" adds Hosea Siu, also an author of the paper and a researcher in the Control and Autonomous Systems Engineering Group.

Participants who rated themselves as Hanabi experts, which the majority of participants in this study did, gave up on the AI player more often. Siu finds this concerning for AI developers, because the key users of this technology are likely to be domain experts.

"Suppose you train up a super-smart AI guidance assistant for a missile defense scenario. You aren't handing it off to a trainee; you're handing it off to your experts on your ships who have been doing this for 25 years. So if there's a strong expert bias against it in gaming scenarios, it's likely going to show up in real-world operations," he adds.

Squishy humans

The researchers note that the AI used in this study wasn't designed around human preference. But that's part of the problem: not many are. Like most collaborative AI models, this model was designed to score as high as possible, and its success was measured by its objective performance.

If researchers don't focus on the question of subjective human preference, "then we're not going to create AI that people actually want to use," says Allen. "It's easier to work on AI that improves a very clean number. It's much harder to work on AI that works in this ever-changing world of human preferences."

Solving this harder problem is the goal of the MeRLin (Mission-Ready Reinforcement Learning) project, under which this experiment was funded in Lincoln Laboratory's Technology Office, in collaboration with the U.S. Air Force Artificial Intelligence Accelerator and the MIT Department of Electrical Engineering and Computer Science. The project is studying what has prevented collaborative AI from leaping out of the game space and into messier reality.

The researchers believe that the ability of an AI to explain its actions engenders trust. This will be the focus of their work for the next year.

"You can imagine we rerun the experiment, but after the fact, and this is much easier said than done, the human could ask, 'Why did you make that move? I didn't understand it.' If the AI could provide some insight into what it thought was going to happen based on its actions, our hypothesis is that humans would say, 'Oh, weird way of thinking about it, but I get it now,' and they'd trust it. Our results would totally change, even though we didn't change the underlying decision-making of the AI," says Allen.

Like a huddle after a game, this kind of debriefing is often what helps people build camaraderie and cooperation as a team.

"Maybe it's also a staffing bias. Most AI teams don't have people who want to work on these squishy humans and their soft problems," Siu adds with a laugh. "That's the foundation, but it's not enough."

Mastering a game like Hanabi between AI and humans could open up a universe of possibilities for teaming intelligence in the future. But until researchers can close the gap between how well an AI performs and how much humans like it, the technology may well remain at machine versus human.