Teaching Self-Driving Vehicles to Safely Navigate City Streets

Humans may be one of the most significant impediments to fully autonomous vehicles operating on city streets.

To safely navigate a vehicle through downtown Boston, a robot must be able to predict what nearby drivers, cyclists, and pedestrians will do next.

However, behavior prediction is a difficult problem, and current artificial intelligence solutions are either too simplistic (they may assume pedestrians always walk in a straight line), too conservative (to avoid pedestrians, the robot simply leaves the car in park), or able to forecast the next moves of only a single agent (roads typically carry many users at once).

Researchers at MIT have devised a deceptively simple solution to this complex problem. They divide a multiagent behavior prediction problem into smaller pieces and tackle each one separately, allowing a computer to solve this complex task in real-time.

Their behavior-prediction framework first hypothesizes the relationships between two road users – which car, cyclist, or pedestrian has the right of way, and which agent will yield – and then uses those hypotheses to predict future trajectories for multiple agents.

When compared to real traffic flow in an enormous dataset compiled by the autonomous driving company Waymo, these estimated trajectories were more accurate than those from other machine-learning models; the MIT technique even outperformed Waymo's own recently published model. Furthermore, because the researchers broke the problem down into smaller pieces, their technique required less memory.

“This is a very intuitive idea, but it has never been fully explored, and it works quite well. The ease of use is unquestionably a plus. We are comparing our model to other cutting-edge models in the field, including one from Waymo, the industry leader in this field, and our model outperforms them on this difficult benchmark. This has a lot of future potential,” says co-lead author Xin “Cyrus” Huang, a graduate student in the Department of Aeronautics and Astronautics and a research assistant in the lab of Brian Williams, professor of aeronautics and astronautics and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL).

Three Tsinghua University researchers collaborated on the paper: co-lead author Qiao Sun, a research assistant; Junru Gu, a graduate student; and senior author Hang Zhao PhD ’19, an assistant professor. The findings will be presented at the Conference on Computer Vision and Pattern Recognition (CVPR).

Multiple Small Models

M2I, the researchers’ machine-learning method, takes two inputs: the past trajectories of the cars, cyclists, and pedestrians interacting in a traffic setting such as a four-way intersection, and a map with street locations, lane configurations, and so on.

A relation predictor uses this information to determine which of two agents has the right of way, classifying one as a passer and the other as a yielder. Because the passing agent behaves independently, its trajectory is then predicted by a model known as a marginal predictor.

A conditional predictor, a second prediction model, then guesses what the yielding agent will do based on the actions of the passing agent. The system predicts a variety of yielder and passer trajectories, computes the probability of each one individually, and then chooses the six joint results with the highest likelihood of occurring.
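As a rough illustration of this two-stage factorization, the sketch below uses simple stand-in heuristics in place of the learned networks. All function names, the velocity-extrapolation models, and the likelihood values here are hypothetical; the actual M2I predictors are trained neural networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def relation_predictor(hist_a, hist_b):
    """Stand-in relation classifier: guess that the faster agent passes
    first. (M2I learns this relation from data instead.)"""
    speed_a = np.linalg.norm(hist_a[-1] - hist_a[-2])
    speed_b = np.linalg.norm(hist_b[-1] - hist_b[-2])
    return "a" if speed_a >= speed_b else "b"

def marginal_predictor(hist, n_samples=6, horizon=8):
    """Stand-in marginal predictor: extrapolate the last velocity with
    noise, returning (trajectory, likelihood) samples."""
    vel = hist[-1] - hist[-2]
    out = []
    for _ in range(n_samples):
        noisy_vel = vel + rng.normal(scale=0.1, size=2)
        traj = hist[-1] + np.outer(np.arange(1, horizon + 1), noisy_vel)
        out.append((traj, float(rng.uniform(0.5, 1.0))))
    return out

def conditional_predictor(hist, passer_traj, n_samples=6, horizon=8):
    """Stand-in conditional predictor: the yielder slows down the closer
    the passer's predicted path comes to it."""
    vel = hist[-1] - hist[-2]
    gap = float(np.min(np.linalg.norm(passer_traj - hist[-1], axis=1)))
    brake = min(1.0, gap / 10.0)  # closer passer -> stronger yielding
    out = []
    for _ in range(n_samples):
        noisy_vel = brake * vel + rng.normal(scale=0.1, size=2)
        traj = hist[-1] + np.outer(np.arange(1, horizon + 1), noisy_vel)
        out.append((traj, float(rng.uniform(0.5, 1.0))))
    return out

def predict_joint(hist_a, hist_b, top_k=6):
    """M2I-style pipeline: classify passer/yielder, sample the passer
    marginally, sample the yielder conditioned on each passer sample,
    then keep the top_k joint samples by likelihood."""
    passer = relation_predictor(hist_a, hist_b)
    hist_p, hist_y = (hist_a, hist_b) if passer == "a" else (hist_b, hist_a)
    joint = []
    for traj_p, lik_p in marginal_predictor(hist_p):
        for traj_y, lik_y in conditional_predictor(hist_y, traj_p):
            joint.append((traj_p, traj_y, lik_p * lik_y))
    joint.sort(key=lambda s: s[2], reverse=True)
    return joint[:top_k]

# A fast eastbound agent and a slower northbound agent.
hist_a = np.array([[0.0, 0.0], [1.0, 0.0]])
hist_b = np.array([[5.0, -5.0], [5.0, -4.5]])
top = predict_joint(hist_a, hist_b)
print(len(top))  # 6 joint (passer, yielder) predictions
```

The key design point is the factorization itself: instead of one large model scoring every joint outcome, the yielder's distribution is conditioned on each passer sample, which keeps each sub-model small.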

M2I predicts how these agents will move through traffic over the next eight seconds. In one case, their method caused a vehicle to slow down to allow a pedestrian to cross the street, then accelerate when they cleared the intersection. In another instance, the vehicle waited for several cars to pass before turning from a side street onto a busy main road.

While this preliminary study focuses on interactions between two agents, M2I could infer relationships among many agents and then predict their trajectories by linking multiple marginal and conditional predictors.

Real-world Driving Tests

The models were trained using the Waymo Open Motion Dataset, which contains millions of real-world traffic scenes involving vehicles, pedestrians, and cyclists captured by lidar (light detection and ranging) sensors and cameras mounted on the company’s self-driving vehicles. They concentrated on cases involving multiple agents.

To determine accuracy, they compared the six prediction samples from each method, weighted by their confidence levels, to the actual trajectories taken by the cars, cyclists, and pedestrians in each scene. Their method proved to be the most accurate. It also outperformed the baseline models on an overlap-rate metric: when two predicted trajectories overlap, that indicates a collision, and M2I had the lowest overlap rate.
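The overlap idea can be shown with a minimal sketch, assuming predicted trajectories are arrays of (x, y) positions at synchronized timesteps; the collision radius here is a hypothetical threshold, not the benchmark's exact definition.

```python
import numpy as np

def trajectories_overlap(traj_a, traj_b, radius=1.0):
    """Return True if the two trajectories come within `radius` of each
    other at any shared timestep (a proxy for a predicted collision)."""
    dists = np.linalg.norm(traj_a - traj_b, axis=1)
    return bool(np.any(dists < radius))

def overlap_rate(joint_predictions, radius=1.0):
    """Fraction of predicted trajectory pairs that collide; lower is
    better, since real road users rarely occupy the same spot."""
    hits = sum(trajectories_overlap(a, b, radius) for a, b in joint_predictions)
    return hits / len(joint_predictions)

t = np.arange(8, dtype=float)
# Two parallel lanes 3 m apart: never overlap.
lane_a = np.stack([t, np.zeros(8)], axis=1)
lane_b = np.stack([t, np.full(8, 3.0)], axis=1)
# A crossing path that meets lane_a at timestep 4.
cross = np.stack([np.full(8, 4.0), t - 4.0], axis=1)

print(overlap_rate([(lane_a, lane_b)]))  # 0.0
print(overlap_rate([(lane_a, cross)]))   # 1.0
```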

“Rather than simply building a more complex model to solve this problem, we took a more human-like approach by reasoning about interactions with others. A human does not consider all of the hundreds of possible future behaviors; we make quick decisions,” Huang explains.

Another advantage of M2I is that by breaking the problem down into smaller pieces, a user can better understand the model’s decision-making. According to Huang, this could help users put more trust in self-driving cars in the long run.

However, the framework does not account for situations in which two agents are mutually influencing each other, such as when two vehicles each nudge forward at a four-way stop because the drivers are unsure who should yield.

They intend to address this constraint in future work. They also want to use their method to simulate realistic interactions between road users, which could be used to test self-driving car planning algorithms or to generate massive amounts of synthetic driving data to enhance the performance of the model.

This research was funded, in part, by the Qualcomm Innovation Fellowship and the Toyota Research Institute.
