The CEO of Google DeepMind demonstrates Genie 2

Scott Pelley, a correspondent for 60 Minutes, this week covered significant advances in artificial intelligence at Google DeepMind, the company’s AI research center.

In the streets of London, close to DeepMind’s offices in the United Kingdom, Pelley used cameras and microphones to test drive Astra, Google DeepMind’s AI assistant that can see and hear.

“What can you tell me about the building I’m looking at?” he asked, wearing a pair of dark glasses fitted with Astra, microphones, and a camera. “This is the Coal Drops Yard, a shopping and dining district,” the AI agent explained. In a gallery filled with artwork curated by 60 Minutes, Pelley held up a smartphone and asked Astra what painting he was standing in front of.

The painting was identified by the AI agent as “Automat” by Edward Hopper. Pelley questioned Astra about the feelings conveyed by the woman seated by herself in a restaurant, the subject of the picture. According to Astra, she “appears pensive and contemplative,” and her countenance conveys “a sense of solitude.”

And with a little push, Astra could do even more: it could tell a tale about the painting. “It is a chilly evening in the city. A Tuesday, perhaps? The woman, possibly named Eleanor, sits alone in the diner, enjoying a warm cup of coffee,” Astra stated.

“She has found herself thinking about the future, wondering if she should pursue her dreams.” Pelley asked Google DeepMind CEO and co-founder Demis Hassabis whether he had ever seen an AI agent do something unexpected. “That has happened many times…since the beginning of DeepMind,” Hassabis told him.

“[With] modern systems such as Astra, the ability to comprehend the physical environment was not something we expected it to be proficient at so rapidly.”

While covering this topic, 60 Minutes also learned about generative AI systems that create images, videos, and even interactive 3D worlds. Two years ago, Pelley and a 60 Minutes team witnessed a demonstration of an AI model that could create brief videos from basic text commands.

After a text prompt requesting a “golden retriever with wings,” a number of pictures of a golden-haired puppy with wings strolling across grass emerged on the screen, though the images were somewhat deformed and hazy.

The technology has advanced remarkably in the past two years.

Tom Hume, the director of product development, demonstrated Veo 2, an AI model that generates videos, to 60 Minutes associate producer Katie Brennan.

A similar prompt, with more descriptive text added, produced a photorealistic video of a golden retriever puppy with wings racing across a field of flowers and grass. Its wings flapped like a bird’s as it ran, sunlight shining through them. Sharp and detailed, it looked as if it had been captured by a movie camera in a live-action scene.

Genie 2, an AI model, was presented to Pelley by Hassabis and Jack Parker-Holder, a research scientist at DeepMind. From a single static photograph, Genie 2 can generate a 3D world that an AI agent or human player can explore.

Parker-Holder gestured to a photo on a screen, taken by an employee, showing the horizon from the top of a California waterfall. Genie transforms this image, which isn’t game-like, into a game-like environment that you can then interact with, he explained. Suddenly, a video that looked like a first-person video game began to play at the top of the waterfall in the picture.

The avatar strolled around the pool at the top of the waterfall, where water droplets splashed into the air. It turned right to reveal scenery that had not been captured in the original photo. In another example, a paper airplane flew over a Western landscape; as the plane shot onward, more features appeared.

“Every subsequent frame is generated by the AI,” Parker-Holder clarified.

Hassabis and Parker-Holder told Pelley that job-performing AI “agents” can likewise be trained in these simulated 3D environments.

A picture of a knight holding a torch appeared on the screen, standing in front of three doorways, with stairs leading up from the doorway on the right. They asked one of their “most capable AI agents” to go up the stairs, Parker-Holder explained.

With fresh walls emerging around him and blue light streaming over the stairway, the AI-controlled knight ascended the stairs. “The Genie world model is creating the world around it on the fly and sort of imagining what’s up there,” Parker-Holder stated.

Pelley questioned Hassabis about the technology’s potential applications. According to Hassabis, there are numerous ramifications for entertainment, including the creation of games and videos.

The larger objective, however, is to create a world model that can comprehend our reality. Future iterations of this technology, according to Hassabis, might produce an endless number of simulated worlds in which AI agents may perform activities, pick up new abilities, and engage with objects and people. According to Hassabis, robots could benefit from this training as well.

Collecting data in the real world is significantly more difficult, costly, and time-consuming, Hassabis said. “For instance, robotics data.

“In the real world, that is something you can only gather in small quantities. On the other hand, you can gather nearly infinite amounts in virtual environments. As a result, you would initially train the robot in virtual environments. At the end, you would adjust it based on a small amount of actual data.”

Pelley was curious about the possibility of using Google’s vast collection of geographic data—gathered for Google Earth, Maps, and Street View—to train artificial intelligence.

That is exactly what they are investigating right now, Hassabis said. They may use data from Street View to help their AI systems grasp the real world and geography.

“However, you can envision things like bringing to life still shots of genuine locations, whether they are from Street View or your own vacation pictures, [and] making them 3D and interactive so you can explore.”
