Forget AI that can draw pictures; Google's newest AI model can control a robot.
On Friday, Google unveiled Robotics Transformer 2 (RT-2), a vision-language-action (VLA) model that can translate text and images into robotic actions.
According to Vincent Vanhoucke, Head of Robotics at Google DeepMind, RT-2 transfers knowledge from web data to inform robot behavior, much as language models trained on internet text learn general ideas and concepts. In other words, RT-2 can "speak robot."
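To make that idea concrete, here is a minimal, hypothetical sketch of what a VLA inference loop might look like. None of this is Google's actual API: the model interface, the token format, and the action fields are illustrative assumptions modeled on the public description of RT-2, which represents robot actions as text-like tokens that the model generates alongside language.

```python
# Hypothetical sketch of a vision-language-action (VLA) inference loop.
# Nothing here is Google's real API: the model interface, the 256-bin
# token format, and the action fields are illustrative assumptions
# based on public descriptions of RT-2.

from dataclasses import dataclass


@dataclass
class Action:
    dx: float      # end-effector position delta in meters (assumed)
    dy: float
    dz: float
    gripper: bool  # True = close gripper (assumed encoding)


def decode_action(token_string: str) -> Action:
    """De-tokenize a model output like '200 128 60 1' into a robot
    command. The discretization into integer bins is an assumption
    modeled on how RT-2 is described as representing actions."""
    bins = [int(t) for t in token_string.split()]
    # Map a 0-255 bin to a small position delta of roughly +/-5 cm.
    to_meters = lambda b: (b / 255.0 - 0.5) * 0.1
    return Action(
        dx=to_meters(bins[0]),
        dy=to_meters(bins[1]),
        dz=to_meters(bins[2]),
        gripper=bool(bins[3]),
    )


def control_step(model, camera_image, instruction: str) -> Action:
    """One closed-loop step: the model sees the current camera frame
    plus a natural-language instruction and emits the next action
    as a string of tokens, which we decode into a motor command."""
    token_string = model.generate(image=camera_image, text=instruction)
    return decode_action(token_string)


# Usage with a stand-in model (real inference would use a trained VLA):
class FakeModel:
    def generate(self, image, text):
        return "200 128 60 1"  # canned action tokens for demonstration


action = control_step(FakeModel(), camera_image=None,
                      instruction="pick up the apple")
print(action)  # decoded Action with small position deltas, gripper=True
```

The key point the sketch illustrates is that the same generative interface used for language can drive a robot: the model's output vocabulary simply includes action tokens as well as words.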
Vanhoucke contends that while chatbots can be taught by feeding them information on a subject, robots must go a step further and gain "grounding" in the real world. He uses a red apple as his illustration: a chatbot can learn what an apple is just by being told, but a robot needs to know the apple's physical details, be able to tell it apart from something similar (such as a red ball), and know how to pick it up.
RT-2 goes beyond Google's RT-1 and other earlier models by drawing on web data. With an earlier model, for example, if you wanted a robot to throw something away, you would have to explicitly teach it what trash is and how to dispose of it. With RT-2, the robot can work out those details on its own from web data, even if you never explicitly explained them.
Thanks to RT-2, robots can now learn and apply what they have learned to new situations. However, Google cautions that RT-2 can currently only help a robot get better at physical tasks it already knows how to perform, not learn them from scratch.
Even so, it represents a significant step forward and hints at what may be possible in the future. Google shares more detail about how RT-2 works on its DeepMind blog.