Google has just unveiled Gemini, its most powerful suite of AI models to date, and it is already being accused of misrepresenting the model's performance.
In a recent Bloomberg opinion piece, columnist Parmy Olson argues that Google misrepresented Gemini's strengths. As part of this week's announcement, Google released an impressive hands-on video (the one with the "what the quack" moment); Olson says the model appeared remarkably capable in it, perhaps too capable.
The six-minute video demonstrates Gemini's multimodal capabilities, pairing image recognition with spoken conversational prompts. Gemini appears to recognize images instantly, even connect-the-dots ones; it responds within seconds and tracks a crumpled wad of paper through a cup-and-ball game in real time. Humans can do all of those things, but here an AI appears to identify what it sees and predict what will happen next.
However, the video's description on YouTube carries a significant disclaimer:
For the purposes of this demo, latency has been reduced and Gemini outputs have been shortened for brevity.
That's what Olson takes issue with. In response to a request for comment, Google reportedly acknowledged that the video demo wasn't conducted in real time with spoken prompts; instead, it was made by feeding Gemini still image frames from raw footage along with written text prompts to answer. As Olson notes, that's quite different from what Google appeared to be implying: that a person could hold a natural voice conversation with Gemini while it watched and responded to its surroundings in real time.
To be fair to Google, companies edit demo videos all the time, not least because they want to avoid the technical hiccups that live demos invite, and minor adjustments are typical. But Google has a track record of dubious video demos. Remember Duplex, the AI voice assistant that called restaurants and hair salons to make reservations? The conspicuous lack of background noise and the unusually accommodating staff led some to question whether that demo was real. And recorded videos of AI models tend to raise people's suspicions even further: recall how Baidu's shares fell after it launched Ernie Bot with pre-recorded demos.
Olson argues that this kind of showboating is Google's way of diverting attention from the fact that Gemini still lags behind OpenAI's GPT models.
Google disagrees. Asked about the demo's validity, it pointed The Verge to a post by Oriol Vinyals, Google's vice president of research and deep learning lead (and co-lead for Gemini), detailing how the video was made.
According to Vinyals, all the user prompts and outputs in the video are real, though shortened for brevity. The video illustrates the kinds of multimodal user experiences that could be built with Gemini, and it was made to inspire developers.
He added that the team fed Gemini images and text and asked it to predict what would happen next.
That's certainly one way to handle the matter, but it may not be the best one for Google, which, at least in the public's eyes, already looks as if it was caught off guard by OpenAI's runaway success this year. You don't inspire developers with painstakingly crafted sizzle reels that may misrepresent what the AI can actually do; you do it by giving developers and journalists hands-on access to the product. Let people play with Gemini in a limited public beta. Show us its real power.