Even though Google helped build the underlying technology, the sudden surge of interest in generative AI caught the company off guard. In response, it redirected significant resources toward catching up with OpenAI. Since then, Google has shipped the detail-flubbing Bard and multiple iterations of its multimodal Gemini models. Gemini has struggled to improve both benchmarks and user experience, but the new 2.5 Pro (Experimental) release may change that. Given its marked improvements in benchmarks and vibes alike, this may be the first Google model that can genuinely challenge ChatGPT’s hegemony.
Tulsee Doshi, director of product management for Gemini at Google, recently spoke with us about the release of Gemini 2.5 and the direction Google’s AI models are taking.
Google’s generative AI products may have gotten off to a slow start, but in recent months, the Gemini team has accelerated the pace. In December, the company released Gemini 2.0, a modest upgrade over the 1.5 branch. Just three months later, Gemini 2.0 Pro has been superseded by 2.5 before it even left the experimental stage. According to Doshi, this is the payoff of Google’s sustained investment in Gemini.
“A lot of the pieces and the fundamentals [we] have been building are now coming together in really awesome ways,” Doshi said, adding that this is a major reason the team believes it can accelerate the pace.
Releasing a new model means testing many candidates. According to Doshi, Google vets them with a multi-layered approach that starts with benchmarks. The team maintains a set of evaluations, including external academic benchmarks and internal evals built for the use cases it cares about, she said.
The team also uses these tests to focus on safety, which, as Google stresses at every opportunity, remains a core part of how it develops Gemini. Doshi noted that ensuring a model is safe and ready for wide release requires a great deal of hands-on work and adversarial testing.
Vibes, however, are an increasingly significant component of AI models, and we can’t overlook them. There is a lot of emphasis on the vibe of outputs—how interesting and useful they are. Then there’s the newer trend of vibe coding, in which you build things by accepting AI suggestions rather than typing the code yourself. For the Gemini team, these ideas are related: whether the product is code or simply a response to a query, the team uses user and product feedback to gauge its “vibes.”
Google has pointed out more than once that Gemini 2.5 sits atop the LM Arena leaderboard, which means people who compare its output to that of other models strongly prefer it—it has good vibes. That’s certainly a great place for Gemini to be after its long climb. However, there is growing concern in the field that giving vibes too much weight could push us toward models that make us feel good regardless of whether the output is actually good, a trait known as sycophancy.
If the Gemini team is worried about feel-good models, it isn’t letting that show. Doshi highlighted the team’s focus on code generation, noting that output can be tuned for “delightful experiences” without flattering the user. She said she thinks of vibes less as a specific personality trait the team is trying to build into the model.
Generative AI models also raise concerns about hallucinations. Gemini and Bard have put Google in plenty of awkward situations by making things up, but the team believes it’s headed in the right direction: its factuality metrics have reportedly hit an all-time high with Gemini 2.5. Will AI ever be free enough of hallucinations for people to trust it completely? No word on that front.
Don’t overthink it
Perhaps the most intriguing thing you’ll notice about Gemini 2.5 is how fast it is compared with other models that use simulated reasoning. Going forward, Google says it is building this “thinking” capability into all of its models, which should lead to better outputs. As reasoning spread through large language models in 2024, the quality of these tools improved significantly. They also became significantly more expensive to run, compounding generative AI’s already serious cost problem.
An LLM’s operational costs rise with its size and complexity. Google hasn’t disclosed technical details like parameter counts for any model newer than the 1.5 branch, but Doshi did clarify that Gemini 2.5 is “comparable” in size to 2.0, meaning it isn’t significantly larger than Google’s previous release.
One important area where Gemini 2.5 excels is the chain of thought. It’s the first Google model released to the public that supports Dynamic Thinking, a feature that lets the model adjust how much reasoning it applies to an output. This is only a first step, though.
Doshi said she believes the 2.5 Pro model Google ships today still overthinks simple prompts, something the team hopes to keep improving. Dynamic Thinking is a major area of investment on the way to a general-availability version of 2.5 Pro that thinks even less for basic requests.
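To make the idea concrete, here is a toy sketch of how a system might budget reasoning dynamically—trivial prompts get no extended "thinking," while prompts with reasoning cues get a larger token budget. This is purely illustrative: the function name, cue list, and scaling are all invented and bear no relation to Google's actual mechanism.

```python
# Hypothetical illustration -- NOT Google's actual Dynamic Thinking
# implementation. A toy heuristic that allocates a "thinking" token
# budget from rough prompt complexity, so basic requests skip
# extended reasoning entirely.

def thinking_budget(prompt: str, max_budget: int = 8192) -> int:
    """Return a reasoning-token budget scaled to prompt complexity."""
    words = prompt.split()
    # Trivial greetings and very short prompts: no extended reasoning.
    if len(words) <= 4:
        return 0
    # Cues that usually signal a multi-step reasoning task.
    reasoning_cues = ("prove", "derive", "debug", "optimize", "step by step")
    score = len(words) + sum(100 for cue in reasoning_cues if cue in prompt.lower())
    return min(max_budget, score * 8)

print(thinking_budget("Hi, how are you?"))  # trivial prompt -> 0
print(thinking_budget("Debug this recursive parser step by step please"))
```

The point of such a gate is economic as much as qualitative: every reasoning token skipped on a greeting is compute saved at serving time.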
Google doesn’t disclose the financials of its new AI ventures, but we can presume there isn’t much profit to be had yet. So far, no one has managed to turn these massive LLMs into a sustainable business; even OpenAI, which has the largest user base with ChatGPT, loses money despite customers paying for its $200 Pro plan. With Google planning to spend $75 billion on AI infrastructure in 2025, making efficient use of that very expensive hardware will be essential. Building models that don’t waste cycles overthinking “Hi, how are you?” could be a big help.
Missing technical details
Even though the firm keeps Gemini’s internals under wraps, the 2.5 Pro release has offered more hints about Google’s future plans than ever before. To fully understand this model, though, we’ll need to see the technical report. Google last published one of these documents for Gemini 1.5; we never saw a 2.0 version, and now that 2.5 has replaced 2.0, we may never see it.
2.5 Pro is still an experimental model, according to Doshi, so don’t expect comprehensive evaluation results right away. A Google representative said there is no set timeframe for the full technical report on the 2.5 branch, but one is planned. Google hasn’t even released updated model cards for Gemini 2.0, let alone 2.5. These documents are brief one-page summaries of a model’s training, intended usage, evaluation results, and other details: in essence, nutrition labels for LLMs. Though far less thorough than a technical report, a model card is still better than nothing. Google says model cards for Gemini 2.0 and 2.5 are on the way.
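For readers unfamiliar with the format, a model card typically looks something like this minimal sketch. Every name and value below is invented for illustration; it is not drawn from any real Google document.

```markdown
# Model card: ExampleLM-1 (hypothetical)

- **Training data:** Public web text and licensed corpora (cutoff: Jan 2025)
- **Intended use:** General-purpose chat and code assistance
- **Out-of-scope use:** Medical, legal, or other high-stakes advice
- **Evaluation:** Scores on public benchmarks plus internal harness results
- **Known limitations:** Hallucinates facts; limited non-English coverage
```

Even at this level of brevity, a card pins down claims—training cutoff, intended use, known failure modes—that a press release can leave vague.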
Considering how quickly releases have been coming lately, Gemini 2.5 Pro could roll out more broadly around Google I/O in May. We sincerely hope Google will provide more details as the 2.5 branch opens up. Gemini development is gaining momentum, but transparency shouldn’t be left behind.