Technology giant Google is pushing local languages in India, as more non-English speakers come online in the country.
“India has added over 100 million new internet users from rural India in the last two years. Every new user coming online is an Indian language user, and we are committed to playing a part,” said Sanjay Gupta, country head and VP, Google India. “Today, we are calling on the industry to take a Bharat-first approach and build an internet that works for every Indian.”
On Thursday, Google unveiled a machine learning tool for Indian languages to help researchers, students and startups build local language technologies on a common framework that spans several of the country's languages.
Called Multilingual Representations for Indian Languages (MuRIL), the model aims to improve how computer systems understand Indian languages, including complexities such as transliteration, spelling variations, mixed-language text and other use cases specific to the Indian context. It also supports transliterated text, such as Hindi written in the Roman script.
MuRIL was developed at Google’s India research unit and currently supports 16 Indian languages plus English, which the company says is the widest coverage of Indian languages among publicly available machine learning models of its kind.
The model is built on BERT (Bidirectional Encoder Representations from Transformers), the company’s own language model, which is currently used to parse almost all English queries on its search engine.
“MuRIL is a starting point of what we believe can be the next big evolution for Indian language understanding. We hope it will prove to be a better foundation for researchers, startups, students, and anyone else interested in building Indian language technologies,” said Partha Talukdar, Research Scientist, Google Research India.
Talukdar said the model will reduce the time researchers and startups need to train machine learning models, because it serves as a common foundation for transferring knowledge and learning from one language model to another.
Google has made MuRIL free and open source, available for download and use from TensorFlow Hub, the model repository for its machine learning platform TensorFlow. Talukdar said the model was trained entirely on publicly available data to make it easier for researchers to reproduce the results.
“On academic data sets that we have evaluated MuRIL on, we have seen that it significantly outperforms the earlier model by 10% on native texts and by about 27% on the transliterated text,” Talukdar said.
Google also debuted a range of new Indian language features across its products during an event on Thursday. These include the ability for users to toggle search results between English and four Indian languages (Tamil, Telugu, Bangla and Marathi), surfacing relevant local language content for bilingual users, and support for nine Indian languages in Google Maps.
A new feature called Homework Help also lets students learn how to solve complex math problems, such as quadratic equations, using Google Lens. Students can photograph the problem with Lens from the search bar in the Google app and browse step-by-step guides and videos that explain the solution. The feature resembles an offering from Doubtnut, an edtech startup backed by Tencent and Sequoia, although Doubtnut covers a broader set of subjects, including physics, chemistry and biology.
These developments come at a time when Indian language users are expected to drive future growth and account for a majority of the country’s internet user base in the coming years.
In July, Google earmarked $10 billion for investment in India over the next five to seven years, with a focus on areas crucial to the country’s digitisation efforts.
Gupta said the company has formulated a three-point strategy to drive these efforts.
Google plans to invest in machine learning and artificial intelligence research at its research centre in India to improve language understanding, and to make its models accessible to everyone across the ecosystem.
“We also plan to invest and deeply partner with local startups who are building solutions to cater to the needs of Indians in local languages as well as improve the experience of our own products for Indian language users,” he said.
This article has been published from a wire agency feed without modifications to the text. Only the headline has been changed.