Brain signals converted directly into speech
In a scientific first, neuroengineers from the Zuckerman Institute at Columbia University have developed a system that translates thought into intelligible, recognizable speech.
By monitoring someone’s brain activity, the technology can reconstruct the words a person hears with unprecedented clarity. This breakthrough, which combines the power of speech synthesizers and artificial intelligence, could lead to new ways for computers to communicate directly with the brain. It also lays the groundwork for helping people who cannot speak – such as those living with amyotrophic lateral sclerosis (ALS) or recovering from a stroke – regain their ability to communicate with the outside world.
“Our voices help connect us to our friends, family and the world around us, which is why losing the power of one’s voice due to injury or disease is so devastating,” said Nima Mesgarani, Ph.D., senior author and a principal investigator at the Zuckerman Mind Brain Behavior Institute. “With today’s study, we have a potential way to restore that power. We’ve shown that, with the right technology, these people’s thoughts could be decoded and understood by any listener.”
Decades of research has shown that when people speak – or even imagine speaking – telltale patterns of activity appear in their brain. Distinct signal patterns also emerge when we listen to someone speak, or imagine listening. Experts trying to record and decode these patterns see a future in which thoughts need not remain hidden inside the brain, but could instead be translated into verbal speech at will.
However, accomplishing this feat has proven challenging. Early efforts to decode brain signals focused on simple computer models that analyzed spectrograms, which are visual representations of how the frequencies in a sound change over time. Because this approach had failed to produce anything resembling intelligible speech, Dr. Mesgarani’s team turned instead to a vocoder, a computer algorithm that can synthesize speech after being trained on recordings of people talking.
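For readers unfamiliar with spectrograms, the short Python sketch below shows one common way to compute one: slice the audio into overlapping windowed frames and take the magnitude of each frame's Fourier transform. It is an illustration of the general idea only, not the Columbia team's code; the frame length, hop size, Hann window and the synthetic 1 kHz test tone are all assumptions chosen for the example.

```python
import numpy as np


def spectrogram(signal: np.ndarray, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Return a magnitude spectrogram with shape (num_frames, frame_len // 2 + 1).

    Each row holds the frequency magnitudes of one Hann-windowed frame, so
    reading down the rows shows how the sound's frequency content changes
    over time. frame_len and hop are illustrative defaults, not study values.
    """
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    # Cut the signal into overlapping frames and taper each with the window.
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(num_frames)
    ])
    # rfft keeps only the non-negative frequencies of each real-valued frame.
    return np.abs(np.fft.rfft(frames, axis=1))


# Example: one second of a synthetic 1 kHz tone sampled at 16 kHz.
sample_rate = 16_000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = spectrogram(tone)
# Convert the strongest frequency bin (averaged over time) back to hertz.
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sample_rate / 256
print(spec.shape, peak_hz)
```

A pure tone shows up as a single bright horizontal band at its frequency; speech produces a far richer, rapidly changing pattern, which hints at why models working directly on this representation struggled to produce intelligible output.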
“This is the same technology used by Amazon Echo and Apple Siri to give verbal responses to our questions,” said Mesgarani.
To teach the vocoder to interpret brain activity, Dr. Mesgarani teamed up with Ashesh Mehta, PhD, a neurosurgeon at Northwell Health Physician Partners Neuroscience Institute who treats epilepsy patients, some of whom must undergo regular surgeries.
“Working with Dr. Mehta, we asked epilepsy patients already undergoing brain surgery to listen to sentences spoken by different people, while we measured patterns of brain activity,” said Dr. Mesgarani. “These neural patterns trained the vocoder.”
Next, those same patients listened to speakers reciting digits from 0 to 9 while their brain signals were recorded. The sound the vocoder produced in response to those signals was then analyzed and cleaned up by neural networks, a type of AI loosely modeled on the way neurons work in the biological brain. The end result was a robotic-sounding voice reciting a sequence of numbers. To test the recording’s accuracy, Dr. Mesgarani’s team asked individuals to listen to it and report what they heard.
“We found that people could understand and repeat the sounds about 75% of the time, which is well beyond any previous attempts,” said Mesgarani. The improvement in intelligibility was especially evident when comparing the new recordings to the earlier, spectrogram-based attempts. “The sensitive vocoder and powerful neural networks represented the sounds the patients had originally listened to with surprising accuracy.”
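The article does not say exactly how listener responses were scored. Assuming the simplest scheme, where each played digit is compared with the digit the listener reported, an intelligibility figure like the one quoted is just the fraction of matches. The sketch below shows that calculation; the function name `intelligibility` and the trial lists are hypothetical, made up for illustration and not taken from the study.

```python
from typing import Sequence


def intelligibility(played: Sequence[int], reported: Sequence[int]) -> float:
    """Fraction of trials where the listener's answer matches the digit played.

    Hypothetical helper: assumes one reported digit per played digit and
    simple exact-match scoring, which the article does not confirm.
    """
    if len(played) != len(reported):
        raise ValueError("each played digit needs exactly one listener report")
    if not played:
        raise ValueError("no trials to score")
    correct = sum(p == r for p, r in zip(played, reported))
    return correct / len(played)


# Made-up trials for illustration only, not data from the study.
played = [1, 2, 3, 4, 5]
reported = [1, 2, 3, 4, 0]
print(f"{intelligibility(played, reported):.0%}")
```

Here four of the five invented trials match, giving 80%; a real evaluation would pool many listeners and many trials before reporting a percentage.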
Dr. Mesgarani and his team now plan to test more complicated words and sentences, and want to run the same tests on brain signals emitted when a person speaks or imagines speaking. Ultimately, they hope their system could be part of an implant, similar to those worn by some epilepsy patients, that translates the wearer’s thoughts directly into words.
“In this scenario, if the wearer thinks ‘I need a glass of water,’ our system could take the brain signals generated by that thought, and turn them into synthesized, verbal speech,” said Dr. Mesgarani. “This would be a game changer. It would give anyone who has lost their ability to speak, whether through injury or disease, the renewed chance to connect to the world around them.”