Auditory neuroscientists at the University of Pittsburgh have presented a machine learning model that helps explain how the brain interprets communication sounds, such as animal calls or spoken language.
The algorithm described in the study models how social animals, such as guinea pigs and marmoset monkeys, use sound-processing networks in their brains to distinguish between different sound categories, such as calls for mating, food, or danger, and act on them.
The research represents an important advance in our understanding of the intricacy of the neural processing that underlies sound perception. The insights gained pave the way for understanding, and eventually treating, disorders that affect speech recognition, and for building better hearing aids.
Almost everyone we know will lose some of their hearing eventually, whether through ageing or exposure to loud noise. According to senior author Srivatsun Sadagopan, Ph.D., assistant professor of neurobiology at Pitt, it is crucial to understand the biology of sound recognition and to find ways to improve it. But vocal communication is a fascinating process in its own right: the way our brains communicate with one another and translate ideas into sound is nothing short of amazing.
From the uproar of the jungle to the hum of a crowded restaurant, humans and animals encounter an incredible array of sounds every day. Despite the sound pollution in the world around us, humans and animals are still able to communicate and understand one another, regardless of differences in voice pitch or accent.
For instance, when we hear the word “hello,” we recognize its meaning regardless of whether the speaker has an American or British accent, whether they are a man or a woman, or whether we are at a busy intersection or in a quiet room.
The team’s first hypothesis was that the brain might recognize and interpret communication sounds in much the same way it recognizes faces as distinct from other objects. Faces are highly diverse, yet they share some common traits.
Instead of trying to match every face we come across to some ideal “template” face, our brain picks out relevant features, such as the eyes, nose, and mouth, and their relative positions, and builds a mental map of these minute details that identify a face.
In a series of studies, the researchers showed that communication sounds may likewise be built up from such minute features. They first constructed a machine learning model of sound processing to recognize the different sounds produced by social animals.
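The study’s own code is not reproduced here, but the general idea of sorting calls into categories from compact spectro-temporal features can be sketched roughly as follows. The library choices (librosa, scikit-learn), the feature summary, and all function names are illustrative assumptions, not the authors’ implementation.

```python
# Rough sketch (not the authors' code): classify animal calls into categories
# such as mating, food, or danger calls from spectro-temporal features.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def call_features(wav_path, sr=22050, n_mels=64):
    """Summarize one recorded call as a compact spectro-temporal feature vector."""
    y, _ = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel)
    # Crude stand-in for "informative features": per-band mean and spread.
    return np.concatenate([log_mel.mean(axis=1), log_mel.std(axis=1)])

def train_call_classifier(wav_files, labels):
    """wav_files: call recordings; labels: call category per file (hypothetical data)."""
    X = np.stack([call_features(f) for f in wav_files])
    X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2)
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("held-out accuracy:", clf.score(X_test, y_test))
    return clf
```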
To see whether brain responses matched the model, they recorded the brain activity of guinea pigs listening to their kin’s communication sounds. Much like the machine learning model, neurons in sound-processing regions of the brain lit up with a flurry of electrical activity when the animals heard a sound containing features characteristic of particular call types.
The next step was to compare the model’s performance against the animals’ real-life behaviour.
Guinea pigs were placed in an enclosure and played squeaks and grunts, which are distinct categories of sound signals. The animals were then trained to walk to different corners of the enclosure, depending on which type of sound was played, to receive fruit rewards.
The researchers then made the task harder. Using sound-altering software, they sped up or slowed down the guinea pig calls, raised or lowered their pitch, added noise and echoes, or applied several of these changes at once, mimicking the way humans must recover the meaning of words spoken with different accents.
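The article does not name the sound-altering software or its settings, but the kinds of manipulations described (time-stretching, pitch-shifting, added noise, and a simple echo) could be sketched along these lines; the libraries and parameter values below are illustrative assumptions.

```python
# Illustrative sketch of the call manipulations described above;
# rates, pitch steps, noise levels, and echo settings are arbitrary.
import numpy as np
import librosa

def alter_call(y, sr, rate=1.2, semitones=2.0, snr_db=10.0, echo_ms=80, echo_gain=0.5):
    # Speed the call up (rate > 1) or slow it down (rate < 1) without changing pitch.
    y = librosa.effects.time_stretch(y, rate=rate)
    # Raise (positive) or lower (negative) the pitch by a number of semitones.
    y = librosa.effects.pitch_shift(y, sr=sr, n_steps=semitones)
    # Add white noise at the requested signal-to-noise ratio (in dB).
    noise = np.random.randn(len(y))
    noise *= np.sqrt(np.mean(y ** 2) / 10 ** (snr_db / 10)) / np.sqrt(np.mean(noise ** 2))
    y = y + noise
    # Add a single delayed, attenuated copy of the call as a simple echo.
    delay = int(sr * echo_ms / 1000)
    echoed = np.copy(y)
    echoed[delay:] += echo_gain * y[:-delay]
    return echoed
```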
The animals kept performing the task as consistently as they did with unaltered calls, even when the calls were degraded by noise or artificial echoes. Furthermore, the machine learning model accurately predicted their choices, as well as the underlying activation of sound-processing neurons in the brain.
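One simple way to quantify that kind of agreement, comparing the model’s trial-by-trial category decisions with the animals’ choices, is shown below; the variables are hypothetical stand-ins, not the study’s actual analysis or data.

```python
# Hypothetical sketch: agreement between model decisions and animal choices
# on the same set of altered calls (the data below are made up).
import numpy as np

def choice_agreement(model_choices, animal_choices):
    """Fraction of trials on which the model and the animal picked the same category."""
    return float(np.mean(np.asarray(model_choices) == np.asarray(animal_choices)))

model_choices = ["squeak", "grunt", "squeak", "squeak"]
animal_choices = ["squeak", "grunt", "grunt", "squeak"]
print(choice_agreement(model_choices, animal_choices))  # 0.75
```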
As a next step, the researchers are working to translate the model’s accuracy with animal calls to human speech.
From an engineering perspective, there are considerably better speech recognition models available. What makes this model distinctive is that it closely tracks behaviour and brain activity, which helps us understand the biology. According to lead author Satyabrata Parida, Ph.D., a postdoctoral scholar in Pitt’s department of neuroscience, “in the future, these insights can be used to aid individuals with neurodevelopmental conditions or to help engineer better hearing aids.”
According to Manaswini Kar, a student in the Sadagopan lab, many people live with conditions that make it difficult for them to recognize speech. A better understanding of how a neurotypical brain processes language and makes sense of the auditory world around it will make it possible to understand, and eventually help, those who struggle.