Scientists develop AI that translates brainwaves into sentences

Scientists have developed a new artificial intelligence-based system that converts brain activity into text, a development that could transform communication for people who cannot speak or type.

Electrodes placed on the brain have been used to translate brainwaves into words spoken by a computer, a technique that could help people who have lost the ability to speak. When you speak, your brain sends signals from the motor cortex to the muscles in your jaw, lips, and larynx, coordinating their movement to produce sound.



The AI-based system works by using brain implants to track the signals that neurons generate when someone speaks. According to experts, the system could eventually aid communication for patients who are unable to speak or type, such as those with locked-in syndrome.

“The brain translates the thoughts of what you want to say into movements of the vocal tract, and that’s what we’re trying to decode,” says Edward Chang at the University of California, San Francisco (UCSF).




The team of scientists created a two-step process to decode those thoughts: an array of electrodes surgically placed onto the part of the brain that controls movement, and a computer simulation of a vocal tract to reproduce the sounds of speech.
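As a rough sketch of this two-step idea, brain signals are first mapped to estimated vocal-tract movements, and those movements are then mapped to acoustic features. The array sizes and linear maps below are invented stand-ins for the trained neural networks, not the study's actual model:

```python
import numpy as np

rng = np.random.default_rng(0)

N_ELECTRODES = 256   # recording channels on the motor cortex (assumed)
N_ARTICULATORS = 33  # vocal-tract movement features (assumed)
N_ACOUSTIC = 32      # acoustic features per time step (assumed)

# Placeholder "trained" weights for the two stages.
W_neural_to_kinematics = rng.normal(size=(N_ELECTRODES, N_ARTICULATORS))
W_kinematics_to_sound = rng.normal(size=(N_ARTICULATORS, N_ACOUSTIC))

def decode(neural_activity: np.ndarray) -> np.ndarray:
    """Map (time_steps, N_ELECTRODES) recordings to (time_steps, N_ACOUSTIC) features."""
    kinematics = neural_activity @ W_neural_to_kinematics  # stage 1: brain -> vocal tract
    acoustics = kinematics @ W_kinematics_to_sound         # stage 2: vocal tract -> sound
    return acoustics

# Fake recordings: 200 time steps across all electrodes.
signals = rng.normal(size=(200, N_ELECTRODES))
features = decode(signals)
print(features.shape)  # (200, 32)
```

The intermediate vocal-tract step is what distinguishes this design from decoding sound directly from brain activity.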

They worked with five participants who had electrodes on the surface of their motor cortex as part of their treatment for epilepsy. The participants were asked to read 101 sentences aloud, containing words and phrases covering the sounds of English, while the team recorded the signals sent from the motor cortex during speech.




This data was then fed into a machine-learning algorithm, a type of artificial intelligence system, which converted the brain activity data for each spoken sentence into a string of numbers.

To make sure the numbers related only to aspects of speech, the system compared sounds predicted from small chunks of the brain activity data with actual recorded audio. The string of numbers was then fed into a second part of the system, which converted it into a sequence of words.
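A toy illustration of that comparison step, assuming predicted and recorded audio are both represented as sequences of feature vectors scored chunk by chunk with mean-squared error. The real system compares learned acoustic representations; the function name and sizes here are invented:

```python
import numpy as np

def chunk_error(predicted: np.ndarray, recorded: np.ndarray, chunk: int = 20) -> list:
    """Per-chunk mean-squared error between predicted and recorded features."""
    errors = []
    for start in range(0, len(recorded) - chunk + 1, chunk):
        p = predicted[start:start + chunk]
        r = recorded[start:start + chunk]
        errors.append(float(np.mean((p - r) ** 2)))
    return errors

rng = np.random.default_rng(1)
recorded = rng.normal(size=(100, 32))                       # actual audio features
predicted = recorded + 0.1 * rng.normal(size=(100, 32))     # a close prediction
print(chunk_error(predicted, recorded))  # five small per-chunk errors (≈ 0.01 each)
```

Low per-chunk errors indicate the numbers the network produces track the speech audio rather than unrelated brain activity.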

After that, the team trained an algorithm to reproduce the sound of a spoken word in real time from the collection of signals sent to the lips, jaw, and tongue, while keeping word error rates as low as possible.

According to the researchers, the system was not perfect at first. For example, the sentence “Those musicians harmonize marvelously” was decoded as “The spinach was a famous singer”, and “A roll of wire lay near the wall” became “Will robin wear a yellow lily”.

“Many of the mistaken words were similar in sound to the original word, rodent for rabbit, and we found that in many cases the gist of the sentence could be understood,” says Josh Chartier.

He says the artificial neural network did well at decoding fricative sounds, like the ‘sh’ in ‘ship’, but had a harder time with plosives, such as the ‘b’ sound in ‘bob’.

Gradually, however, the team found the accuracy to be far higher than that of previous approaches.

The team says “robust performance” was possible when training the device on just 25 minutes of speech, but the decoder improved with more data. For this study, they trained the decoder on each participant’s spoken language to produce audio from their brain signals.

Once they had generated audio files based on the signals, the team asked hundreds of native English speakers to listen to the output sentences and identify the words from a set of 10, 25 or 50 choices.

The listeners transcribed 43 percent of the trials perfectly when they had 25 words to choose from, and 21 percent perfectly when they had 50 choices. One listener provided a perfect transcription for 82 sentences with the smaller word list and 60 with the larger.

The researchers added that the system’s accuracy varied from person to person. For one participant, just 3% of each sentence on average needed correcting, which is better than the word error rate of 5% for professional human transcribers.
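The word error rate cited here is a standard metric: the minimum number of word-level substitutions, insertions, and deletions needed to turn the decoded sentence into the reference, divided by the reference length. A minimal implementation using the usual dynamic-programming (Levenshtein) approach:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn hyp[:j] into ref[:i]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / len(ref)

# One of the early mistakes quoted above scores as a total miss:
print(word_error_rate("a roll of wire lay near the wall",
                      "will robin wear a yellow lily"))  # → 1.0
```

A 3% rate therefore means roughly one wrong word in every 33, while the quoted example above gets every word wrong.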

“It’s intelligible enough if you have some choice, but if you don’t have those choices, it might not be,” says Marc Slutzky at Northwestern University in Illinois. “To be fair, for an ultimate clinical application in a paralyzed patient, if they can’t say anything, even having a vocabulary of a few hundred words could be a huge advance.”

That may be possible in the future, he says, as the team showed that an algorithm trained on one person’s speech output could be used to decode words from another participant.

The team also asked one participant to mime speech by moving their mouth without making any sounds. The system did not work as well as it did with spoken words, but the researchers were still able to decode some intelligible speech from the mimed words.

Dr. Mahnaz Arvaneh, an expert in brain-machine interfaces at Sheffield University, said it was important to consider ethical issues now. “We are still very, very far away from the point that machines can read our minds,” she said. “But it doesn’t mean that we should not think about it and we should not plan about it.”

Link to the paper:

Speech synthesis from neural decoding of spoken sentences


