The spoken word is a powerful tool, but not all of us have the ability to use it, either due to biology or circumstances. In such cases, technology can bridge the gap – and now that gap is looking shorter than ever, with a new algorithm that turns messages meant for your muscles into intelligible sounds.
Decoding the complex mix of information the brain sends to the orchestra of body parts needed to transform a puff of air into meaningful sound is by no means a simple feat.
The lips, tongue, throat, jaw, larynx, and diaphragm all need to work together in near-perfect synchrony, requiring our brain to become a master conductor when it comes to uttering even the simplest of phrases.
Researchers from the University of California, San Francisco (UCSF) had their work cut out for them when they set out to tap into this confusing pattern of neurological baton-waving in order to create artificial speech.
There are a few different ways to go about it, it seems. Earlier this year, a team led by Columbia University successfully used a completely different approach to turning brain activity into audible language.
Their method reconstructed one-syllable words based directly on the brain’s perception of spoken sounds lifted from the human auditory cortex. Synthetic speech produced this way could be understood three quarters of the time – not a bad result, all things considered.
But turning the words as our brain interprets them directly into actual speech risks introducing distortions that make the words hard to understand.
A better way, based on earlier research carried out by the UCSF team, might be to decode the signals a brain sends to the body’s vocal equipment and then work out how they will turn into the kinds of articulations responsible for making sounds.
In principle, translating the signals into muscle movements first would give a clearer result, and therefore one that is easier to interpret, than a single-step translation of the brain signals alone.
To test their idea, the researchers recruited five volunteers who were already undergoing brain surgery to treat their chronic epilepsy.
As a part of their procedure, the patients had an array of electrodes implanted right against the surface of their brain – just the thing to sift out neurological messages that pulled the strings on their speech systems.
They also had sensors glued to their tongue, teeth, and lips to keep track of their movements.
Once they were all wired up, the subjects read hundreds of words and sentences from a speech recognition database, as well as a number of passages from famous tales like Sleeping Beauty and The Hare and the Tortoise.
To exclude possible signals that are the result of hearing their own voice, one volunteer silently mimed their reading, stopping just short of turning their speech into audible sounds.
Patterns of brain signals generated exclusively to control the movement of lips, tongue, and jaw were then sifted out of the results by a specially designed algorithm.
It was this map of movements that formed the song-sheet of sounds to be generated by a speech synthesiser.
The results are remarkable. They’re not quite perfect, but it’s hard not to be impressed. Just have a listen to the clips below with your eyes closed.
More than 1,700 participants from the crowdsourcing marketplace Amazon Mechanical Turk did their best to guess which words in long lists of possibilities matched the synthesised sentences they heard.
The results were rather varied. One astute listener nailed every sentence. When given a list of 25 possible words, listeners in general transcribed just under half of the sentences perfectly.
Yet some sentences were a lot easier than others, and even some of the more garbled strings of sound still had words that clearly stood out.
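A listening test like this is typically scored by comparing each transcription with the sentence that was actually spoken. The sketch below shows one common way to do that, counting word-level errors and the share of sentences transcribed perfectly. It is an assumption about how such scoring could work, not the study's own code, and the function names and example sentences are made up for illustration.

```python
def word_errors(reference, hypothesis):
    """Word-level edit distance: substitutions, insertions and deletions."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from transcript
                            curr[j - 1] + 1,      # extra word in transcript
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def fraction_perfect(pairs):
    """Share of (spoken, transcribed) pairs with zero word errors."""
    perfect = sum(1 for ref, hyp in pairs if word_errors(ref, hyp) == 0)
    return perfect / len(pairs)


# Made-up example transcriptions (assumption, not data from the study).
pairs = [
    ("the hare ran ahead", "the hare ran ahead"),
    ("the tortoise won the race", "the tortoise one race"),
]
```

Counting errors per word as well as per sentence captures the observation that even a garbled sentence can contain words that come through clearly.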
Turning this kind of research into marketable technology will take a lot more work, not to mention overcoming the practical and ethical hurdles involved in delivering neural implants.
Still, the advances clearly speak for themselves.
This research is published in Nature.