A new artificial intelligence created by researchers at the Massachusetts Institute of Technology pulls off a staggering feat: by analyzing only a short audio clip of a person’s voice, it reconstructs what they might look like in real life.
The AI’s results aren’t perfect, but they’re pretty good: a remarkable and somewhat terrifying example of how a sophisticated AI can make incredible inferences from tiny snippets of data.
In a paper published this week to the preprint server arXiv, the team describes how it trained a neural network to analyze short voice clips and “match several biometric characteristics of the speaker,” resulting in “matching accuracies that are much better than chance.”
That’s the carefully couched language of the researchers. In practice, the Speech2Face algorithm seems to have an uncanny knack for producing rough likenesses of people based on nothing but their speaking voices.
The MIT researchers urge caution on the project’s GitHub page, acknowledging that the tech raises worrisome questions about privacy and discrimination.
“Although this is a purely academic investigation, we feel that it is important to explicitly discuss in the paper a set of ethical considerations due to the potential sensitivity of facial information,” they wrote, suggesting that “any further investigation or practical use of this technology will be carefully tested to ensure that the training data is representative of the intended user population.”