Even if death metal isn’t a perfect fit for you as far as music genres go, you have to admire the AI smarts behind Relentless Doppelganger – a non-stop, 24/7 YouTube livestream churning out heavy death metal generated completely by algorithms.
And this is by no means a one-off trick by Dadabots, the neural network band behind the channel: the project had already produced 10 albums before this livestream even appeared.
We have to admit the computer-generated sounds of the livestream, all mangled lyrics and frenetic drum beats, sound unnerving to us. Your mileage and musical taste may vary, but there’s no doubting the impressiveness of the science behind it.
It’s the work of music technologists CJ Carr and Zack Zukowski, who have been experimenting for years with ways to get artificial intelligence to produce recognisable music in genres like metal and punk.
“This early example of neural synthesis is a proof-of-concept for how machine learning can drive new types of music software,” writes the pair in a 2018 paper. “Creating music can be as simple as specifying a set of music influences on which a model trains.”
The deep learning behind the YouTube channel is trained on samples of a real death metal band called Archspire, hailing from Canada. These real audio snippets are fed through the SampleRNN neural network to try and create realistic imitations.
Like other AI-powered imitation engines we’ve seen, SampleRNN learns by comparison. During training, its predictions are checked against the real recordings, and the size of the error indicates which parts of its neural network to tweak and strengthen.
The more data that SampleRNN can be trained on, the better it sounds… or to be more accurate, the more like its source material it sounds.
“Early in its training, the kinds of sounds it produces are very noisy and grotesque and textural,” Carr told Jon Christian at The Outline back in 2017. “As it improves its training, you start hearing elements of the original music it was trained on come through more and more.”
SampleRNN was originally developed to act as a text-to-speech generator, but Carr and Zukowski have adapted it to work on music genres as well. It’s effectively trying to predict what should happen next based on what it’s just played – sometimes making tens of thousands of predictions a second.
It can also go back to correct previous ‘mistakes’ – audio output that doesn’t sound as it should do – but this only extends back a few hundred milliseconds. The result is the Relentless Doppelganger video.
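The idea of predicting what comes next from what has just been played can be sketched in a few lines of Python. This is only a toy illustration, not Dadabots' or SampleRNN's actual code: the function names, the 256-level quantisation, the 64-sample context window, and the hand-written stand-in for a trained network are all assumptions made for the example.

```python
import random

QUANT_LEVELS = 256   # assumption: each audio sample is one of 256 discrete levels
CONTEXT_LEN = 64     # assumption: how many past samples the toy model looks at


def toy_next_sample_weights(context):
    """Hypothetical stand-in for a trained network.

    Returns one weight per possible next level, favouring levels close to
    the most recent sample so the output changes smoothly.
    """
    last = context[-1]
    return [1.0 / (1 + abs(level - last)) ** 2 for level in range(QUANT_LEVELS)]


def generate(num_samples, seed=0):
    """Generate audio one sample at a time, each conditioned on recent output."""
    rng = random.Random(seed)
    audio = [QUANT_LEVELS // 2]  # seed value at mid-level, i.e. silence
    for _ in range(num_samples):
        context = audio[-CONTEXT_LEN:]               # what it has "just played"
        weights = toy_next_sample_weights(context)   # scores for each next level
        next_sample = rng.choices(range(QUANT_LEVELS), weights=weights, k=1)[0]
        audio.append(next_sample)                    # output feeds the next step
    return audio[1:]  # drop the seed value


# A tenth of a second of "audio" at an assumed 16,000 samples per second.
clip = generate(1600)
```

Each pass through the loop is one prediction, which is why generating a single second of sound at this assumed rate would take 16,000 of them. A real model would replace `toy_next_sample_weights` with a neural network trained on recordings.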
The team behind the livestream thinks the fast and aggressive playing of Archspire particularly suits their approach – in other words, were it applied to a different band, the results wouldn’t be quite as realistic.
“Most nets we trained made shitty music,” Carr told Rob Dozier at Motherboard. “Music soup. The songs would destabilise and fall apart. This one was special though.”
The project continues. If you like what you hear on the YouTube livestream, you can check out the neural network’s other creations at the Dadabots site.