Making New Sounds Using Artificial Intelligence


There are some people who believe that the development of artificial intelligence and machine learning will lead to humanity’s demise. This is a collaboration with a team at Google called Magenta, whose focus is on exploring how we can create art and music with machine learning. They got in touch a little while ago, showed me a bunch of things they were working on, and asked if there was anything I would want to do with it. I was really drawn to a project of theirs called NSynth, which stands for Neural Synthesizer.

It is super exciting to me because it’s an entirely new form of synthesis. Rather than manipulating sounds like most synthesizers, NSynth manipulates data: it analyzes audio files, learns how all that data relates to itself, and then produces new audio files from scratch. This opens up some interesting possibilities. For instance, if we have this vibraphone sound and this guitar sound, we can ask NSynth to generate a new sound that is 50% of each.
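To make the "50% of each" idea concrete, here is a minimal sketch of a linear blend between two sounds. It assumes each sound has already been turned into a list of numbers by some encoder. The function name `interpolate` and the values below are invented placeholders for illustration, not real NSynth encodings or NSynth's actual code.

```python
def interpolate(encoding_a, encoding_b, mix=0.5):
    """Blend two equal-length encodings; mix=0.0 returns a, mix=1.0 returns b."""
    if len(encoding_a) != len(encoding_b):
        raise ValueError("encodings must have the same length")
    return [(1.0 - mix) * a + mix * b for a, b in zip(encoding_a, encoding_b)]


# Invented placeholder numbers (assumption), standing in for two encoded sounds.
vibraphone = [0.25, -1.0, 0.5, 3.0]
guitar = [0.75, 0.0, -0.5, 1.0]

halfway = interpolate(vibraphone, guitar, mix=0.5)
print(halfway)  # [0.5, -0.5, 0.0, 2.0]
```

Changing `mix` to 0.25 or 0.75 would lean the result toward one sound or the other.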

It is interpolating between the raw data of both sounds at the level of individual samples, and when I say samples, I don’t mean like "this is a drum sample" or "this is a flute sample". I mean the building blocks of digital audio. You know how a video is made up of different frames? For instance, these are all the frames you just saw of me saying the word "frames". You are seeing 24 frames per second, and it makes for a pretty smooth recreation of all of my movement. With audio, it’s similar. You need all these tiny little individual samples of sound that get strung together to make what you hear, except with audio you need a little bit more. Right now, you are hearing 48,000 samples per second. Our ears just need that much more resolution to perceive digital audio with a quality that’s close to real life. NSynth works contextually: it generates one sample at a time, based on the previous few thousand samples as well as all the audio it’s been trained on.

NSynth sounds a little grittier because they opted for a lower sample rate of 16,000 samples per second, and that is still a ton of samples to generate. You can play with NSynth right in your browser.
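The "one sample at a time" loop can be sketched in a few lines. This is only a toy under stated assumptions: `toy_next_sample` is a hypothetical stand-in for the trained network (it just returns a damped average of the context), and the context size of 3,000 is a guess at what "a few thousand" means. The sample rates and frame rate are the ones quoted above.

```python
VIDEO_FPS = 24             # video frames per second
PLAYBACK_RATE = 48_000     # audio samples per second in the video
NSYNTH_RATE = 16_000       # audio samples per second NSynth generates
CONTEXT_SIZE = 3_000       # "previous few thousand samples" (exact value is an assumption)

# How many audio samples go by during a single video frame.
SAMPLES_PER_VIDEO_FRAME = PLAYBACK_RATE // VIDEO_FPS


def toy_next_sample(context):
    """Hypothetical stand-in for the model: a damped average of recent samples."""
    return 0.99 * sum(context) / len(context)


def generate(seed, n_samples, context_size=CONTEXT_SIZE):
    """Append one new sample at a time, each predicted from the most recent samples."""
    audio = list(seed)
    for _ in range(n_samples):
        window = audio[-context_size:]
        audio.append(toy_next_sample(window))
    return audio[len(seed):]


# One hundredth of a second of audio still takes 160 prediction steps.
clip = generate(seed=[1.0, -1.0, 0.5], n_samples=NSYNTH_RATE // 100)
print(SAMPLES_PER_VIDEO_FRAME, len(clip))  # 2000 160
```

Even at the lower rate, a full second of sound means 16,000 passes through that loop, which gives a sense of why the sample rate was reduced.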

Since I had access to Magenta’s resources, though, I wanted to do some experimenting. The NSynth algorithm was trained on pitched material: 300,000 notes from a thousand different instruments. I like to mess with stuff, so when the folks at Google asked if there was anything I’d like to do, I said two things: what if we feed it percussion, and what if we feed it a bunch of completely random sounds?

And that’s what we did. I sent in about a hundred drum sounds from my own sample packs, and we ended up with sounds from instruments, household objects, voices, and animals. The Magenta team crossed every combination of two drum sounds and every combination of two random sounds, leading to just shy of 9,000 new sounds. And yes, there will be a sample pack. The algorithm did a really good job with the drums.
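Crossing every combination of two sounds grows quickly, which is how a modest pile of source sounds turns into thousands of results. The sketch below uses Python's standard library to list unordered pairs. The drum names are hypothetical, and treating "about a hundred" as exactly 100 is an assumption made only to show the arithmetic. The number of random sounds is not stated, so it is left out.

```python
from itertools import combinations
from math import comb


def all_pairs(sounds):
    """Every unordered pair of distinct sounds, each pair listed once."""
    return list(combinations(sounds, 2))


# Hypothetical file names, for illustration only.
drums = ["kick", "snare", "hat", "clap"]
pairs = all_pairs(drums)
print(len(pairs))    # 6
print(pairs[0])      # ('kick', 'snare')

# Assuming exactly 100 drum sounds, the number of drum-with-drum pairs would be:
print(comb(100, 2))  # 4950
```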

I just got a whole bunch of variations on drum sounds. Where things really got interesting is when we combined two sounds that were completely different from each other.

We have got a fountain crossed with a string scrape. This combination is really interesting. Doesn’t that just sound like an electric guitar? This type of sound happened fairly often, and I am wondering if it’s a byproduct of NSynth having to deal with much more complex sounds than single notes: dealing with a lot more frequencies at once, but still trying to interpret that in a pitched way because it was trained on pitched material.

