Intelligent audio – where will we be in the next five years? Considering recent developments, this is an interesting question…

In 2011, nearing the end of my master’s degree in Audio Engineering, I had to choose what to do for my dissertation. Most of my cohort decided to undertake exciting projects such as designing reference monitors or producing albums and sample libraries, but not me. I chose to do something far more isolating and boring in comparison. My dissertation was a theoretical one, exploring the current and future implementation (as I phrased it) of ‘intelligent audio effects’. Boring it may have been, but it was a current subject I was interested in, and my tutor was all for it as he was thinking about doing a similar research paper.

My paper focused on the tools and processing then available to automate the mixing process, while briefly touching on the more subjective aspects of what a mix production is and how we go about undertaking one: starting, adjusting, reviewing, adjusting, reviewing, finishing.

Of course there isn’t yet any fully intelligent plug-in or utopian DAW. Anything intelligent relies on pre-defined data and what we manage to achieve with our knowledge, skills and efforts. However, since 2011 some interesting things have happened, and it appears my boring dissertation has slowly been paying off as more and more intelligent technology keeps cropping up. Every time I see something in my news feed, I get a little pang of excitement and think about my dissertation.

Since working as a dubbing mixer I have tried to implement some automated processes, but have found that these tools rarely save me any time because I still have to check over their output, which is pretty self-defeating. I do wonder, however, how long it will be until a job like audio dubbing becomes, more or less, fully automated. When will my job become more about checking and finishing than full-on editing and perspiration?

What I believe we are getting close to, relatively speaking, and as I stated in the main conclusion of my dissertation, is the help of more pre-defined ‘intelligent assistance’ in our workflows: features that could take on more of the manual labour and button pushing, allowing us to focus on the creative aspects of our work.

In this blog I wanted to look at what has happened since I wrote my dissertation in 2011 and what could happen in the next five years. So let’s have a look at some of the innovations that have caught my interest, and you can make up your own mind.

One of the major finds for my dissertation was the work of the Queen Mary University engineering department. Their work showed the most complete automated processing for music production I could find: compressors with automatic thresholds, attack and release times, automatic gain and fader control, feedback prevention and more. Live mixing seems to be an area where intelligent features will be pushed, with some of the latest digital mixers offering automatic features and presets to assist in smoother and quicker operation.
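To give a flavour of what an ‘automatic threshold’ might mean in practice, here is a minimal sketch of the general idea: derive the compressor threshold from the signal’s own level statistics instead of asking the engineer to set it. The percentile, ratio and attack/release ballistics below are my own illustrative assumptions, not the Queen Mary algorithms themselves.

```python
# A minimal sketch of "automatic" dynamics processing: the threshold is
# taken from the signal's own level distribution rather than a user
# setting. All parameter choices here are illustrative assumptions.
import numpy as np

def auto_compress(x, sr, ratio=4.0, attack_ms=10.0, release_ms=100.0):
    eps = 1e-10
    level_db = 20 * np.log10(np.abs(x) + eps)

    # "Intelligent" part: place the threshold at the 95th percentile of
    # the level distribution, so only the loudest peaks are reduced.
    threshold_db = np.percentile(level_db, 95)

    # Desired gain reduction above the threshold (feed-forward design).
    over = np.maximum(level_db - threshold_db, 0.0)
    target_gain_db = -over * (1.0 - 1.0 / ratio)

    # Smooth the gain with one-pole attack/release ballistics.
    a_att = np.exp(-1.0 / (sr * attack_ms / 1000.0))
    a_rel = np.exp(-1.0 / (sr * release_ms / 1000.0))
    gain_db = np.zeros_like(target_gain_db)
    g = 0.0
    for n, t in enumerate(target_gain_db):
        a = a_att if t < g else a_rel   # falling gain = attack phase
        g = a * g + (1.0 - a) * t
        gain_db[n] = g

    return x * 10 ** (gain_db / 20.0)
```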

Automation

At the time of my dissertation, Waves Vocal Rider was the only tool I could find that would automatically write automation based on fixed thresholds and parameters. Speaking to a few dubbing mixers at the time, they were using Vocal Rider with some success by combining it with standard compression. Since starting work as a dubbing mixer after university, I had been waiting for a tool more focused on levelling post-production dialogue, and one finally came along: Wave Rider (not to be confused with Waves Vocal Rider) by the New Zealand-based company Quiet Art. Although I haven’t actually used it that much myself, the couple of times I have on documentaries it did a very accurate job. The plug-in has found commercial success, and industry professionals have incorporated it into their workflows.
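For the curious, here is a rough sketch of what a rider-style tool does conceptually: measure short-term loudness and write slow fader moves toward a target level, rather than fast sample-level compression. The target level, window size and gain range are illustrative assumptions on my part, not Quiet Art’s or Waves’ actual algorithms.

```python
# A conceptual sketch of a "rider"-style plug-in: block-by-block
# loudness measurement driving a smooth, clamped fader curve, i.e. the
# automation lane such a tool would write for you. Parameter values are
# illustrative only.
import numpy as np

def ride_gain(x, sr, target_db=-20.0, window_ms=400.0, max_move_db=6.0):
    hop = int(sr * window_ms / 1000.0)
    eps = 1e-10
    fader_db = []
    for start in range(0, len(x), hop):
        block = x[start:start + hop]
        rms_db = 20 * np.log10(np.sqrt(np.mean(block ** 2)) + eps)
        # Gain needed to hit the target, clamped like a real fader move.
        move = np.clip(target_db - rms_db, -max_move_db, max_move_db)
        fader_db.append(move)
    # Interpolate the block decisions into a smooth per-sample curve.
    times = np.arange(len(fader_db)) * hop + hop // 2
    curve = np.interp(np.arange(len(x)), times, fader_db)
    return x * 10 ** (curve / 20.0)
```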

AI in audio production 

When I wrote my dissertation, AI hadn’t yet stepped into the world of music and audio production; even now it is just scratching the surface.

A team of six researchers at MIT created a machine-learning system that matches sound effects to video clips. The algorithm doesn’t make its own sounds; it pulls from a database of tens of thousands of audio clips. The research team used a convolutional neural network to analyse video frames and a recurrent neural network to pick the audio for them, leaning heavily on the Caffe deep-learning framework. The goals of this project aren’t just about replacing foley artists: the hope is that computer-vision tech like this could eventually help robots identify the materials and physical properties of an object from the sounds it makes.
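The retrieval step is easier to picture with a sketch. Below, a hypothetical feature vector predicted from video is matched against a library of stored clips by nearest neighbour. The CNN and RNN that would produce those features are stood in for by random data, since this is an illustration of the matching idea only, not the MIT team’s code.

```python
# A toy sketch of the retrieval idea: the system returns the closest
# real recording from a library rather than synthesising audio. The
# feature extractor is a hypothetical stand-in (the paper used a CNN
# over frames feeding an RNN); only the matching step is shown.
import numpy as np

def nearest_sound(predicted_features, library_features, library_clips):
    """Return the library clip whose features best match the prediction."""
    # Euclidean distance from the predicted vector to every stored clip.
    dists = np.linalg.norm(library_features - predicted_features, axis=1)
    return library_clips[int(np.argmin(dists))]

# Hypothetical usage: features would come from a trained CNN+RNN.
rng = np.random.default_rng(0)
library_features = rng.normal(size=(10_000, 128))   # one row per clip
library_clips = [f"clip_{i}.wav" for i in range(10_000)]
predicted = rng.normal(size=128)                    # model output
print(nearest_sound(predicted, library_features, library_clips))
```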

Adobe Voco

I am pretty sure you have heard about the recently showcased Adobe Voco. This incredible piece of software is able to synthesise new words and phrases simply by typing them, based on around 20 minutes of a person’s speech. Obviously the potential uses for voice-over editing and language re-purposing are astounding, but this could also create new issues.
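Adobe hasn’t published how Voco works internally, so purely as a loose illustration: the simplest form of ‘type to edit speech’ is concatenative synthesis, cutting a voice corpus into labelled units and splicing them back together in a new order, with short crossfades to hide the joins. Everything below (the word-level units, the fade length, the tiny corpus) is an assumption for the sake of the sketch.

```python
# A loose, hypothetical illustration of concatenative synthesis, NOT
# Adobe's method: splice recorded word units into a new phrase, with
# short crossfades to hide the joins.
import numpy as np

def synthesise(phrase, unit_bank, sr, fade_ms=15.0):
    """Splice recorded word units into a new phrase with crossfades."""
    fade = int(sr * fade_ms / 1000.0)
    ramp = np.linspace(0.0, 1.0, fade)
    out = unit_bank[phrase[0]].copy()
    for word in phrase[1:]:
        unit = unit_bank[word]
        # Crossfade the tail of the output into the head of the new unit.
        out[-fade:] = out[-fade:] * ramp[::-1] + unit[:fade] * ramp
        out = np.concatenate([out, unit[fade:]])
    return out

# Hypothetical usage with a tiny "corpus" of half-second word recordings.
sr = 16_000
unit_bank = {w: np.random.randn(sr // 2) for w in ("keep", "the", "take")}
new_line = synthesise(["take", "the", "keep"], unit_bank, sr)
```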

Imagine the ability to change an actor’s sentences in post-production; it would definitely make life a lot easier for us sound engineers.

What will the next five years bring?

Over the next five years I believe we will start seeing even more advanced tools to assist us.

Spotting sound effects could become a much faster and even more creative process. Traditional international language re-versioning might not become a thing of the past, but edits and replaced lines won’t mean a trip back to the studio for the voice artist. I predicted in my dissertation that the available tools would remain fragmented, meaning an all-in-one system combining these kinds of features won’t appear; instead we’ll have lots of different tools making some aspects of our work easier.

Conclusion

What I hope to see is software that assists us in making our workflows faster and easier, getting rid of those tedious jobs we hate. I have no issue with that, and I fully welcome any innovation as long as the main aim is to assist human intervention rather than replace it. Anything that claims to be automatic has been programmed by someone and functions using set parameters. The human brain does not work on set parameters; we constantly adjust, break rules and think outside the box.

What do you think?