ML is cool, but synthesized media is the new buzz

Once machine learning and AI help us identify images, videos, and sound, we can collect and tag data at scale. Not only that, but we can also do it at incredible speed.

Machine learning (ML), artificial intelligence (AI), and virtual and augmented reality (VR/AR) have drawn plenty of attention in recent years, but lately the buzz is growing around “synthesized media.” This brief overview explains what it is and gives some examples.

As more companies go through digital transformation, the channels of communication are changing. We embed digital technology in everything we do, and the way we interact with one another and with machines is evolving. This is driving more radical innovation.

The foundation of synthesized media

Many companies are still investing heavily in imaging: the ability to identify video or static images, classify them, and retrieve them. Once they have put structure to this data, they can combine assets. For example, you can go to a service and request images related to your brand, or have AI build a banner ad for you based on some product and brand inputs.

Then there is natural language processing (NLP) and speech generation. More companies are putting effort into understanding human language and, more importantly, intent. Once a machine understands intent, it can respond more accurately. Layering intent, accents, dialects, and languages is complicated, so it is easy to see why this has been hard to achieve. Examples of NLP are plentiful today, including Apple Siri, Microsoft Cortana, and Google Home.

Once machine learning and AI help us identify images, videos, and sound, we can collect and tag data at scale. Not only that, but we can also do it at incredible speed.

Put all these things together and you have the next phase…synthesized media.

Taking it a step further

AI, ML, self-driving cars, and robotic automation raise some very interesting questions for humankind. Just imagine what can happen in scenarios like these…

Let’s start with an audio scenario. Machines already speak to us, whether prompted or in response to an inquiry. You might have heard a machine-like voice when calling a bank or an airline. Even the latest version of Siri was designed to sound more human.

What if a machine could listen to your voice and mimic it? We know that senior citizens are already targeted by calls from someone claiming to be a grandchild who is in a pinch and needs money sent immediately. Now imagine that same scenario, but this time you hear the actual voice of a loved one. How would you know whether it is really them on the other end of the line?

Now let’s look at a scenario that adds video to the voice. Find a public figure, perhaps a politician, and capture video from various speeches. Using some specialized tools, you can synthesize a new video with that politician saying whatever you want. And with a voice-mimicking algorithm, you wouldn’t even have to depend on pulling from existing words and phrases he or she said in previous speeches.

This might sound a bit far-fetched, or maybe you are savvy enough to suspect it is just around the corner. In fact, both are available now, and the implications are intriguing, to say the least. For example:

  • The lyrebird is a bird that can mimic essentially any sound, such as chainsaws or monkeys in a jungle. So it is fitting that a company named Lyrebird created a service that listens to your voice for five minutes and can then sound like you saying anything.
  • Researchers at the University of Washington used AI to synthesize a video of President Barack Obama speaking, based on footage from his weekly addresses.

While I might seem to be painting a picture of negative implications, there are constructive and positive aspects as well. Examples in a more positive light include hearing the morning news in the voice of your favorite celebrity, or having footage of a loved one speak to you even when they aren’t with you.

More companies are getting into synthesized media, so it will be very interesting to see what the modern era brings us and how it shapes our expectations of engaging with a machine. Where do you feel the technology will take us in the coming years?

This article is published as part of the IDG Contributor Network.