Generative Algorithms for Voice, Sound and Video

Researchers at chipmaker Nvidia unveiled a new generative algorithm in 2018 that created realistic human faces using a generative adversarial network (GAN). Their system could also tweak individual attributes, such as age and freckle density.
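The adversarial idea behind a GAN can be illustrated with a deliberately tiny sketch: a generator and a discriminator trained against each other. The example below is a hypothetical toy, not Nvidia's system. It uses 1-D Gaussian data and linear models in place of images and deep networks, but the training loop has the same shape: the discriminator learns to tell real samples from generated ones, and the generator learns to fool it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "real data": samples from N(4, 1.25). The generator, which starts
# out producing N(0, 1) samples, should learn to mimic this distribution.
def real_batch(n):
    return rng.normal(4.0, 1.25, size=(n, 1))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Linear generator G(z) = a*z + b and linear discriminator D(x) = sigmoid(w*x + c).
g = {"a": 1.0, "b": 0.0}
d = {"w": 0.1, "c": 0.0}

lr = 0.05
for step in range(5000):
    n = 64
    z = rng.normal(size=(n, 1))
    fake = g["a"] * z + g["b"]
    real = real_batch(n)

    # --- Discriminator update: push D(real) -> 1 and D(fake) -> 0 ---
    for x, label in ((real, 1.0), (fake, 0.0)):
        p = sigmoid(d["w"] * x + d["c"])
        grad = p - label                 # dLoss/dlogit for binary cross-entropy
        d["w"] -= lr * np.mean(grad * x)
        d["c"] -= lr * np.mean(grad)

    # --- Generator update: push D(G(z)) -> 1, i.e. fool the discriminator ---
    fake = g["a"] * z + g["b"]
    p = sigmoid(d["w"] * fake + d["c"])
    grad = (p - 1.0) * d["w"]            # chain rule through D's logit
    g["a"] -= lr * np.mean(grad * z)
    g["b"] -= lr * np.mean(grad)

# After training, generated samples should cluster near the real mean of 4.
samples = g["a"] * rng.normal(size=(5000, 1)) + g["b"]
print(f"generated mean ~ {samples.mean():.2f}")
```

In a real image GAN the generator and discriminator are deep convolutional networks and the data are pixels, but the two alternating gradient steps above are the core of the technique.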

A team at the University of California, Berkeley created software that automatically transfers the movements of a person in one video onto a person in another video.

For some time, we’ve been training computers to watch videos and predict the corresponding sounds in our physical world. For example, researchers at MIT’s CSAIL ran experiments to learn whether a computer could accurately predict the sound generated when a wooden drumstick taps a couch, a pile of leaves or a glass windowpane. The focus of this research is to help systems understand how objects interact with each other in the physical realm.
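At its core, this kind of video-to-sound work is a supervised prediction problem: map features extracted from video frames to a compact representation of the resulting audio. The sketch below is a conceptual stand-in, not the CSAIL system; it uses synthetic feature vectors and plain ridge regression where the real work used deep networks, but the learning problem has the same shape.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: each row of X is a feature vector extracted from a
# short video clip of a drumstick strike; each row of Y is a compact
# representation of the resulting sound (e.g. a few spectral coefficients).
# Values here are synthetic stand-ins, not real data.
n_clips, n_vid, n_aud = 200, 8, 3
X = rng.normal(size=(n_clips, n_vid))
true_map = rng.normal(size=(n_vid, n_aud))
Y = X @ true_map + 0.05 * rng.normal(size=(n_clips, n_aud))

# Ridge regression: the simplest "predict sound features from video
# features" model, solved in closed form.
lam = 1e-2
W = np.linalg.solve(X.T @ X + lam * np.eye(n_vid), X.T @ Y)

pred = X @ W
err = float(np.mean((pred - Y) ** 2))
print(f"mean squared prediction error: {err:.4f}")
```

Swapping the linear map for a convolutional or recurrent network, and the synthetic features for real frames and waveforms, gives the modern deep-learning version of the same idea.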

That work led to more sophisticated spoofing: In 2017, researchers at the University of Washington developed a model that convincingly showed former President Barack Obama giving a speech—one that he never actually gave in real life. In 2018, a Belgian political party, Socialistische Partij Anders, or sp.a for short, published realistic videos of Donald Trump all over social media in which he offered advice on climate change: “As you know, I had the balls to withdraw from the Paris climate agreement,” he said, looking directly into the camera, “and so should you.”

This trend is likely to become more problematic as non-malicious deepfakes are used more widely in entertainment, gaming and news media.