Around the middle of last year, I had to roll up to my company’s offices in New England for a couple of days of meetings. As I often do on such drives, I tuned into a podcast or two between calls. During that drive, I caught a couple of episodes of the Politicology podcast. They ran a really great 2-part series about politically-motivated deepfakes (Episode 1 & Episode 2) which touched on other areas more relatable to our everyday lives.
They kicked off the first episode talking about the now-famous video that Jordan Peele and BuzzFeed created, showing President Obama saying things he never actually said. Here, check it out if you’ve never seen it:
This video was made a few years ago – back in 2018. If you really pay attention, you’ll notice that the voice isn’t quite right. It’s not perfectly synchronized with the facial movements, and you can tell it’s not actually Obama’s voice. But, it’s good enough to convince a lot of folks, particularly those who aren’t well-informed, or just not paying attention.
Fast forward 5 years to 2023. How has the tech changed? Naturally, software has improved, plus there’s more compute power than ever. There are open-source tools out there to analyze existing media and leverage it to synthesize additional media. We’ve even seen attempts to use these tools in the past year. Look at the Russian invasion of Ukraine – I can recall at least 2 instances of faked videos used as propaganda tools. While better than the 2018 effort, these videos are still recognizably fakes.
The much greater danger we all face today is less about faked videos of world leaders and more about faked audio leveraged by criminals. Consider the voiceprint authentication systems we’re now seeing many companies deploy to verify our identities when we’re calling in for some sort of service. With just a few seconds of voice sample, the technology now exists to “put words in your mouth” – making synthetic recordings of you saying pretty much anything.
Ok, am I being an alarmist? Have I joined the tin-foil hat squad? I’ll point you to this recent article from TechCrunch about Microsoft’s VALL-E research project. Input 3 seconds of real speech audio from a person, and now you’ve got the power to produce whatever you’d like to hear, but in that person’s voice, with their diction, even with normal-sounding background noise. Now, imagine your bank deploys a voice-based authentication system like this. Some crook takes a few seconds of your voice, uses a tool like VALL-E, and whammo! Access to your accounts is granted.
Ok, so that’s audio – how close are we to average folks having access to manipulate someone’s likeness in a reliable, reproducible way? That’s today. In the past 10 minutes, I started off by going to Phil Wang’s This Person Does Not Exist site. There, I generated 2 face images, 1 male, 1 female. Next, I dropped these images into my iCloud Photo Library, got on my phone and used the FaceApp app to change the gender of each of these people. Check out the results below. “Originals” on the left, gender transformations on the right. Remember – none of these images are real people. They’re all generated and manipulated through software that uses AI.
Nifty, huh? Now, how about I take Mr. Fake Man and make him sing to us using the Facedance app?
Ok, I’ll be first in line to point out how not-perfect that video is. But, consider that I made it in about 10 seconds, using a free app on an iPhone 13 Pro. That tech’s only going to continue getting better at pretending.
So what to do? It seems clear that the present danger surrounds audio. I saw an interview the other day with a man who was nearly scammed by a crook posing as one of his friends in need of a quick financial bailout. Had he not taken a moment to call the friend’s wife to verify the voicemail first, he could easily have fallen for it. My best advice? Verify before you send any money, even to family or your best friend. Maybe even establish some sort of pre-arranged password or other way to authenticate the person on the other end.
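If you want to nerd out on that last idea: a static pre-arranged password has a weakness in this new world – if you ever speak it aloud on a call that a crook records, they can replay it, or clone your voice saying it. A slightly stronger variant is a challenge-response over a shared secret, where the secret itself is never spoken. Here’s a minimal sketch in Python using the standard library; the `shared_secret` value and the whole flow are illustrative, not a real product:

```python
import hashlib
import hmac
import secrets

def make_challenge() -> str:
    """The person receiving the call issues a random, one-time challenge."""
    return secrets.token_hex(8)

def respond(shared_secret: bytes, challenge: str) -> str:
    """The caller proves knowledge of the secret without ever speaking it."""
    return hmac.new(shared_secret, challenge.encode(), hashlib.sha256).hexdigest()

def verify(shared_secret: bytes, challenge: str, response: str) -> bool:
    """Constant-time comparison, so the check itself leaks nothing."""
    expected = respond(shared_secret, challenge)
    return hmac.compare_digest(expected, response)

# Example: a family agrees on a secret in person, ahead of time.
secret = b"our-pre-arranged-secret"
challenge = make_challenge()
answer = respond(secret, challenge)
print(verify(secret, challenge, answer))            # the real caller passes
print(verify(b"crooks-guess", challenge, answer))   # an impostor fails
```

Realistically, nobody’s family is going to compute HMACs over the phone – the low-tech version of the same idea works fine: hang up and call the person back on a number you already know.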