Social-media companies are concerned that deepfakes could soon flood their sites. But detecting them automatically is hard. To address the problem, Facebook wants to use AI to help fight back against AI-generated fakes. To train AIs to spot manipulated videos, it is releasing the largest-ever data set of deepfakes: more than 100,000 clips produced using 3,426 actors and a range of existing face-swapping techniques.
“Deepfakes are currently not a big issue,” says Facebook’s CTO, Mike Schroepfer. “But the lesson I learned the hard way over the last couple of years is not to be caught flat-footed. I want to be really prepared for a lot of bad stuff that never happens rather than the other way around.”
Facebook has also announced the winner of its Deepfake Detection Challenge, in which 2,114 participants submitted around 35,000 models trained on its data set. The best model, developed by Selim Seferbekov, a machine-learning engineer at mapping firm Mapbox, was able to detect whether a video was a deepfake with 65% accuracy when tested on a set of 10,000 previously unseen clips, including a mix of new videos generated by Facebook and existing ones taken from the internet.
To make things harder, the training and test sets include videos likely to confuse a detection system, such as people giving makeup tutorials, as well as videos that have been tweaked by pasting text and shapes over the speakers’ faces, changing the resolution or orientation, or slowing them down.
Rather than learning forensic techniques, such as looking for the digital fingerprints that a deepfake-generation process leaves behind in a video’s pixels, the top five entries seem to have learned to spot when something looks “off,” as a human might.
To do this, the winners all used a new type of convolutional neural network (CNN) developed by Google researchers last year, called EfficientNets. CNNs are commonly used to analyze images and are good at detecting faces and recognizing objects. Improving their accuracy beyond a certain point can require ad hoc fine-tuning, however. EfficientNets instead scale a network’s depth, width, and input resolution together in a principled way, which makes it easier to produce more accurate models. But exactly what makes them outperform other neural networks on this task isn’t clear, says Seferbekov.
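In practice, this kind of approach means taking a pretrained EfficientNet and fine-tuning it to classify face crops as real or fake. The sketch below, using PyTorch and torchvision, illustrates the general setup; it is not the winning entry’s code, and the `train_step` helper and batch shapes are hypothetical.

```python
# Minimal sketch: fine-tuning a pretrained EfficientNet as a binary
# real/fake classifier on face crops. Illustrative only; face detection,
# cropping, and augmentation pipelines are omitted.
import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained EfficientNet-B0 (larger variants such as
# B7 can be swapped in at higher input resolutions).
model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.DEFAULT)

# Replace the final layer: 1,000 ImageNet classes -> 2 (real / fake).
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 2)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

def train_step(faces: torch.Tensor, labels: torch.Tensor) -> float:
    """One gradient step on a batch of face crops shaped (N, 3, 224, 224)."""
    model.train()
    optimizer.zero_grad()
    loss = loss_fn(model(faces), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```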
Facebook does not plan to use any of the winning models on its site. For one thing, 65% accuracy is not yet good enough to be useful. Some models achieved more than 80% accuracy on the training data, but their performance dropped when they were pitted against unseen clips. Generalizing to new videos, which can include different faces swapped in using different techniques, is the hardest part of the challenge, says Seferbekov.
He thinks that one way to improve detection would be to focus on the transitions between video frames, tracking them over time. “Even very high-quality deepfakes have some flickering between frames,” says Seferbekov. Humans are good at spotting these inconsistencies, especially in footage of faces. But catching these telltale defects automatically will require larger and more varied training data and a lot more computing power. Seferbekov tried tracking these frame transitions but found the computation prohibitive. “CPU was a real bottleneck there,” he says.
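To make the flicker idea concrete, one crude way to measure it is to compare consecutive frames and average the differences. The toy sketch below, assuming OpenCV and NumPy, scores a clip this way; it is an illustration of the signal Seferbekov describes, not his method, and a real detector would track a face region rather than the whole frame.

```python
# Minimal sketch of a flicker score: the mean absolute pixel difference
# between consecutive frames, on the hypothesis that deepfakes show more
# frame-to-frame instability than genuine footage.
import cv2
import numpy as np

def flicker_score(path: str) -> float:
    """Average absolute difference between consecutive grayscale frames."""
    cap = cv2.VideoCapture(path)
    prev, diffs = None, []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None:
            diffs.append(np.mean(cv2.absdiff(gray, prev)))
        prev = gray
    cap.release()
    return float(np.mean(diffs)) if diffs else 0.0
```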
Facebook suggests that deepfake detection may also be improved by using techniques that go beyond the analysis of an image or video itself, such as assessing its context or provenance.
Sam Gregory, who directs Witness, a project that supports human-rights activists in their use of video technologies, welcomes social-media platforms’ investment in deepfake detection. Witness is a member of the Partnership on AI, which advised Facebook on its data set. Gregory agrees with Schroepfer that it is worth preparing for the worst. “We haven’t had the deepfake apocalypse, but these tools are a very nasty addition to gender-based violence and misinformation,” he says. For example, a report from DeepTrace Labs found that 96% of deepfakes were nonconsensual pornography, in which other people’s faces are pasted over those of performers in porn clips.
When millions of people are able to create and share videos, trusting what we see matters more than ever. Fake news spreads through Facebook like wildfire, and the mere possibility of deepfakes sows doubt, making us more likely to question genuine footage as well.
What’s more, automatic detection may soon be our only option. “In the future we will see deepfakes that cannot be distinguished by humans,” says Seferbekov.