The Expert's View with Jeremy Kirk

FaceApp's Real Score: A Mathematical Face Feature Set

Company's Facial Data Set Is a Rare, Valuable Resource

FaceApp is extraordinary, and not just for its polished face modification technology. Through hundreds of millions of selfies, the small Russian company behind it has likely created one of the largest private troves of geometric and facial landmark data - on the scale of Google and Facebook.

FaceApp surfaced in the news last month after it was falsely accused of raiding individuals' photo libraries and uploading those photos to the cloud.

The accusation was debunked. And FaceApp tried to provide reassurance by saying it discards most photos within 48 hours despite its permissive privacy policy. And while there's been a lot of digging, nothing has surfaced to indicate there's anything more nefarious going on.

But the photos themselves are not necessarily what's most valuable to Wireless Labs, the Russian company behind FaceApp. It's the mathematical data describing faces that's derived from the photos, which these days is highly sought-after information.

Wrinkle Training

How FaceApp works is still very much a black box in a cloud computer.

Wireless Labs' founder Yaroslav Goncharov told a Russian publication two years ago he became interested in neural networks - that is, training computers to work in ways that mimic the human brain - during a three-year stint at Microsoft. Facial manipulation and recognition technology is progressing rapidly, and there are shortcut ways to do what FaceApp is doing.

One academic paper describes a simplified way to age people or add smiles or glasses that doesn't involve deep neural training. The paper came out about six months after FaceApp launched in early 2017, one of many parallel efforts to improve facial recognition.

Other clues and theories have come from testing the app. Corridor Crew - a Los Angeles-based video production crew - experimented with how FaceApp handled rudimentary drawings and added facial features in an attempt to unpack it.

The Corridor Crew "breaks" FaceApp.

Corridor Crew's Niko Peuringer says in the video that FaceApp probably uses a mix of style transfers, structural facial cues and pattern recognition. For FaceApp's aging function, the app's backend may have trained on various photo sets of wrinkly people, which it then can apply to facial landmarks on a new face, Peuringer says.

But what set of training photos did FaceApp use to train its algorithm? Public facial data sets have suddenly become too hot to touch amid questions over whether the subjects' consent was obtained. Several facial data sets have been removed over the last few months due to privacy concerns, according to a July 13 story in The New York Times.

I reached out to Goncharov to see if he could explain how his team made FaceApp so effective. We exchanged one message, but I wasn't able to secure an interview. In fact, since FaceApp surfaced in the news, he's given only one interview to an English-language publication - Forbes.

Disappearing Faces

I asked Adam Harvey, a Berlin-based researcher who tracks open facial databases and contributed to The New York Times story, what he thought of FaceApp. He didn't want to directly comment on the app, but noted that Google and Facebook "use people's faces in the same way."

Neither Google nor Facebook makes its vast photo troves or data available to the public, making it harder for researchers and companies, such as Wireless Labs, to start from scratch. And it's hard even for academics to do facial recognition research.

Harvey pointed me to a 2017 academic paper from the University of Maryland. It notes: "The academic community is at a disadvantage in advancing the state-of-the-art in facial recognition problems due to the unavailability of large high quality training datasets and benchmarks."

These are some of the well-known data sets used for facial recognition training, although Microsoft took MS Celeb offline in June. (Source: University of Maryland)

But since it launched two years ago, FaceApp may have been able to build its own data set on par with those of Google and Facebook. And it has the full consent of its users, which would generally keep it out of privacy trouble.

FaceApp says it usually discards photos in a couple of days. But before that happens, FaceApp may be extracting the relevant geometric and facial landmark data that helps train its algorithm, says Micah Hoffman, who runs Spotlight Infosec and is a certified SANS instructor.

In that case, FaceApp would be keeping its promise not to do something creepy, such as storing photos perpetually.

"In two to three days, [FaceApp's] facial recognition algorithms have extracted the meaningful geometries and other pieces of information from these faces," Hoffman says. "You no longer need the original face. You've got what makes up that face in your system, and now you can better identify that. You've trained your models at that point."
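
To make Hoffman's point concrete, here is a minimal Python sketch of how a handful of facial landmark coordinates could be reduced to a compact set of geometric measurements, after which the photo itself is no longer needed. It is purely illustrative: the landmark names, the coordinates and the `landmark_geometry` function are assumptions made up for this example, and nothing here is known to reflect how FaceApp actually processes images.

```python
import math
from itertools import combinations


def landmark_geometry(landmarks):
    """Reduce named (x, y) landmark points to scale-independent distances.

    Hypothetical helper for illustration only; not FaceApp's method.
    """
    # Divide every distance by the eye-to-eye distance so the numbers
    # do not depend on how large the face appears in the photo.
    scale = math.dist(landmarks["left_eye"], landmarks["right_eye"])
    if scale == 0:
        raise ValueError("eye landmarks coincide; cannot normalise")
    names = sorted(landmarks)
    return {
        f"{a}-{b}": math.dist(landmarks[a], landmarks[b]) / scale
        for a, b in combinations(names, 2)
    }


# Made-up landmark positions, in pixels, for a single face.
face = {
    "left_eye": (30.0, 40.0),
    "right_eye": (70.0, 40.0),
    "nose_tip": (50.0, 60.0),
    "chin": (50.0, 100.0),
}

features = landmark_geometry(face)
# `features` is all that would need to be kept; the pixels can be discarded.
```

The same face photographed at twice the resolution would produce the same measurements, which is why a derived description like this can outlive the image it came from.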

Many Angles

Scraping the web doesn't necessarily work for training facial recognition systems.

To train facial recognition models, you need multiple photos of the same person so an algorithm can learn to distinguish between photos of the same person and photos of someone else, says Alice Towler, a cognitive scientist and postdoctoral research fellow at the University of New South Wales.

"You need to have an understanding of how a person looks over different days and poses," she says. "It would be a data mine for them [FaceApp] if they've got people taking lots of photos on different days, different angles and different facial expressions."
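
Towler's point can be sketched in a few lines of Python. The example below groups photos by person and builds "same person" and "different person" pairs, the kind of labeled comparisons a recognition model could be trained on. The file names, user labels and the `build_pairs` function are hypothetical, invented for illustration, and are not drawn from FaceApp or any real data set.

```python
from itertools import combinations


def build_pairs(photos_by_person):
    """Build same-person and different-person photo pairs.

    Hypothetical helper for illustration only.
    """
    same, different = [], []
    people = sorted(photos_by_person)
    # Same-person pairs require at least two photos of that person.
    for person in people:
        same.extend(combinations(photos_by_person[person], 2))
    # Different-person pairs combine photos across two people.
    for a, b in combinations(people, 2):
        different.extend(
            (pa, pb)
            for pa in photos_by_person[a]
            for pb in photos_by_person[b]
        )
    return same, different


# Made-up example: one user with three selfies, one with a single photo.
photos = {
    "user_a": ["a_monday.jpg", "a_tuesday.jpg", "a_profile.jpg"],
    "user_b": ["b_monday.jpg"],
}

same, different = build_pairs(photos)
```

A person with only one photo, as is common in images scraped from the web, contributes no same-person pairs at all. Users who upload selfies on different days and from different angles supply exactly the pairs that are otherwise hard to come by.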

FaceApp is ready-made for that kind of use. FaceApp has been downloaded more than 100 million times, so its scale is huge. The virality of the app opened a large door for faces - or rather, the data behind the faces - that will likely propel the company's technology.

About the Author

Jeremy Kirk

Executive Editor, Security and Technology, ISMG

Kirk was executive editor for security and technology for Information Security Media Group. Reporting from Sydney, Australia, he created "The Ransomware Files" podcast, which tells the harrowing stories of IT pros who have fought back against ransomware.
