Tech giant Microsoft deletes facial recognition database

Microsoft has deleted a facial recognition training database containing 10 million images of 100,000 people amid speculation that it may have breached GDPR rules.

As privacy and ethical concerns continue to spiral with the increased use of facial recognition technology, the huge collection of images was pulled by the tech giant after an investigation by the Financial Times.

The Microsoft database in question was used to train other facial recognition systems around the world, including those used by military academic researchers and Chinese firms such as SenseTime and Megvii.


The database, called MS Celeb, was used to train facial recognition systems and included images of celebrities pulled from the internet. It is also alleged that the huge database contained photos of private individuals, often without their knowledge or consent.

Microsoft said the database was the largest publicly available facial recognition data set in the world and was meant for use by academic researchers.

The images were harvested from the web under Creative Commons licenses, which allow images to be reused for academic and educational purposes.

Microsoft did not publicly announce that the database had been taken down. The company told the paper: “It was run by an employee that is no longer with Microsoft and has since been removed.”

Databases run by Duke University and Stanford have also been taken offline.

Facial Recognition

Experts believe that facial recognition technology will soon overtake fingerprint technology as the most effective way to identify people.

A facial recognition system used by officials in China connects to millions of CCTV cameras and uses artificial intelligence to pick out targets.

Facial recognition software works by matching real-time images to a previous photograph of a person. Each face has approximately 80 unique nodal points across the eyes, nose, cheeks and mouth that distinguish one person from another.

The software then measures features of the face captured by a digital camera, such as the depth of the eye sockets, the distance between the eyes and the width of the nose.
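The idea of comparing faces by the distances between nodal points can be illustrated with a minimal Python sketch. It is not the method used by any particular vendor: the four landmark names, their coordinates and the helper functions `distance_signature` and `signature_gap` are all hypothetical, and a real system would use far more points along with more robust matching.

```python
import math
from itertools import combinations


def distance_signature(landmarks):
    """Return all pairwise landmark distances, scaled by the largest one.

    Scaling makes the signature independent of image size, so the same
    face photographed at a different resolution gives the same values.
    """
    names = sorted(landmarks)  # fixed order so signatures are comparable
    dists = [math.dist(landmarks[a], landmarks[b])
             for a, b in combinations(names, 2)]
    largest = max(dists)
    return [d / largest for d in dists]


def signature_gap(sig_a, sig_b):
    """Euclidean gap between two signatures; smaller means more alike."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(sig_a, sig_b)))


# Hypothetical (x, y) pixel positions of four nodal points.
enrolled = {
    "left_eye": (30.0, 40.0),
    "right_eye": (70.0, 40.0),
    "nose_tip": (50.0, 60.0),
    "mouth_center": (50.0, 80.0),
}

# The same face captured at twice the image size.
live_same = {name: (2 * x, 2 * y) for name, (x, y) in enrolled.items()}

# A different face: wider-set eyes, longer nose.
live_other = {
    "left_eye": (25.0, 40.0),
    "right_eye": (75.0, 40.0),
    "nose_tip": (50.0, 70.0),
    "mouth_center": (50.0, 85.0),
}

gap_same = signature_gap(distance_signature(enrolled),
                         distance_signature(live_same))
gap_other = signature_gap(distance_signature(enrolled),
                          distance_signature(live_other))

print(f"same face gap:  {gap_same:.3f}")
print(f"other face gap: {gap_other:.3f}")
```

In this toy example the gap for the rescaled copy of the enrolled face is effectively zero, while the second face produces a clearly larger gap. A matching system would declare a match when the gap falls below some chosen threshold.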

Face Datasets

The Microsoft database was allegedly published in 2016 and was first spotted by Berlin-based researcher Adam Harvey, who tracks the use of hundreds of face datasets.

He found that Microsoft had used it to train facial recognition systems, according to the FT.

The data has also been cited in AI research conducted by IBM, Panasonic, Alibaba, Nvidia, Hitachi, SenseTime and Megvii. SenseTime and Megvii supply equipment to officials in Xinjiang, a region in northwestern China, where ethnic minority groups are under surveillance and held in internment camps.


Although Microsoft claimed the dataset was populated with photos of celebrities, it also contained photos of Julie Brill, a former FTC commissioner, as well as several prominent security journalists.

Harvey told the FT: “Microsoft has exploited the term ‘celebrity’ to include people who merely work online and have a digital identity.”

He added: “Many people in the target list are even vocal critics of the very technology Microsoft is using their name and biometric information to build.”

And worryingly, even though the data is no longer available from Microsoft, it could still be used by people who have downloaded a copy.

Harvey added: “You cannot make a set of data disappear. After you publish it and download it, it could exist on hard drives around the world.”

Last year, Microsoft’s president, Brad Smith, asked the U.S. Congress to regulate facial recognition technology. The company also turned down a request from police in California to use its facial recognition technology in cars and body cameras.
