AI models spit out photos of real people and copyrighted images
These image-generating AI models are trained on vast data sets of images paired with text descriptions, all scraped from the internet. The latest generation of the technology works by taking images in the data set and gradually adding random noise until each original image is nothing but a collection of random pixels. The AI model then learns to reverse the process, turning the random noise into a new image.
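To make the noising step concrete, here is a minimal Python sketch of how an image can be degraded toward random pixels. It is an illustration under stated assumptions, not the code behind any of the models discussed here: the function names `make_schedule` and `add_noise`, the 1,000-step count, and the linear noise schedule are all hypothetical choices borrowed from common diffusion-model practice.

```python
import numpy as np


def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Return the fraction of original signal that survives at each step.

    Assumption: a linear schedule of per-step noise amounts (betas).
    The cumulative product says how much of the image is left after
    all the noising steps up to and including a given one.
    """
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def add_noise(image, step, alpha_bar, rng):
    """Jump straight to the noised version of `image` at a given step.

    The image is scaled down and Gaussian noise is scaled up so that,
    by the last step, almost nothing of the original remains.
    """
    noise = rng.standard_normal(image.shape)
    signal_scale = np.sqrt(alpha_bar[step])
    noise_scale = np.sqrt(1.0 - alpha_bar[step])
    return signal_scale * image + noise_scale * noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = np.ones((8, 8))  # stand-in for a real training image
    alpha_bar = make_schedule()
    for step in (0, 499, 999):
        noised = add_noise(image, step, alpha_bar, rng)
        print(step, float(alpha_bar[step]), float(noised.std()))
```

In this sketch the early steps leave the image almost untouched, while the final step is close to pure noise. A real system would then train a neural network to undo the noise one step at a time, which is the part that produces new images and is not shown here.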
The paper marks the first time researchers have managed to prove that these AI models memorize images in their training sets, says Ryan Webster, a PhD student at the University of Caen Normandy in France, who has studied privacy in other image generation models but was not involved in the research. This could have implications for startups wanting to use generative AI models in health care, because it shows that these systems risk leaking sensitive private information. OpenAI, Google, and Stability.AI did not respond to our requests for comment.
Eric Wallace, a PhD student at UC Berkeley who was part of the study group, says they hope to raise the alarm over the potential privacy issues around these AI models before they are rolled out widely in sensitive sectors like medicine.
“A lot of people are tempted to try to apply these types of generative approaches to sensitive data, and our work is definitely a cautionary tale that that’s probably a bad idea, unless there’s some kind of extreme safeguards taken to prevent [privacy infringements],” Wallace says.