Where the AI Art Boom Came From—and Where It’s Going
It used to be widely thought that creative work would be one of the last things to be automated. After 2022, some may reconsider.
In the space of a few months last year, several powerful tools for creating art with AI just by typing a few words became widely available. The quality of illustrations, photographs, and paintings that can be made that way improved remarkably. Some commercial artists are experimenting with the technology—although not all like it—and stock photo services are preparing to offer AI-generated images.
That rapid progress set entrepreneurs racing to build products and companies around AI image generators. Researchers continue to refine the technology. WIRED recently got to experiment with one of the first AI tools capable of generating video, developed by researchers at Meta. The clips aren't flawless, but comparing them with examples from the years of research leading up to 2022’s AI art explosion provides a visual timeline of a technology maturing rapidly from lab experiment to product prototype.
A video generation system created by researchers at Meta produced this clip of “Fireworks over Manhattan.” Courtesy of Meta.
The image-generation technology capturing the attention of entrepreneurs and artists is built on decades of advances in AI. In particular, about 10 years ago researchers found that feeding algorithms called neural networks huge numbers of images with associated labels enabled them to label previously unseen images with high accuracy. This is how Apple Photos and Google Photos can automatically organize pictures of pets taken on a smartphone.
Image-making AI tools flip this image-labeling trick on its head. Algorithms that have digested huge numbers of images and associated text from the web can generate new images from text provided by a user. At the core is what’s called a “generative model,” which learns the properties of a collection of data and can then create new data that statistically fits in with the original collection. As well as making images, this approach can be used to write text, compose music, or answer questions. The commercial potential of so-called generative AI has sparked excitement among tech investors.
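At its simplest, a generative model can be little more than a couple of summary statistics. The sketch below (toy numbers, Python's standard library only, all values invented for illustration) "trains" on a collection of values by estimating their mean and spread, then "generates" new values that statistically fit in with the collection — the same basic idea the image models apply at vastly larger scale.

```python
import random
import statistics

random.seed(0)
# A toy training collection: 1,000 values from some unknown process.
data = [random.gauss(170.0, 8.0) for _ in range(1000)]

# "Training": learn the statistical properties of the collection.
mu = statistics.mean(data)
sigma = statistics.stdev(data)

# "Generation": fresh values that statistically fit the original collection.
new_points = [random.gauss(mu, sigma) for _ in range(5)]
```

A real image generator replaces the two summary numbers with billions of neural-network parameters, but the contract is the same: learn the distribution of the training data, then sample new data from it.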
Generative models have been used in statistics for decades, but last year's AI image-making bonanza has its roots in an invention from 2014. That’s when Ian Goodfellow, then a student at the University of Montreal, came up with a new approach to generative models called generative adversarial networks (GANs).
In 2014, an algorithm called a GAN generated these faces. The rightmost column shows real photos used to train the system. Courtesy of Ian Goodfellow.
GANs involve two neural networks—algorithms used in machine learning—working against each other. One tries to generate something to match a collection of examples, while the other tries to distinguish between real and fake examples. Over many rounds of competition, the fake detector pushes the fake generator to get better. This trick proved capable of making simple images of handwritten characters, roughly drawn faces, and more complex scenes that resembled real photos.
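The adversarial loop can be caricatured in a few lines of Python. Here both players are shrunk to single numbers: the "generator" is one parameter, and the "discriminator" is just a decision threshold placed halfway between the real data and the current fake. Every specific below (the toy data, the learning rate) is invented for illustration; real GANs use a neural network for each player.

```python
import random
import statistics

random.seed(0)
# Toy "real" data: numbers clustered around 5.0, standing in for real images.
real = [random.gauss(5.0, 0.2) for _ in range(500)]
real_mean = statistics.mean(real)

theta = 0.0  # the generator's only parameter: the value it outputs
lr = 0.1

for _ in range(100):
    fake = theta
    # Discriminator: draw its decision boundary halfway between the
    # real cluster and the current fake output.
    threshold = (real_mean + fake) / 2.0
    # Generator: step in whichever direction fools that discriminator,
    # i.e. toward the side it classifies as "real".
    if fake < threshold:
        theta += lr
    else:
        theta -= lr
```

After the loop, the generator's output sits on top of the real data: the discriminator's feedback has pushed the faker until fakes are indistinguishable from the toy examples.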
In 2016, after digesting 3 million photos of real bedrooms, a GAN generated these rooms of its own. Courtesy of Alec Radford.
The first GAN-generated images were hardly saleable art, but they sparked a rush of interest in AI-generated imagery. Other researchers quickly honed the technique to produce more complex and coherent output.
In 2016, researchers from Facebook and a startup called Indico made an improved version of GANs able to create far more realistic—although still imperfect—images, such as interior scenes and faces. That same year a team at the University of Michigan and the Max Planck Institute in Germany demonstrated how GANs could generate relevant images in response to a specific text prompt.
In 2017, a project called CycleGAN showed algorithms could remix visual components from different images. Courtesy of Phillip Isola/Alexei A. Efros.
Researchers at UC Berkeley showed that GANs could also be used to modify images, for instance adding zebra stripes to horses or converting a photograph into a painting in the style of Monet. The research demonstrated that algorithms could remix different elements or styles encountered in their training data, a capability central to the tools that have recently shown so much promise.
Alexei Efros, a professor at UC Berkeley involved with the project, says that it also showed that more data and computing power could significantly improve the output of an image generator—something that deep-pocketed tech companies were well-placed to exploit.
Every one of these faces was generated by algorithms trained on 70,000 photos of real people. Courtesy of Nvidia.
In 2019, a team at the chip company Nvidia wowed the internet by revealing a GAN-based algorithm for generating photorealistic faces. The results look stunning compared with earlier attempts, although they still have giveaway flaws.
OpenAI’s image generator DALL-E marked a turning point in generative AI. Courtesy of OpenAI.
So far, so weird. Then, in January 2021, OpenAI announced DALL-E, a system capable of generating impressive images from a text prompt. (The name is a portmanteau of Salvador Dalí and the Disney character WALL-E.)
It could produce close-to-photorealistic images in a variety of styles, and could combine concepts in amusing ways—for example sketching out “avocado armchairs” and “an illustration of a radish taking a dog for a walk.” DALL-E was built by adapting GPT, a generative model originally designed to handle text, and training it on text-image pairs from the internet.
DALL-E 2’s higher quality sparked excitement about the commercial potential of AI-made images. Courtesy of OpenAI.
A key ingredient of DALL-E’s impressive performance, says Efros at Berkeley, was the huge amount of training data OpenAI fed into it. “They're using reasonably simple algorithms that have been done before, more or less," he says. “But they really scale them up in a way that, you know, magic starts to happen.”
In 2022, OpenAI announced a follow-up, DALL-E 2, that was improved thanks to more data and more computing power. It uses a newer and more powerful type of generative algorithm, known as a diffusion model, inspired by math used to model phenomena in physics. Diffusion models work by challenging an algorithm to learn how to remove noise that has been added to an image.
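The idea of learning to remove noise can be sketched in miniature. In the toy below, the entire "dataset" is the single value 3.0, the "model" is a tiny linear function trained to predict the Gaussian noise that was added, and sampling is caricatured as one denoising step; real diffusion models use a neural network and denoise gradually over many steps. All the specifics here are invented for illustration.

```python
import random

random.seed(0)
DATA = 3.0  # the entire toy "dataset": one value

# Training: corrupt the data with random noise, and fit a tiny linear
# model (w, c) to predict exactly the noise that was added.
w, c, lr = 0.0, 0.0, 0.01
for _ in range(20000):
    eps = random.gauss(0.0, 1.0)   # the noise we add
    noisy = DATA + eps             # the corrupted sample the model sees
    pred = w * noisy + c           # the model's guess at that noise
    err = pred - eps
    w -= lr * err * noisy          # gradient step on the squared error
    c -= lr * err

# Sampling, caricatured as a single step: start from pure noise and
# subtract the predicted noise to recover something data-like.
start = random.gauss(0.0, 1.0)
sample = start - (w * start + c)
```

Once the model can reliably say "this much of the image is noise," running that prediction in reverse, starting from random static, turns noise back into data.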
Some images made by DALL-E 2, such as this “astronaut riding a horse on the moon,” could be mistaken for human creations. Courtesy of OpenAI.
OpenAI’s image generators were originally made available only to select people, in part out of concern they would be abused. When this kind of system is trained on material scraped from the web, it can learn to produce sexual imagery and tends to pick up historical biases in how it depicts people of different races and genders.
But it didn’t take long for image generators to become widely available. In June 2022, an independent project inspired by OpenAI’s work, now known as Craiyon, became an online sensation as users competed to produce ever-more surreal or comical images. And several companies made AI image generators similar in power to DALL-E 2 available to anyone to use. In September, OpenAI made its own tool available to anyone.
“It's really just been an incredible time of discovery,” says David Holz, CEO of AI art startup Midjourney, of the past year. “Most startling is the realization of how much further the technology can still go. I think we'll see more aesthetic exploration over the next three years than the past 200 years.”
Emad Mostaque, CEO of Stability AI, a startup with its own image generator, calls 2022 a breakthrough year. "We got fast enough, cheap enough, and most importantly good enough to make this accessible to everyone, everywhere,” he says.
Robots watching fireworks, as visualized by a system called Make-A-Video from researchers at Meta. Courtesy of Meta.
The wide availability of image generators has caused not only an explosion of experimentation but also discussion around the implications of the technology. One knotty problem is that the images created can inherit biases from the data they are fed; another is that the tools could be used to generate harmful content. The copyright and trademark implications of AI art are also unclear, and some artists worry that such tools may make it harder to find work.
Those debates will continue in 2023—and the technology looks likely to keep improving quickly. In December, researchers at Google announced an image-generation tool called Muse built around a new technique. They claim it is significantly more efficient than previous image generators, creating images in a third of the time Stable Diffusion needs, and with higher quality results. Google's new technique can also be used to edit images using text instructions—something that could prove useful to creative professionals.
One thing holding back wider use of image generators is that they do not have a meaningful understanding of how text relates to elements in an image. In October, two students at MIT, Nan Liu and Shuang Li, demonstrated a way to ask an image generator to include or exclude specific elements in an image, and specify details like placing one object in front of another.
That could help people get image generators to do what they ask more often, but Josh Tenenbaum, a professor at MIT involved in the project, says the fact remains that existing AI tools simply do not understand the world in the way humans do. “It's amazing what they can do, but their ability to imagine what the world might be like from simple descriptions is often very limited and counterintuitive,” he says.
As excitement—and funding—for AI art tools grows, 2023 will probably bring higher quality AI-made images and perhaps the emergence of AI video generators. Researchers have demonstrated prototypes, although their output is so far relatively simple, and Stability AI, Midjourney, Google, Meta, and Nvidia are all working on the technology.
For a taste of what’s to come, WIRED asked Meta to generate a few videos of New Year’s celebrations. The results are crude, but if the recent history of AI image generators is anything to go by, they will improve fast. A whole new set of debates about AI's creative power and its ethical and economic consequences may be about to begin.