The Frost nails its uncanny, disconcerting vibe in its first few shots. Vast icy mountains, a makeshift camp of military-style tents, a group of people huddled around a fire, barking dogs. It’s familiar stuff, yet weird enough to plant a growing seed of dread. There’s something wrong here.
“Pass me the tail,” someone says. Cut to a close-up of a man by the fire gnawing on a pink piece of jerky. It’s grotesque. The way his lips are moving isn’t quite right. For a beat it looks as if he’s chewing on his own frozen tongue.
Welcome to the unsettling world of AI moviemaking. “We kind of hit a point where we just stopped fighting the desire for photographic accuracy and started leaning into the weirdness that is DALL-E,” says Stephen Parker at Waymark, the Detroit-based video creation company behind The Frost.
The Frost is a 12-minute movie in which every shot is generated by an image-making AI. It’s one of the most impressive—and bizarre—examples yet of this strange new genre. You can watch the film below in an exclusive reveal from MIT Technology Review.
To make The Frost, Waymark took a script written by Josh Rubin, an executive producer at the company who directed the film, and fed it to OpenAI’s image-making model DALL-E 2. After some trial and error to get the model to produce images in a style they were happy with, the filmmakers used DALL-E 2 to generate every single shot. Then they used D-ID, an AI tool that can add movement to still images, to animate these shots, making tents flap in the wind and lips move.
“We built a world out of what DALL-E was giving back to us,” says Rubin. “It’s a strange aesthetic, but we welcomed it with open arms. It became the look of the film.”
“This is certainly the first generative AI film I’ve seen where the style feels consistent,” says Souki Mehdaoui, an independent filmmaker and cofounder of Bell & Whistle, a consultancy specializing in creative technologies. “Generating still images and puppeteering them gives it a fun collaged vibe.”
The Frost joins a string of short films made using various generative AI tools that have been released in the last few months. The best generative video models can still produce only a few seconds of video. So the current crop of films exhibits a wide range of styles and techniques, from storyboard-like sequences of still images, as in The Frost, to mash-ups of many different seconds-long video clips.
In February and March, Runway, a firm that makes AI tools for video production, hosted an AI film festival in New York. Highlights include the otherworldly PLSTC by Laen Sanches, a dizzying sequence of odd, plastic-wrapped sea creatures generated by the image-making model Midjourney; the dreamlike Given Again by Jake Oleson, which uses a technology called NeRF (neural radiance fields) that turns 2D photos into 3D virtual objects; and the surreal nostalgia of Expanded Childhood by Sam Lawton, a slideshow of Lawton’s old family photos that he got DALL-E 2 to extend beyond their borders, letting him toy with the half-remembered details of old pictures.
Lawton showed the images to his father and recorded his reaction in the film: “Something’s wrong. I don’t know what that is. Do I just not remember it?”
Fast and cheap
Artists are often the first to experiment with new technology. But the immediate future of generative video is being shaped by the advertising industry. Waymark made The Frost to explore how generative AI could be built into its products. The company makes video creation tools for businesses looking for a fast and cheap way to make commercials. Waymark is one of several startups, alongside firms such as Softcube and Vedia AI, that offer bespoke video ads for clients with just a few clicks.
Waymark’s current tech, launched at the start of the year, pulls together several different AI techniques, including large language models, image recognition, and speech synthesis, to generate a video ad on the fly. Waymark also drew on its large data set of non-AI-generated commercials created for previous customers. “We have hundreds of thousands of videos,” says CEO Alex Persky-Stern. “We’ve pulled the best of those and trained it on what a good video looks like.”
To use Waymark’s tool, which it offers as part of a tiered subscription service starting at $25 a month, users supply the web address or social media accounts for their business, and it goes off and gathers all the text and images it can find. It then uses that data to generate a commercial, using OpenAI’s GPT-3 to write a script that is read aloud by a synthesized voice over selected images that highlight the business. A slick minute-long commercial can be generated in seconds. Users can edit the result if they wish, tweaking the script, editing images, choosing a different voice, and so on. Waymark says that more than 100,000 people have used its tool so far.
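The workflow described above can be sketched in a few lines of Python. This is a toy illustration only, not Waymark's code: every function, class, and value here (gather_assets, write_script, select_images, the voice names, the sample text) is a hypothetical stand-in, and the scraping, language-model, and speech-synthesis steps are replaced with placeholders so the example runs on its own.

```python
from dataclasses import dataclass, replace


@dataclass
class BusinessAssets:
    text: list[str]    # copy found on the business's site or social accounts
    images: list[str]  # image files found in the same places


@dataclass
class Commercial:
    script: str        # what the synthesized voice reads aloud
    voice: str         # which synthetic voice to use
    images: list[str]  # images shown while the script is read


def gather_assets(url: str) -> BusinessAssets:
    # Hypothetical stand-in: a real tool would crawl the given address.
    return BusinessAssets(
        text=[f"Welcome to {url}", "Open seven days a week"],
        images=["storefront.jpg", "team.jpg", "logo.png"],
    )


def write_script(assets: BusinessAssets) -> str:
    # Hypothetical stand-in for a language-model call: just joins the text.
    return " ".join(assets.text)


def select_images(assets: BusinessAssets, limit: int = 2) -> list[str]:
    # Hypothetical stand-in for image selection: keeps the first few images.
    return assets.images[:limit]


def make_commercial(url: str, voice: str = "narrator_a") -> Commercial:
    # Gather, then write, then pair the script with a voice and images.
    assets = gather_assets(url)
    return Commercial(
        script=write_script(assets),
        voice=voice,
        images=select_images(assets),
    )


def edit_commercial(commercial: Commercial, **changes) -> Commercial:
    # Users can tweak the result; this returns an edited copy.
    return replace(commercial, **changes)
```

Calling make_commercial("example.com") returns a draft, and edit_commercial(draft, voice="narrator_b") mirrors the kind of after-the-fact tweak the tool allows, leaving the script and images untouched.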
The trouble is that not every business has a website or images to draw from, says Parker. “An accountant or a therapist might have no assets at all,” he says.
Waymark’s next idea is to use generative AI to create images and video for businesses that don’t yet have any—or don’t want to use the ones they have. “That’s the thrust behind making The Frost,” says Parker. “Create a world, a vibe.”
The Frost has a vibe, for sure. But it is also janky. “It’s not a perfect medium yet by any means,” says Rubin. “It was a bit of a struggle to get certain things from DALL-E, like emotional responses in faces. But at other times, it delighted us. We’d be like, ‘Oh my God, this is magic happening before our eyes.’”
This hit-and-miss process will improve as the technology gets better. DALL-E 2, which Waymark used to make The Frost, was released just a year ago. Video tools that generate short clips have been around for only a few months.
The most revolutionary aspect of the technology is being able to generate new shots whenever you want them, says Rubin: “With 15 minutes of trial and error, you get that shot you wanted that fits perfectly into a sequence.” He remembers cutting the film together and needing particular shots, like a close-up of a boot on a mountainside. With DALL-E, he could just call it up. “It’s mind-blowing,” he says. “That’s when it started to be a real eye-opening experience as a filmmaker.”
Chris Boyle, cofounder of Private Island, a London-based startup that makes short-form video, also recalls his first impressions of image-making models last year: “I had a moment of vertigo when I was like, ‘This is going to change everything.’”
Boyle and his team have made commercials for a range of global brands, including Bud Light, Nike, Uber, and Terry’s Chocolate, as well as short in-game videos for blockbuster titles such as Call of Duty.
Private Island has been using AI tools in postproduction for a few years but ramped up its use during the pandemic. “During lockdown we were very busy but couldn’t shoot in the same way we could before, so we started leaning a lot more into machine learning at that time,” says Boyle.
The company adopted a range of technologies that make postproduction and visual effects easier, such as creating 3D scenes from 2D images with NeRFs and using machine learning to rip motion-capture data from existing footage instead of collecting it from scratch.
But generative AI is the new frontier. A couple of months ago, Private Island posted a spoof beer commercial on its Instagram account that was produced using Runway’s video-making model Gen-2 and Stability AI’s image-making model Stable Diffusion. It became a slow-burn viral hit. Called “Synthetic Summer,” the video shows a typical backyard party scene where young, carefree people kick back and sip their drinks in the sunshine. Except many of these people have gaping holes instead of mouths, their beer cans sink into their heads when they drink, and the backyard is on fire. It’s a horror show.
“You watch it initially—it’s just a very generic, middle-of-the-road Americana thing,” says Boyle. “But your hind brain or whatever is going, ‘Ugh all their faces are on backwards.’”
“We like to play around with using the medium itself to tell the story,” he says. “And I think ‘Synthetic Summer’ is a great example because the medium itself is so creepy. It kind of visualizes some of our fears about AI.”
Playing to its strengths
Is this the beginning of a new era of filmmaking? Current tools have a limited palette. The Frost and “Synthetic Summer” both play to the strengths of the tech that made them. The Frost is well suited to the creepy aesthetic of DALL-E 2. “Synthetic Summer” has many quick cuts, because video generation tools like Gen-2 produce only a few seconds of video at a time that then need to be stitched together. That works for a party scene where everything is chaotic, says Boyle. Private Island also looked at making a martial arts movie, where rapid cuts suit the subject.
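A little arithmetic shows why short clips push a film toward quick cuts. The sketch below is a rough illustration, not a description of how Private Island works: the plan_clips function is hypothetical, and the 30-second runtime and 4-second clip limit in the usage note are assumed figures standing in for "a few seconds."

```python
import math


def plan_clips(total_seconds: float, max_clip_seconds: float) -> list[float]:
    """Split a runtime into equal clips, none longer than the limit."""
    if total_seconds <= 0 or max_clip_seconds <= 0:
        raise ValueError("durations must be positive")
    # Smallest number of clips that keeps every clip within the limit.
    count = math.ceil(total_seconds / max_clip_seconds)
    # Spread the runtime evenly so the clips add up to the full length.
    return [total_seconds / count] * count
```

Under those assumed numbers, plan_clips(30, 4) yields eight clips of 3.75 seconds each, which means at least seven cuts in a half-minute spot.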
This may mean that we will start to see generative video used in music videos and commercials. But beyond that, it’s not clear. Apart from experimental artists and a few brands, there aren’t many other people using it yet, says Mehdaoui.
The constant state of flux is also off-putting to potential clients. “I’ve spoken with many companies who seem interested but balk at putting resources into projects because the tech is changing so fast,” she says. Boyle says that many companies are also wary of the ongoing lawsuits around the use of copyrighted images in the data sets used to train models such as Stable Diffusion.
Nobody knows for sure where this is headed, says Mehdaoui: “There are a lot of assumptions being thrown like darts right now, without a whole lot of nuanced consideration behind them.”
In the meantime, filmmakers are continuing to experiment with these new tools. Inspired by the work of Jake Oleson, who is a friend of hers, Mehdaoui is using generative AI tools to make a short documentary to help destigmatize opioid use disorder.
Waymark is planning a sequel to The Frost, but it is not sold on DALL-E 2. “I’d say it’s more of a ‘watch this space’ kind of thing,” says Persky-Stern. “When we do the next one, we’ll probably use some new tech and see what it can do.”
Private Island is experimenting with other films too. Earlier this year it made a video with a script produced by ChatGPT and images produced by Stable Diffusion. Now it is working on a film that’s a hybrid, with live-action performers wearing costumes designed by Stable Diffusion.
“We’re very into the aesthetic,” says Boyle, adding that it’s a change from the dominant imagery in digital culture, which has been reduced to the emoji and the glitch effect. “It’s very exciting to see where the new aesthetics will come from. Generative AI is like a broken mirror of us.”