
Meta today introduced Movie Gen, a new AI-powered movie generator that can create video and audio content from a single text prompt. Movie Gen can create original videos as well as edit existing ones, and the technology looks genuinely impressive even though it’s still a research project.
“As the most advanced and immersive storytelling suite of models, Movie Gen has four capabilities: video generation, personalized video generation, precise video editing, and audio generation. We’ve trained these models on a combination of licensed and publicly available datasets,” Meta explained today.
Meta isn’t the first company to leverage generative AI technology for video creation. Earlier this year, OpenAI unveiled Sora, a text-to-video model that can create videos up to a minute long. Google’s DeepMind AI laboratory is also working on a generative video model named Veo that can create videos lasting more than 60 seconds.
When creating original videos with AI, Movie Gen is limited to 16 seconds of content at a rate of 16 frames per second. However, what likely puts Movie Gen ahead of the competition is its ability to generate audio that matches the content of a video.
“We trained a 13B parameter audio generation model that can take a video and optional text prompts and generate high-quality and high-fidelity audio up to 45 seconds, including ambient sound, sound effects (Foley), and instrumental background music—all synced to the video content. Further, we introduce an audio extension technique that can generate coherent audio for videos of arbitrary lengths—overall achieving state-of-the-art performance in audio quality, video-to-audio alignment, and text-to-audio alignment,” the company explained.
The company conducted A/B testing with human evaluators to compare Movie Gen against competing models, including Runway Gen 3, OpenAI Sora, and Kling 1.5, and found that evaluators preferred the output of its own video generation model. However, Meta isn’t ready to make Movie Gen available to the public yet.
In the near future, Meta plans to work with filmmakers and creators, incorporating their feedback to improve its video and audio generation models. “There are lots of optimizations we can do to further decrease inference time and improve the quality of the models by scaling up further,” the company also said today.
On a related note, Paul published a video demo earlier today of Google’s NotebookLM, an AI-powered note-taking and research assistant that can also be used to generate AI podcasts. Paul used it to create a podcast episode in which two hosts discuss an 8,600-word blog post that Steven Sinofsky wrote in 2012 about bringing Windows to the ARM architecture. The result is quite fascinating, and I invite you to check it out if you haven’t already.