It should be obvious to anyone following generative AI news that the technology is quickly becoming ubiquitous. Last year, AI image generators took the world by storm, and near the end of 2022, ChatGPT captured the public imagination. Now, a few months into 2023, text-to-video AI is almost here.
This week, AI startup Runway announced Gen-2, a multi-modal AI system that can generate videos from clips of other videos, from images, or even from just text. That's right: before long, users will be able to type anything they want into a prompt and get a fully produced video of whatever they dream up.
Generate videos with nothing but words. If you can say it, now you can see it.
Introducing, Text to Video. With Gen-2.
— Runway (@runwayml) March 20, 2023
However, as with all new tech, it's not quite ready for prime time. According to Gizmodo's Kyle Barr, while Runway's new video AI isn't available to the public yet, there's already another text-to-video service out there: ModelScope, which was released just a few days ago. Its website is primarily in Chinese, with some English headings. But the samples of AI-generated videos on the site are pretty impressive, even if they're crude.
Some of the samples on ModelScope's site include "a giraffe underneath a microwave," "a goldendoodle playing in a park by a lake," "a panda bear driving a car," "a teddy bear running in New York City," and more. Each video clip is just a few seconds long but plainly demonstrates the power of the new technology. Notably, each sample video carries a Shutterstock watermark, likely because the company used stock images to train its AI.
However, ModelScope isn't exactly user-friendly. In addition to the site being primarily in Chinese, users seem to need to do a bit of research (or be versed in the ins and outs of generative AI) to get it to work. It's not like ChatGPT or the new Bing just yet. But the very existence of this tech on the internet means that text-to-video is coming much sooner than many of us thought it might.
Source: Gizmodo