Stable Video Diffusion turns any image into an animation with AI

Stability.ai has released a new artificial intelligence (AI) model that can turn any still image into an animation, the company announced earlier this month. Called Stable Video Diffusion, it is based on the company’s Stable Diffusion image model and is the latest release from Stability.ai, an open-source AI company founded in 2019. The code for Stable Video Diffusion is available in Stability.ai’s GitHub repository, and users can now test the image-to-video model in a research preview.

Stable Video Diffusion generates an animation conditioned on an uploaded image. In other words, the model uses the content of a still image as the starting point for a video. Stability.ai trained the model to create 25 frames from a still image, which combine to form a short animation; users can also generate 14-frame videos instead. The animation can be generated at a resolution of up to 576×1024, but this requires the uploaded image to be of an equal or greater size.
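
The size requirement is easy to check before uploading an image. The following is a minimal sketch in Python, not Stability.ai’s code: the function name `meets_target_size` is hypothetical, and it assumes that 576×1024 means 576 pixels tall by 1024 pixels wide, which the figure as quoted here does not spell out.

```python
# Hypothetical helper for illustration; not part of Stability.ai's repository.
# Assumption: the 576x1024 target is height x width (a landscape frame).
TARGET_HEIGHT = 576
TARGET_WIDTH = 1024


def meets_target_size(width: int, height: int) -> bool:
    """Return True if an image is at least as large as the target in both dimensions."""
    return width >= TARGET_WIDTH and height >= TARGET_HEIGHT


if __name__ == "__main__":
    print(meets_target_size(1920, 1080))  # a Full HD source image
    print(meets_target_size(800, 600))    # narrower than the target width
```

If the orientation assumption is wrong, swapping the two constants is the only change needed.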

The company says users prefer its Stable Video Diffusion model to competing image-to-video AI models. That claim rests on a user survey in a research paper it published alongside the release, in which Stable Video Diffusion was compared to Runway’s Gen-2 model and Pika Labs’ model. However, it’s important to note that this was not a peer-reviewed study, so the results cannot be considered completely unbiased.

Potential limitations of the Stability.ai video model

The company does list a few limitations of the Stable Video Diffusion model, though. For one, videos created from still images can only last around 4 seconds. While this might be serviceable for looped content, it is too short for most original animation. Aside from that, Stability.ai says that the model sometimes fails to create an animation and renders a still image instead. Furthermore, the generated motion can be slow or unnatural.
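
The 4-second limit can be related to the frame counts with simple arithmetic. The sketch below is illustrative only: it assumes that the 25-frame clip is the one lasting around 4 seconds and that the 14-frame option plays back at the same rate, neither of which is stated by the company here. The function names are hypothetical.

```python
# Illustrative arithmetic only; the playback rate is inferred, not an official figure.
FRAMES_LONG = 25         # frame count the model was trained to produce
FRAMES_SHORT = 14        # alternative frame count
APPROX_DURATION_S = 4.0  # "around 4 seconds", assumed to apply to the 25-frame clip


def implied_fps(frames: int, duration_s: float) -> float:
    """Frame rate implied by a frame count and a clip duration."""
    return frames / duration_s


def clip_duration(frames: int, fps: float) -> float:
    """Clip duration in seconds for a frame count at a given frame rate."""
    return frames / fps


fps = implied_fps(FRAMES_LONG, APPROX_DURATION_S)
short_duration = clip_duration(FRAMES_SHORT, fps)

if __name__ == "__main__":
    print(f"Implied frame rate: {fps:.2f} fps")
    print(f"14-frame clip at that rate: {short_duration:.2f} s")
```

Under those assumptions the implied rate is roughly 6 frames per second, so the 14-frame option would run for a little over 2 seconds.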

Additionally, like many AI models, Stable Video Diffusion struggles with faces and text. Any text in an image might become illegible when converted to video, and people’s faces may be distorted. The model is only intended for research purposes at the moment, but anyone looking to try it out can get started on the company’s GitHub repository. You’ll need some prior experience in downloading and running code, though.

This latest release continues the rapid pace of AI development. Just yesterday Pika Labs revealed a text-to-video AI generator called Pika 1.0. We’ll likely continue to see video and image generators become more advanced as research continues.

2023-11-30 15:07:08