Alibaba unveils Wan 2.1: Advanced AI model for video generation now open-source

Wan 2.1 comes in four versions: T2V-1.3B, T2V-14B, I2V-14B-720P, and I2V-14B-480P. The T2V models generate videos from text, while the I2V models convert images into videos. Notably, the T2V-1.3B model can run on consumer-grade GPUs with just 8.19GB of VRAM, generating a five-second 480p video in four minutes on an Nvidia RTX 4090.
Hyderabad: Chinese technology giant Alibaba has released Wan 2.1, an open-source video generation artificial intelligence (AI) model. The model is now available on Hugging Face with support for both commercial and academic use, though commercial usage is subject to some restrictions.
The models employ an advanced 3D causal variational autoencoder (VAE), enhancing video consistency, reducing memory usage, and enabling unlimited-length 1080p video generation. Alibaba claims Wan 2.1 outperforms OpenAI’s Sora AI model in key areas such as scene generation, motion smoothness, and spatial accuracy.
Beyond video generation, Wan 2.1 also includes text-to-image, video-to-audio, and video editing capabilities, although these features are not yet available in the open-source releases.
With demand for AI-generated video content growing, Alibaba's release is positioned to push AI video technology forward. The company has also announced a three-year, $52 billion investment in AI and cloud computing, underscoring its commitment to the field.