Alibaba has launched Wan2.2, a set of open-source large video generation models aimed at cinematic video production. Built on the Mixture-of-Experts (MoE) architecture, Wan2.2 is designed to help global creators and developers generate high-quality videos with improved realism and control.
Wan2.2 includes three models:
- Wan2.2-T2V-A14B for text-to-video
- Wan2.2-I2V-A14B for image-to-video
- Wan2.2-TI2V-5B, a hybrid model that supports both text-to-video and image-to-video generation
These models enable detailed control over elements such as lighting, time of day, frame size, camera angle, and focal length. They also handle motion better, rendering facial expressions, sports actions, and hand gestures more realistically.
To reduce computing costs, Wan2.2-T2V-A14B and Wan2.2-I2V-A14B use a two-expert design that activates only 14 billion of each model's 27 billion parameters per denoising step. This cuts computational consumption by up to 50%.
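The saving comes from running only one expert at each denoising step. The minimal sketch below assumes the split described in the Wan2.2 repository: a high-noise expert handles the early, noisy steps (overall layout) and a low-noise expert handles the later steps (fine detail). The switch point `boundary`, the evenly spaced schedule, and the function names are hypothetical illustrations, not Wan2.2's actual code.

```python
# Parameter counts come from the announcement; the routing logic is an assumption.
TOTAL_PARAMS_B = 27   # total parameters across both experts, in billions
ACTIVE_PARAMS_B = 14  # parameters used in any single denoising step, in billions


def select_expert(t: float, boundary: float = 0.875) -> str:
    """Route denoising timestep t (1.0 = pure noise, 0.0 = clean) to one expert.

    `boundary` is a hypothetical switch point, not an official setting.
    """
    return "high_noise" if t >= boundary else "low_noise"


def expert_schedule(num_steps: int, boundary: float = 0.875) -> list[str]:
    """Return which expert handles each step of an evenly spaced schedule."""
    timesteps = [1.0 - i / num_steps for i in range(num_steps)]
    return [select_expert(t, boundary) for t in timesteps]


if __name__ == "__main__":
    print(expert_schedule(8))
    # Only one expert runs per step, so each step touches about 14/27 of the weights.
    print(f"active fraction per step: {ACTIVE_PARAMS_B / TOTAL_PARAMS_B:.2f}")
```

Because exactly one expert is called per step, per-step compute tracks the 14B active parameters rather than the 27B total, which is roughly a halving.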
The models use a prompt system inspired by cinematic aesthetics, which supports creative direction across visual dimensions such as lighting and camera angle. Compared with its predecessor, Wan2.1, Wan2.2 was trained on 65.6% more image data and 83.2% more video data, improving its performance on complex scenes and artistic expression.
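Wan2.2 takes free-text prompts, so one way to apply this kind of creative direction is to state each visual dimension explicitly. The helper below is a hypothetical sketch: the `CinematicShot` fields mirror the controls listed earlier, but the field names and the "name: value" phrasing are assumptions, not a documented Wan2.2 prompt format.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class CinematicShot:
    """Hypothetical container for the visual controls described in the article."""
    subject: str
    lighting: Optional[str] = None
    time_of_day: Optional[str] = None
    camera_angle: Optional[str] = None
    focal_length: Optional[str] = None
    frame_size: Optional[str] = None


def build_prompt(shot: CinematicShot) -> str:
    """Join the subject with every control that was set, in declaration order."""
    parts = [shot.subject]
    for f in fields(shot):
        if f.name == "subject":
            continue
        value = getattr(shot, f.name)
        if value:  # skip controls left unset
            parts.append(f"{f.name.replace('_', ' ')}: {value}")
    return ", ".join(parts)


if __name__ == "__main__":
    shot = CinematicShot(
        "A cyclist crossing a bridge",
        lighting="soft rim light",
        time_of_day="golden hour",
        camera_angle="low angle",
    )
    print(build_prompt(shot))
```

Keeping each dimension as a named field makes it easy to vary one control (for example, only the time of day) while holding the rest of the shot fixed.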
The hybrid Wan2.2-TI2V-5B uses a high-compression 3D variational autoencoder (VAE) to encode video into a compact latent representation. It can render 720P videos in minutes on a single consumer-grade GPU, making it more efficient and scalable for creators.
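A high-compression 3D VAE shrinks the grid that the generator must process in both time and space. The sketch below assumes illustrative factors of 4x in time and 16x in each spatial dimension, plus a causal convention where the first frame is kept and the remaining frames are compressed. These factors and the 121-frame, 1280x704 clip are assumptions for illustration, not confirmed Wan2.2-TI2V-5B settings.

```python
def latent_shape(frames: int, height: int, width: int,
                 t_factor: int = 4, s_factor: int = 16) -> tuple[int, int, int]:
    """Map a video's (frames, height, width) to an assumed latent grid.

    Assumes a causal 3D VAE that keeps the first frame, compresses the remaining
    frames by t_factor, and downsamples height and width by s_factor.
    """
    if (frames - 1) % t_factor or height % s_factor or width % s_factor:
        raise ValueError("input size is not divisible by the compression factors")
    return (1 + (frames - 1) // t_factor, height // s_factor, width // s_factor)


if __name__ == "__main__":
    f, h, w = 121, 704, 1280  # hypothetical ~5 s clip at 24 fps, 720P-class frame
    lat = latent_shape(f, h, w)
    print(lat)  # (31, 44, 80)
    ratio = (f * h * w) / (lat[0] * lat[1] * lat[2])
    print(f"about {ratio:.0f}x fewer spatiotemporal positions")
```

The ratio counts grid positions only; each latent position carries several channels, so it is not a byte-level compression figure. Fewer positions still means far less work per denoising step, which is what makes single-GPU 720P generation practical.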
All Wan2.2 models are available for download on Hugging Face, GitHub, and ModelScope. Alibaba's earlier Wan models have recorded over 5.4 million downloads across platforms.
