

Poster

HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation

Zhenzhi Wang · Yixuan Li · Yanhong Zeng · Youqing Fang · Yuwei Guo · Wenran Liu · Jing Tan · Kai Chen · Bo Dai · Tianfan Xue · Dahua Lin


Abstract:

Human image animation involves generating videos from a character photo, allowing user control and unlocking potential for video and movie production. While recent approaches yield impressive results using high-quality training data, the inaccessibility of these datasets hampers fair and transparent benchmarking. Moreover, these approaches prioritize 2D human motion and overlook the significance of camera motions in videos, leading to limited control and unstable video generation. To demystify the training data, we present HumanVid, the first large-scale high-quality dataset tailored for human image animation, which combines crafted real-world and synthetic data. For the real-world data, we compile a vast collection of copyright-free real-world videos from the internet. Through a carefully designed rule-based filtering strategy, we ensure the inclusion of high-quality videos, resulting in 20K human-centric videos in 1080P resolution. Human and camera motion annotation is accomplished using a 2D pose estimator and a SLAM-based method. For the synthetic data, we gather 2,300 copyright-free 3D avatar assets to augment existing available 3D assets. Notably, we introduce a rule-based camera trajectory generation method, enabling the synthetic pipeline to incorporate diverse and precise camera motion annotation, which can rarely be found in real-world data. To verify the effectiveness of HumanVid, we establish a baseline model that considers both human and camera motions as conditions. Through extensive experimentation, we demonstrate that training with the real-world portion of HumanVid achieves state-of-the-art performance. Moreover, incorporating the synthetic data enhances user control over both human and camera motions, setting a new benchmark.
