Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion


CVPR 2025

Zhenglin Zhou1,2, Fan Ma2, Hehe Fan2,✉, Tat-Seng Chua3

1 State Key Laboratory of Brain-machine Intelligence, Zhejiang University
2 ReLER, CCAI, Zhejiang University
3 National University of Singapore


Video


Method Overview


Zero-1-to-A builds both the dataset and the avatar from scratch using video diffusion. It establishes a mutually beneficial relationship between dataset construction and avatar reconstruction: the synthesized dataset is iteratively updated, and the head avatar is trained on the updated dataset, so the two processes converge to a consistent result.
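
The loop below is a minimal PyTorch-style sketch of this symbiotic process. The names (Avatar, video_diffusion, sample_condition) are hypothetical placeholders for illustration, not the authors' actual API: each iteration appends a diffusion-refined frame to the synthesized dataset, then fits the avatar to that growing dataset.

import random
import torch
import torch.nn.functional as F

def symbiotic_generation(ref_image, avatar, video_diffusion,
                         sample_condition, num_iters=1000):
    # avatar: hypothetical animatable head avatar module, trained from scratch
    optimizer = torch.optim.Adam(avatar.parameters(), lr=1e-3)
    dataset = []  # synthesized dataset, also built from scratch

    for _ in range(num_iters):
        # Dataset construction: render the current avatar under a sampled
        # camera/expression condition, then let video diffusion refine the
        # rendering into a (possibly inconsistent) pseudo ground-truth frame.
        cond = sample_condition()
        frame = video_diffusion(ref_image, avatar.render(cond), cond)
        dataset.append((cond, frame.detach()))

        # Avatar reconstruction: fit the avatar to the updated dataset,
        # which smooths inconsistencies across the generated frames.
        cond, target = random.choice(dataset)
        loss = F.mse_loss(avatar.render(cond), target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    return avatar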



The Progressive Learning pipeline sequences learning from simple to complex, enabling symbiotic generation to create consistent avatars from inconsistent video diffusion outputs. It divides 4D avatar generation into two stages: (1) Spatial Consistency Learning: progress from frontal to side views with a fixed expression. (2) Temporal Consistency Learning: progress from relaxed to exaggerated expressions with a fixed camera.
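
Below is a minimal sketch of this two-stage curriculum as a condition schedule; the ranges and step counts are illustrative assumptions, not the paper's exact values. Stage 1 widens the camera yaw under a fixed expression, then Stage 2 ramps the expression intensity under a fixed camera.

import numpy as np

def progressive_schedule(max_yaw_deg=90.0, max_intensity=1.0, steps=10):
    """Yield camera/expression conditions ordered from simple to complex."""
    # Stage 1: Spatial Consistency Learning -- sweep the camera from the
    # frontal view (0 degrees) out to side views, fixed neutral expression.
    for yaw in np.linspace(0.0, max_yaw_deg, steps):
        yield {"yaw": float(yaw), "expression": 0.0}
    # Stage 2: Temporal Consistency Learning -- keep a fixed frontal camera
    # and ramp the expression from relaxed to exaggerated.
    for intensity in np.linspace(0.0, max_intensity, steps):
        yield {"yaw": 0.0, "expression": float(intensity)}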

Static Avatar Generation

Comparison with 3D avatar generation methods.


Dynamic Avatar Generation

Comparison with 4D avatar generation methods.


Talking Head Video Generation

Comparison with portrait video diffusion methods.


Citation


@inproceedings{zhou2025zero1toa,
  title     = {Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion},
  author    = {Zhenglin Zhou and Fan Ma and Hehe Fan and Tat-Seng Chua},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
}