Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion
CVPR 2025
Zhenglin Zhou1,2, Fan Ma2, Hehe Fan2,✉, Tat-Seng Chua3
1 State Key Laboratory of Brain-machine Intelligence, Zhejiang University
2 ReLER, CCAI, Zhejiang University
3 National University of Singapore
Method Overview
Zero-1-to-A builds the dataset and the avatar from scratch simultaneously through video diffusion. It establishes a mutually beneficial relationship between dataset construction and avatar reconstruction: the synthesized dataset is updated iteratively, and the head avatar is trained on the updated dataset, so the two converge to a consistent result.
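The loop below is a minimal, runnable sketch of this symbiotic process. All names (video_diffusion, render_avatar, train_avatar, symbiotic_generation) are hypothetical placeholders, and the avatar is reduced to a single scalar so the loop executes; none of this is the authors' actual API.

import random

def video_diffusion(renders):
    """Stub: would denoise/refine the avatar's own renders into video frames."""
    return [r + random.uniform(-0.05, 0.05) for r in renders]

def render_avatar(avatar, conditions):
    """Stub: would render the avatar under each camera/expression condition."""
    return [avatar for _ in conditions]

def train_avatar(avatar, dataset, lr=0.1):
    """Stub: one optimization pass fitting the avatar to the dataset."""
    for frame in dataset:
        avatar += lr * (frame - avatar)
    return avatar

def symbiotic_generation(iterations=5, conditions=(0, 1, 2)):
    avatar, dataset = 0.0, []  # both start from scratch
    for _ in range(iterations):
        # 1) Dataset construction: diffusion refines the current renders
        #    into new pseudo ground-truth frames.
        dataset += video_diffusion(render_avatar(avatar, conditions))
        # 2) Avatar reconstruction: train on the updated dataset.
        avatar = train_avatar(avatar, dataset)
    return avatar

The key design point is that neither stage runs in isolation: each iteration feeds the diffusion model with renders of the current avatar and feeds the avatar with the frames the diffusion model just produced.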
The Progressive Learning pipeline orders training from simple to complex, enabling symbiotic generation to produce consistent avatars from inconsistent video diffusion outputs. It divides 4D avatar generation into two stages: (1) Spatial Consistency Learning, which progresses from frontal to side views with a fixed expression, and (2) Temporal Consistency Learning, which progresses from relaxed to exaggerated expressions under a fixed camera.
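The generator below sketches this simple-to-complex ordering as a schedule of (camera, expression) conditions; the 90-degree azimuth range, the [0, 1] expression scale, and the step counts are illustrative assumptions, not values from the paper.

def progressive_schedule(n_views=8, n_expressions=8):
    """Yield (azimuth_deg, expression_scale) pairs in curriculum order."""
    # Stage 1 -- spatial consistency: sweep the camera from frontal
    # (0 deg) to side (90 deg) views with a fixed neutral expression.
    for i in range(n_views):
        yield 90.0 * i / (n_views - 1), 0.0
    # Stage 2 -- temporal consistency: sweep from relaxed (0.0) to
    # exaggerated (1.0) expressions under a fixed frontal camera.
    for j in range(n_expressions):
        yield 0.0, j / (n_expressions - 1)

# Usage: feed each condition, in order, to the symbiotic loop above,
# e.g. renders = render_avatar(avatar, list(progressive_schedule())).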
Static Avatar Generation
Comparison with 3D avatar generation methods.
Dynamic Avatar Generation
Comparison with 4D avatar generation methods.
Talking Head Video Generation
Comparison with portrait video diffusion methods.
Citation
@inproceedings{zhou2025zero1toa,
  title     = {Zero-1-to-A: Zero-Shot One Image to Animatable Head Avatars Using Video Diffusion},
  author    = {Zhenglin Zhou and Fan Ma and Hehe Fan and Tat-Seng Chua},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
}