AnimateZoo: Zero-shot Video Generation of Cross-Species Animation via Subject Alignment

Sun Yat-sen University

Faithful and controllable cross-species animation results from AnimateZoo. Without any parameter tuning, our model transfers actions across diverse animal species while keeping the scene and the subject's appearance consistent.

Abstract

Recent advances in video editing rely on accurate pose sequences to animate human actors. However, these methods are not suitable for cross-species animation because poses are misaligned between species (for example, the poses of a cat differ greatly from those of a pig because of their distinct body structures). In this paper, we present AnimateZoo, a zero-shot diffusion-based video generator that addresses this issue, aiming to accurately animate various animals while preserving the background. The key technique is a two-fold subject alignment. First, we improve appearance feature extraction by integrating a Laplacian detail booster and a prompt-tuning identity extractor, which together capture essential appearance information, including identity and fine details. Second, we align shape features and resolve conflicts between differing animals by introducing a scale-information remover and an adaptive rescaling module. Both enhance subject alignment for accurate cross-species animation. Additionally, we introduce two high-quality animal video datasets with diverse species to benchmark cross-species animation. Trained on these extensive datasets, our model directly generates videos with accurate movements, consistent appearances, and high-fidelity frames, eliminating the need for test-time training. Extensive experiments demonstrate our method's superiority in cross-species animation, showcasing robust adaptability and generality.
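The abstract does not describe how the Laplacian detail booster is implemented, so the following is only a generic illustration of the kind of fine-detail signal a Laplacian filter extracts from an image. The kernel choice, the function names `laplacian_detail` and `boost_details`, and the `strength` parameter are all assumptions made for this sketch and are not taken from the paper.

```python
import numpy as np

# Standard 4-neighbour discrete Laplacian kernel (an assumed, common choice).
LAPLACIAN_KERNEL = np.array([[0, 1, 0],
                             [1, -4, 1],
                             [0, 1, 0]], dtype=np.float64)


def laplacian_detail(image: np.ndarray) -> np.ndarray:
    """Return the Laplacian response of a 2-D grayscale image.

    Borders are handled by replicating the edge pixels, so a constant
    image yields a zero response everywhere.
    """
    padded = np.pad(image.astype(np.float64), 1, mode="edge")
    h, w = image.shape
    out = np.zeros((h, w), dtype=np.float64)
    # Accumulate the weighted, shifted copies of the image (3x3 filter).
    for dy in range(3):
        for dx in range(3):
            out += LAPLACIAN_KERNEL[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def boost_details(image: np.ndarray, strength: float = 0.5) -> np.ndarray:
    """Hypothetical detail boost: classic Laplacian sharpening.

    Subtracting the scaled Laplacian emphasises edges and fine texture.
    This is a textbook operation, not the paper's module.
    """
    return image.astype(np.float64) - strength * laplacian_detail(image)
```

In this sketch, flat regions produce no response while edges and texture produce a strong one, which is why a Laplacian is a natural candidate for recovering fine appearance details such as fur patterns.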

Method Overview

Video Presentation

Replacing video subjects with animals exhibiting different appearance characteristics

Comparison with Advanced Methods

Our method surpasses others in background and subject fidelity, video quality, and coherence

BibTeX

@article{xu2024animatezoo,
  title={AnimateZoo: Zero-shot Video Generation of Cross-Species Animation via Subject Alignment},
  author={Xu, Yuanfeng and Chen, Yuhao and Huang, Zhongzhan and He, Zijian and Wang, Guangrun and Lin, Liang},
  journal={arXiv preprint arXiv:2404.04946},
  year={2024}
}