AnimateZoo: Zero-shot Video Generation of Cross-Species Animation via Subject Alignment

¹Sun Yat-sen University, ²University of Oxford
Faithful and controllable cross-species animation results of AnimateZoo. Without any parameter tuning, our model seamlessly inherits actions from diverse animal species while keeping the scene and subject appearance consistent.

Abstract

Recent advances in video editing rely on accurate pose sequences to animate subjects. However, these methods are not suitable for cross-species animation because poses are misaligned between species (for example, the poses of a cat differ greatly from those of a pig because of differences in body structure). In this paper, we present AnimateZoo, a zero-shot diffusion-based video generator that addresses this challenging cross-species animation problem, aiming to produce accurate animal animations while preserving the background. The key technique in AnimateZoo is subject alignment, which consists of two steps. First, we improve appearance feature extraction by integrating a Laplacian detail booster and a prompt-tuning identity extractor. These components are designed to capture essential appearance information, including identity and fine details. Second, we introduce a scale-information remover that aligns shape features and resolves conflicts between differing subjects, enabling accurate cross-species animation. Moreover, we introduce two high-quality animal video datasets featuring a wide variety of species. Trained on these extensive datasets, our model generates videos with accurate movements, consistent appearance, and high-fidelity frames, without the pre-inference fine-tuning that prior methods required. Extensive experiments demonstrate the strong performance of our method on cross-species action-following tasks, including its ability to adapt to differing body shapes.
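The abstract does not describe how the Laplacian detail booster is built. As a rough illustration only, the sketch below shows classic Laplacian sharpening, the standard image-processing operation that the component's name alludes to: a discrete Laplacian isolates high-frequency detail (edges, texture), and subtracting it from the image amplifies that detail. The function names `laplacian` and `boost_details` and the `strength` parameter are hypothetical, and this is an assumption-based example of the general idea, not AnimateZoo's implementation.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities

# Hypothetical illustration of Laplacian detail enhancement;
# NOT the AnimateZoo implementation.

def laplacian(img: Image) -> Image:
    """4-neighbour discrete Laplacian (centre weight -4) with edge replication."""
    h, w = len(img), len(img[0])

    def px(y: int, x: int) -> float:
        # Clamp coordinates so border pixels reuse their nearest valid neighbour.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [
        [px(y - 1, x) + px(y + 1, x) + px(y, x - 1) + px(y, x + 1) - 4 * px(y, x)
         for x in range(w)]
        for y in range(h)
    ]


def boost_details(img: Image, strength: float = 1.0) -> Image:
    """Sharpen by subtracting the scaled Laplacian, which amplifies fine detail."""
    lap = laplacian(img)
    return [
        [p - strength * l for p, l in zip(row, lap_row)]
        for row, lap_row in zip(img, lap)
    ]
```

On a flat image the Laplacian is zero everywhere, so `boost_details` leaves it unchanged; around an isolated bright pixel the centre is pushed up and its neighbours down, increasing local contrast. A learned detail booster would presumably feed such high-frequency information into the network rather than sharpen pixels directly, but the abstract does not say.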

Method Overview

Video Presentation

Replacing video subjects with animals exhibiting different appearance characteristics

Another Application: Creating Something Out of Nothing

Generating different animals and actions in the same scenario

Transplanting the same subject and action across various scenarios

Comparison with Advanced Methods

Our method surpasses others in background and subject fidelity, video quality, and coherence

BibTeX

@article{xu2024animatezoo,
  title={AnimateZoo: Zero-shot Video Generation of Cross-Species Animation via Subject Alignment},
  author={Xu, Yuanfeng and Chen, Yuhao and Huang, Zhongzhan and He, Zijian and Wang, Guangrun and Torr, Philip and Lin, Liang},
  journal={arXiv preprint arXiv:2404.04946},
  year={2024}
}