Expression Domain Translation Network for Cross-domain Head Reenactment

1Korea University, 2KAIST
Cross-domain head reenactment examples of our method

Abstract

Despite the remarkable advancements in head reenactment, existing methods still struggle with cross-domain head reenactment, which aims to transfer human motion to non-human domains such as cartoon characters. Extracting motion from out-of-domain images remains difficult because of their distinct appearance, such as large eyes. Previous work recently introduced AnimeCeleb, a large-scale anime dataset, together with a cross-domain head reenactment model that uses an optimization-based mapping function to translate human-domain expressions to the anime domain. However, we found that this mapping function, which relies on a subset of expressions, limits the range of expressions that can be mapped. To address this limitation, we introduce a novel expression domain translation network that transforms human expressions into anime expressions. Specifically, to maintain geometric consistency between the expressions at the input and output of the network, we employ a 3D geometric-aware loss function that reduces the distances between the vertices of the input and output 3D meshes. This enforces a high-fidelity, one-to-one mapping between the two expression domains. Our method outperforms existing methods in both qualitative and quantitative analysis, marking a significant advancement in the field of cross-domain head reenactment.
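As a rough illustration of the vertex-distance idea described in the abstract, the following minimal sketch computes a mean per-vertex distance between two meshes. It rests on assumptions that this page does not state: both domains are modeled as linear blendshape meshes, the two meshes share a vertex correspondence, and the loss is a mean Euclidean distance. The names `blendshape_vertices` and `geometric_loss` and all array sizes are hypothetical, and random coefficients stand in for the output of the translation network.

```python
import numpy as np


def blendshape_vertices(neutral, deltas, coeffs):
    """Linear blendshape model (assumed): neutral + sum_k coeffs[k] * deltas[k].

    neutral: (V, 3) rest-pose vertices
    deltas:  (K, V, 3) per-blendshape vertex offsets
    coeffs:  (K,) expression coefficients
    """
    return neutral + np.tensordot(coeffs, deltas, axes=1)


def geometric_loss(vertices_a, vertices_b):
    """Mean Euclidean distance between corresponding vertices (assumed form)."""
    return float(np.linalg.norm(vertices_a - vertices_b, axis=-1).mean())


# Toy data; the sizes are arbitrary and purely illustrative.
rng = np.random.default_rng(0)
V, K_human, K_anime = 5, 4, 3
neutral = rng.normal(size=(V, 3))
human_deltas = rng.normal(size=(K_human, V, 3))
anime_deltas = rng.normal(size=(K_anime, V, 3))

human_coeffs = rng.uniform(0.0, 1.0, size=K_human)
# Stand-in for the translated anime coefficients predicted by a network.
anime_coeffs = rng.uniform(0.0, 1.0, size=K_anime)

human_mesh = blendshape_vertices(neutral, human_deltas, human_coeffs)
anime_mesh = blendshape_vertices(neutral, anime_deltas, anime_coeffs)
loss = geometric_loss(human_mesh, anime_mesh)
print(f"vertex-distance loss: {loss:.4f}")
```

In a training setting, the anime coefficients would come from a differentiable network and the same distance would be written in an autodiff framework so that gradients reach the network parameters; NumPy is used here only to keep the sketch short.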

Method overview

Overview of the proposed method. (A) Training of the expression domain translation network. (B) The anime generator, which edits the source image according to the driving expression. (C) Optimization of the pose adapter.

Qualitative Comparisons


Videos

Comparison with Animo on 3D AnimeCeleb

(A) Source image   (B) Driving image   (C) Animo   (D) Ours

Comparison with baselines on 2D AnimeCeleb

(A) Source image   (B) Driving image   (C) FOMM   (D) PIRenderer+T   (E) Animo   (F) Ours

Comparison with rendered images

(A) Driving image   (B) Rendered   (C)-(E) Ours

Images

2D AnimeCeleb

(A) Source image   (B) Driving image   (C) Animo   (D) Ours

3D AnimeCeleb

(A) Source image   (B) Driving image   (C) Animo   (D) Ours

In the wild examples

(A) Source image   (B) Driving image   (C) Animo   (D) Ours

With rendered image

(A) Source image   (B) Driving image   (C) Rendered image   (D) Ours

BibTeX

@misc{kang2023expression,
  title={Expression Domain Translation Network for Cross-domain Head Reenactment},
  author={Taewoong Kang and Jeongsik Oh and Jaeseong Lee and Sunghyun Park and Jaegul Choo},
  year={2023},
  eprint={2310.10073},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}