¹Beihang University  ²Tsinghua University
✉ Corresponding author  † Work done during an internship at Tsinghua University
Synthesizing high-fidelity and emotion-controllable talking video portraits, with audio-lip sync, vivid expressions, realistic head poses, and eye blinks, has been an important and challenging task in recent years. Most existing methods struggle to achieve personalized and precise emotion control, smooth transitions between different emotion states, and diverse motion generation. To tackle these challenges, we present GMTalker, a Gaussian mixture-based framework for generating emotional talking portraits. Specifically, we propose a Gaussian mixture-based expression generator that constructs a continuous and disentangled latent space, enabling more flexible emotion manipulation. Furthermore, we introduce a normalizing flow-based motion generator, pretrained on a large dataset with wide-ranging motions, to generate diverse head poses, blinks, and eyeball movements. Finally, we propose a personalized emotion-guided head generator with an emotion mapping network that synthesizes high-fidelity and faithful emotional video portraits. Both quantitative and qualitative experiments demonstrate that our method outperforms previous methods in image quality, photo-realism, emotion accuracy, and motion diversity.
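To illustrate how a Gaussian mixture latent space can support continuous emotion control, the sketch below blends per-emotion component statistics with user-supplied emotion weights before drawing a latent sample. This is a minimal illustration only: the component count, latent dimension, blending rule, and function names are assumptions and do not describe the paper's actual implementation.

```python
# Illustrative sketch (not the paper's code): sampling an expression latent
# from an emotion-conditioned Gaussian mixture, where emotion weights blend
# the per-emotion component means and variances.
import torch

def sample_expression_latent(emotion_weights: torch.Tensor,
                             means: torch.Tensor,
                             log_vars: torch.Tensor) -> torch.Tensor:
    """Blend per-emotion Gaussian components and draw one latent sample.

    emotion_weights: (K,) non-negative weights over K emotion categories.
    means:           (K, D) mean of each emotion component.
    log_vars:        (K, D) log-variance of each emotion component.
    """
    w = emotion_weights / emotion_weights.sum()      # normalize to a convex combination
    mu = (w[:, None] * means).sum(dim=0)             # blended mean -> continuous emotion control
    var = (w[:, None] * log_vars.exp()).sum(dim=0)   # blended variance (simple moment matching)
    return mu + var.sqrt() * torch.randn_like(mu)    # reparameterized sample

# Example: a latent halfway between emotion 0 (e.g. "happy") and emotion 1 (e.g. "sad").
K, D = 8, 64                                         # hypothetical sizes
means, log_vars = torch.randn(K, D), torch.zeros(K, D)
weights = torch.zeros(K)
weights[0], weights[1] = 0.5, 0.5
z = sample_expression_latent(weights, means, log_vars)
```

Because the weights vary continuously, intermediate emotion states and smooth transitions fall out of interpolating the mixture components rather than switching between discrete emotion labels.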
Pipeline of GMTalker. Our framework consists of three parts: (a) In the first part, given the input speech and an emotion weight label, we propose GMEG to generate 3DMM expression coefficients by sampling from a Gaussian mixture latent space. (b) In the second part, we introduce NFMG to predict motion coefficients from the audio, including head poses, eye blinks, and gaze. (c) In the third part, we render these coefficients into 3DMM renderings of the target person and then use an emotion-guided head generator with EMN to synthesize photo-realistic video portraits with a personalized style.
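The caption's three-stage dataflow can be summarized as the sketch below, where GMEG, NFMG, the 3DMM rendering step, and the emotion-guided head generator are all stubbed with placeholder modules. Every layer choice, tensor shape, and coefficient dimension here is an assumption for illustration and does not reflect the released model.

```python
# Dataflow sketch of the three pipeline stages (placeholder modules, assumed shapes).
import torch
import torch.nn as nn

class GMEG(nn.Module):
    """(a) audio features + emotion weights -> 3DMM expression coefficients (stub)."""
    def __init__(self, audio_dim=80, emo_dim=8, exp_dim=64):
        super().__init__()
        self.net = nn.Linear(audio_dim + emo_dim, exp_dim)
    def forward(self, audio_feat, emo_weights):
        return self.net(torch.cat([audio_feat, emo_weights], dim=-1))

class NFMG(nn.Module):
    """(b) audio features -> motion coefficients: pose (6), blink (1), gaze (2) (stub)."""
    def __init__(self, audio_dim=80, motion_dim=6 + 1 + 2):
        super().__init__()
        self.net = nn.Linear(audio_dim, motion_dim)
    def forward(self, audio_feat):
        return self.net(audio_feat)

class EmotionGuidedHead(nn.Module):
    """(c) 3DMM rendering + emotion code -> photo-realistic frame (stub)."""
    def __init__(self, emo_dim=8, style_dim=16):
        super().__init__()
        self.emotion_mapping = nn.Linear(emo_dim, style_dim)   # EMN stand-in: emotion -> style code
        self.to_rgb = nn.Conv2d(3 + style_dim, 3, kernel_size=3, padding=1)
    def forward(self, rendering, emo_weights):
        style = self.emotion_mapping(emo_weights)               # (B, style_dim)
        style = style[:, :, None, None].expand(-1, -1, *rendering.shape[-2:])
        return self.to_rgb(torch.cat([rendering, style], dim=1))

# Per-frame dataflow: audio drives expressions and motions; the rendering step
# is stubbed with a fixed image feeding the emotion-guided head generator.
audio_feat = torch.randn(1, 80)
emo_weights = torch.softmax(torch.randn(1, 8), dim=-1)
exp_coeffs = GMEG()(audio_feat, emo_weights)
motion_coeffs = NFMG()(audio_feat)
rendering = torch.rand(1, 3, 256, 256)                          # stand-in for the 3DMM rendering
frame = EmotionGuidedHead()(rendering, emo_weights)
```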
@article{xia2024gmtalker,
title={GMTalker: Gaussian Mixture-based Audio-Driven Emotional Talking Video Portraits},
author={Xia, Yibo and Wang, Lizhen and Deng, Xiang and Luo, Xiaoyan and Liu, Yebin},
journal={arXiv preprint arXiv:2312.07669v2},
year={2024}
}