Realistic Talking Face Synthesis With Geometry-Aware Feature Transformation

Abstract

Recent studies have shown remarkable success in synthesizing realistic talking faces by exploiting generative adversarial networks. However, existing methods are mostly target-specific, so they cannot generate images of previously unseen people, and they suffer from artifacts such as blurriness and mismatched facial details. In this paper, we tackle these problems by proposing a target-agnostic framework. We introduce a geometry-aware feature transformation module to achieve shape transfer while preserving the appearance of the source face. To further improve the image quality of synthesized results, we present a multi-scale spatially-consistent transfer unit that maintains spatial consistency between the encoder and decoder features. Experimental results show that our model is able to synthesize photo-realistic talking faces of previously unseen people, outperforming state-of-the-art methods both qualitatively and quantitatively.
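
The abstract does not spell out the exact form of the geometry-aware feature transformation, but a common way to realize shape transfer while preserving source appearance is to modulate encoder features with spatially-varying scale and shift maps predicted from target geometry (e.g., landmark heatmaps). The following is a minimal sketch of that idea under this assumption; the module name, channel sizes, and the use of 68-channel landmark heatmaps are hypothetical and not taken from the paper.

```python
# Hedged sketch: SFT-style, geometry-conditioned feature modulation.
# This illustrates the general idea of a geometry-aware feature transform,
# not the authors' actual implementation.
import torch
import torch.nn as nn

class GeometryAwareTransform(nn.Module):
    def __init__(self, feat_channels: int, geom_channels: int = 68):
        super().__init__()
        # Shared trunk over landmark heatmaps encoding the target geometry.
        self.trunk = nn.Sequential(
            nn.Conv2d(geom_channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
        )
        # Separate heads predict per-pixel scale (gamma) and shift (beta).
        self.to_gamma = nn.Conv2d(64, feat_channels, 3, padding=1)
        self.to_beta = nn.Conv2d(64, feat_channels, 3, padding=1)

    def forward(self, appearance_feat, landmark_heatmaps):
        # appearance_feat: (B, C, H, W) encoder features of the source face.
        # landmark_heatmaps: (B, geom_channels, H, W) target-pose geometry.
        g = self.trunk(landmark_heatmaps)
        gamma, beta = self.to_gamma(g), self.to_beta(g)
        # Spatially-varying affine modulation: transfer shape, keep appearance.
        return appearance_feat * (1.0 + gamma) + beta

if __name__ == "__main__":
    feats = torch.randn(2, 256, 32, 32)    # source appearance features
    heatmaps = torch.randn(2, 68, 32, 32)  # target landmark heatmaps
    out = GeometryAwareTransform(256)(feats, heatmaps)
    print(out.shape)  # torch.Size([2, 256, 32, 32])
```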

Publication
2020 IEEE International Conference on Image Processing (ICIP)
Han Xue
PhD Student
Jun Ling
PhD Student

I’m now a PhD student at SJTU MediaLab, supervised by Prof. Li Song. Prior to joining Song’s MediaLab, I received my bachelor’s degree from the University of Science and Technology of China in 2018 and my master’s degree from Shanghai Jiao Tong University in 2021. My research interests focus on image and video generation, deep learning, and computer vision.

Li Song
Professor, IEEE Senior Member
