Realistic Talking Face Synthesis With Geometry-Aware Feature Transformation


Recent studies have shown remarkable success in synthesizing realistic talking faces by exploiting generative adversarial networks. However, existing methods are mostly target-specific, unable to generate images of previously unseen people, and they suffer from artifacts such as blurriness and mismatched facial details. In this paper, we tackle these problems by proposing a target-agnostic framework. We introduce a geometry-aware feature transformation module that achieves shape transfer while preserving the appearance of the source face. To further improve the image quality of synthesized results, we present a multi-scale spatially-consistent transfer unit that maintains spatial consistency between encoder and decoder features. Experimental results show that our model synthesizes photo-realistic talking faces of previously unseen people, outperforming state-of-the-art methods both qualitatively and quantitatively.
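The abstract does not detail the internals of the geometry-aware feature transformation module. As a rough illustration only, one common way to condition appearance features on target geometry is AdaIN-style modulation: normalize the source features per channel, then apply a scale and shift predicted from the target landmarks. The function names, shapes, and the linear projections `W_gamma`/`W_beta` below are assumptions for the sketch, not the paper's actual design.

```python
import numpy as np

def geometry_aware_transform(src_feat, tgt_geometry, W_gamma, W_beta):
    """Hedged sketch: modulate source appearance features (C, H, W)
    with per-channel scale/shift predicted from flattened target
    landmarks (D,). AdaIN-style conditioning, assumed for illustration.
    """
    gamma = tgt_geometry @ W_gamma  # (C,) per-channel scale from geometry
    beta = tgt_geometry @ W_beta    # (C,) per-channel shift from geometry
    # Normalize source features per channel to strip their own statistics,
    # preserving spatial appearance structure...
    mu = src_feat.mean(axis=(1, 2), keepdims=True)
    sigma = src_feat.std(axis=(1, 2), keepdims=True) + 1e-5
    normed = (src_feat - mu) / sigma
    # ...then re-style them with geometry-derived statistics.
    return normed * gamma[:, None, None] + beta[:, None, None]
```

In a full model, `W_gamma` and `W_beta` would be learned (e.g. small MLPs on the landmark vector) and the modulation applied at multiple decoder scales; here they are plain matrices to keep the sketch self-contained.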

2020 IEEE International Conference on Image Processing (ICIP)
Li Song
Professor, IEEE Senior Member

Li Song is a Professor and Doctoral Supervisor, Deputy Director of the Institute of Image Communication and Network Engineering at Shanghai Jiao Tong University, a Double-Appointed Professor of the Institute of Artificial Intelligence and the Collaborative Innovation Center of Future Media Network, and Deputy Secretary-General of the China Video User Experience Alliance, where he heads the standards group.