Recent studies have shown remarkable success in synthesizing realistic talking faces by exploiting generative adversarial networks. However, existing methods are mostly target-specific and cannot generate images of previously unseen people, and they suffer from artifacts such as blurriness and mismatched facial details. In this paper, we tackle these problems by proposing a target-agnostic framework. We introduce a geometry-aware feature transformation module to achieve shape transfer while preserving the appearance of the source face. To further improve the image quality of synthesized results, we present a multi-scale spatially-consistent transfer unit that maintains spatial consistency between the encoder and decoder features. Experimental results show that our model is able to synthesize photo-realistic talking faces of previously unseen people, outperforming state-of-the-art methods both qualitatively and quantitatively.
2020 IEEE International Conference on Image Processing (ICIP)
I’m a PhD candidate at the SJTU Media Lab, under the direction of Prof. Li Song and Prof. Wenjun Zhang. My research interests include image/video synthesis and face generation.
I’m now a PhD student at the SJTU MediaLab, supervised by Prof. Li Song. Prior to joining Song’s MediaLab, I received my bachelor’s degree from the University of Science and Technology of China in 2018 and my master’s degree from Shanghai Jiao Tong University in 2021. My research interests focus on image and video generation, deep learning, and computer vision.
Professor, IEEE Senior Member
Professor and Doctoral Supervisor; Deputy Director of the Institute of Image Communication and Network Engineering at Shanghai Jiao Tong University; Double-Appointed Professor of the Institute of Artificial Intelligence and the Collaborative Innovation Center of Future Media Network; Deputy Secretary-General of the China Video User Experience Alliance and head of its standards group.