Paralleling variable block size motion estimation of HEVC on CPU plus GPU platform


The emerging HEVC standard supports up to 12 variable block sizes, ranging from 4×8/8×4 to 64×64, for motion estimation (ME) and motion compensation (MC). This feature yields considerable coding gain over the 7 variable block sizes in H.264/AVC, at the cost of very high computational complexity. In the HM test model, ME with variable block sizes (VBSME) may be invoked up to 425 times during the mode decision procedure for a single Coding Tree Unit (CTU), which makes VBSME the bottleneck for real-time encoding. In this paper, we focus on the design of a parallel architecture for VBSME in HEVC. First, an efficient parallel encoder framework is proposed for a CPU plus GPU platform. Within this framework, VBSME, fractional-pixel image interpolation, and border padding run on the GPU without burdening the host CPU. Second, to balance the workload between CPU and GPU, a fast Prediction Unit (PU) partition mode decision algorithm is proposed. Finally, the parallel realization strategy of VBSME on the GPU is refined to improve ME compression performance. Experimental results on an NVIDIA C2050 GPU show that the VBSME strategy runs about 113 times faster on the GPU than on the CPU.
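To make the VBSME workload concrete, the following minimal sketch enumerates the 12 symmetric block sizes mentioned in the abstract and runs an integer-pixel full search for one block. It is an illustration only, not the paper's GPU implementation: the SAD (sum of absolute differences) cost, the exhaustive search pattern, and the helper names `pu_sizes`, `sad`, and `full_search` are all assumptions introduced here.

```python
# Illustrative sketch only. The SAD cost, the exhaustive search, and all
# function names are assumptions; the abstract does not specify them.

def pu_sizes():
    """Return the 12 symmetric (width, height) block sizes, 64x64 down to 8x4/4x8."""
    sizes = []
    n = 64
    while n >= 8:
        sizes.append((n, n))        # square block
        sizes.append((n, n // 2))   # two horizontal halves
        sizes.append((n // 2, n))   # two vertical halves
        n //= 2
    return sizes


def sad(cur, ref, x, y, w, h, dx, dy):
    """SAD between the w x h block of cur at (x, y) and ref displaced by (dx, dy)."""
    total = 0
    for j in range(h):
        for i in range(w):
            total += abs(cur[y + j][x + i] - ref[y + j + dy][x + i + dx])
    return total


def full_search(cur, ref, x, y, w, h, search_range):
    """Return (cost, dx, dy) of the best integer motion vector for one block."""
    ref_h, ref_w = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that leave the reference frame; a real encoder
            # would pad the frame borders instead.
            if x + dx < 0 or y + dy < 0:
                continue
            if x + dx + w > ref_w or y + dy + h > ref_h:
                continue
            cost = sad(cur, ref, x, y, w, h, dx, dy)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Because SAD is additive over pixels, the cost of a larger block for a given motion vector equals the sum of the costs of the smaller blocks that tile it. Every candidate vector and every block is also independent of the others, which is what makes this kind of search a natural fit for massively parallel hardware.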

2013 IEEE International Conference on Multimedia and Expo Workshops (ICMEW)
Li Song
Professor, IEEE Senior Member

Professor and Doctoral Supervisor; Deputy Director of the Institute of Image Communication and Network Engineering, Shanghai Jiao Tong University; Double-Appointed Professor at the Institute of Artificial Intelligence and the Collaborative Innovation Center of Future Media Network; Deputy Secretary-General of the China Video User Experience Alliance and head of the standards group.