Real-time UHD video super-resolution and transcoding on heterogeneous hardware

Abstract

Videos have become the dominant type of data produced and consumed every day. As screens grow larger, ultra-high-definition (UHD) videos are becoming more popular because they provide a better visual experience. However, UHD video content is still scarce. High-performance video super-resolution (SR) techniques, which obtain high-resolution (HR) videos from low-resolution (LR) sources, have recently been adopted in UHD video production. Deep learning (DL)-based SR methods can produce HR videos with high objective and subjective quality, but their massive computational complexity makes them far slower than real time when producing UHD videos, even on GPU servers. Moreover, transcoding and other video processing algorithms executed during enhancement are also time- and resource-consuming, and they run relatively slowly on ordinary CPU and GPU servers. Hardware such as GPUs, field-programmable gate arrays (FPGAs) and application-specific integrated circuits (ASICs) has proved highly capable on image and video processing tasks in different respects, and dedicated hardware accelerators also exist for specific video processing tasks. In this paper, we focus on accelerating a UHD video enhancement workflow on a heterogeneous system with multiple hardware accelerators. First, we optimize the most time-consuming task, video SR, with the cuDNN and CUDA libraries to achieve real-time processing speed for a single UHD output frame on an ordinary GPU. Second, we design a GPU-friendly multi-thread scheduling algorithm for data and computation that better utilizes GPU resources and achieves real-time performance when outputting UHD video clips. Third, targeting production environments, we build a UHD video enhancement application on selected heterogeneous hardware, with an integrated command-line tool for our proposed algorithm, and achieve a real-time end-to-end processing speed of 60 fps. Experiments demonstrate the efficiency, robustness and compatibility of our approach.
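To make the 60 fps target concrete: each output frame has a budget of 1000 / 60 ≈ 16.67 ms, and a multi-stage workflow meets that budget more easily if its stages overlap across frames. The sketch below is an illustration only, not the authors' scheduling algorithm: it assumes a hypothetical three-stage split (decode, SR, encode), connects one thread per stage with bounded queues so different frames occupy different stages at once, and compares steady-state pipelined throughput (limited by the slowest stage) with sequential throughput (limited by the sum of stage times). The names `run_pipeline`, `pipelined_fps`, `sequential_fps` and the stage timings are hypothetical. In CPython, threads only overlap in practice when stages release the GIL, as GPU kernel launches and codec I/O typically do.

```python
import queue
import threading

TARGET_FPS = 60
FRAME_BUDGET_MS = 1000.0 / TARGET_FPS  # about 16.67 ms per output frame at 60 fps

_SENTINEL = object()  # marks the end of the frame stream


def _run_stage(work, inbox, outbox):
    """Pull items from inbox, apply this stage's work, push results downstream."""
    while True:
        item = inbox.get()
        if item is _SENTINEL:
            outbox.put(_SENTINEL)  # propagate end-of-stream to the next stage
            return
        outbox.put(work(item))


def run_pipeline(frames, stages, depth=4):
    """Hypothetical pipeline: one thread per stage, bounded queues between them.

    Bounded queues (maxsize=depth) cap the number of in-flight frames, which
    limits memory use for large UHD buffers. Each stage is a single thread
    reading a FIFO queue, so output order matches input order.
    """
    queues = [queue.Queue(maxsize=depth) for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=_run_stage, args=(work, queues[i], queues[i + 1]))
        for i, work in enumerate(stages)
    ]
    for w in workers:
        w.start()

    def feed():
        # Feed from a separate thread so the main thread can drain the last
        # queue concurrently; otherwise bounded queues could deadlock.
        for f in frames:
            queues[0].put(f)
        queues[0].put(_SENTINEL)

    feeder = threading.Thread(target=feed)
    feeder.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is _SENTINEL:
            break
        results.append(item)

    feeder.join()
    for w in workers:
        w.join()
    return results


def pipelined_fps(stage_ms):
    """Steady-state throughput when stages overlap: bounded by the slowest stage."""
    return 1000.0 / max(stage_ms)


def sequential_fps(stage_ms):
    """Throughput when each frame passes through all stages before the next starts."""
    return 1000.0 / sum(stage_ms)


if __name__ == "__main__":
    # Placeholder stages standing in for decode, SR and encode (hypothetical).
    out = run_pipeline(range(3), [lambda i: i, lambda x: x * 2, lambda x: f"frame{x}"])
    print(out)
    hypothetical_ms = [5.0, 15.0, 8.0]  # illustrative stage latencies, not measured
    print(pipelined_fps(hypothetical_ms) >= TARGET_FPS)
    print(sequential_fps(hypothetical_ms) >= TARGET_FPS)
```

With the illustrative timings above, every stage fits inside the 16.67 ms budget, so the overlapped pipeline sustains about 66.7 fps, whereas running the same stages back to back (28 ms per frame) yields only about 35.7 fps.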

Publication
Journal of Real-Time Image Processing
Yu Dong
PhD Student
Li Song
Professor, IEEE Senior Member