Video is becoming the “biggest big data”, as well as one of the most important and valuable sources of insights and information.

Research in our lab covers a broad range of topics on video signals. Inspired by state-of-the-art representation models such as deep learning, and driven by modern biologically inspired computational models of visual experience, we are working on better solutions for video analysis, video processing, and video compression. Specifically, we explore each research topic from three aspects: algorithm, computation, and data.


@sjtu-medialab   @media_tech

Follow our GitHub for academic open-source projects.
Follow our WeChat Official Account for the latest media technology progress.

Recent Publications

Modeling Acceleration Properties for Flexible INTRA HEVC Complexity Control

The high computational complexity of the High Efficiency Video Coding (HEVC) standard is widely recognized as the main hurdle to its broad deployment and use. To tackle this problem, a number of recent research efforts exploit heuristic algorithms and machine learning, including deep learning, to reduce the coding complexity. However, in most cases each encoder module, i.e., encoding process, is first accelerated individually, and the different acceleration algorithms are then combined manually. Without a holistic strategy, the acceleration potential of multi-module combination is not fully exploited and the rate-distortion (RD) loss is generally not well controlled. To address these shortcomings, this paper exploits the acceleration properties of different modules, i.e., a numerical representation of the potential time saving and the possible RD loss, from which a heuristic model is derived. A Heuristic Model Oriented Framework (HMOF) is then proposed, which adapts the properties of the modules to the underlying acceleration algorithms. Within the framework, two advanced acceleration algorithms, Border Considered CNN (BC-CNN)-based Coding Unit (CU) partition and Naive Bayes-based Prediction Unit (PU) partition, are proposed for the CU and PU modules, respectively. Further, by leveraging the heuristic model as guidance for combining the proposed acceleration algorithms, HMOF is globally optimized: different time-saving budgets are allocated to different modules so that a theoretically minimal RD loss is achieved. According to the experimental results, by fusing a suitable deep learning technique and a Bayes-based prediction, the proposed acceleration framework HMOF enables multiple acceleration choices, and the proposed joint optimization strategy helps select the choice with the best cost-performance trade-off. Furthermore, within the proposed framework, intra coding time can be precisely controlled with negligible Bjøntegaard delta bit-rate (BDBR) loss. In this context, as a complexity control method, HMOF outperforms state-of-the-art complexity reduction algorithms under a similar complexity reduction ratio. These results demonstrate, in part, the superiority of the proposed technique.
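As an intuition for how per-module acceleration properties could drive budget allocation, the sketch below greedily assigns a total time-saving target to the modules with the lowest estimated RD cost per unit of time saved. It is a minimal illustration only; the module names, numbers, and the `ModuleProperty` / `allocate_budget` helpers are hypothetical and do not reproduce the heuristic model or the optimization actually used in HMOF.

```python
# Illustrative sketch only: hypothetical module properties and a greedy
# budget allocation, not the paper's actual HMOF optimization.
from dataclasses import dataclass

@dataclass
class ModuleProperty:
    name: str            # encoder module, e.g. CU or PU partition
    max_saving: float    # maximum achievable time saving (fraction of total time)
    rd_cost: float       # estimated RD loss (e.g. BDBR %) per unit of time saving

def allocate_budget(modules, target_saving):
    """Greedily give time-saving budget to the cheapest modules first."""
    plan, remaining = {}, target_saving
    for m in sorted(modules, key=lambda m: m.rd_cost):
        take = min(m.max_saving, remaining)
        plan[m.name] = take
        remaining -= take
        if remaining <= 0:
            break
    return plan, remaining  # remaining > 0 means the target is infeasible

if __name__ == "__main__":
    modules = [
        ModuleProperty("CU partition (BC-CNN)", max_saving=0.35, rd_cost=0.8),
        ModuleProperty("PU partition (Naive Bayes)", max_saving=0.20, rd_cost=1.5),
    ]
    print(allocate_budget(modules, target_saving=0.40))
```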

SpaAbr: Size Prediction Assisted Adaptive Bitrate Algorithm for Scalable Video Coding Contents

Dynamic Adaptive Streaming over HTTP (DASH) is a video transmission protocol that adapts to different network conditions and heterogeneous client devices. The client first requests and parses the Media Presentation Description (MPD) file from the DASH server to obtain basic information, including the average bitrate list and segment URLs. After that, the adaptive bitrate (ABR) algorithm decides the quality of the next requested segment based on network and buffer conditions. However, there is a large discrepancy between the segment size calculated with this coarse-grained average bitrate and the real size, owing to constantly changing video scenes; the discrepancy is especially pronounced for variable bitrate (VBR) encoded videos, and the error is invisible to the ABR algorithm. This paper first analyzes and confirms the inter-layer correlation of scalable video coding (SVC) encoded contents, and then designs a segment size prediction module to cooperate with the ABR algorithm. Experimental results show that predicting a certain enhancement layer (EL) segment size from all of its lower layers increases the probability that the predicted value falls within the accurate interval by 20%–58%, compared to predicting the EL size from the base layer (BL) only. Besides, the ABR algorithm assisted by the size prediction module increases the average bitrate by 29.2%, reduces the average number of bitrate switches by 13.4%, and reduces the average number of rebuffering events by 78.7% compared with the stand-alone ABR algorithm.
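To illustrate the idea of predicting an enhancement-layer segment size from all of its lower layers, the sketch below fits a simple least-squares model on past segments. The layer sizes and the `fit_size_predictor` / `predict_el_size` helpers are made-up placeholders for illustration, not SpaAbr's actual prediction module.

```python
# Illustrative sketch only: a least-squares predictor of an enhancement layer
# (EL) segment size from all lower-layer sizes of the same segment.
# The sizes below are placeholders, not measurements from the paper.
import numpy as np

def fit_size_predictor(lower_layer_sizes, el_sizes):
    """Fit EL_size ~ w . [1, BL, EL1, ...] on past segments."""
    X = np.hstack([np.ones((len(lower_layer_sizes), 1)), np.asarray(lower_layer_sizes)])
    w, *_ = np.linalg.lstsq(X, np.asarray(el_sizes), rcond=None)
    return w

def predict_el_size(w, lower_sizes):
    return float(np.dot(np.concatenate(([1.0], lower_sizes)), w))

# Past segments: columns are [BL, EL1] sizes in KB; targets are EL2 sizes in KB.
history_lower = [[120, 60], [150, 80], [90, 45], [200, 110]]
history_el2 = [240, 310, 180, 430]
w = fit_size_predictor(history_lower, history_el2)
print(predict_el_size(w, [130, 70]))  # predicted EL2 size for the next segment
```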

IdentityDP: Differential Private Identification Protection for Face Images

Because of the explosive growth of face photos as well as their widespread dissemination and easy accessibility on social media, the security and privacy of personal identity information have become an unprecedented challenge. Meanwhile, the convenience brought by advanced identity-agnostic computer vision technologies is attractive. Therefore, it is important to use face images while carefully protecting people’s identities. Given a face image, face de-identification, also known as face anonymization, refers to generating another image with a similar appearance and the same background, while the real identity is hidden. Although extensive efforts have been made, existing face de-identification techniques are either insufficient in photo-reality or incapable of well balancing privacy and utility. In this paper, we focus on tackling these challenges to improve face de-identification. We propose IdentityDP, a face anonymization framework that combines a data-driven deep neural network with a differential privacy (DP) mechanism. This framework encompasses three stages: facial representation disentanglement, ε-IdentityDP perturbation, and image reconstruction. Our model can effectively obfuscate the identity-related information of faces, preserve significant visual similarity, and generate high-quality images that can be used for identity-agnostic computer vision tasks, such as detection, tracking, etc. Different from previous methods, we can adjust the balance of privacy and utility through the privacy budget according to practical demands and provide diverse results without pre-annotations. Extensive experiments demonstrate the effectiveness and generalization ability of our proposed anonymization framework.
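As a rough illustration of ε-controlled perturbation, the sketch below applies the generic Laplace mechanism to a hypothetical identity embedding; a smaller ε means stronger noise, i.e., more privacy and less utility. The embedding, the sensitivity value, and the `laplace_perturb` helper are assumptions for illustration and do not reproduce the paper's actual ε-IdentityDP perturbation stage.

```python
# Illustrative sketch of the generic epsilon-DP Laplace mechanism applied to a
# hypothetical identity embedding; the actual IdentityDP perturbation and its
# sensitivity analysis are defined in the paper, not reproduced here.
import numpy as np

def laplace_perturb(identity_vec, epsilon, sensitivity=1.0, rng=None):
    """Add Laplace(scale = sensitivity / epsilon) noise to each dimension.

    Smaller epsilon -> larger noise -> more privacy, less utility.
    """
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon
    noise = rng.laplace(loc=0.0, scale=scale, size=identity_vec.shape)
    return identity_vec + noise

identity = np.random.default_rng(0).normal(size=512)      # placeholder embedding
protected_strong = laplace_perturb(identity, epsilon=0.5)  # more private
protected_weak = laplace_perturb(identity, epsilon=5.0)    # more utility
```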

An Elastic System Architecture for Edge Based Low Latency Interactive Video Applications

5G and edge computing have brought great changes to the video industry. Interactive video is becoming an emerging form of multimedia service that offers appeal beyond typical scenarios such as cloud gaming and remote virtual reality (VR), and it puts forward great challenges in resource capacity, response latency, and functional flexibility for its service system. In this paper, we propose an elastic system architecture with low-latency features to accommodate generic interactive video applications on near-user edges. To increase system flexibility, we first design a dynamic Directed Acyclic Graph (dDAG) model for efficient task representation. Second, based on this model, we present the elastic architecture together with its scalable workflow pipeline. Third, we propose a set of novel latency measurement metrics to analyze and optimize the performance of an interactive video system. Based on the proposed approaches, we disassemble a real-world free-viewpoint synthesis application and benchmark its performance with the metrics. Extensive experimental results show the flexibility of our system in handling stochastic human interactions during a video service session, with less than 5 ms of additional scheduling latency introduced. End-to-end latency is kept within 43 ms for complex functions, and within 28 ms for simpler scenarios, which satisfies the requirements of most interactive video applications served by an edge. The client of the architecture serves as a pure video player, which is also friendly to power-limited terminals such as 5G phones. Efficiency and stability analyses of the system show superiority over existing work and also reveal potential optimization directions for future research.
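The sketch below shows one minimal way a dDAG-style task representation might look in code: tasks as nodes, dependencies as edges, dynamic removal of a branch when an interaction disables it, and a topological execution order. The `DDag` class and the task names are hypothetical placeholders, not the system's actual implementation.

```python
# Illustrative sketch only: a minimal dynamic DAG (dDAG) task representation
# with topological scheduling. Task names are placeholders.
from collections import defaultdict, deque

class DDag:
    def __init__(self):
        self.edges = defaultdict(set)   # task -> downstream tasks
        self.tasks = set()

    def add_dependency(self, upstream, downstream):
        self.tasks.update((upstream, downstream))
        self.edges[upstream].add(downstream)

    def remove_task(self, name):
        """Dynamic reconfiguration: drop a task, e.g. when a user
        interaction disables a branch of the pipeline."""
        self.tasks.discard(name)
        self.edges.pop(name, None)
        for deps in self.edges.values():
            deps.discard(name)

    def execution_order(self):
        indeg = {t: 0 for t in self.tasks}
        for u in self.edges:
            for v in self.edges[u]:
                indeg[v] += 1
        queue = deque(t for t, d in indeg.items() if d == 0)
        order = []
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in self.edges[u]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    queue.append(v)
        return order

g = DDag()
g.add_dependency("decode", "viewpoint_synthesis")
g.add_dependency("viewpoint_synthesis", "encode")
g.add_dependency("encode", "stream")
print(g.execution_order())
```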

Current Frame Priors Assisted Neural Network for Intra Prediction

Intra prediction is the key technology for reducing spatial redundancy in modern video coding standards. Recently, deep learning based methods that directly generate the intra prediction with a neural network have achieved superior performance compared to traditional directional intra prediction. However, these methods lack the ability to handle complex blocks that contain mixed directional textures or recurrent patterns, since they only use the neighboring reference samples of the current block. Other intermediate information generated during the coding process, denoted as reference priors in this paper, is not exploited. In this paper, a Current Frame Priors assisted Neural Network (CFPNN) is presented to improve intra prediction efficiency. Specifically, we use the local contextual information provided by multiple neighboring references as the primary inference source. In addition to the neighboring references, we further use two other reference priors within the current frame: the predictor found by intra block copy (IntraBC) and the corresponding residual component. The IntraBC predictor provides useful nonlocal information that, together with the neighboring local information, helps generate more accurate predictions for complex blocks, while the residual component, which reflects the characteristics of the block to some extent, is used to reduce the noise contained in the reconstructed reference samples. Moreover, we investigate the best way to integrate the proposed method into the codec. Experimental results demonstrate that, compared to HEVC, the proposed CFPNN achieves an average BD-rate reduction of 4.1% for the luma component under the All Intra configuration.
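To make the three-prior fusion idea concrete, the sketch below wires the neighboring reference samples, the IntraBC predictor, and its residual into a tiny convolutional network; for simplicity all three priors are treated here as same-sized patches. The `PriorFusionNet` class, layer sizes, and block size are illustrative assumptions and do not correspond to the CFPNN architecture reported in the paper.

```python
# Conceptual sketch only (not the paper's CFPNN): a tiny network that fuses
# three current-frame priors -- neighboring reference samples, the IntraBC
# predictor, and its residual -- to predict a luma block.
import torch
import torch.nn as nn

class PriorFusionNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Each prior is encoded separately, then fused by 1x1 convolutions.
        self.encode_refs = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.encode_ibc = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.encode_res = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.fuse = nn.Sequential(
            nn.Conv2d(48, 32, 1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),           # predicted luma block
        )

    def forward(self, neighbor_refs, ibc_predictor, ibc_residual):
        feats = torch.cat([
            self.encode_refs(neighbor_refs),
            self.encode_ibc(ibc_predictor),
            self.encode_res(ibc_residual),
        ], dim=1)
        return self.fuse(feats)

net = PriorFusionNet()
block = net(torch.rand(1, 1, 8, 8), torch.rand(1, 1, 8, 8), torch.rand(1, 1, 8, 8))
print(block.shape)  # torch.Size([1, 1, 8, 8])
```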