Shot boundary detection using convolutional neural networks


Video shot boundary detection (SBD) is a prerequisite for further video analysis such as video retrieval and annotation. Considerable effort has gone into making SBD algorithms fast and accurate. Most existing work uses frame histograms as features for measuring similarity. However, when the changes between consecutive shots are small and their backgrounds are highly similar, most state-of-the-art methods miss the boundaries and therefore cannot achieve high detection accuracy. In this paper we propose a novel SBD framework based on Convolutional Neural Networks (CNNs). First, we adopt a candidate segment selection method that coarsely locates shot boundaries using adaptive thresholds and eliminates most non-boundary frames. Then a CNN is used to extract representative features from the frames in the candidate segments. Finally, cut and gradual transitions are detected with a novel pattern-matching method based on a new similarity strategy. Experiments on TRECVID 2001 test data demonstrate that the proposed scheme outperforms state-of-the-art methods and achieves high detection accuracy.
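The candidate segment selection step can be illustrated with a minimal sketch. The abstract does not give the paper's distance measure, segment length, or threshold rule, so everything here is an assumption made for illustration: frames are flat lists of pixel intensities, the distance is the mean absolute difference between the first and last frame of a segment, and the adaptive threshold is the mean of the segment distances plus `k` standard deviations. The names `frame_distance`, `candidate_segments`, `seg_len`, and `k` are hypothetical.

```python
from statistics import mean, pstdev


def frame_distance(a, b):
    """Mean absolute difference between two frames (flat lists of intensities)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def candidate_segments(frames, seg_len=4, k=1.0):
    """Return (start, end) frame-index pairs of segments that may contain a boundary.

    Assumed rule (not taken from the paper): a segment is a candidate when the
    distance between its first and last frame exceeds mean + k * std of the
    distances over all segments.
    """
    spans = []
    for start in range(0, len(frames) - 1, seg_len):
        end = min(start + seg_len, len(frames) - 1)
        spans.append((start, end, frame_distance(frames[start], frames[end])))
    if not spans:
        return []
    dists = [d for _, _, d in spans]
    # Adaptive threshold derived from the statistics of this video's distances.
    threshold = mean(dists) + k * pstdev(dists)
    return [(s, e) for s, e, d in spans if d > threshold]
```

For a toy input of eight black frames followed by eight white frames, `candidate_segments([[0] * 4] * 8 + [[255] * 4] * 8)` keeps only the segment `(4, 8)`, whose end frame is the first white frame. In the framework described above, only frames inside such candidate segments would then be passed to the CNN for feature extraction.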

2016 Visual Communications and Image Processing (VCIP)
Li Song
Professor, IEEE Senior Member

Professor and doctoral supervisor; Deputy Director of the Institute of Image Communication and Network Engineering at Shanghai Jiao Tong University; jointly appointed professor at the Institute of Artificial Intelligence and the Collaborative Innovation Center of Future Media Network; Deputy Secretary-General of the China Video User Experience Alliance and head of its standards group.