Two-stream deep encoder-decoder architecture for fully automatic video object segmentation

Abstract

We propose a two-stream deep Encoder-Decoder architecture for fully automatic video object segmentation. The two streams, ImSeg-Stream (for static image segmentation) and MoSeg-Stream (for optical flow segmentation), have identical Encoder-Decoder architectures. The Encoder generates a low-resolution mask with accurate locations and smooth boundaries, while the Decoder refines the details of this initial mask and increases its resolution by progressively integrating lower-level features. Finally, the two streams learn to integrate their outputs for better results. Moreover, to address the shortage of video object segmentation datasets, we propose a seeking strategy to generate a large-scale handcrafted dataset for training. Experiments on two standard datasets demonstrate that the proposed method outperforms most state-of-the-art methods in both segmentation accuracy and run time.
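
The data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: plain Python lists stand in for feature maps, average pooling and a fixed blend stand in for the learned Encoder and Decoder layers, and a fixed weighted average stands in for the learned integration of the two streams. All function names and parameters (`downsample`, `refine`, `fuse`, `alpha`, `w`, `threshold`) are hypothetical.

```python
# Toy illustration only: simple arithmetic replaces learned layers.

def downsample(grid, factor=2):
    """Encoder stand-in: average-pool a 2D grid to a low-resolution mask."""
    h, w = len(grid), len(grid[0])
    return [
        [
            sum(grid[y + dy][x + dx]
                for dy in range(factor) for dx in range(factor)) / factor ** 2
            for x in range(0, w, factor)
        ]
        for y in range(0, h, factor)
    ]


def upsample(grid, factor=2):
    """Nearest-neighbour upsampling back to the higher resolution."""
    return [
        [v for v in row for _ in range(factor)]
        for row in grid
        for _ in range(factor)
    ]


def refine(coarse, low_level, alpha=0.5):
    """Decoder stand-in: enlarge the coarse mask, then blend in lower-level features."""
    up = upsample(coarse)
    return [
        [(1 - alpha) * u + alpha * s for u, s in zip(up_row, skip_row)]
        for up_row, skip_row in zip(up, low_level)
    ]


def stream(scores):
    """One stream (image or optical flow): encode to a coarse mask, then decode."""
    return refine(downsample(scores), scores)


def fuse(im_mask, mo_mask, w=0.5):
    """Integration stand-in: fixed weighted average of the two stream outputs."""
    return [
        [w * a + (1 - w) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(im_mask, mo_mask)
    ]


def segment(frame_scores, flow_scores, threshold=0.5):
    """Run both streams, fuse them, and binarise the result."""
    fused = fuse(stream(frame_scores), stream(flow_scores))
    return [[1 if v > threshold else 0 for v in row] for row in fused]
```

Calling `segment(frame_scores, flow_scores)` with two equally sized grids of per-pixel scores (side lengths divisible by 2) returns a binary mask of the same size. In this toy version a pixel is kept only when the appearance and motion cues together exceed the threshold.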

Publication
2017 IEEE Visual Communications and Image Processing (VCIP)
Jingwei Xu
Master Student
Li Song
Professor, IEEE Senior Member
