We propose a two-stream Deep Encoder-Decoder architecture to tackle the task of fully automatic video object segmentation. Both two streams, i.e., ImSeg-Stream (for static image segmentation) and MoSeg-Stream (for optical flow segmentation), hold the totally same Encoder-Decoder architecture. The Encoder part generates a low-resolution mask with accurate locations and smooth boundaries, while the Decoder part refines the details of initial mask and enlarges its resolution via integrating lower-level features progressively. At last two streams learn to integrate for better results. Moreover, to handle the problem of inadequate video object segmentation datasets, we propose a seeking strategy to generate a large-scale handcrafted dataset for training. Experiments on two standard datasets demonstrate that proposed method outperforms most state-of-the-art methods in both segmentation accuracy and run time.