Traditional frame interpolation methods first estimate motion between two consecutive frames and then synthesize intermediate frames. This problem is challenging because of complex motion and diverse video scenes. In this paper, we present an end-to-end deep network for the frame interpolation problem. Building on the video synthesis method deep voxel flow (DVF), we design refinement modules that increase the accuracy of the estimated voxel flow; we call the resulting model Refined DVF (RDVF). We also adopt a deeper architecture with additional convolution and deconvolution layers to better capture motion. Our method substantially improves on the original DVF and compares favorably with state-of-the-art methods, both quantitatively and qualitatively.
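
To make the voxel-flow synthesis step that the refinement modules operate on concrete, below is a minimal PyTorch-style sketch (not the authors' code). It assumes the symmetric sampling convention of the original DVF paper: the predicted spatial flow (dx, dy) is used to sample the previous frame backward and the next frame forward, and a temporal weight blends the two samples. The function name `synthesize_middle_frame` and the tensor layout are illustrative assumptions.

```python
# Hypothetical sketch of DVF-style frame synthesis from a predicted voxel flow.
import torch
import torch.nn.functional as F


def synthesize_middle_frame(frame0, frame1, flow, blend):
    """Warp and blend two frames with a voxel flow (illustrative, not the paper's code).

    frame0, frame1: (N, C, H, W) input frames
    flow:           (N, 2, H, W) spatial voxel-flow components (dx, dy), in pixels
    blend:          (N, 1, H, W) temporal weight in [0, 1]
    """
    n, _, h, w = flow.shape

    # Base sampling grid in normalized [-1, 1] coordinates, as expected by grid_sample.
    ys, xs = torch.meshgrid(
        torch.linspace(-1.0, 1.0, h, device=flow.device),
        torch.linspace(-1.0, 1.0, w, device=flow.device),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(n, -1, -1, -1)  # (N, H, W, 2)

    # Convert the pixel-unit flow to normalized offsets.
    dx = flow[:, 0] * 2.0 / max(w - 1, 1)
    dy = flow[:, 1] * 2.0 / max(h - 1, 1)
    offset = torch.stack((dx, dy), dim=-1)  # (N, H, W, 2)

    # Sample the previous frame backward along the flow and the next frame forward.
    warped0 = F.grid_sample(frame0, base - offset, mode="bilinear", align_corners=True)
    warped1 = F.grid_sample(frame1, base + offset, mode="bilinear", align_corners=True)

    # The temporal blend completes the trilinear sampling of the space-time volume.
    return (1.0 - blend) * warped0 + blend * warped1
```

Because the warped output is differentiable with respect to the flow and the blend weight, refinement modules that correct the voxel flow can be trained end-to-end with the rest of the network.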