Fractional-sample precision motion compensation has been widely adopted in a series of video coding standards to further improve compression efficiency. Typically, interpolation filters derived from signal decomposition are used to generate fractional samples from integer-position pixels. However, the fixed coefficients of these finite impulse response filters may not suit varied video content and coding conditions, because of the assumptions made when the filters were designed. In this paper, we regard fractional interpolation as an image generation task, which uses the true integer-position samples of the reference block to predict and generate fractional samples that are much closer to the current coding block. We adopt a convolutional neural network (CNN) as the generator. Moreover, to make the best of the CNN's powerful nonlinear learning ability, instead of feeding the reference block directly, we separately input the corresponding prediction and residual parts of the reference block. The proposed dual-input CNN-based interpolation scheme has been incorporated into the HEVC framework, and experimental results demonstrate that our approach achieves an average 0.9% bitrate reduction.
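
To make the dual-input idea concrete, the sketch below shows one possible form of such a generator in PyTorch: two shallow convolutional branches process the prediction part and the residual part of the reference block separately, and the fused features are mapped to a fractional-sample plane. The layer counts, channel widths, and fusion point are illustrative assumptions, not the exact network used in the paper.

```python
# Minimal sketch of a dual-input CNN generator for fractional interpolation.
# Architecture details (depths, widths, fusion) are assumptions for illustration.
import torch
import torch.nn as nn


class DualInputInterpolationCNN(nn.Module):
    def __init__(self, features: int = 64):
        super().__init__()
        # Separate shallow branches for the prediction and residual parts
        # of the reference block.
        self.pred_branch = nn.Sequential(
            nn.Conv2d(1, features, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.resi_branch = nn.Sequential(
            nn.Conv2d(1, features, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        # Fusion and reconstruction layers output one fractional-sample plane.
        self.fusion = nn.Sequential(
            nn.Conv2d(2 * features, features, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(features, 1, kernel_size=3, padding=1),
        )

    def forward(self, prediction: torch.Tensor, residual: torch.Tensor) -> torch.Tensor:
        # Each input is an N x 1 x H x W block of integer-position samples.
        p = self.pred_branch(prediction)
        r = self.resi_branch(residual)
        return self.fusion(torch.cat([p, r], dim=1))


if __name__ == "__main__":
    # Example: generate one fractional-sample block for a 32x32 reference block.
    net = DualInputInterpolationCNN()
    pred = torch.randn(1, 1, 32, 32)
    resi = torch.randn(1, 1, 32, 32)
    frac = net(pred, resi)   # same spatial size as the input block
    print(frac.shape)        # torch.Size([1, 1, 32, 32])
```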