Compared with spatial-domain motion compensated temporal filtering (MCTF), the in-band MCTF scheme needs more coding bits for motion information, because motion estimation (ME) and motion compensation (MC) are performed on each spatial subband. How motion is predicted and coded is therefore a key factor in the coding efficiency of in-band MCTF. In this paper, we propose an efficient level-by-level, mode-based motion prediction and coding scheme for in-band MCTF. The scheme introduces three motion prediction and coding modes to exploit the motion correlation between subbands at different resolutions as well as the spatial motion correlation within the high-frequency subbands. To trade off the complexity and the accuracy of block-based motion search, a joint rate-distortion criterion is proposed to decide one set of optimized motion vectors for the three spatial high-frequency subbands at the same level. With a rate-distortion optimized mode selection engine, the proposed scheme improves coding efficiency by about 0.6 dB for 4CIF sequences.
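As an illustration of the joint rate-distortion idea, the following minimal sketch selects a single motion vector shared by the three high-frequency subbands of one level by minimizing the summed distortion plus a weighted rate term. It is a sketch under stated assumptions, not the scheme's actual implementation: the function names, the SAD distortion measure, the full-search window, and the rate proxy (magnitude of the motion vector difference from a predictor `pred`) are all hypothetical choices made for the example.

```python
# Hypothetical sketch: one motion vector shared by three high-frequency
# subbands (e.g. LH, HL, HH) of the same level, chosen by a joint cost
# J(mv) = sum_k SAD_k(mv) + lam * rate(mv). All names are illustrative.

def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block of `cur` and the displaced block of `ref`."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def mv_rate(mv, pred=(0, 0)):
    """Assumed rate proxy: size of the motion vector difference (not a real entropy coder)."""
    return abs(mv[0] - pred[0]) + abs(mv[1] - pred[1])


def joint_rd_search(cur_bands, ref_bands, bx, by, size, search, lam, pred=(0, 0)):
    """Full search over a square window; returns (best_mv, best_cost)."""
    height = len(ref_bands[0])
    width = len(ref_bands[0][0])
    best_mv, best_cost = None, float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose displaced block leaves the reference subband.
            if by + dy < 0 or by + dy + size > height:
                continue
            if bx + dx < 0 or bx + dx + size > width:
                continue
            # Distortion is accumulated over all subbands for the same candidate.
            distortion = sum(
                sad(cur, ref, bx, by, dx, dy, size)
                for cur, ref in zip(cur_bands, ref_bands)
            )
            cost = distortion + lam * mv_rate((dx, dy), pred)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

Because the three subbands share one candidate vector, the rate term is paid once rather than three times, which is the intuition behind deciding the vectors jointly. A larger `lam` biases the search toward vectors that are cheaper to code, at the expense of matching accuracy.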