Bit-depth expansion (BDE) algorithms have made great progress on single images. However, they are hard to apply directly to videos because they impose no temporal constraint across frames. To address video BDE, we adopt an encoder-decoder structure with 3D convolution to fuse spatial and temporal information. The encoder uses three stages of downsampling with 3D ResBlocks to align the features of different time steps; the decoder adopts the inverse structure with Coordinate Attention to fuse the aligned features and reconstruct the high bit-depth frame. Quantitative and qualitative experimental results show that the proposed video BDE network outperforms other methods. Compared with the best existing BDE algorithm, we obtain a 1.51 dB improvement in PSNR. Because it requires no additional temporal information such as optical flow, our method also runs faster.
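To make the task and the reported metric concrete, the following minimal sketch simulates a low bit-depth frame, restores it with naive zero padding, and scores the result with PSNR. It is an illustration only, not the proposed network: the 16-bit and 8-bit depths, the toy pixel values, and the function names `quantize`, `zero_pad`, and `psnr` are assumptions chosen for the example.

```python
import math

# Assumed bit depths for illustration; the abstract does not specify them.
HIGH_BITS = 16
LOW_BITS = 8


def quantize(pixels, high_bits=HIGH_BITS, low_bits=LOW_BITS):
    """Simulate a low bit-depth frame by dropping the least-significant bits."""
    shift = high_bits - low_bits
    return [p >> shift for p in pixels]


def zero_pad(pixels, high_bits=HIGH_BITS, low_bits=LOW_BITS):
    """Naive expansion baseline: shift back and fill the lost bits with zeros."""
    shift = high_bits - low_bits
    return [p << shift for p in pixels]


def psnr(reference, estimate, bits=HIGH_BITS):
    """Peak signal-to-noise ratio in dB for integer pixels of the given depth."""
    peak = (1 << bits) - 1
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy 16-bit "frame" flattened to a list of pixel values.
frame = [0, 1000, 32768, 65535]
low = quantize(frame)        # 8-bit version of the frame
restored = zero_pad(low)     # naive 16-bit reconstruction
print(f"PSNR of zero padding: {psnr(frame, restored):.2f} dB")
```

A learned BDE method aims to recover the discarded low-order bits more accurately than such a baseline, which appears as a higher PSNR against the original high bit-depth frame.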