The layered structure of scalable video coding (SVC) offers flexible adaptation to unreliable transmission. When network conditions degrade sharply, enhancement layers are dropped and only the base layer is delivered. However, this causes noticeable visual artifacts due to the quality differences between layers. To alleviate this problem, we introduce a deep learning-based method into the video reconstruction phase of scalable bitstreams. A super-resolution-motivated recurrent network is proposed to extract and fuse features from both previous high-resolution frames and the current low-resolution frame. To the best of our knowledge, this is the first attempt to improve the reconstruction of scalable bitstreams with a specifically designed super-resolution network. By seamlessly integrating the accessible features, our method achieves significant video quality improvements in terms of PSNR, SSIM, and VMAF. It also markedly improves the stability of overall visual quality under packet-loss-prone networks, indicating both the efficiency and robustness of our approach.
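The abstract does not specify architectural details, so the following is only a minimal PyTorch sketch of the general idea it describes, namely fusing features from the previous high-resolution frame with the upsampled current low-resolution frame. The class name `RecurrentSRFusion`, the layer choices, and the 2x scale factor are all illustrative assumptions, not the paper's actual network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentSRFusion(nn.Module):
    """Hypothetical sketch: fuse features carried over from the previous
    high-resolution reconstruction with features of the current
    low-resolution frame to predict the next high-resolution frame."""
    def __init__(self, channels: int = 64, scale: int = 2):
        super().__init__()
        self.scale = scale
        # Feature extraction from the current low-resolution frame
        self.lr_feat = nn.Conv2d(3, channels, 3, padding=1)
        # Feature extraction from the previous high-resolution output
        self.hr_feat = nn.Conv2d(3, channels, 3, padding=1)
        # Fusion of upsampled LR features with recurrent HR features
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, lr_frame: torch.Tensor,
                prev_hr_frame: torch.Tensor) -> torch.Tensor:
        # Upsample LR features onto the high-resolution grid
        f_lr = F.interpolate(self.lr_feat(lr_frame),
                             scale_factor=self.scale,
                             mode="bilinear", align_corners=False)
        f_hr = self.hr_feat(prev_hr_frame)
        # Predict a residual on top of a plain interpolated baseline
        base = F.interpolate(lr_frame, scale_factor=self.scale,
                             mode="bilinear", align_corners=False)
        return base + self.fuse(torch.cat([f_lr, f_hr], dim=1))

# Usage: reconstruct a 2x frame from the current base-layer (LR) frame
# and the previously reconstructed HR frame (shapes are illustrative).
lr = torch.randn(1, 3, 180, 320)
prev_hr = torch.randn(1, 3, 360, 640)
model = RecurrentSRFusion()
hr = model(lr, prev_hr)  # -> torch.Size([1, 3, 360, 640])
```

Feeding each output back in as `prev_hr` on the next frame gives the recurrent behavior the abstract alludes to; in practice such designs usually also add motion compensation before fusion, which is omitted here for brevity.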