Photo-Realistic Video Prediction on Natural Videos of Largely Changing Frames

Paper title

Photo-Realistic Video Prediction on Natural Videos of Largely Changing Frames

Author

Shouno, Osamu

Abstract

Recent advances in deep learning have significantly improved the performance of video prediction. However, state-of-the-art methods still suffer from blurriness and distortions in their future predictions, especially when there are large motions between frames. To address these issues, we propose a deep residual network with a hierarchical architecture in which each layer predicts the future state at a different spatial resolution, and the predictions of the different layers are merged via top-down connections to generate future frames. We trained our model with adversarial and perceptual loss functions and evaluated it on a natural video dataset captured by car-mounted cameras. Our model quantitatively outperforms state-of-the-art baselines in future frame prediction on video sequences of both largely and slightly changing frames. Furthermore, our model generates future frames with finer details and textures that are perceptually more realistic than those of the baselines, especially under fast camera motions.
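
The top-down merging described in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: the nearest-neighbour upsampling, the additive merge, and the names `upsample2x` and `merge_top_down` are all assumptions chosen for illustration, and plain Python lists stand in for the feature maps of a real network.

```python
# Illustrative sketch only. The upsampling method and the additive merge are
# assumptions for demonstration, not details taken from the paper.

def upsample2x(grid):
    """Nearest-neighbour 2x upsampling of a 2D list of numbers."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat each row vertically
    return out


def merge_top_down(predictions):
    """Merge per-layer predictions ordered from coarsest to finest.

    Each prediction is assumed to be exactly twice the height and width
    of the one before it.
    """
    merged = predictions[0]
    for finer in predictions[1:]:
        coarse_up = upsample2x(merged)
        # Add the upsampled coarse estimate to the finer layer's prediction.
        merged = [
            [c + f for c, f in zip(c_row, f_row)]
            for c_row, f_row in zip(coarse_up, finer)
        ]
    return merged


# Hypothetical three-level hierarchy: 1x1, 2x2 and 4x4 predictions.
coarse = [[1.0]]
middle = [[0.5, 0.25], [0.0, -0.5]]
fine = [[0.0] * 4 for _ in range(4)]
frame = merge_top_down([coarse, middle, fine])
```

Here the coarsest layer contributes a global estimate that each finer layer refines, which is the general idea behind merging predictions made at different spatial resolutions.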

Paper title