Oral in Workshop: Deep Generative Models and Downstream Applications
Stochastic Video Prediction with Perceptual Loss
Donghun Lee · Ingook Jang · Seonghyun Kim · Chanwon Park · Jun Hee Park
Predicting future states is a challenging part of a decision-making system because of its inherently uncertain nature. Most work in this literature is based on deep generative networks such as the variational autoencoder, which uses pixel-wise reconstruction in its loss function. Predicting the future with pixel-wise reconstruction can fail to capture the full distribution of high-level representations, resulting in inaccurate and blurry predictions. In this paper, we propose stochastic video generation with perceptual loss (SVG-PL) to reduce uncertainty and blurriness in future prediction. The proposed model combines a perceptual loss function with a pixel-wise loss function for image reconstruction and future state prediction. The model is built on a variational autoencoder that reduces high-dimensional inputs to latent variables, capturing both the spatial information and the temporal dynamics of future prediction. We show that using perceptual loss for video prediction improves reconstruction ability and yields sharper predictions. Improvements in video prediction could further help the decision-making process in multiple downstream applications.
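To make the combined objective concrete, the sketch below shows one plausible way to pair a pixel-wise reconstruction term with a VGG-based perceptual term and a KL regularizer, as the abstract describes. This is a minimal sketch assuming a PyTorch implementation: the VGG16 feature cutoff, the weighting coefficients `alpha`, `gamma`, and `beta`, and the function names are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F
from torchvision import models

class PerceptualLoss(torch.nn.Module):
    """Feature-space MSE computed with a frozen, ImageNet-pretrained VGG16."""
    def __init__(self, cutoff=16):  # features[:16] ends at relu3_3 (assumed layer choice)
        super().__init__()
        vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        self.features = vgg.features[:cutoff].eval()
        for p in self.features.parameters():
            p.requires_grad = False  # the feature extractor is never trained

    def forward(self, pred, target):
        # Inputs: (B, 3, H, W) frames in [0, 1]. ImageNet mean/std normalization
        # is omitted here for brevity but should precede the VGG forward pass.
        return F.mse_loss(self.features(pred), self.features(target))

def svg_pl_loss(pred, target, mu, logvar, perceptual,
                alpha=1.0, gamma=1.0, beta=1e-4):
    """Combined objective: pixel-wise MSE + perceptual term + KL regularizer.

    mu/logvar parameterize the approximate posterior of the VAE;
    alpha, gamma, and beta are assumed weighting coefficients.
    """
    pixel = F.mse_loss(pred, target)    # low-level, pixel-wise reconstruction
    percep = perceptual(pred, target)   # high-level feature reconstruction
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # VAE prior term
    return alpha * pixel + gamma * percep + beta * kld

# Example per-frame usage inside a training loop (names hypothetical):
# perceptual = PerceptualLoss().to(device)
# loss = svg_pl_loss(x_hat_t, x_t, mu_t, logvar_t, perceptual)
```

Under these assumptions, the perceptual term penalizes mismatches in VGG feature space, which is the mechanism the abstract credits for sharper, less blurry predictions than pixel-wise reconstruction alone.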