We present a fast and accurate demo system for our state-of-the-art multi-task video captioning model, with two additional features: interactive-length paragraph generation and cooperative user feedback. Automatic video captioning has many applications, such as assisting visually impaired people and improving the quality of online visual content search and retrieval. Our recent multi-task model uses auxiliary temporal video-to-video and logical premise-to-entailment generation tasks to achieve the best results on three popular community datasets. To address the lack of useful online demo systems for video captioning, our demo lets users upload any video file or YouTube link. As a novel aspect, it generates multi-sentence, paragraph-style captions based on redundancy filtering (especially useful for lengthy real-world videos), and the user can ask for longer captions on the fly. The system also supports cooperative user feedback: the user can click on a displayed alternative from the top-k beam options or rewrite corrections directly, providing us with valuable data for discriminative retraining.
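The abstract does not say how redundancy filtering is done. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes captions have already been generated per video segment, and it uses word-level Jaccard similarity with a hypothetical `threshold` to drop near-duplicate sentences. A hypothetical `max_sentences` cap stands in for the user asking for a longer caption.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two sentences (assumed metric)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def build_paragraph(segment_captions: List[str],
                    threshold: float = 0.5,
                    max_sentences: int = 5) -> str:
    """Join per-segment captions into a paragraph, skipping near-duplicates.

    `threshold` and `max_sentences` are hypothetical knobs for illustration;
    raising `max_sentences` mimics a user requesting a longer caption.
    """
    kept: List[str] = []
    for cap in segment_captions:
        if len(kept) >= max_sentences:
            break
        # Keep a caption only if it is dissimilar to every sentence kept so far.
        if all(jaccard(cap, prev) < threshold for prev in kept):
            kept.append(cap)
    return " ".join(s.rstrip(".") + "." for s in kept)


if __name__ == "__main__":
    caps = ["a man is cooking", "a man is cooking food",
            "a woman is dancing", "a man is cooking"]
    print(build_paragraph(caps))
```

With the sample captions, the second and fourth sentences overlap heavily with the first and are filtered out, leaving a two-sentence paragraph. A greedy pass like this is cheap enough for an interactive demo. A learned or embedding-based similarity could replace `jaccard` without changing the loop.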
Han Guo (University of North Carolina at Chapel Hill)
Ramakanth Pasunuru (UNC Chapel Hill)
Mohit Bansal (UNC Chapel Hill)