Toronto Deep Learning
Jamie Kiros · Russ Salakhutdinov · Nitish Srivastava · Yichuan Charlie Tang
2014 Demonstration
Abstract
We demonstrate an interactive system for tagging, retrieving and generating sentence descriptions for images. Our models are based on learning a multimodal vector space using deep convolutional networks and long short-term memory (LSTM) recurrent networks for encoding images and sentences. A highly structured multimodal neural language model is used for decoding and generating image descriptions from scratch.
We will also showcase a mobile app with which a user can take pictures (for example, of objects in the demonstration room) and have them classified in real time.
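As a rough illustration of the joint image-sentence embedding described above, the sketch below (not the authors' code; all module names, feature dimensions, and the ranking loss are illustrative assumptions) pairs a linear projection of precomputed CNN image features with an LSTM sentence encoder, maps both into a shared vector space, and trains with a pairwise ranking objective so that matching image/sentence pairs score higher than mismatched ones.

```python
# Minimal sketch of a multimodal (joint image-sentence) embedding.
# Assumptions: images are represented by precomputed CNN features
# (e.g. 4096-d fc7 activations); dimensions and margin are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    def __init__(self, vocab_size, image_dim=4096, word_dim=300,
                 hidden_dim=1024, embed_dim=1024):
        super().__init__()
        # Image side: project CNN features into the shared space.
        self.image_proj = nn.Linear(image_dim, embed_dim)
        # Sentence side: word embeddings -> LSTM -> projection.
        self.word_embed = nn.Embedding(vocab_size, word_dim)
        self.lstm = nn.LSTM(word_dim, hidden_dim, batch_first=True)
        self.sent_proj = nn.Linear(hidden_dim, embed_dim)

    def encode_image(self, cnn_features):      # (B, image_dim)
        return F.normalize(self.image_proj(cnn_features), dim=-1)

    def encode_sentence(self, token_ids):      # (B, T) integer tokens
        _, (h_n, _) = self.lstm(self.word_embed(token_ids))
        return F.normalize(self.sent_proj(h_n[-1]), dim=-1)

def ranking_loss(img_vecs, sent_vecs, margin=0.2):
    """Hinge ranking loss: true pairs should outscore mismatched pairs."""
    scores = img_vecs @ sent_vecs.t()                    # cosine similarities
    diag = scores.diag().unsqueeze(1)
    cost_s = (margin + scores - diag).clamp(min=0)       # image vs. wrong sentence
    cost_i = (margin + scores - diag.t()).clamp(min=0)   # sentence vs. wrong image
    mask = torch.eye(scores.size(0), dtype=torch.bool)
    return cost_s.masked_fill(mask, 0).sum() + cost_i.masked_fill(mask, 0).sum()

# Usage: tagging and retrieval reduce to nearest-neighbour search in the
# shared space; descriptions are generated by a separate decoder model.
model = JointEmbedding(vocab_size=10000)
imgs = model.encode_image(torch.randn(8, 4096))
sents = model.encode_sentence(torch.randint(0, 10000, (8, 12)))
loss = ranking_loss(imgs, sents)
```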