Toronto Deep Learning
Jamie Kiros · Russ Salakhutdinov · Nitish Srivastava · Yichuan Charlie Tang
2014 Demonstration
Abstract
We demonstrate an interactive system for tagging, retrieving and generating sentence descriptions for images. Our models are based on learning a multimodal vector space using deep convolutional networks and long short-term memory (LSTM) recurrent networks for encoding images and sentences. A highly structured multimodal neural language model is used for decoding and generating image descriptions from scratch.
We will also showcase a mobile app with which a user can take pictures (for example, of objects in the demonstration room) and have them classified in real time.
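As a rough illustration of the joint image-sentence embedding described above, the sketch below (not the authors' code; all module names, feature dimensions, and the ranking loss are illustrative assumptions) pairs a linear projection of precomputed CNN image features with an LSTM sentence encoder, maps both into a shared vector space, and trains with a pairwise ranking objective so that matching image/sentence pairs score higher than mismatched ones.

```python
# Minimal sketch of a multimodal (joint image-sentence) embedding.
# Assumptions: images are represented by precomputed CNN features
# (e.g. 4096-d fc7 activations); dimensions and margin are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointEmbedding(nn.Module):
    def __init__(self, vocab_size, image_dim=4096, word_dim=300,
                 hidden_dim=1024, embed_dim=1024):
        super().__init__()
        # Image side: project CNN features into the shared space.
        self.image_proj = nn.Linear(image_dim, embed_dim)
        # Sentence side: word embeddings -> LSTM -> projection.
        self.word_embed = nn.Embedding(vocab_size, word_dim)
        self.lstm = nn.LSTM(word_dim, hidden_dim, batch_first=True)
        self.sent_proj = nn.Linear(hidden_dim, embed_dim)

    def encode_image(self, cnn_features):      # (B, image_dim)
        return F.normalize(self.image_proj(cnn_features), dim=-1)

    def encode_sentence(self, token_ids):      # (B, T) integer tokens
        _, (h_n, _) = self.lstm(self.word_embed(token_ids))
        return F.normalize(self.sent_proj(h_n[-1]), dim=-1)

def ranking_loss(img_vecs, sent_vecs, margin=0.2):
    """Hinge ranking loss: true pairs should outscore mismatched pairs."""
    scores = img_vecs @ sent_vecs.t()                    # cosine similarities
    diag = scores.diag().unsqueeze(1)
    cost_s = (margin + scores - diag).clamp(min=0)       # image vs. wrong sentence
    cost_i = (margin + scores - diag.t()).clamp(min=0)   # sentence vs. wrong image
    mask = torch.eye(scores.size(0), dtype=torch.bool)
    return cost_s.masked_fill(mask, 0).sum() + cost_i.masked_fill(mask, 0).sum()

# Usage: tagging and retrieval reduce to nearest-neighbour search in the
# shared space; descriptions are generated by a separate decoder model.
model = JointEmbedding(vocab_size=10000)
imgs = model.encode_image(torch.randn(8, 4096))
sents = model.encode_sentence(torch.randint(0, 10000, (8, 12)))
loss = ranking_loss(imgs, sents)
```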