Demonstration
Toronto Deep Learning
Jamie Kiros · Russ Salakhutdinov · Nitish Srivastava · Yichuan Charlie Tang
Level 2, room 230B
Abstract:
We demonstrate an interactive system for tagging, retrieving and generating sentence descriptions for images. Our models are based on learning a multimodal vector space using deep convolutional networks and long short-term memory (LSTM) recurrent networks for encoding images and sentences. A highly structured multimodal neural language model is used for decoding and generating image descriptions from scratch.
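The retrieval side of such a system can be illustrated with a minimal sketch: CNN image features and LSTM sentence vectors are projected into a shared multimodal space, and captions are ranked by cosine similarity to the image. All dimensions, weight matrices, and function names below are illustrative assumptions, not the demonstrated system; the projections here are random stand-ins for parameters that would be trained with a ranking loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: CNN image features, LSTM sentence vectors,
# and the shared multimodal embedding space (all assumed for illustration).
D_IMG, D_SENT, D_JOINT = 4096, 300, 1024

# Stand-ins for learned linear projections; in a real system these would
# be trained so that matching image/sentence pairs score highly.
W_img = rng.standard_normal((D_JOINT, D_IMG)) / np.sqrt(D_IMG)
W_sent = rng.standard_normal((D_JOINT, D_SENT)) / np.sqrt(D_SENT)

def embed_image(cnn_features):
    """Project CNN features into the joint space and L2-normalize."""
    v = W_img @ cnn_features
    return v / np.linalg.norm(v)

def embed_sentence(lstm_state):
    """Project an LSTM sentence vector into the joint space and L2-normalize."""
    v = W_sent @ lstm_state
    return v / np.linalg.norm(v)

def rank_sentences(image_vec, sentence_vecs):
    """Return candidate-sentence indices sorted by cosine similarity
    to the image (best match first)."""
    q = embed_image(image_vec)
    scores = np.array([embed_sentence(s) @ q for s in sentence_vecs])
    return np.argsort(-scores)

# Toy retrieval: one image against three candidate caption vectors.
image = rng.standard_normal(D_IMG)
captions = [rng.standard_normal(D_SENT) for _ in range(3)]
order = rank_sentences(image, captions)
print(order)
```

Because both modalities are normalized in the same space, the dot product is cosine similarity, so the same machinery supports tagging (rank words against an image) and retrieval (rank images against a sentence) by swapping the query side.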
Alongside this, we will showcase a mobile app with which a user can take pictures on their phone (for example, of objects in the demonstration room) and have them classified in real time.