
Multimodal Machine Learning
Louis-Philippe Morency · Tadas Baltrusaitis · Aaron Courville · Kyunghyun Cho

Fri Dec 11 05:30 AM -- 03:30 PM (PST) @ 512 dh
Event URL: https://sites.google.com/site/multiml2015/

Workshop Overview
Multimodal machine learning aims to build models that can process and relate information from multiple modalities. From the early research on audio-visual speech recognition to the recent explosion of interest in models mapping images to natural language, multimodal machine learning is a vibrant multi-disciplinary field of increasing importance and extraordinary potential.
Learning from paired multimodal sources offers the possibility of capturing correspondences between modalities and gaining an in-depth understanding of natural phenomena. Multimodal data thus provides a means of reducing our dependence on the standard supervised learning paradigm, which is inherently limited by the availability of labeled examples.

This research field presents unique challenges for machine learning researchers, given the heterogeneity of the data and the complementarity often found between modalities. This workshop will facilitate progress in multimodal machine learning by bringing together researchers from natural language processing, multimedia, computer vision, speech processing, and machine learning to discuss current challenges and identify the research infrastructure needed to enable stronger multidisciplinary collaboration.
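To make the idea of capturing correspondences between paired modalities concrete, here is a minimal NumPy sketch of one common approach in this area: projecting each modality into a shared embedding space and scoring true pairs above mismatched ones with a margin ranking loss. This is an illustrative toy (the names `W_img`, `W_txt`, and `ranking_loss`, the dimensions, and the random features are all invented for this example), not a method proposed by the workshop or any particular accepted paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy paired data: 4 image/text pairs, each modality with its own feature size.
image_feats = rng.standard_normal((4, 8))  # e.g. visual descriptors
text_feats = rng.standard_normal((4, 6))   # e.g. sentence embeddings

# Each modality gets its own linear projection into a shared 5-d space.
W_img = rng.standard_normal((8, 5)) * 0.1
W_txt = rng.standard_normal((6, 5)) * 0.1

def embed(x, W):
    """Project into the shared space and unit-normalize each row."""
    z = x @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def ranking_loss(img, txt, margin=0.2):
    """Margin ranking loss: each true pair should score higher (by `margin`)
    than every mismatched image/text combination."""
    s = embed(img, W_img) @ embed(txt, W_txt).T   # cosine similarity matrix
    pos = np.diag(s)                              # matched-pair scores
    cost = np.maximum(0, margin + s - pos[:, None])   # wrong text for each image
    cost += np.maximum(0, margin + s - pos[None, :])  # wrong image for each text
    np.fill_diagonal(cost, 0)                     # true pairs incur no cost
    return cost.mean()

print(ranking_loss(image_feats, text_feats))
```

In practice the projections would be learned (e.g. by gradient descent on this loss) and the encoders would be deep networks rather than single linear maps, but the core idea — a shared space where cross-modal similarity is meaningful — is the same.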

For keynote talk abstracts and MMML 2015 workshop proceedings:

Oral presentation
- Listen, Attend, and Walk: Neural Mapping of Navigational Instructions to Action Sequences - Hongyuan Mei, Mohit Bansal, Matthew Walter

Oral spotlights
- An Analysis-By-Synthesis Approach to Multisensory Object Shape Perception. Goker Erdogan, Ilker Yildirim, Robert Jacobs
- Active Perception based on Multimodal Hierarchical Dirichlet Processes. Tadahiro Taniguchi, Toshiaki Takano, Ryo Yoshino
- Towards Deep Alignment of Multimodal Data. George Trigeorgis, Mihalis Nicolaou, Stefanos Zafeiriou, Bjorn Schuller
- Multimodal Transfer Deep Learning with an Application in Audio-Visual Recognition. Seungwhan Moon, Suyoun Kim, Haohan Wang

Poster presentations
- Multimodal Convolutional Neural Networks for Matching Image and Sentence. Lin Ma, Zhengdong Lu, Lifeng Shang, Hang Li
- Group sparse factorization of multiple data views. Eemeli Leppäaho, Samuel Kaski
- Unveiling the Dreams of Word Embeddings: Towards Language-Driven Image Generation. Angeliki Lazaridou, Dat Tien Nguyen, Raffaella Bernardi, Marco Baroni
- Cross-Modal Attribute Recognition in Fashion. Susana Zoghbi, Geert Heyman, Juan Carlos Gomez Carranza, Marie-Francine Moens
- Multimodal Sparse Coding for Event Detection. Youngjune Gwon, William Campbell, Kevin Brady, Douglas Sturim, Miriam Cha, H. T. Kung
- Multimodal Symbolic Association using Parallel Multilayer Perceptron. Federico Raue, Sebastian Palacio, Thomas Breuel, Wonmin Byeon, Andreas Dengel, Marcus Liwicki
- Bridge Correlational Neural Networks for Multilingual Multimodal Representation Learning. Janarthanan Rajendran, Mitesh Khapra, Sarath Chandar, Balaraman Ravindran
- Multimodal Learning of Object Concepts and Word Meanings by Robots. Tatsuya Aoki, Takayuki Nagai, Joe Nishihara, Tomoaki Nakamura, Muhammad Attamimi
- Multi-task, Multi-Kernel Learning for Estimating Individual Wellbeing. Natasha Jaques, Sara Taylor, Akane Sano, Rosalind Picard
- Generating Images from Captions with Attention. Elman Mansimov, Emilio Parisotto, Jimmy Ba, Ruslan Salakhutdinov
- Manifold Alignment Determination. Andreas Damianou, Neil Lawrence, Carl Henrik Ek
- Accelerating Multimodal Sequence Retrieval with Convolutional Networks. Colin Raffel, Daniel P. W. Ellis
- Audio-Visual Fusion for Noise Robust Speech Recognition. Nagasrikanth Kallakuri, Ian Lane
- Learning Multimodal Semantic Models for Image Question Answering. Zichao Yang, Xiaodong He, Jianfeng Gao, Li Deng
- Greedy Vector-valued Multi-view Learning. Hachem Kadri, Stephane Ayache, Cecile Capponi, François-Xavier Dupé
- S2VT: Sequence to Sequence -- Video to Text. Subhashini Venugopalan, Marcus Rohrbach, Jeff Donahue, Raymond Mooney, Trevor Darrell, Kate Saenko

Fri 6:00 a.m. - 6:15 a.m.
Introduction (Talk)
Aaron Courville
Fri 6:15 a.m. - 7:00 a.m.
Visual Question Answering (Talk)
Dhruv Batra
Fri 7:00 a.m. - 7:30 a.m.
Listen, Attend and Walk: Neural Mapping of Navigational Instructions to Action Sequences (Talk)
Matthew Walter
Fri 7:30 a.m. - 8:00 a.m.
Accepted Orals and Spotlights (Spotlight)
George Trigeorgis, Goker Erdogan, Shane Moon, Tadahiro Taniguchi
Fri 8:00 a.m. - 8:30 a.m.
Multimodal Transfer Deep Learning with Applications in Audio-Visual Recognition (Talk)
Shane Moon
Fri 11:30 a.m. - 12:15 p.m.
Generating Natural-Language Video Descriptions using LSTM Recurrent Neural Networks (Talk)
Ray Mooney
Fri 12:15 p.m. - 1:00 p.m.
Cross-Modality Distant Supervised Learning for Speech, Text, and Image Classification (Talk)
Li Deng
Fri 1:30 p.m. - 2:15 p.m.
Generating Images from Captions with Attention (Talk)
Russ Salakhutdinov
Fri 2:15 p.m. - 3:00 p.m.
Automatic Cross-Media Event Schema Construction and Knowledge Population (Talk)
Heng Ji

Author Information

Louis-Philippe Morency (Carnegie Mellon University)
Tadas Baltrusaitis (Carnegie Mellon University)
Aaron Courville (University of Montreal)
Kyunghyun Cho (NYU)

Kyunghyun Cho is an associate professor of computer science and data science at New York University and a research scientist at Facebook AI Research. He was a postdoctoral fellow at the Université de Montréal until summer 2015 under the supervision of Prof. Yoshua Bengio, and received his PhD and MSc degrees from Aalto University in early 2014 under the supervision of Prof. Juha Karhunen, Dr. Tapani Raiko, and Dr. Alexander Ilin. He tries his best to find a balance among machine learning, natural language processing, and life, but almost always fails to do so.
