In this paper, we focus on training and evaluating effective word embeddings with both text and visual information. More specifically, we introduce a large-scale dataset with 300 million sentences describing over 40 million images crawled and downloaded from publicly available Pins (i.e., images with sentence descriptions uploaded by users) on Pinterest. This dataset is more than 200 times larger than MS COCO, the standard large-scale image dataset with sentence descriptions. In addition, we construct an evaluation dataset to directly assess the effectiveness of word embeddings in terms of finding semantically similar or related words and phrases. The word/phrase pairs in this evaluation dataset are collected from the click data of millions of users in an image search system, and thus contain rich semantic relationships. Based on these datasets, we propose and compare several Recurrent Neural Network (RNN) based multimodal (text and image) models. Experiments show that our model benefits from incorporating visual information into the word embeddings, and that a weight-sharing strategy is crucial for learning such multimodal embeddings. The project page is http://www.stat.ucla.edu/~junhua.mao/multimodal_embedding.html (the datasets introduced in this work will be gradually released on the project page).
Author Information
Junhua Mao (UCLA)
Jiajing Xu (Pinterest)
Kevin Jing (Pinterest)
Alan Yuille (JHU)
More from the Same Authors
-
2021 : Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge »
Jiyang Qi · Yan Gao · Yao Hu · Xinggang Wang · Xiaoyu Liu · Xiang Bai · Serge Belongie · Alan Yuille · Philip Torr · Song Bai -
2021 : Understanding Catastrophic Forgetting and Remembering in Continual Learning with Optimal Relevance Mapping »
Prakhar Kaushik · Adam Kortylewski · Alex Gain · Alan Yuille -
2022 : Volumetric Neural Human for Robust Pose Optimization via Analysis-by-synthesis »
Pengliang Ji · Angtian Wang · Yi Zhang · Adam Kortylewski · Alan Yuille -
2022 : Synthetic Tumors Make AI Segment Tumors Better »
Qixin Hu · Junfei Xiao · Alan Yuille · Zongwei Zhou -
2022 : Assembling Existing Labels from Public Datasets to Diagnose Novel Diseases: COVID-19 in Late 2019 »
Zengle Zhu · Mintong Kang · Alan Yuille · Zongwei Zhou -
2022 : Making Your First Choice: To Address Cold Start Problem in Vision Active Learning »
Liangyu Chen · Yutong Bai · Siyu Huang · Yongyi Lu · Bihan Wen · Alan Yuille · Zongwei Zhou -
2023 Poster: 3D-Aware Visual Question Answering about Parts, Poses and Occlusions »
Xingrui Wang · Zhuowan Li · Wufei Ma · Adam Kortylewski · Alan Yuille -
2023 Poster: Annotating 8,000 Abdominal CT Volumes for Multi-Organ Segmentation in Three Weeks »
Chongyu Qu · Tiezheng Zhang · Hualin Qiao · Jie Liu · Yucheng Tang · Alan Yuille · Zongwei Zhou -
2021 Poster: Glance-and-Gaze Vision Transformer »
Qihang Yu · Yingda Xia · Yutong Bai · Yongyi Lu · Alan Yuille · Wei Shen -
2021 Poster: Are Transformers more robust than CNNs? »
Yutong Bai · Jieru Mei · Alan Yuille · Cihang Xie -
2021 Poster: Neural View Synthesis and Matching for Semi-Supervised Few-Shot Learning of 3D Pose »
Angtian Wang · Shenxiao Mei · Alan Yuille · Adam Kortylewski -
2017 Poster: Label Distribution Learning Forests »
Wei Shen · Kai Zhao · Yilu Guo · Alan Yuille -
2016 Poster: SURGE: Surface Regularized Geometry Estimation from a Single Image »
Peng Wang · Xiaohui Shen · Bryan Russell · Scott Cohen · Brian Price · Alan Yuille -
2015 Demonstration: Scaling up visual search for product recommendation »
Kevin Jing -
2015 Poster: Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question »
Haoyuan Gao · Junhua Mao · Jie Zhou · Zhiheng Huang · Lei Wang · Wei Xu -
2014 Workshop: Modern Nonparametrics 3: Automating the Learning Pipeline »
Eric Xing · Mladen Kolar · Arthur Gretton · Samory Kpotufe · Han Liu · Zoltán Szabó · Alan Yuille · Andrew G Wilson · Ryan Tibshirani · Sasha Rakhlin · Damian Kozbur · Bharath Sriperumbudur · David Lopez-Paz · Kirthevasan Kandasamy · Francesco Orabona · Andreas Damianou · Wacha Bounliphone · Yanshuai Cao · Arijit Das · Yingzhen Yang · Giulia DeSalvo · Dmitry Storcheus · Roberto Valerio -
2014 Poster: Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations »
Xianjie Chen · Alan Yuille -
2014 Poster: Learning From Weakly Supervised Data by The Expectation Loss SVM (e-SVM) algorithm »
Jun Zhu · Junhua Mao · Alan Yuille -
2010 Poster: Gaussian sampling by local perturbations »
George Papandreou · Alan Yuille -
2010 Poster: Functional form of motion priors in human motion perception »
HongJing Lu · Tungyou Lin · Alan L Lee · Luminita Vese · Alan Yuille -
2010 Poster: A unified model of short-range and long-range motion perception »
Shuang Wu · Xuming He · HongJing Lu · Alan Yuille -
2009 Poster: Modeling the spacing effect in sequential category learning »
HongJing Lu · Matthew Weiden · Alan Yuille -
2008 Poster: A Hierarchical Image Model for Polynomial-Time 2D Parsing »
Long Zhu · Yuanhao Chen · Yuan Lin · Alan Yuille -
2008 Poster: Model selection and velocity estimation using novel priors for motion patterns »
Alan Yuille · Shuang Wu · HongJing Lu -
2008 Spotlight: A Hierarchical Image Model for Polynomial-Time 2D Parsing »
Long Zhu · Yuanhao Chen · Yuan Lin · Alan Yuille -
2008 Oral: Model selection and velocity estimation using novel priors for motion patterns »
Alan Yuille · Shuang Wu · HongJing Lu -
2007 Workshop: The Grammar of Vision: Probabilistic Grammar-Based Models for Visual Scene Understanding and Object Categorization »
Virginia Savova · Josh Tenenbaum · Leslie Kaelbling · Alan Yuille -
2007 Poster: The Noisy-Logical Distribution and its Application to Causal Inference »
Alan Yuille · HongJing Lu -
2007 Poster: Rapid Inference on a novel AND/OR graph: Detection, Segmentation and Parsing of Articulated Deformable Objects in Cluttered Backgrounds »
Yuanhao Chen · Long Zhu · Chenxi Lin · Alan Yuille · Hongjiang Zhang -
2006 Talk: Unsupervised Learning of a Probabilistic Grammar for Object Detection and Parsing »
Long Zhu · Yuanhao Chen · Alan Yuille -
2006 Poster: Unsupervised Learning of a Probabilistic Grammar for Object Detection and Parsing »
Long Zhu · Yuanhao Chen · Alan Yuille