Accurately measuring the similarity between text documents lies at the core of many real-world applications of machine learning. These include web-search ranking, document recommendation, multilingual document matching, and article categorization. Recently, a new document metric, the word mover's distance (WMD), has been proposed with unprecedented results on kNN-based document classification. The WMD elevates high-quality word embeddings to document metrics by formulating the distance between two documents as an optimal transport problem between the embedded words. However, the resulting document distances are entirely unsupervised and lack a mechanism to incorporate supervision when it is available. In this paper we propose an efficient technique to learn a supervised metric, which we call the Supervised WMD (S-WMD) metric. Our algorithm learns document distances that measure the underlying semantic differences between documents by leveraging semantic differences between individual words discovered during supervised training. This is achieved with a linear transformation of the underlying word embedding space and tailored word-specific weights, both learned to minimize the stochastic leave-one-out nearest neighbor classification error on a per-document level. We evaluate our metric on eight real-world text classification tasks, on which S-WMD consistently outperforms almost all of our 26 competitive baselines.
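To make the optimal transport formulation concrete, the following is a minimal sketch of the distance computation only, not the authors' implementation and not the training procedure. It assumes each document is given as a matrix of word vectors plus a normalized word-frequency vector, uses Euclidean distance as the ground cost, and solves the transport problem with a generic linear program. The names `transport_distance` and `reweight`, the matrix `A`, and the weight vector `w` are illustrative: `A` stands in for the learned linear transformation and `w` for the learned word-specific weights.

```python
import numpy as np
from scipy.optimize import linprog


def reweight(d, w):
    """Rescale word frequencies d by per-word weights w, then renormalize to sum to 1."""
    d_tilde = np.asarray(w, dtype=float) * np.asarray(d, dtype=float)
    return d_tilde / d_tilde.sum()


def transport_distance(X1, d1, X2, d2, A=None):
    """Optimal transport cost between two documents.

    X1, X2: arrays of shape (n, k) and (m, k), one word vector per row.
    d1, d2: nonnegative word weights, each summing to 1.
    A:      optional (k', k) linear map applied to every word vector
            (a stand-in for a learned transformation; assumption of this sketch).
    """
    X1, X2 = np.asarray(X1, dtype=float), np.asarray(X2, dtype=float)
    d1, d2 = np.asarray(d1, dtype=float), np.asarray(d2, dtype=float)
    if A is not None:
        X1, X2 = X1 @ A.T, X2 @ A.T
    n, m = len(d1), len(d2)

    # Ground cost: Euclidean distance between every pair of word vectors.
    cost = np.linalg.norm(X1[:, None, :] - X2[None, :, :], axis=2)

    # The flow matrix T (n x m) is flattened row-major into n*m variables.
    # Each row of T must sum to d1[i], each column to d2[j].
    row_sums = np.kron(np.eye(n), np.ones((1, m)))
    col_sums = np.kron(np.ones((1, n)), np.eye(m))

    res = linprog(
        cost.ravel(),
        A_eq=np.vstack([row_sums, col_sums]),
        b_eq=np.concatenate([d1, d2]),
        bounds=(0, None),
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun


if __name__ == "__main__":
    # Toy 2-D "embeddings": document 1 has two words, document 2 has one.
    X1, d1 = np.array([[0.0, 0.0], [1.0, 0.0]]), np.array([0.5, 0.5])
    X2, d2 = np.array([[0.0, 0.0]]), np.array([1.0])
    print(transport_distance(X1, d1, X2, d2))                    # unsupervised distance
    print(transport_distance(X1, d1, X2, d2, A=2 * np.eye(2)))   # after a linear map
    print(transport_distance(X1, reweight(d1, [1.0, 3.0]), X2, d2))  # after reweighting
```

In this sketch, changing `A` stretches or shrinks directions of the embedding space and changing `w` shifts mass toward particular words, which are the two handles the abstract describes as being learned from labels. A general-purpose LP solver is used here for clarity only; it scales poorly with vocabulary size.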
Author Information
Gao Huang (Cornell University)
Chuan Guo (Cornell University)
Matt J Kusner (Washington University in St. Louis)
Yu Sun (Cornell University)
Fei Sha (University of Southern California)
Kilian Weinberger (Cornell University / ASAPP Research)
More from the Same Authors
-
2021 : Fixed Neural Network Steganography: Train the images, not the network »
Varsha Kishore · Xiangyu Chen · Yan Wang · Boyi Li · Kilian Weinberger -
2022 Poster: Contrastive Language-Image Pre-Training with Knowledge Graphs »
Xuran Pan · Tianzhu Ye · Dongchen Han · Shiji Song · Gao Huang -
2022 Poster: Provable General Function Class Representation Learning in Multitask Bandits and MDP »
Rui Lu · Andrew Zhao · Simon Du · Gao Huang -
2022 Poster: A Mixture Of Surprises for Unsupervised Reinforcement Learning »
Andrew Zhao · Matthieu Lin · Yangguang Li · Yong-jin Liu · Gao Huang -
2022 Poster: Efficient Knowledge Distillation from Model Checkpoints »
Chaofei Wang · Qisen Yang · Rui Huang · Shiji Song · Gao Huang -
2022 : Boosting Offline Reinforcement Learning via Data Resampling »
Yang Yue · Bingyi Kang · Xiao Ma · Zhongwen Xu · Gao Huang · Shuicheng Yan -
2023 Poster: Teaching Cars to See in a Day: Unsupervised Object Discovery with Reward Fine-tuning »
Katie Luo · Zhenzhen Liu · Xiangyu Chen · Yurong You · Sagie Benaim · Cheng Perng Phoo · Mark Campbell · Wen Sun · Bharath Hariharan · Kilian Weinberger -
2023 Poster: Latent Diffusion for Language Generation »
Justin Lovelace · Varsha Kishore · Chao Wan · Eliot Shekhtman · Kilian Weinberger -
2022 Spotlight: Lightning Talks 4A-4 »
Yunhao Tang · LING LIANG · Thomas Chau · Daeha Kim · Junbiao Cui · Rui Lu · Lei Song · Byung Cheol Song · Andrew Zhao · Remi Munos · Łukasz Dudziak · Jiye Liang · Ke Xue · Kaidi Xu · Mark Rowland · Hongkai Wen · Xing Hu · Xiaobin Huang · Simon Du · Nicholas Lane · Chao Qian · Lei Deng · Bernardo Avila Pires · Gao Huang · Will Dabney · Mohamed Abdelfattah · Yuan Xie · Marc Bellemare -
2022 Spotlight: Provable General Function Class Representation Learning in Multitask Bandits and MDP »
Rui Lu · Andrew Zhao · Simon Du · Gao Huang -
2022 Spotlight: Lightning Talks 1B-3 »
Chaofei Wang · Qixun Wang · Jing Xu · Long-Kai Huang · Xi Weng · Fei Ye · Harsh Rangwani · shrinivas ramasubramanian · Yifei Wang · Qisen Yang · Xu Luo · Lei Huang · Adrian G. Bors · Ying Wei · Xinglin Pan · Sho Takemori · Hong Zhu · Rui Huang · Lei Zhao · Yisen Wang · Kato Takashi · Shiji Song · Yanan Li · Rao Anwer · Yuhei Umeda · Salman Khan · Gao Huang · Wenjie Pei · Fahad Shahbaz Khan · Venkatesh Babu R · Zenglin Xu -
2022 Spotlight: Efficient Knowledge Distillation from Model Checkpoints »
Chaofei Wang · Qisen Yang · Rui Huang · Shiji Song · Gao Huang -
2022 Poster: Latency-aware Spatial-wise Dynamic Networks »
Yizeng Han · Zhihang Yuan · Yifan Pu · Chenhao Xue · Shiji Song · Guangyu Sun · Gao Huang -
2022 Poster: Unsupervised Adaptation from Repeated Traversals for Autonomous Driving »
Yurong You · Cheng Perng Phoo · Katie Luo · Travis Zhang · Wei-Lun Chao · Bharath Hariharan · Mark Campbell · Kilian Weinberger -
2021 Poster: Online Adaptation to Label Distribution Shift »
Ruihan Wu · Chuan Guo · Yi Su · Kilian Weinberger -
2021 Poster: Fixes That Fail: Self-Defeating Improvements in Machine-Learning Systems »
Ruihan Wu · Chuan Guo · Awni Hannun · Laurens van der Maaten -
2021 Poster: ReAct: Out-of-distribution Detection With Rectified Activations »
Yiyou Sun · Chuan Guo · Yixuan Li -
2021 Poster: BulletTrain: Accelerating Robust Neural Network Training via Boundary Example Mining »
Weizhe Hua · Yichi Zhang · Chuan Guo · Zhiru Zhang · G. Edward Suh -
2020 : Panel »
Kilian Weinberger · Maria De-Arteaga · Shibani Santurkar · Jonathan Frankle · Deborah Raji -
2020 : Q&A with Kilian »
Kilian Weinberger -
2020 : Invited: Kilian Weinberger »
Kilian Weinberger -
2020 Poster: Identifying Mislabeled Data using the Area Under the Margin Ranking »
Geoff Pleiss · Tianyi Zhang · Ethan Elenberg · Kilian Weinberger -
2020 Poster: Wasserstein Distances for Stereo Disparity Estimation »
Divyansh Garg · Yan Wang · Bharath Hariharan · Mark Campbell · Kilian Weinberger · Wei-Lun Chao -
2020 Spotlight: Wasserstein Distances for Stereo Disparity Estimation »
Divyansh Garg · Yan Wang · Bharath Hariharan · Mark Campbell · Kilian Weinberger · Wei-Lun Chao -
2019 Poster: Breaking the Glass Ceiling for Embedding-Based Classifiers for Large Output Spaces »
Chuan Guo · Ali Mousavi · Xiang Wu · Daniel Holtmann-Rice · Satyen Kale · Sashank Reddi · Sanjiv Kumar -
2019 Poster: Positional Normalization »
Boyi Li · Felix Wu · Kilian Weinberger · Serge Belongie -
2019 Spotlight: Positional Normalization »
Boyi Li · Felix Wu · Kilian Weinberger · Serge Belongie -
2019 Poster: Exact Gaussian Processes on a Million Data Points »
Ke Alexander Wang · Geoff Pleiss · Jacob Gardner · Stephen Tyree · Kilian Weinberger · Andrew Gordon Wilson -
2019 Poster: A New Defense Against Adversarial Images: Turning a Weakness into a Strength »
Shengyuan Hu · Tao Yu · Chuan Guo · Wei-Lun Chao · Kilian Weinberger -
2018 Poster: GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration »
Jacob Gardner · Geoff Pleiss · Kilian Weinberger · David Bindel · Andrew Wilson -
2018 Spotlight: GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration »
Jacob Gardner · Geoff Pleiss · Kilian Weinberger · David Bindel · Andrew Wilson -
2017 Poster: On Fairness and Calibration »
Geoff Pleiss · Manish Raghavan · Felix Wu · Jon Kleinberg · Kilian Weinberger -
2016 Poster: Supervised Word Mover's Distance »
Gao Huang · Chuan Guo · Matt J Kusner · Yu Sun · Fei Sha · Kilian Weinberger -
2015 : Deep Manifold Traversal »
Kilian Weinberger -
2015 Poster: Fast Distributed k-Center Clustering with Outliers on Massive Data »
Gustavo Malkomes · Matt J Kusner · Wenlin Chen · Kilian Q Weinberger · Benjamin Moseley -
2015 Poster: Bayesian Active Model Selection with an Application to Automated Audiometry »
Jacob Gardner · Gustavo Malkomes · Roman Garnett · Kilian Weinberger · Dennis Barbour · John Cunningham -
2014 Workshop: Representation and Learning Methods for Complex Outputs »
Richard Zemel · Dale Schuurmans · Kilian Q Weinberger · Yuhong Guo · Jia Deng · Francesco Dinuzzo · Hal Daumé III · Honglak Lee · Noah A Smith · Richard Sutton · Jiaqian YU · Vitaly Kuznetsov · Luke Vilnis · Hanchen Xiong · Calvin Murdock · Thomas Unterthiner · Jean-Francis Roy · Martin Renqiang Min · Hichem SAHBI · Fabio Massimo Zanzotto -
2013 Workshop: Output Representation Learning »
Yuhong Guo · Dale Schuurmans · Richard Zemel · Samy Bengio · Yoshua Bengio · Li Deng · Dan Roth · Kilian Q Weinberger · Jason Weston · Kihyuk Sohn · Florent Perronnin · Gabriel Synnaeve · Pablo R Strasser · julien audiffren · Carlo Ciliberto · Dan Goldwasser -
2012 Poster: Non-linear Metric Learning »
Dor Kedem · Stephen Tyree · Kilian Q Weinberger · Fei Sha · Gert Lanckriet -
2011 Workshop: Beyond Mahalanobis: Supervised Large-Scale Learning of Similarity »
Greg Shakhnarovich · Dhruv Batra · Brian Kulis · Kilian Q Weinberger -
2011 Poster: Co-Training for Domain Adaptation »
Minmin Chen · Kilian Q Weinberger · John Blitzer -
2010 Session: Oral Session 16 »
Kilian Q Weinberger -
2010 Poster: Large Margin Multi-Task Metric Learning »
Shibin Parameswaran · Kilian Q Weinberger -
2010 Poster: Decoding Ipsilateral Finger Movements from ECoG Signals in Humans »
Yuzong Liu · Mohit Sharma · Charles M Gaona · Jonathan D Breshears · jarod Roland · zachary V Freudenburg · Kilian Q Weinberger · Eric C Leuthardt -
2008 Poster: Large Margin Taxonomy Embedding for Document Categorization »
Kilian Q Weinberger · Olivier Chapelle -
2008 Spotlight: Large Margin Taxonomy Embedding for Document Categorization »
Kilian Q Weinberger · Olivier Chapelle -
2006 Workshop: Novel Applications of Dimensionality Reduction »
John Blitzer · Rajarshi Das · Irina Rish · Kilian Q Weinberger -
2006 Poster: Graph Regularization for Maximum Variance Unfolding with an Application to Sensor Localization »
Kilian Q Weinberger · Fei Sha · Qihui Zhu · Lawrence Saul