Polynomial Semantic Indexing
Bing Bai · Jason E Weston · David Grangier · Ronan Collobert · Kunihiko Sadamasa · Yanjun Qi · Corinna Cortes · Mehryar Mohri

We present a class of nonlinear (polynomial) models that are discriminatively trained to map directly from the word content of a query-document or document-document pair to a ranking score. Working with polynomial models on word features is computationally challenging. We propose a low-rank (but diagonal-preserving) representation of our polynomial models that keeps memory and computation requirements feasible. We provide an empirical study on retrieval tasks based on Wikipedia documents, where we obtain state-of-the-art performance while providing realistically scalable methods.
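As a rough illustration of why a low-rank, diagonal-preserving form helps, the sketch below scores a pair with a degree-2 model whose weight matrix is assumed to be W = UᵀV + I. This parameterization, the function names, and the sizes D and N are assumptions made for illustration and are not stated in the abstract. The point is only that the score can be computed from two N-dimensional projections plus a plain word-overlap term, without ever building the D × D matrix.

```python
import numpy as np

def low_rank_score(q, d, U, V):
    """Score q^T (U^T V + I) d without forming the D x D matrix.

    Assumed parameterization (not taken from the abstract):
    U and V are N x D with N << D; the identity term keeps the diagonal.
    """
    return float((U @ q) @ (V @ d) + q @ d)

def dense_score(q, d, U, V):
    """Same score computed the expensive way, for comparison only."""
    W = U.T @ V + np.eye(q.shape[0])
    return float(q @ W @ d)

# Hypothetical sizes: D word features, rank N.
rng = np.random.default_rng(0)
D, N = 50, 5
U = rng.normal(size=(N, D))
V = rng.normal(size=(N, D))
q = rng.random(D)  # stand-in for a query word vector
d = rng.random(D)  # stand-in for a document word vector
```

Under these assumptions the low-rank form stores 2·N·D parameters instead of D², and with U and V set to zero the score reduces to the inner product of the two word vectors.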

Author Information

Bing Bai (NEC Labs America)
Jason E Weston (Facebook AI Research)

Jason Weston received a PhD (2000) from Royal Holloway, University of London under the supervision of Vladimir Vapnik. From 2000 to 2002, he was a researcher at Biowulf Technologies, New York, applying machine learning to bioinformatics. From 2002 to 2003, he was a research scientist at the Max Planck Institute for Biological Cybernetics, Tuebingen, Germany. From 2004 to June 2009, he was a research staff member at NEC Labs America, Princeton. Since July 2009, he has been a research scientist at Google, New York. Jason Weston's current research focuses on various aspects of statistical machine learning and its applications, particularly in text and images.

David Grangier (NEC Labs America)
Ronan Collobert (Facebook)

Ronan Collobert received his master's degree in pure mathematics from the University of Rennes (France) in 2000. He then pursued graduate studies at the University of Montreal and IDIAP (Switzerland) under the Bengio brothers, and received his PhD in 2004 from the University of Paris VI. He joined NEC Labs (USA) in January 2005 as a postdoc, and became a research staff member after about one year. His research interests have always focused on large-scale machine learning algorithms, with a particular interest in semi-supervised learning and deep learning architectures. Two years ago, his research shifted toward natural language processing, moving gradually toward automatic text understanding.

Kunihiko Sadamasa
Yanjun Qi (University of Virginia)
Corinna Cortes (Google Research)
Mehryar Mohri (Google Research & Courant Institute of Mathematical Sciences)

