A Neural Corpus Indexer for Document Retrieval
Yujing Wang · Yingyan Hou · Haonan Wang · Ziming Miao · Shibin Wu · Hao Sun · Qi Chen · Yuqing Xia · Chengmin Chi · Guoshuai Zhao · Zheng Liu · Xing Xie · Hao Sun · Weiwei Deng · Qi Zhang · Mao Yang

Thu Dec 01 09:00 AM -- 11:00 AM (PST) @ Hall J #242

Current state-of-the-art document retrieval solutions mainly follow an index-retrieve paradigm, in which the index is hard to optimize directly for the final retrieval target. In this paper, we aim to show that an end-to-end deep neural network unifying the training and indexing stages can significantly improve recall performance over traditional methods. To this end, we propose Neural Corpus Indexer (NCI), a sequence-to-sequence network that generates relevant document identifiers directly for a designated query. To optimize the recall performance of NCI, we invent a prefix-aware weight-adaptive decoder architecture and leverage tailored techniques including query generation, semantic document identifiers, and consistency-based regularization. Empirical studies demonstrate the superiority of NCI on two commonly used academic benchmarks, achieving relative improvements over the best baseline method of +21.4% for Recall@1 on the NQ320k dataset and +16.8% for R-Precision on the TriviaQA dataset.
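The headline numbers in the abstract are relative gains in Recall@1 and R-Precision. As a rough illustration of how a Recall@k score and a relative improvement could be computed, here is a minimal sketch. The function names, the toy query and document identifiers, and the example scores are all hypothetical and are not taken from the paper or its evaluation code.

```python
from typing import Dict, List, Set


def recall_at_k(ranked: Dict[str, List[str]],
                relevant: Dict[str, Set[str]],
                k: int) -> float:
    """Average, over queries, of the share of relevant documents in the top k.

    When every query has exactly one relevant document, this equals the
    fraction of queries whose relevant document appears in the top k.
    """
    total = 0.0
    for query, docs in ranked.items():
        gold = relevant[query]
        found = gold & set(docs[:k])
        total += len(found) / len(gold)
    return total / len(ranked)


def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Hypothetical toy data: ranked document identifiers per query.
ranked = {
    "q1": ["d3", "d1"],
    "q2": ["d2", "d5"],
    "q3": ["d9", "d4"],
    "q4": ["d7", "d8"],
}
# Hypothetical ground truth: one relevant document per query.
relevant = {"q1": {"d3"}, "q2": {"d5"}, "q3": {"d4"}, "q4": {"d7"}}

r1 = recall_at_k(ranked, relevant, k=1)  # q1 and q4 hit at rank 1
r2 = recall_at_k(ranked, relevant, k=2)  # every query hits within rank 2
gain = relative_improvement(0.75, 0.5)   # made-up scores, not paper results
```

Note that a relative improvement is measured against the baseline score, so a gain of +21.4% means the new score is 1.214 times the baseline, not 21.4 percentage points higher.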

Author Information

Yujing Wang (Microsoft)
Yingyan Hou (Tsinghua University)
Haonan Wang (National University of Singapore)
Ziming Miao (Microsoft)
Shibin Wu (Tsinghua University)
Hao Sun (Peking University)
Qi Chen (Microsoft Research Asia)
Yuqing Xia (Peking University)
Chengmin Chi (Microsoft)
Guoshuai Zhao (Beijing University of Posts and Telecommunications)
Zheng Liu (The Hong Kong University of Science and Technology)
Xing Xie (Microsoft Research Asia)
Hao Sun (Microsoft)
Weiwei Deng (South China University of Technology)
Qi Zhang (Microsoft)
Mao Yang (Microsoft Research Asia)

More from the Same Authors