Generic Entity Resolution Models
Jiawei Tang · Yifei Zuo · Lei Cao · Samuel Madden
Event URL: https://openreview.net/forum?id=tRkVo1jMas

Entity resolution (ER), which decides whether two data records refer to the same real-world object, is a long-standing data integration problem. State-of-the-art results on ER are achieved by deep learning based methods, which typically convert each pair of records into a distributed representation and then use a binary classifier to decide whether the two records are a match or a non-match. However, these methods are dataset specific: a separate deep learning model must be trained or fine-tuned for each new dataset. Because this approach does not generalize, we call them specific ER models. In this paper, we investigate generic ER models, which use a single model to serve multiple ER datasets from various domains. In particular, we study two types of generic ER models: those that employ foundation models (e.g., GPT-3) and those trained as a single generic ER model. Our results show that although GPT-3 can perform ER with zero-shot or few-shot learning, its performance is worse than that of specific ER models. Our trained generic ER model achieves performance comparable to specific ER models, but with much less training data and much smaller storage overhead.
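The pairwise framing described in the abstract (score a pair of records, then classify it as match or non-match) can be illustrated with a toy sketch. This is not the authors' model: a token-level Jaccard similarity stands in for the learned representation, a fixed threshold stands in for the trained binary classifier, and the names `serialize`, `jaccard`, `is_match`, and the sample records are hypothetical.

```python
from typing import Dict

Record = Dict[str, str]


def serialize(record: Record) -> str:
    """Flatten a record into one lowercase string of its attribute values."""
    return " ".join(str(value).lower() for value in record.values())


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity between two whitespace-tokenized strings."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty strings are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_match(left: Record, right: Record, threshold: float = 0.5) -> bool:
    """Toy binary decision: match if the similarity reaches the threshold.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return jaccard(serialize(left), serialize(right)) >= threshold


# Hypothetical product records for illustration only.
a = {"title": "Apple iPhone 13 128GB", "brand": "Apple"}
b = {"title": "iPhone 13 128GB Apple", "brand": "apple"}
c = {"title": "Samsung Galaxy S21", "brand": "Samsung"}

print(is_match(a, b))
print(is_match(a, c))
```

In a deep learning based ER system, the similarity function would be replaced by a learned encoder over the serialized pair and the threshold by a trained classifier; the decision interface stays the same.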

Author Information

Jiawei Tang (Massachusetts Institute of Technology)
Yifei Zuo
Lei Cao (University of Arizona/MIT)

Assistant Professor at the University of Arizona and Research Scientist at MIT

Samuel Madden (Massachusetts Institute of Technology)
