Poster in Workshop: Table Representation Learning Workshop

Introducing the Observatory Library for End-to-End Table Embedding Inference

Tianji Cong · Zhenjie Sun · Paul Groth · H. V. Jagadish · Madelon Hulsebos

Keywords: [ Table Representation Learning ] [ tabular language models ] [ End-to-End Table Embedding Inference ]


Abstract:

Transformer-based tabular language models have become prevalent in a wide range of applications involving tabular data. Such models require a table to be serialized as a sequence of tokens for model ingestion and embedding inference. Different downstream tasks require different kinds or levels of embeddings, such as column or entity embeddings, and so various serialization and encoding methods have been proposed and implemented. Surprisingly, this conceptually simple process of creating table embeddings is not straightforward in practice, for three reasons: 1) a model may not natively expose a certain level of embedding; 2) choosing the correct table serialization and input preprocessing methods is difficult because many are available; and 3) tables with a massive number of rows and columns exceed the input limit of models. In this work, we extend Observatory, a framework for characterizing embeddings of relational tables, by streamlining end-to-end inference of table embeddings, which eases the use of tabular language models in practice. The codebase of Observatory is publicly available at https://github.com/superctj/observatory.
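
To make the serialization and input-limit issues concrete, the following is a minimal Python sketch of one possible approach: row-wise serialization that keeps only as many complete rows as fit a token budget, while recording which token positions belong to each column (the bookkeeping one would need to pool token embeddings into a column embedding). It is not taken from Observatory and does not reflect its API. The function name `serialize_table`, the whitespace tokenizer, the `[SEP]` row separator, and the `max_tokens` parameter are all illustrative assumptions.

```python
from typing import Dict, List, Tuple


def serialize_table(
    header: List[str],
    rows: List[List[str]],
    max_tokens: int,
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Hypothetical row-wise serializer (not Observatory's implementation).

    Uses whitespace splitting as a stand-in for a real subword tokenizer.
    Returns the token sequence and, for each column name, the positions of
    the tokens that came from that column. Assumes column names are unique.
    """
    tokens: List[str] = []
    positions: Dict[str, List[int]] = {name: [] for name in header}

    # Serialize the header first, then as many complete rows as fit.
    for record in [header] + rows:
        row_tokens: List[Tuple[str, str]] = []  # (column name, token)
        for name, cell in zip(header, record):
            row_tokens.extend((name, tok) for tok in str(cell).split())

        # The +1 reserves room for the row separator token.
        if len(tokens) + len(row_tokens) + 1 > max_tokens:
            break  # drop this row and all later rows

        for name, tok in row_tokens:
            positions[name].append(len(tokens))
            tokens.append(tok)
        tokens.append("[SEP]")

    return tokens, positions


if __name__ == "__main__":
    header = ["name", "city"]
    rows = [["Ada Lovelace", "London"], ["Alan Turing", "Wilmslow"]]
    tokens, positions = serialize_table(header, rows, max_tokens=8)
    print(tokens)
    print(positions)
```

With a budget of 8 tokens, the second data row no longer fits and is dropped whole, which avoids cutting a row in the middle. A real pipeline would replace the whitespace split with the model's own tokenizer and then average the hidden states at each column's recorded positions; other serialization orders (for example column-wise) are equally plausible choices.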
