NeurIPS 2022


Table Representation Learning

Madelon Hulsebos · Bojan Karlaš · Pengcheng Yin · Haoyu Dong

Room 398

We develop large models to “understand” images, videos and natural language that fuel many intelligent applications from text completion to self-driving cars. But tabular data has long been overlooked despite its dominant presence in data-intensive systems. By learning latent representations from (semi-)structured tabular data, pretrained table models have shown preliminary but impressive performance for semantic parsing, question answering, table understanding, and data preparation. Considering that such tasks share fundamental properties inherent to tables, representation learning for tabular data is an important direction to explore further. These works also surfaced many open challenges such as finding effective data encodings, pretraining objectives and downstream tasks.

Key questions that we aim to address in this workshop are:
- How should tabular data be encoded to make learned Table Models generalize across tasks?
- Which pre-training objectives, architectures, and fine-tuning and prompting strategies work for tabular data?
- How should the varying formats, data types, and sizes of tables be handled?
- To what extent can Language Models be adapted towards tabular data tasks and what are their limits?
- What tasks can existing Table Models accomplish well and what opportunities lie ahead?
- How do existing Table Models perform, what do they learn, where and how do they fall short?
- When and how should Table Models be updated in contexts where the underlying data source continuously evolves?

The Table Representation Learning workshop is the first workshop in this emerging research area and is centered around three main goals:
1) Motivate tabular data as a primary modality for representation learning and further shape this area.
2) Showcase impactful applications of pretrained table models and discuss future opportunities.
3) Foster discussion and collaboration across the machine learning, natural language processing, and data management communities.

Alon Halevy (keynote), Meta AI
Graham Neubig (keynote), Carnegie Mellon University
Carsten Binnig, TU Darmstadt
Çağatay Demiralp, Sigma Computing
Huan Sun, Ohio State University
Xinyun Chen, Google Brain


We invite submissions that address, but are not limited to, any of the following topics on machine learning for tabular data:
Representation Learning: Representation learning techniques for structured (e.g., relational databases) or semi-structured (e.g., Web tables, spreadsheet tables) tabular data and interfaces to it. This includes developing specialized data encodings or adapting general-purpose ones (e.g., GPT-3) for tabular data, multimodal learning across tables and other modalities (e.g., natural language, images, code), and relevant fine-tuning and prompting strategies.
Downstream Applications: Machine learning applications involving tabular data, such as data preparation (e.g., data cleaning, integration, cataloging, anomaly detection), retrieval (e.g., semantic parsing, question answering, fact-checking), information extraction, and generation (e.g., table-to-text).
Upstream Applications: Applications that use representation learning to optimize tabular data processing systems, such as table parsers (extracting tables from documents, spreadsheets, presentations, images), storage (e.g., compression, indexing), and querying (e.g., query plan optimization, cost estimation).
Industry Papers: Applications of tabular representation models in production. Challenges of maintaining and managing table representation models in a fast-evolving context, e.g., data updating, error correction, monitoring.
New Resources: Survey papers, analyses, benchmarks and datasets for tabular representation models and their applications, as well as visions and reflections to structure and guide future research.

Important dates
Submission open: 20 August 2022
Submission deadline: 26 September 2022
Notifications: 20 October 2022
Camera-ready, slides and recording upload: 3 November 2022
Workshop: 2 December 2022

Submission formats
Abstract: 1 page + references.
Extended abstract: at most 4 pages + references.
Regular paper: at least 6 pages + references.

