Fri Dec 02 06:30 AM -- 03:45 PM (PST) @ Room 398
Table Representation Learning
Madelon Hulsebos · Bojan Karlaš · Pengcheng Yin · Haoyu Dong

Workshop Home Page

We develop large models to “understand” images, videos and natural language that fuel many intelligent applications from text completion to self-driving cars. But tabular data has long been overlooked despite its dominant presence in data-intensive systems. By learning latent representations from (semi-)structured tabular data, pretrained table models have shown preliminary but impressive performance for semantic parsing, question answering, table understanding, and data preparation. Considering that such tasks share fundamental properties inherent to tables, representation learning for tabular data is an important direction to explore further. These works also surfaced many open challenges such as finding effective data encodings, pretraining objectives and downstream tasks.

Key questions that we aim to address in this workshop are:
- How should tabular data be encoded to make learned Table Models generalize across tasks?
- Which pre-training objectives, architectures, and fine-tuning and prompting strategies work for tabular data?
- How should the varying formats, data types, and sizes of tables be handled?
- To what extent can Language Models be adapted towards tabular data tasks and what are their limits?
- What tasks can existing Table Models accomplish well and what opportunities lie ahead?
- How do existing Table Models perform, what do they learn, and where and how do they fall short?
- When and how should Table Models be updated in contexts where the underlying data source continuously evolves?

The Table Representation Learning workshop is the first workshop in this emerging research area and is centered around three main goals:
1) Motivate tabular data as a primary modality for representation learning and further shape this area.
2) Showcase impactful applications of pretrained table models and discuss future opportunities.
3) Foster discussion and collaboration across the machine learning, natural language processing, and data management communities.

Alon Halevy (keynote), Meta AI
Graham Neubig (keynote), Carnegie Mellon University
Carsten Binnig, TU Darmstadt
Çağatay Demiralp, Sigma Computing
Huan Sun, Ohio State University
Xinyun Chen, Google Brain


We invite submissions that address, but are not limited to, any of the following topics on machine learning for tabular data:
Representation Learning: Representation learning techniques for structured (e.g., relational databases) or semi-structured (Web tables, spreadsheet tables) tabular data and interfaces to it. This includes developing specialized data encodings or adapting general-purpose ones (e.g., GPT-3) for tabular data, multimodal learning across tables and other modalities (e.g., natural language, images, code), and relevant fine-tuning and prompting strategies.
Downstream Applications: Machine learning applications involving tabular data, such as data preparation (e.g., data cleaning, integration, cataloging, anomaly detection), retrieval (e.g., semantic parsing, question answering, fact-checking), information extraction, and generation (e.g., table-to-text).
Upstream Applications: Applications that use representation learning to optimize tabular data processing systems, such as table parsers (extracting tables from documents, spreadsheets, presentations, images), storage (e.g., compression, indexing), and querying (e.g., query plan optimization, cost estimation).
Industry Papers: Applications of tabular representation models in production, and the challenges of maintaining and managing table representation models in a fast-evolving context, e.g., data updating, error correction, monitoring.
New Resources: Survey papers, analyses, benchmarks, and datasets for tabular representation models and their applications, as well as visions and reflections to structure and guide future research.

Important dates
Submission open: 20 August 2022
Submission deadline: 26 September 2022
Notifications: 20 October 2022
Camera-ready, slides and recording upload: 3 November 2022
Workshop: 2 December 2022

Submission formats
Abstract: 1 page + references.
Extended abstract: at most 4 pages + references.
Regular paper: at least 6 pages + references.


Opening Remarks (Notes)
Alon Halevy - "Structured Data Inside and Out" (Keynote)
Analysis of the Attention in Tabular Language Models (Talk)
Huan Sun - "Self-supervised Pre-training on Tables" (Talk)
Coffee/Tea Break (Break)
Poster Session 1 (Poster Session)
Carsten Binnig - "Pre-trained Models for Learned DBMS Components" (Talk)
STable: Table Generation Framework for Encoder-Decoder Models (Talk)
Transfer Learning with Deep Tabular Models (Talk)
Lunch Break (Break)
Graham Neubig - "Unsupervised Methods for Table and Schema Understanding" (Keynote)
Towards Parameter-Efficient Automation of Data Wrangling Tasks with Prefix-Tuning (Talk)
Byung-Hak - "RegCLR: A Self-Supervised Framework for Tabular Representation Learning in the Wild" (Talk)
TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second (Talk)
Coffee/Tea Break (Break)
Poster Session 2 (Poster Session)
Xinyun Chen - "Program Synthesis from Semi-Structured Context" (Talk)
Panel [Huan Sun (chair), Frank Hutter, Heng Ji, Julian Eisenschlos, Gaël Varoquaux, Graham Neubig] (Panel)
Closing Remarks (Notes)
Generic Entity Resolution Models (Poster)
CASPR: Customer Activity Sequence based Prediction and Representation (Poster)
RegCLR: A Self-Supervised Framework for Tabular Representation Learning in the Wild (Poster)
Towards Foundation Models for Relational Databases [Vision Paper] (Poster)
SiMa: Federating Data Silos using GNNs (Poster)
MET: Masked Encoding for Tabular Data (Poster)
The Need for Tabular Representation Learning: An Industry Perspective (Poster)
SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training (Poster)
STUNT: Few-shot Tabular Learning with Self-generated Tasks from Unlabeled Tables (Poster)
STab: Self-supervised Learning for Tabular Data (Poster)
Transfer Learning with Deep Tabular Models (Poster)
Diffusion models for missing value imputation in tabular data (Poster)
Conditional Contrastive Networks (Poster)
Active Learning with Table Language Models (Poster)
Self Supervised Pre-training for Large Scale Tabular Data (Poster)
STable: Table Generation Framework for Encoder-Decoder Models (Poster)
Tabular Data Generation: Can We Fool XGBoost ? (Poster)
Analysis of the Attention in Tabular Language Models (Poster)
TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second (Poster)
MapQA: A Dataset for Question Answering on Choropleth Maps (Poster)
Towards Parameter-Efficient Automation of Data Wrangling Tasks with Prefix-Tuning (Poster)
Self-supervised Representation Learning Across Sequential and Tabular Features Using Transformers (Poster)
Structural Embedding of Data Files with MAGRITTE (Poster)
RoTaR: Efficient Row-Based Table Representation Learning via Teacher-Student Training (Short Paper) (Poster)