We develop large models that “understand” images, videos, and natural language, fueling many intelligent applications from text completion to self-driving cars. But tabular data has long been overlooked despite its dominant presence in data-intensive systems. By learning latent representations from (semi-)structured tabular data, pretrained table models have shown preliminary but impressive performance for semantic parsing, question answering, table understanding, and data preparation. Considering that such tasks share fundamental properties inherent to tables, representation learning for tabular data is an important direction to explore further. These works also surfaced many open challenges such as finding effective data encodings, pretraining objectives and downstream tasks.
Key questions that we aim to address in this workshop are:
- How should tabular data be encoded to make learned Table Models generalize across tasks?
- Which pre-training objectives, architectures, fine-tuning and prompting strategies work for tabular data?
- How should the varying formats, data types, and sizes of tables be handled?
- To what extent can Language Models be adapted towards tabular data tasks, and what are their limits?
- What tasks can existing Table Models accomplish well and what opportunities lie ahead?
- How do existing Table Models perform, what do they learn, where and how do they fall short?
- When and how should Table Models be updated in contexts where the underlying data source continuously evolves?
The Table Representation Learning workshop is the first workshop in this emerging research area and is centered around three main goals:
1) Motivate tabular data as a primary modality for representation learning and further shape this area.
2) Showcase impactful applications of pretrained table models and discuss future opportunities.
3) Foster discussion and collaboration across the machine learning, natural language processing, and data management communities.
Speakers
Alon Halevy (keynote), Meta AI
Graham Neubig (keynote), Carnegie Mellon University
Carsten Binnig, TU Darmstadt
Çağatay Demiralp, Sigma Computing
Huan Sun, Ohio State University
Xinyun Chen, Google Brain
Panelists
Huan Sun (chair), Frank Hutter, Heng Ji, Julian Eisenschlos, Gaël Varoquaux, Graham Neubig
Scope
We invite submissions that address, but are not limited to, any of the following topics on machine learning for tabular data:
Representation Learning: Representation learning techniques for structured (e.g., relational databases) or semi-structured (Web tables, spreadsheet tables) tabular data and interfaces to it. This includes developing specialized data encodings or adapting general-purpose ones (e.g., GPT-3) for tabular data, multimodal learning across tables and other modalities (e.g., natural language, images, code), and relevant fine-tuning and prompting strategies.
Downstream Applications: Machine learning applications involving tabular data, such as data preparation (e.g., data cleaning, integration, cataloging, anomaly detection), retrieval (e.g., semantic parsing, question answering, fact-checking), information extraction, and generation (e.g., table-to-text).
Upstream Applications: Applications that use representation learning to optimize tabular data processing systems, such as table parsers (extracting tables from documents, spreadsheets, presentations, images), storage (e.g., compression, indexing), and querying (e.g., query plan optimization, cost estimation).
Industry Papers: Applications of tabular representation models in production, and challenges of maintaining and managing table representation models in a fast-evolving context, e.g., data updating, error correction, and monitoring.
New Resources: Survey papers, analyses, benchmarks, and datasets for tabular representation models and their applications, as well as visions and reflections to structure and guide future research.
Important dates
Submission open: 20 August 2022
Submission deadline: 26 September 2022
Notifications: 20 October 2022
Camera-ready, slides and recording upload: 3 November 2022
Workshop: 2 December 2022
Submission formats
Abstract: 1 page + references.
Extended abstract: at most 4 pages + references.
Regular paper: at least 6 pages + references.
Questions:
table-representation-learning-workshop@googlegroups.com (public)
m.hulsebos@uva.nl (private)
Fri 6:30 a.m. - 6:45 a.m. | Opening Remarks (Notes)
Fri 6:45 a.m. - 7:30 a.m. | Alon Halevy - "Structured Data Inside and Out" (Keynote)
WebTables contain high-quality data that is relevant to many queries on search engines. Since they are embedded inside web pages, understanding the semantics of tables requires analyzing the text surrounding them on the page. This talk will begin by recalling some of the early challenges we faced with the WebTables Project at Google. I will then turn to a different kind of challenge at the intersection of structured and unstructured data, where the structured data is outside and the unstructured data is inside. For example, when modeling a set of events in a person’s life (or the history of an enterprise or a culture), each event is described in text and other media, but is also associated with structured data such as time and location. Answering questions over such collections of data requires leveraging the structure in the data appropriately. In the second half of the talk, I will discuss the motivations, challenges, and partial solutions to dealing with structured data that is on the outside.
Alon Halevy
Fri 7:30 a.m. - 7:45 a.m. | Analysis of the Attention in Tabular Language Models (Talk)
Recent transformer-based models for learning table representations have reported state-of-the-art results for different tasks such as table understanding, question answering, and semantic parsing. The various proposed models use different architectures, specifically different attention mechanisms. In this paper, we analyze and compare the attention mechanisms used by two different tabular language models. By visualizing the attention maps of the models, we shed light on the different patterns that the models exhibit. With our analysis of the aggregate attention over two tabular datasets, we provide insights that may help towards building more efficient models tailored for table representation learning.
Aneta Koleva · Martin Ringsquandl · Volker Tresp
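To make the analysis concrete, here is a minimal sketch (not the authors' code) of extracting and aggregating attention maps from a HuggingFace transformer over a serialized table row; the model name and serialization format are illustrative stand-ins, assuming the standard `output_attentions` interface.

```python
# Hypothetical sketch: extract and aggregate attention maps from a transformer encoder.
import torch
from transformers import AutoTokenizer, AutoModel

model_name = "bert-base-uncased"  # stand-in; a tabular language model would be used instead
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)

# A table row serialized as text, a common encoding for tabular language models.
inputs = tokenizer("name: alice | age: 34 | city: Berlin", return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one tensor per layer of shape (batch, heads, seq_len, seq_len).
attn = torch.stack(outputs.attentions)        # (layers, batch, heads, seq, seq)
aggregate = attn.mean(dim=(0, 2)).squeeze(0)  # average over layers and heads

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
print(aggregate.shape, len(tokens))           # (seq, seq) map, ready for plotting
```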
Fri 7:45 a.m. - 8:15 a.m. | Huan Sun - "Self-supervised Pre-training on Tables" (Talk)
Pre-training/fine-tuning paradigms have transformed the natural language processing field. For table-based tasks, however, their potential has been far less explored. In this talk, I will discuss the recent efforts led by my Ph.D. student Xiang Deng: (1) TURL, a pre-training/fine-tuning paradigm for relational Web tables, which benefits a wide range of table understanding tasks (e.g., row population, relation extraction, entity linking). This work won the ACM SIGMOD Research Highlight Award in 2022. (2) StruG, a weakly supervised Structure-Grounded pretraining framework for text-to-SQL, which effectively learns to capture the text-table alignment essential for the task. At the time we tested our model on the Spider leaderboard in 2020, it ranked 6th in the setting using DB content and 1st in the setting without DB content. (3) ReasonBERT, a pre-training method that augments language models for multi-step reasoning over hybrid contexts (textual and tabular). Among them, I will cover TURL in greater detail. Finally, I will conclude the talk with my thoughts on promising future directions.
Huan Sun
Fri 8:15 a.m. - 8:30 a.m. | Coffee/Tea Break
Fri 8:30 a.m. - 9:15 a.m. | Poster Session 1
Fri 9:15 a.m. - 9:45 a.m. | Carsten Binnig - "Pre-trained Models for Learned DBMS Components" (Talk)
Database management systems (DBMSs) are the backbone for managing large volumes of data efficiently and thus play a central role in business and science today. To provide high performance, many of the most complex DBMS components, such as query optimizers or schedulers, involve solving non-trivial problems such as query cost estimation. To tackle such problems, recent work has outlined a new direction of so-called learned DBMS components, where core parts of DBMSs are replaced by machine learning (ML) models. While this line of work has been shown to provide significant performance benefits, a major drawback of current workload-driven learning approaches for learned DBMS components is that they incur a high and repeated overhead for training data collection. In this talk, I will therefore discuss a new direction of so-called zero-shot DBMS models: pre-trained models that avoid this repeated training data collection overhead. As a concrete first step, we have realized a zero-shot cost model that can predict query execution cost, a core DBMS task, on an unseen database (i.e., a new set of tables with data) out of the box. Furthermore, I will discuss more recent results on how the general idea of zero-shot DBMS models can be applied to other DBMS components, and even beyond DBMSs to other data systems.
Carsten Binnig
Fri 9:45 a.m. - 10:00 a.m. | STable: Table Generation Framework for Encoder-Decoder Models (Talk)
The output structure of database-like tables, consisting of values structured in horizontal rows and vertical columns identifiable by name, can cover a wide range of NLP tasks. Following this observation, we propose a framework for text-to-table neural models applicable to problems such as extraction of line items, joint entity and relation extraction, or knowledge base population. The permutation-based decoder of our proposal is a generalized sequential method that comprehends information from all cells in the table. The training maximizes the expected log-likelihood of a table's content across all random permutations of the factorization order. During content inference, we exploit the model's ability to generate cells in any order by searching over possible orderings to maximize the model's confidence and to avoid the substantial error accumulation that other sequential models are prone to. Experiments demonstrate the high practical value of the framework, which establishes state-of-the-art results on several challenging datasets, outperforming previous solutions by up to 15%.
Michał Pietruszka · Michał Turski · Łukasz Borchmann · Tomasz Dwojak · Gabriela Pałka · Karolina Szyndler · Dawid Jurkiewicz · Łukasz Garncarek
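As a rough schematic of the permutation-based training objective described in the abstract above (notation ours, not necessarily the paper's): with target cells $c_1, \ldots, c_N$, model input $x$, and $\sigma$ a random permutation of $\{1, \ldots, N\}$, training maximizes
$$\mathbb{E}_{\sigma}\Big[\sum_{t=1}^{N} \log p_\theta\big(c_{\sigma(t)} \mid c_{\sigma(1)}, \ldots, c_{\sigma(t-1)}, x\big)\Big],$$
so that at inference the decoder can emit cells in whatever order it is most confident about.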
Fri 10:00 a.m. - 10:15 a.m. | Transfer Learning with Deep Tabular Models (Talk)
Recent work on deep learning for tabular data demonstrates the strong performance of deep tabular models, often bridging the gap between gradient boosted decision trees and neural networks. Accuracy aside, a major advantage of neural models is that they are easily fine-tuned in new domains and learn reusable features. This property is often exploited in computer vision and natural language applications, where transfer learning is indispensable when task-specific training data is scarce. In this work, we explore the benefits that representation learning provides for knowledge transfer in the tabular domain. We conduct experiments in a realistic medical diagnosis test bed with limited amounts of downstream data and find that transfer learning with deep tabular models provides a definitive advantage over gradient boosted decision tree methods. We further compare the supervised and self-supervised pretraining strategies and provide practical advice on transfer learning with tabular models. Finally, we propose a pseudo-feature method for cases where the upstream and downstream feature sets differ, a tabular-specific problem widespread in real-world applications.
Roman Levin · Valeriia Cherepanova · Avi Schwarzschild · Arpit Bansal · C. Bayan Bruss · Tom Goldstein · Andrew Wilson · Micah Goldblum
Fri 10:15 a.m. - 11:30 a.m. | Lunch Break
Fri 11:30 a.m. - 12:15 p.m. | Graham Neubig - "Unsupervised Methods for Table and Schema Understanding" (Keynote)
In this talk I will discuss two methods that we have recently developed that allow for better understanding of tables. First, I will discuss OmniTab, a method for learning to represent tables using text- and table-based pre-training. Second, I will discuss a method for data augmentation that makes it possible to create pseudo-supervised training data for new database schemas.
Graham Neubig
Fri 12:15 p.m. - 12:30 p.m. | Towards Parameter-Efficient Automation of Data Wrangling Tasks with Prefix-Tuning (Talk)
Data wrangling tasks for data integration and cleaning arise in virtually every data-driven application scenario nowadays. Recent research indicated the astounding potential of Large Language Models (LLMs) for such tasks. The automation of data wrangling with LLMs poses additional challenges, however, as hand-tuning task- and data-specific prompts for LLMs requires high expertise and manual effort. On the other hand, finetuning a whole LLM is more amenable to automation, but incurs high storage costs, as a copy of the LLM has to be maintained. In this work, we explore the potential of a lightweight alternative to finetuning an LLM, which automatically learns a continuous prompt. This approach, called prefix-tuning, does not require updating the original LLM parameters, and can therefore re-use a single LLM instance across tasks. At the same time, it is amenable to automation, as continuous prompts can be learned automatically with standard techniques. We evaluate prefix-tuning on common data wrangling tasks for tabular data, such as entity matching, error detection, and data imputation, with promising results. We find that in six out of ten cases, prefix-tuning is within 2.3% of the performance of finetuning, even though it leverages only 0.39% of the parameter updates required for finetuning the full model. These results highlight the potential of prefix-tuning as a parameter-efficient alternative to finetuning for data integration and data cleaning with LLMs.
David Vos · Till Döhmen · Sebastian Schelter
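For illustration, a minimal sketch (not the authors' code) of prefix-tuning a frozen seq2seq LLM on an entity-matching-style prompt, assuming the Hugging Face `peft` library; the model name, prompt format, and hyperparameters are illustrative stand-ins.

```python
# Hypothetical sketch: parameter-efficient prefix-tuning of a frozen seq2seq LM.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
from peft import PrefixTuningConfig, TaskType, get_peft_model

base = "t5-base"  # stand-in for the LLM used in the paper
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSeq2SeqLM.from_pretrained(base)

# Learn a continuous prefix; the original LLM weights stay frozen.
config = PrefixTuningConfig(task_type=TaskType.SEQ_2_SEQ_LM, num_virtual_tokens=20)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only a small fraction of parameters is trainable

prompt = ("Record A: title=iphone 12, price=799. "
          "Record B: title=apple iphone 12 64gb, price=805. Same product?")
inputs = tokenizer(prompt, return_tensors="pt")
labels = tokenizer("yes", return_tensors="pt").input_ids
loss = model(**inputs, labels=labels).loss  # optimize this in a standard training loop
print(float(loss))
```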
Fri 12:30 p.m. - 12:45 p.m. | Byung-Hak Kim - "RegCLR: A Self-Supervised Framework for Tabular Representation Learning in the Wild" (Talk)
Fri 12:45 p.m. - 1:00 p.m. | TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second (Talk)
We present TabPFN, a trained Transformer model that can do tabular supervised classification for small datasets in less than a second, needs no hyperparameter tuning, and is competitive with state-of-the-art classification methods. TabPFN is entailed in the weights of our network, which accepts training and test samples as a set-valued input and yields predictions for the entire test set in a single forward pass. TabPFN is a Prior-Data Fitted Network (PFN) and is trained offline once, to approximate Bayesian inference on synthetic datasets drawn from our prior. Our prior incorporates ideas from causal learning: it entails a large space of structural causal models with a preference for simple structures. Afterwards, the trained TabPFN approximates Bayesian prediction on any unseen tabular dataset, without any hyperparameter tuning or gradient-based learning. On 30 datasets from the OpenML-CC18 suite, we show that our method outperforms boosted trees and performs on par with complex state-of-the-art AutoML systems with a 70× speedup. This increases to a 3,200× speedup when a GPU is available. We provide all our code and the trained TabPFN at https://anonymous.4open.science/r/TabPFN-2AEE. We also provide an online demo at https://huggingface.co/spaces/TabPFN/TabPFNPrediction.
Noah Hollmann · Samuel Müller · Katharina Eggensperger · Frank Hutter
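A hypothetical usage sketch, assuming the publicly released `tabpfn` package and its scikit-learn-style TabPFNClassifier interface (see the links in the abstract for the authors' code and demo):

```python
# Hypothetical sketch: single-forward-pass classification on a small tabular dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from tabpfn import TabPFNClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# No gradient-based training and no hyperparameter tuning on the target dataset.
clf = TabPFNClassifier(device="cpu")
clf.fit(X_train, y_train)
pred = clf.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
```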
Fri 1:15 p.m. - 1:30 p.m. | Coffee/Tea Break
Fri 1:30 p.m. - 2:00 p.m. | Poster Session 2
Fri 2:00 p.m. - 2:30 p.m. | Xinyun Chen - "Program Synthesis from Semi-Structured Context" (Talk)
With the advancement of modern technologies, programming becomes ubiquitous not only among professional software developers, but also for general computer users. However, gaining programming expertise is time-consuming and challenging. Therefore, program synthesis, where the computer automatically synthesizes programs from user-written descriptions, has many applications. In this talk, I will discuss my research on neural program synthesis from semi-structured context, where the synthesized program is executed on structured input for data processing and analysis. In particular, I will present my work on SpreadsheetCoder for spreadsheet formula prediction, which was integrated into Google Sheets. Our work demonstrates that modeling the tabular structure and learning from multi-modal input is important for inferring user intent, especially when the program specifications are implicit and ambiguous.
Xinyun Chen
Fri 2:30 p.m. - 3:30 p.m. | Panel [Huan Sun (chair), Frank Hutter, Heng Ji, Julian Eisenschlos, Gaël Varoquaux, Graham Neubig]
Fri 3:30 p.m. - 3:45 p.m. | Closing Remarks (Notes)
The best paper award will be announced during this slot as well.
The Need for Tabular Representation Learning: An Industry Perspective (Poster)
The total addressable market for data and intelligence applications has been estimated at $70B. This includes the $11B market for data integration, which is estimated to grow at 25% in the coming year; the $35B market for analytics, growing at 11%; and the $19B market for business intelligence, growing at 8%. Given this data-driven future and the scale at which Microsoft operates (serving over 300K organizations with 50M+ end users), we leverage telemetry across our external and internal cloud and platform services (e.g., Azure, Microsoft 365, Visual Studio, etc.) to gain an understanding of our customer workloads and their constraints at play.
Joyce Cahoon · Alexandra Savelieva · Andreas Mueller · Avrilia Floratou · Carlo Curino · Hiren Patel · Jordan Henkel · Markus Weimer · Roman Batoukov · Shaleen Deep · Venkatesh Emani · Richard Wydrowski · Nellie Gustafsson
SAINT: Improved Neural Networks for Tabular Data via Row Attention and Contrastive Pre-Training (Poster)
Tabular data underpins numerous high-impact applications of machine learning, from fraud detection to genomics and healthcare. Classical approaches to solving tabular problems, such as gradient boosting and random forests, are widely used by practitioners. However, recent deep learning methods have achieved a degree of performance competitive with popular techniques. We devise a hybrid deep learning approach to solving tabular data problems. Our method, SAINT, performs attention over both rows and columns, and it includes an enhanced embedding method. We also study a new contrastive self-supervised pre-training method for use when labels are scarce. SAINT consistently improves performance over previous deep learning methods, and it even performs competitively with gradient boosting methods, including XGBoost, CatBoost, and LightGBM, on average over 30 benchmark datasets in regression, binary classification, and multi-class classification tasks.
Gowthami Somepalli · Avi Schwarzschild · Micah Goldblum · C. Bayan Bruss · Tom Goldstein
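As an illustration of the "attention over rows" idea (intersample attention), here is a minimal PyTorch sketch that is not the authors' implementation: each row's column embeddings are flattened into one token, and self-attention is applied across the rows of a batch so every sample can attend to the others.

```python
# Hypothetical sketch: intersample (row) attention across a batch of table rows.
import torch
import torch.nn as nn

class IntersampleAttention(nn.Module):
    def __init__(self, n_features, d_model, n_heads=4):
        super().__init__()
        # Each row is flattened into one token of size n_features * d_model.
        self.attn = nn.MultiheadAttention(n_features * d_model, n_heads, batch_first=True)

    def forward(self, x):                      # x: (batch, n_features, d_model)
        b, f, d = x.shape
        rows = x.reshape(1, b, f * d)          # rows of the batch become the sequence
        out, _ = self.attn(rows, rows, rows)   # every row attends to every other row
        return out.reshape(b, f, d)

x = torch.randn(32, 10, 16)                    # 32 rows, 10 columns, 16-dim embeddings
print(IntersampleAttention(10, 16)(x).shape)   # torch.Size([32, 10, 16])
```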
Generic Entity Resolution Models (Poster)
Entity resolution (ER) -- which decides whether two data records refer to the same real-world object -- is a long-standing data integration problem. The state-of-the-art results on ER are achieved by deep learning based methods, which typically convert each pair of records into a distributed representation, followed by a binary classifier that decides whether the two records are a match or a non-match. However, these methods are dataset specific; that is, one deep learning based model needs to be trained or fine-tuned for each new dataset, which does not generalize, and thus we call them specific ER models. In this paper, we investigate generic ER models, which use a single model to serve multiple ER datasets from various domains. In particular, we study two types of generic ER models: ones that employ foundation models (e.g., GPT-3) and ones that train a generic ER model. Our results show that although GPT-3 can perform ER with zero-shot or few-shot learning, its performance is worse than that of specific ER models. Our trained generic ER model achieves comparable performance to specific ER models, but with much less training data and much smaller storage overhead.
Jiawei Tang · Yifei Zuo · Lei Cao · Samuel Madden
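For illustration, a hypothetical sketch of few-shot prompting a foundation model for entity resolution; the record serialization and prompt template below are illustrative, not the paper's exact format.

```python
# Hypothetical sketch: build a few-shot entity-resolution prompt for a foundation model.
def serialize(record: dict) -> str:
    return ", ".join(f"{k}: {v}" for k, v in record.items())

demos = [
    ({"name": "Sony WH-1000XM4", "price": "348"},
     {"name": "Sony WH1000XM4 Wireless Headphones", "price": "349"}, "yes"),
    ({"name": "Logitech MX Master 3", "price": "99"},
     {"name": "Logitech G502 Hero", "price": "49"}, "no"),
]
query = ({"name": "Apple iPhone 12 64GB", "price": "799"},
         {"name": "iPhone 12, 64 GB, black", "price": "805"})

prompt = "Decide whether the two records refer to the same real-world entity.\n\n"
for a, b, label in demos:
    prompt += f"Record A: {serialize(a)}\nRecord B: {serialize(b)}\nMatch: {label}\n\n"
prompt += f"Record A: {serialize(query[0])}\nRecord B: {serialize(query[1])}\nMatch:"
print(prompt)  # send this prompt to the LLM and read back "yes"/"no"
```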
RoTaR: Efficient Row-Based Table Representation Learning via Teacher-Student Training (Short Paper) (Poster)
We propose RoTaR, a row-based table representation learning method, to address the efficiency and scalability issues faced by existing table representation learning methods. The key idea of RoTaR is to generate query-agnostic row representations that can be re-used via query-specific aggregation. In addition to the row-based architecture, we introduce several techniques -- cell-aware position embedding, an autoencoder objective in transformer models, a teacher-student training paradigm, and selective backward passes -- to improve the performance of the RoTaR model.
Zui Chen · Lei Cao · Samuel Madden
SiMa: Federating Data Silos using GNNs (Poster)
Virtually every sizable organization nowadays is building a form of a data lake. In theory, every department or team in the organization would enrich their datasets with metadata, and store them in a central data lake. Those datasets can then be combined in different ways and produce added value to the organization. In practice, though, the situation is vastly different: each department has its own privacy policies, data release procedures, and goals. As a result, each department maintains its own data lake, leading to data silos. For such data silos to be of any use, they need to be integrated. This paper presents SiMa, a method for federating data silos that consistently finds more correct relationships than the state-of-the-art matching methods, while minimizing wrong predictions and requiring 20x to 1000x less time to execute. SiMa leverages Graph Neural Networks (GNNs) to learn from the existing column relationships and automated data profiles found in data silos. Our method makes use of the trained GNN to perform link prediction and find new column relationships across data silos. Most importantly, SiMa can be trained incrementally on the column relationships within each silo individually, and does not require consolidating all datasets into one place.
Christos Koutras · Rihan Hai · Kyriakos Psarakis · Marios Fragkoulis · Asterios Katsifodimos
STUNT: Few-shot Tabular Learning with Self-generated Tasks from Unlabeled Tables (Poster)
Learning with few labeled tabular samples is an essential requirement for industrial machine learning applications, as many varieties of tabular data suffer from high annotation costs or difficulties in collecting new samples for novel tasks. Despite its importance, this problem is quite under-explored in the field of tabular learning, and existing few-shot learning schemes from other domains are not straightforward to apply, mainly due to the heterogeneous characteristics of tabular data. In this paper, we propose a simple yet effective framework for few-shot tabular learning, coined Self-generated Tasks from UNlabeled Tables (STUNT). Our key idea is to self-generate diverse few-shot tasks by treating randomly chosen columns as a target label. We then employ a meta-learning scheme to learn generalizable knowledge over the constructed tasks. Moreover, we introduce an unsupervised validation scheme for hyperparameter search (and early stopping) by generating a pseudo-validation set using STUNT from unlabeled data. Our experimental results demonstrate that our simple framework brings significant performance gains on various tabular few-shot learning benchmarks, compared to prior semi- and self-supervised baselines.
Jaehyun Nam · Jihoon Tack · Kyungmin Lee · Hankook Lee · Jinwoo Shin
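A minimal sketch (not the authors' code) of the self-generated task idea: pick a random column of an unlabeled table, discretize it into pseudo-classes, and sample a few-shot support set; the quantile binning and sampling details here are assumptions for illustration.

```python
# Hypothetical sketch: self-generate a few-shot task from an unlabeled feature matrix.
import numpy as np

def generate_task(X, n_way=3, k_shot=5, rng=None):
    """Build one pseudo few-shot task by treating a random column as the label."""
    rng = rng or np.random.default_rng()
    target_col = rng.integers(X.shape[1])          # random column becomes the label
    values = X[:, target_col]
    # Discretize the chosen column into n_way pseudo-classes via quantile binning.
    bins = np.quantile(values, np.linspace(0, 1, n_way + 1)[1:-1])
    pseudo_labels = np.digitize(values, bins)
    features = np.delete(X, target_col, axis=1)    # remaining columns are the inputs
    # Sample a k-shot support set per pseudo-class; the rest can serve as queries.
    support_idx = np.concatenate([
        rng.choice(np.where(pseudo_labels == c)[0], size=k_shot, replace=True)
        for c in range(n_way)
    ])
    return features[support_idx], pseudo_labels[support_idx]

X_unlabeled = np.random.rand(1000, 12)             # stand-in for an unlabeled table
support_X, support_y = generate_task(X_unlabeled)  # feed such tasks to a meta-learner
print(support_X.shape, np.bincount(support_y))
```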
Analysis of the Attention in Tabular Language Models (Poster)
Recent transformer-based models for learning table representations have reported state-of-the-art results for different tasks such as table understanding, question answering, and semantic parsing. The various proposed models use different architectures, specifically different attention mechanisms. In this paper, we analyze and compare the attention mechanisms used by two different tabular language models. By visualizing the attention maps of the models, we shed light on the different patterns that the models exhibit. With our analysis of the aggregate attention over two tabular datasets, we provide insights that may help towards building more efficient models tailored for table representation learning.
Aneta Koleva · Martin Ringsquandl · Volker Tresp
Towards Foundation Models for Relational Databases [Vision Paper] (Poster)
Tabular representation learning has recently gained a lot of attention. However, existing approaches only learn a representation from a single table, and thus ignore the potential to learn from the full structure of relational databases, including neighboring tables that can contain important information for a contextualized representation. Moreover, current models are significantly limited in scale, which prevents them from learning from large databases. In this paper, we thus introduce our vision of relational representation learning, which can not only learn from the full relational structure, but can also scale to the larger database sizes commonly found in the real world. Moreover, we discuss opportunities and challenges we see along the way to enable this vision and present initial, very promising results. Overall, we argue that this direction can lead to foundation models for relational databases, which today exist only for text and images.
Liane Vogel · Benjamin Hilprecht · Carsten Binnig
Transfer Learning with Deep Tabular Models (Poster)
Recent work on deep learning for tabular data demonstrates the strong performance of deep tabular models, often bridging the gap between gradient boosted decision trees and neural networks. Accuracy aside, a major advantage of neural models is that they are easily fine-tuned in new domains and learn reusable features. This property is often exploited in computer vision and natural language applications, where transfer learning is indispensable when task-specific training data is scarce. In this work, we explore the benefits that representation learning provides for knowledge transfer in the tabular domain. We conduct experiments in a realistic medical diagnosis test bed with limited amounts of downstream data and find that transfer learning with deep tabular models provides a definitive advantage over gradient boosted decision tree methods. We further compare the supervised and self-supervised pretraining strategies and provide practical advice on transfer learning with tabular models. Finally, we propose a pseudo-feature method for cases where the upstream and downstream feature sets differ, a tabular-specific problem widespread in real-world applications.
Roman Levin · Valeriia Cherepanova · Avi Schwarzschild · Arpit Bansal · C. Bayan Bruss · Tom Goldstein · Andrew Wilson · Micah Goldblum
RegCLR: A Self-Supervised Framework for Tabular Representation Learning in the Wild (Poster)
Recent advances in self-supervised learning (SSL) using large models to learn visual representations from natural images are rapidly closing the gap between the results produced by fully supervised learning and those produced by self-supervised learning on downstream vision tasks. Inspired by this advancement, and primarily motivated by the emergence of tabular and structured document image applications, we question which pretraining objectives without supervision, architectures, and fine-tuning strategies are most effective. To address these questions, we introduce RegCLR, a new self-supervised framework that combines contrastive and regularized methods and is compatible with the standard Vision Transformer (ViT) architecture (Dosovitskiy et al., 2021). RegCLR is instantiated by integrating masked autoencoders (MAE) (He et al., 2022) as a representative example of a contrastive method and enhanced Barlow Twins (eBT) as a representative example of a regularized method, with configurable input image augmentations in both branches. Several real-world table recognition scenarios (e.g., extracting tables from document images), ranging from standard Word and Latex documents to even more challenging electronic health records (EHR) computer screen images, have been shown to benefit greatly from the representations learned with this new framework, with detection AP improving relatively by 4.8% for tables, 11.8% for table columns, and 11.1% for GUI objects over a previous fully supervised baseline on real-world EHR screen images.
Weiyao Wang · Byung-Hak Kim · Varun Ganapathi
Diffusion models for missing value imputation in tabular data (Poster)
Missing value imputation in machine learning is the task of estimating the missing values in a dataset reasonably using the available information. For this task, several deep generative modeling methods have been proposed and demonstrated their usefulness, e.g., generative adversarial imputation networks. Recently, diffusion models have gained popularity because of their effectiveness in generative modeling of images, text, audio, etc. To our knowledge, less attention has been paid to investigating the effectiveness of diffusion models for missing value imputation in tabular data. Building on a recent development of diffusion models for time-series data imputation, we propose a diffusion model approach called "Conditional Score-based Diffusion Models for Tabular data" (CSDIT). To effectively handle categorical and numerical variables simultaneously, we investigate three techniques: one-hot encoding, analog bit encoding, and feature tokenization. Experimental results on benchmark datasets demonstrate the effectiveness of CSDIT compared with well-known existing methods, and also emphasize the importance of the categorical embedding techniques.
Shuhan Zheng · Nontawat Charoenphakdee
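As an illustration of one of the three categorical encodings mentioned above, here is a small sketch of analog bit encoding for a categorical column; the shift/scale conventions are assumptions for illustration, not necessarily the paper's.

```python
# Hypothetical sketch: analog bit encoding of a categorical column. The category index
# is written in binary and the bits are mapped to {-1, +1}, so a continuous diffusion
# model can operate on them and the result can be rounded back to a category.
import numpy as np

def analog_bits(indices, n_bits):
    bits = (indices[:, None] >> np.arange(n_bits)) & 1   # binary expansion per category
    return bits.astype(np.float32) * 2.0 - 1.0            # {0,1} -> {-1,+1}

def decode(bits):
    hard = (bits > 0).astype(int)
    return (hard * (1 << np.arange(hard.shape[1]))).sum(axis=1)

cats = np.array([0, 3, 5, 7])          # category indices of one column
enc = analog_bits(cats, n_bits=3)
print(enc)
print(decode(enc))                      # recovers [0 3 5 7]
```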
STab: Self-supervised Learning for Tabular Data (Poster)
Self-supervised learning has drawn recent interest for learning generalizable, transferable and robust representations from unlabeled tabular data. Unfortunately, unlike its image and language counterparts, which have unique spatial or semantic structure, tabular data lacks common structure and is highly diverse, making it difficult to design augmentation methods that are generically beneficial to downstream tasks. Moreover, most existing augmentation methods are domain-specific (such as rotation in vision, token masking in NLP, and edge dropping for graphs), making them less effective for real-world tabular data. This significantly limits tabular self-supervised learning and hinders progress in this domain. Aiming to fill this crucial gap, we propose STab, an augmentation-free self-supervised representation learning method based on stochastic regularization techniques that does not rely on negative pairs, to capture highly heterogeneous and non-structured information in tabular data. Our experiments show that STab achieves state-of-the-art performance compared to existing contrastive and pretext-task self-supervised methods.
Ehsan Hajiramezanali · Max Shen · Gabriele Scalia · Nathaniel Diamant
TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second (Poster)
We present TabPFN, a trained Transformer model that can do tabular supervised classification for small datasets in less than a second, needs no hyperparameter tuning, and is competitive with state-of-the-art classification methods. TabPFN is entailed in the weights of our network, which accepts training and test samples as a set-valued input and yields predictions for the entire test set in a single forward pass. TabPFN is a Prior-Data Fitted Network (PFN) and is trained offline once, to approximate Bayesian inference on synthetic datasets drawn from our prior. Our prior incorporates ideas from causal learning: it entails a large space of structural causal models with a preference for simple structures. Afterwards, the trained TabPFN approximates Bayesian prediction on any unseen tabular dataset, without any hyperparameter tuning or gradient-based learning. On 30 datasets from the OpenML-CC18 suite, we show that our method outperforms boosted trees and performs on par with complex state-of-the-art AutoML systems with a 70× speedup. This increases to a 3,200× speedup when a GPU is available. We provide all our code and the trained TabPFN at https://anonymous.4open.science/r/TabPFN-2AEE. We also provide an online demo at https://huggingface.co/spaces/TabPFN/TabPFNPrediction.
Noah Hollmann · Samuel Müller · Katharina Eggensperger · Frank Hutter
MapQA: A Dataset for Question Answering on Choropleth Maps (Poster)
Choropleth maps are a common visual representation for region-specific tabular data and are used in a number of different venues (newspapers, articles, etc.). These maps are human-readable but are often challenging to deal with when trying to extract data for screen readers, analyses, or other related tasks. Recent research into Visual Question Answering (VQA) has studied question answering on human-generated charts (ChartQA), such as bar, line, and pie charts. However, little work has paid attention to understanding maps; general VQA models and ChartQA models suffer when asked to perform this task. To facilitate and encourage research in this area, we present MapQA, a large-scale dataset of ~800,000 question-answer pairs over ~60,000 map images. Our task tests various levels of map understanding, from surface questions about map styles to complex questions that require reasoning on the underlying data. We present the unique challenges of MapQA that frustrate most strong baseline algorithms designed for ChartQA and general VQA tasks. We also present a novel algorithm, Visual Multi-Output Data Extraction based QA (V-MODEQA), for MapQA. V-MODEQA extracts the underlying structured data from a map image with a multi-output model and then performs reasoning on the extracted data. Our experimental results show that V-MODEQA has better overall performance and robustness on MapQA than state-of-the-art ChartQA and VQA algorithms by capturing the unique properties of map question answering.
Shuaichen Chang · David Palzer · Jialin Li · Eric Fosler-Lussier · Ningchuan Xiao
Towards Parameter-Efficient Automation of Data Wrangling Tasks with Prefix-Tuning (Poster)
Data wrangling tasks for data integration and cleaning arise in virtually every data-driven application scenario nowadays. Recent research indicated the astounding potential of Large Language Models (LLMs) for such tasks. The automation of data wrangling with LLMs poses additional challenges, however, as hand-tuning task- and data-specific prompts for LLMs requires high expertise and manual effort. On the other hand, finetuning a whole LLM is more amenable to automation, but incurs high storage costs, as a copy of the LLM has to be maintained. In this work, we explore the potential of a lightweight alternative to finetuning an LLM, which automatically learns a continuous prompt. This approach, called prefix-tuning, does not require updating the original LLM parameters, and can therefore re-use a single LLM instance across tasks. At the same time, it is amenable to automation, as continuous prompts can be learned automatically with standard techniques. We evaluate prefix-tuning on common data wrangling tasks for tabular data, such as entity matching, error detection, and data imputation, with promising results. We find that in six out of ten cases, prefix-tuning is within 2.3% of the performance of finetuning, even though it leverages only 0.39% of the parameter updates required for finetuning the full model. These results highlight the potential of prefix-tuning as a parameter-efficient alternative to finetuning for data integration and data cleaning with LLMs.
David Vos · Till Döhmen · Sebastian Schelter
CASPR: Customer Activity Sequence based Prediction and Representation (Poster)
Applications critical to enterprise profitability, such as customer churn prediction, fraudulent account detection, and customer lifetime value estimation, are typically addressed by training dedicated supervised models using features engineered from tabular data containing customer information. Creating custom feature sets tuned to each application has the overhead of development, operationalization, and maintenance over time. Recent advances in representation learning have the potential to simplify the feature engineering process across various applications. However, it is challenging to apply these methods to tabular data due to issues such as data heterogeneity, variations in engagement history across customers, and the large size of enterprise data. In this paper, we propose a novel approach to encode tabular data containing customer transactions, purchase history, and other interactions into a generic representation of a customer's association with the business, and use these embeddings as features to train multiple models spanning a variety of applications. CASPR, Customer Activity Sequence based Prediction and Representation, extends the Transformer architecture to encode activity sequences to improve model performance and avoid bespoke feature engineering across applications. Our experiments with running CASPR at scale show it is suitable for both small and large enterprise data.
Damian Kowalczyk · Pin-Jung Chen · Sahil Bhatnagar
MET: Masked Encoding for Tabular Data (Poster)
This paper proposes Masked Encoding for Tabular Data (MET) for learning self-supervised representations from tabular data. Tabular self-supervised learning (tabular-SSL) -- unlike in structured domains like images, audio, and text -- is more challenging, since each tabular dataset can have a completely different structure among its features (or coordinates), which is hard to identify a priori. MET attempts to circumvent this problem by assuming the following hypothesis: the observed tabular data features come from a latent graphical model, and the downstream tasks are significantly easier to solve in the latent space. Based on this hypothesis, MET uses random-masking-based encoders to learn a positional embedding for each coordinate, which in turn captures the latent structure between coordinates. Extensive experiments on multiple standard benchmarks for tabular data demonstrate that MET significantly outperforms all current baselines. For example, on the Criteo dataset -- a large-scale click prediction dataset -- MET achieves as much as a 5% improvement over the current state-of-the-art (SOTA), while purely supervised learning based approaches have been able to advance SOTA by at most 1% in the last few years. Furthermore, MET can be >20% more accurate than gradient-boosted decision trees -- considered a SOTA method for the tabular setting -- on multiple benchmarks.
Kushal Majmundar · Sachin Goyal · Praneeth Netrapalli · Prateek Jain
Conditional Contrastive Networks (Poster)
A vast amount of structured information associated with unstructured data, such as images or text, is stored online. This structured information implies different similarity relationships among unstructured data. Recently, contrastive learned embeddings trained on web-scraped unstructured data have been shown to have state-of-the-art performance across computer vision tasks. However, contrastive learning methods are currently able to leverage only a single metric of similarity. In this paper, we propose conditional contrastive networks (CCNs) as a way of using multiple notions of similarity in structured data. Our novel conditional contrastive loss is able to learn multiple disjoint similarity notions by projecting each similarity notion into a different subspace. We show empirically that our CCNs perform better than single-label trained cross-entropy networks, single-label trained supervised-contrastive networks, multi-task trained cross-entropy networks, and previously proposed conditional similarity networks on both the attributes on which it was trained and on unseen attributes.
Emily Mu · John Guttag
Structural Embedding of Data Files with MAGRITTE (Poster)
Large amounts of tabular data are encoded in plain-text files, e.g., CSV, TSV and TXT. Plain-text formats allow freedom of expression and encoding, fostering the use of non-standard syntaxes and dialects. Before analyzing the content of such files, it is necessary to understand their structure, e.g., recognize their dialect, extract metadata, or detect tables. Previous work on table representation focused on learning the semantics of data cells, with the assumption that the syntactical properties of a file are known to end users. We propose MAGRiTTE, an approach to synthetically represent the structural features of a data file. MAGRiTTE is a self-supervised machine learning model trained to learn structural embeddings from data files. The architecture of MAGRiTTE is composed of two components. The first is a transformer-encoder architecture, based on BERT and pre-trained to learn row embeddings. The second is a DCGAN-autoencoder trained to produce file-level embeddings. To pre-train the transformer architecture on structural features, we propose two core adaptations: a novel tokenization stage and specialized training objectives. To abstract the data content of a file, and train the transformer architecture on structural features, we introduce "pattern tokenization": assuming that structural properties are identifiable through special characters, we reduce all alphanumeric characters to a set of few general patterns. After tokenization, the rows of the input files are split on newline characters and a percentage of the special character tokens is masked before feeding them to the row encoder model. The row-transformer model is then trained on two objectives: reconstructing the masked tokens, and identifying whether pairs of rows belong to the same file. The row embeddings produced by this model are then used as the input for the file embedding stage of MAGRiTTE. In this stage, the generator and discriminator models are trained in an adversarial fashion on the row embedding feature maps. To obtain a file-wise embedding vector, we concatenate the output features produced by all convolutional stages of the discriminator. We shall evaluate the effectiveness of our learned structural representations on three tasks to analyze unseen data files: (1) fine-grained dialect detection, i.e., identifying the structural role of characters within rows; (2) line and cell classification, i.e., identifying metadata, comments, and data within a file; (3) table extraction, i.e., identifying the boundaries of tabular regions. We compare the use of MAGRiTTE encodings with state-of-the-art approaches that were specifically designed for these tasks. In future work, we aim at using MAGRiTTE embeddings to automatically perform structural data preparation, e.g., extracting tables, removing unwanted rows, or changing file dialects.
Gerardo Vitagliano · Mazhar Hameed · Felix Naumann
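A minimal sketch of the pattern tokenization idea described above (the exact pattern alphabet used by MAGRiTTE may differ): alphanumeric runs are collapsed into generic symbols while structural characters such as delimiters and quotes are preserved.

```python
# Hypothetical sketch: abstract away cell content, keep structural characters.
import re

def pattern_tokenize(row: str) -> str:
    row = re.sub(r"[A-Za-z]+", "A", row)   # any run of letters -> A
    row = re.sub(r"[0-9]+", "9", row)      # any run of digits  -> 9
    return row                             # delimiters and quotes are preserved

print(pattern_tokenize('id;"name";price'))        # A;"A";A
print(pattern_tokenize('12;"Alice Smith";3.99'))  # 9;"A A";9.9
```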
Active Learning with Table Language Models (Poster)
Despite recent advancements in table language model research, their real-world application is still challenging. In industry, there is an abundance of tables found in spreadsheets, but acquiring substantial amounts of labels is expensive, since only experts can annotate the often highly technical and domain-specific tables. Active learning could potentially reduce labeling costs; however, so far there is no work on active learning in conjunction with table language models. In this paper, we investigate different query strategies in a real-world industrial table language model use case. Our results show that there is potential for improvement and some fundamental questions to be addressed.
Martin Ringsquandl · Aneta Koleva
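For illustration, a minimal sketch of one common query strategy that such a study might include (uncertainty sampling via prediction entropy); this is a generic example, not the paper's setup.

```python
# Hypothetical sketch: pick the most uncertain unlabeled examples for annotation.
import numpy as np

def entropy_sampling(probas: np.ndarray, budget: int) -> np.ndarray:
    """probas: (n_unlabeled, n_classes) predicted probabilities from the table LM."""
    entropy = -(probas * np.log(probas + 1e-12)).sum(axis=1)
    return np.argsort(-entropy)[:budget]       # indices of the most uncertain examples

probas = np.random.dirichlet(np.ones(4), size=100)   # stand-in model predictions
to_label = entropy_sampling(probas, budget=10)        # send these to the expert annotator
print(to_label)
```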
Self-supervised Representation Learning Across Sequential and Tabular Features Using Transformers (Poster)
Machine learning models used for predictive modeling tasks spanning personalization, recommender systems, ad response prediction, fraud detection, etc. typically require a variety of tabular as well as sequential activity features about the user. For tasks like click-through or conversion (purchase) rate prediction, where labeled data is available at scale, popular methods use deep sequence models (sometimes pre-trained) to encode sequential inputs, followed by concatenation with tabular features and optimization of a supervised training objective. For tasks like bot and fraud detection, where labeled data is sparse and incomplete, the typical approach is to use self-supervision to learn user embeddings from their historical activity sequence. However, these models are not equipped to handle tabular input features during self-supervised learning. In this paper, we propose a novel Transformer architecture that can jointly learn embeddings of both sequential and tabular input features. Our model learns self-supervised user embeddings using a masked token prediction objective on a rich variety of features without relying on any labeled data. We demonstrate that user embeddings generated by the proposed technique successfully encode information from a combination of sequential and tabular features, improving AUC-ROC for linear separability on a downstream task label by 5% over embeddings generated using sequential features only. We also benchmark the efficacy of the embeddings on the bot detection task for a large-scale digital advertising program, where the proposed model improves recall over known bots by 10% over the sequential-only baseline at the same False Positive Rate (FPR).
Rajat Agarwal · Anand Muralidhar · Agniva Som · Hemant Kowshik
Self Supervised Pre-training for Large Scale Tabular Data (Poster)
In this paper, we tackle the problem of self-supervised pre-training of deep neural networks for large-scale tabular data in online advertising. Self-supervised learning has recently been very effective for pre-training representations in domains such as vision and natural language processing. But unlike these, designing self-supervised learning tasks for tabular data is inherently challenging. Tabular data can consist of various types of data with high cardinality and wide ranges of feature values, especially in a large-scale real-world setting. To that end, we propose a self-supervised pre-training strategy that utilizes Manifold Mixup to produce data augmentations for tabular data and performs reconstruction on these augmentations using noise contrastive estimation and mean absolute error losses, both of which are particularly suitable for large-scale tabular data. We demonstrate its efficacy by evaluating it on the problem of click fraud detection on ads, obtaining an improvement of 9% over a supervised learning baseline and 4% over a contrastive learning experiment.
Sharad Chitlangia · Anand Muralidhar · Rajat Agarwal
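A minimal sketch (not the authors' code) of using Manifold Mixup to create augmentations in representation space: hidden states of two rows are mixed with a Beta-distributed coefficient. The encoder, mixing layer, and coefficients here are illustrative simplifications (Manifold Mixup more generally mixes at a randomly chosen intermediate layer).

```python
# Hypothetical sketch: mixup in hidden space as an augmentation for tabular SSL.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 64))

def manifold_mixup(x, alpha=0.2):
    lam = torch.distributions.Beta(alpha, alpha).sample()
    h = encoder(x)                        # hidden representations of the batch
    perm = torch.randperm(x.size(0))
    return lam * h + (1 - lam) * h[perm]  # mixed hidden states used as augmentations

x = torch.randn(32, 20)                   # a batch of 32 rows with 20 features
aug = manifold_mixup(x)
print(aug.shape)                           # reconstruct / contrast against encoder(x)
```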
STable: Table Generation Framework for Encoder-Decoder Models (Poster)
The output structure of database-like tables, consisting of values structured in horizontal rows and vertical columns identifiable by name, can cover a wide range of NLP tasks. Following this observation, we propose a framework for text-to-table neural models applicable to problems such as extraction of line items, joint entity and relation extraction, or knowledge base population. The permutation-based decoder of our proposal is a generalized sequential method that comprehends information from all cells in the table. The training maximizes the expected log-likelihood of a table's content across all random permutations of the factorization order. During content inference, we exploit the model's ability to generate cells in any order by searching over possible orderings to maximize the model's confidence and to avoid the substantial error accumulation that other sequential models are prone to. Experiments demonstrate the high practical value of the framework, which establishes state-of-the-art results on several challenging datasets, outperforming previous solutions by up to 15%.
Michał Pietruszka · Michał Turski · Łukasz Borchmann · Tomasz Dwojak · Gabriela Pałka · Karolina Szyndler · Dawid Jurkiewicz · Łukasz Garncarek
Tabular Data Generation: Can We Fool XGBoost? (Poster)
If by 'realistic' we mean indistinguishable from (fresh) real data, generating realistic synthetic tabular data is far from a trivial task. We present here a series of experiments showing that strong classifiers like XGBoost are able to distinguish state-of-the-art synthetic data from fresh real data almost perfectly on several tabular datasets. By studying the important features of these classifiers, we observe that mixed-type (continuous/discrete) and ill-distributed numerical columns are the ones that are least faithfully reproduced. We hence propose and experiment with a series of automated reversible column-wise encoders which improve the realism of the generators.
EL Hacen Zein · Tanguy Urvoy
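As an illustration of the real-vs-synthetic detection test described above, a small sketch with synthetic stand-in data (not the paper's datasets or generators): an XGBoost classifier is trained to separate fresh real rows from generated rows, and an AUC near 0.5 would indicate the generator is hard to distinguish from real data.

```python
# Hypothetical sketch: discriminative test of synthetic tabular data realism.
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

real = np.random.normal(size=(5000, 8))                  # stand-in for fresh real rows
synthetic = np.random.normal(loc=0.05, size=(5000, 8))   # stand-in for generated rows

X = np.vstack([real, synthetic])
y = np.concatenate([np.ones(len(real)), np.zeros(len(synthetic))])
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = xgb.XGBClassifier(n_estimators=200, max_depth=6)
clf.fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"real-vs-synthetic AUC: {auc:.3f}")               # ~1.0 means easily detectable
# clf.feature_importances_ points at the columns that give the synthetic data away.
```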