
Workshop: Table Representation Learning Workshop

Invited talk: Co-Designing LLMs and LLM-Powered Data Management Tools

Simran Arora

Fri 15 Dec 6:45 a.m. PST — 7:15 a.m. PST


Large Language Models (LLMs) are now widely used for data management. We recently proposed Evaporate [ICLR Spotlight 2023, VLDB 2024], a system that uses LLMs to help users efficiently query semi-structured documents. We also showed how off-the-shelf LLMs perform data-wrangling tasks with state-of-the-art quality and no specialized training [VLDB 2023]. This talk discusses some of my lessons from working on these early LLM-for-data-management projects and subsequent research to improve the reach of these systems. In particular, there is still a long way to go in extending LLMs to data types such as private, semi-structured, and long-sequence data. To extend our capabilities on these data types, I’ll discuss MQAR and Monarch Mixer [NeurIPS Oral 2023], new LM architectures that can match the quality of attention-based LMs while remaining asymptotically more efficient at training and inference time. Finally, we’ll discuss how these fundamental breakthroughs can power next-generation data management tools.
