
Workshop: Table Representation Learning Workshop

Pool-Search-Demonstrate: Improving Data-wrangling LLMs via better in-context examples

Joon Suk Huh · Changho Shin · Elina Choi

Keywords: [ Data Wrangling ] [ Foundation Model ] [ Large Language Model ] [ Database ]

Fri 15 Dec 7:23 a.m. PST — 7:30 a.m. PST
presentation: Table Representation Learning Workshop
Fri 15 Dec 6:30 a.m. PST — 3:30 p.m. PST


Data wrangling is the process of transforming raw data for analysis and downstream tasks. Recently, it has been shown that foundation models can be applied successfully to data-wrangling tasks (Narayan et al., 2022). An important aspect of data wrangling with LLMs is constructing suitable prompts for the given task, and within these prompts, a crucial component is the choice of in-context examples. In the prior study of Narayan et al., demonstration examples were chosen manually by the authors, which may not scale to new datasets. In this work, we propose a simple demonstration strategy that individualizes demonstration examples for each input by selecting them from a pool based on their distance in the embedding space. Additionally, we propose a postprocessing method that exploits the embeddings of labels under a closed-world assumption. Empirically, our embedding-based example retrieval and postprocessing improve foundation models' performance by up to 84% over randomly selected demonstration examples and 49% over manually selected ones. Ablation tests reveal the effect of class embeddings, as well as of various factors in demonstration such as quantity, quality, and diversity.
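The two components described above — retrieving per-input demonstration examples by embedding distance, and snapping model outputs to the nearest known label under a closed-world assumption — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the embedding vectors are assumed to come from some external encoder, and the function names (`nearest_examples`, `snap_to_label`) are hypothetical.

```python
import numpy as np

def nearest_examples(query_emb, pool_embs, k=3):
    """Return indices of the k pool examples closest to the query
    in cosine similarity (used to build the in-context demonstration)."""
    q = query_emb / np.linalg.norm(query_emb)
    p = pool_embs / np.linalg.norm(pool_embs, axis=1, keepdims=True)
    sims = p @ q                      # cosine similarity to each pool example
    return np.argsort(-sims)[:k]      # top-k most similar, best first

def snap_to_label(output_emb, label_embs):
    """Closed-world postprocessing: map the model's (possibly free-form)
    output to the index of the nearest valid label embedding."""
    o = output_emb / np.linalg.norm(output_emb)
    l = label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)
    return int(np.argmax(l @ o))
```

In use, the retrieved indices select pool rows whose (input, label) pairs are serialized into the prompt ahead of the query, and `snap_to_label` replaces any off-vocabulary generation with the closest admissible label.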
