

Poster

DAVED: Data Acquisition via Experimental Design for Data Markets

Charlie Lu · Baihe Huang · Sai Praneeth Karimireddy · Praneeth Vepakomma · Michael Jordan · Ramesh Raskar


Abstract:

The acquisition of training data is crucial for machine learning applications. Data markets can increase the supply of data, particularly in data-scarce domains such as healthcare, by incentivizing potential data providers to join the market. A major challenge for a data buyer in such a market is choosing the most valuable data points from a data seller. Unlike prior work in data valuation, which assumes centralized data access, we propose a federated approach to the data acquisition problem that is inspired by linear experimental design. Our proposed data acquisition method achieves lower prediction error without requiring labeled validation data and can be optimized in a fast and federated procedure. The key insight of our work is that a method that directly estimates the benefit of acquiring data for test set prediction is particularly compatible with a decentralized market setting.
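To make the link to linear experimental design concrete, the sketch below greedily picks seller data points that most reduce the predictive variance of a ridge-regularized linear model on the buyer's unlabeled test points. It is a generic illustration under stated assumptions, not the DAVED procedure itself: the function name `greedy_design`, the `ridge` parameter, and the greedy selection rule are all hypothetical choices made for this example. Only features are used, so no labeled validation data is needed.

```python
import numpy as np

def greedy_design(X_seller, X_test, budget, ridge=1e-3):
    """Greedily select `budget` seller rows that minimize the summed
    predictive variance x^T (X_S^T X_S + ridge*I)^{-1} x over test rows.

    Illustrative sketch only; not the authors' algorithm.
    """
    n, d = X_seller.shape
    # Inverse of the regularized information matrix, starting from ridge * I.
    A_inv = np.eye(d) / ridge
    chosen = []
    remaining = list(range(n))
    for _ in range(budget):
        best_i, best_gain = None, -np.inf
        for i in remaining:
            x = X_seller[i]
            Ax = A_inv @ x
            # Sherman-Morrison: drop in summed test variance if x is added.
            gain = np.sum((X_test @ Ax) ** 2) / (1.0 + x @ Ax)
            if gain > best_gain:
                best_i, best_gain = i, gain
        # Rank-one update of the inverse for the selected point.
        x = X_seller[best_i]
        Ax = A_inv @ x
        A_inv -= np.outer(Ax, Ax) / (1.0 + x @ Ax)
        chosen.append(best_i)
        remaining.remove(best_i)
    return chosen

# Toy usage: the single test point lies along the first axis, so the
# seller row aligned with that axis is the most informative one.
X_seller = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 2.0]])
X_test = np.array([[1.0, 0.0]])
selected = greedy_design(X_seller, X_test, budget=1)
```

In this toy case the seller rows orthogonal to the test point yield zero gain, so the row along the first axis is selected. In a decentralized market, each seller could in principle compute the gain for its own points locally, but how the paper organizes that computation is not specified in the abstract.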
