Data valuation arises as a non-trivial challenge in real-world use cases such as collaborative machine learning, federated learning, trusted data sharing, and data marketplaces. The value of data is often associated with the learning performance (e.g., validation accuracy) of a model trained on the data, which introduces a close coupling between data valuation and validation. However, a validation set may not be available in practice, and it can be challenging for the data providers to reach an agreement on the choice of the validation set. Another practical issue is that of data replication: given the value of some data points, a dishonest data provider may replicate these data points to exploit the valuation for a larger reward or payment. We observe that the diversity of the data points is an inherent property of a dataset that is independent of validation. We formalize diversity via the volume of the data matrix (i.e., the determinant of its left Gram matrix), which allows us to establish a formal connection between the diversity of data and learning performance without requiring validation. Furthermore, we propose a robust volume measure with a theoretical guarantee on replication robustness, following the intuition that copying the same data points does not increase the diversity of data. We perform extensive experiments to demonstrate its consistency in valuation and its practical advantages over existing baselines, and we show that our method is model- and task-agnostic and can be flexibly adapted to handle various neural networks.
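To make the replication issue concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the left Gram matrix of an n-by-d data matrix X is X^T X and takes the volume to be the square root of its determinant; the function name `volume` and the synthetic data are hypothetical. It shows that duplicating every row inflates this plain volume even though no new information is added, which is the behavior a replication-robust measure is meant to prevent.

```python
import numpy as np


def volume(X):
    """Plain volume of a data matrix X (n rows, d columns).

    Assumed convention: sqrt(det(X^T X)), computed through the
    log-determinant for numerical stability.
    """
    gram = X.T @ X  # d x d left Gram matrix (assumed convention)
    sign, logdet = np.linalg.slogdet(gram)
    # A non-positive sign means the Gram matrix is singular: zero volume.
    return float(np.exp(0.5 * logdet)) if sign > 0 else 0.0


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))   # synthetic data: 50 points, 3 features
X_dup = np.vstack([X, X])      # a provider replicates every data point

vol = volume(X)
vol_dup = volume(X_dup)

# Duplicating all rows doubles X^T X, so the determinant grows by 2^d
# and the volume by 2^(d/2), with d = 3 here.
print(vol, vol_dup, vol_dup / vol)
```

Because the plain volume grows under replication, a valuation built on it alone would reward copying, which motivates the robust volume measure described above.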
Author Information
Xinyi Xu (National University of Singapore)

I am a fourth-year Ph.D. student (funded by A*STAR through the ACIS Scholarship) in the Department of Computer Science at the National University of Singapore, where I study multi-agent machine learning systems.
Zhaoxuan Wu (National University of Singapore)
Chuan Sheng Foo (Institute for Infocomm Research)
Bryan Kian Hsiang Low (National University of Singapore)
More from the Same Authors
- 2023 Poster: Exploiting Correlated Auxiliary Feedback in Parameterized Bandits
  Arun Verma · Zhongxiang Dai · YAO SHU · Bryan Kian Hsiang Low
- 2023 Poster: Equitable Model Valuation with Black-box Access
  Xinyi Xu · Thanh Lam · Chuan Sheng Foo · Bryan Kian Hsiang Low
- 2023 Poster: Quantum Bayesian Optimization
  Zhongxiang Dai · Gregory Kang Ruey Lau · Arun Verma · YAO SHU · Bryan Kian Hsiang Low · Patrick Jaillet
- 2023 Poster: Batch Bayesian Optimization For Replicable Experimental Design
  Zhongxiang Dai · Quoc Phong Nguyen · Sebastian Tay · Daisuke Urano · Richalynn Leong · Bryan Kian Hsiang Low · Patrick Jaillet
- 2023 Poster: Incentives in Private Collaborative Machine Learning
  Rachael Sim · Yehong Zhang · Nghia Hoang · Xinyi Xu · Bryan Kian Hsiang Low · Patrick Jaillet
- 2023 Poster: Bayesian Optimization with Cost-varying Variable Subsets
  Sebastian Tay · Chuan Sheng Foo · Daisuke Urano · Richalynn Leong · Bryan Kian Hsiang Low
- 2022 Poster: Trade-off between Payoff and Model Rewards in Shapley-Fair Collaborative Machine Learning
  Quoc Phong Nguyen · Bryan Kian Hsiang Low · Patrick Jaillet
- 2022 Poster: Sample-Then-Optimize Batch Neural Thompson Sampling
  Zhongxiang Dai · YAO SHU · Bryan Kian Hsiang Low · Patrick Jaillet
- 2022 Poster: Unifying and Boosting Gradient-Based Training-Free Neural Architecture Search
  YAO SHU · Zhongxiang Dai · Zhaoxuan Wu · Bryan Kian Hsiang Low
- 2021 Workshop: New Frontiers in Federated Learning: Privacy, Fairness, Robustness, Personalization and Data Ownership
  Nghia Hoang · Lam Nguyen · Pin-Yu Chen · Tsui-Wei Weng · Sara Magliacane · Bryan Kian Hsiang Low · Anoop Deoras
- 2021 Poster: Differentially Private Federated Bayesian Optimization with Distributed Exploration
  Zhongxiang Dai · Bryan Kian Hsiang Low · Patrick Jaillet
- 2021 Poster: Gradient Driven Rewards to Guarantee Fairness in Collaborative Machine Learning
  Xinyi Xu · Lingjuan Lyu · Xingjun Ma · Chenglin Miao · Chuan Sheng Foo · Bryan Kian Hsiang Low
- 2021 Poster: Fault-Tolerant Federated Reinforcement Learning with Theoretical Guarantee
  Xiaofeng Fan · Yining Ma · Zhongxiang Dai · Wei Jing · Cheston Tan · Bryan Kian Hsiang Low
- 2021 Poster: Optimizing Conditional Value-At-Risk of Black-Box Functions
  Quoc Phong Nguyen · Zhongxiang Dai · Bryan Kian Hsiang Low · Patrick Jaillet
- 2020 Poster: Variational Bayesian Unlearning
  Quoc Phong Nguyen · Bryan Kian Hsiang Low · Patrick Jaillet
- 2020 Poster: Federated Bayesian Optimization via Thompson Sampling
  Zhongxiang Dai · Bryan Kian Hsiang Low · Patrick Jaillet
- 2020 Poster: Efficient Exploration of Reward Functions in Inverse Reinforcement Learning via Bayesian Optimization
  Sreejith Balakrishnan · Quoc Phong Nguyen · Bryan Kian Hsiang Low · Harold Soh
- 2019 Poster: Implicit Posterior Variational Inference for Deep Gaussian Processes
  Haibin YU · Yizhou Chen · Bryan Kian Hsiang Low · Patrick Jaillet · Zhongxiang Dai
- 2019 Spotlight: Implicit Posterior Variational Inference for Deep Gaussian Processes
  Haibin YU · Yizhou Chen · Bryan Kian Hsiang Low · Patrick Jaillet · Zhongxiang Dai
- 2017 : Poster Session 2
  Farhan Shafiq · Antonio Tomas Nevado Vilchez · Takato Yamada · Sakyasingha Dasgupta · Robin Geyer · Moin Nabi · Crefeda Rodrigues · Edoardo Manino · Alexantrou Serb · Miguel A. Carreira-Perpinan · Kar Wai Lim · Bryan Kian Hsiang Low · Rohit Pandey · Marie C White · Pavel Pidlypenskyi · Xue Wang · Christine Kaeser-Chen · Michael Zhu · Suyog Gupta · Sam Leroux
- 2017 : Aligned AI Poster Session
  Amanda Askell · Rafal Muszynski · William Wang · Yaodong Yang · Quoc Nguyen · Bryan Kian Hsiang Low · Patrick Jaillet · Candice Schumann · Anqi Liu · Peter Eckersley · Angelina Wang · William Saunders
- 2015 Poster: Inverse Reinforcement Learning with Locally Consistent Reward Functions
  Quoc Phong Nguyen · Bryan Kian Hsiang Low · Patrick Jaillet