A coreset is a small set that approximately preserves the structure of the original input data set. Running an algorithm on the coreset instead of the full data therefore reduces the overall computational cost. Conventional coreset techniques assume that the input data set is explicitly available for processing. However, this assumption may not hold in real-world scenarios. In this paper, we consider the problem of coreset construction over relational data. Namely, the data is decoupled into several relational tables, and it can be very expensive to materialize the data matrix directly by joining the tables. We propose a novel approach called "aggregation tree with pseudo-cube" that builds a coreset from the bottom up. Moreover, our approach neatly circumvents several troublesome issues of relational learning problems [Khamis et al., PODS 2019]. Under some mild assumptions, we show that our coreset approach can be applied to machine learning tasks such as clustering, logistic regression and SVM.
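To make the general notion of a coreset concrete, the following is a minimal sketch. It is not the "aggregation tree with pseudo-cube" construction from the paper. It assumes the data is already materialized as a list of points (exactly the setting the paper avoids) and uses plain uniform sampling with reweighting, which only approximates the cost well for well-behaved data. The function names `clustering_cost` and `uniform_coreset`, the synthetic data, and the sample size are all illustrative assumptions.

```python
import random


def clustering_cost(points, weights, centers):
    """Weighted k-means cost: sum of w * squared distance to the nearest center."""
    total = 0.0
    for p, w in zip(points, weights):
        nearest = min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        total += w * nearest
    return total


def uniform_coreset(points, m, rng):
    """Sample m points uniformly with replacement and weight each by n / m,
    so that the total weight of the sample equals the original size n."""
    n = len(points)
    sample = [points[rng.randrange(n)] for _ in range(m)]
    return sample, [n / m] * m


rng = random.Random(0)

# Synthetic data (an assumption for illustration): two Gaussian blobs in the plane.
data = [
    (rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
    for cx, cy in [(0.0, 0.0), (10.0, 10.0)]
    for _ in range(5000)
]
centers = [(0.0, 0.0), (10.0, 10.0)]

# Cost on the full data versus cost on a weighted sample 20x smaller.
full_cost = clustering_cost(data, [1.0] * len(data), centers)
core_pts, core_w = uniform_coreset(data, 500, rng)
core_cost = clustering_cost(core_pts, core_w, centers)

rel_err = abs(core_cost - full_cost) / full_cost
print(f"full cost: {full_cost:.1f}, coreset cost: {core_cost:.1f}, relative error: {rel_err:.3f}")
```

The weighted sample stands in for the full data when evaluating the objective, which is the property a coreset is meant to provide. Constructions with provable guarantees typically replace uniform sampling with more careful schemes, and the relational setting adds the constraint that the joined data matrix is never built.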
Author Information
Jiaxiang Chen (University of Science and Technology of China)
Qingyuan Yang (University of Science and Technology of China)
Ruomin Huang (University of Science and Technology of China)
Hu Ding (University of Science and Technology of China)
More from the Same Authors
- 2021 Spotlight: Robust and Fully-Dynamic Coreset for Continuous-and-Bounded Learning (With Outliers) Problems
  Zixiu Wang · Yiwen Guo · Hu Ding
- 2022 Spotlight: Coresets for Wasserstein Distributionally Robust Optimization Problems
  Ruomin Huang · Jiawei Huang · Wenjie Liu · Hu Ding
- 2022 Spotlight: Lightning Talks 4A-1
  Jiawei Huang · Su Jia · Abdurakhmon Sadiev · Ruomin Huang · Yuanyu Wan · Denizalp Goktas · Jiechao Guan · Andrew Li · Wei-Wei Tu · Li Zhao · Amy Greenwald · Jiawei Huang · Dmitry Kovalev · Yong Liu · Wenjie Liu · Peter Richtarik · Lijun Zhang · Zhiwu Lu · R Ravi · Tao Qin · Wei Chen · Hu Ding · Nan Jiang · Tie-Yan Liu
- 2022 Spotlight: Lightning Talks 1B-4
  Andrei Atanov · Shiqi Yang · Wanshan Li · Yongchang Hao · Ziquan Liu · Jiaxin Shi · Anton Plaksin · Jiaxiang Chen · Ziqi Pan · yaxing wang · Yuxin Liu · Stepan Martyanov · Alessandro Rinaldo · Yuhao Zhou · Li Niu · Qingyuan Yang · Andrei Filatov · Yi Xu · Liqing Zhang · Lili Mou · Ruomin Huang · Teresa Yeo · kai wang · Daren Wang · Jessica Hwang · Yuanhong Xu · Qi Qian · Hu Ding · Michalis Titsias · Shangling Jui · Ajay Sohmshetty · Lester Mackey · Joost van de Weijer · Hao Li · Amir Zamir · Xiangyang Ji · Antoni Chan · Rong Jin
- 2022 Spotlight: Coresets for Relational Data and The Applications
  Jiaxiang Chen · Qingyuan Yang · Ruomin Huang · Hu Ding
- 2022 Poster: Coresets for Wasserstein Distributionally Robust Optimization Problems
  Ruomin Huang · Jiawei Huang · Wenjie Liu · Hu Ding
- 2021 Poster: Solving Soft Clustering Ensemble via $k$-Sparse Discrete Wasserstein Barycenter
  Ruizhe Qin · Mengying Li · Hu Ding
- 2021 Poster: Robust and Fully-Dynamic Coreset for Continuous-and-Bounded Learning (With Outliers) Problems
  Zixiu Wang · Yiwen Guo · Hu Ding