Nearest neighbor is a popular class of classification methods with many desirable properties. For a large data set that cannot be loaded into the memory of a single machine due to computation, communication, privacy, or ownership limitations, we consider a divide-and-conquer scheme: the entire data set is divided into small subsamples, nearest neighbor predictions are made on each subsample, and a final decision is reached by aggregating the subsample predictions through majority voting. We name this method the big Nearest Neighbor (bigNN) classifier and establish its rates of convergence, under minimal assumptions, in terms of both the excess risk and the classification instability; these rates match those of the oracle nearest neighbor classifier and cannot be improved. To substantially reduce the prediction time needed to achieve the optimal rate, we also study a pre-training acceleration technique applied to the bigNN method, with a proven convergence rate. We find that in the distributed setting, the optimal choice of the number of neighbors k should scale with both the total sample size and the number of partitions, and that there is a theoretical upper limit on the latter. Numerical studies verify the theoretical findings.
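Below is a minimal sketch of the divide-and-conquer bigNN scheme described above, not the authors' implementation: the training data are randomly split into disjoint subsamples, a k-NN classifier (here scikit-learn's KNeighborsClassifier) predicts on each subsample, and the subsample predictions are combined by majority vote. The binary 0/1 labels and the fixed values of n_parts and k are illustrative assumptions; the paper's theory has k scale with the total sample size and the number of partitions.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def bignn_predict(X_train, y_train, X_test, n_parts=10, k=5):
    """Divide-and-conquer nearest neighbor (bigNN) sketch.

    Splits the training data into n_parts disjoint subsamples, runs a
    k-NN classifier on each, and aggregates the per-subsample predictions
    on X_test by majority voting. Assumes binary labels coded as 0/1.
    """
    n = len(X_train)
    # Randomly partition the training indices into n_parts subsamples.
    idx = np.random.permutation(n)
    parts = np.array_split(idx, n_parts)

    # Collect one prediction per test point from each subsample's k-NN.
    votes = np.zeros((len(X_test), n_parts), dtype=int)
    for j, part in enumerate(parts):
        clf = KNeighborsClassifier(n_neighbors=k)
        clf.fit(X_train[part], y_train[part])
        votes[:, j] = clf.predict(X_test)

    # Majority vote across the n_parts subsample predictions.
    return (votes.mean(axis=1) > 0.5).astype(int)
```

In this sketch each subsample's k-NN search touches only n / n_parts points, which is what makes the scheme attractive when the full data set cannot reside on one machine; the aggregation step is a simple vote over the n_parts local predictions.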
Author Information
Xingye Qiao (Binghamton University)
Jiexin Duan (Purdue University)
Guang Cheng (Purdue University)
More from the Same Authors
- 2021: Optimum-statistical Collaboration Towards Efficient Black-box Optimization
  Wenjie Li · Chi-Hua Wang · Guang Cheng
- 2022 Poster: Fair Bayes-Optimal Classifiers Under Predictive Parity
  Xianli Zeng · Edgar Dobriban · Guang Cheng
- 2022 Poster: Why Do Artificially Generated Data Help Adversarial Robustness
  Yue Xing · Qifan Song · Guang Cheng
- 2022 Poster: Phase Transition from Clean Training to Adversarial Training
  Yue Xing · Qifan Song · Guang Cheng
- 2021 Poster: On the Algorithmic Stability of Adversarial Training
  Yue Xing · Qifan Song · Guang Cheng
- 2020 Poster: Efficient Variational Inference for Sparse Deep Learning with Theoretical Guarantee
  Jincheng Bai · Qifan Song · Guang Cheng
- 2020 Poster: Statistical Guarantees of Distributed Nearest Neighbor Classification
  Jiexin Duan · Xingye Qiao · Guang Cheng
- 2020 Poster: Directional Pruning of Deep Neural Networks
  Shih-Kang Chao · Zhanyu Wang · Yue Xing · Guang Cheng
- 2019 Poster: Bootstrapping Upper Confidence Bound
  Botao Hao · Yasin Abbasi Yadkori · Zheng Wen · Guang Cheng
- 2018 Poster: Learning Confidence Sets using Support Vector Machines
  Wenbo Wang · Xingye Qiao
- 2018 Poster: Early Stopping for Nonparametric Testing
  Meimei Liu · Guang Cheng
- 2015 Poster: Non-convex Statistical Optimization for Sparse Tensor Graphical Model
  Wei Sun · Zhaoran Wang · Han Liu · Guang Cheng