A new algorithm is proposed which accelerates the mini-batch k-means algorithm of Sculley (2010) using the distance-bounding approach of Elkan (2003). We argue that, when incorporating distance bounds into a mini-batch algorithm, already-used data should preferentially be reused. To this end we propose nested mini-batches, whereby data in the mini-batch at iteration t is automatically reused at iteration t+1. Using nested mini-batches presents two difficulties. The first is that unbalanced use of data can bias estimates, which we resolve by ensuring that each data sample contributes exactly once to centroids. The second lies in choosing mini-batch sizes, which we address by balancing premature fine-tuning of centroids against redundancy-induced slow-down. Experiments show that the resulting nmbatch algorithm is highly effective, often arriving within 1% of the empirical minimum 100 times earlier than the standard mini-batch algorithm.
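The core ideas in the abstract — mini-batches that are nested (the batch at iteration t is reused, and grown, at iteration t+1) and each sample contributing exactly once to the centroid sums — can be illustrated with a short sketch. This is not the paper's nmbatch implementation (it omits the Elkan-style distance bounds and the paper's batch-size schedule); the doubling schedule and initialisation here are illustrative assumptions.

```python
import random

def nested_minibatch_kmeans(data, k, iters=6, seed=0):
    """Illustrative sketch of nested mini-batch k-means.

    Assumptions (not from the paper): the mini-batch doubles each
    iteration, starting at size k, and a sample's assignment is fixed
    once made. The nesting means only *new* samples are processed each
    iteration, so each sample contributes exactly once to the sums.
    """
    rng = random.Random(seed)
    d = len(data[0])
    centroids = [list(p) for p in rng.sample(data, k)]
    sums = [[0.0] * d for _ in range(k)]   # running per-centroid sums
    counts = [0] * k                       # samples folded into each sum
    order = rng.sample(range(len(data)), len(data))  # fixed random order
    used, batch = 0, k
    for _ in range(iters):
        batch = min(2 * batch, len(data))  # nested: old batch is a prefix
        for i in order[used:batch]:        # only the newly added samples
            x = data[i]
            j = min(range(k), key=lambda c: sum(
                (x[t] - centroids[c][t]) ** 2 for t in range(d)))
            counts[j] += 1
            for t in range(d):
                sums[j][t] += x[t]
        used = batch
        for c in range(k):                 # centroid = mean of its samples
            if counts[c]:
                centroids[c] = [sums[c][t] / counts[c] for t in range(d)]
    return centroids, used
```

In the real nmbatch algorithm, Elkan-style bounds let most of these distance computations be skipped, and the bounds computed for reused samples at iteration t remain valid at iteration t+1 — which is precisely why reusing data is cheaper than drawing a fresh batch.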
Author Information
James Newling (Idiap Research Institute)
François Fleuret (Idiap Research Institute)
François Fleuret received a PhD in Mathematics from INRIA and the University of Paris VI in 2000, and an Habilitation degree in Mathematics from the University of Paris XIII in 2006. He is Full Professor in the department of Computer Science at the University of Geneva, and Adjunct Professor in the School of Engineering of the École Polytechnique Fédérale de Lausanne. He has published more than 80 papers in peer-reviewed international conferences and journals. He is Associate Editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence, serves as Area Chair for NeurIPS, AAAI, and ICCV, and sits on the program committees of many top-tier international conferences in machine learning and computer vision. He has served as an expert for multiple funding agencies. He is the inventor of several patents in the field of machine learning, and co-founder of Neural Concept SA, a company specializing in the development and commercialization of deep learning solutions for engineering design. His main research interest is machine learning, with a particular focus on computational aspects and sample efficiency.
More from the Same Authors
- 2021: Test time Adaptation through Perturbation Robustness
  Prabhu Teja Sivaprasad · François Fleuret
- 2019 Poster: Reducing Noise in GAN Training with Variance Reduced Extragradient
  Tatjana Chavdarova · Gauthier Gidel · François Fleuret · Simon Lacoste-Julien
- 2019 Demonstration: Real Time CFD simulations with 3D Mesh Convolutional Networks
  Pierre Baque · Pascal Fua · François Fleuret
- 2019 Poster: Full-Gradient Representation for Neural Network Visualization
  Suraj Srinivas · François Fleuret
- 2018 Poster: Practical Deep Stereo (PDS): Toward applications-friendly deep stereo matching
  Stepan Tulyakov · Anton Ivanov · François Fleuret
- 2017 Poster: K-Medoids For K-Means Seeding
  James Newling · François Fleuret
- 2017 Spotlight: K-Medoids For K-Means Seeding
  James Newling · François Fleuret
- 2015 Poster: Kullback-Leibler Proximal Variational Inference
  Mohammad Emtiyaz Khan · Pierre Baque · François Fleuret · Pascal Fua
- 2014 Demonstration: A 3D Simulator for Evaluating Reinforcement and Imitation Learning Algorithms on Complex Tasks
  Leonidas Lefakis · François Fleuret · Cijo Jose
- 2013 Poster: Reservoir Boosting: Between Online and Offline Ensemble Learning
  Leonidas Lefakis · François Fleuret
- 2011 Poster: Boosting with Maximum Adaptive Sampling
  Charles Dubout · François Fleuret
- 2010 Demonstration: Platform to Share Feature Extraction Methods
  François Fleuret
- 2010 Poster: Joint Cascade Optimization Using A Product Of Boosted Classifiers
  Leonidas Lefakis · François Fleuret