Curriculum Learning by Dynamic Instance Hardness
Tianyi Zhou · Shengjie Wang · Jeff Bilmes

Thu Dec 10 09:00 AM -- 11:00 AM (PST) @ Poster Session 5 #1333

A good teacher can adjust the curriculum based on students' learning history. By analogy, in this paper, we study the dynamics of a deep neural network's (DNN) performance on individual samples during its learning process. The observed properties allow us to develop an adaptive curriculum that leads to faster learning of more accurate models. We introduce dynamic instance hardness (DIH), the exponential moving average of a sample's instantaneous hardness (e.g., a loss, or a change in outputs) over the training history. A low DIH indicates that a model retains knowledge about a sample over time, and implies a flat loss landscape for that sample. Moreover, for DNNs, we find that a sample's DIH early in training predicts its DIH in later stages. Hence, we can train a model using samples with higher DIH and safely ignore those with lower DIH. This motivates DIH-guided curriculum learning (DIHCL). Compared to existing CL methods: (1) DIH is more stable over time than instantaneous hardness alone, which is noisy due to stochastic training and the non-smoothness of DNNs; (2) DIHCL is computationally inexpensive since it uses only a byproduct of back-propagation and thus does not require extra inference. On 11 datasets, DIHCL significantly outperforms random mini-batch SGD and recent CL methods in terms of efficiency and final performance.
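
As a rough illustration of the idea in the abstract, the sketch below keeps a per-sample exponential moving average of instantaneous hardness and then draws the next mini-batch with probability proportional to that average. The function names (`update_dih`, `sample_by_dih`), the discount `gamma`, the proportional sampling rule, and the choice to update only the samples seen in the current batch are all assumptions made for this example; the abstract does not specify them, and the paper's actual procedure may differ.

```python
import numpy as np


def update_dih(dih, batch_idx, batch_hardness, gamma=0.9):
    """Return a new DIH array after one training step.

    Assumption: only samples in the current mini-batch are updated, using
    an exponential moving average with (hypothetical) discount `gamma`.
    `batch_hardness` could be per-sample losses from the forward pass.
    """
    new_dih = dih.copy()
    new_dih[batch_idx] = (
        gamma * batch_hardness + (1.0 - gamma) * dih[batch_idx]
    )
    return new_dih


def sample_by_dih(dih, batch_size, rng, eps=1e-8):
    """Draw distinct sample indices, favouring samples with higher DIH.

    Assumption: sampling probability is proportional to DIH. `eps` keeps
    every probability strictly positive so sampling without replacement
    remains valid even when some DIH values are zero.
    """
    weights = dih + eps
    probs = weights / weights.sum()
    return rng.choice(len(dih), size=batch_size, replace=False, p=probs)
```

In a training loop, one would call `update_dih` with the per-sample losses already computed for the current batch (so no extra inference is needed, matching point (2) of the abstract) and then call `sample_by_dih` to pick the next batch.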

Author Information

Tianyi Zhou (University of Washington, Seattle)

Tianyi Zhou is a 6th-year Ph.D. student in the Paul G. Allen School of Computer Science and Engineering at the University of Washington, Seattle, supervised by Jeff Bilmes and Carlos Guestrin. He worked with Dacheng Tao at the University of Technology Sydney and Nanyang Technological University for 4 years before joining UW. His research covers topics in machine learning, natural language processing, statistics, and data analysis. He has published 30+ papers with 1300+ citations at top conferences and journals, including NeurIPS, ICML, ICLR, AISTATS, NAACL, ACM SIGKDD, IEEE ICDM, AAAI, IJCAI, IEEE ISIT, Machine Learning Journal (Springer), DMKD (Springer), IEEE TIP, and IEEE TNNLS. He received the best student paper award at IEEE ICDM 2013.

Shengjie Wang (University of Washington)
Jeff Bilmes (University of Washington, Seattle)
