Many machine learning models map a score vector to a probability vector. This is usually done by projecting the score vector onto a probability simplex, and such projections are often characterized as Lipschitz-continuous approximations of the argmax function, whose Lipschitz constant is controlled by a parameter analogous to a softmax temperature. This parameter has been observed to affect model quality and is typically either held constant or decayed over time. In this work, we propose a method that adapts this parameter to individual training examples. The resulting method exhibits desirable properties, such as sparse support and a numerically efficient implementation, and we find that it can significantly outperform competing non-adaptive projection methods.
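As a point of reference for the non-adaptive baseline the abstract describes, below is a minimal NumPy sketch of Euclidean projection onto the probability simplex (the sparsemax transform), one well-known Lipschitz-continuous approximation of argmax. This is an illustration under stated assumptions, not the paper's adaptive method: the function name and the fixed `temperature` parameter are hypothetical, standing in for the temperature-like parameter the paper adapts per example.

```python
import numpy as np

def simplex_projection(scores, temperature=1.0):
    """Euclidean projection of scores / temperature onto the probability simplex
    (the sparsemax transform). Illustrative sketch of the non-adaptive baseline;
    `temperature` is held fixed here, whereas the paper adapts it per example."""
    z = np.asarray(scores, dtype=float) / temperature
    z_sorted = np.sort(z)[::-1]             # scores sorted in descending order
    cssv = np.cumsum(z_sorted) - 1.0        # cumulative sums minus the simplex budget
    ks = np.arange(1, len(z) + 1)
    support = z_sorted - cssv / ks > 0      # coordinates that stay in the support
    k = ks[support][-1]                     # size of the support
    tau = cssv[k - 1] / k                   # soft threshold
    return np.maximum(z - tau, 0.0)         # sparse probability vector (sums to 1)

# Example: a low temperature gives a sparse, near-argmax output,
# while a high temperature gives a dense, near-uniform one.
print(simplex_projection([2.0, 1.0, 0.1], temperature=0.5))  # [1.  0.  0. ]
print(simplex_projection([2.0, 1.0, 0.1], temperature=5.0))  # dense output
```

As the example shows, shrinking the temperature drives the output toward a one-hot argmax and shrinks its support, which is the behavior the parameter in the abstract controls.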
Author Information
Weiwei Kong (Georgia Institute of Technology)
Walid Krichene (Google)
Nic E Mayoraz (Google)
Steffen Rendle (Google)
Li Zhang (Google)
More from the Same Authors
- 2017 Poster: Acceleration and Averaging in Stochastic Descent Dynamics
  Walid Krichene · Peter Bartlett
- 2017 Spotlight: Acceleration and Averaging in Stochastic Descent Dynamics
  Walid Krichene · Peter Bartlett
- 2016 Poster: Adaptive Averaging in Accelerated Descent Dynamics
  Walid Krichene · Alexandre Bayen · Peter Bartlett
- 2016 Poster: Minimizing Regret on Reflexive Banach Spaces and Nash Equilibria in Continuous Zero-Sum Games
  Maximilian Balandat · Walid Krichene · Claire Tomlin · Alexandre Bayen
- 2015 Poster: Accelerated Mirror Descent in Continuous and Discrete Time
  Walid Krichene · Alexandre Bayen · Peter Bartlett
- 2015 Spotlight: Accelerated Mirror Descent in Continuous and Discrete Time
  Walid Krichene · Alexandre Bayen · Peter Bartlett
- 2015 Poster: Nearly Optimal Private LASSO
  Kunal Talwar · Abhradeep Guha Thakurta · Li Zhang