Poster

Robust Bi-Tempered Logistic Loss Based on Bregman Divergences

Ehsan Amid · Manfred K. Warmuth · Rohan Anil · Tomer Koren

East Exhibition Hall B + C #163

Keywords: [ Deep Learning ] [ Efficient Training Methods ]


Abstract:

We introduce a temperature into the exponential function and replace the softmax output layer of neural networks with this high-temperature generalization. Similarly, the logarithm in the training loss is replaced by a low-temperature logarithm. By tuning the two temperatures, we create loss functions that are non-convex already in the single-layer case. Replacing the last layer of a neural network with our bi-temperature generalization of the logistic loss makes training more robust to noise. We visualize the effect of tuning the two temperatures in a simple setting and demonstrate the efficacy of our method on large datasets. Our methodology is based on Bregman divergences and is superior to a related two-temperature method that uses the Tsallis divergence.
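For concreteness, the sketch below implements the ingredients the abstract names: a tempered logarithm log_t(x) = (x^(1−t) − 1)/(1 − t) and tempered exponential exp_t(x) = [1 + (1−t)x]_+^(1/(1−t)) (both reducing to log/exp at t = 1), a tempered softmax whose normalizer is found by fixed-point iteration, and the resulting bi-tempered loss in its Bregman-divergence form. This is a minimal NumPy sketch assembled from the published formulas, not the authors' reference implementation; the function names, iteration count, and epsilon are illustrative choices, and the normalizer iteration assumes t2 ≥ 1 (the t2 < 1 case requires a different root-finding scheme).

```python
import numpy as np

def log_t(x, t):
    # Tempered logarithm: (x^(1-t) - 1) / (1 - t); ordinary log at t = 1.
    if t == 1.0:
        return np.log(x)
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)

def exp_t(x, t):
    # Tempered exponential: [1 + (1-t) x]_+^(1/(1-t)); ordinary exp at t = 1.
    if t == 1.0:
        return np.exp(x)
    return np.maximum(1.0 + (1.0 - t) * x, 0.0) ** (1.0 / (1.0 - t))

def tempered_softmax(activations, t, n_iters=30):
    # Heavy-tailed softmax replacement: exp_t of shifted activations, with the
    # normalizer obtained by fixed-point iteration (assumes t >= 1).
    shifted = activations - activations.max(axis=-1, keepdims=True)
    if t == 1.0:
        e = np.exp(shifted)
        return e / e.sum(axis=-1, keepdims=True)
    a = shifted
    for _ in range(n_iters):
        z = exp_t(a, t).sum(axis=-1, keepdims=True)
        a = (z ** (1.0 - t)) * shifted
    z = exp_t(a, t).sum(axis=-1, keepdims=True)
    return exp_t(a, t) / z

def bi_tempered_loss(activations, labels, t1, t2, eps=1e-10):
    # Bregman-divergence form of the loss: t1 tempers the logarithm,
    # t2 tempers the output transfer function. At t1 = t2 = 1 this reduces
    # to ordinary softmax cross entropy (as a KL divergence).
    probs = tempered_softmax(activations, t2)
    loss = (labels * (log_t(labels + eps, t1) - log_t(probs, t1))
            - (labels ** (2.0 - t1) - probs ** (2.0 - t1)) / (2.0 - t1))
    return loss.sum(axis=-1)

logits = np.array([[2.0, 0.5, -1.0]])
onehot = np.array([[1.0, 0.0, 0.0]])
print(bi_tempered_loss(logits, onehot, t1=0.7, t2=1.3))  # modest tempering
```

The two temperatures play complementary roles: t1 < 1 makes the loss bounded, limiting the influence of large-margin outliers, while t2 > 1 gives the output probabilities heavier tails than softmax, which tolerates mislabeled examples near the decision boundary.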
