
Robust Bi-Tempered Logistic Loss Based on Bregman Divergences
Ehsan Amid · Manfred K. Warmuth · Rohan Anil · Tomer Koren

Thu Dec 12 10:45 AM -- 12:45 PM (PST) @ East Exhibition Hall B + C #163

We introduce a temperature into the exponential function and replace the softmax output layer of a neural network with a high-temperature generalization. Similarly, the logarithm in the training loss is replaced by a low-temperature logarithm. Tuning the two temperatures yields loss functions that are non-convex even in the single-layer case. When the last layer of a neural network is replaced by our bi-temperature generalization of the logistic loss, training becomes more robust to noise. We visualize the effect of tuning the two temperatures in a simple setting and show the efficacy of our method on large datasets. Our methodology is based on Bregman divergences and is superior to a related two-temperature method that uses the Tsallis divergence.
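
The abstract gives no formulas, so the following is only an illustrative sketch under stated assumptions. It assumes the commonly used definitions of the tempered logarithm, log_t(x) = (x^(1-t) - 1) / (1 - t), and the tempered exponential, exp_t(x) = [1 + (1 - t) x]_+^(1/(1-t)), both of which reduce to the ordinary log and exp at t = 1. The function names, the bisection search for the normalization constant, and the exact form of the loss are assumptions made for illustration and are not taken from this page; the authors' own implementation may differ.

```python
import math

def log_t(x, t):
    """Tempered logarithm (assumed form); equals math.log(x) at t = 1."""
    if t == 1.0:
        return math.log(x)
    return (x ** (1.0 - t) - 1.0) / (1.0 - t)

def exp_t(x, t):
    """Tempered exponential (assumed form); equals math.exp(x) at t = 1."""
    if t == 1.0:
        return math.exp(x)
    base = max(0.0, 1.0 + (1.0 - t) * x)
    if base == 0.0:
        # Outside the support: 0 for t < 1, unbounded for t > 1.
        return 0.0 if t < 1.0 else math.inf
    return base ** (1.0 / (1.0 - t))

def tempered_softmax(logits, t, iters=100):
    """Probabilities exp_t(a_i - lam), with lam found by bisection.

    Bisection is a simple stand-in chosen for this sketch; it relies on
    sum_i exp_t(a_i - lam) being non-increasing in lam.
    """
    top = max(logits)
    if t == 1.0:
        weights = [math.exp(a - top) for a in logits]
        total = sum(weights)
        return [w / total for w in weights]
    lo, hi = top, top + 1.0  # at lam = top the sum is at least 1
    while sum(exp_t(a - hi, t) for a in logits) > 1.0:
        hi = top + 2.0 * (hi - top)  # widen until the sum drops to 1 or below
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(exp_t(a - mid, t) for a in logits) > 1.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return [exp_t(a - lam, t) for a in logits]

def bi_tempered_loss(logits, labels, t1, t2):
    """Assumed loss form: tempered-log divergence between labels and
    the tempered-softmax output, with t1 for the log and t2 for the softmax."""
    probs = tempered_softmax(logits, t2)
    total = 0.0
    for y, p in zip(labels, probs):
        if y > 0.0:  # terms with y = 0 contribute nothing to the first part
            total += y * (log_t(y, t1) - log_t(p, t1))
        total -= (y ** (2.0 - t1) - p ** (2.0 - t1)) / (2.0 - t1)
    return total

# Hypothetical example: three classes, true class 0, a badly wrong logit.
logits = [-50.0, 0.0, 0.0]
labels = [1.0, 0.0, 0.0]
print(bi_tempered_loss(logits, labels, t1=1.0, t2=1.0))  # ordinary logistic loss
print(bi_tempered_loss(logits, labels, t1=0.8, t2=1.2))  # bi-tempered variant
```

With t1 = t2 = 1 the sketch reduces to the usual softmax cross-entropy. With t1 < 1 the loss on a single example is bounded above by 1 / (1 - t1), and with t2 > 1 the output distribution has heavier tails than softmax. These two properties are one way to read the abstract's claim of robustness to noise.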

Author Information

Ehsan Amid (University of California, Santa Cruz)
Manfred K. Warmuth (Google Brain)
Rohan Anil (Google)
Tomer Koren (Google)

More from the Same Authors