Poster

Nearly Optimal Best-of-Both-Worlds Algorithms for Online Learning with Feedback Graphs

Shinji Ito · Taira Tsuchiya · Junya Honda

Hall J (level 1) #820

Keywords: [ follow the regularized leader ] [ learning with feedback graphs ] [ best-of-both-worlds algorithm ] [ multi-armed bandit ]


Abstract: This study considers online learning with general directed feedback graphs. For this problem, we present best-of-both-worlds algorithms that achieve nearly tight regret bounds for adversarial environments as well as poly-logarithmic regret bounds for stochastic environments. As Alon et al. [2015] have shown, tight regret bounds depend on the structure of the feedback graph: strongly observable graphs yield minimax regret of $\tilde{\Theta}(\alpha^{1/2} T^{1/2})$, while weakly observable graphs induce minimax regret of $\tilde{\Theta}(\delta^{1/3} T^{2/3})$, where $\alpha$ and $\delta$, respectively, represent the independence number of the graph and the domination number of a certain portion of the graph. Our proposed algorithm for strongly observable graphs has a regret bound of $\tilde{O}(\alpha^{1/2} T^{1/2})$ for adversarial environments, as well as of $O(\alpha (\ln T)^3 / \Delta_{\min})$ for stochastic environments, where $\Delta_{\min}$ denotes the minimum suboptimality gap. This result resolves an open question raised by Erez and Koren [2021]. We also provide an algorithm for weakly observable graphs that achieves a regret bound of $\tilde{O}(\delta^{1/3} T^{2/3})$ for adversarial environments and poly-logarithmic regret for stochastic environments. The proposed algorithms are based on the follow-the-regularized-leader approach combined with newly designed update rules for learning rates.
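The recipe named in the abstract, follow-the-regularized-leader combined with graph-based loss estimation, can be illustrated with a minimal sketch. The snippet below is a hedged toy, not the paper's algorithm: it uses plain exponential-weights FTRL with a fixed $1/\sqrt{t}$ learning rate, whereas the paper's contribution is precisely the newly designed adaptive learning-rate update rules. The graph `G`, the loss means, and all parameter choices are hypothetical.

```python
# Minimal illustrative sketch (NOT the authors' exact method) of FTRL on a
# strongly observable feedback graph. G[i, j] = True means playing arm i
# reveals the loss of arm j; all self-loops are present. The learning rate
# is a simple 1/sqrt(t) schedule, not the paper's adaptive rule.
import numpy as np

rng = np.random.default_rng(0)

K, T = 5, 10_000
# Strongly observable graph: self-loops plus a directed ring of extra edges.
G = np.eye(K, dtype=bool)
for i in range(K):
    G[i, (i + 1) % K] = True

# Hypothetical stochastic losses: arm 0 is the unique best arm.
means = np.array([0.3, 0.5, 0.5, 0.5, 0.5])

cum_loss_est = np.zeros(K)      # cumulative importance-weighted loss estimates
total_loss, best_loss = 0.0, 0.0

for t in range(1, T + 1):
    eta = 1.0 / np.sqrt(t)      # fixed (non-adaptive) learning rate
    # FTRL with negative-entropy regularizer = exponential weights
    # over the cumulative loss estimates (shifted for numerical stability).
    w = np.exp(-eta * (cum_loss_est - cum_loss_est.min()))
    p = w / w.sum()

    arm = rng.choice(K, p=p)
    losses = rng.binomial(1, means)        # realized losses of all arms
    total_loss += losses[arm]
    best_loss += losses[np.argmin(means)]

    # Graph-based importance weighting: arm j is observed whenever the played
    # arm has an edge into j, i.e. with probability q_j = sum_{i: G[i,j]} p_i.
    q = G.T.astype(float) @ p              # observation probabilities
    observed = G[arm]                      # losses actually revealed this round
    cum_loss_est += np.where(observed, losses / q, 0.0)

print(f"regret estimate after T={T}: {total_loss - best_loss:.1f}")
```

The graph-specific step is the division by the observation probability $q_j = \sum_{i:(i,j)\in E} p_i$, which keeps each loss estimate unbiased while letting a single pull inform every out-neighbor of the played arm.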