Policy Optimization in CMDPs with Bandit Feedback: Learning with Stochastic and Adversarial Constraints

Francesco Emanuele Stradi ⋅ Anna Lunghi ⋅ Matteo Castiglioni ⋅ Alberto Marchesi ⋅ Nicola Gatti

Abstract