

Poster in Workshop: Towards Safe & Trustworthy Agents

Modelling the oversight of deceptive interpretability agents

Simon Lermen ⋅ Mateusz Dziemian

Abstract
