

Poster in Affinity Event: Women in Machine Learning

Can We Predict Alignment Before Models Finish Thinking? Towards Monitoring Misaligned Reasoning Models

Yik Siu Chan · Yong Zheng-Xin · Stephen Bach

Abstract
