Empirical Evidence for Alignment Faking in a Small LLM and Prompt-Based Mitigation Techniques

Jeanice Koorndijk

Abstract
