Reinforcement Learning Fine-tuning of Language Models is Biased Towards More Extractable Features

Diogo Cruz ⋅ Edoardo Pona ⋅ Alex Holness-Tofts ⋅ Elias Schmied ⋅ Víctor Abia Alonso ⋅ Charlie J Griffin ⋅ Bogdan-Ionut Cirstea

Abstract
