NeurIPS Comparing Optimization Targets for Contrast-Consistent Search

Poster in Workshop: Socially Responsible Language Modelling Research (SoLaR)

Comparing Optimization Targets for Contrast-Consistent Search

Hugo Fry · Seamus Fallows · Jamie Wright · Ian Fan · Nandi Schoots


Abstract:

We investigate the optimization target of contrast-consistent search (CCS), which aims to recover the internal representations of truth of a large language model. We present a new loss function that we call the Midpoint-Displacement (MD) loss function. We demonstrate that for a certain hyper-parameter value this MD loss function leads to a prober with very similar weights to CCS. We further show that this hyper-parameter is not optimal and that with a better hyper-parameter the MD loss function tentatively attains a higher test accuracy than CCS.
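For orientation, the baseline objective that the MD loss is compared against can be sketched as follows. This is a minimal, dependency-free sketch of the CCS objective as it is commonly stated (a consistency term plus a confidence term over contrast pairs), assuming a linear probe followed by a sigmoid. The function names `sigmoid`, `probe`, and `ccs_loss` and the list-based data layout are illustrative assumptions, not the authors' code. The MD loss itself is not defined in this abstract, so it is not shown.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def probe(weights, bias, hidden):
    """Hypothetical linear probe: sigmoid(w . h + b) on one hidden state."""
    return sigmoid(sum(w * h for w, h in zip(weights, hidden)) + bias)


def ccs_loss(weights, bias, pos_states, neg_states):
    """Mean CCS loss over contrast pairs (assumed form of the objective).

    pos_states[i] and neg_states[i] are the hidden states for the "yes"
    and "no" completions of the same statement.
    """
    total = 0.0
    for h_pos, h_neg in zip(pos_states, neg_states):
        p_pos = probe(weights, bias, h_pos)
        p_neg = probe(weights, bias, h_neg)
        # Consistency: the two probabilities should sum to one.
        consistency = (p_pos - (1.0 - p_neg)) ** 2
        # Confidence: discourage the degenerate solution p_pos = p_neg = 0.5.
        confidence = min(p_pos, p_neg) ** 2
        total += consistency + confidence
    return total / len(pos_states)
```

With an all-zero probe, both probabilities are 0.5, so the consistency term vanishes and only the confidence penalty remains. A probe that separates the two halves of each pair drives both terms toward zero.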
