Inference-time alignment of language models by importance sampling on pre-logit space
Abstract
Inference-time alignment of large language models (LLMs) has attracted attention because fine-tuning LLMs incurs high computational costs. We propose a new sampling-based alignment method called adaptive importance sampling on pre-logits (AISP). AISP maximizes the expected value of a given reward model with respect to the distribution of pre-logits, the outputs of the penultimate layer. We assume that the conditional distribution of pre-logits can be approximated by a Gaussian distribution, which bridges LLM alignment and sampling-based control algorithms. The resulting optimization over pre-logits is solved by importance sampling. AISP outperforms best-of-n sampling in terms of average reward for a given number of samples.
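Under notation introduced here only for illustration (the abstract does not fix symbols), the objective described above can be written as maximizing the expected reward over a Gaussian distribution of pre-logits:

```latex
% Illustrative formalization; the symbols are our own, not fixed by the abstract.
% h: pre-logit (penultimate-layer output),
% r: reward model composed with decoding from h,
% (mu, Sigma): parameters of the Gaussian approximation of the pre-logit distribution.
\max_{\mu, \Sigma} \; \mathbb{E}_{h \sim \mathcal{N}(\mu, \Sigma)} \bigl[ r(h) \bigr]
```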
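The abstract does not specify the update rule, so the following is a minimal sketch of one common adaptive importance sampling scheme under stated assumptions: a diagonal Gaussian proposal over pre-logits, refit at each iteration with exponentiated, self-normalized reward weights. The `reward` function, `dim`, and all hyperparameters are hypothetical placeholders; a real implementation would decode text from `h` through the LLM head and score it with the reward model.

```python
import numpy as np

# Hypothetical stand-in for a reward model evaluated on the text decoded
# from a pre-logit vector h; here it just scores closeness to a fixed target.
def reward(h: np.ndarray) -> float:
    target = np.ones_like(h)
    return -float(np.sum((h - target) ** 2))

def adaptive_is_gaussian(dim: int = 16, n_samples: int = 64, n_iters: int = 50,
                         temperature: float = 1.0, seed: int = 0) -> np.ndarray:
    """Adapt a diagonal Gaussian proposal over pre-logits to raise expected reward."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(dim)       # proposal mean
    sigma = np.ones(dim)     # proposal standard deviation (diagonal covariance)
    for _ in range(n_iters):
        # Draw candidate pre-logits from the current Gaussian proposal.
        samples = mu + sigma * rng.standard_normal((n_samples, dim))
        rewards = np.array([reward(h) for h in samples])
        # Exponentiated, self-normalized weights emphasize high-reward samples
        # (rewards.max() is subtracted for numerical stability).
        w = np.exp((rewards - rewards.max()) / temperature)
        w /= w.sum()
        # Refit the Gaussian proposal to the reweighted samples.
        mu = w @ samples
        sigma = np.sqrt(w @ (samples - mu) ** 2) + 1e-6  # avoid variance collapse
    return mu

if __name__ == "__main__":
    h_star = adaptive_is_gaussian()
    print("reward at adapted proposal mean:", reward(h_star))
```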