NeurIPS Poster HonestLLM: Toward an Honest and Helpful Large Language Model

Poster

HonestLLM: Toward an Honest and Helpful Large Language Model

Gao Chujie · Siyuan Wu · Yue Huang · Dongping Chen · Qihui Zhang · Zhengyan Fu · Yao Wan · Lichao Sun · Xiangliang Zhang

East Exhibit Hall A-C #4307

[ Abstract ] [ Project Page ]

[ Paper] [ Poster] [ OpenReview]

Fri 13 Dec 11 a.m. PST — 2 p.m. PST

Abstract:

Large Language Models (LLMs) have achieved remarkable success across various industries and applications, owing to their exceptional generative capabilities. Nevertheless, honesty and helpfulness, which ensure safe and useful real-world deployments, have been considered as the longstanding cornerstones in practice. In this paper, we first established comprehensive principles for honesty LLM and further created the HoneSet with 930 queries across six categories, which is designed to evaluate LLMs’ ability to maintain honesty. Then, we improved the honesty and helpfulness of LLMs in both training-free and fine-tuning settings. Specifically, we propose a training-free method named Curiosity-Driven Prompting, which enables LLMs to express their internal confusion and uncertainty about the given query and then optimize their responses. Moreover, we also propose a two-stage fine-tuning approach, inspired by curriculum learning, to enhance the honesty and helpfulness of LLMs. The method first teaches LLMs to distinguish between honest and dishonest, and then LLMs are trained to learn to respond more helpfully. Experimental results demonstrated that both of the two proposed methods improve the helpfulness of LLMs while making them maintain honesty. Our research has paved the way for more reliable and trustworthy LLMs in real-world applications.

Chat is not available.