Workshop | Sat 17:27
Understanding The Effect Of Temperature On Alignment With Human Opinions
Maja Pavlovic · Massimo Poesio

Poster | Wed 16:30
Off-Policy Selection for Initiating Human-Centric Experimental Design
Ge Gao · Xi Yang · Qitong Gao · Song Ju · Miroslav Pajic · Min Chi

Tutorial | Tue 13:30
Cross-disciplinary insights into alignment in humans and machines
Gillian Hadfield · Dylan Hadfield-Menell · Joel Leibo · Rakshit Trivedi

Poster | Wed 11:00
Improved Generation of Adversarial Examples Against Safety-aligned LLMs
Qizhang Li · Yiwen Guo · Wangmeng Zuo · Hao Chen

Workshop
The Art of Knowing When to Stop: Analysis of Optimal Stopping in People and Machines
Fukun Zhang · Bonan Zhao

Poster | Wed 16:30
Expectation Alignment: Handling Reward Misspecification in the Presence of Expectation Mismatch
Malek Mechergui · Sarath Sreedharan

Affinity Event
Text-Image Concept Human Alignment Dataset and Metric
Maria Alejandra Bravo

Affinity Event
Detecting Machine-Generated vs. Human-Generated Content Across Varying Text Lengths: Small, Medium, and Large
Anjali Rawal · Hui Wang · Mengyuan Liu · Yu-Hsuan Lin · Youjia Zheng · Shanu Sushmita

Poster | Fri 11:00
ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evaluation
Jingnan Zheng · Han Wang · An Zhang · Nguyen Duy Tai · Jun Sun · Tat-Seng Chua

Poster | Thu 16:30
Learning Human-like Representations to Enable Learning Human Values
Andrea Wynn · Ilia Sucholutsky · Tom Griffiths

Poster | Wed 11:00
When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback
Leon Lang · Davis Foote · Stuart J Russell · Anca Dragan · Erik Jenner · Scott Emmons