Poster
Regularized Conditional Diffusion Model for Multi-Task Preference Alignment
Xudong Yu · Chenjia Bai · Haoran He · Changhong Wang · Xuelong Li
West Ballroom A-D #6609
Sequential decision-making can be formulated as a conditional generation process, with the goals of aligning with human intent and remaining versatile across tasks. Previous return-conditioned diffusion models achieve competitive performance but rely on well-defined reward functions, which require substantial human effort to design and are difficult to specify in multi-task settings. Preferences offer an alternative, but recent work rarely considers preference learning across multiple tasks. To facilitate alignment and versatility in multi-task preference learning, we adopt multi-task preferences as a unified framework. In this work, we propose to learn preference representations aligned with preference labels, which are then used as conditions to guide the conditional generation process of diffusion models. The traditional classifier-free guidance paradigm suffers from inconsistency between the conditions and the generated trajectories. We therefore introduce an auxiliary regularization objective that maximizes the mutual information between conditions and their corresponding generated trajectories, improving alignment with preferences. Experiments on D4RL and Meta-World demonstrate the effectiveness and favorable performance of our method in both single- and multi-task scenarios.
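For readers unfamiliar with the classifier-free guidance paradigm mentioned in the abstract, the following is a minimal sketch of the standard guidance rule, which blends a conditional and an unconditional noise prediction at each denoising step. It is not the authors' implementation: the function name `cfg_combine`, its parameters, and the scale convention (0 gives the unconditional prediction, 1 the conditional one) are assumptions made for illustration, and the mutual-information regularizer proposed in the paper is not shown.

```python
from typing import List, Sequence


def cfg_combine(
    eps_cond: Sequence[float],
    eps_uncond: Sequence[float],
    guidance_scale: float,
) -> List[float]:
    """Blend two noise predictions with the standard classifier-free guidance rule.

    eps_cond:       noise predicted with the condition (here, hypothetically,
                    a preference representation) supplied to the model.
    eps_uncond:     noise predicted with the condition dropped.
    guidance_scale: 0 returns eps_uncond, 1 returns eps_cond, and values
                    above 1 extrapolate further toward the condition.
    """
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("predictions must have the same length")
    # Start from the unconditional prediction and move along the direction
    # that the condition adds.
    return [u + guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


if __name__ == "__main__":
    # Toy values only; real predictions would come from a trained denoiser.
    print(cfg_combine([1.0, 2.0], [0.0, 0.0], guidance_scale=2.0))
```

Nothing in this rule checks that the generated trajectory actually matches the condition it was given, which is the kind of inconsistency the abstract's auxiliary objective is meant to address.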