When training task-oriented dialogue (TOD) agents, one can naturally use reinforcement learning (RL) techniques to learn dialogue strategies that achieve user-specific goals. Prior work has mainly focused on adopting advanced RL techniques to train TOD agents, while the design of the reward function has received little study. This paper aims to answer the question of how to efficiently learn and leverage a reward function for training end-to-end TOD agents. Specifically, we introduce two generalized objectives for reward-function learning, inspired by the classical learning-to-rank literature. We then use the learned reward function to guide the training of the end-to-end TOD agent. With the proposed techniques, we achieve competitive results on the end-to-end response-generation task on the MultiWOZ 2.0 dataset.
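The abstract does not spell out the two ranking objectives. As a hedged illustration only, the sketch below shows two classical listwise learning-to-rank losses, the ListNet top-one cross-entropy and the Plackett-Luce negative log-likelihood used by ListMLE, applied to a reward model's scores for a set of dialogue trajectories. It assumes each trajectory carries a scalar quality score (for example, a hypothetical combination of task-completion metrics) that defines the target ranking. The function names and inputs are illustrative assumptions, not the paper's actual objectives or code.

```python
import math
from typing import List, Sequence


def _log_softmax(xs: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax via the log-sum-exp trick.
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


def listnet_loss(pred_rewards: Sequence[float], target_scores: Sequence[float]) -> float:
    # ListNet top-one loss (illustrative): cross-entropy between the softmax of
    # the hypothetical quality scores and the softmax of the predicted rewards.
    target_logp = _log_softmax(target_scores)
    pred_logp = _log_softmax(pred_rewards)
    return -sum(math.exp(t) * p for t, p in zip(target_logp, pred_logp))


def plackett_luce_nll(pred_rewards: Sequence[float], target_scores: Sequence[float]) -> float:
    # ListMLE-style loss (illustrative): negative log-likelihood of the ranking
    # implied by target_scores (best first) under a Plackett-Luce model whose
    # item utilities are the predicted rewards.
    order = sorted(range(len(target_scores)), key=lambda i: target_scores[i], reverse=True)
    ranked = [pred_rewards[i] for i in order]
    nll = 0.0
    for k in range(len(ranked)):
        # Log-probability that the k-th ranked trajectory is picked first
        # among the trajectories not yet placed.
        nll -= _log_softmax(ranked[k:])[0]
    return nll


if __name__ == "__main__":
    # Hypothetical reward-model outputs and quality scores for three trajectories.
    pred = [2.0, 0.5, -1.0]
    quality = [0.9, 0.4, 0.1]
    print(listnet_loss(pred, quality), plackett_luce_nll(pred, quality))
```

Both losses shrink when the predicted rewards order the trajectories the same way as the target scores. In an actual training loop, `pred_rewards` would come from a learnable reward model and the chosen loss would be minimized with an autodiff framework; this pure-Python version only evaluates the losses.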
Author Information
Yihao Feng (Salesforce Research)
Researcher from Salesforce Research
Shentao Yang (The University of Texas at Austin)
Shujian Zhang (UT Austin)
Jianguo Zhang (Salesforce AI Research)
Caiming Xiong (Salesforce Research)
Mingyuan Zhou (University of Texas at Austin)
Huan Wang (Salesforce Research)
More from the Same Authors
2022 Poster: Knowledge-Aware Bayesian Deep Topic Model »
Dongsheng Wang · Yishi Xu · Miaoge Li · Zhibin Duan · Chaojie Wang · Bo Chen · Mingyuan Zhou -
2022 Poster: HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding »
Yishi Xu · Dongsheng Wang · Bo Chen · Ruiying Lu · Zhibin Duan · Mingyuan Zhou -
2022 : Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-Oriented Dialogue Systems »
Yihao Feng · Shentao Yang · Shujian Zhang · Jianguo Zhang · Caiming Xiong · Mingyuan Zhou · Huan Wang -
2022 : Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning »
Zhendong Wang · jonathan j hunt · Mingyuan Zhou -
2022 Spotlight: Lightning Talks 5B-4 »
Yuezhi Yang · Zeyu Yang · Yong Lin · Yishi Xu · Linan Yue · Tao Yang · Weixin Chen · Qi Liu · Jiaqi Chen · Dongsheng Wang · Baoyuan Wu · Yuwang Wang · Hao Pan · Shengyu Zhu · Zhenwei Miao · Yan Lu · Lu Tan · Bo Chen · Yichao Du · Haoqian Wang · Wei Li · Yanqing An · Ruiying Lu · Peng Cui · Nanning Zheng · Li Wang · Zhibin Duan · Xiatian Zhu · Mingyuan Zhou · Enhong Chen · Li Zhang -
2022 Spotlight: Ensemble of Averages: Improving Model Selection and Boosting Performance in Domain Generalization »
Devansh Arpit · Huan Wang · Yingbo Zhou · Caiming Xiong -
2022 Spotlight: HyperMiner: Topic Taxonomy Mining with Hyperbolic Embedding »
Yishi Xu · Dongsheng Wang · Bo Chen · Ruiying Lu · Zhibin Duan · Mingyuan Zhou -
2022 Spotlight: Lightning Talks 5B-1 »
Devansh Arpit · Xiaojun Xu · Zifan Shi · Ivan Skorokhodov · Shayan Shekarforoush · Zhan Tong · Yiqun Wang · Shichong Peng · Linyi Li · Ivan Skorokhodov · Huan Wang · Yibing Song · David Lindell · Yinghao Xu · Seyed Alireza Moazenipourasil · Sergey Tulyakov · Peter Wonka · Yiqun Wang · Ke Li · David Fleet · Yujun Shen · Yingbo Zhou · Bo Li · Jue Wang · Peter Wonka · Marcus Brubaker · Caiming Xiong · Limin Wang · Deli Zhao · Qifeng Chen · Dit-Yan Yeung -
2022 Spotlight: Lightning Talks 2A-4 »
Sarthak Mittal · Richard Grumitt · Zuoyu Yan · Lihao Wang · Dongsheng Wang · Alexander Korotin · Jiangxin Sun · Ankit Gupta · Vage Egiazarian · Tengfei Ma · Yi Zhou · Yishi Xu · Albert Gu · Biwei Dai · Chunyu Wang · Yoshua Bengio · Uros Seljak · Miaoge Li · Guillaume Lajoie · Yiqun Wang · Liangcai Gao · Lingxiao Li · Jonathan Berant · Huang Hu · Xiaoqing Zheng · Zhibin Duan · Hanjiang Lai · Evgeny Burnaev · Zhi Tang · Zhi Jin · Xuanjing Huang · Chaojie Wang · Yusu Wang · Jian-Fang Hu · Bo Chen · Chao Chen · Hao Zhou · Mingyuan Zhou -
2022 Spotlight: Knowledge-Aware Bayesian Deep Topic Model »
Dongsheng Wang · Yishi Xu · Miaoge Li · Zhibin Duan · Chaojie Wang · Bo Chen · Mingyuan Zhou -
2022 : Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets »
Philippe Laban · Chien-Sheng Wu · Wenhao Liu · Caiming Xiong -
2022 Poster: Learning to Re-weight Examples with Optimal Transport for Imbalanced Classification »
Dandan Guo · Zhuo Li · meixi zheng · He Zhao · Mingyuan Zhou · Hongyuan Zha -
2022 Poster: Adaptive Distribution Calibration for Few-Shot Learning with Hierarchical Optimal Transport »
Dandan Guo · Long Tian · He Zhao · Mingyuan Zhou · Hongyuan Zha -
2022 Poster: Alleviating "Posterior Collapse'' in Deep Topic Models via Policy Gradient »
Yewen Li · Chaojie Wang · Zhibin Duan · Dongsheng Wang · Bo Chen · Bo An · Mingyuan Zhou -
2022 Poster: A Variational Edge Partition Model for Supervised Graph Representation Learning »
Yilin He · Chaojie Wang · Hao Zhang · Bo Chen · Mingyuan Zhou -
2022 Poster: Policy Optimization for Markov Games: Unified Framework and Faster Convergence »
Runyu Zhang · Qinghua Liu · Huan Wang · Caiming Xiong · Na Li · Yu Bai -
2022 Poster: Ensemble of Averages: Improving Model Selection and Boosting Performance in Domain Generalization »
Devansh Arpit · Huan Wang · Yingbo Zhou · Caiming Xiong -
2022 Poster: A Unified Framework for Alternating Offline Model Training and Policy Learning »
Shentao Yang · Shujian Zhang · Yihao Feng · Mingyuan Zhou -
2022 Poster: CARD: Classification and Regression Diffusion Models »
Xizewen Han · Huangjie Zheng · Mingyuan Zhou -
2021 Poster: Exploiting Chain Rule and Bayes' Theorem to Compare Probability Distributions »
Huangjie Zheng · Mingyuan Zhou -
2021 Poster: Alignment Attention by Matching Key and Query Distributions »
Shujian Zhang · Xinjie Fan · Huangjie Zheng · Korawat Tanwisuth · Mingyuan Zhou -
2021 Poster: Probabilistic Margins for Instance Reweighting in Adversarial Training »
qizhou wang · Feng Liu · Bo Han · Tongliang Liu · Chen Gong · Gang Niu · Mingyuan Zhou · Masashi Sugiyama -
2021 Poster: Convex Polytope Trees »
Mohammadreza Armandpour · Ali Sadeghian · Mingyuan Zhou -
2021 Poster: TopicNet: Semantic Graph-Guided Topic Discovery »
Zhibin Duan · Yishi Xu · Bo Chen · Dongsheng Wang · Chaojie Wang · Mingyuan Zhou -
2021 Poster: A Prototype-Oriented Framework for Unsupervised Domain Adaptation »
Korawat Tanwisuth · Xinjie Fan · Huangjie Zheng · Shujian Zhang · Hao Zhang · Bo Chen · Mingyuan Zhou -
2021 Poster: CARMS: Categorical-Antithetic-REINFORCE Multi-Sample Gradient Estimator »
Alek Dimitriev · Mingyuan Zhou -
2020 Poster: Bidirectional Convolutional Poisson Gamma Dynamical Systems »
wenchao chen · Chaojie Wang · Bo Chen · Yicheng Liu · Hao Zhang · Mingyuan Zhou -
2020 Poster: Implicit Distributional Reinforcement Learning »
Yuguang Yue · Zhendong Wang · Mingyuan Zhou -
2020 Poster: Deep Relational Topic Modeling via Graph Poisson Gamma Belief Network »
Chaojie Wang · Hao Zhang · Bo Chen · Dongsheng Wang · Zhengjue Wang · Mingyuan Zhou -
2020 Poster: Bayesian Attention Modules »
Xinjie Fan · Shujian Zhang · Bo Chen · Mingyuan Zhou -
2020 Poster: Off-Policy Interval Estimation with Lipschitz Value Iteration »
Ziyang Tang · Yihao Feng · Na Zhang · Jian Peng · Qiang Liu -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall -
2019 : Poster Spotlight 2 »
Aaron Sidford · Mengdi Wang · Lin Yang · Yinyu Ye · Zuyue Fu · Zhuoran Yang · Yongxin Chen · Zhaoran Wang · Ofir Nachum · Bo Dai · Ilya Kostrikov · Dale Schuurmans · Ziyang Tang · Yihao Feng · Lihong Li · Denny Zhou · Qiang Liu · Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Simon Du · Sham Kakade · Ruosong Wang · Minshuo Chen · Tianyi Liu · Xingguo Li · Zhaoran Wang · Tuo Zhao · Philip Amortila · Doina Precup · Prakash Panangaden · Marc Bellemare -
2019 Poster: Variational Graph Recurrent Neural Networks »
Ehsan Hajiramezanali · Arman Hasanzadeh · Krishna Narayanan · Nick Duffield · Mingyuan Zhou · Xiaoning Qian -
2019 Poster: A Kernel Loss for Solving the Bellman Equation »
Yihao Feng · Lihong Li · Qiang Liu -
2019 Poster: Semi-Implicit Graph Variational Auto-Encoders »
Arman Hasanzadeh · Ehsan Hajiramezanali · Krishna Narayanan · Nick Duffield · Mingyuan Zhou · Xiaoning Qian -
2019 Poster: Poisson-Randomized Gamma Dynamical Systems »
Aaron Schein · Scott Linderman · Mingyuan Zhou · David Blei · Hanna Wallach -
2018 Poster: Nonparametric Bayesian Lomax delegate racing for survival analysis with competing risks »
Quan Zhang · Mingyuan Zhou -
2018 Poster: Deep Poisson gamma dynamical systems »
Dandan Guo · Bo Chen · Hao Zhang · Mingyuan Zhou -
2018 Poster: Dirichlet belief networks for topic structure learning »
He Zhao · Lan Du · Wray Buntine · Mingyuan Zhou -
2018 Poster: Parsimonious Bayesian deep networks »
Mingyuan Zhou -
2018 Poster: Masking: A New Perspective of Noisy Supervision »
Bo Han · Jiangchao Yao · Gang Niu · Mingyuan Zhou · Ivor Tsang · Ya Zhang · Masashi Sugiyama -
2018 Poster: Bayesian multi-domain learning for cancer subtype discovery from next-generation sequencing count data »
Ehsan Hajiramezanali · Siamak Zamani Dadaneh · Alireza Karbalayghareh · Mingyuan Zhou · Xiaoning Qian -
2016 Poster: Poisson-Gamma dynamical systems »
Aaron Schein · Hanna Wallach · Mingyuan Zhou -
2016 Oral: Poisson-Gamma dynamical systems »
Aaron Schein · Hanna Wallach · Mingyuan Zhou -
2015 Poster: The Poisson Gamma Belief Network »
Mingyuan Zhou · Yulai Cong · Bo Chen -
2014 Poster: Beta-Negative Binomial Process and Exchangeable Random Partitions for Mixed-Membership Modeling »
Mingyuan Zhou -
2012 Poster: Augment-and-Conquer Negative Binomial Processes »
Mingyuan Zhou · Lawrence Carin -
2012 Spotlight: Augment-and-Conquer Negative Binomial Processes »
Mingyuan Zhou · Lawrence Carin -
2009 Poster: Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations »
Mingyuan Zhou · Haojun Chen · John Paisley · Lu Ren · Guillermo Sapiro · Lawrence Carin -
2009 Oral: Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations »
Mingyuan Zhou · Haojun Chen · John Paisley · Lu Ren · Guillermo Sapiro · Larry Carin