100 Results
| Type | Time | Title | Authors |
|---|---|---|---|
| Workshop | | Exploiting Contextual Structure to Generate Useful Auxiliary Tasks | Benedict Quartey · Ankit Shah · George Konidaris |
| Poster | Thu 8:45 | State-Action Similarity-Based Representations for Off-Policy Evaluation | Brahma Pavse · Josiah Hanna |
| Poster | Tue 15:15 | Uncertainty-Aware Instance Reweighting for Off-Policy Learning | Xiaoying Zhang · Junpu Chen · Hongning Wang · Hong Xie · Yang Liu · John C.S. Lui · Hang Li |
| Poster | Wed 8:45 | Off-Policy Evaluation for Human Feedback | Qitong Gao · Ge Gao · Juncheng Dong · Vahid Tarokh · Min Chi · Miroslav Pajic |
| Poster | Tue 8:45 | Global Optimality and Finite Sample Analysis of Softmax Off-Policy Actor Critic under State Distribution Mismatch | Shangtong Zhang · Remi Tachet des Combes · Romain Laroche |
| Poster | Tue 8:45 | Reliable Off-Policy Learning for Dosage Combinations | Jonas Schweisthal · Dennis Frauen · Valentyn Melnychuk · Stefan Feuerriegel |
| Poster | Wed 8:45 | Future-Dependent Value-Based Off-Policy Evaluation in POMDPs | Masatoshi Uehara · Haruka Kiyohara · Andrew Bennett · Victor Chernozhukov · Nan Jiang · Nathan Kallus · Chengchun Shi · Wen Sun |
| Workshop | | Learning Models and Evaluating Policies with Offline Off-Policy Data under Partial Observability | Shreyas Chaudhari · Philip Thomas · Bruno C. da Silva |
| Poster | Thu 8:45 | Marginal Density Ratio for Off-Policy Evaluation in Contextual Bandits | Muhammad Faaiz Taufiq · Arnaud Doucet · Rob Cornish · Jean-Francois Ton |
| Poster | Thu 8:45 | Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective | Zeyu Zhang · Yi Su · Hui Yuan · Yiran Wu · Rishab Balasubramanian · Qingyun Wu · Huazheng Wang · Mengdi Wang |
| Workshop | | Chain-of-Thought Reasoning is a Policy Improvement Operator | Hugh Zhang · David Parkes |
| Poster | Wed 8:45 | f-Policy Gradients: A General Framework for Goal-Conditioned RL using f-Divergences | Siddhant Agarwal · Ishan Durugkar · Peter Stone · Amy Zhang |