50 Results
Poster | Wed 9:00 | On the Safety of Interpretable Machine Learning: A Maximum Deviation Approach | Dennis Wei · Rahul Nair · Amit Dhurandhar · Kush Varshney · Elizabeth Daly · Moninder Singh
Poster | Wed 14:00 | Shield Decentralization for Safe Multi-Agent Reinforcement Learning | Daniel Melcer · Christopher Amato · Stavros Tripakis
Poster | Tue 9:00 | Neural Abstractions | Alessandro Abate · Alec Edwards · Mirco Giacobbe
Poster | Thu 9:00 | Parametrically Retargetable Decision-Makers Tend To Seek Power | Alex Turner · Prasad Tadepalli
Poster | Wed 9:00 | Capturing Failures of Large Language Models via Human Cognitive Biases | Erik Jones · Jacob Steinhardt
Poster | | MExMI: Pool-based Active Model Extraction Crossover Membership Inference | Yaxin Xiao · Qingqing Ye · Haibo Hu · Huadi Zheng · Chengfang Fang · Jie Shi
Poster | Thu 9:00 | Defining and Characterizing Reward Gaming | Joar Skalse · Nikolaus Howe · Dmitrii Krasheninnikov · David Krueger
Poster | Tue 14:00 | Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits | Ruibo Liu · Chenyan Jia · Ge Zhang · Ziyu Zhuang · Tony Liu · Soroush Vosoughi
Poster | Tue 9:00 | Safety Guarantees for Neural Network Dynamic Systems via Stochastic Barrier Functions | Rayan Mazouz · Karan Muvvala · Akash Ratheesh Babu · Luca Laurenti · Morteza Lahijanian
Poster | Thu 9:00 | Towards Safe Reinforcement Learning with a Safety Editor Policy | Haonan Yu · Wei Xu · Haichao Zhang
Poster | Thu 14:00 | Active Learning with Safety Constraints | Romain Camilleri · Andrew Wagenmaker · Jamie Morgenstern · Lalit Jain · Kevin Jamieson
Poster | Wed 9:00 | Enhancing Safe Exploration Using Safety State Augmentation | Aivar Sootla · Alexander Cowen-Rivers · Jun Wang · Haitham Bou Ammar