A fundamental question in neuroscience is how the brain creates an internal model of the world to guide actions using sequences of ambiguous sensory information. This is naturally formulated as a reinforcement learning problem under partial observations, where an agent must estimate relevant latent variables in the world from its evidence, anticipate possible future states, and choose actions that optimize total expected reward. This problem can be solved by control theory, which allows us to find the optimal actions for a given system dynamics and objective function. However, animals often appear to behave suboptimally. Why? We hypothesize that animals have their own flawed internal model of the world, and choose actions with the highest expected subjective reward according to that flawed model. We describe this behavior as rational but not optimal. The problem of Inverse Rational Control (IRC) aims to identify which internal model would best explain an agent's actions. Our contribution here generalizes past work on Inverse Rational Control which solved this problem for discrete control in partially observable Markov decision processes. Here we accommodate continuous nonlinear dynamics and continuous actions, and impute sensory observations corrupted by unknown noise that is private to the animal. We first build an optimal Bayesian agent that learns an optimal policy generalized over the entire model space of dynamics and subjective rewards using deep reinforcement learning. Crucially, this allows us to compute a likelihood over models for experimentally observable action trajectories acquired from a suboptimal agent. We then find the model parameters that maximize the likelihood using gradient ascent. Our method successfully recovers the true model of rational agents. This approach provides a foundation for interpreting the behavioral and neural dynamics of animal brains during complex tasks.
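The last step of the abstract, recovering model parameters by gradient ascent on the likelihood of observed actions, can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a hypothetical one-parameter agent whose action is `-theta * x` plus Gaussian noise, so a closed-form policy stands in for the deep reinforcement learning policy and the belief dynamics described above. The names `simulate_actions`, `log_likelihood_grad`, and `fit_theta`, along with all numeric settings, are illustrative assumptions.

```python
import random


def simulate_actions(theta, states, noise_std, rng):
    """Hypothetical agent: action is -theta * state plus Gaussian noise."""
    return [-theta * x + rng.gauss(0.0, noise_std) for x in states]


def log_likelihood_grad(theta, states, actions, noise_std):
    """Gradient of the mean Gaussian log-likelihood of the actions w.r.t. theta."""
    total = 0.0
    for x, a in zip(states, actions):
        residual = a + theta * x  # zero-mean noise if theta is the true parameter
        total += -residual * x
    return total / (len(states) * noise_std ** 2)


def fit_theta(states, actions, noise_std, theta0=0.0, lr=0.01, steps=500):
    """Plain gradient ascent on the log-likelihood (assumed step size and count)."""
    theta = theta0
    for _ in range(steps):
        theta += lr * log_likelihood_grad(theta, states, actions, noise_std)
    return theta


rng = random.Random(0)  # fixed seed so the sketch is reproducible
states = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
true_theta = 0.7  # the agent's "internal model" parameter to be recovered
actions = simulate_actions(true_theta, states, noise_std=0.1, rng=rng)
theta_hat = fit_theta(states, actions, noise_std=0.1)
```

In this toy setting `theta_hat` should land close to `true_theta`. The method described in the abstract differs in that the likelihood is computed through a learned policy and imputed private observations, neither of which is modeled here.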
Author Information
Minhae Kwon (Soongsil University)
Saurabh Daptardar (Google)
Paul R Schrater (University of Minnesota)
Xaq Pitkow (BCM/Rice)
More from the Same Authors
-
2021 : Stability Analysis in Mixed-Autonomous Traffic with Deep Reinforcement Learning »
Dongsu Lee · Minhae Kwon -
2022 : Attention as inference with third-order interactions »
Yicheng Fei · Xaq Pitkow -
2022 : Learning Dynamics and Structure of Complex Systems Using Graph Neural Networks »
Zhe Li · Andreas Tolias · Xaq Pitkow -
2022 : Information-theoretic Neural Decoding Reproduces Several Laws of Human Behavior »
S. Thomas Christie · Paul R Schrater -
2022 Poster: Phase transitions in when feedback is useful »
Lokesh Boominathan · Xaq Pitkow -
2019 Poster: Learning from brains how to regularize machines »
Zhe Li · Wieland Brendel · Edgar Walker · Erick Cobos · Taliah Muhammad · Jacob Reimer · Matthias Bethge · Fabian Sinz · Xaq Pitkow · Andreas Tolias -
2018 : Poster Session 1 + Coffee »
Tom Van de Wiele · Rui Zhao · J. Fernando Hernandez-Garcia · Fabio Pardo · Xian Yeow Lee · Xiaolin Andy Li · Marcin Andrychowicz · Jie Tang · Suraj Nair · Juhyeon Lee · Cédric Colas · S. M. Ali Eslami · Yen-Chen Wu · Stephen McAleer · Ryan Julian · Yang Xue · Matthia Sabatelli · Pranav Shyam · Alexandros Kalousis · Giovanni Montana · Emanuele Pesce · Felix Leibfried · Zhanpeng He · Chunxiao Liu · Yanjun Li · Yoshihide Sawada · Alexander Pashevich · Tejas Kulkarni · Keiran Paster · Luca Rigazio · Quan Vuong · Hyunggon Park · Minhae Kwon · Rivindu Weerasekera · Shamane Siriwardhanaa · Rui Wang · Ozsel Kilinc · Keith Ross · Yizhou Wang · Simon Schmitt · Thomas Anthony · Evan Cater · Forest Agostinelli · Tegg Sung · Shirou Maruyama · Alexander Shmakov · Devin Schwab · Mohammad Firouzi · Glen Berseth · Denis Osipychev · Jesse Farebrother · Jianlan Luo · William Agnew · Peter Vrancx · Jonathan Heek · Catalin Ionescu · Haiyan Yin · Megumi Miyashita · Nathan Jay · Noga H. Rotman · Sam Leroux · Shaileshh Bojja Venkatakrishnan · Henri Schmidt · Jack Terwilliger · Ishan Durugkar · Jonathan Sauder · David Kas · Arash Tavakoli · Alain-Sam Cohen · Philip Bontrager · Adam Lerer · Thomas Paine · Ahmed Khalifa · Ruben Rodriguez · Avi Singh · Yiming Zhang -
2018 Poster: Stimulus domain transfer in recurrent models for large scale cortical population prediction on video »
Fabian Sinz · Alexander Ecker · Paul Fahey · Edgar Walker · Erick M Cobos · Emmanouil Froudarakis · Dimitri Yatsenko · Xaq Pitkow · Jacob Reimer · Andreas Tolias -
2016 : Paul Schrater (University of Minnesota) »
Paul R Schrater -
2016 Poster: Inference by Reparameterization in Neural Population Codes »
Rajkumar Vasudeva Raju · Xaq Pitkow -
2014 Poster: Magnitude-sensitive preference formation »
Nisheeth Srivastava · Ed Vul · Paul R Schrater -
2012 Poster: Rational inference of relative preferences »
Nisheeth Srivastava · Paul R Schrater -
2008 Poster: Structure Learning in Human Sequential Decision-Making »
Daniel Acuna · Paul R Schrater -
2006 Poster: Theory and Dynamics of Perceptual Bistability »
Paul R Schrater · Rashmi Sundareswara