Understanding multimodal perception for embodied AI is an open problem because such inputs may contain both highly complementary and redundant information for the task. A relevant direction for multimodal policies is understanding the global trends of each modality at the fusion layer. To this end, we disentangle the attributions of the visual, language, and previous-action inputs across different policies trained on the ALFRED dataset. Attribution analysis can be used to rank and group failure scenarios, investigate modeling and dataset biases, and critically analyze multimodal EAI policies for robustness and user trust before deployment. We present MAFEA, a framework for computing global per-modality attributions for any differentiable policy. In addition, we show how attributions enable lower-level behavior analysis in EAI policies through two case studies on language and visual attributions.
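As an illustration only (not the paper's implementation), the sketch below computes a simple input-times-gradient attribution mass per modality for a differentiable policy. It assumes a PyTorch policy that takes continuous vision, language, and previous-action feature tensors and returns action logits; the names policy, vision, language, and prev_action are hypothetical placeholders.

    # Minimal sketch, assuming a differentiable PyTorch policy over continuous
    # per-modality feature tensors (e.g., embeddings). Not the authors' code.
    import torch

    def modality_attributions(policy, vision, language, prev_action):
        """Return |input x gradient| attribution mass per modality for one step."""
        inputs = {"vision": vision, "language": language, "prev_action": prev_action}
        for x in inputs.values():
            x.requires_grad_(True)          # track gradients w.r.t. each modality

        logits = policy(**inputs)           # hypothetical policy returning action logits
        score = logits.max()                # attribute the policy's top-scoring action
        score.backward()                    # gradients flow back to each modality input

        # Aggregate attribution magnitude per modality (global trend at the fusion input).
        return {name: (x * x.grad).abs().sum().item() for name, x in inputs.items()}

Averaging these per-step masses over a dataset gives one coarse view of how strongly each modality drives the policy's decisions; the paper's framework is more general, but this conveys the basic per-modality attribution idea.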
Author Information
Vidhi Jain (Carnegie Mellon University)
Vidhi Jain is a Robotics Ph.D. student advised by Yonatan Bisk (CMU, LTI). She is interested in learning multimodal policies for embodied AI and robots performing complex everyday tasks.
Jayant Sravan Tamarapalli (Carnegie Mellon University)
Sahiti Yerramilli (Carnegie Mellon University)
Yonatan Bisk (Carnegie Mellon University)
Related Events (a corresponding poster, oral, or spotlight)
- 2022: MAFEA: Multimodal Attribution Framework for Embodied AI
More from the Same Authors
- 2022: MAFEA: Multimodal Attribution Framework for Embodied AI
  Vidhi Jain · Jayant Sravan Tamarapalli · Sahiti Yerramilli · Yonatan Bisk
- 2020: Spotlight Talk: Jain
  Vidhi Jain