Poster: Exploring Embodied Emotion Through A Large-Scale Egocentric Video Dataset
Wang Lin · Yueying Feng · WenKang Han · Tao Jin · Zhou Zhao · Fei Wu · Chang Yao · Jingyuan Chen
West Ballroom A-D #6602
Abstract:
Understanding human emotions is fundamental to enhancing human-computer interaction, especially for embodied agents that mimic human behavior. Traditional emotion analysis often takes a third-person perspective, limiting agents' ability to interact naturally and empathetically. To address this gap, this paper presents the first massive first-person-view video dataset for exploring embodied emotion. The dataset spans many hours of video, capturing different emotion types in diverse scenarios and languages. It features videos recorded by individuals in their daily lives, covering a wide range of real-world emotions conveyed through visual, acoustic, and textual modalities. By leveraging this dataset, we define four core benchmark tasks - emotion recognition, emotion classification, emotion localization, and emotion reasoning - supported by thousands of manually crafted annotations, providing a comprehensive resource for training and evaluating emotion analysis models. We further present Emotion-LlaMa, which complements the visual modality with the acoustic modality to enhance the understanding of emotion in first-person videos. Comparison experiments against a large number of baselines demonstrate the superiority of Emotion-LlaMa and set a new benchmark for embodied emotion analysis. We expect the dataset to promote advances in multimodal understanding, robotics, and augmented reality, and to provide a solid foundation for the development of more empathetic and context-aware embodied agents.
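For readers unfamiliar with this kind of setup, the sketch below illustrates one common way an acoustic modality can complement a visual one in an LLM-based emotion model: features from each modality are projected into the language model's embedding space and concatenated as a multimodal prefix. The module name, feature dimensions, and fusion scheme are illustrative assumptions only and are not claimed to match the paper's Emotion-LlaMa.

```python
# Minimal, hypothetical sketch of audio-visual fusion for an LLM-based
# emotion model. All names and dimensions are illustrative assumptions,
# not the paper's actual Emotion-LlaMa design.
import torch
import torch.nn as nn


class AudioVisualPrefix(nn.Module):
    """Project per-modality clip features into the LLM embedding space
    and concatenate them as a multimodal prefix."""

    def __init__(self, vis_dim=768, aud_dim=512, llm_dim=4096):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, llm_dim)  # visual features -> token space
        self.aud_proj = nn.Linear(aud_dim, llm_dim)  # acoustic features -> token space

    def forward(self, vis_feats, aud_feats):
        # vis_feats: (B, Tv, vis_dim) frame-level visual features
        # aud_feats: (B, Ta, aud_dim) segment-level acoustic features
        vis_tokens = self.vis_proj(vis_feats)
        aud_tokens = self.aud_proj(aud_feats)
        # Concatenate along the token dimension to form the prefix fed to the LLM.
        return torch.cat([vis_tokens, aud_tokens], dim=1)  # (B, Tv + Ta, llm_dim)


if __name__ == "__main__":
    fusion = AudioVisualPrefix()
    vis = torch.randn(2, 16, 768)  # e.g., 16 sampled frames per clip
    aud = torch.randn(2, 8, 512)   # e.g., 8 acoustic segments per clip
    prefix = fusion(vis, aud)
    print(prefix.shape)            # torch.Size([2, 24, 4096])
```

In practice, the resulting prefix tokens would be prepended to the text-token embeddings of an instruction prompt before being passed to the language model; the point of the sketch is simply that acoustic information enters the model alongside, rather than instead of, the visual stream.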