7th International Workshop on Large Scale Holistic Video Understanding: Toward Video Foundation Models
Abstract
This workshop aims to advance the field of video understanding by fostering discussions around holistic and generalist video foundation models. Building upon the Holistic Video Understanding (HVU) initiative and dataset introduced in 2019, we have successfully organized eight HVU workshops and tutorials at top-tier venues such as CVPR and ICCV, uniting researchers, practitioners, and students from around the world. These efforts have played a central role in moving the community beyond narrow action recognition tasks toward multi-faceted, semantic, and generalist video understanding.

With the emergence of large-scale foundation models and video large language models (Video-LLMs), the landscape of video understanding is rapidly evolving. These models enable unified reasoning across spatial, temporal, and multimodal dimensions, yet introduce new challenges in scalability, efficiency, interpretability, and responsible deployment.

The HVU Workshop 2025 will provide a platform to explore these frontiers, discussing topics such as multimodal representation learning, long-context reasoning, evaluation of general-purpose video systems, efficient adaptation and scaling laws, and the ethical and societal implications of video AI. Our goal is to bring together a diverse and inclusive community to define the next chapter of holistic, generalist, and responsible video understanding.