Invited Talk 1: VideoLLMs are Lost in Time
Cees Snoek
Abstract
Despite recent advances in video-language foundation models, we find that they still lack a fundamental capability: understanding time. Even simple temporal relations such as before and after remain elusive. Meanwhile, widely used video benchmarks often fail to truly test temporal reasoning, allowing models to perform well by exploiting static frames, linguistic shortcuts, or prior world knowledge. In this talk, I will reveal why current VideoLLMs are “lost in time,” introduce a new benchmark designed to properly evaluate temporal understanding, and highlight surprising insights into how far the field still has to go.