Expo Demonstration
Multimodal AI Forensic Search for Video Surveillance
Ron Tindall
Upper Level Room 29A-D
Video surveillance often requires searching for specific targets from long-duration videos using multiple cameras. Traditional tracking‑and‑detection pipelines demand heavy manual filtering, and even recent multimodal approaches such as using CLIP remain limited to shallow visual attributes (e.g., clothing color) and weak temporal reasoning. This makes forensic search labor‑intensive. x000D
x000D
We present ForeSea, a novel AI forensic search system that supports rich multimodal queries (text + image) and returns timestamped evidence of key events. ForeSea is organized as a multi‑stage pipeline that couples tracking and retrieval with time‑aware VideoLLM reasoning: (1) uses tracking model to filter out irrelevant segments (e.g., frames without people) and produces person‑centric clips; (2) retrieval constructs an index over tracked clips to form a searchable database; and (3) during inference, the multimodal query is embedded to retrieve the top N candidate clips, which are then fed into a time-aware VideoLMM that performs temporal grounding and generates precise answers from concise input. Through ForeSea's multi-stage pipeline, we can search for targets using both image and text queries (e.g., asking 'When does this person get involved in a fight?' with an image of the person). This approach eliminates the need for detailed textual descriptions and enables effective temporal understanding across long videos. x000D
x000D
To evaluate LMM based forensic search, we introduce AI Forensic‑QA, a benchmark for multimodal video question answering with temporal grounding. On this benchmark, ForeSea achieves an 8.6 % accuracy improvement and a 6.9 (IoU) gain over strong baselines. To the best of our knowledge, this is the first benchmark in this domain to support multimodal queries evaluation. Our live demo showcases multimodal search, timestamped evidence visualization, and side‑by‑side comparisons with SOTA models.
Live content is unavailable. Log in and register to view live content