
DriveCLIP: Zero-shot transfer for distracted driving activity understanding using CLIP
Md Zahid Hasan · Ameya Joshi · Mohammed Shaiqur Rahman · Venkatachalapathy Archana · Anuj Sharma · Chinmay Hegde · Soumik Sarkar

Recognizing distracted driving actions in naturalistic driving is crucial for the safety of both drivers and pedestrians. However, traditional computer vision techniques often require heavy supervision in the form of large amounts of annotated training data to detect distracted driving activities. Recently, vision-language models have offered large-scale visual-textual pretraining that can be adapted to task-specific learning, such as distracted activity recognition, without additional supervision. Contrastive image-text pretraining models such as CLIP have shown significant promise in learning natural language-guided visual representations. In this paper, we propose a CLIP-based driver activity recognition framework that predicts whether a driver is distracted while driving. CLIP's vision embedding enables zero-shot transfer, identifying a driver's distracted activities directly from driving videos. Our results suggest that this framework achieves state-of-the-art zero-shot transfer performance for predicting the driver's state on three public datasets. We also developed DriveCLIP, a classifier built on top of CLIP's visual representations for distracted driving detection, and report its results.
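The zero-shot transfer step the abstract describes can be illustrated with a minimal sketch: CLIP scores a video frame against natural-language prompts describing driver states, and the highest-scoring prompt gives the prediction. The sketch below uses NumPy with random placeholder embeddings standing in for CLIP's image and text encoders (no model is loaded); the prompt strings are illustrative assumptions, not the paper's actual prompt set.

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs):
    """Score one image embedding against a set of text-prompt embeddings,
    mirroring CLIP's zero-shot classification recipe."""
    # Normalize to unit length, as CLIP does before computing similarity.
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    # Cosine similarity per prompt, temperature-scaled, then softmax.
    logits = 100.0 * txt @ img
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

# Hypothetical driver-state prompts (placeholder 512-d embeddings below
# stand in for the outputs of CLIP's frozen text and image encoders).
prompts = ["a photo of a driver safely driving",
           "a photo of a driver texting on a phone",
           "a photo of a driver drinking a beverage"]
rng = np.random.default_rng(0)
text_embs = rng.normal(size=(3, 512))
# Simulate a frame whose embedding is close to the "texting" prompt.
image_emb = text_embs[1] + 0.1 * rng.normal(size=512)

probs = zero_shot_classify(image_emb, text_embs)
print(prompts[int(np.argmax(probs))])
```

Because no class labels are trained, swapping in a different set of prompts changes the label space without retraining, which is what makes the zero-shot setting attractive when annotated driving data is scarce.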

Author Information

Md Zahid Hasan (Iowa State University)
Ameya Joshi (New York University)
Mohammed Shaiqur Rahman (Iowa State University)
Venkatachalapathy Archana (Iowa State University)
Anuj Sharma (Iowa State University)
Chinmay Hegde (New York University)
Soumik Sarkar (Iowa State University)