

Poster in Workshop: Machine Learning for Autonomous Driving

Real-time Generalized Sensor Fusion with Transformers

Aayush Ahuja


Abstract:

3D Multi-Object Tracking (MOT) is essential for the safe deployment of self-driving vehicles. While major progress has been made in 3D object detection and machine-learned tracking approaches, real-time 3D MOT remains a challenging problem in dense urban scenes. Commercial deployment of self-driving cars requires high recall and redundancy, often achieved by using multiple sensor modalities. While existing approaches have been shown to work well in a fixed input-modality setting, it is generally hard to reconfigure the tracking pipeline for optimal performance when the input sources change. In this paper, we propose a generalized learnable framework for multi-modal data association that leverages Transformers. Our method encodes tracks and observations as embeddings using joint attention to capture spatio-temporal context. From these embeddings, pairwise similarity scores can be computed between tracks and observations, which are then used to classify track-observation association proposals. We experimentally demonstrate that our data-driven approach outperforms heuristics-based solutions on our in-house large-scale dataset and show that it generalizes to different combinations of input modalities without any modality-specific hand-tuning. Our approach also runs in real time even with a large number of inputs.
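To make the described pipeline concrete, below is a minimal sketch (not the authors' implementation) of Transformer-based track-observation association. It assumes PyTorch; the class name, feature dimensions, and architecture hyperparameters are all illustrative assumptions. Tracks and observations are projected into a shared embedding space, jointly attended in a single Transformer encoder, and scored pairwise to produce association logits.

```python
# A hypothetical sketch of joint-attention data association, not the paper's code.
import torch
import torch.nn as nn

class AssociationTransformer(nn.Module):
    def __init__(self, feat_dim: int = 64, d_model: int = 128,
                 n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        # Separate input projections for tracks and observations
        # (feature dimension is an assumption).
        self.track_proj = nn.Linear(feat_dim, d_model)
        self.obs_proj = nn.Linear(feat_dim, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        # Joint attention: tracks and observations attend to each other
        # within one concatenated sequence, capturing shared context.
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)

    def forward(self, tracks: torch.Tensor, observations: torch.Tensor) -> torch.Tensor:
        # tracks: (B, T, feat_dim), observations: (B, O, feat_dim)
        t = self.track_proj(tracks)
        o = self.obs_proj(observations)
        joint = self.encoder(torch.cat([t, o], dim=1))        # (B, T+O, d_model)
        t_emb, o_emb = joint[:, : t.size(1)], joint[:, t.size(1):]
        # Pairwise similarity logits between every track and every
        # observation; these score association proposals.
        return torch.einsum("btd,bod->bto", t_emb, o_emb)     # (B, T, O)

# Toy usage: 5 tracks vs. 7 observations in one scene.
model = AssociationTransformer()
logits = model(torch.randn(1, 5, 64), torch.randn(1, 7, 64))
probs = torch.sigmoid(logits)  # per-pair association probability
print(probs.shape)  # torch.Size([1, 5, 7])
```

In such a formulation, the pairwise logits could be trained with a binary classification loss on ground-truth track-observation matches, and new input modalities would only require additional observation encoders rather than retuned heuristics.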
