

Poster

Memorize What Matters: Emergent Scene Decomposition from Multitraverse

Yiming Li · Zehong Wang · Yue Wang · Zhiding Yu · Zan Gojcic · Marco Pavone · Chen Feng · Jose M. Alvarez


Abstract:

Humans naturally retain memories of permanent elements, while ephemeral moments often slip through the cracks of memory. This selective retention is crucial for perception, localization, and mapping. To endow robots with this capability, we introduce 3D Gaussian Mapping (3DGM), a self-supervised, camera-only offline mapping framework grounded in 3D Gaussian Splatting. 3DGM converts multitraverse RGB videos from the same region into a Gaussian-based environmental map, while concurrently performing 2D ephemeral object segmentation. Such environment-object decomposition exploits self-supervision from repeated traversals: the consensus and dissensus in images correspond, respectively, to the permanent environment and ephemeral objects in 3D. This is because the environment remains consistent across traversals, while objects frequently change. Specifically, 3DGM formulates multitraverse environmental mapping as a robust differentiable rendering problem, treating pixels of the environment and objects as inliers and outliers, respectively. Using robust feature distillation, feature residuals mining, and robust optimization, 3DGM jointly performs 3D mapping and 2D segmentation without human intervention. Moreover, we curate the Mapverse benchmark, sourced from the Ithaca365 and nuPlan datasets, to evaluate our method in unsupervised 2D segmentation, 3D reconstruction, and neural rendering. Extensive results verify our method's effectiveness and potential for self-driving applications, such as autolabeling, camera-only 3D reconstruction and neural simulation.
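The inlier/outlier idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the cosine-distance residual, the fixed `threshold` value, and the function names `ephemeral_mask` and `robust_photometric_loss` are all assumptions chosen for illustration. It only shows how pixels with large feature residuals between a rendered map and an observed image could be flagged as ephemeral and excluded from a rendering loss.

```python
import numpy as np


def ephemeral_mask(rendered_feat, observed_feat, threshold=0.5):
    """Flag pixels whose feature residual is large (hypothetical helper).

    rendered_feat, observed_feat: arrays of shape (H, W, C).
    threshold: assumed cutoff on cosine distance, which lies in [0, 2].
    Returns a boolean (H, W) outlier mask and the per-pixel residual.
    """
    dot = (rendered_feat * observed_feat).sum(axis=-1)
    norms = (np.linalg.norm(rendered_feat, axis=-1)
             * np.linalg.norm(observed_feat, axis=-1)) + 1e-8
    residual = 1.0 - dot / norms  # cosine distance per pixel
    return residual > threshold, residual


def robust_photometric_loss(rendered_rgb, observed_rgb, outlier_mask):
    """Mean L1 colour error over inlier pixels only (hypothetical helper)."""
    inlier = ~outlier_mask
    per_pixel = np.abs(rendered_rgb - observed_rgb).mean(axis=-1)
    return (per_pixel * inlier).sum() / max(int(inlier.sum()), 1)


# Toy data: one pixel differs between the rendered map and the observation,
# standing in for an object that appears in a single traversal.
rendered_feat = np.ones((4, 4, 3))
observed_feat = rendered_feat.copy()
observed_feat[1, 2] = -1.0

rendered_rgb = np.zeros((4, 4, 3))
observed_rgb = rendered_rgb.copy()
observed_rgb[1, 2] = 1.0

mask, residual = ephemeral_mask(rendered_feat, observed_feat)
loss_robust = robust_photometric_loss(rendered_rgb, observed_rgb, mask)
loss_naive = robust_photometric_loss(
    rendered_rgb, observed_rgb, np.zeros_like(mask)
)
```

In this toy case the mismatched pixel is the only one masked, so the robust loss ignores it while the unmasked loss does not. A real system would presumably compute residuals on learned features and optimize the map with a differentiable renderer rather than on fixed arrays.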
