Poster
ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction
Gengshan Yang · Deqing Sun · Varun Jampani · Daniel Vlasic · Forrester Cole · Ce Liu · Deva Ramanan

Thu Dec 09 08:30 AM -- 10:00 AM (PST)

We introduce ViSER, a method for recovering articulated 3D shapes and dense 3D trajectories from monocular videos. Previous work on high-quality reconstruction of dynamic 3D shapes typically relies on multiple camera views, strong category-specific priors, or 2D keypoint supervision. We show that none of these are required if one can reliably estimate long-range correspondences in a video, making use of only 2D object masks and two-frame optical flow as inputs. ViSER infers correspondences by matching 2D pixels to a canonical, deformable 3D mesh via video-specific surface embeddings that capture the pixel appearance of each surface point. These embeddings behave as a continuous set of keypoint descriptors defined over the mesh surface, which can be used to establish dense long-range correspondences across pixels. The surface embeddings are implemented as coordinate-based MLPs that are fit to each video via consistency and contrastive reconstruction losses. Experimental results show that ViSER compares favorably against prior work on challenging videos of humans with loose clothing and unusual poses, as well as animal videos from DAVIS and YTVOS. Our code is available at viser-shape.github.io.
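
To make the pixel-to-surface matching idea concrete, here is a minimal sketch and not the authors' implementation. It assumes each mesh vertex and each pixel already has an embedding vector, and it matches them with a temperature-scaled softmax over cosine similarities. The function names (`cosine`, `soft_match`), the temperature value, and the toy data are all hypothetical; in the method described above the embeddings come from MLPs fit to each video.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def soft_match(pixel_emb, vertex_embs, vertex_xyz, temperature=0.1):
    """Softly assign one pixel embedding to mesh vertices.

    Returns the softmax weights over vertices and the weighted
    average of the vertex positions (an expected 3D surface point).
    The temperature is an illustrative assumption.
    """
    logits = [cosine(pixel_emb, v) / temperature for v in vertex_embs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    point = tuple(
        sum(w * p[k] for w, p in zip(weights, vertex_xyz)) for k in range(3)
    )
    return weights, point

# Toy data: three vertices with 2D embeddings and 3D positions.
vertex_embs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
vertex_xyz = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

weights, point = soft_match([0.9, 0.1], vertex_embs, vertex_xyz)
best = max(range(len(weights)), key=weights.__getitem__)
```

Applying the same matching to pixels in different frames would map them to the same canonical surface, which is the sense in which the embeddings act as dense keypoint descriptors.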

Author Information

Gengshan Yang (Carnegie Mellon University)
Deqing Sun (Google)
Varun Jampani (Google)
Daniel Vlasic (Massachusetts Institute of Technology)
Forrester Cole (Google Research)
Ce Liu (Microsoft)
Deva Ramanan (Carnegie Mellon University)
