The three-dimensional reconstruction of multiple interacting humans from a monocular image is crucial for the general task of scene understanding, as capturing the subtleties of interaction is often the very reason for taking a picture. Current 3D human reconstruction methods either treat each person independently, ignoring most of the context, or reconstruct people jointly but cannot recover interactions correctly when people are in close proximity. In this work, we introduce \textbf{REMIPS}, a model for 3D \underline{Re}construction of \underline{M}ultiple \underline{I}nteracting \underline{P}eople under Weak \underline{S}upervision. \textbf{REMIPS} can reconstruct a variable number of people directly from monocular images. At the core of our methodology is a novel transformer network that combines unordered person tokens (one for each detected human) with positionally encoded tokens from image feature patches. We introduce a novel unified model for self-collisions and interpenetration between people, based on a mesh approximation computed by applying decimation operators. We rely on self-supervised losses for flexibility and generalisation in-the-wild, and incorporate self-contact and interaction-contact losses directly into the learning process. With \textbf{REMIPS}, we report state-of-the-art quantitative results on common benchmarks, even in cases where no 3D supervision is used. Additionally, qualitative results show that our reconstructions are plausible in terms of pose and shape, and coherent for challenging in-the-wild images where people are often interacting.
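The abstract states only that unordered person tokens are combined with positionally encoded image-patch tokens. The following is a minimal NumPy sketch of that token layout, not the REMIPS architecture. It assumes sinusoidal positional encodings, a single-head self-attention layer with random weights, and hypothetical helper names (`sinusoidal_encoding`, `build_tokens`, `self_attention`). It shows why the person tokens can stay unordered: self-attention treats them as a set, so reordering the detected people only reorders their outputs and leaves the patch outputs unchanged.

```python
import numpy as np

def sinusoidal_encoding(num_positions: int, dim: int) -> np.ndarray:
    """Sine/cosine positional encoding (assumed choice); dim must be even."""
    pos = np.arange(num_positions)[:, None]
    i = np.arange(dim // 2)[None, :]
    angles = pos / (10000.0 ** (2 * i / dim))
    enc = np.zeros((num_positions, dim))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc

def build_tokens(person_tokens: np.ndarray, patch_features: np.ndarray) -> np.ndarray:
    """Hypothetical helper: person tokens get no positional encoding (they form a set),
    while patch tokens get one encoding per patch position."""
    pe = sinusoidal_encoding(patch_features.shape[0], patch_features.shape[1])
    return np.concatenate([person_tokens, patch_features + pe], axis=0)

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention over all tokens."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v

# Toy example: 3 detected people, a 7x7 grid of patch features, 16-dim tokens.
rng = np.random.default_rng(0)
dim = 16
persons = rng.normal(size=(3, dim))
patches = rng.normal(size=(49, dim))
wq, wk, wv = (0.1 * rng.normal(size=(dim, dim)) for _ in range(3))

out = self_attention(build_tokens(persons, patches), wq, wk, wv)

# Reorder the people: their outputs are reordered the same way.
perm = [2, 0, 1]
out_perm = self_attention(build_tokens(persons[perm], patches), wq, wk, wv)
```

In this sketch, `out_perm[:3]` matches `out[:3][perm]` and `out_perm[3:]` matches `out[3:]` up to floating-point tolerance. That is the property that lets a variable number of people be processed without imposing an order on them.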
Author Information
Mihai Fieraru (Google / IMAR)
Mihai Zanfir (IMAR)
Teodor Szente (Google Research)
Eduard Bazavan (Google)
Vlad Olaru (Karlsruhe Institute of Technology)
Cristian Sminchisescu (Lund University/Google)
More from the Same Authors
2021 Spotlight: H-NeRF: Neural Radiance Fields for Rendering and Temporal Reconstruction of Humans in Motion
Hongyi Xu · Thiemo Alldieck · Cristian Sminchisescu
2021 Poster: Relative Flatness and Generalization
Henning Petzka · Michael Kamp · Linara Adilova · Cristian Sminchisescu · Mario Boley
2021 Poster: H-NeRF: Neural Radiance Fields for Rendering and Temporal Reconstruction of Humans in Motion
Hongyi Xu · Thiemo Alldieck · Cristian Sminchisescu
2018 Poster: Deep Network for the Integrated 3D Sensing of Multiple People in Natural Images
Andrei Zanfir · Elisabeta Marinoiu · Mihai Zanfir · Alin-Ionut Popa · Cristian Sminchisescu
2018 Spotlight: Deep Network for the Integrated 3D Sensing of Multiple People in Natural Images
Andrei Zanfir · Elisabeta Marinoiu · Mihai Zanfir · Alin-Ionut Popa · Cristian Sminchisescu
2011 Poster: Probabilistic Joint Image Segmentation and Labeling
Adrian Ion · Joao Carreira · Cristian Sminchisescu
2011 Spotlight: Probabilistic Joint Image Segmentation and Labeling
Adrian Ion · Joao Carreira · Cristian Sminchisescu
2010 Poster: Convex Multiple-Instance Learning by Estimating Likelihood Ratio
Fuxin Li · Cristian Sminchisescu
2009 Poster: Efficient Match Kernel between Sets of Features for Visual Recognition
Liefeng Bo · Cristian Sminchisescu
2009 Spotlight: Efficient Match Kernel between Sets of Features for Visual Recognition
Liefeng Bo · Cristian Sminchisescu
2007 Poster: People Tracking with the Laplacian Eigenmaps Latent Variable Model
Zhengdong Lu · Miguel A. Carreira-Perpinan · Cristian Sminchisescu