Poster

DreamMesh4D: Video-to-4D Generation with Sparse-Controlled Gaussian-Mesh Hybrid Representation

Zhiqi Li · Yiming Chen · Peidong Liu

East Exhibit Hall A-C #1201
[ Project Page ]
Wed 11 Dec 11 a.m. PST — 2 p.m. PST

Abstract: Recent advancements in 2D/3D generative techniques have facilitated the generation of dynamic 3D objects from monocular videos. Previous methods mainly rely on implicit neural radiance fields (NeRF) or explicit Gaussian splatting as the underlying representation, and struggle to achieve satisfactory spatial-temporal consistency and high-quality surface texture. Drawing inspiration from modern 3D animation pipelines, we introduce $\textbf{DreamMesh4D}$, a novel framework that combines a mesh representation with a sparse-controlled deformation technique to generate high-quality 4D objects from a monocular video. To overcome the limitations of classical texture representations, we bind Gaussian splats to the surface of the triangular mesh, enabling differentiable optimization of both the texture and the mesh vertices. In particular, DreamMesh4D begins with a coarse mesh provided by a single-image 3D generation method. Sparse control points are then uniformly sampled across the surface of the mesh and used to build a deformation graph that drives the motion of the 3D object, which improves computational efficiency and provides additional constraints. At each time step, the transformations of the sparse control points are predicted by a deformation network, and the mesh vertices together with the bound surface Gaussians are deformed via a geometric skinning algorithm. The skinning algorithm is a hybrid of linear blend skinning (LBS) and dual-quaternion skinning (DQS), mitigating the drawbacks of both approaches. The static surface Gaussians and mesh vertices, as well as the dynamic deformation network, are learned in a two-stage manner via a reference-view photometric loss, a score distillation loss, and other regularization losses. Extensive experiments demonstrate that our method outperforms prior video-to-4D generation methods in terms of rendering quality and spatial-temporal consistency. Furthermore, our mesh-based representation is compatible with modern geometric pipelines, showcasing its potential for the 3D gaming and film industries.
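To make the skinning step concrete, below is a minimal NumPy sketch of how per-vertex deformations can be driven by a sparse deformation graph and blended between LBS and DQS. All names here (`hybrid_skinning`, `node_idx`, `alpha`, etc.) are illustrative, not from the paper's code, and the abstract does not specify the exact hybrid rule, so a simple global mixing factor is assumed; the authors' blend may differ (e.g., per-vertex weighting).

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    w1, x1, y1, z1 = a[..., 0], a[..., 1], a[..., 2], a[..., 3]
    w2, x2, y2, z2 = b[..., 0], b[..., 1], b[..., 2], b[..., 3]
    return np.stack([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ], axis=-1)

def quat_rotate(q, v):
    """Rotate vectors v (..., 3) by unit quaternions q (..., 4)."""
    qv = np.concatenate([np.zeros_like(v[..., :1]), v], axis=-1)
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    return quat_mul(quat_mul(q, qv), q_conj)[..., 1:]

def hybrid_skinning(verts, weights, node_idx, node_q, node_t, alpha=0.5):
    """Deform vertices with a weighted mix of LBS and DQS.

    verts:    (V, 3) rest-pose vertex (or surface-Gaussian) positions
    weights:  (V, K) skinning weights over each vertex's K nearest nodes
    node_idx: (V, K) indices of those sparse control nodes
    node_q:   (N, 4) per-node rotations as unit quaternions
    node_t:   (N, 3) per-node translations
    alpha:    assumed LBS/DQS mixing factor (0 = pure LBS, 1 = pure DQS)
    """
    q = node_q[node_idx]                 # (V, K, 4) gathered rotations
    t = node_t[node_idx]                 # (V, K, 3) gathered translations
    w = weights[..., None]               # (V, K, 1)

    # LBS: linearly blend each node's rigid transform of the vertex.
    v_lbs = (w * (quat_rotate(q, verts[:, None, :]) + t)).sum(axis=1)

    # DQS: blend unit dual quaternions (real = q, dual = 0.5 * t * q).
    t_pure = np.concatenate([np.zeros_like(t[..., :1]), t], axis=-1)
    real, dual = q, 0.5 * quat_mul(t_pure, q)
    # Flip signs into the hemisphere of each vertex's first node to
    # avoid cancellation when averaging antipodal quaternions.
    sign = np.sign((real * real[:, :1]).sum(-1, keepdims=True))
    sign[sign == 0] = 1.0
    b_real = (w * sign * real).sum(axis=1)
    b_dual = (w * sign * dual).sum(axis=1)
    norm = np.linalg.norm(b_real, axis=-1, keepdims=True)
    b_real, b_dual = b_real / norm, b_dual / norm
    # Recover translation of the blended dual quaternion: 2 * dual * conj(real).
    conj = b_real * np.array([1.0, -1.0, -1.0, -1.0])
    trans = 2.0 * quat_mul(b_dual, conj)[..., 1:]
    v_dqs = quat_rotate(b_real, verts) + trans

    return (1.0 - alpha) * v_lbs + alpha * v_dqs
```

In this sketch the sparse nodes play the role of the abstract's uniformly sampled control points: LBS is cheap but can collapse volume under large rotations ("candy-wrapper" artifacts), while DQS preserves rigidity but can bulge; mixing the two outputs is one straightforward way a hybrid could trade off these failure modes.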
