Neural Universal Scene Descriptors
Abstract
Although recent progress in generative modeling has produced models capable of generating high-quality images conditioned on multiple modalities, there exists no common, portable representation format for specifying conditioning signals. Instead, conditioning techniques are usually tailor-made for specific model architectures and limit the user to a small set of control signals. Moreover, common approaches are not object-centric: the user cannot control individual objects in the image, and changing the conditioning signal leads to global, rather than local, changes. In contrast, the computer graphics community has developed standards such as Universal Scene Description (USD), which represents scenes and objects in a structured, hierarchical manner. Inspired by USD, we propose the “Neural Universal Scene Descriptor” (Neural USD), a flexible conditioning structure that accommodates diverse signals, minimizes model-specific constraints, and enables per-object control over appearance, geometry, and pose. We further apply a fine-tuning approach that ensures disentangled control signals and evaluate key design considerations for a universal conditioning format, demonstrating how Neural USD enables iterative and incremental workflows.