Affinity Workshop: WiML Workshop 1

Self-Supervision for Scene Graph Embeddings

Brigit Schroeder · Adam Smith · Subarna Tripathi


Scene graph embeddings are used in applications such as image retrieval, image generation and image captioning. Many of the models for these tasks are trained on large datasets such as Visual Genome, but the collection of these human-annotated datasets is costly and onerous. We seek to improve scene graph embedding representation learning by leveraging the already available data (e.g. the scene graphs themselves) with the addition of self-supervision. In self-supervised learning, models are trained for pretext tasks which do not depend on manual labels and use the existing available data. However, self-supervision is largely unexplored in the area of image scene graphs. In this work, starting from a baseline scene graph embedding model trained on the pretext task of layout prediction, we propose several additional self-supervised pretext tasks. The impact of these additions is evaluated on a downstream retrieval task that was originally associated with the baseline model. Experimentally, we demonstrate that the addition of each task, individually and cumulatively, improves on the retrieval performance of the baseline model, resulting in near saturation when all are combined.
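To make the idea of a pretext task concrete, the following is a minimal sketch of how a training target can be derived from a scene graph itself, with no manual labels beyond the graph. It is an illustration only: the abstract does not name the proposed pretext tasks, so the masked-predicate task shown here, the triple representation, and the names `MASK` and `make_masked_predicate_example` are all assumptions rather than the authors' method.

```python
import random
from typing import List, Tuple

# Assumed representation: a scene graph as (subject, predicate, object) triples.
Triple = Tuple[str, str, str]

# Hypothetical placeholder token standing in for the hidden predicate.
MASK = "<mask>"


def make_masked_predicate_example(
    graph: List[Triple], rng: random.Random
) -> Tuple[List[Triple], int, str]:
    """Hide one predicate and return it as the prediction target.

    The target comes from the graph itself, which is what makes the
    task self-supervised in this sketch.
    """
    if not graph:
        raise ValueError("scene graph must contain at least one triple")
    idx = rng.randrange(len(graph))
    subj, pred, obj = graph[idx]
    masked = list(graph)  # copy so the input graph is left unchanged
    masked[idx] = (subj, MASK, obj)
    return masked, idx, pred


if __name__ == "__main__":
    graph = [("man", "riding", "horse"), ("horse", "on", "grass")]
    masked, idx, target = make_masked_predicate_example(graph, random.Random(0))
    print(masked, idx, target)
```

A model trained on such examples would receive the masked graph as input and be scored on recovering the hidden predicate; the learned embedding could then be evaluated on a downstream task such as retrieval.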
