Workshop

5th Workshop on Self-Supervised Learning: Theory and Practice

Xudong Wang · Ishan Misra · Mathilde Caron · Tengda Han · Pengtao Xie

Meeting 202 - 204

Sat 14 Dec, 8:15 a.m. PST

At NeurIPS from 2020 to 2023, we successfully organized the 1st, 2nd, 3rd, and 4th workshops on Self-Supervised Learning: Theory and Practice. These events attracted a diverse audience from multiple domains, including vision, speech, NLP, robotics, ML theory, and industry practitioners. Building on the success of these previous workshops, we are excited to continue organizing the workshop on self-supervised learning this year.

Self-supervised learning (SSL) is an approach to representation learning that does not rely on human-labeled data. Instead, it creates auxiliary tasks from unlabeled input data and learns representations by solving these tasks. SSL has shown significant success across various domains, such as images (e.g., MAE, DINO, MoCo, PIRL, SimCLR), speech (e.g., wav2vec, Whisper), and text (e.g., BERT, GPT, Llama). It has also demonstrated promising results in other data modalities, including graphs, time series, and audio. Recent large language models, predominantly trained on web-scale data using self-supervised methods, have exhibited remarkable generalizability and are beginning to transform numerous research fields. Without using human-provided labels, SSL can achieve performance comparable to, or even surpassing, that of fully supervised methods. Furthermore, generative SSL techniques such as Imagen, Stable Diffusion, and Sora have significantly enhanced the artistic capabilities of AI models.

Existing research on SSL has primarily concentrated on enhancing empirical performance without substantial theoretical underpinnings. Although SSL approaches are empirically effective across various benchmarks, their theoretical foundations and practical applications remain less explored. Key questions are still largely unanswered: why certain auxiliary tasks yield superior representations, how much unlabeled data is required to learn effective representations, how neural architecture choices affect SSL performance, and in which practical scenarios SSL outperforms supervised models.

Our workshop aims to address these gaps by fostering a dialogue between theory and practice, especially in the context of LLMs. We plan to gather researchers interested in SSL from diverse fields to explore the theoretical bases of empirically successful SSL methods and to discuss how these theoretical insights could further enhance SSL's practical performance. This workshop will differentiate itself from previous SSL-related workshops by prioritizing the establishment of theoretical foundations and providing theoretical frameworks to guide the development of new SSL methods. Additionally, we will attempt to close the loop from practice to theory by inviting practitioners to share their experiences and insights regarding the practical advantages and challenges of using SSL.
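To make the idea of an auxiliary task concrete, below is a minimal sketch of one widely used instance, a SimCLR-style contrastive objective, in which the "task" is to match two augmented views of the same unlabeled input against all other views in the batch. The function and variable names here are illustrative, not drawn from any particular codebase.

import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    # z1, z2: (N, D) embeddings of two augmented views of the same N
    # unlabeled inputs. Positive pairs are (z1[i], z2[i]); every other
    # embedding in the batch acts as a negative.
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)          # (2N, D) stacked views
    sim = z @ z.t() / temperature           # scaled cosine similarities
    sim.fill_diagonal_(float('-inf'))       # exclude self-similarity
    n = z1.size(0)
    # Each embedding's positive partner sits N positions away in the stack.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# Toy usage: random tensors stand in for encoder outputs.
loss = nt_xent_loss(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())

Note that no human labels appear anywhere: the supervisory signal comes entirely from the data augmentation itself, which is what lets such objectives scale to unlabeled web-scale corpora.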
