

NeurIPS 2024 Competition Track Program 

 

Below you will find a brief summary of the competitions accepted at NeurIPS 2024.

Competitions are grouped by category. All prizes are tentative and depend solely on the organizing team of each competition and the corresponding sponsors. Please note that all information is subject to change; visit the competition websites regularly and contact the organizers of each competition directly for more information.

Physics and Scientific Computing

MyoChallenge 2024: Physiological Dexterity and Agility in Bionic Humans

Vittorio Caggiano (MyoLab), Guillaume Durandau (McGill University), Seungmoon Song (Northeastern University), Chun Kwang Tan (Northeastern University), Huiyi Wang (McGill University), Balint Hodossy (Imperial College London), Pierre Schumacher (Max-Planck Institute), Letizia Gionfrida (King's College London), Massimo Sartori (University of Twente), Vikash Kumar (MyoLab)

Contact:  myosuite@gmail.com

Limb loss represents a traumatic and destabilizing event in human life, significantly impacting an individual's quality of life and independence. Advancements in bionic prosthetic limbs offer a remarkable opportunity to regain mobility and functionality. Human users of bionic limbs (Bionic Humans) are able to learn to use these prosthetic extensions to compensate for their lost limb and reclaim aspects of their former motor abilities. The movement generalization and environment adaptability skills displayed by humans using bionic extensions are a testament to motor intelligence, a capability yet unmatched by current artificial intelligence agents.

To this end, we propose to organize MyoChallenge 2024: Physiological Dexterity and Agility in Bionic Humans, where we will provide a highly detailed neuromechanical and robotic simulation environment and invite experts worldwide to develop any type of controller for both the biological (muscle) and mechanical (bionic) limbs, including controllers based on state-of-the-art reinforcement learning, to solve a series of dexterous motor tasks involving human-to-bionic-limb interaction.

Building on the success of the MyoChallenge at NeurIPS 2022 and 2023, this year's challenge will push the boundaries of how symbiotic human-robot interaction needs to be coordinated to produce agile and dexterous behaviours. This year's MyoChallenge will have two tracks: manipulation and locomotion. The manipulation track will require bi-manual coordination of the BionicMyoArms model -- a combination of a virtual biological arm and a Modular Prosthetic Limb (MPL). The goal will be to coordinate the use of those two limbs to manipulate a series of objects. The locomotion track will use a new BionicMyoLegs model that combines a virtual bilateral biological leg with a trans-femoral amputation and an Open Source Leg prosthesis. The goal will be to coordinate the musculo-skeleto-bionic model to navigate challenging terrains and obstacles on an oval running loop. This running circuit is inspired by the Paralympic steeplechase and the Cybathlon.
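
For orientation, environments in earlier MyoChallenge editions were exposed through MyoSuite's gym-style registration, and the 2024 tracks are expected to follow the same pattern. The sketch below is illustrative only: the task ID is hypothetical, and the exact environment names and API details will be published on the challenge website.

```python
# Minimal sketch of interacting with a MyoSuite-style environment.
# The task ID below is a hypothetical placeholder; see the official
# MyoChallenge documentation for the actual 2024 environment names.
import gym
import myosuite  # importing myosuite registers the Myo* environments with gym

env = gym.make("myoChallengeBimanual-v0")  # hypothetical track ID
env.reset()
for _ in range(100):
    action = env.action_space.sample()     # placeholder for a learned controller
    step_out = env.step(action)            # return signature depends on the gym version
env.close()
```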

FAIR Universe – the challenge of handling uncertainties in fundamental science

David Rousseau (Université Paris-Saclay), Wahid Bhimji (Lawrence Berkeley National Lab), Paolo Calafiura (Lawrence Berkeley National Lab), Ragansu Chakkappai (Université Paris-Saclay), Yuan-Tang Chou (University of Washington), Sascha Diefenbacher (Lawrence Berkeley National Lab), Steven Farrell (Lawrence Berkeley National Lab), Aishik Ghosh (UC Irvine), Isabelle Guyon (ChaLearn, Google), Chris Harris (Lawrence Berkeley National Lab), Elham E Khoda (University of Washington), Benjamin Nachman (Lawrence Berkeley National Lab), Yulei Zhang (Lawrence Berkeley National Lab), Ihsan Ullah (ChaLearn)

Contact: fair-universe@lbl.gov

We propose a challenge organised in conjunction with the Fair Universe project, a collaborative effort funded by the US Department of Energy and involving the Lawrence Berkeley National Laboratory, Université Paris-Saclay, University of Washington, and ChaLearn. This initiative aims to forge an open AI ecosystem for scientific discovery. The challenge will focus on measuring the physics properties of elementary particles with imperfect simulators due to differences in modelling systematic errors. Additionally, the challenge will leverage a large-compute-scale AI platform for sharing datasets, training models, and hosting machine learning competitions. Our challenge will bring together the physics and machine learning communities to advance our understanding and methodologies in handling systematic (otherwise known as epistemic) uncertainties within AI techniques.


BELKA: The Big Encoded Library for Chemical Assessment

Andrew Blevins (Leash Biosciences), Brayden J Halverson (Leash Biosciences), Nate Wilkinson (Leash Biosciences), Ian K Quigley (Leash Biosciences)

Small molecule drugs are often discovered using a brute-force physical search, wherein scientists test for interactions between candidate drugs and their protein targets in a laboratory setting. As drug-like chemical space is vast (on the order of 10^60 molecules), more efficient methods to search through this space are desirable. To enable the discovery and application of such methods, we generated the Big Encoded Library for Chemical Assessment (BELKA), roughly 3.6B physical binding measurements between 133M small molecules and 3 protein targets, using DNA-encoded chemical library technology. We hope this dataset encourages the community to explore methods to represent small molecule chemistry and predict likely binders using chemical and protein target structure.
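
For a sense of what a naive entry point might look like, one could featurize each molecule's SMILES string with standard cheminformatics fingerprints and fit an off-the-shelf classifier on a small labelled subset. The sketch below is illustrative only and is not the organizers' baseline; the file name and column names are hypothetical placeholders.

```python
# Illustrative binder-prediction baseline: RDKit Morgan fingerprints + random forest.
# File and column names are hypothetical placeholders, not the official BELKA schema.
import numpy as np
import pandas as pd
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("belka_subset.csv")  # hypothetical small labelled subset

def featurize(smiles: str, n_bits: int = 2048) -> np.ndarray:
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=n_bits)
    return np.array(fp)

X = np.stack([featurize(s) for s in df["smiles"]])
y = df["binds"].to_numpy()
clf = RandomForestClassifier(n_estimators=200, n_jobs=-1).fit(X, y)
```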

ML4CFD Competition: Harnessing Machine Learning for Computational Fluid Dynamics in Airfoil Design

Mouadh Yagoubi (IRT SystemX), David Danan (IRT SystemX), Milad Leyli-abadi (IRT SystemX), Jocelyn Ahmed Mazari (Ansys, SimAI team), Florent Bonnet (Institut des systèmes intelligents et robotique (ISIR) - Sorbonne Université), Jean-Patrick Brunet (IRT SystemX), Maroua Gmati (IRT SystemX), Asma Farjallah (NVIDIA), Paola Cinnella (Sorbonne Université), Patrick Gallinari (Sorbonne Université, Criteo AI Lab), Marc Schoenauer (INRIA)

Contact: ml4cfd-competition@irt-systemx.fr

The integration of machine learning (ML) techniques for addressing intricate physics problems is increasingly recognized as a promising avenue for expediting simulations. However, assessing ML-derived physical models poses a significant challenge for their adoption within industrial contexts. This competition is designed to promote the development of innovative ML approaches for tackling physical challenges, leveraging our recently introduced unified evaluation framework known as Learning Industrial Physical Simulations (LIPS). Building upon the preliminary edition held from November 2023 to March 2024, this iteration centers on a task fundamental to a well-established physical application: airfoil design simulation, utilizing our proposed AirfRANS dataset. The competition evaluates solutions based on various criteria encompassing ML accuracy, computational efficiency, Out-Of-Distribution performance, and adherence to physical principles. Notably, this competition represents a pioneering effort in exploring ML-driven surrogate methods aimed at optimizing the trade-off between computational efficiency and accuracy in physical simulations. Hosted on the Codabench platform, the competition offers online training and evaluation for all participating solutions.
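
For concreteness, the accuracy component of such an evaluation is typically a normalized error between the surrogate's prediction and the reference CFD solution. The snippet below shows a generic relative L2 error as an illustration; it is not the exact criterion implemented in LIPS.

```python
# Generic relative L2 error between a surrogate prediction and a reference field.
# Illustrative only; the official evaluation criteria are defined by the LIPS framework.
import numpy as np

def relative_l2_error(pred: np.ndarray, ref: np.ndarray) -> float:
    return float(np.linalg.norm(pred - ref) / np.linalg.norm(ref))

pred = np.random.rand(10_000)  # hypothetical predicted pressure field on mesh nodes
ref = np.random.rand(10_000)   # hypothetical reference CFD solution
print(relative_l2_error(pred, ref))
```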

 

Generative AI and Large Language Models 

HAC: The Hacker-Cup AI Competition

Weiwei Yang (Microsoft Research), Mark Saroufim (Meta), Joe Isaacson (Meta), Luca Antiga (Lightning AI), Greg Bowyer (independent), Driss Guessous (Meta), Christian Puhrsch (Meta), Geeta Chauhan (Meta), Supriya Rao (Meta), Margaret Li (University of Washington), David Harmeyer (Meta), Wesley May (Meta)

Contact: https://discord.gg/wWeN9hTH32

We are launching the first AI track for the popular Meta Hacker Cup programming competition, designed to assess the capabilities of Generative AI in performing autonomous code generation tasks. We aim to test the limits of AI in complex coding challenges and measure the performance gap between AI systems and human programmers. We will provide access to all Hacker Cup problems since 2011 alongside their respective solutions in a multimodal (image and text) format, and utilize the existing Hacker Cup infrastructure for competitor evaluation. Featuring both "open evaluation, open model" and "open evaluation, closed model" tracks, this competition invites diverse participation from research institutions of varied interests and resource constraints, including academic labs, AI startups, large technology companies, and AI enthusiasts. Our goal is to develop and democratize meaningful advancements in code automation with the very first open evaluation process for competitive AI programmers.


LLM Merging: Building LLMs Efficiently through Merging

Derek Tam (University of Toronto), Margaret Li (Meta), Prateek Yadav (University of North Carolina, Chapel Hill), Rickard Brüel-Gabrielsson (MIT), Jiacheng Zhu (MIT), Kristjan Greenewald (MIT-IBM Watson AI Lab, IBM Research), Mikhail Yurochkin (IBM), Mohit Bansal (University of North Carolina at Chapel Hill), Colin Raffel (University of Toronto), Leshem Choshen (IBM)

Contact: llm.merging@gmail.com

Training high-performing large language models (LLMs) from scratch is a notoriously expensive and difficult task, costing hundreds of millions of dollars in compute alone. These pretrained LLMs, however, can cheaply and easily be adapted to new tasks via fine-tuning, leading to a proliferation of models that suit specific use cases. Recent work has shown that specialized fine-tuned models can be rapidly merged to combine capabilities and generalize to new skills. This raises the question: given a new suite of desired skills and design parameters, is it necessary to fine-tune or train yet another LLM from scratch, or can similar existing models be re-purposed for a new task with the right selection or merging procedure? The LLM Merging challenge aims to spur the development and evaluation of methods for merging and reusing existing models to form stronger new models without needing additional training. Specifically, the competition focuses on merging existing publicly-released expert models from Hugging Face, using only minimal compute and additional parameters. The goal will be to develop merged models that outperform existing models and existing merging baselines. Submissions will be judged based on the average accuracy on a set of held-out multiple-choice evaluation tasks and their efficiency. To make the competition as accessible as possible and ensure that the merging procedures are more efficient than fine-tuning, we will enforce a compute budget and focus on merging models with fewer than 8B parameters. A starter kit with all necessary materials (baseline implementations, requirements, the evaluation script, etc.) will be released on May 1st.
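
For intuition, the simplest merging baseline is uniform parameter averaging ("model souping") of checkpoints that share an architecture. The sketch below uses hypothetical model names and is not the competition's baseline implementation; real submissions will merge publicly released expert models from Hugging Face under the compute budget.

```python
# Minimal sketch of uniform parameter averaging of two fine-tuned checkpoints
# that share an architecture. Model names are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM

model_a = AutoModelForCausalLM.from_pretrained("org/expert-model-a")  # placeholder
model_b = AutoModelForCausalLM.from_pretrained("org/expert-model-b")  # placeholder

state_a, state_b = model_a.state_dict(), model_b.state_dict()
with torch.no_grad():
    merged_state = {
        k: (state_a[k] + state_b[k]) / 2.0 if state_a[k].is_floating_point() else state_a[k]
        for k in state_a
    }

model_a.load_state_dict(merged_state)    # reuse model_a's architecture for the merged model
model_a.save_pretrained("merged-expert")
```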

Edge-Device Large Language Model Competition

Shiwei Liu (University of Oxford), Kai Han (Huawei Noah’s Ark Lab), Adriana Fernandez-Lopez (Meta AI), Ajay Kumar Jaiswal (University of Texas at Austin), Zahra Atashgahi (University of Twente), Boqian Wu (University of Luxembourg), Edoardo Ponti (University of Edinburgh), Callie Hao (Georgia Institute of Technology), Rebekka Burkholz (Helmholtz Center CISPA), Olga Saukh (Graz University of Technology), Lu Yin (University of Surrey), Tianjin Huang (University of Exeter), Andreas Zinonos (Imperial College London), Jared Tanner (University of Oxford), Yunhe Wang (Huawei Noah’s Ark Lab)

Contact: edgellmschallenge@gmail.com

The Edge-Device Large Language Model Competition seeks to explore the capabilities and potential of large language models (LLMs) deployed directly on edge devices. The remarkable capability of LLMs makes it extremely tantalizing to apply them on practical edge devices and enable wide applications of LLMs in various disciplines. However, the massive size of LLMs poses significant challenges for edge devices, where computing resources and memory are strictly limited. For instance, deploying a small-scale 10B LLM could require up to 20GB of main memory (DRAM) even after adopting INT8 quantization, which unfortunately exceeds the memory of most commodity smartphones. In addition, the high energy consumption of LLMs can drain a smartphone's battery quickly. To facilitate applications of LLMs in a wide range of practical scenarios, we propose this timely competition to encourage practitioners in both academia and industry to come up with effective solutions for this pressing need. By challenging participants to develop efficient and optimized models that can run on resource-constrained edge devices, the competition aims to address critical economic and environmental issues related to LLMs, foster interdisciplinary research collaborations, and enhance the privacy and security of AI systems.
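
As a rough sense of scale, the weight-memory footprint of an LLM can be estimated as the parameter count times the bytes per parameter, with the KV cache, activations, and runtime buffers adding further overhead on top. The back-of-the-envelope sketch below illustrates this arithmetic; it counts weights only.

```python
# Back-of-the-envelope estimate of LLM weight memory (weights only; the KV cache,
# activations, and runtime buffers add further memory on top of these figures).
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    return num_params * bytes_per_param / 1e9

print(weight_memory_gb(10e9, 2.0))  # 10B params in FP16/BF16 -> ~20 GB
print(weight_memory_gb(10e9, 1.0))  # 10B params in INT8      -> ~10 GB
print(weight_memory_gb(10e9, 0.5))  # 10B params in INT4      -> ~5 GB
```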


Multiagent Systems and Reinforcement Learning

Auto-Bidding in Large-Scale Auctions: Learning Decision-Making in Uncertain and Competitive Games

Jian Xu (Alibaba Group), Zhilin Zhang (Alibaba Group), Zongqing Lu (Peking University), Xiaotie Deng (Peking University), Michael P. Wellman (University of Michigan), Chuan Yu (Alibaba Group), Shuai Dou (Alibaba Group), Yusen Huo (Alibaba Group), Zhiwei Xu (Alibaba Group), Zhijian Duan (Peking University), Shaopan Xiong (Alibaba Group), Chuang Liu (Alibaba Group), Ningyuan Li (Peking University), Kefan Su (Peking University), Wei Gong (Alibaba Group), Bo Zheng (Alibaba Group)

Contact:  neurips2024@alibaba-inc.com

Decision-making in large-scale games is an essential research area in artificial intelligence with significant real-world impact. An agent confronts the critical task of making high-frequency strategic decisions in an uncertain and competitive environment, characterized by significant randomness and the rapidly changing strategies of massive numbers of competitors. However, the shortage of large-scale, realistic game systems and datasets has hindered research progress in this area. To provide opportunities for in-depth research on this highly valuable problem, we present the Auto-Bidding in Large-Scale Auctions challenge derived from online advertising, a booming $626.8 billion industry in 2023. We have developed a standardized ad auction system for the competition, which reproduces the characteristics of real-world large-scale games and incorporates essential features that deserve research attention. We also provide a training framework with a 500-million-record dataset and several industry-proven methods as baselines to help participants quickly start and deeply optimize their strategies. Furthermore, we have prepared a comprehensive promotional strategy, raised sufficient funds, and offered varied incentives to attract more participants from diverse backgrounds. We believe that the proposed competition will provide opportunities for more researchers to gain insights and conduct research in this field, driving technical innovation for both research and real-world practical applications.

 

Lux AI Season 3: Multi-Agent Meta Learning at Scale

Stone Tao (University of California, San Diego), Akarsh Kumar (MIT), Bovard Doerschuk-Tiberi (Kaggle), Isabelle Pan (University of California, San Diego), Addison Howard (Kaggle), Hao Su (University of California, San Diego)

Contact: luxaichallenge@gmail.com

The proposed competition revolves around testing the limits of agents (e.g., rule-based or meta-RL agents) when it comes to adapting to a game with changing dynamics. We propose a unique 1v1 competition format where both teams face off in a sequence of 5 games. The game mechanics, along with partial observability, are designed to ensure that optimal gameplay requires agents to efficiently explore and discover the game dynamics. They ensure that the strongest agents may play "suboptimally" in game 1 to explore, but then win easily in games 2 to 5 by leveraging information gained through game 1 and adapting. This competition provides a GPU-parallelized game environment via JAX to enable fast training/evaluation on a single GPU, lowering barriers of entry to typically industry-level scales of research. Participants can submit their agents to compete against other submitted agents on an online leaderboard hosted by Kaggle, ranked by a TrueSkill rating system. The results of the competition will provide a dataset of top open-sourced rule-based agents as well as many game episodes that can support unique analyses (e.g., quantifying emergence/surprise) that past competitions usually cannot provide, thanks to the number of competitors the Lux AI Challenges often attract.
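
To illustrate what JAX-based GPU parallelization buys, an environment written as a pure function over an explicit state can be vectorized with jax.vmap and jit-compiled so that thousands of matches are stepped in a single call. The toy step function below is purely illustrative and is not the Lux AI Season 3 environment API.

```python
# Toy illustration of JAX-parallelized environment stepping (not the Lux AI API).
import jax
import jax.numpy as jnp

def step(state, action):
    # Hypothetical scalar dynamics standing in for a real game state transition.
    new_state = state + action
    reward = -jnp.abs(new_state)
    return new_state, reward

batched_step = jax.jit(jax.vmap(step))   # vectorize across environments, then compile
states = jnp.zeros(4096)                 # 4096 matches stepped in parallel on one GPU
actions = jnp.ones(4096)
states, rewards = batched_step(states, actions)
```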

The Concordia Contest: Advancing the Cooperative Intelligence of Language Model Agents

Chandler Smith (MATS), Rakshit S. Trivedi (MIT), Jesse Clifton (Cooperative AI Foundation, Center on Long-Term Risk), Lewis Hammond (Cooperative AI Foundation, Oxford), Akbir Khan (Cooperative AI Foundation, UCL), Marwa Abdulhai (UC Berkeley), Alexander Sasha Vezhnevets (Google DeepMind), John P. Agapiou (Google DeepMind), Edgar A. Duéñez-Guzmán (Google DeepMind), Jayd Matyas (Google DeepMind), Danny Karmon (Google Research), Dylan Hadfield-Menell (MIT), Natasha Jaques (Google DeepMind, UW), Tim Baarslag (Centrum Wiskunde & Informatica, Utrecht University), Joel Z. Leibo (Google DeepMind)

Building on the success of the Melting Pot contest at NeurIPS 2023, which challenged participants to develop multi-agent reinforcement learning agents capable of cooperation in groups, we are excited to propose a new contest centered on cooperation between language model (LM) agents in intricate, text-mediated environments. Our goal is to advance research on the cooperative intelligence of such LM agents. Of particular interest are agents capable of using natural language to effectively cooperate with each other in complex environments, even in the face of challenges such as competing interests, differing values, and potential miscommunication. To this end, we will leverage the recently released Concordia framework, an open-source library for defining open-ended environments where LM agents like those of Park et al. (2023) can interact with one another by generating free-form natural text describing what they intend to do or say. Concordia provides a suite of mixed-motive social dilemma scenarios where cooperation is valuable but hard to achieve. The proposed contest will challenge participants to develop LM agents that exhibit cooperative intelligence in a variety of Concordia scenarios designed to assess multiple distinct skills of cooperation, including promise-keeping, negotiation, reciprocity, reputation, partner choice, compromise, and sanctioning. Participants will be scored based on the ability of their trained agents to cooperate skillfully, particularly in the presence of new co-players in unforeseen (held-out) scenarios. Given the rapid development of LMs and the anticipated increase in the use of personalised LM agents, we contend that their propensity and ability to cooperate well with a diverse array of other actors (human or machine) will soon be of critical importance.

 

Signal Reconstruction and Enhancement

URGENT Challenge

Anurag Kumar (Meta), Chenda Li (Shanghai Jiaotong University), Samuele Cornell (Università Politecnica delle Marche), Shinji Watanabe (Carnegie Mellon University), Tim Fingscheidt (Technische Universität Braunschweig), Wangyou Zhang (Shanghai Jiaotong University), Wei Wang (Shanghai Jiaotong University), Yanmin Qian (Shanghai Jiaotong University), Marvin Sach (Technische Universität Carolo-Wilhelmina zu Braunschweig), Kohei Saijo (Waseda University), Zhaoheng Ni (Meta)

Contact: urgent.challenge@gmail.com

Speech enhancement (SE) is the task of improving the quality of the desired speech while suppressing other interference signals. Tremendous progress has been achieved in the past decade in deep learning-based SE approaches. However, existing SE studies are often limited in one or more of the following aspects: coverage of SE sub-tasks, diversity and amount of data (especially real-world evaluation data), and diversity of evaluation metrics. As a first step to fill this gap, we establish a novel SE challenge, called URGENT, to promote research towards universal SE. It concentrates on the universality, robustness, and generalizability of SE approaches. In the challenge, we extend the conventionally narrow SE definition to cover different sub-tasks, thus allowing the exploration of the limits of current SE models. We start with four SE sub-tasks: denoising, dereverberation, bandwidth extension, and declipping. Note that handling the above sub-tasks within a single SE model has been challenging and underexplored in the SE literature due to the distinct data formats in different tasks. As a result, most existing SE approaches are only designed for a specific subtask. To address this issue, we propose a technically novel framework to unify all these sub-tasks in a single model, which is compatible with most existing SE approaches. Several state-of-the-art baselines with different popular architectures have been provided for this challenge, including TF-GridNet, BSRNN, and Conv-TasNet. We also take care of the data diversity and amount by collecting abundant public speech and noise data from different domains. This allows for the construction of diverse training and evaluation data. Additional real recordings are further used for evaluating robustness and generalizability. Different from existing SE challenges, we adopt a wide range of evaluation metrics to provide comprehensive insights into the true capability of both generative and discriminative SE approaches. We expect this challenge will not only provide valuable insights into the current status of SE research, but also attract more research towards building universal SE models with strong robustness and good generalizability.

Weather4cast 2024 – Multi-task Challenges for Rain Movie Prediction on the Road to Hi-Res Foundation Models

Aleksandra Gruca (Silesian University of Technology), Pilar Rípodas (AEMET - Agencia Estatal de Meteorología), Xavier Calbet (AEMET - Agencia Estatal de Meteorología), Llorenç Lliso (AEMET - Agencia Estatal de Meteorología), Federico Serva (Institute of Marine Sciences (CNR-ISMAR), Italy), Bertrand Le Saux (European Space Agency), David P. Kreil (Boku University Vienna), Sepp Hochreiter (IARAI)

Contact: w4c24@weather4cast.org

The competition will advance modern algorithms in AI and machine learning through a highly topical interdisciplinary competition challenge: The prediction of hi-res rain radar movies from multi-band satellite sensors requires data fusion of complementary signal sources, multi-channel video frame prediction, as well as super-resolution techniques. To reward models that extract relevant mechanistic patterns reflecting the underlying complex weather systems, our evaluation incorporates spatio-temporal shifts: Specifically, algorithms need to forecast several hours of ground-based hi-res precipitation radar from lo-res satellite spectral images in a unique cross-sensor prediction challenge. Models are evaluated within and across regions on Earth with diverse climate and different distributions of heavy precipitation events. Conversely, robustness over time is achieved by testing predictions on data one year after the training period.

Now in its third year, Weather4cast 2024 aims to improve rain forecasts world-wide on an expansive data set with over an order of magnitude more hi-res rain radar data, allowing a move towards Foundation Models through multi-modality, multi-scale, multi-task challenges. Accurate rain predictions are becoming ever more critical for everyone, with climate change increasing the frequency of extreme precipitation events. Notably, the new models and insights will have a particular impact for the many regions on Earth where costly weather radar data are not available. Join us on www.weather4cast.net!

Ariel Data Challenge 2024: Extracting exoplanetary signals from the Ariel Space Telescope

Kai Hou Yip (University College London), Lorenzo V. Mugnai (Cardiff University), Andrea Bocchieri (Sapienza Università di Roma), Andreas Papageorgiou (Cardiff University), Orphée Faucoz (Centre National d’Etudes Spatiales), Tara Tahseen (University College London), Virginie Batista (Institut d'astrophysique de Paris), Angèle Syty (Université Paris-Saclay), Enzo Pascale (Sapienza Università di Roma),  Quentin Changeat (European Space Agency), Billy Edwards (SRON, Netherlands Institute for Space Research), Paul Eccleston (STFC RAL), Clare Jenner (Distributed Research utilising Advanced Computing (DiRAC)),  Ryan King (UK Space Agency), Theresa Lueftinger (European Space Agency), Nikolaos Nikolaou (University College London), Pascale Danto (CNES), Sudeshna Boro Saikia (University of Vienna), Luís F. Simões (ML Analytics), Giovanna Tinetti (University College London), Ingo P. Waldmann (University College London / The Alan Turing Institute) 

Contact: https://www.ariel-datachallenge.space/

The Ariel Data Challenge 2024 tackles one of astronomy's hardest data analysis problems: extracting faint exoplanetary signals from noisy space telescope observations such as those of the upcoming Ariel Mission. A major obstacle is systematic noise, such as "jitter noise" arising from spacecraft vibrations, which corrupts the spectroscopic data used to study exoplanet atmospheres. This complex spatio-temporal noise challenges conventional parametric denoising techniques. In this challenge, the jitter time series is simulated based on Ariel's payload design, and other noise effects are taken from in-flight JWST data, in order to provide a realistic representation of the effect.

To recover minute signals from the planet's atmosphere, participants must push the boundaries of current approaches to denoise this multimodal data across the image, time, and spectral domains. This requires novel solutions for non-Gaussian noise, data drifts, uncertainty quantification, and limited ground truth. Success will directly improve the Ariel pipeline design and enable new frontiers in characterising exoplanet atmospheres, a key science priority in the coming decades for understanding planetary formation, evolution, and habitability.

 

Responsible AI and Security

CLAS 2024: The LLM and Agent Safety Competition

Zhen Xiang (UIUC), Yi Zeng (VT), Mintong Kang (UIUC), Chejian Xu (UIUC), Jiawei Zhang (UIUC), Zhuowen Yuan (UIUC), Zhaorun Chen (UChicago), Chulin Xie (UIUC), Fengqing Jiang (UW), Minzhou Pan (Northeastern University), Junyuan Hong (UT Austin), Ruoxi Jia (VT), Radha Poovendran (UW), Bo Li (UIUC, UChicago)

Contact:  clas2024-updates@googlegroups.com

Ensuring safety emerges as a pivotal objective in developing large language models (LLMs) and LLM-powered agents. The Competition for LLM and Agent Safety (CLAS) aims to advance the understanding of the vulnerabilities in LLMs and LLM-powered agents and to encourage methods for improving their safety. The competition features three main tracks linked through the methodology of prompt injection, with tasks designed to amplify societal impact by involving practical adversarial objectives for different domains. In the Jailbreaking Attack track, participants are challenged to elicit harmful outputs in guardrail LLMs via prompt injection. In the Backdoor Trigger Recovery for Models track, participants are given a CodeGen LLM embedded with hundreds of domain-specific backdoors. They are asked to reverse-engineer the trigger for each given target. In the Backdoor Trigger Recovery for Agents track, trigger reverse engineering will be focused on eliciting specific backdoor targets based on malicious agent actions. As the first competition addressing the safety of both LLMs and LLM agents, CLAS 2024 aims to foster collaboration between various communities, promoting research and tools for enhancing the safety of LLMs and real-world AI systems.

Erasing the Invisible: A Stress-Test Challenge for Image Watermarks

Mucong Ding (University of Maryland, College Park), Tahseen Rabbani (University of Maryland, College Park), Bang An (University of Maryland, College Park), Souradip Chakraborty (University of Maryland, College Park), Chenghao Deng (University of Maryland, College Park), Mehrdad Saberi (University of Maryland, College Park), Yuxin Wen (University of Maryland, College Park), Xuandong Zhao (UC Santa Barbara), Mo Zhou (Johns Hopkins University), Anirudh Satheesh (University of Maryland, College Park), Mary-Anne Hartley (Yale University), Lei Li (Carnegie Mellon University), Yu-Xiang Wang (UC Santa Barbara), Vishal M. Patel (Rutgers University), Soheil Feizi (University of Maryland, College Park), Tom Goldstein (University of Maryland, College Park), Furong Huang (University of Maryland, College Park)

Contact: erasinginvisible@googlegroups.com

"Erasing the Invisible" is a pioneering competition designed to rigorously stress-test image watermarks, aiming to enhance their robustness significantly. Its standout feature is the introduction of dual tracks for black-box and beige-box attacks, providing a nuanced approach to validate the reliability and robustness of watermarks under varied conditions of visibility and knowledge. The competition spans from July 18 to October 31, inviting individuals and teams to register and participate in a dynamic challenge. Throughout the competition, employing a dataset of 10k images accessed through the Hugging Face API, competitors will receive updated evaluation results on a rolling basis and submit their refined techniques for the final evaluation, which will be conducted on an extensive dataset of 50k images. The evaluation process of this competition not only emphasizes the effectiveness of watermark removal but also highlights the critical importance of maintaining image quality, with results reflected on a continuously updated leaderboard. "Erasing the Invisible" promises to elevate watermarking technology to new heights of resilience, setting a precedent for future research and application in digital content security and safeguarding against unauthorized use and misinformation in the digital age.


The NeurIPS 2024 LLM Privacy Challenge

Qinbin Li (UC Berkeley), Junyuan Hong (UT Austin), Chulin Xie (UIUC), Junyi Hou (NUS), Yiqun Diao (NUS), Zhun Wang (UC Berkeley), Dan Hendrycks (Center for AI Safety), Zhangyang Wang (UT Austin), Bo Li (UChicago), Bingsheng He (NUS), Dawn Song (UC Berkeley)

Contact: llmpc2024.info@gmail.com

The NeurIPS 2024 LLM Privacy Challenge is designed to address the critical issue of privacy in the use of Large Language Models (LLMs), which have become fundamental in a wide array of artificial intelligence applications. This competition acknowledges the potential privacy risks posed by the extensive datasets used to train these models, including the inadvertent leakage of sensitive information. To mitigate these risks, the challenge is structured around two main tracks: the Red Team, focusing on identifying and exploiting privacy vulnerabilities, and the Blue Team, dedicated to developing defenses against such vulnerabilities. Participants will have the option to work with LLMs fine-tuned on synthetic private data or LLMs interacting with private system/user prompts, thus offering a versatile approach to tackling privacy concerns. The competition will provide participants with access to a toolkit designed to facilitate the development of privacy-enhancing methods, alongside baselines for comparison. Submissions will be evaluated based on attack accuracy, efficiency, and the effectiveness of defensive strategies, with prizes awarded to the most innovative and impactful contributions. By fostering a collaborative environment for exploring privacy-preserving techniques, the NeurIPS 2024 LLM Privacy Challenge aims to catalyze advancements in the secure and ethical deployment of LLMs, ensuring their continued utility in sensitive applications without compromising user privacy.


Program Committee

We are very grateful to the colleagues who helped us review and select the competition proposals for this year:

  • Aleksandr Panov (Artificial Intelligence Research Institute)
  • Aleksandra Gruca (Silesian University of Technology)
  • Annika Reinke (German Cancer Research Center)
  • Aravind Mohan (Roblox)
  • Ashwin Hegde (NobleAI)
  • Björn Schuller (Technische Universität München)
  • Byron V Galbraith (Bloomfire)
  • Chris Cameron (University of British Columbia)
  • Christian Eichenberger (iarai.ac.at)
  • David P Kreil (Boku University Vienna)
  • David Rousseau (IJCLab)
  • Dina Bashkirova (Boston University)
  • Dominik Baumann (Aalto University)
  • Emilio Cartoni (Istituto di Scienze e Tecnologie della Cognizione)
  • Erhan Bilal (International Business Machines)
  • Evelyne Viegas (University of Washington)
  • Geoffrey Siwo (University of Michigan - Ann Arbor)
  • Gregory Clark (Google)
  • Haozhe Sun (Université Paris-Saclay)
  • Harald Carlens (ML Contests)
  • Iuliia Kotseruba (York University)
  • Jean-roch Vlimant (California Institute of Technology)
  • Jun Ma (University of Toronto)
  • Karolis Jucys (University of Bath)
  • Louis-Guillaume Gagnon (University of California, Berkeley)
  • Mantas Mazeika (University of Illinois, Urbana-Champaign)
  • Mikhail Burtsev (London Institute for Mathematical Sciences)
  • Moritz Neun (Kaiko)
  • Odd Erik Gundersen (Norwegian University of Science and Technology)
  • Parth Patwa (Amazon)
  • Pranay Manocha (Princeton University)
  • Ryan Holbrook (Kaggle)
  • Sahika Genc (University of Michigan)
  • Tabitha Edith Lee (Lockheed Martin)
  • Tianjian Zhang (The Chinese University of Hong Kong, Shenzhen)
  • Yingshan Chang (Carnegie Mellon University)
  • Zhen Xu (Tsinghua University)
  • Ziqian Luo (Oracle)