NeurIPS 2026 Competition Track Reviewing Guidelines

Purpose

A strong competition generates scientific insight and lasting community resources, not just a leaderboard. Evaluate proposals as infrastructure for advancing the field, not as research papers: a strong proposal enables scientific progress, provides rigorous and fair evaluation, and delivers lasting community value.

Evaluation Criteria

Evaluate proposals along the following dimensions.

1. Scientific relevance and task

Is the task motivated by a clear scientific question that is relevant and timely for NeurIPS?
Is the expected impact (scientific, technical, or societal) credible?
Is it meaningfully different from prior competitions and earlier iterations?

2. Data, environments, and resources

Are the artifacts (datasets, environments, evaluators) appropriate, sufficient, and well-described, including governance such as consent and de-identification?
If reused from prior or public sources, what is genuinely new?
Are sources and annotations reliable (e.g., labeling protocol where relevant)?
Are licensing and availability clear, with open release where possible, and external resources (data, pretrained models, tools) appropriately governed?
Are leakage and contamination risks addressed (e.g., feature leakage, training-data contamination, memorization)?
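As a point of reference when judging whether a proposal's contamination check is concrete enough, one very basic form of such a check can be sketched as follows. This is a minimal illustration under stated assumptions, not a required or sufficient procedure: it assumes text examples, it only catches duplicates that are identical after lowercasing and whitespace normalization, and the function names (normalize, fingerprint, find_overlap) are hypothetical. Paraphrases, near-duplicates, and memorization by pretrained models need stronger methods.

```python
import hashlib


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial formatting
    # differences do not hide an otherwise identical example.
    return " ".join(text.lower().split())


def fingerprint(text: str) -> str:
    # Hash the normalized text so large datasets can be compared by digest.
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def find_overlap(train_texts, test_texts):
    """Return indices of test items whose normalized text also appears in train."""
    train_hashes = {fingerprint(t) for t in train_texts}
    return [i for i, t in enumerate(test_texts) if fingerprint(t) in train_hashes]


# Illustrative data only (not from any real competition).
train = ["The cat sat on the mat.", "Gradient descent converges."]
test = ["the  cat sat on the MAT.", "A genuinely new example."]
print(find_overlap(train, test))
```

A proposal that describes a check at least this specific, and states what it does not cover, is easier to assess than one that only asserts the test set is clean.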

3. Evaluation protocol

Are task inputs, outputs, and success criteria defined without ambiguity?
Are metrics justified and aligned with the scientific question? For human or LLM judging, is the protocol clear?
Are baselines available and credible?
Are held-out components and ranking rules (including tie-breakers) unambiguous?
Are there safeguards against overfitting, gaming, reward hacking, and evaluator manipulation?
Do the rules level the playing field for less well-resourced groups (e.g., compute or model-size limits)?
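To illustrate what an unambiguous ranking rule with tie-breakers can look like, the sketch below orders entries by a primary score and then by two tie-breakers. Everything here is an assumption made for illustration: the Entry fields, the choice of runtime and submission time as tie-breakers, and the final ordering by team name are hypothetical and not part of any NeurIPS requirement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    team: str
    score: float        # primary metric, higher is better (assumed)
    runtime_s: float    # first tie-breaker, lower is better (assumed)
    submitted_at: int   # second tie-breaker, earlier timestamp wins (assumed)


def rank(entries):
    # Sort by descending score, then ascending runtime, then earlier
    # submission. The team name is a last resort so that the order is
    # fully deterministic even when all other fields are equal.
    return sorted(
        entries,
        key=lambda e: (-e.score, e.runtime_s, e.submitted_at, e.team),
    )


# Illustrative data only.
entries = [
    Entry("a", 0.90, 12.0, 100),
    Entry("b", 0.90, 10.0, 200),
    Entry("c", 0.95, 50.0, 300),
    Entry("d", 0.90, 10.0, 150),
]
for position, entry in enumerate(rank(entries), start=1):
    print(position, entry.team)
```

The point for reviewers is that every comparison is spelled out, so two organizers applying the rule to the same submissions would produce the same leaderboard. A proposal should also state the precision at which scores are compared, since that determines when a tie occurs.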

4. Logistics and organization

Is the timeline realistic, including time for post-competition analysis?
Does the team have the expertise, infrastructure, and diversity to deliver?
Is there a concrete plan to attract participants, including underrepresented groups?
Is there a contingency plan?

5. Ethics and risk

Are privacy, consent, bias, misuse, and harms identified and mitigated, per the NeurIPS Code of Ethics? For agentic competitions, are risks from agent actions also covered?
For humanitarian competitions, is the affected community meaningfully involved, avoiding "parachute science"?

Recommendation and confidence

Please provide:

An overall recommendation: Strong Accept (6), Accept (5), Borderline Accept (4), Borderline Reject (3), Reject (2), or Strong Reject (1)
A confidence score: from educated guess (1) to absolutely certain (5)
A justification grounded in the dimensions above

Common concerns to watch for

Weak or unclear evaluation design
Data leakage, contamination, or evaluator manipulation
Leaderboard overfitting, reward hacking, or gaming
Limited novelty or impact
Unrealistic scope, timeline, or resources
Insufficient plan for participation and inclusion
Unaddressed ethical risks, including from agent actions

Reviewer use of agents and large language models

Our reviewing policy reflects two considerations: (a) the importance of protecting the confidentiality of submissions, which authors make in trust that their work will not be leaked beyond the circle of reviewers and PCs responsible for its review, and (b) the fact that we are still in the early stages of identifying where agents and LLMs improve the quality and efficiency of reviewing, and where they have the opposite effect.

For the Competition Track:

Reviewers may use LLM-based tools to assist their review only if those tools preserve the confidentiality of submissions. Submission material must never be shared with any tool or service that does not meet this standard, consistent with the confidentiality expectations in the NeurIPS Main Track Handbook.
The reviewer is responsible for the entire content of the review, and for ensuring that any tools are used in a scientifically responsible manner.
LLMs may be used only as an aid. Fully automated reviewing is not allowed: we want the reviewer's own genuine, careful engagement with the proposal, not an LLM's assessment or lightly edited LLM output.

Desk-rejectable issues

If you spot a problem that may warrant desk rejection (e.g., the proposal is anonymized despite single-blind review, the page limit is exceeded, organizer bios are missing, or it departs substantially from the call and template), please message the Competition Chairs directly in OpenReview.

General policies

We adopt the NeurIPS Main Track Handbook for all general policies (including conflicts of interest, anti-collusion, and confidentiality of submissions), except where Competition Track–specific rules are explicitly stated (e.g., page limits, single-blind review, use of LLMs).