

NeurIPS 2025 Datasets and Benchmarks FAQ

 

This FAQ will be continually updated. Please bookmark this page and review it before submitting any questions.

Note: Authors are also advised to consult the NeurIPS Main Track FAQs, as general policies apply to D&B submissions as well.

 

General FAQs
 

What is the LaTeX template for D&B track?

It’s the same as the main track template. Check “Paper Formatting Instructions” here (https://neurips.cc/Conferences/2025/CallForPapers)

 

Are there guidelines for submissions that originate from the 2024 Competitions track, e.g., those reporting on competition results?

No, there are no special guidelines. Please follow the D&B CFP (this page) and data hosting guidelines. Your submission will be reviewed according to the same standards alongside all other D&B track submissions.

 

KDD 2025 notification date is May 16th which is a day after the NeurIPS D&B track full paper submission deadline. Will this change?

No. We will not be making changes to the D&B track deadlines as they align with the main conference.

 

Is my paper a good fit for D&B track?

Please carefully read the CFP (this page) and use your best judgment. Track chairs cannot advise on the relevance of your paper.

 

Are dataset/code submissions due on May 15th (full paper deadline) or May 22nd (appendices/supplementary material deadline)?

Datasets/code are not supplementary materials in the D&B track. If your submission includes data/code, it needs to be submitted in full and final form by May 15th along with the full paper submission.

 

 

Dataset hosting FAQs
 

The Croissant format can’t handle the file type(s) in my dataset submission. What should I do?

You should still submit a Croissant file. You can choose to provide only dataset-level metadata and a description of the resources contained in the dataset (FileObject and FileSet), omitting RecordSets in this scenario. The recommended Croissant-compatible data hosting platforms should handle this gracefully for you, but you will need to address it manually if you decide to self-host your dataset.
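For illustration, here is a rough sketch (in Python, emitting JSON-LD) of what such a metadata-only Croissant file might look like. This is not an official template: field names follow one reading of the Croissant 1.0 spec, the `@context` is abbreviated, and all names, URLs, and checksums are placeholders. Consult the official Croissant documentation for the authoritative schema.

```python
import json

# Sketch of a Croissant file with dataset-level metadata and a single
# FileObject, and no RecordSets. Every name, URL, and checksum below is
# a placeholder, and the "@context" is abbreviated relative to the spec.
croissant = {
    "@context": {
        "@vocab": "https://schema.org/",
        "cr": "http://mlcommons.org/croissant/",
    },
    "@type": "sc:Dataset",
    "conformsTo": "http://mlcommons.org/croissant/1.0",
    "name": "my-dataset",
    "description": "Raw sensor captures in a custom binary format.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "url": "https://example.org/my-dataset",
    "distribution": [
        {
            "@type": "cr:FileObject",
            "@id": "captures.bin",
            "name": "captures.bin",
            "description": "Binary captures; no RecordSet is defined for this file.",
            "contentUrl": "https://example.org/my-dataset/captures.bin",
            "encodingFormat": "application/octet-stream",
            "sha256": "0" * 64,  # placeholder checksum
        }
    ],
    # "recordSet" is intentionally omitted for the unsupported file type.
}

metadata_jsonld = json.dumps(croissant, indent=2)
```

Self-hosting authors would serve `metadata_jsonld` alongside the dataset; hosted platforms typically generate the equivalent file for you.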

 

How do we handle our submission which includes a private hold-out set which we wish to keep private and unreleased, e.g., to avoid potential contamination?

You should mention the private hold-out set and describe it in your paper, but the main contribution of your paper should be the publicly released portion of your dataset, which must conform to the data hosting guidelines. The public portion may also contain validation and test sets collected with the same protocol as the private one.

 

Can my submission be a synthetic dataset?

Yes. All data hosting guidelines apply to synthetic datasets, too.

 

How should I include code as part of my submission?

Please see the guidelines for code. You will be asked to provide a URL to a hosting platform (e.g., GitHub, Bitbucket). You can also attach it as a ZIP file as supplementary material with your paper. All code should be documented and executable.

 

I don’t want to make my dataset publicly accessible at the time of submission. What are my options?

Harvard Dataverse and Kaggle both offer private preview links, meaning your dataset is accessible only to those with whom you share its special URL, e.g., reviewers. Note that you will be required to make your dataset public by the camera-ready deadline; failure to do so may result in removal from the conference and proceedings.

 

Can I make changes to my dataset after I’ve made my submission to Open Review?

You can make changes until the submission deadline. After the deadline, we will perform automated verification checks on your dataset to help streamline and standardize reviews. If your dataset changes in a way that invalidates the original reviews at any time between the submission deadline and the camera-ready deadline (or publication of the proceedings), we reserve the right to remove it from the conference or proceedings.

 

I’m experiencing problems with the platform I’m using to release my dataset. What should I do?

We have worked with the maintainers of the dataset hosting platforms to identify the appropriate contacts authors should use for support, e.g., in case of issues or for help with workarounds for storage quotas. Find this contact information in the section above, "How to Publish on Preferred Hosting Platforms".

 

What if I need to require credentialized (a.k.a. gated) access to my dataset?

This is possible only on the condition that credentialization is necessary for the public good (e.g., because of ethically sensitive medical data), and that an established credentialization procedure is in place that 1) is open to a large section of the public; 2) provides rapid response and access to the data; and 3) is guaranteed to be maintained for many years. A good example is PhysioNet Credentialing, where users must first demonstrate that they understand how to handle data involving human subjects, yet access is open to anyone who completes the training and agrees to the rules.
This should be seen as an exceptional measure, NOT as a way to limit access to data for other reasons (e.g., to shield data behind a Data Transfer Agreement). Misuse would be grounds for desk rejection. During submission, you can indicate that your dataset involves open credentialized access, in which case the necessity, openness, and efficiency of the credentialization process itself will also be checked.

 

I have an extremely large (> 1 TB) dataset. How do I allow reviewers to properly evaluate it?

Please make sure that the full dataset is available at submission time. You can *in addition* provide ways to help reviewers explore your dataset: a notebook that downloads a portion of the data and helps reviewers explore it, a smaller data sample (ideally hosted in the same way), or a bespoke solution appropriate for your dataset. If you provide a sample, also explain how you created it.
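If you build a smaller sample as one of these aids, a uniform random sample is easy to justify to reviewers. As an illustrative sketch (not part of the official guidelines), reservoir sampling draws a uniform sample in a single pass without ever holding the full dataset in memory; the `range(...)` stream below is a stand-in for iterating over your actual records:

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")

def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Uniformly sample k records from a stream of unknown length.

    Uses fixed memory proportional to k, so it works for datasets far
    larger than RAM. A fixed seed makes the sample reproducible, which
    helps when documenting how the sample was created.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, record in enumerate(stream):
        if i < k:
            reservoir.append(record)
        else:
            # Replace an existing element with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = record
    return reservoir

# Stand-in for streaming records from a very large dataset.
sample = reservoir_sample(range(1_000_000), k=5)
```

Recording the seed and the sampling code alongside the sample is one simple way to "explain how you created it".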

 

Our submission involves using existing public datasets. Do we need to host these according to the data hosting guidelines?

No, but you should make any code used to modify or otherwise process the public datasets (e.g., to build the new benchmark you are submitting) accessible and executable, which means providing publicly accessible links to the data sources used. You also should not claim the existing public datasets as part of your contribution.