

NeurIPS 2025 Datasets and Benchmarks FAQ

 

This FAQ will be continually updated. Please bookmark this page and review it before submitting any questions.

Note: Authors are also advised to consult the NeurIPS Main Track FAQs, as general policies apply to D&B submissions as well.

 

General FAQs
 

What is the LaTeX template for D&B track?

It’s the same as the main track template. Check “Paper Formatting Instructions” here (https://neurips.cc/Conferences/2025/CallForPapers)

 

Are there guidelines for submissions that originate from the 2024 Competitions track, e.g., those reporting on competition results?

No, there are no special guidelines. Please follow the D&B CFP (this page) and data hosting guidelines. Your submission will be reviewed according to the same standards alongside all other D&B track submissions.

 

KDD 2025 notification date is May 16th which is a day after the NeurIPS D&B track full paper submission deadline. Will this change?

No. We will not be making changes to the D&B track deadlines as they align with the main conference.

 

Is my paper a good fit for D&B track?

Please carefully read the CFP (this page) and use your best judgment. Track chairs cannot advise on the relevance of your paper.

 

Are dataset/code submissions due on May 15th (full paper deadline) or May 22nd (appendices/supplementary material deadline)?

Datasets/code are not supplementary materials in the D&B track. If your submission includes data/code, it needs to be submitted in full and final form by May 15th along with the full paper submission.

 

 

Dataset hosting FAQs
 

The Croissant format can’t handle the file type(s) in my dataset submission. What should I do?

You should still submit a Croissant file. You can choose to provide only dataset-level metadata and a description of the resources contained in the dataset (FileObject and FileSet), omitting RecordSets in this scenario. The recommended Croissant-compatible data hosting platforms should handle this gracefully for you, but you will need to address it manually if you decide to self-host your dataset.
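For illustration, here is a rough sketch (in Python, emitting JSON-LD) of what such a metadata-only Croissant file might look like. This is not an official template: field names follow one reading of the Croissant 1.0 spec, the `@context` is abbreviated, and all names, URLs, and checksums are placeholders. Consult the official Croissant documentation for the authoritative schema.

```python
import json

# Sketch of a Croissant file with dataset-level metadata and a single
# FileObject, and no RecordSets. Every name, URL, and checksum below is
# a placeholder, and the "@context" is abbreviated relative to the spec.
croissant = {
    "@context": {
        "@vocab": "https://schema.org/",
        "cr": "http://mlcommons.org/croissant/",
    },
    "@type": "sc:Dataset",
    "conformsTo": "http://mlcommons.org/croissant/1.0",
    "name": "my-dataset",
    "description": "Raw sensor captures in a custom binary format.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "url": "https://example.org/my-dataset",
    "distribution": [
        {
            "@type": "cr:FileObject",
            "@id": "captures.bin",
            "name": "captures.bin",
            "description": "Binary captures; no RecordSet is defined for this file.",
            "contentUrl": "https://example.org/my-dataset/captures.bin",
            "encodingFormat": "application/octet-stream",
            "sha256": "0" * 64,  # placeholder checksum
        }
    ],
    # "recordSet" is intentionally omitted for the unsupported file type.
}

metadata_jsonld = json.dumps(croissant, indent=2)
```

Self-hosting authors would serve `metadata_jsonld` alongside the dataset; hosted platforms typically generate the equivalent file for you.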

 

How do we handle our submission which includes a private hold-out set which we wish to keep private and unreleased, e.g., to avoid potential contamination?

You should mention the private hold-out set and describe it in your paper, but the main contribution of your paper should be the publicly released portion of your dataset, which must conform to the data hosting guidelines. The public portion may also contain validation and test sets collected with the same protocol as the private one.

 

Can my submission be a synthetic dataset?

Yes. All data hosting guidelines apply to synthetic datasets, too.

 

How should I include code as part of my submission?

Please see the guidelines for code. You will be asked to provide a URL to a hosting platform (e.g., GitHub, Bitbucket). You can also attach it as a ZIP file as supplementary material with your paper. All code should be documented and executable.

 

I don’t want to make my dataset publicly accessible at the time of submission. What are my options?

Harvard Dataverse and Kaggle both offer private preview links, meaning your dataset is accessible only to those with whom you share its special URL, e.g., reviewers. Note that you will be required to make your dataset public by the camera-ready deadline; failure to do so may result in removal from the conference and proceedings.

 

Can I make changes to my dataset after I’ve made my submission to Open Review?

You can make changes until the submission deadline. After the deadline, we will perform automated verification checks on your dataset to help streamline and standardize reviews. If your dataset changes in a way that invalidates the original reviews at any time between the submission deadline and the camera-ready deadline (or publication of the proceedings), we reserve the right to remove it from the conference or proceedings.

 

I’m experiencing problems with the platform I’m using to release my dataset. What should I do?

We have worked with the maintainers of the dataset hosting platforms to identify the appropriate contacts authors should use for support, e.g., in case of issues or for help with workarounds for storage quotas. Find this contact information in the section above, "How to Publish on Preferred Hosting Platforms".

 

What if I need to require credentialized (a.k.a. gated) access to my dataset?

This is possible only on the condition that credentialization is necessary for the public good (e.g., because of ethically sensitive medical data), and that an established credentialization procedure is in place that 1) is open to a large section of the public; 2) provides rapid response and access to the data; and 3) is guaranteed to be maintained for many years. A good example is PhysioNet Credentialing, where users must first demonstrate that they understand how to handle data involving human subjects, yet access is open to anyone who completes the training and agrees to the rules.
This should be seen as an exceptional measure, NOT as a way to limit access to data for other reasons (e.g., to shield data behind a Data Transfer Agreement). Misuse would be grounds for desk rejection. During submission, you can indicate that your dataset involves open credentialized access, in which case the necessity, openness, and efficiency of the credentialization process itself will also be checked.

 

I have an extremely large (> 1 TB) dataset. How do I allow reviewers to properly evaluate it?

Please make sure that the full dataset is available at submission time. You can *in addition* provide ways to help reviewers explore your dataset: a notebook that downloads a portion of the data and helps reviewers explore it, a smaller data sample (ideally hosted in the same way), or a bespoke solution appropriate for your dataset. If you provide a sample, also explain how you created it.
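If you build a smaller sample as one of these aids, a uniform random sample is easy to justify to reviewers. As an illustrative sketch (not part of the official guidelines), reservoir sampling draws a uniform sample in a single pass without ever holding the full dataset in memory; the `range(...)` stream below is a stand-in for iterating over your actual records:

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")

def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Uniformly sample k records from a stream of unknown length.

    Uses fixed memory proportional to k, so it works for datasets far
    larger than RAM. A fixed seed makes the sample reproducible, which
    helps when documenting how the sample was created.
    """
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, record in enumerate(stream):
        if i < k:
            reservoir.append(record)
        else:
            # Replace an existing element with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = record
    return reservoir

# Stand-in for streaming records from a very large dataset.
sample = reservoir_sample(range(1_000_000), k=5)
```

Recording the seed and the sampling code alongside the sample is one simple way to "explain how you created it".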

 

Our submission involves using existing public datasets. Do we need to host these according to the data hosting guidelines?

No, but you should make any code used to modify or otherwise process the public datasets (e.g., to build the new benchmark you are submitting) accessible and executable, which means providing publicly accessible links to the data sources used. You also should not claim the existing public datasets as part of your contribution.