Workshop
Critiquing and Correcting Trends in Machine Learning
Thomas Rainforth · Matt Kusner · Benjamin Bloem-Reddy · Brooks Paige · Rich Caruana · Yee Whye Teh

Fri Dec 7th 08:00 AM -- 06:30 PM @ Room 511 ABDE

Workshop Webpage: https://ml-critique-correct.github.io/

Recently there have been calls to make machine learning more reproducible, less hand-tailored, fair, and generally more thoughtful about how research is conducted and put into practice. These are hallmarks of a mature scientific field and will be crucial for machine learning to have the wide-ranging, positive impact it is expected to have. Without careful consideration, we as a field risk inflating expectations beyond what is possible. To address these issues, this workshop aims to better understand and improve all stages of the research process in machine learning.

A number of recent papers have carefully considered trends in machine learning as well as the needs of the field when used in real-world scenarios [1-18]. Each of these works introspectively analyzes what we often take for granted as a field. Further, many propose solutions for moving forward. The goal of this workshop is to bring together researchers from all subfields of machine learning to highlight open problems and widespread dubious practices in the field, and crucially, to propose solutions. We hope to highlight issues and propose solutions in areas such as:
- Common practices [1, 8]
- Implicit technical and empirical assumptions that go unquestioned [2, 3, 5, 7, 11, 12, 13, 17, 18]
- Shortfalls in publication and reviewing setups [15, 16]
- Disconnects between research focus and application requirements [9, 10, 14]
- Surprising observations that make us rethink our research priorities [4, 6]

The workshop program is a collection of invited talks, alongside contributed posters and talks. For some of these talks, we plan a unique open format of 10 minutes of talk + 10 minutes of follow-up discussion. Additionally, a separate panel discussion will bring together researchers with a diverse set of viewpoints on the current challenges and potential solutions. During the panel, we will also open the conversation to the audience. The discussion will further be open to an online Q&A, with questions solicited prior to the workshop.

A key expected outcome of the workshop is a collection of important open problems at all levels of machine learning research, along with a record of various bad practices that we should no longer consider to be acceptable. Further, we hope that the workshop will make inroads in how to address these problems, highlighting promising new frontiers for making machine learning practical, robust, reproducible, and fair when applied to real-world problems.

Call for Papers:

Deadline: October 30th, 2018, 11:59 UTC

The one day NIPS 2018 Workshop: Critiquing and Correcting Trends in Machine Learning calls for papers that critically examine current common practices and/or trends in methodology, datasets, empirical standards, publication models, or any other aspect of machine learning research. Though we are happy to receive papers that bring attention to problems for which there is no clear immediate remedy, we particularly encourage papers which propose a solution or indicate a way forward. Papers should motivate their arguments by describing gaps in the field. Crucially, this is not a venue for settling scores or character attacks, but for moving machine learning forward as a scientific discipline.

To help guide submissions, we have split the call for papers into the following tracks. Please indicate the intended track when making your submission. Papers are welcome from all subfields of machine learning. If you have a paper which you feel falls within the remit of the workshop but does not clearly fit one of these tracks, please contact the organizers at: ml.critique.correct@gmail.com.

Bad Practices (1-4 pages)
Papers that highlight common bad practices or unjustified assumptions at any stage of the research process. These can either be technical shortfalls in a particular machine learning subfield, or more procedural bad practices of the ilk of those discussed in [17].

Flawed Intuitions or Unjustified Assumptions (3-4 pages)
Papers that call into question commonly held intuitions or provide clear evidence either for or against assumptions that are regularly taken for granted without proper justification. For example, we would like to see papers which provide empirical assessments to test out metrics, verify intuitions, or compare popular current approaches with historic baselines that may have unfairly fallen out of favour (see e.g. [2]). We would also like to see work which provides results that make us rethink our intuitions or the assumptions we typically make.

Negative Results (3-4 pages)
Papers which show failure modes of existing algorithms or suggest new approaches which one might expect to perform well but which do not. The aim of the latter is to provide a venue for work which might otherwise go unpublished but which is still of interest to the community, for example by dissuading other researchers from pursuing similar ultimately unsuccessful approaches. Though it is naturally preferable that papers explain why an approach performs poorly, this is not essential provided the paper demonstrates why the negative result is of interest to the community in its own right.

Research Process (1-4 pages)
Papers which provide carefully thought through critiques, provide discussion on, or suggest new approaches to areas such as the conference model, the reviewing process, the role of industry in research, open sourcing of code and data, institutional biases and discrimination in the field, research ethics, reproducibility standards, and allocation of conference tickets.

Debates (1-2 pages)
Short proposition papers which discuss issues either affecting all of machine learning or significantly sized subfields (e.g. reinforcement learning, Bayesian methods, etc). Selected papers will be used as the basis for instigating online forum debates before the workshop, leading up to live discussions on the day itself.

Open Problems (1-4 pages)
Papers that describe either (a) unresolved questions in existing fields that need to be addressed, (b) desirable operating characteristics for ML in particular application areas that have yet to be achieved, or (c) new frontiers of machine learning research that require rethinking current practices (e.g., error diagnosis for when many ML components are interoperating within a system, automating dataset collection/creation).

Submission Instructions:

Papers should be submitted as PDFs using the NIPS LaTeX style file. Author names should be anonymized.

All accepted papers will be made available through the workshop website and presented as a poster. Selected papers will also be given contributed talks. We have a small number of complimentary workshop registrations to hand out to students. If you would like to apply for one of these, please email a one-paragraph supporting statement. We also have a limited number of reserved ticket slots to assign to authors of accepted papers. If any authors are unable to attend the workshop due to ticketing, visa, or funding issues, they will be allowed to provide a video presentation of their work that will be made available through the workshop website in lieu of a poster presentation.

Please submit papers here: https://easychair.org/conferences/?conf=cract2018

Deadline: October 30th, 2018, 11:59 UTC

References

[1] Mania, H., Guy, A., & Recht, B. (2018). Simple random search provides a competitive approach to reinforcement learning. arXiv preprint arXiv:1803.07055.
[2] Rainforth, T., Kosiorek, A. R., Le, T. A., Maddison, C. J., Igl, M., Wood, F., & Teh, Y. W. (2018). Tighter variational bounds are not necessarily better. ICML.
[3] Torralba, A., & Efros, A. A. (2011). Unbiased look at dataset bias. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on (pp. 1521-1528). IEEE.
[4] Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2013). Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.
[5] Mescheder, L., Geiger, A., & Nowozin, S. (2018). Which training methods for GANs do actually converge? ICML.
[6] Daumé III, H. (2009). Frustratingly easy domain adaptation. arXiv preprint arXiv:0907.1815
[7] Urban, G., Geras, K. J., Kahou, S. E., Wang, O. A. S., Caruana, R., Mohamed, A., ... & Richardson, M. (2016). Do deep convolutional nets really need to be deep (or even convolutional)?.
[8] Henderson, P., Islam, R., Bachman, P., Pineau, J., Precup, D., & Meger, D. (2017). Deep reinforcement learning that matters. arXiv preprint arXiv:1709.06560.
[9] Narayanan, M., Chen, E., He, J., Kim, B., Gershman, S., & Doshi-Velez, F. (2018). How do Humans Understand Explanations from Machine Learning Systems? An Evaluation of the Human-Interpretability of Explanation. arXiv preprint arXiv:1802.00682.
[10] Schulam, P., & Saria, S. (2017). Reliable decision support using counterfactual models. NIPS.
[11] Rahimi, A. (2017). Let's take machine learning from alchemy to electricity. Test-of-time award presentation, NIPS.
[12] Lucic, M., Kurach, K., Michalski, M., Gelly, S., Bousquet, O. (2018). Are GANs Created Equal? A Large-Scale Study. arXiv preprint arXiv:1711.10337.
[13] Le, T.A., Kosiorek, A.R., Siddharth, N., Teh, Y.W. and Wood, F., (2018). Revisiting Reweighted Wake-Sleep. arXiv preprint arXiv:1805.10469.
[14] Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J. and Mané, D., (2016). Concrete problems in AI safety. arXiv preprint arXiv:1606.06565.
[15] Sutton, C. (2018) Making unblinding manageable: Towards reconciling prepublication and double-blind review. http://www.theexclusive.org/2017/09/arxiv-double-blind.html
[16] Langford, J. (2018) ICML Board and Reviewer profiles. http://hunch.net/?p=8962378

08:30 AM Opening Remarks (Talk)
08:40 AM Zachary Lipton (Invited Talk) Zachary Lipton
09:05 AM Kim Hazelwood (Invited Talk) Kim Hazelwood
09:30 AM Expanding search in the space of empirical ML (Contributed Talk) Bronwyn Woods
09:40 AM Opportunities for machine learning research to support fairness in industry practice (Contributed Talk) Kenneth Holstein
09:50 AM Spotlights - Papers 2, 23, 24, 36, 40, 44 (Contributed Talk)
10:20 AM Poster Session 1 (note there are numerous missing names here, all papers appear in all poster sessions) (Poster Session)
Akhilesh Gotmare, Kenneth Holstein, Jan Brabec, Michal Uricar, Kaleigh Clary, Cynthia Rudin, Sam Witty, Andrew Ross, Shayne O'Brien, Babak Esmaeili, Jessica Forde, Massimo Caccia, Ali Emami, Scott Jordan, Bronwyn Woods, D. Sculley, Rebekah Overdorf, Nicolas Le Roux, Peter Henderson, Brandon Yang, Joyce Liu, David Jensen, Nic Dalmasso, Weitang Liu, Paul Trichelair, Jun Lee, Akanksha Atrey, Matt Groh, Yotam Hechtlinger, Emma Tosch
11:10 AM Finale Doshi-Velez (Invited Talk) Finale Doshi-Velez
11:35 AM Suchi Saria (Invited Talk) Suchi Saria
12:00 PM Lunch (Break)
01:30 PM Sebastian Nowozin (Invited Talk) Sebastian Nowozin
01:55 PM Using Cumulative Distribution Based Performance Analysis to Benchmark Models (Contributed Talk) Scott Jordan
02:05 PM Charles Sutton (Invited Talk) Charles Sutton
02:30 PM On Avoiding Tragedy of the Commons in the Peer Review Process (Contributed Talk) D. Sculley
02:40 PM Spotlights - Papers 10, 20, 35, 42 (Contributed Talk)
03:00 PM Coffee Break and Posters (Break)
03:30 PM Panel on research process (Panel) Zachary Lipton, Charles Sutton, Finale Doshi-Velez, Hanna Wallach, Suchi Saria, Rich Caruana, Tom Rainforth
04:30 PM Poster Session 2 (Poster Session)

Author Information

Tom Rainforth (University of Oxford)
Matt Kusner (University of Oxford)
Ben Bloem-Reddy (University of Oxford)
Brooks Paige (Alan Turing Institute / University of Cambridge)
Rich Caruana (Microsoft)
Yee Whye Teh (University of Oxford, DeepMind)

I am a Professor of Statistical Machine Learning at the Department of Statistics, University of Oxford and a Research Scientist at DeepMind. I am also an Alan Turing Institute Fellow and a European Research Council Consolidator Fellow. I obtained my Ph.D. at the University of Toronto (working with Geoffrey Hinton), and did postdoctoral work at the University of California at Berkeley (with Michael Jordan) and the National University of Singapore (as Lee Kuan Yew Postdoctoral Fellow). I was a Lecturer then a Reader at the Gatsby Computational Neuroscience Unit, UCL, and a tutorial fellow at University College Oxford, prior to my current appointment. I am interested in the statistical and computational foundations of intelligence, and work on scalable machine learning, probabilistic models, Bayesian nonparametrics and deep learning. I was programme co-chair of ICML 2017 and AISTATS 2010.
