Demonstration

Reproducing Machine Learning Research on Binder

Jessica Forde · Tim Head · Chris Holdgraf · M Pacer · Félix-Antoine Fortin · Fernando Perez

2018 Demonstration

Abstract

Full author list is:

Jessica Zosa Forde, Matthias Bussonnier, Félix-Antoine Fortin, Brian Granger, Tim Head, Chris Holdgraf, Paul Ivanov, Kyle Kelley, Fernando Perez, M Pacer, Yuvi Panda, Gladys Nalvarte, Min Ragan-Kelley, Zach Sailer, Steven Silvester, Erik Sundell, Carol Willing

Researchers have encouraged the machine learning community to produce reproducible, complete pipelines for code. Binder is an open-source service that lets users share interactive, reproducible science. It uses configuration files that are standard in software engineering to create interactive versions of research hosted on sites like GitHub, with minimal additional effort. By leveraging tools such as Kubernetes, it manages the technical complexity of creating containers that capture a repository and its dependencies, generating user sessions, and providing public URLs to share the built images with others. It combines two open-source projects within the Jupyter ecosystem: repo2docker and JupyterHub. repo2docker builds a Docker image of the git repository specified by the user, installs its dependencies, and provides various front-ends for exploring the image. JupyterHub then spawns and serves instances of these built images, using Kubernetes to scale as needed. Our free public deployment, mybinder.org, features over 3,000 repos on topics such as LIGO’s gravitational waves, textbooks on Kalman filters, and open-source libraries such as PyMC3. As of September 2018, it serves an average of 8,000 users per day and has served as many as 22,000 in a given day. Our demonstration shares a Binder deployment that features machine learning research papers from GitHub.
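To make the "public URLs" part of the abstract concrete, here is a minimal sketch of how a launch link for a GitHub repository could be assembled. It assumes the `/v2/gh/<org>/<repo>/<ref>` path layout used by mybinder.org launch links. The helper name `binder_url` and its parameters are hypothetical and are not part of any Binder library, and the percent-encoding of the ref is an assumption made so that branch names containing slashes stay in one path segment.

```python
from urllib.parse import quote


def binder_url(org: str, repo: str, ref: str,
               host: str = "https://mybinder.org") -> str:
    """Build a Binder launch URL for a GitHub repository (illustrative helper).

    Assumes the /v2/gh/<org>/<repo>/<ref> layout of mybinder.org links.
    `ref` is a branch, tag, or commit hash.
    """
    # Percent-encode each component; safe="" also encodes "/" so that a
    # branch name such as "feature/x" remains a single path segment
    # (an assumption about how such refs should be written).
    parts = [quote(part, safe="") for part in (org, repo, ref)]
    return f"{host}/v2/gh/" + "/".join(parts)


if __name__ == "__main__":
    # Example repository chosen for illustration only.
    print(binder_url("binder-examples", "requirements", "master"))
```

Opening a link of this form asks the deployment to build the repository's image if needed and then start an interactive session from it. Pinning `ref` to a tag or commit hash, rather than a branch, is one way to keep a shared link pointing at the exact version of the code that a paper describes.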
