

Spotlight Poster

MassSpecGym: A benchmark for the discovery and identification of molecules

Roman Bushuiev · Anton Bushuiev · Niek de Jonge · Adamo Young · Fleming Kretschmer · Raman Samusevich · Janne Heirman · Fei Wang · Luke Zhang · Kai Dührkop · Marcus Ludwig · Nils Haupt · Apurva Kalia · Corinna Brungs · Robin Schmid · Russell Greiner · Bo Wang · David Wishart · Liping Liu · Juho Rousu · Wout Bittremieux · Hannes Rost · Tytus Mak · Soha Hassoun · Florian Huber · Justin J.J. van der Hooft · Michael Stravs · Sebastian Böcker · Josef Sivic · Tomáš Pluskal

West Ballroom A-D #5110
Fri 13 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract:

The discovery and identification of molecules in biological and environmental samples is crucial for advancing biomedical and chemical sciences. Tandem mass spectrometry (MS/MS) is the leading technique for high-throughput elucidation of molecular structures. However, decoding a molecular structure from its mass spectrum is exceptionally challenging, even when performed by human experts. As a result, the vast majority of acquired MS/MS spectra remain uninterpreted, thereby limiting our understanding of the underlying (bio)chemical processes. Despite decades of progress in machine learning applications for predicting molecular structures from MS/MS spectra, the development of new methods is severely hindered by the lack of standard datasets and evaluation protocols. To address this problem, we propose MassSpecGym, the first comprehensive benchmark for the discovery and identification of molecules from MS/MS data. Our benchmark comprises the largest publicly available collection of high-quality MS/MS spectra and defines three MS/MS annotation challenges: de novo molecular structure generation, molecule retrieval, and spectrum simulation. It includes new evaluation metrics and a generalization-demanding data split, thereby standardizing the MS/MS annotation tasks and rendering the problem accessible to the broad machine learning community. MassSpecGym is publicly available at https://github.com/pluskal-lab/MassSpecGym.
