Polyphonic piano transcription using deep neural networks
in
Workshop: Machine Learning for Audio Signal Processing (ML4Audio)
Abstract
I'll discuss the problem of transcribing polyphonic piano music with an emphasis on generalizing to unseen instruments. We optimize for two objectives: we first predict pitch onset events and then conditionally predict pitch at the frame level. I'll discuss the model architecture, which combines CNNs and LSTMs. I'll also discuss challenges faced in robust piano transcription, such as obtaining enough data to train a good model. Finally, I'll provide some demos and links to working code. This collaboration was led by Curtis Hawthorne, Erich Elsen and Jialin Song (https://arxiv.org/abs/1710.11153).
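To make the idea of conditioning frame-level pitch predictions on onsets concrete, here is a minimal sketch of one possible decoding rule. It is an illustration under stated assumptions, not the model from the talk: the function name `decode_piano_roll`, the 0.5 threshold, and the toy probabilities are all hypothetical, and the rule itself (a note may only start in a frame where the onset predictor also fires) is an assumed reading of "conditionally predict". The neural networks that would produce these probabilities are not shown.

```python
def decode_piano_roll(onset_probs, frame_probs, threshold=0.5):
    """Combine per-frame onset and pitch probabilities into a boolean piano roll.

    Both inputs are lists of frames; each frame is a list of per-pitch
    probabilities. Assumed rule (hypothetical): a pitch is active in a frame
    only if the frame predictor fires AND either the onset predictor fires in
    that frame or the pitch was already active in the previous frame.
    """
    num_pitches = len(frame_probs[0]) if frame_probs else 0
    active = [False] * num_pitches  # state carried from the previous frame
    roll = []
    for onset_row, frame_row in zip(onset_probs, frame_probs):
        active = [
            frame_p >= threshold and (onset_p >= threshold or was_active)
            for onset_p, frame_p, was_active in zip(onset_row, frame_row, active)
        ]
        roll.append(active)
    return roll


# Toy input: 4 frames, 2 pitches (values are made up for illustration).
onset_probs = [[0.1, 0.0], [0.9, 0.2], [0.1, 0.1], [0.0, 0.0]]
frame_probs = [[0.8, 0.1], [0.9, 0.7], [0.7, 0.8], [0.2, 0.1]]
roll = decode_piano_roll(onset_probs, frame_probs)
```

In this toy input, the first pitch has a strong frame activation in frame 0 but no onset, so it is suppressed until the onset in frame 1 and then sustained through frame 2. The second pitch never has an onset, so its frame activations are ignored entirely.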
Douglas Eck works on the Google Brain team on the Magenta project, an effort to generate music, video, images and text using machine intelligence. He also worked on music search and recommendation for Google Play Music. He was an Associate Professor in Computer Science at the University of Montreal in the BRAMS research center, where he also worked on music performance modeling.