One of the primary purposes of video is to capture people and their unique activities. Often, the experience of watching a video can be enhanced by adding a musical soundtrack that is in sync with the rhythmic features of these activities. How would this soundtrack sound? The problem is challenging, since little is known about how to capture the rhythmic nature of free body movements. In this work, we explore this problem and propose a novel system, called 'RhythmicNet', which takes as input a video that includes human movements and generates a soundtrack for it. RhythmicNet works directly with human movements: it extracts skeleton keypoints and applies a sequence of models that translate the keypoints to rhythmic sounds. RhythmicNet follows the natural process of music improvisation, which proceeds through streams of the beat, the rhythm, and the melody. In particular, RhythmicNet first infers the music beat and the style pattern from the body keypoints of each frame to produce the rhythm. It then applies a transformer-based model to generate the hits of drum instruments, and a U-Net-based model to generate the velocities and offsets of those instruments. Additional types of instruments are added to the soundtrack by further conditioning on the generated drum sounds. We evaluate RhythmicNet on large-scale datasets of videos whose body movements have an inherent sound association, such as dance, as well as 'in the wild' internet videos of various movements and actions. We show that the method generates plausible music that aligns well with different types of human movements.
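To make the first stage concrete, below is a minimal toy sketch of inferring candidate beat frames from skeleton keypoints via per-frame motion energy. The function names, the peak-picking heuristic, and the use of motion energy as a beat proxy are illustrative assumptions for this sketch; in the paper this stage is a learned model, so the code only illustrates the input/output contract (keypoints in, beat frames out).

```python
import numpy as np

def motion_energy(keypoints: np.ndarray) -> np.ndarray:
    """Per-frame motion magnitude from keypoints of shape (T, J, 2):
    T frames, J skeleton joints, 2D coordinates."""
    diffs = np.diff(keypoints, axis=0)                 # (T-1, J, 2) frame-to-frame deltas
    return np.linalg.norm(diffs, axis=2).sum(axis=1)   # (T-1,) total joint displacement

def pick_beats(energy: np.ndarray, min_gap: int = 5) -> list:
    """Greedy local-maxima picking (a toy stand-in for the learned model):
    a frame is a beat candidate if its motion energy exceeds both neighbours
    and the running mean, and it falls at least `min_gap` frames after the
    previous accepted beat."""
    beats, last = [], -min_gap
    mean = energy.mean()
    for t in range(1, len(energy) - 1):
        if (energy[t] > energy[t - 1] and energy[t] > energy[t + 1]
                and energy[t] > mean and t - last >= min_gap):
            beats.append(t)
            last = t
    return beats

# Toy usage: a random-walk stand-in for pose-estimator output
# (120 frames, 17 joints), just to exercise the pipeline shape.
rng = np.random.default_rng(0)
kp = np.cumsum(rng.normal(size=(120, 17, 2)), axis=0)
print(pick_beats(motion_energy(kp)))
```

Downstream, the resulting beat frames would condition the drum-generation stage (transformer for instrument hits, U-Net for velocities and offsets) described above.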
Author Information
Kun Su (University of Washington)
Xiulong Liu (University of Washington)
Eli Shlizerman (Departments of Applied Mathematics and Electrical & Computer Engineering, University of Washington Seattle)
More from the Same Authors
- 2022 Poster: STNDT: Modeling Neural Population Activity with Spatiotemporal Transformers
  Trung Le · Eli Shlizerman
- 2022 Poster: INRAS: Implicit Neural Representation for Audio Scenes
  Kun Su · Mingfei Chen · Eli Shlizerman
- 2020 Poster: Audeo: Audio Generation for a Silent Performance Video
  Kun Su · Xiulong Liu · Eli Shlizerman
- 2019: Opening Remarks
  Guillaume Lajoie · Jessica Thompson · Maximilian Puelma Touzel · Eli Shlizerman · Konrad Kording
- 2019 Workshop: Real Neurons & Hidden Units: future directions at the intersection of neuroscience and AI
  Guillaume Lajoie · Eli Shlizerman · Maximilian Puelma Touzel · Jessica Thompson · Konrad Kording