Sat 6:30 a.m. - 6:40 a.m.
Opening remarks
Brian Kulis

Sat 6:40 a.m. - 7:00 a.m.
Computer Audition Disrupted 2.0: The Foundation Models Era (Invited talk)
Bjoern Schuller

Sat 7:00 a.m. - 7:20 a.m.
Explainable AI for Audio via Virtual Inspection Layers (Oral)
Johanna Vielhaben · Sebastian Lapuschkin · Grégoire Montavon · Wojciech Samek

Sat 7:20 a.m. - 7:40 a.m.
Self-Supervised Speech Enhancement using Multi-Modal Data (Oral)
Yu-Lin Wei · Rajalaxmi Rajagopalan · Bashima Islam · Romit Roy Choudhury

Sat 7:40 a.m. - 8:10 a.m.
A multi-view approach for audio-based speech emotion recognition (Invited talk)
Dimitra Emmanouilidou

Sat 8:10 a.m. - 8:50 a.m.
Coffee break

Sat 8:50 a.m. - 9:10 a.m.
Audio Language Models (Invited talk)
Neil Zeghidour

Sat 9:10 a.m. - 9:30 a.m.
Zero-shot audio captioning with audio-language model guidance and audio context keywords (Oral)
Leonard Salewski · Stefan Fauth · A. Sophia Koepke · Zeynep Akata

Sat 9:30 a.m. - 10:00 a.m.
Lark: A Multimodal Foundation Model for Music (Invited talk)
Rachel Bittner

Sat 10:00 a.m. - 11:30 a.m.
Lunch break

Sat 11:30 a.m. - 1:00 p.m.
Poster & Demo Session (Poster Session)

Sat 1:00 p.m. - 1:30 p.m.
Coffee break

Sat 1:30 p.m. - 2:00 p.m.
Uninformative Gradients: Optimisation pathologies in differentiable digital signal processing (Invited talk)
Ben Hayes

Sat 2:00 p.m. - 2:20 p.m.
EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis (Oral)
Ge Zhu · Yutong Wen · Marc-André Carbonneau · Zhiyao Duan

Sat 2:20 p.m. - 2:40 p.m.
Towards Generalizable SER: Soft Labeling and Data Augmentation for Modeling Temporal Emotion Shifts in Large-Scale Multilingual Speech (Oral)
Mohamed Osman · Tamer Nadeem · Ghada Khoriba

Sat 2:40 p.m. - 3:00 p.m.
Audio Personalization through Human-in-the-loop Optimization (Oral)
Rajalaxmi Rajagopalan · Yu-Lin Wei · Romit Roy Choudhury

Sat 3:00 p.m. - 3:20 p.m.
Multi-channel speech enhancement for moving sources (Invited talk)
Shoko Araki

EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis (Poster)
Ge Zhu · Yutong Wen · Marc-André Carbonneau · Zhiyao Duan

Explainable AI for Audio via Virtual Inspection Layers (Poster)
Johanna Vielhaben · Sebastian Lapuschkin · Grégoire Montavon · Wojciech Samek

Audio classification with Dilated Convolution with Learnable Spacings (Poster)
Ismail Khalfaoui Hassani · Timothée Masquelier · Thomas Pellegrini

Creative Text-to-Audio Generation via Synthesizer Programming (Poster)
Nikhil Singh · Manuel Cherep · Jessica Shand

Jointly Recognizing Speech and Singing Voices Based on Multi-Task Audio Source Separation (Poster)
Ye Bai · Chenxing Li · Xiaorui Wang · Yuanyuan Zhao · Hao Li

Leveraging Content-based Features from Multiple Acoustic Models for Singing Voice Conversion (Poster)
Xueyao Zhang · Yicheng Gu · Haopeng Chen · Zihao Fang · Lexiao Zou · Liumeng Xue · Zhizheng Wu

Diffusion Models as Masked Audio-Video Learners (Poster)
Elvis Nunez · Yanzi Jin · Mohammad Rastegari · Sachin Mehta · Maxwell Horton

InstrumentGen: Generating Sample-Based Musical Instruments From Text (Poster)
Shahan Nercessian · Johannes Imort

Multi-Resolution Audio-Visual Feature Fusion for Temporal Action Localization (Poster)
Edward Fish · Jon Weinbren · Andrew Gilbert

Composing and Validating Large-Scale Datasets for Training Open Foundation Models for Audio (Poster)
Marianna Nezhurina · Ke Chen · Yusong Wu · Tianyu Zhang · Haohe Liu · Yuchen Hui · Taylor Berg-Kirkpatrick · Shlomo Dubnov · Jenia Jitsev

Unsupervised Musical Object Discovery from Audio (Poster)
Joonsu Gha · Vincent Herrmann · Benjamin F. Grewe · Jürgen Schmidhuber · Anand Gopalakrishnan

Data is Overrated: Perceptual Metrics Can Lead Learning in the Absence of Training Data (Poster)
Tashi Namgyal · Alexander Hepburn · Raul Santos-Rodriguez · Valero Laparra · Jesús Malo

Self-Supervised Speech Enhancement using Multi-Modal Data (Poster)
Yu-Lin Wei · Rajalaxmi Rajagopalan · Bashima Islam · Romit Roy Choudhury

Improved sound quality human-inspired DNN-based audio applications (Poster)
Chuan Wen · Sarah Verhulst

Audio Personalization through Human-in-the-loop Optimization (Poster)
Rajalaxmi Rajagopalan · Yu-Lin Wei · Romit Roy Choudhury

Synthia's Melody: A Benchmark Framework for Unsupervised Domain Adaptation in Audio (Poster)
Harry Coppock · Chia-Hsin Lin

Zero-shot audio captioning with audio-language model guidance and audio context keywords (Poster)
Leonard Salewski · Stefan Fauth · A. Sophia Koepke · Zeynep Akata

AttentionStitch: How Attention Solves the Speech Editing Problem (Poster)
Antonios Alexos · Pierre Baldi

MusT3: Unified Multi-Task Model for Fine-Grained Music Understanding (Poster)
Martin Kukla · Minz Won · Yun-Ning Hung · Duc Le

Benchmarks and deep learning models for localizing rodent vocalizations in social interactions (Poster)
Ralph Peterson · Aramis Tanelus · Aman Choudhri · Violet Ivan · Aaditya Prasad · David Schneider · Dan Sanes · Alex Williams

Towards Generalizable SER: Soft Labeling and Data Augmentation for Modeling Temporal Emotion Shifts in Large-Scale Multilingual Speech (Poster)
Mohamed Osman · Tamer Nadeem · Ghada Khoriba

The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation (Poster)
Ilaria Manco · Benno Weck · Seungheon Doh · Yixiao Zhang · Dmitry Bogdanov · Yusong Wu · Ke Chen · Philip Tovstogan · Emmanouil Benetos · Elio Quinton · George Fazekas · Juhan Nam · Minz Won

ScripTONES: Sentiment-Conditioned Music Generation for Movie Scripts (Poster)
Vishruth Veerendranath · Vibha Masti · Utkarsh Gupta · Hrishit Chaudhuri · Gowri Srinivasa

Self-Supervised Music Source Separation Using Vector-Quantized Source Category Estimates (Poster)
Stefan Lattner · Marco Pasini

Deep Generative Models of Music Expectation (Poster)
Ninon Lizé Masclef · Andy Keller

mir_ref: A Representation Evaluation Framework for Music Information Retrieval Tasks (Poster)
Christos Plachouras · Dmitry Bogdanov · Pablo Alonso-Jiménez