
Workshop: Machine Learning for Audio

Audio Personalization through Human-in-the-loop Optimization

Rajalaxmi Rajagopalan · Yu-Lin Wei · Romit Roy Choudhury

Sat 16 Dec 2:40 p.m. PST — 3 p.m. PST
presentation: Machine Learning for Audio
Sat 16 Dec 6:20 a.m. PST — 3:30 p.m. PST

Abstract: We consider the problem of personalizing audio to maximize user experience. Briefly, we aim to find a filter $h^*$ which, when applied to any music or speech, maximizes the user's satisfaction. This is a black-box optimization problem because the user's satisfaction function $f$ is unknown. The key idea is to play audio samples to the user, each shaped by a different filter $h_i$, and query the user for their satisfaction scores $f(h_i)$. A family of "surrogate" functions is then designed to fit these scores, and the optimization method gradually refines these functions to arrive at the filter $\hat{h}^*$ that maximizes satisfaction. In this paper, we observe that a second type of query is possible, in which users tell us individual elements $h^*[j]$ of the optimal filter $h^*$. Given a budget of $B$ queries, where each query can be of either type, our goal is to find the filter that maximizes this user's satisfaction. Our proposal builds on Sparse Gaussian Process Regression (GPR) and shows how a hybrid approach can outperform either type of querying alone. Our results are validated through simulations and real-world experiments, in which volunteers gave feedback on music and speech audio and were able to achieve high satisfaction levels. We believe this idea of hybrid querying opens new problems in black-box optimization, and that solutions can benefit applications beyond audio personalization.
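
The hybrid querying idea in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method: it uses a plain (dense) Gaussian process with an RBF kernel in place of Sparse GPR, a simple upper-confidence-bound rule to pick the next filter, and a made-up quadratic satisfaction function. The names `hybrid_search`, `rbf_kernel`, `gp_posterior`, and the choice to spend element queries on the first few filter taps are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0):
    # Squared-exponential kernel between the rows of A and the rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(X, y, Xq, noise=1e-4):
    # Dense GP regression (assumption: stands in for the Sparse GPR in the abstract).
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(Xq, X)
    mu = y.mean()
    mean = mu + Ks @ np.linalg.solve(K, y - mu)
    var = 1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T))
    return mean, np.maximum(var, 0.0)

def hybrid_search(f, h_star, budget, n_element_queries, rng, n_candidates=500):
    """Spend `n_element_queries` queries on elements h*[j], the rest on scores f(h_i)."""
    dim = len(h_star)
    # Element queries: the simulated user reveals h*[j] for the first few j (assumption).
    known = {j: h_star[j] for j in range(n_element_queries)}

    def sample(n):
        # Random candidate filters with the revealed elements pinned in place.
        H = rng.uniform(-1.0, 1.0, size=(n, dim))
        for j, v in known.items():
            H[:, j] = v
        return H

    n_score = budget - n_element_queries
    X = sample(2)                           # two initial score queries
    y = np.array([f(h) for h in X])
    for _ in range(n_score - 2):
        C = sample(n_candidates)
        mean, var = gp_posterior(X, y, C)   # surrogate fitted to the scores so far
        h_next = C[np.argmax(mean + 2.0 * np.sqrt(var))]  # upper confidence bound
        X = np.vstack([X, h_next])
        y = np.append(y, f(h_next))         # score query
    return X[np.argmax(y)], y.max()

rng = np.random.default_rng(0)
h_star = rng.uniform(-1.0, 1.0, size=6)     # hidden optimal filter of a simulated user
f = lambda h: -np.sum((h - h_star) ** 2)    # hypothetical satisfaction function
best_h, best_score = hybrid_search(f, h_star, budget=20, n_element_queries=3, rng=rng)
```

Setting `n_element_queries=0` gives a score-only baseline under the same budget, which makes it possible to compare the two strategies on this toy problem. Any conclusion drawn from it applies only to the simulated user, not to the experiments reported in the paper.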
