Voice profiling aims to infer various human parameters, such as gender and age, from speech. In this paper, we address the challenge posed by a subtask of voice profiling: reconstructing someone's face from their voice. The task is designed to answer the question: given an audio clip spoken by an unseen person, can we picture a face that shares as many elements, or associations, as possible with the speaker in terms of identity?
To address this problem, we propose a simple but effective computational framework based on generative adversarial networks (GANs). The network learns to generate faces from voices by matching the identities of the generated faces to those of the speakers on a training set. We evaluate the performance of the network by leveraging a closely related task: cross-modal matching. The results show that our model is able to generate faces that match several biometric characteristics of the speaker, and that it yields matching accuracies much better than chance. The code is publicly available at https://github.com/cmu-mlsp/reconstructingfacesfrom_voices
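To make "matching accuracies that are much better than chance" concrete, here is a minimal sketch of one common form of cross-modal matching, a 1-of-2 test in which chance is 50%. It is not taken from the released code: the function names, the use of cosine similarity, and the idea of comparing fixed-length embedding vectors are all assumptions made for illustration, and the paper's actual protocol may differ.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def one_of_two_matching_accuracy(probes, targets):
    """Hypothetical 1-of-2 matching score.

    probes[i] and targets[i] are assumed to be embeddings of the same
    identity i (for example, a face generated from a voice and a real
    face of that speaker). For every ordered pair (i, j) with i != j,
    the probe of identity i must be closer to its own target than to
    the imposter target j. Random guessing would score about 0.5.
    """
    n = len(probes)
    if n < 2 or n != len(targets):
        raise ValueError("need at least two identities, one target each")
    correct = 0
    total = 0
    for i in range(n):
        sim_true = cosine(probes[i], targets[i])
        for j in range(n):
            if j == i:
                continue
            total += 1
            if sim_true > cosine(probes[i], targets[j]):
                correct += 1
    return correct / total


# Toy usage with made-up 2-D embeddings for two identities.
probes = [[1.0, 0.0], [0.0, 1.0]]
targets = [[0.9, 0.1], [0.1, 0.9]]
accuracy = one_of_two_matching_accuracy(probes, targets)
```

Enumerating every imposter pair keeps the toy example deterministic; on a real test set one would more likely sample imposters at random and report the mean accuracy.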
Author Information
Yandong Wen (Carnegie Mellon University)
Bhiksha Raj (Carnegie Mellon University)
Rita Singh (Carnegie Mellon University)
More from the Same Authors
- 2022 Poster: USB: A Unified Semi-supervised Learning Benchmark for Classification
  Yidong Wang · Hao Chen · Yue Fan · Wang SUN · Ran Tao · Wenxin Hou · Renjie Wang · Linyi Yang · Zhi Zhou · Lan-Zhe Guo · Heli Qi · Zhen Wu · Yu-Feng Li · Satoshi Nakamura · Wei Ye · Marios Savvides · Bhiksha Raj · Takahiro Shinozaki · Bernt Schiele · Jindong Wang · Xing Xie · Yue Zhang
- 2021: HEAR 2021: Holistic Evaluation of Audio Representations + Q&A
  Joseph Turian · Jordan Shier · Bhiksha Raj · Bjoern Schuller · Christian Steinmetz · George Tzanetakis · Gissel Velarde · Kirk McNally · Max Henry · Nicolas Pinto · Yonatan Bisk · George Tzanetakis · Camille Noufi · Dorien Herremans · Jesse Engel · Justin Salamon · Prany Manocha · Philippe Esling · Shinji Watanabe
- 2020 Poster: Is normalization indispensable for training deep neural network?
  Jie Shao · Kai Hu · Changhu Wang · Xiangyang Xue · Bhiksha Raj
- 2020 Oral: Is normalization indispensable for training deep neural network?
  Jie Shao · Kai Hu · Changhu Wang · Xiangyang Xue · Bhiksha Raj
- 2017: Poster Session Music and environmental sounds
  Oriol Nieto · Jordi Pons · Bhiksha Raj · Tycho Tax · Benjamin Elizalde · Juhan Nam · Anurag Kumar
- 2012 Poster: Unsupervised Structure Discovery for Semantic Analysis of Audio
  Sourish Chaudhuri · Bhiksha Raj
- 2010 Poster: Multiparty Differential Privacy via Aggregation of Locally Trained Classifiers
  Manas A Pathak · Shantanu Rane · Bhiksha Raj
- 2009 Poster: A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds
  Paris Smaragdis · Madhusudana Shashanka · Bhiksha Raj