Deep neural network model of sound localization replicates “what” and “where” representations in auditory cortex
Abstract
Unlike in the visual cortex, where parallel pathways for object identification (“what”) and localization (“where”) are well established, whether the auditory cortex contains analogous pathways is debated. The auditory cortex also lacks a topographic map of auditory space comparable to the retinotopic map in the visual cortex. Here, we built a deep neural network to model auditory “what” and “where” representations. We trained the model for localization only, using two-channel audio waveforms of six sound types presented from 394 locations at three sound levels. Surprisingly, the middle layer of the model formed six well-separated clusters organized by sound type, but not by sound level. In the model’s last layer, sounds were further organized by spectrogram similarity: harmonic types clustered together, single-band types formed a separate group, and broadband noise lay apart from the single-band group. Sound-location representations were random in the first layer but gradually organized into patches, and occasionally into a map, by the last layer. However, formation of a spatial map did not improve localization performance. Together, our model suggests that the auditory cortex need not dissociate “what” from “where” or construct a map of auditory space.
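To make the training setup concrete, the following is a minimal sketch, not the authors' implementation, of a convolutional network that maps two-channel (binaural) waveforms to a choice among the 394 locations named in the abstract. The layer sizes, kernel widths, one-second clip length, and 16 kHz sample rate are illustrative assumptions.

# Minimal sketch of a localization-only network (assumptions noted in comments).
import torch
import torch.nn as nn

NUM_LOCATIONS = 394  # number of candidate source locations (from the abstract)

class LocalizationNet(nn.Module):
    def __init__(self, num_locations: int = NUM_LOCATIONS):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(2, 32, kernel_size=64, stride=4), nn.ReLU(),    # binaural input: 2 channels
            nn.Conv1d(32, 64, kernel_size=32, stride=4), nn.ReLU(),   # "middle" layers, where the abstract
            nn.Conv1d(64, 128, kernel_size=16, stride=4), nn.ReLU(),  # reports sound-type clusters emerging
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(128, num_locations)  # trained for localization only

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, 2, samples)
        h = self.features(waveform).squeeze(-1)
        return self.classifier(h)  # logits over the 394 locations

model = LocalizationNet()
clips = torch.randn(8, 2, 16000)  # batch of 8 one-second binaural clips (assumed sample rate)
logits = model(clips)             # shape: (8, 394)
loss = nn.CrossEntropyLoss()(logits, torch.randint(0, NUM_LOCATIONS, (8,)))

Under this kind of setup, the “what” and “where” analyses described in the abstract would be performed post hoc on the hidden-layer activations, since the training objective itself contains only location labels.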