Humans manipulate objects using all of their senses, including sound and touch: audio can indicate whether a door has been unlocked or an egg has been properly cracked. Prior work has shown that humans can use auditory feedback alone to categorize types of events and to infer continuous aspects of these events, such as the length of a wooden dowel being struck. However, microphones remain underexplored in robotics, especially their potential as tactile vibration sensors.

In this work, we investigate contact audio as an alternative tactile modality for complex manipulation tasks that are challenging from vision alone. Contact microphones record the vibrations of anything in direct contact with them at a high frequency (1000 times higher than the next common tactile sensor). This makes them well suited for use as tactile sensors when interacting with objects during manipulation. Furthermore, contact audio is immune to many of the environmental variations that plague vision, such as changes in lighting and color, making it promising for the transfer learning and multi-task settings that are common in robotics.