Robot Synesthesia: In-Hand Manipulation with Visuotactile Sensing
Abstract
Executing contact-rich manipulation tasks necessitates the fusion of tactile and visual feedback. However, the distinct nature of these modalities poses significant challenges. In this paper, we introduce a system that leverages visual and tactile sensory inputs to enable dexterous in-hand manipulation. Specifically, we propose \textbf{Robot Synesthesia}, a novel point cloud-based tactile representation inspired by human tactile-visual synesthesia. This approach allows for the simultaneous and seamless integration of both sensory inputs, offering richer spatial information and facilitating better reasoning about robot actions. Comprehensive ablations are performed on how the integration of vision and touch can improve reinforcement learning and Sim2Real performance.