Diffusion distance is a spectral method for measuring distance among nodes on graph considering global data structure. In this work, we propose a spec-diff-net for computing diffusion distance on graph based on approximate spectral decomposition. The network is a differentiable deep architecture consisting of feature extraction and diffusion distance modules for computing diffusion distance on image by end-to-end training. We design low resolution kernel matching loss and high resolution segment matching loss to enforce the network's output to be consistent with human-labeled image segments. To compute high-resolution diffusion distance or segmentation mask, we design an up-sampling strategy by feature-attentional interpolation which can be learned when training spec-diff-net. With the learned diffusion distance, we propose a hierarchical image segmentation method outperforming previous segmentation methods. Moreover, a weakly supervised semantic segmentation network is designed using diffusion distance and achieved promising results on PASCAL VOC 2012 segmentation dataset.