Deep Multimodal Multilinear Fusion with High-order Polynomial Pooling
Ming Hou · Jiajia Tang · Jianhai Zhang · Wanzeng Kong · Qibin Zhao

Wed Dec 11 05:00 PM -- 07:00 PM (PST) @ East Exhibition Hall B + C #61

Tensor-based multimodal fusion techniques have exhibited great predictive performance. However, one limitation is that existing approaches only consider bilinear or trilinear pooling, which restricts the order of interactions and fails to unleash the full expressive power of multilinear fusion. More importantly, simply fusing features all at once ignores the complex local intercorrelations, leading to a deterioration in prediction performance. In this work, we first propose a polynomial tensor pooling (PTP) block for integrating multimodal features by considering high-order moments, followed by a tensorized fully connected layer. Treating PTP as a building block, we further establish a hierarchical polynomial fusion network (HPFN) to recursively transmit local correlations into global ones. By stacking multiple PTPs, the expressive capacity of HPFN grows exponentially w.r.t. the number of layers, which is shown by its equivalence to a very deep convolutional arithmetic circuit. Various experiments demonstrate that HPFN achieves state-of-the-art performance.
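To make the idea of pooling "high-order moments" concrete, the following is a minimal NumPy sketch of one polynomial pooling step. It is not the authors' implementation. It assumes that modality features are concatenated with a constant 1, raised to a P-fold outer product, and mapped to an output by a weight tensor; it also assumes a CP (rank-R) factorization as one possible form of the tensorized layer, which the abstract does not specify. All function names, shapes, and parameters here (`augment`, `polynomial_pool_full`, `polynomial_pool_cp`, `cp_to_full`, `rank`) are hypothetical.

```python
import numpy as np

def augment(features):
    # Concatenate modality features and prepend a constant 1, so that the
    # P-fold outer product contains interaction terms of every degree <= P.
    return np.concatenate([[1.0]] + [np.asarray(x, dtype=float) for x in features])

def polynomial_pool_full(features, weight, order):
    # Explicit order-P pooling: build f (x) f (x) ... (x) f and contract it with
    # a dense weight tensor of shape (out_dim, d, ..., d). Cost grows as d**order,
    # so this is only feasible for tiny inputs.
    f = augment(features)
    pooled = f
    for _ in range(order - 1):
        pooled = np.multiply.outer(pooled, f)
    return np.tensordot(weight, pooled, axes=order)

def polynomial_pool_cp(features, factors, out_weights):
    # Assumed low-rank variant: the weight tensor is never materialized.
    # factors: list of P arrays of shape (rank, d); out_weights: (out_dim, rank).
    f = augment(features)
    prod = np.ones(out_weights.shape[1])
    for A in factors:
        prod = prod * (A @ f)  # one projection per polynomial order
    return out_weights @ prod

def cp_to_full(factors, out_weights):
    # Rebuild the dense weight tensor implied by the CP factors (for checking only).
    out_dim, rank = out_weights.shape
    d = factors[0].shape[1]
    full = np.zeros((out_dim,) + (d,) * len(factors))
    for r in range(rank):
        comp = factors[0][r]
        for A in factors[1:]:
            comp = np.multiply.outer(comp, A[r])
        full += np.multiply.outer(out_weights[:, r], comp)
    return full

# Toy example: two modalities (dims 3 and 2), order P = 3, rank 4, 5 outputs.
rng = np.random.default_rng(0)
feats = [rng.normal(size=3), rng.normal(size=2)]
d, order, rank, out_dim = 1 + 3 + 2, 3, 4, 5
factors = [rng.normal(size=(rank, d)) for _ in range(order)]
out_weights = rng.normal(size=(out_dim, rank))

y_cp = polynomial_pool_cp(feats, factors, out_weights)
y_full = polynomial_pool_full(feats, cp_to_full(factors, out_weights), order)
print(np.allclose(y_cp, y_full))
```

Under these assumptions, the factorized form costs on the order of P·R·d operations per output instead of d^P. A hierarchical network in the spirit of HPFN would apply such a block to local groups of features and feed the outputs into further blocks.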

Author Information

Ming Hou (RIKEN AIP)
Jiajia Tang (Hangzhou Dianzi University / RIKEN AIP)
Jianhai Zhang (Hangzhou Dianzi University)
Wanzeng Kong (Hangzhou Dianzi University)
Qibin Zhao (RIKEN AIP)
