A Compression Algorithm for Distributed LMMs with Different Information Fusion Techniques
Abstract
A typical large multimodal model (LMM) consists of multiple encoders, one per modality, for contextual encoding, and a decoder that combines all the sources before the generative process. Transmitting the encoder outputs, which may reside on different devices, can incur significant and, in resource-constrained environments, intolerable communication overhead. Motivated by Wyner-Ziv coding, which shows that multiple correlated sources admit considerable compression, we propose a novel compression algorithm and evaluate it in terms of semantic efficiency. The proposed algorithm is applied to two architectures representing different points on the performance-complexity tradeoff, namely incorporating the sources (i) at the input of the decoder (for best performance) and (ii) at its later layers (for fast inference). The results indicate that compression affects the fast-inference architecture less than the best-performance one over poor (noisy/low-throughput) channels, and that semantic similarity can be moderately preserved under certain circumstances. Moreover, for both approaches the performance drop is negligible beyond certain compression ratios.