Efficient Generative Multimodal Integration (EGMI): Enabling Audio Generation from Text-Image Pairs through Alignment with Large Language Models

Taemin Kim · Wooyeol Baek · Heeseok Oh

Abstract
