GenAIM: A Multimodal Artificial Intelligence Music Generation Web Tool Using Lyrics or Images
Abstract
As generative Artificial Intelligence (AI) rises, AI-based music generation has emerged as an increasingly prominent area of research. However, many existing neural music generation models depend heavily on large datasets, raising concerns about copyright infringement from training data and about the rising cost of improving performance. In contrast, we propose Generate AI Music (GenAIM), a multimodal AI music generation web tool for lyric-to-song, text-to-music, image-to-music, and image-to-song generation, powered by a novel, purely algorithm-driven music core. This music core accepts both lyrical and non-lyrical inputs, such as text and images, and because it is purely algorithmic it effectively mitigates the risk of copyright infringement. The generation process produces a pleasant, flowing melody that abides by music-theory, lyrical, and rhythmic conventions. Users can generate music through the web page nearly instantly; the page displays the generated sheet music and can play the audio for listening. Overall, GenAIM can serve as a co-pilot that inspires current composers and lowers the entry barrier for aspiring musicians who want to turn thoughts into music, using lyrics or visual imagery as a starting point. It does not encroach upon human creativity, instead serving as a reliable music composition assistant and a potential educational composition tutor. The tool is designed for individuals of all backgrounds, requires no formal music training, and offers a range of benefits, including entertainment, relaxation, and support for mental well-being. It can also be integrated into large language models (LLMs) to enable new music and audio features, advancing multimodal LLMs.