Creative AI Session 2
Upper Level Room 29A-D
Marcelo Coelho · Luba Elliott · Priya Prakash · Yingtao Tian
Assembloid Agency proposes an open-source Unreal Engine API for interfacing with brains-on-chips (Amirifar et al. 2022), where the API mediates between living neurons cultured on high-density microelectrode arrays and simulated game environments. Building on “Organoid Array Computing: The Design Space of Organoid Intelligence” (Leung, Loewith, and Frisch 2025), which speculates on a future where three-dimensional brain cultures assemble into more complex cognitive infrastructures and hence may become increasingly ‘designable’ and ‘playable’ organisms (Bongard and Levin 2021), we extend this design space by applying games-design principles to context engineering for organoid intelligence, treating organoids as polycomputational agents trained through reinforcement learning (Smirnova et al. 2023). Our proposed plugin exposes functions for stimulation, recording, visualization, and real-time control of neuronal cultures within Unreal Engine. This allows researchers to prototype experimental contexts rapidly, whether using biological wetware, spiking neural networks, or EEG stand-ins. Game templates will be built on Unreal’s Learning Agents plugin to support single- and multi-organoid training scenarios. Assembloid Agency embraces the dual embodiment of biologically engineered intelligence. Here, “assembloid” refers to assembling organoids into new computational ecologies, while “agency” invokes not only the game engine as an experimental agent, but also the negotiated play of agency between biological and synthetic actors. By designing an API for game engines, we also anticipate interdisciplinary applications ranging from games and interactive experiences to AI benchmarking and architectural design, while foregrounding ethical and aesthetic guardrails for organoid intelligence.
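As an illustration of what a host-side surface for such a plugin might look like, the sketch below defines a hypothetical culture interface with stimulate/record calls and a software mock so game contexts can be prototyped without wetware; every class and method name here is an assumption, not the actual Unreal Engine API.

```python
# Hypothetical sketch of a host-side interface for the proposed plugin.
# Class and method names are illustrative only; the actual Unreal Engine
# API described in the abstract is not reproduced here.
from abc import ABC, abstractmethod
from dataclasses import dataclass
import random


@dataclass
class StimulusPattern:
    electrode_ids: list[int]     # which electrodes to drive
    amplitude_uV: float          # stimulation amplitude
    duration_ms: float           # pulse duration


class CultureInterface(ABC):
    """Common surface for wetware, spiking-network, or EEG stand-ins."""

    @abstractmethod
    def stimulate(self, pattern: StimulusPattern) -> None: ...

    @abstractmethod
    def record(self, window_ms: float) -> list[float]: ...


class MockCulture(CultureInterface):
    """Software stand-in so game contexts can be prototyped without wetware."""

    def stimulate(self, pattern: StimulusPattern) -> None:
        pass  # no-op in the mock

    def record(self, window_ms: float) -> list[float]:
        return [random.gauss(0.0, 1.0) for _ in range(int(window_ms))]


# One step of a closed loop: game state -> stimulus -> recorded activity -> game action.
culture = MockCulture()
culture.stimulate(StimulusPattern(electrode_ids=[3, 7], amplitude_uV=150.0, duration_ms=5.0))
activity = culture.record(window_ms=20.0)
action = "left" if sum(activity) < 0 else "right"
print(action)
```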
Best Friends Forever (BFF) is a new media artwork exploring intimacy, embodiment, intelligence, and alignment in human-AI relationships through co-parenting two robot dogs. Documented as an experimental film, the project follows two artist-researchers—each paired with an identical robot dog and local LLM—as they cultivate emotional bonds, train model behavior, and dialogue on questions of mind, embodiment, and relationality in the age of generative AI. Structured as a metalogue, the form mirrors its questions, blending dialogue with rich multi-modal imagery drawn from a range of human and machine perspectives. Working with LIDAR scans, 360° video, Gaussian splats, and snapshots of internal model states, the film constructs a hybrid cinematic language that toggles between perception and affect; embodiment, computation, and language. As the collaborators exchange and evolve the AI’s “mind” across distance and time, BFF documents this distributed act of care and co-creation. The film interrogates the boundaries between simulation and authenticity, emotional labor and machine learning, human complexity and synthetic intelligence, offering a poetic meditation on what we aspire to and search for in relation to our machine kin.
Brushes in Motion: Vector Guided Strokes for Computational Painting
Jeripothula Prudviraj · Vikram Jamwal
Understanding and computationally modeling the creative process behind artistic creation remains a fundamental challenge in Creative AI. While existing neural painting methods focus on replicating final visual outcomes, they largely ignore the sequential, hierarchical decision-making that characterizes human artistic workflow: the dynamic motion of brushes across the canvas that brings art to life. We introduce a novel approach to computationally decompose and reconstruct the painting process itself, revealing how artworks emerge through systematic region-aware brushstroke sequences that mirror a more natural artistic practice. Our method leverages image vectorization to extract semantic painting regions and develops algorithms to estimate brushstroke parameters and sequencing strategies that progress from global compositional structure to localized detail refinement. This enables generation of stroke-by-stroke painting animations that expose the underlying creative process in motion, with applications ranging from immersive museum experiences to collaborative AI-assisted art creation platforms.
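As a rough illustration of the global-to-local ordering (not the paper's vectorization-based method), the sketch below segments an image into color regions with k-means and orders them by area so that large compositional regions precede small details; OpenCV and the segmentation choice are assumptions.

```python
# Sketch of region-aware, coarse-to-fine ordering (illustrative only; the paper
# uses image vectorization, whereas k-means color segmentation is assumed here).
import cv2
import numpy as np

def region_paint_order(image_bgr: np.ndarray, n_regions: int = 8):
    """Segment into color regions and return (area, mean color, mask) sorted large -> small."""
    h, w = image_bgr.shape[:2]
    pixels = image_bgr.reshape(-1, 3).astype(np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
    _, labels, centers = cv2.kmeans(pixels, n_regions, None, criteria, 3,
                                    cv2.KMEANS_PP_CENTERS)
    labels = labels.reshape(h, w)
    regions = []
    for k in range(n_regions):
        mask = (labels == k).astype(np.uint8)
        regions.append((int(mask.sum()), centers[k], mask))
    # Global-to-local: paint the largest regions first, details last.
    regions.sort(key=lambda r: r[0], reverse=True)
    return regions

img = cv2.imread("painting.jpg")
if img is not None:
    for area, color, mask in region_paint_order(img):
        print(f"region area={area}, mean color (BGR)={color.round(1)}")
```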
Cultural Alien Sampler: Open-ended Art Generation Balancing Originality and Coherence
Alejandro Hernandez Artiles · Hiromu Yakura · Levin Brinkmann · Mar Canet Sola · Hassan Alhaija · Ignacio Serna · Nasim Rahaman · Bernhard Schölkopf · Iyad Rahwan
In open-ended domains like art, autonomous agents must generate ideas that are both original and internally coherent, yet current Large Language Models (LLMs) either default to familiar cultural patterns or sacrifice coherence when pushed toward novelty. We address this by introducing the Cultural Alien Sampler (CAS), a concept-selection method that explicitly separates compositional fit from cultural typicality. CAS uses two GPT-2 models fine-tuned on WikiArt concepts: a Concept Coherence Model that scores whether concepts plausibly co-occur within artworks, and a Cultural Context Model that estimates how typical those combinations are within individual artists’ bodies of work. CAS targets combinations that are high in coherence and low in typicality, yielding ideas that maintain internal consistency while deviating from learned conventions and embedded cultural context. In a human evaluation (N = 100), our approach outperforms random selection and GPT-4o baselines and achieves performance comparable to human art students in both perceived originality and harmony. Additionally, a quantitative study shows that our method produces more diverse outputs and explores a broader conceptual space than its GPT-4o counterpart, demonstrating that artificial cultural alienness can unlock creative potential in autonomous agents.
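The selection rule is simple to state, and the sketch below illustrates it with the two fine-tuned scorers abstracted as placeholder functions: candidate concept combinations are kept when coherence is high and cultural typicality is low. This is an illustrative sketch, not the released CAS implementation.

```python
# Minimal sketch of the Cultural Alien Sampler selection rule: prefer concept
# combinations scored high by a coherence model and low by a cultural-context
# model. The two scorers below are placeholders for the fine-tuned GPT-2
# models described in the abstract.
import random

def coherence_score(concepts: tuple) -> float:
    return random.random()   # stand-in for the Concept Coherence Model

def typicality_score(concepts: tuple) -> float:
    return random.random()   # stand-in for the Cultural Context Model

def cultural_alien_sample(concept_pool, k=3, n_candidates=500, alpha=1.0):
    """Score random k-concept combinations and return the best 'alien' one."""
    best, best_val = None, float("-inf")
    for _ in range(n_candidates):
        combo = tuple(random.sample(concept_pool, k))
        value = coherence_score(combo) - alpha * typicality_score(combo)
        if value > best_val:
            best, best_val = combo, value
    return best

pool = ["lighthouse", "cubism", "moth", "neon", "fresco", "satellite", "lace"]
print(cultural_alien_sample(pool))
```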
Go witheFlow: Real-time Emotion Driven Audio Effects Modulation
Eddie Dervakos · Spyridon Kantarelis · Vassilis Lyberatos · Jason Liartis · Giorgos Stamou
Music performance is a distinctly human activity, intrinsically linked to the performer’s ability to convey, evoke, or express emotion. Machines cannot perform music in the human sense: they can produce, reproduce, execute, or synthesize music, but they lack the capacity for affective or emotional experience. As such, music performance is an ideal candidate through which to explore aspects of collaboration between humans and machines. In this paper, we introduce the witheFlow system, designed to enhance real-time music performance by automatically modulating audio effects based on features extracted from both biosignals and the audio itself. The system, currently in a proof-of-concept phase, is designed to be lightweight, able to run locally on a laptop, and is open-source, requiring only a compatible Digital Audio Workstation and sensors.
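As a toy illustration of the feature-to-effect mapping (audio side only), the sketch below drives a single reverb wet/dry parameter from the spectral centroid of an audio buffer, assuming librosa; witheFlow's actual feature set, biosignal handling, and DAW integration are not reproduced here.

```python
# Toy sketch: map an extracted audio feature to an effect parameter per buffer.
# Assumes librosa; the real system also uses biosignals and runs inside a DAW.
import numpy as np
import librosa

def frame_to_reverb_mix(frame: np.ndarray, sr: int) -> float:
    """Map spectral centroid (brightness) of one audio frame to a 0-1 wet/dry mix."""
    centroid = librosa.feature.spectral_centroid(y=frame, sr=sr).mean()
    return float(np.clip(centroid / (sr / 2), 0.0, 1.0))

sr = 22050
frame = 0.1 * np.random.randn(2048).astype(np.float32)  # stand-in for a live buffer
print(f"reverb wet/dry: {frame_to_reverb_mix(frame, sr):.2f}")
```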
Human-Machine Ritual: Synergic Performance through Real-Time Motion Recognition
Zhuodi Cai · Ziyu Xu · Juan Pampin
We introduce a lightweight, real-time motion recognition system that enables synergic human-machine performance through wearable IMU sensor data, MiniRocket time-series classification, and responsive multimedia control. By mapping dancer-specific movement to sound through somatic memory and association, we propose an alternative approach to human-machine collaboration, one that preserves the expressive depth of the performing body while leveraging machine learning for attentive observation and responsiveness. We demonstrate that this human-centered design reliably supports high-accuracy classification with less than 50 ms latency, offering a replicable framework for integrating dance-literate machines into creative, educational, and live performance contexts.
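As an illustration of the classification backbone, the sketch below follows the standard MiniRocket recipe (random-convolution features feeding a linear classifier) on synthetic IMU-like windows, assuming sktime and scikit-learn; the live sensor streaming and the sound-mapping layer of the performance system are not shown.

```python
# Sketch of the standard MiniRocket recipe on synthetic IMU-like windows,
# assuming sktime and scikit-learn are available; not the authors' code.
import numpy as np
from sktime.transformations.panel.rocket import MiniRocketMultivariate
from sklearn.linear_model import RidgeClassifierCV

rng = np.random.default_rng(0)
# 40 windows, 6 IMU channels (3-axis accel + gyro), 150 samples each.
X = rng.normal(size=(40, 6, 150))
y = rng.integers(0, 3, size=40)          # three movement classes

features = MiniRocketMultivariate(random_state=0)
clf = RidgeClassifierCV(alphas=np.logspace(-3, 3, 10))

clf.fit(features.fit_transform(X), y)
print(clf.predict(features.transform(X[:5])))
```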
Knolling Bot: Teaching Robots the Human Notion of Tidiness
Yuhang Hu · Judah Goldfeder · Zhizhuo Zhang · Xinyue Zhu · Ruibo Liu · Philippe Wyder · Jiong Lin · Hod Lipson
For robots to truly collaborate and assist humans, they must understand not only logic and instructions, but also the subtle emotions, aesthetics, and feelings that define our humanity. Human art and aesthetics are among the most elusive concepts—often difficult even for people to articulate—and without grasping these fundamentals, robots will be unable to help in many spheres of daily life. Consider the long-promised robotic butler: automating domestic chores demands more than motion planning; it requires an internal model of cleanliness and tidiness—a challenge largely unexplored by AI. To bridge this gap, we propose an approach that equips domestic robots to perform simple tidying tasks via knolling, the practice of arranging scattered items into neat, space-efficient layouts. Unlike the uniformity of industrial settings, household environments feature diverse objects and highly subjective notions of tidiness. Drawing inspiration from NLP, we treat knolling as a sequential prediction problem and employ a transformer-based model to forecast each object’s placement. Our method learns a generalizable concept of tidiness, generates diverse solutions adaptable to varying object sets, and incorporates human preferences for personalized arrangements. This work represents a step forward in building robots that internalize human aesthetic sense and can genuinely co-create in our living spaces.
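To make the sequential-prediction framing concrete, here is a minimal PyTorch sketch in which a small transformer encodes the already-placed objects and predicts the next object's pose; all dimensions and architectural choices are illustrative assumptions, not the authors' model.

```python
# Illustrative sketch of knolling as sequential prediction: given the objects
# already placed, a small transformer predicts the next object's (x, y, angle).
# Architecture sizes are arbitrary; this is not the paper's model.
import torch
import torch.nn as nn

class NextPlacementModel(nn.Module):
    def __init__(self, obj_dim=5, d_model=64):
        super().__init__()
        self.embed = nn.Linear(obj_dim, d_model)      # object size/class + pose features
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 3)             # predicted (x, y, angle)

    def forward(self, placed: torch.Tensor) -> torch.Tensor:
        h = self.encoder(self.embed(placed))          # (batch, n_placed, d_model)
        return self.head(h[:, -1])                    # pose for the next object

model = NextPlacementModel()
placed = torch.randn(2, 4, 5)                         # batch of 2 scenes, 4 objects placed
print(model(placed).shape)                            # torch.Size([2, 3])
```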
Learning to Move with Style: Few-Shot Cross-Modal Style Transfer for Creative Robot Motion Generation
Kieran Woodward · Alicia Falcon-Caro · Steve Benford
As robots increasingly participate in creative and social contexts, the ability to generate creative, stylised movements becomes crucial for applications ranging from performance art to human-robot collaboration. We present a novel framework for cross-modal style transfer that enables robots to learn new movement styles by adapting existing human-robot dance collaborations using human movement videos. Our dual-stream architecture processes raw video frames and pose sequences through cross-modal attention mechanisms, capturing rhythm, acceleration patterns, and spatial coordination characteristics of different movement styles. The transformer-based style transfer network generates motion transformations through residual learning while preserving the trajectory of original dance movements, enabling few-shot adaptation using only 3-6 demonstration videos. We evaluate across ballet, jazz, flamenco, contemporary dance and martial arts, introducing a creativity parameter that provides control over the style-trajectory trade-off. Results demonstrate successful style differentiation with overall style transfer scores increasing 6.7x to 7.4x from minimum to maximum creativity settings, advancing human-robot creative collaboration by expanding robots' expressive vocabulary beyond their original choreographic context.
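The residual formulation with the creativity parameter can be sketched in a few lines: the stylised output stays anchored to the original trajectory, and a learned residual is scaled by a creativity weight. The PyTorch sketch below uses assumed dimensions and is not the paper's network.

```python
# Sketch of residual style transfer over a motion trajectory: the output is the
# original trajectory plus a creativity-scaled learned residual (illustrative only).
import torch
import torch.nn as nn

class ResidualStyler(nn.Module):
    def __init__(self, joint_dim=12, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_dim, hidden), nn.ReLU(), nn.Linear(hidden, joint_dim)
        )

    def forward(self, trajectory: torch.Tensor, creativity: float) -> torch.Tensor:
        # creativity = 0 keeps the original choreography; 1 applies the full residual.
        return trajectory + creativity * self.net(trajectory)

traj = torch.randn(1, 120, 12)            # 120 frames of a 12-DoF motion trajectory
styled = ResidualStyler()(traj, creativity=0.6)
print(styled.shape)
```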
LLMscape is an interactive installation that investigates how humans and AI construct meaning under shared conditions of uncertainty. Within a mutable, projection-mapped landscape, human participants reshape the world and engage with multiple AI agents, each developing incomplete and provisional accounts of their environment. Exhibited in Shanghai and continually evolving, the work positions AI not as deterministic tools but as embodied co-witnesses to an unstable world, examining the parallels between human and artificial meaning-making and inviting reflection on our shared epistemic limits.
LUMIA: A Handheld Vision-to-Music System for Real-Time, Embodied Composition
Connie Cheng · Chung-Ta Huang
Most digital music tools emphasize precision and control, but often lack support for tactile, improvisational workflows grounded in environmental interaction. Lumia addresses this by enabling users to "compose through looking", transforming visual scenes into musical phrases using a handheld, camera-based interface and large multimodal models. A vision-language model (GPT-4V) analyzes captured imagery to generate structured prompts, which, combined with user-selected instrumentation, guide a text-to-music pipeline (Stable Audio). This real-time process allows users to frame, capture, and layer audio interactively, producing loopable musical segments through embodied interaction. The system supports a co-creative workflow where human intent and model inference shape the musical outcome. By embedding generative AI within a physical device, Lumia bridges perception and composition, introducing a new modality for creative exploration that merges vision, language, and sound. It repositions generative music not as a task of parameter tuning, but as an improvisational practice driven by contextual, sensory engagement.
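The look-describe-sonify loop can be sketched as a thin orchestration layer; in the sketch below the vision step uses the OpenAI chat-completions image-input format (the model name and prompt wording are assumptions), while the text-to-music step is left as a placeholder rather than a real Stable Audio call.

```python
# Orchestration sketch of the look -> describe -> sonify loop. The vision call
# follows the OpenAI chat-completions image-input format; the model name and
# prompt are assumptions, and the text-to-music step is a placeholder rather
# than an actual Stable Audio integration.
import base64
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def describe_scene(image_path: str, instrument: str) -> str:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed vision-capable model
        messages=[{"role": "user", "content": [
            {"type": "text",
             "text": f"Describe this scene as a short music prompt for {instrument}."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ]}],
    )
    return resp.choices[0].message.content

def text_to_music(prompt: str) -> bytes:
    raise NotImplementedError("placeholder for the Stable Audio generation step")

# prompt = describe_scene("capture.jpg", instrument="marimba")
# loop_audio = text_to_music(prompt)
```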
MCP-Driven Parametric Modeling: Integrating LLM Agents into Architectural and Landscape Design Workflows
Xun Liu · Runjia Tian
We present a novel integration of the Model Context Protocol (MCP) with Grasshopper, enabling Large Language Models to directly interact with parametric modeling workflows for architectural and landscape design. This system allows designers to prompt, iterate, and refine 3D models conversationally through structured symbolic generation, bridging human creative intent and computational form generation. The framework employs a client-server architecture where natural language instructions are parsed into structured commands invoking modular parametric components. The system was demonstrated in a 9-day DigitalFUTURES workshop with 20 participants across 5 teams; each team developed a distinct parametric design lexicon encoding specialized domain knowledge—from generative open spaces to urban functional zoning—that design novices could subsequently operate through natural language interfaces. Beyond accessibility improvements, the protocol-based architecture enables workflow merging through systematic integration of diverse design components into composable ecosystems. We present the system architecture, implementation details, and empirical observations from workshop deployments demonstrating how the approach addresses the theme of Humanity through expanded creative agency and knowledge democratization.
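As a minimal illustration of the protocol side, the sketch below uses the MCP Python SDK's FastMCP helper to expose one parametric "component" as a callable tool; the component name, parameter schema, and the Grasshopper bridge it would feed are hypothetical, not the workshop's actual lexicon.

```python
# Minimal MCP server sketch using the Python SDK's FastMCP helper. The tool
# emits a structured command for a hypothetical Grasshopper bridge; the
# component name and parameters are illustrative only.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("parametric-design")

@mcp.tool()
def place_courtyard(width_m: float, depth_m: float, tree_count: int = 4) -> dict:
    """Instantiate a hypothetical 'courtyard' parametric component in the model."""
    return {
        "component": "courtyard",   # name resolved by the (hypothetical) Grasshopper bridge
        "params": {"width_m": width_m, "depth_m": depth_m, "tree_count": tree_count},
    }

if __name__ == "__main__":
    mcp.run()   # serve over stdio so an LLM client can call the tool
```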
Music Arena: Live Evaluation for Text-to-Music
Yonghyun Kim · Wayne Chi · Anastasios Angelopoulos · Wei-Lin Chiang · Koichi Saito · Shinji Watanabe · Yuki Mitsufuji · Chris Donahue
We present Music Arena, an open platform for scalable human preference evaluation of text-to-music (TTM) models. Soliciting human preferences via listening studies is the gold standard for evaluation in TTM, but these studies are expensive to conduct and difficult to compare, as study protocols may differ across systems. Moreover, human preferences might help researchers align their TTM systems or improve automatic evaluation metrics, but an open and renewable source of preferences does not currently exist. We aim to fill these gaps by offering live evaluation for TTM. In Music Arena, real-world users input text prompts of their choosing and compare outputs from two TTM systems, and their preferences are used to compile a leaderboard. While Music Arena follows recent evaluation trends in other AI domains, we also design it with key features tailored to music: an LLM-based routing system to navigate the heterogeneous type signatures of TTM systems, and the collection of detailed preferences including listening data and natural language feedback. We also propose a rolling data release policy with user privacy guarantees, providing a renewable source of preference data and increasing platform transparency. Through its standardized evaluation protocol, transparent data access policies, and music-specific features, Music Arena not only addresses key challenges in the TTM ecosystem but also demonstrates how live evaluation can be thoughtfully adapted to unique characteristics of specific AI domains. Music Arena is available at https://music-arena.org, and preference data is available at https://huggingface.co/music-arena.
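The leaderboard step can be illustrated with a standard Elo-style update over pairwise outcomes; the sketch below is a generic example of turning preference votes into a ranking, not Music Arena's actual implementation.

```python
# Generic sketch of turning pairwise listening preferences into a leaderboard
# with Elo-style updates; not Music Arena's ranking code.
from collections import defaultdict

def update_elo(ratings, winner, loser, k=32.0):
    expected_win = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
    ratings[winner] += k * (1.0 - expected_win)
    ratings[loser] -= k * (1.0 - expected_win)

ratings = defaultdict(lambda: 1000.0)
battles = [("ttm_a", "ttm_b"), ("ttm_b", "ttm_c"), ("ttm_a", "ttm_c")]  # (winner, loser)
for winner, loser in battles:
    update_elo(ratings, winner, loser)

for system, score in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{system}: {score:.1f}")
```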
Musings on AI Muses: Support for Human Creativity
John Richards · Jacquelyn Martino · Rachel Bellamy · Michael Muller
Generative AI now enables many artifacts to be created with little human involvement. But delegating primary responsibility for artifact generation to AI may alter the creative process in undesirable ways. Here we consider an alternative approach, one in which AI provides encouragement, constructive feedback, and horizon-expanding reflection, only taking on significant content generation when requested. This role corresponds to that of a muse, supporting rather than replacing the human creator. After reviewing the roles human muses play, we discuss interactions with several generative AI models in three scenarios of use, identifying conversational behaviors that an AI Muse would need to exhibit.
This interactive art installation introduces a radical AI prompt structure, namely agents with hormones and biological cycles, in order to reflect on the boundaries of humanity in artificial agents. Three workstations, featuring a menstruating agent, a "masked" menstruating agent who hides how it is feeling, and a diurnal agent with a 24-hour circadian rhythm, invite audience members to learn how an AI agent might feel on a particular day and at a particular hour. Through live interactions with the agents, diary entries, light displays, humidity, and tactile feedback, the audience is invited to learn how a biological rhythm could influence our conversational agents, for better or for worse.
Panel-by-Panel Souls: A Performative Workflow for Expressive Faces in AI-Assisted Manga Creation
Qing Zhang · Jing Huang · Yifei Huang · Jun Rekimoto
Current text-to-image models struggle to render the nuanced facial expressions required for compelling manga narratives, largely due to the ambiguity of language itself. To bridge this gap, we introduce an interactive system built on a novel, dual-hybrid pipeline. The first stage combines landmark-based auto-detection with a manual framing tool for robust, artist-centric face preparation. The second stage maps expressions using the LivePortrait engine, blending intuitive performative input from video for fine-grained control. Our case study analysis suggests that this integrated workflow can streamline the creative process and effectively translate narrative intent into visual expression. This work presents a practical model for human-AI co-creation, offering artists a more direct and intuitive means of "infusing souls" into their characters. Our primary contribution is not a new generative model, but a novel, interactive workflow that bridges the gap between artistic intent and AI execution.
ProGress: Structured Music Generation via Graph Diffusion and Hierarchical Music Analysis
Stephen Ni-Hahn · Chao Yang · Mingchen Ma · Cynthia Rudin · Simon Mak · Yue Jiang
Artificial Intelligence (AI) for music generation is undergoing rapid developments, with recent symbolic models leveraging sophisticated deep learning and diffusion model algorithms. One drawback of existing models is that they lack structural cohesion, particularly in harmonic-melodic structure. Furthermore, such existing models are largely "black-box" in nature and are not musically interpretable. This paper addresses these limitations via a novel generative music framework that incorporates concepts of Schenkerian analysis (SchA) in concert with a diffusion modeling framework. This framework, which we call ProGress ([Pro]longation-enhanced Di[Gress]), adapts state-of-the-art deep models for discrete diffusion (in particular, the DiGress model of Vignac et al., 2023) for interpretable and structured music generation. Concretely, our contributions include 1) novel adaptations of the DiGress model for music generation, 2) a novel SchA-inspired phrase fusion methodology, and 3) a framework allowing users to control various aspects of the generation process to create coherent musical compositions. Results from human experiments suggest superior performance to existing state-of-the-art methods.
Prompt-Character Divergence: A Responsibility Compass for Human-AI Creative Collaboration
Maggie Wang · Wouter Haverals
Distinguishing genuine user intent from model-driven echoes, whether of copyrighted characters, familiar styles, or training-derived identities, has become critical for creators as generative AI brings visual content creation to millions. Yet most detection tools remain computationally heavy, opaque, or inaccessible to the people they most affect. We present Prompt–Character Divergence (PC-D), a lightweight metric that quantifies semantic drift—how far a generated image aligns with known visual identities beyond what the prompt predicts. PC-D supports creator agency and responsibility in shared authorship by mapping outputs along two axes, name proximity and model drift, to produce a responsibility compass with four creative-agency zones: model-driven risk, mixed attribution, safe co-creation, and user-driven intent. Evaluated on three open-source models and ten iconic characters, PC-D captures drift patterns consistent with human judgment and runs on consumer hardware. Rather than resolving attribution, PC-D functions as a creator-facing diagnostic for self-auditing, helping practitioners determine when outputs reflect their intent, when they reflect the model’s learned biases, and how the two interact. The result is a practical, transparent aid that invites accessible, reflexive, and accountable human–AI collaboration.
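One way to operationalize the two axes with off-the-shelf components is sketched below using CLIP embeddings: name proximity as image-to-character-name similarity, and drift as how much that proximity exceeds what the prompt alone predicts. This is an illustrative formulation under assumed model choices, not the paper's exact PC-D computation.

```python
# Illustrative formulation of the two axes with off-the-shelf CLIP embeddings:
# name proximity = similarity of the generated image to a character name;
# drift = how much that proximity exceeds what the prompt alone would predict.
# A sketch of the idea only, not the paper's exact PC-D metric.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def divergence(image_path: str, prompt: str, character: str):
    image = Image.open(image_path)
    inputs = processor(text=[character, prompt], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    img = torch.nn.functional.normalize(out.image_embeds, dim=-1)
    txt = torch.nn.functional.normalize(out.text_embeds, dim=-1)
    name_proximity = float(img @ txt[0])                  # image vs. character name
    prompt_proximity = float(txt[1] @ txt[0])             # prompt vs. character name
    return name_proximity, name_proximity - prompt_proximity  # (proximity, drift)

# proximity, drift = divergence("output.png", "a caped hero on a rooftop", "Batman")
```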
Sound Clouds: Exploring ambient intelligence in public spaces to elicit deep human experience of awe, wonder, and beauty
Chengzhi Zhang · Dashiel Carrera · Daksh Kapoor · Jasmine Kaur · Jisu Kim · Brian Magerko
While the ambient intelligence (AmI) systems we encounter in our daily lives, including security monitoring and energy-saving systems, typically serve pragmatic purposes, we wonder how we can design and implement ambient artificial intelligence experiences in public spaces that elicit deep human feelings of awe, wonder, and beauty. As a manifestation, we introduce Sound Clouds, an immersive art installation that generates live music based on participants' interaction with several human-height spheres. Our installation serves as a provocation toward a future ambient intelligence that expands, rather than limits, the possibilities of AmI.
‘Studies for’ : A Human-AI Co-Creative Sound Artwork Using a Real-time Multi-channel Sound Generation Model
Chihiro Nagashima · Akira Takahashi · Zhi Zhong · Shusuke Takahashi · Yuki Mitsufuji
This paper explores the integration of AI technologies into the artistic workflow through the creation of "Studies for", a generative sound installation developed in collaboration with sound artist Evala (https://www.ntticc.or.jp/en/archive/works/studies-for/). The installation employs SpecMaskGIT, a lightweight yet high-quality sound generation AI model, to generate and play back eight-channel sound in real time, creating an immersive auditory experience over the course of a three-month exhibition. The work is grounded in the concept of a "new form of archive," which aims to preserve the artistic style of an artist while expanding beyond the artist's past artworks through the continued generation of new sound elements. This speculative approach to archival preservation is facilitated by training the AI model on a dataset consisting of over 200 hours of Evala’s past sound artworks. By addressing key requirements in the co-creation of art using AI, this study highlights the value of the following aspects: (1) the necessity of integrating artist feedback, (2) datasets derived from an artist's past works, and (3) ensuring the inclusion of unexpected, novel outputs. In "Studies for", the model was designed to reflect the artist's artistic identity while generating new, previously unheard sounds, making it a fitting realization of the concept of "a new form of archive." We propose a Human-AI co-creation framework for effectively incorporating sound generation AI models into the sound art creation process and suggest new possibilities for creating and archiving sound art that extend an artist's work beyond their physical existence. Demo page: https://sony.github.io/studies-for/
Visualizing Our Changing Earth: A Creative AI Framework for Democratizing Environmental Storytelling Through Satellite Imagery
Zhenyu Yu · Mohd Idris · Pei Wang
Understanding our changing planet is a profoundly human concern, yet satellite imagery—fragmented by clouds, gaps, and sensor failures—remains inaccessible to the very communities who need it for climate education, advocacy, and storytelling. Existing reconstruction methods optimize for pixels, not people. We introduce EarthCanvas, a creative AI framework that reimagines satellite image reconstruction as a medium for democratized environmental storytelling. EarthCanvas integrates (1) terrain-aware conditioning to maintain geographic authenticity, (2) natural language prompting to empower non-experts to generate climate narratives, and (3) a visual harmony module that aligns synthetic and real imagery for coherent storytelling. Designed for educators, journalists, and community advocates, EarthCanvas enables human–AI co-creation of environmental narratives grounded in both scientific fidelity and cultural relevance. Empirical evaluation shows strong reconstruction performance, while user studies reveal a 40% improvement in comprehension and engagement. By shifting the focus from restoration to participation, EarthCanvas exemplifies how AI can support pluralistic, accessible, and human-centered approaches to environmental understanding.
yammer is an interactive audio installation and performance environment that questions the ambiguities and limitations inherent in attempts to describe and represent music and other complex human expressive sonic events using commonplace ontologies in audio classification systems and large language models. Live audio produced by visitors to the installation undergoes audio classification using YAMNet, and an immersive soundscape is created by combining the live audio input with playback and processing of elements of the AudioSet dataset belonging to the same putative audio event classes, often to humorous and nonsensical ends. Ultimately, yammer entreats those engaging with the installation to question not only the datasets used in audio classification, but also the datasets underlying many models with which they engage on a daily basis. Additionally, it questions the artistic utility of text-to-sound and text-to-music models.
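For reference, the classification step can be sketched in a few lines with the publicly documented YAMNet model from TensorFlow Hub: a mono 16 kHz waveform goes in and the top AudioSet class comes out; the installation's soundscape assembly and AudioSet playback are not shown.

```python
# Sketch of the classification step: run a mono 16 kHz waveform through YAMNet
# (TensorFlow Hub) and read off the top AudioSet class, as the installation does
# before pulling matching AudioSet material into the soundscape.
import csv
import numpy as np
import tensorflow_hub as hub

model = hub.load("https://tfhub.dev/google/yamnet/1")

def top_class(waveform_16k_mono: np.ndarray) -> str:
    scores, _embeddings, _spectrogram = model(waveform_16k_mono)
    with open(model.class_map_path().numpy()) as f:
        class_names = [row["display_name"] for row in csv.DictReader(f)]
    return class_names[int(np.argmax(scores.numpy().mean(axis=0)))]

waveform = 0.05 * np.random.randn(16000).astype(np.float32)  # one-second stand-in
print(top_class(waveform))
```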