Fri 7:15 a.m. - 7:30 a.m. | Welcome and Introduction (Livestream)
Fri 7:30 a.m. - 8:30 a.m. | Poster Session I (Discord)
Fri 8:30 a.m. - 9:00 a.m. | When Technology Changes Art (Livestream) | Invited talk by Aaron Hertzmann
Fri 9:00 a.m. - 9:30 a.m. | Emotional Glossary of Creative AI (Livestream) | Invited talk by Alexa Steinbrück
Fri 9:30 a.m. - 10:00 a.m. | Imagenary Patterns with Diffusion Models (Livestream) | Invited talk by Mohammad Norouzi
Fri 10:00 a.m. - 10:30 a.m. | Stable Diffusion and Friends (Livestream) | Invited talk by Robin Rombach
Fri 10:30 a.m. - 11:00 a.m. | Q&A Panel Discussion 1 (Livestream + Discord) | Aaron Hertzmann, Alexa Steinbrück, and Robin Rombach, moderated by Bokar N'Diaye. Post questions to #q-and-a on Discord.
Fri 11:00 a.m. - 11:30 a.m. | Art Show (Livestream)
Fri 11:30 a.m. - 12:00 p.m. | Social 1 (Discord) | Social break on Discord
Fri 12:00 p.m. - 12:07 p.m. | Paper Spotlight: Personalizing Text-to-Image Generation via Aesthetic Gradients (Livestream)
This work proposes aesthetic gradients, a method to personalize a CLIP-conditioned diffusion model by guiding the generative process towards custom aesthetics defined by the user from a set of images. The approach is validated with qualitative and quantitative experiments, using the recent Stable Diffusion model and several aesthetically filtered datasets. Code is released at https://github.com/vicgalle/stable-diffusion-aesthetic-gradients
Victor Gallego

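The core idea lends itself to a short sketch: average the CLIP embeddings of the user's reference images, then take a few gradient steps on the CLIP text encoder so the prompt embedding drifts toward that aesthetic before it conditions the diffusion model. The code below is a minimal, illustrative approximation (using the OpenAI CLIP package), not the released implementation; see the repository linked above for the real code.

    import torch
    import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    def aesthetic_embedding(image_paths):
        """Normalized mean CLIP embedding of the user's aesthetic image set."""
        with torch.no_grad():
            feats = torch.cat([
                model.encode_image(preprocess(Image.open(p)).unsqueeze(0).to(device))
                for p in image_paths
            ])
        feats = feats / feats.norm(dim=-1, keepdim=True)
        e = feats.mean(dim=0)
        return e / e.norm()

    def personalize(prompt, image_paths, steps=10, lr=1e-4):
        """A few gradient steps pulling the prompt embedding toward the aesthetic."""
        e = aesthetic_embedding(image_paths)
        tokens = clip.tokenize([prompt]).to(device)
        opt = torch.optim.SGD(model.transformer.parameters(), lr=lr)
        for _ in range(steps):
            text_emb = model.encode_text(tokens)[0]
            loss = -torch.nn.functional.cosine_similarity(text_emb, e, dim=0)
            opt.zero_grad()
            loss.backward()
            opt.step()
        with torch.no_grad():
            # Use this personalized embedding to condition the diffusion model.
            return model.encode_text(tokens)[0]
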
Fri 12:08 p.m. - 12:15 p.m. | Paper Spotlight: Surreal VR Pong: LLM approach to Game Design (Livestream)
The increase in complexity from 2D to 3D game design makes it fascinating to study from a computational creativity perspective. Generating images from text descriptions with models like DALL-E has recently surged in popularity. However, these models are limited to generating 2-dimensional outputs. While their outputs can be used to stylize 3D objects with variable textures, they cannot produce mesh-level interactions. We introduce Codex VR Pong as a demonstration of controlled non-deterministic game mechanics leveraging generative models. We propose that prompt-based creation can become part of gameplay rather than just part of game development.
Jasmine Roberts · Andrzej Banburski · Jaron Lanier

Fri 12:15 p.m. - 12:22 p.m. | Paper Spotlight: How do Musicians Experience Jamming with a Co-Creative “AI”? (Livestream)
This paper describes a study in which several musicians were invited to jam with an “AI agent”. Behind the scenes, the agent was actually a human keyboard player. In interviews about the experience, the musicians revealed that they had taken a different attitude to musical interaction than they normally do with human musicians. They had lower expectations of the system's musicality and therefore felt less constrained by musical rules. A perceived freedom from judgement allowed some of the musicians to feel less self-conscious about their own performance.
Notto J. W. Thelle · Rebecca Fiebrink

Fri 12:22 p.m. - 12:30 p.m. | Paper Spotlight: Visualizing Semantic Walks (Livestream)
An embedding space trained from both a large language model and a vision model contains semantic aspects of both and provides connections between words, images, concepts, and styles. This paper visualizes characteristics and relationships in this semantic space. We traverse multi-step paths in a derived semantic graph to reveal hidden connections created from the immense amount of data used to create these models. We specifically examine these relationships in the domain of painters, their styles, and their subjects. Additionally, we present a novel, non-linear sampling technique to create informative visualizations of semantic graph transitions.
Shumeet Baluja · David Marwood

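As a rough illustration of what a "semantic walk" involves, the sketch below embeds a vocabulary of concepts, links each concept to its nearest neighbours, and greedily walks from one concept toward another. `embed` is a placeholder for any joint text/image encoder (for example, a CLIP text tower); the graph construction and walk are illustrative rather than the authors' exact procedure.

    import numpy as np

    def build_knn_graph(concepts, embed, k=5):
        """Connect each concept to its k most similar concepts under `embed`."""
        vecs = np.stack([embed(c) for c in concepts])
        vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
        sims = vecs @ vecs.T
        np.fill_diagonal(sims, -np.inf)          # no self-edges
        nbrs = np.argsort(-sims, axis=1)[:, :k]
        graph = {concepts[i]: [concepts[j] for j in nbrs[i]] for i in range(len(concepts))}
        return graph, vecs

    def semantic_walk(graph, vecs, concepts, start, goal, max_steps=8):
        """Greedy walk: at each node, step to the neighbour closest to the goal."""
        idx = {c: i for i, c in enumerate(concepts)}
        path, current = [start], start
        for _ in range(max_steps):
            if current == goal:
                break
            nxt = max(graph[current], key=lambda n: float(vecs[idx[n]] @ vecs[idx[goal]]))
            if nxt in path:                      # avoid cycling between close concepts
                break
            path.append(nxt)
            current = nxt
        return path
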
Fri 12:30 p.m. - 12:38 p.m. | Artwork Spotlight: Flowers for Frankenstein’s Monster (Livestream) | Derrick Schultz
Fri 12:38 p.m. - 12:46 p.m. | Artwork Spotlight: Machine Reflections: A Self-Portrait Series (Livestream) | Orsolya Szantho
Fri 12:46 p.m. - 12:54 p.m. | Artwork Spotlight: Ducking Jorn (Livestream) | Filippo Fedeli
Fri 1:00 p.m. - 2:00 p.m. | PORTAGING (live AI Performance) (Discord)
Portaging is a mode of transportation: carrying a watercraft over land between lakes and rivers. It requires humans working together with artificial tools they have created and with natural elements of land, water, and wind. In PORTAGING we explore the use of artificial intelligence to move beyond the boundaries of a single human's creativity into a collective co-creativity, augmented and uplifted by machine learning. Join us in the Discord #performance channel for an hour of live visual AI storytelling by humans, the storytelling AI Dramatron, several image-generation AIs (Stable Diffusion, Imagen/Parti, and more), and you: audience reactions to alternative images influence the direction of the story.
Kory Mathewson · Piotr Mirowski · Hannah Johnston · Tom White · Jason Baldridge

Fri 2:00 p.m. - 3:00 p.m. | Poster Session 2 (Discord)
Fri 3:00 p.m. - 3:30 p.m. | Anastasiia Raina (Livestream) | Invited talk by Anastasiia Raina
Fri 3:30 p.m. - 4:00 p.m. | Ambiguous and Alluring / Imagined by AI (Livestream) | Invited talk by Eunsu Kang
Fri 4:00 p.m. - 4:30 p.m. | AI for Anime in the Diffusion Era (Livestream) | Invited talk by Yanghua Jin
Fri 4:30 p.m. - 5:00 p.m. | On Production-Grade Generative Modelling for Singing Voice Synthesis (Livestream) | Invited talk by Kanru Hua
Fri 5:00 p.m. - 5:30 p.m. | Q&A Panel Discussion 2 (Livestream + Discord) | Anastasiia Raina, Eunsu Kang, Yanghua Jin, and Kanru Hua, moderated by Yingtao Tian. Post questions to #q-and-a on Discord.
Fri 5:30 p.m. - 6:00 p.m. | Art Show (rebroadcast) (Livestream)
Fri 6:00 p.m. - 6:15 p.m. | Closing Remarks (Livestream)
Fri 6:15 p.m. - 7:00 p.m. | Social 2 (Discord)

- | Instrument Separation of Symbolic Music by Explicitly Guided Diffusion Model (Poster)
Similar to colorization in computer vision, instrument separation assigns instrument labels (e.g., piano, guitar, ...) to notes from unlabeled mixtures that contain only performance information. To address the problem, we adopt diffusion models and explicitly guide them to preserve consistency between mixtures and music. The quantitative results show that our proposed model can generate high-fidelity samples for multitrack symbolic music with creativity.
Sangjun Han · Hyeongrae Ihm · DaeHan Ahn · Woohyung Lim

- | Videogenic: Video Highlights via Photogenic Moments (Poster)
This paper investigates the challenge of extracting highlight moments from videos. To perform this task, a system needs to understand what constitutes a highlight for arbitrary video domains while at the same time being able to scale across different domains. Our key insight is that photographs taken by photographers tend to capture the most remarkable or photogenic moments of an activity. Drawing on this insight, we present Videogenic, a system capable of creating domain-specific highlight videos for a wide range of domains. In a human evaluation study (N=50), we show that a high-quality photograph collection combined with CLIP-based retrieval (which uses a neural network with semantic knowledge of images) can serve as an excellent prior for finding video highlights. In a within-subjects expert study (N=12), we demonstrate the usefulness of Videogenic in helping video editors create highlight videos with a lighter workload, shorter task completion time, and better usability.
David Chuan-En Lin · Fabian Caba · Joon-Young Lee · Oliver Wang · Nikolas Martelaro

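To make the retrieval prior concrete, here is a minimal sketch: embed the photograph collection and the video frames with an image encoder, score each frame by its best match in the collection, and keep the highest-scoring windows. `embed_image` stands in for a CLIP image encoder, and the smoothing and selection logic below is a simplification, not the system described in the paper.

    import numpy as np

    def highlight_scores(frame_images, photo_images, embed_image):
        """Score each video frame by its best CLIP match in the photo collection."""
        photos = np.stack([embed_image(p) for p in photo_images])
        photos /= np.linalg.norm(photos, axis=1, keepdims=True)
        frames = np.stack([embed_image(f) for f in frame_images])
        frames /= np.linalg.norm(frames, axis=1, keepdims=True)
        return (frames @ photos.T).max(axis=1)

    def top_segments(scores, fps=30, clip_seconds=5, n=3):
        """Pick the n highest-scoring, non-overlapping windows of frames."""
        win = fps * clip_seconds
        window_scores = np.convolve(scores, np.ones(win) / win, mode="valid")
        picks, used = [], np.zeros(len(window_scores), dtype=bool)
        for idx in np.argsort(-window_scores):
            if not used[max(0, idx - win):idx + win].any():
                picks.append((int(idx), int(idx) + win))  # (start_frame, end_frame)
                used[idx] = True
            if len(picks) == n:
                break
        return picks
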
- | Co-writing screenplays and theatre scripts alongside language models using Dramatron (Poster)
Language models are increasingly attracting interest from writers, but lack long-range semantic coherence, limiting their usefulness for longform creative writing. We address this limitation by applying language models hierarchically in a system we call Dramatron. By building structural context via prompt chaining, Dramatron can generate coherent scripts and screenplays complete with title, characters, story beats, location descriptions, and dialogue. We illustrate Dramatron’s usefulness as an interactive co-creative system with a user study of 15 theatre and film industry professionals. Participants co-wrote theatre scripts and screenplays with Dramatron and engaged in open-ended interviews. We report reflections both from our interviewees and from independent reviewers who watched productions of the works. Finally, we discuss the suitability of Dramatron for human-machine co-creativity, ethical considerations (including plagiarism and bias), and participatory models for the design and deployment of such tools.
Piotr Mirowski · Kory Mathewson · Jaylen Pittman · Richard Evans

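The prompt-chaining idea can be sketched in a few lines: each level of the script (logline, characters, beats, scene dialogue) is generated from a prompt assembled out of the previous levels' outputs, so long-range structure comes from the chain rather than from a single long context. `generate` is a placeholder for any language-model completion call, and the prompts are illustrative, not Dramatron's actual templates.

    def cowrite_script(logline, generate, n_scenes=3):
        """Build a script top-down: characters and beats come from the logline,
        and each scene's dialogue is prompted with everything generated so far."""
        characters = generate(
            f"Logline: {logline}\nList the main characters with one-line descriptions:")
        beats = generate(
            f"Logline: {logline}\nCharacters:\n{characters}\n"
            f"Write {n_scenes} story beats, one per line:")
        scenes = []
        for i, beat in enumerate(beats.strip().splitlines()[:n_scenes], start=1):
            location = generate(f"Story beat: {beat}\nDescribe the scene's location in two sentences:")
            dialogue = generate(
                f"Logline: {logline}\nCharacters:\n{characters}\n"
                f"Scene {i} beat: {beat}\nLocation: {location}\n"
                f"Write the dialogue for scene {i}:")
            scenes.append({"beat": beat, "location": location, "dialogue": dialogue})
        return {"logline": logline, "characters": characters, "scenes": scenes}
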
- | High-Resolution Image Editing via Multi-Stage Blended Diffusion (Poster)
Diffusion models have shown great results in image generation and in image editing. However, current approaches are limited to low resolutions due to the computational cost of training diffusion models for high-resolution generation. We propose an approach that uses a pre-trained low-resolution diffusion model to edit images in the megapixel range. We first use Blended Diffusion to edit the image at a low resolution, and then upscale it in multiple stages, using a super-resolution model and Blended Diffusion. Using our approach, we achieve higher visual fidelity than by only applying off-the-shelf super-resolution methods to the output of the diffusion model. We also obtain better global consistency than directly using the diffusion model at a higher resolution.
Johannes Ackermann · Minjun Li

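The staging logic reads roughly as follows: edit once at the diffusion model's native resolution, then alternate super-resolution with further masked (blended) editing passes until the target resolution is reached. In the sketch below, `blended_diffusion_edit` and `super_resolve` are placeholders for a Blended Diffusion editor and an off-the-shelf super-resolution model; the loop structure, not the exact calls, is what the abstract describes.

    from PIL import Image

    def resize(img: Image.Image, size: int) -> Image.Image:
        # Assumes square images for brevity.
        return img.resize((size, size), Image.LANCZOS)

    def edit_high_res(image, mask, prompt, blended_diffusion_edit, super_resolve,
                      base_res=512, target_res=2048, factor=2):
        # Stage 0: edit at the resolution the diffusion model was trained for.
        edited = blended_diffusion_edit(resize(image, base_res), resize(mask, base_res), prompt)
        res = base_res
        # Later stages: upscale, then re-run the masked edit so the upscaled
        # region stays consistent with the surrounding pixels.
        while res < target_res:
            res *= factor
            edited = super_resolve(edited, factor)
            edited = blended_diffusion_edit(edited, resize(mask, res), prompt)
        return edited
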
- | Language Does More Than Describe: On The Lack Of Figurative Speech in Text-To-Image Models (Poster)
The impressive capacity shown by recent text-to-image diffusion models to generate high-quality pictures from textual input prompts has fueled the debate about the very definition of art. Nonetheless, these models have been trained using text data collected from content-based labelling protocols that focus on describing the items and actions in an image but neglect any subjective appraisal. Consequently, these automatic systems need rigorous descriptions of the elements and the pictorial style of the image to be generated, otherwise failing to deliver. As potential indicators of the actual artistic capabilities of current generative models, we characterise the sentimentality, objectiveness, and degree of abstraction of publicly available text data used to train current text-to-image diffusion models. Considering the sharp difference observed between their language style and that typically employed in artistic contexts, we suggest generative models should incorporate additional sources of subjective information in their training in order to overcome (or at least alleviate) some of their current limitations, thus effectively unleashing truly artistic and creative generation.
Ricardo Kleinlein · Cristina Luna-Jiménez · Fernando Fernández-Martínez

- | Intentional Dance Choreography with Semi-Supervised Recurrent VAEs (Poster)
We summarize the model and results of PirouNet, a semi-supervised recurrent variational autoencoder. Given a set of dance sequences of which 1% include qualitative choreographic annotations, PirouNet conditionally generates dance sequences in the style and intention of the choreographer.
Mathilde Papillon · Mariel Pettee · Nina Miolane

- | Visualizing Semantic Walks (Poster)
An embedding space trained from both a large language model and a vision model contains semantic aspects of both and provides connections between words, images, concepts, and styles. This paper visualizes characteristics and relationships in this semantic space. We traverse multi-step paths in a derived semantic graph to reveal hidden connections created from the immense amount of data used to create these models. We specifically examine these relationships in the domain of painters, their styles, and their subjects. Additionally, we present a novel, non-linear sampling technique to create informative visualizations of semantic graph transitions.
Shumeet Baluja · David Marwood

- | Personalizing Text-to-Image Generation via Aesthetic Gradients (Poster)
This work proposes aesthetic gradients, a method to personalize a CLIP-conditioned diffusion model by guiding the generative process towards custom aesthetics defined by the user from a set of images. The approach is validated with qualitative and quantitative experiments, using the recent Stable Diffusion model and several aesthetically filtered datasets. Code is released at https://github.com/vicgalle/stable-diffusion-aesthetic-gradients
Victor Gallego

- | 3DGEN: A GAN-based approach for generating novel 3D models from image data (Poster)
Recent advances in text and image synthesis show great promise for the future of generative models in creative fields. A less explored area is 3D model generation, with many potential applications in game design, video production, and physical product design. In our paper, we present 3DGEN, a model that leverages recent work on both Neural Radiance Fields for object reconstruction and GAN-based image generation. We show that the proposed architecture can generate plausible meshes for objects of the same category as the training images and compare the resulting meshes with state-of-the-art baselines, showing visible improvements in generation quality.
Antoine Schnepf · Ugo Tanielian · Flavian Vasile

- | CICADA: Interface for Concept Sketches Using CLIP (Poster)
From Stable Diffusion to DALL·E 2, state-of-the-art models for high-resolution text-to-image generation seem to arrive nearly every week [1, 2], along with the promise of significant disruption in the creative industries. However, professional designers, from illustrators to architects to engineers, use low-fidelity representations like sketches to refine their understanding of the problem, rather than for developing completed solutions [3, 4]. Conceptual stages of design have been operationalised as the co-evolution of problem and solution “spaces” [5]. We introduce the Collaborative, Interactive, Context-Aware Design Agent (CICADA) [6], which uses CLIP-guided [7] synthesis-by-optimisation to support conceptual designing. Building on previous approaches [8], we optimize a set of Bézier curves to match a given text prompt. In CICADA, users sketch collaboratively with the system in real time. Users maintain editorial control, although additions to both the optimiser and the interaction model enable designers and CICADA to influence one another by engaging with the sketch. CICADA provides an instrument to explore how text-to-image generative systems can assist designers, so we conducted a qualitative user study to explore its impact on designing.
Tomas Lawton

- | VideoMap: Video Editing in Latent Space (Poster)
Video has become a dominant form of media. However, video editing interfaces have remained largely unchanged over the past two decades. Such interfaces typically consist of a grid-like asset management panel and a linear editing timeline. When working with a large number of video clips, it can be difficult to sort through them all and identify patterns within (e.g. opportunities for smooth transitions and storytelling). In this work, we imagine a new paradigm for video editing by mapping videos into a 2D latent space and building a proof-of-concept interface.
David Chuan-En Lin · Fabian Caba · Joon-Young Lee · Oliver Wang · Nikolas Martelaro

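One simple way to realize such a map, sketched below, is to embed a representative set of frames per clip with an image encoder and project the embeddings to 2D so that similar clips land near each other on the canvas. The choice of encoder (`embed_image` is a placeholder) and of t-SNE for the projection are assumptions made for illustration, not necessarily the authors' pipeline.

    import numpy as np
    from sklearn.manifold import TSNE

    def video_map(clip_frames, embed_image):
        """clip_frames: one list of frames per video clip; returns one (x, y) per clip."""
        embeddings = np.stack([
            np.mean([embed_image(f) for f in frames], axis=0) for frames in clip_frames
        ])
        embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)
        coords = TSNE(n_components=2, metric="cosine", init="random").fit_transform(embeddings)
        return coords  # positions for laying clips out on a 2D editing canvas
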
- | How do Musicians Experience Jamming with a Co-Creative “AI”? (Poster)
This paper describes a study in which several musicians were invited to jam with an “AI agent”. Behind the scenes, the agent was actually a human keyboard player. In interviews about the experience, the musicians revealed that they had taken a different attitude to musical interaction than they normally do with human musicians. They had lower expectations of the system's musicality and therefore felt less constrained by musical rules. A perceived freedom from judgement allowed some of the musicians to feel less self-conscious about their own performance.
Notto J. W. Thelle · Rebecca Fiebrink

- | Botto: A Decentralized Autonomous Artist (Poster)
Botto is an experiment in creating a decentralized autonomous artist that generates art based on community feedback. Every week, Botto creates and sells artwork via a series of models and crowdsourced community evaluations by over 3,000 people, who also decide how to manage the artist and its sales. We present a formal description of how Botto works and its implications for creative machine learning.
Mario Klingemann · Simon Hudson · Ziv Epstein

- | Surreal VR Pong: LLM approach to Game Design (Poster)
The increase in complexity from 2D to 3D game design makes it fascinating to study from a computational creativity perspective. Generating images from text descriptions with models like DALL-E has recently surged in popularity. However, these models are limited to generating 2-dimensional outputs. While their outputs can be used to stylize 3D objects with variable textures, they cannot produce mesh-level interactions. We introduce Codex VR Pong as a demonstration of controlled non-deterministic game mechanics leveraging generative models. We propose that prompt-based creation can become part of gameplay rather than just part of game development.
Jasmine Roberts · Andrzej Banburski · Jaron Lanier

- | Not All Artists Speak English: Generating images with DALL-E 2 from Portuguese (Poster)
DALL-E 2 supports basic input in a number of languages, such as Spanish, German, and Portuguese. This is an exciting development for non-English-speaking artists and designers who could benefit from using the software. However, results from prompts given in other languages sometimes lack accuracy and differ in stylistic choices. In this work, we explore the differences in aesthetic quality and accuracy of results produced by DALL-E 2 from paired text inputs in English and Portuguese.
Gretchen Eggers

- | Datasets That Are Not: Evolving Novelty Through Sparsity and Iterated Learning (Poster)
Creative machines have long been a subject of interest for generative modeling research. One goal of machine creativity research is to create machine processes that adapt to data in order to develop new creative directions, which may inspire users or provide creative expansions of current ideas. Several works propose models which leverage data-driven deep learning approaches to generate "out-of-domain" or novel samples that deviate from the dataset on which these models are trained. In these existing works, generative model weights are only optimized on real datasets, rather than incorporating model-generated outputs back into the training loop. In this work, we propose Datasets That Are Not, a procedure that expands the scope of a generative model by accumulating generated samples and iteratively training the model on this expanding dataset, in addition to the given training data. Specifically, we build on Digits that Are Not, a sparsity-based autoencoder used as the inner generative model, chosen for the variety and novelty of its outputs when trained on the standard MNIST dataset. Our results show that by learning on generated data, the model effectively reinforces its own hallucinations, directing generated outputs in new and unexpected directions away from the initial training data while retaining core semantics.
Yusong Wu · Kyle Kastner · Tim Cooijmans · Cheng-Zhi Anna Huang · Aaron Courville

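The iterated-learning loop itself is simple enough to sketch: train, sample, fold the samples back into the training set, and repeat. `model.fit` and `model.sample` are placeholders for any generative model (the paper uses a sparsity-based autoencoder); only the loop structure is taken from the description above.

    def iterated_training(model, real_data, rounds=5, samples_per_round=1000):
        """Train, sample, and fold the samples back into the training set."""
        dataset = list(real_data)
        for _ in range(rounds):
            model.fit(dataset)                           # train on real + accumulated generated data
            generated = model.sample(samples_per_round)  # draw new samples from the current model
            dataset.extend(generated)                    # the model sees its own outputs next round
        return model, dataset
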
- | Towards Real-Time Text2Video via CLIP-Guided, Pixel-Level Optimization (Poster)
We introduce an approach to generating videos based on a series of given language descriptions. Frames of the video are generated sequentially and optimized with guidance from the CLIP image-text encoder, iterating through the language descriptions and weighting the current description more heavily than the others. As opposed to optimizing through an image generator model itself, which tends to be computationally heavy, the proposed approach computes the CLIP loss directly at the pixel level, achieving general content at a speed suitable for near-real-time systems. The approach can generate videos in up to 720p resolution, variable frame rates, and arbitrary aspect ratios at a rate of 1-2 frames per second. Please visit our website to view videos and access our open-source code: https://pschaldenbrand.github.io/text2video/.
Peter Schaldenbrand · Zhixuan Liu · Jean Oh

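The pixel-level idea can be sketched as follows: treat each frame as a tensor of pixels, optimize it directly against a CLIP similarity loss for the current description, and initialize each new frame from the previous one so the video evolves smoothly. This is an illustrative approximation of the approach described above (single-description loss, rounded normalization constants, arbitrary hyperparameters); see the linked code for the real implementation.

    import torch
    import torch.nn.functional as F
    import clip

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, _ = clip.load("ViT-B/32", device=device)

    def encode_frame(frame):
        # frame: (3, H, W) in [0, 1]; CLIP expects a 224x224 normalized input.
        x = F.interpolate(frame.unsqueeze(0), size=224, mode="bilinear", align_corners=False)
        mean = torch.tensor([0.4815, 0.4578, 0.4082], device=device).view(1, 3, 1, 1)
        std = torch.tensor([0.2686, 0.2613, 0.2758], device=device).view(1, 3, 1, 1)
        return model.encode_image((x - mean) / std)

    def generate_video(descriptions, height=720, width=1280, steps_per_frame=30, lr=0.03):
        frame = torch.rand(3, height, width, device=device)
        frames = []
        for text in descriptions:
            target = model.encode_text(clip.tokenize([text]).to(device)).detach()
            frame = frame.detach().requires_grad_(True)   # warm-start from the previous frame
            opt = torch.optim.Adam([frame], lr=lr)
            for _ in range(steps_per_frame):
                loss = -F.cosine_similarity(encode_frame(frame.clamp(0, 1)), target).mean()
                opt.zero_grad()
                loss.backward()
                opt.step()
            frames.append(frame.detach().clamp(0, 1).cpu())
        return frames
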
- | Sequence Modeling Motion-Captured Dance (Poster)
By treating dance as a long sequence of tokenized human motion data, we build a system that can synthesize novel dance motions. We train a transformer architecture on motion-captured data represented as a sequence of characters. By prompting the model with different sequences or task tokens, we can generate motions conditioned on the movement of a single joint, or the motion of a specific dance move.
Emily Napier · Gavia Gray · Sageev Oore

- | A programmable interface for creative exploration (Poster)
Recent advances in generative models, especially large language models (LLMs), have enabled the emergence of new AI-powered tools to support creativity. These models generate both text and images given a human prompt expressing an intention in natural language. However, to better respond to these prompts, they often require expertise to fine-tune the models' responses to match user needs, which can be a barrier for non-expert users. In this paper, we present a spatial canvas with a set of AI-powered tools with predefined prompts to support creative exploration for non-expert users. Our results show how non-expert users can explore a creative space by combining their own input with the generated responses.
Gerard Serra · Oriol Domingo · Pol Baladas