

CtrlGen: Controllable Generative Modeling in Language and Vision

Steven Y. Feng · Dor Arad Hudson · Tatsunori Hashimoto · Dongyeop Kang · Varun Prashant Gangal · Anusha Balakrishnan · Joel Tetreault

Mon 13 Dec, 8 a.m. PST

Over the past few years, interest in language and image generation has grown within the community. As text generated by models like GPT-3 has begun to sound more fluent and natural, and images and videos generated by GAN models have come to appear more realistic, researchers have begun focusing on qualitative properties of the generated content, such as the ability to control its style and structure or to incorporate information from external sources into the output. Such aims are extremely important for making language and image generation useful for human-machine interaction and other real-world applications, including machine co-creativity, entertainment, reducing biases or toxicity, and improving conversational agents and personal assistants.

Achieving these ambitious but important goals introduces challenges not only from NLP and Vision perspectives, but also ones that pertain to Machine Learning as a whole, which has witnessed a growing body of research in relevant domains such as interpretability, disentanglement, robustness, and representation learning. We believe that progress towards the realization of human-like language and image generation may benefit greatly from insights and progress in these and other ML areas.

In this workshop, we propose to bring together researchers from the NLP, Vision, and ML communities to discuss the current challenges and to explore potential directions for controllable generation and for improving its quality, correctness, and diversity. As excitement about language and image generation has increased significantly in recent years thanks to the advent and improvement of language models, Transformers, and GANs, we feel this is an opportune time to hold a new workshop on this subject. We hope CtrlGen will foster discussion and interaction across communities and cultivate fruitful cross-domain relationships that open the door to enhanced controllability in language and image generation.
