Symbolic Graphics Programming with Large Language Models
Abstract
Large language models (LLMs) have demonstrated exceptional capabilities in understanding and generating computer programs. Motivated by this effectiveness, we focus on a special class of programs, symbolic graphics programs (SGPs), which can be rendered into graphical content (e.g., images or 3D objects). However, the ability of LLMs to generate SGPs remains underexplored and insufficiently evaluated. In this paper, we investigate the task of symbolic graphics programming, where the goal is to generate an SGP from a natural language description. This task also serves as a lens into how LLMs understand the visual world, since it prompts them to produce images rendered from the programs they write. Among various SGPs, scalable vector graphics (SVGs) are among the most representative and are widely available on the Internet, making them an ideal testbed for studying symbolic graphics programming with LLMs. We focus on two key research questions: (1) how well can LLMs draw using SGPs, and (2) how can we improve their ability to generate SGPs? To address the first question, we introduce SGP-GenBench, a comprehensive benchmark that evaluates LLMs' ability to generate SGPs from three perspectives: object, scene, and composition. We conduct extensive evaluations of both proprietary and open-source LLMs, revealing substantial limitations in their current symbolic graphics programming capabilities. To improve performance, we propose a reinforcement learning (RL) approach that uses the similarity between the visual encoding of the rendered SVG and the input text description as the reward signal. This enables LLMs to progressively improve SVG generation quality and semantic alignment during training. Our experiments show that RL significantly boosts the symbolic graphics programming abilities of LLMs, ultimately achieving performance comparable to state-of-the-art closed-source models.
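To make the reward design concrete, the following is a minimal sketch of the similarity-based reward described above. It assumes a CLIP-style setup in which a rendered SVG and the text prompt are mapped into a shared embedding space; here the encoder outputs are stand-in NumPy vectors rather than a real model, and the function names (`svg_reward`, `cosine_similarity`) are illustrative, not from the paper.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def svg_reward(image_embedding: np.ndarray, text_embedding: np.ndarray) -> float:
    """Reward for one RL rollout: similarity between the embedding of the
    rendered SVG (from a visual encoder) and the embedding of the prompt
    (from the paired text encoder)."""
    return cosine_similarity(image_embedding, text_embedding)


# Toy embeddings standing in for encoder outputs (e.g., a CLIP-style model).
rng = np.random.default_rng(0)
text_emb = rng.normal(size=512)
aligned_img_emb = text_emb + 0.1 * rng.normal(size=512)  # faithful rendering
unrelated_img_emb = rng.normal(size=512)                 # off-prompt rendering

# A rendering that matches the description earns a higher reward, which is
# the signal that drives the policy toward better semantic alignment.
print(svg_reward(aligned_img_emb, text_emb) > svg_reward(unrelated_img_emb, text_emb))
```

In the actual training loop, each sampled SVG program would be rendered to an image, encoded, and scored this way; the scalar reward then feeds a standard policy-gradient update.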