SigSpace: an LLM-based agent for drug response signature interpretation
Abstract
Agent systems powered by large language models (LLMs) are increasingly applied in computational biology to automate analysis, integrate data, and accelerate discovery. Here, we investigate the capacity of LLM-driven agents to interpret transcriptional response signatures of drug perturbations in cancer cell lines, a task central to understanding drug mechanisms of action (MoAs) and supporting cancer drug discovery. Leveraging the Tahoe-100M dataset of 100 million transcriptomic profiles across 1,100 small-molecule perturbations and 50 cancer cell lines, we developed an agent system (SigSpace) that processes differential gene expression signatures and generates concise, human-readable summaries of drug responses. We then tested whether blinded response signature summaries could be correctly matched to their corresponding drug identity or MoA. Our results show that LLM-generated summaries consistently outperform random baselines, and that both the choice of LLM and the choice of signature score significantly influence performance. These findings highlight the potential of LLMs to augment the interpretation of complex transcriptional data and suggest future opportunities for improving summarization fidelity, benchmarking predictive performance across output formats, and extending the approach to additional datasets and tasks in cancer drug discovery.