Alvessa: An Agentic Evidence-Grounded Research Assistant for Genomics
Abstract
The rapid proliferation of genomic data, large-scale experiments, and biological foundation models presents major opportunities for biological discovery, but also creates significant integration challenges. Researchers often face a landscape of heterogeneous databases with inconsistent formats, a challenge compounded by the difficulty of integrating this static knowledge with dynamic predictions from foundation models. Furthermore, knowledge from publications quickly becomes outdated and disconnected from new evidence. To address these gaps, we present Alvessa, a research assistant that orchestrates modular agents to perform user-intent understanding, context-specific tool calling, reasoning, and evidence-backed summarization. Alvessa integrates a diverse array of genetic databases, specialized foundation models, and bioinformatics tools, and dynamically selects tools needed for a given query. Unlike conventional portals that return data without reasoning, or general-purpose language models whose conclusions may be outdated due to static training data, Alvessa actively retrieves relevant evidence for a given query, reasons over it to synthesize a conclusion, and presents both the answer and its supporting evidence in an interactive interface.
For reproducible assessment, we introduce GenomeArena, a suite of curated benchmarks for evaluating the framework's core components. This modular approach allows for granular measurement of performance across key tasks, including the precision of entity extraction, the accuracy of tool selection, the coherence of reasoning, and the reliability of evidence verification. This provides a detailed view of the strengths and weaknesses of the system in the genomic landscape.
Our results show high performance within the GenomeArena benchmarks, while also highlighting current limitations, particularly reduced performance when necessary databases are not yet integrated. These findings clarify both the power of agentic systems to deliver transparent, evidence-based genomic reasoning and the critical need for continued expansion of their toolsets and knowledge bases.