Spotlight in Workshop: AI for Science: The Reach and Limits of AI for Scientific Discovery

Every Answer Counts: Enhancing Scientific Discovery with Efficient Entity-Centric Question Answering from Long Contexts

Binyamin Perets · Zohar Shnaider · Shie Mannor · Dvir Aran

Project Page [OpenReview]

Abstract

Entity-centric question answering (ECQA) is the problem of selecting which entities from a large, predefined set are most relevant to a given set of observations. For example, given the genes active in a disease, scientists want to identify which biological processes are involved. ECQA is a fundamental challenge for LLM-based scientific discovery: while LLMs can process complex knowledge, obtaining reliable answers from long, heterogeneous inputs remains largely unattainable. Current approaches rely mostly on consensus aggregation or extensive validation, but these methods incur token costs that scale poorly with input complexity, leading to "token explosion."

We introduce ARISE (Adaptive Residual Information Sampling Engine), a framework that reframes ECQA as a multi-armed bandit problem with side observations. Our key insight is that each query provides noisy side observations about related entities, which can be recycled for statistically grounded updates and a more efficient query policy. ARISE employs the DUETS Bandit, a novel online learning algorithm with dual expert advisors: a GraphExpert that leverages entity co-occurrence and a NoiseExpert that strategically selects queries to maximize expected observation quality. This process is supported by Confirmation Atoms, a set of well-known procedures for validating scientific knowledge, which assess outputs and update the system's internal beliefs.

Together, these components enable statistically rigorous hypothesis testing with formal p-values while dramatically reducing query complexity. To validate ARISE, we use the hallmark challenge of pathway enrichment analysis, with 180+ annotated gene expression datasets that we collected from three common benchmarks.
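To make the "multi-armed bandit with side observations" framing concrete, here is a minimal toy sketch in Python. It is not the DUETS Bandit and does not reproduce the GraphExpert, the NoiseExpert, or Confirmation Atoms; it swaps in plain Beta-Bernoulli Thompson sampling. The function name, the parameters `side_noise` and `side_weight`, the entity labels, and the neighbor graph are all hypothetical, chosen only for illustration. The one idea it shows is that a query about one entity can also yield noisy evidence about related entities, and that this evidence can be reused with reduced weight instead of being discarded.

```python
import random


def run_side_observation_bandit(true_relevance, neighbors, rounds=200,
                                side_noise=0.2, side_weight=0.5, seed=0):
    """Toy Thompson-sampling bandit with noisy side observations.

    true_relevance: entity -> probability that a query reports it as relevant
                    (hypothetical ground truth, unknown to the policy).
    neighbors:      entity -> list of related entities (hypothetical graph).
    side_noise:     probability that a side observation is flipped.
    side_weight:    fractional pseudo-count credited to each side observation.
    """
    rng = random.Random(seed)
    # Beta(1, 1) prior per entity, stored as [alpha, beta].
    posterior = {entity: [1.0, 1.0] for entity in true_relevance}
    query_counts = {entity: 0 for entity in true_relevance}

    for _ in range(rounds):
        # Thompson sampling: draw a plausible relevance for every entity
        # and query the entity with the highest draw.
        chosen = max(posterior, key=lambda e: rng.betavariate(*posterior[e]))
        query_counts[chosen] += 1

        # Direct observation for the queried entity gets full weight.
        relevant = rng.random() < true_relevance[chosen]
        posterior[chosen][0 if relevant else 1] += 1.0

        # Side observations about related entities: noisier, so they are
        # recycled with a smaller weight rather than thrown away.
        for other in neighbors.get(chosen, []):
            signal = rng.random() < true_relevance[other]
            if rng.random() < side_noise:
                signal = not signal
            posterior[other][0 if signal else 1] += side_weight

    return posterior, query_counts


if __name__ == "__main__":
    # Hypothetical entities (e.g. candidate pathways) and a ring-shaped graph.
    relevance = {"A": 0.9, "B": 0.5, "C": 0.2, "D": 0.1}
    graph = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
    beliefs, counts = run_side_observation_bandit(relevance, graph)
    for entity, (alpha, beta) in beliefs.items():
        print(entity, "queries:", counts[entity],
              "posterior mean:", round(alpha / (alpha + beta), 3))
```

In this sketch every query updates up to three entities at once, so beliefs about rarely queried entities still sharpen over time. A fixed `side_weight` is the simplest possible choice; a real system would presumably calibrate how much to trust each side observation.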
