From Sensing to Reasoning: Multi-Modal Large Language Models Guiding Robotic Intelligence in Autonomous Labs
Abstract
We evaluate multimodal large language models (LLMs) as protocol-aware “reasoning copilots” for self-driving laboratories (SDLs). Open-source families (e.g., Llama, Granite, Gemma, Hermes, LLaVA) and proprietary GPT models are benchmarked across image-based readiness checks, standard lab tasks, infeasible actions, and adversarial instructions. GPT models lead on perception, accurately detecting transparent vessels and counting objects, yet no model exceeds 80% overall accuracy under protocol and safety constraints; in several real-world reasoning scenarios, compact open-source models (2–3B parameters) match or surpass GPT performance. These results reveal persistent gaps in fusing multimodal signals with standard operating procedure (SOP) semantics and in reliable, real-time decision-making. We propose a practical path forward: protocol-aware prompting, rigorous safety stress tests, action logging, and closed-loop evaluation. This positions LLMs as assistive automators with expert fallbacks, rather than autonomous controllers, to accelerate experimental science safely and effectively.
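As a concrete illustration of the proposed path forward (not the system benchmarked in the paper), the minimal sketch below shows how protocol-aware prompting, action logging, and an expert fallback might be wired into a single closed-loop decision step. All names (ProtocolContext, closed_loop_step, actions.jsonl), the prompt wording, and the APPROVE/REFUSE convention are illustrative assumptions, and model_call stands in for any text-in/text-out model client.

```python
import json
import time
from dataclasses import dataclass, field


@dataclass
class ProtocolContext:
    """Illustrative container for the SOP excerpt and safety rules shown to the model."""
    sop_excerpt: str
    safety_rules: list[str] = field(default_factory=list)


def build_protocol_aware_prompt(ctx: ProtocolContext, instruction: str) -> str:
    """Prepend SOP text and safety rules so the model must reason against the protocol
    and refuse infeasible or unsafe requests."""
    rules = "\n".join(f"- {r}" for r in ctx.safety_rules)
    return (
        "You are a lab reasoning copilot. Follow the SOP below exactly.\n"
        f"SOP:\n{ctx.sop_excerpt}\n"
        f"Safety rules:\n{rules}\n"
        "If the requested action violates the SOP or is infeasible, answer REFUSE with a reason.\n"
        f"Requested action: {instruction}\n"
        "Answer APPROVE or REFUSE, then a one-line justification."
    )


def log_action(logfile: str, record: dict) -> None:
    """Append a timestamped JSON line so every model decision stays auditable."""
    record["timestamp"] = time.time()
    with open(logfile, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def closed_loop_step(model_call, ctx: ProtocolContext, instruction: str, logfile: str) -> bool:
    """One evaluation step: prompt the model, log its decision, and gate execution.
    Anything other than an explicit APPROVE is routed to a human expert fallback."""
    prompt = build_protocol_aware_prompt(ctx, instruction)
    reply = model_call(prompt)  # hypothetical client: takes a prompt string, returns text
    approved = reply.strip().upper().startswith("APPROVE")
    log_action(logfile, {"instruction": instruction, "reply": reply, "approved": approved})
    return approved  # False -> escalate to expert instead of driving the robot


if __name__ == "__main__":
    ctx = ProtocolContext(
        sop_excerpt="Step 3: cap all vials before transfer to the heating block.",
        safety_rules=["Never heat an uncapped vial."],
    )
    # Stub model client for illustration; a real deployment would call a multimodal LLM.
    fake_model = lambda prompt: "REFUSE: the vial in the image is uncapped."
    ok = closed_loop_step(fake_model, ctx,
                          "Transfer the uncapped vial to the heating block.", "actions.jsonl")
    print("execute" if ok else "escalate to expert")
```

In this sketch, any reply other than an explicit APPROVE is escalated to a human expert, reflecting the paper's framing of LLMs as assistive automators with expert fallbacks rather than autonomous controllers.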