Mars-Bench: A Benchmark for Evaluating Foundation Models for Mars Science Tasks
Abstract
Many recent foundation models have been successful because standardized evaluation benchmarks allow them to be compared fairly and consistently. No such benchmark exists for Mars science applications, which hinders progress toward a foundation model for Mars science tasks. To address this gap, we introduce Mars-Bench, the first benchmark designed to systematically evaluate models across a broad range of Mars-related tasks using both orbital and surface imagery. Mars-Bench comprises 20 datasets spanning classification, segmentation, and object detection, provided in a standardized, ready-to-use format. We provide baseline evaluations using models pre-trained on natural images and Earth satellite data. Results from our analyses suggest that Mars-specific foundation models may offer advantages over these baselines, motivating further exploration of domain-adapted pre-training. Mars-Bench aims to establish a standardized foundation for developing and comparing machine learning models for Mars science.