
Workshop: Computational Sustainability: Promises and Pitfalls from Theory to Deployment

A table is worth a thousand pictures: Multi-modal contrastive learning in house burning classification in wildfire events

Iván Higuera-Mendieta · Jeff Wen · Marshall Burke


Wildfires in the Western United States have increased in frequency and duration over the last decade. They not only pose a risk to human life but also cause billions of dollars in damage to private and public infrastructure. As climate change may further increase the frequency and severity of wildfires, understanding wildfire risk is critical for human adaptation and for designing optimal fire prevention techniques. However, current fire spread models often depend on idealized fire and soil parameters, are hard to compute, and do not predict property damage. In this paper, we use a Dual Encoder (DE), a model that maps images and text into the same latent space, to predict which houses will burn down during a wildfire. Our results indicate that the DE model outperforms the image-only and text-only baselines (ResNet50 and XGBoost, respectively), as well as other text-only and vision-only models. Moreover, consistent with other models in the literature, it also outperforms these models in low-data regimes.
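The abstract does not specify the training objective of the Dual Encoder. The title's "contrastive learning", however, suggests an objective of the CLIP style, where paired image and text embeddings are pulled together and mismatched pairs pushed apart. The sketch below shows that general technique (a symmetric InfoNCE loss) in plain Python. It assumes that the i-th image embedding and the i-th text embedding (for example, a house's tabular record rendered as text) describe the same house. The function names and the temperature value of 0.07 are illustrative choices, not the authors' implementation.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def log_softmax(row: Vector) -> Vector:
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum_exp for x in row]


def clip_style_loss(image_emb: List[Vector], text_emb: List[Vector],
                    temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss over a batch of paired embeddings (hypothetical sketch).

    Pair i (image i, text i) is the positive; every other pairing in the batch
    acts as a negative. The temperature is an assumed hyperparameter.
    """
    if not image_emb or len(image_emb) != len(text_emb):
        raise ValueError("need equally sized, non-empty batches")
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    n = len(imgs)
    # logits[i][j]: cosine similarity of image i and text j, divided by temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-text direction: row i should assign most probability to column i
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image direction: column j should assign most probability to row j
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    texts_aligned = [[0.9, 0.1], [0.1, 0.9]]   # each text matches its image
    texts_swapped = [[0.1, 0.9], [0.9, 0.1]]   # pairings are mismatched
    print(clip_style_loss(images, texts_aligned) < clip_style_loss(images, texts_swapped))  # True
```

Correctly paired batches yield a lower loss than mismatched ones. This property is what places image and text representations of the same house close together in the shared latent space. A downstream classifier (for example, a burned/not-burned head on top of the embeddings) could then be trained on that space, though the abstract does not say how the prediction step is implemented.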
