NeurIPS Risk Assessment and Statistical Significance in the Age of Foundation Models

Poster in Workshop: Socially Responsible Language Modelling Research (SoLaR)

Apoorva Nitsure · Youssef Mroueh · Mattia Rigotti · Kristjan Greenewald · Brian Belgodere · Mikhail Yurochkin · Jiri Navratil · Igor Melnyk · Jarret Ross

Abstract:

We propose a distributional framework for assessing socio-technical risks of foundation models with quantified statistical significance. Our approach hinges on a new statistical relative test based on first- and second-order stochastic dominance of real random variables. We show that the second-order statistics in this test are linked to mean-risk models commonly used in econometrics and mathematical finance to balance risk and utility when choosing between alternatives. Using this framework, we formally develop a risk-aware approach for foundation model selection given guardrails quantified by specified metrics. Inspired by portfolio optimization and selection theory in mathematical finance, we define a metrics portfolio for each model as a means to aggregate a collection of metrics, and perform model selection based on the stochastic dominance of these portfolios. We use our framework to compare various large language models regarding risks related to drifting from instructions and outputting toxic content.
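
To make the two dominance notions in the abstract concrete, here is a minimal sketch of how first- and second-order stochastic dominance could be checked on empirical samples of a single metric where higher is better. It is an illustration only, not the authors' test: the function names, the quantile grid, and the tolerance are assumptions, and the check is a plain plug-in comparison that does not quantify statistical significance the way the proposed framework does.

```python
import numpy as np

# Hypothetical helper names; this is an illustrative plug-in check, not the paper's test.

def _quantiles(sample, probs):
    """Empirical quantile function of a sample, evaluated at the given levels."""
    return np.quantile(np.asarray(sample, dtype=float), probs)

def first_order_dominates(a, b, n_grid=99, tol=1e-12):
    """True if every empirical quantile of `a` is at least that of `b`
    (higher metric values are assumed to be better)."""
    probs = np.linspace(0.01, 0.99, n_grid)
    return bool(np.all(_quantiles(a, probs) >= _quantiles(b, probs) - tol))

def second_order_dominates(a, b, n_grid=99, tol=1e-12):
    """True if the integrated quantile function of `a` is at least that of `b`
    at every level. The cumulative sum over a uniform grid approximates the
    integral up to a constant factor, which does not affect the comparison."""
    probs = np.linspace(0.01, 0.99, n_grid)
    int_a = np.cumsum(_quantiles(a, probs))
    int_b = np.cumsum(_quantiles(b, probs))
    return bool(np.all(int_a >= int_b - tol))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = rng.normal(size=500)
    # A uniformly shifted sample dominates the original at first order.
    print(first_order_dominates(scores + 1.0, scores))
    # A constant score of 0 versus a risky score spread over [-2, 1]:
    # no first-order dominance, but second-order dominance holds.
    print(first_order_dominates([0.0, 0.0], [-2.0, 1.0]))
    print(second_order_dominates([0.0, 0.0], [-2.0, 1.0]))
```

The second example shows why the second-order notion is the risk-averse one: the constant score is preferred to the spread-out one because its lower tail is better, even though the spread-out score has higher upper quantiles.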
