Examining Data Compartmentalization for AI Governance
Abstract
The fusing of a vast corpus of data into model parameters poses a challenge for AI governance, particularly with regard to concerns over the appropriate use of specific examples. We investigate how partitioning data into semantically meaningful groups may allow for training and serving models with finer-grained control over subsets of data. Data compartmentalization can help isolate data groupings with differing levels of risk, permitted usages, and expiry dates, and may provide a path towards data attribution. We propose data compartmentalization as a unifying framework across a number of existing technical approaches, and present hypotheses and open questions around the suitability of these approaches for addressing policy concerns related to AI governance.