Enabling the Visualization of Distributional Shift using Shapley Values
Abstract
In streaming data, distributional shifts can appear both in the univariate dimensions and in the joint distributions with the labels. However, in many real-time scenarios, labels are either missing or delayed, so unsupervised drift detection methods are desired in those applications. We design slidSHAPs, a novel representation method for unlabelled data streams. Shapley values, commonly known from machine learning models, offer a way to exploit correlation dependencies among random variables. We develop an unsupervised sliding Shapley value series for categorical time series that represents the data stream in a newly defined latent space and tracks changes in the feature correlations. Transforming the original time series into slidSHAPs allows us to track how distributional shifts affect the correlations among the input variables; the approach is independent of any kind of labeling. We show how abrupt distributional shifts in the input variables are transformed into smoother changes in the slidSHAPs. Moreover, slidSHAPs allow for intuitive visualization of the shifts when they are not observable in the original data.
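The idea of a sliding Shapley value series can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the value function, so the sketch assumes total correlation (sum of marginal entropies minus joint entropy) as the coalition value, and the names `entropy`, `total_correlation`, `shapley_values`, `sliding_shapley`, `window`, and `step` are hypothetical. Shapley values are computed exactly by enumerating feature subsets, which is only feasible for a handful of features.

```python
from collections import Counter
from itertools import combinations
from math import comb, log2


def entropy(rows, cols):
    """Empirical Shannon entropy (in bits) of the joint distribution of `cols`."""
    if not cols:
        return 0.0
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    n = len(rows)
    return -sum(k / n * log2(k / n) for k in counts.values())


def total_correlation(rows, cols):
    """Assumed value function: sum of marginal entropies minus joint entropy."""
    if len(cols) < 2:
        return 0.0
    return sum(entropy(rows, (c,)) for c in cols) - entropy(rows, cols)


def shapley_values(rows, n_features):
    """Exact Shapley value of every feature within one window of rows."""
    phi = []
    for i in range(n_features):
        others = [f for f in range(n_features) if f != i]
        value = 0.0
        for size in range(len(others) + 1):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = 1.0 / (n_features * comb(n_features - 1, size))
            for subset in combinations(others, size):
                with_i = tuple(sorted(subset + (i,)))
                gain = total_correlation(rows, with_i) - total_correlation(rows, subset)
                value += weight * gain
        phi.append(value)
    return phi


def sliding_shapley(stream, window, step):
    """One Shapley vector per window; no labels are used anywhere."""
    n_features = len(stream[0])
    return [
        shapley_values(stream[start:start + window], n_features)
        for start in range(0, len(stream) - window + 1, step)
    ]


# Toy categorical stream: features 0 and 1 are coupled in the first half,
# features 1 and 2 in the second half. Every marginal stays uniform throughout.
first_half = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)] * 2
second_half = [(0, 0, 0), (0, 1, 1), (1, 0, 0), (1, 1, 1)] * 2
stream = first_half + second_half

for vector in sliding_shapley(stream, window=8, step=8):
    print([round(v, 3) for v in vector])
```

In this toy stream the Shapley mass moves from features 0 and 1 to features 1 and 2, although no univariate distribution changes. A smaller `step` makes consecutive windows overlap, so the series changes gradually across the shift rather than jumping.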