An interactive system for the extraction of meaningful visualizations from high-dimensional data
Madalina Fiterau · Artur Dubrawski · Donghan Wang

Tue Dec 8th 07:00 -- 11:55 PM @ 210D

We demonstrate our novel techniques for building ensembles of low-dimensional projections that facilitate data understanding and visualization by human users, given a learning task such as classification or regression. Our system trains user-friendly models called Informative Projection Ensembles (IPEs). Each ensemble comprises a set of compact submodels that comply with stringent user-specified requirements on model size and complexity, so that the patterns extracted from the data can be visualized. IPEs handle data in a query-specific manner: each sample is assigned to a specialized Informative Projection, and the data is partitioned automatically during learning. With this setup, the models attain high performance while maintaining the transparency and simplicity of low-dimensional classifiers and regressors.
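To make the query-specific idea above concrete, here is a minimal sketch, not the authors' training procedure. It assumes a k-nearest-neighbour base classifier and, for each query, simply picks the candidate projection whose neighbourhood has the purest labels. The function names, the purity criterion, and the toy data are all illustrative assumptions. The actual system learns the data partition during training and restricts itself to a small set of projections.

```python
from collections import Counter
from itertools import combinations


def knn_vote(train_X, train_y, query, dims, k):
    """Return (majority_label, purity) among the k nearest neighbours of
    `query`, measuring distance only on the feature indices in `dims`."""
    def sq_dist(row):
        return sum((row[d] - query[d]) ** 2 for d in dims)

    nearest = sorted(range(len(train_X)), key=lambda i: sq_dist(train_X[i]))[:k]
    label, count = Counter(train_y[i] for i in nearest).most_common(1)[0]
    return label, count / len(nearest)


def predict_with_projection(train_X, train_y, query, proj_dim=2, k=3):
    """Assumed selection rule: for this query, choose the projection whose
    k-neighbourhood has the highest label purity (first one wins on ties).
    Returns (chosen_dims, predicted_label)."""
    n_features = len(train_X[0])
    best = None
    for dims in combinations(range(n_features), proj_dim):
        label, purity = knn_vote(train_X, train_y, query, dims, k)
        if best is None or purity > best[0]:
            best = (purity, dims, label)
    _, dims, label = best
    return dims, label


# Toy data (made up): feature 0 separates the classes, features 1 and 2 do not.
toy_X = [
    (0.0, 5.0, 1.0), (0.1, 1.0, 5.0), (0.2, 3.0, 3.0),   # class 0
    (1.0, 5.0, 1.1), (1.1, 1.0, 5.1), (1.2, 3.0, 3.1),   # class 1
]
toy_y = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    dims, label = predict_with_projection(toy_X, toy_y, (0.05, 5.0, 1.05),
                                          proj_dim=1, k=3)
    print("selected projection:", dims, "predicted label:", label)
```

Returning the chosen feature indices together with the label mirrors the transparency goal: the prediction can be plotted in the selected low-dimensional subspace alongside the neighbours that produced it.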

In this demo, we illustrate how Informative Projection Ensembles have proved useful in practical applications. Moreover, users can train their own models in real time, specifying settings such as the number of submodels, the dimensionality of the subspaces, the costs associated with features, and the type of base classifier or regressor to be used. Users can also see the decision-support system in action, performing classification, regression or clustering on batches of test data. The handling of test data is also transparent: the system highlights the selected submodel and shows how that submodel assigns labels or values to the queries. Users can give feedback to the system on the assigned outputs, and they can perform pairwise comparisons of the trained models.

We encourage participants to bring their own data to analyze. Users can save the outcome of the analysis, for their own datasets or non-proprietary ones. The system supports the CSV format for data and XML for the models.

Author Information

Madalina Fiterau (Stanford University)

Madalina Fiterau is an Assistant Professor at the College of Information and Computer Sciences at UMass Amherst, with a focus on AI/ML. Previously, she was a Postdoctoral Fellow in the Computer Science Department at Stanford University, working with Professors Chris Ré and Scott Delp in the Mobilize Center. Madalina obtained a PhD in Machine Learning from Carnegie Mellon University in September 2015, advised by Professor Artur Dubrawski. The focus of her PhD thesis, entitled “Discovering Compact and Informative Structures through Data Partitioning”, is learning interpretable ensembles, with applicability ranging from image classification to a clinical alert prediction system. Madalina is currently expanding her research on interpretable models, in part by applying deep learning to obtain salient representations from biomedical “deep” data, including time series, text and images. Madalina is the recipient of the GE Foundation Scholar Leader Award for Central and Eastern Europe. She is the recipient of the Marr Prize for Best Paper at ICCV 2015 and of the Star Research Award at the Annual Congress of the Society of Critical Care Medicine 2016. She has organized two editions of the Machine Learning for Clinical Data Analysis Workshop at NIPS, in 2013 and 2014.

Artur Dubrawski (Carnegie Mellon University)
Donghan Wang (CMU)
