Adaptive Data Analysis
Vitaly Feldman · Aaditya Ramdas · Aaron Roth · Adam Smith

Thu Dec 08 11:00 PM -- 09:30 AM (PST) @ Room 122 + 123
Event URL: http://wadapt.org/

Adaptive data analysis is the increasingly common practice by which insights gathered from data are used to inform further analysis of the same data sets. This practice is common both in machine learning and in scientific research, in which data sets are shared and re-used across multiple studies. Unfortunately, most of the statistical inference theory used in empirical sciences to control false discovery rates, and in machine learning to avoid overfitting, assumes a fixed class of hypotheses to test, or family of functions to optimize over, selected independently of the data. If the set of analyses run is itself a function of the data, much of this theory becomes invalid; indeed, such adaptivity has been blamed as one of the causes of the crisis of reproducibility in empirical science.
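
As a rough illustration of the failure mode described above, the following sketch simulates a single adaptive step on pure noise. Everything in it is an assumption made for illustration (the function name `adaptive_selection_gap`, the sample sizes, and the seed); it is not taken from any of the workshop talks. A feature is chosen because it agrees best with the labels, and its accuracy is then estimated twice: once on the same data used for selection, and once on fresh data.

```python
import random


def adaptive_selection_gap(n=50, d=200, n_fresh=10000, seed=0):
    """Select the best of d pure-noise features, then estimate its accuracy.

    Labels and features are independent fair coin flips, so the true
    accuracy of every feature is exactly 0.5.
    """
    rng = random.Random(seed)
    labels = [rng.randrange(2) for _ in range(n)]
    features = [[rng.randrange(2) for _ in range(n)] for _ in range(d)]

    def agreement(feature):
        # Fraction of samples on which the feature equals the label.
        return sum(f == y for f, y in zip(feature, labels)) / n

    # Adaptive step: the analysis to report is chosen by looking at the data.
    best = max(range(d), key=lambda j: agreement(features[j]))

    # Estimate computed on the same data that drove the selection.
    reused = agreement(features[best])

    # Estimate on fresh data: the selected feature is still independent noise.
    fresh = sum(rng.randrange(2) == rng.randrange(2) for _ in range(n_fresh)) / n_fresh
    return reused, fresh


if __name__ == "__main__":
    reused, fresh = adaptive_selection_gap()
    print(f"accuracy on reused data: {reused:.2f}")
    print(f"accuracy on fresh data:  {fresh:.2f}")
```

The estimate on reused data sits well above one half even though no feature carries any signal, while the estimate on fresh data stays close to one half. The gap grows with the number of candidate analyses relative to the sample size.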

Recently, there have been several exciting proposals for how to avoid overfitting and guarantee statistical validity even in general adaptive data analysis settings. The problem is important, and ripe for further advances. The goal of this workshop is to bring together members of different communities (from machine learning, statistics, and theoretical computer science) interested in solving this problem, to share recent results, to discuss promising directions for future research, and to foster collaborations.

Thu 11:55 p.m. - 12:00 a.m.
Introductory remarks (Introduction)
Fri 12:00 a.m. - 12:35 a.m.

In many genomic applications, it is common to perform tests using aggregate-level statistics within naturally defined classes for powerful identification of signals. Following aggregate-level testing, it is natural to make inferences about the individual units within classes that contain signal. Failing to account for class selection will produce biased inference. We develop multiple testing procedures that allow rejection of individual-level null hypotheses while controlling for conditional (familywise or false discovery) error rates. We use simulation studies to illustrate the validity and power of the proposed procedures in comparison to several possible alternatives. We illustrate the usefulness of our procedures in a natural application involving whole-genome expression quantitative trait loci (eQTL) analysis across 17 tissue types using data from The Cancer Genome Atlas (TCGA) Project.

Joint work with Nilanjan Chatterjee, Abba Krieger, and Jianxin Shi.

Fri 12:35 a.m. - 1:10 a.m.

We provide the first differentially private algorithms for controlling the false discovery rate (FDR) in multiple hypothesis testing. Our general approach is to adapt a well-known variant of the Benjamini-Hochberg procedure (BHq), making each step differentially private. This destroys the classical proof of FDR control. To prove FDR control of our method, we develop a new proof of the original (non-private) BHq algorithm and its robust variants -- a proof requiring only the assumption that the true null test statistics are independent, allowing for arbitrary correlations between the true nulls and false nulls. This assumption is fairly weak compared to those previously shown in the vast literature on this topic, and explains in part the empirical robustness of BHq.
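
For readers unfamiliar with the procedure this abstract builds on, here is a minimal sketch of the standard, non-private Benjamini-Hochberg step-up rule. It is not the differentially private algorithm from the talk, and the function name `benjamini_hochberg` and the example p-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the sorted indices of hypotheses rejected by the BH step-up rule.

    With m hypotheses and sorted p-values p_(1) <= ... <= p_(m), find the
    largest rank k such that p_(k) <= q * k / m, and reject the k hypotheses
    with the smallest p-values. If no rank qualifies, nothing is rejected.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            k = rank  # keep the largest qualifying rank (step-up)

    return sorted(order[:k])


if __name__ == "__main__":
    # Hypothetical p-values, chosen only to show the step-up behaviour.
    print(benjamini_hochberg([0.03, 0.035, 0.9], q=0.06))
```

In the example, the smallest p-value (0.03) misses its own threshold of 0.02, but the second smallest (0.035) meets its threshold of 0.04, so both are rejected. This step-up behaviour is what distinguishes the rule from a per-hypothesis cutoff.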

Fri 1:10 a.m. - 1:20 a.m.
Vitaly Feldman (Discussion)
Fri 1:20 a.m. - 1:50 a.m.
Coffee break (break)
Fri 1:50 a.m. - 3:00 a.m.

10:50-11:00. Ibrahim Alabdulmohsin. On the Interplay between Information, Stability, and Generalization
11:00-11:10. Joshua Loftus. Significance testing after cross-validation
11:10-11:20. Yu-Xiang Wang, Jing Lei and Stephen E. Fienberg. A Minimax Theory for Adaptive Data Analysis
11:20-11:30. Sam Elder. Bayesian Adaptive Data Analysis: Challenges and Guarantees
11:30-11:40. Rina Foygel Barber and Aaditya Ramdas. p-filter: An internally consistent framework for FDR
11:40-11:50. Ryan Rogers*, Aaron Roth, Adam Smith and Om Thakkar. Max-Information, Differential Privacy, and Post-Selection Hypothesis Testing

Fri 3:00 a.m. - 5:30 a.m.
Lunch break (break)
Fri 5:30 a.m. - 6:05 a.m.
Aaron Roth. Adaptive Data Analysis via Differential Privacy (Talk)
Fri 6:05 a.m. - 6:40 a.m.

The traditional notion of generalization --- i.e., learning a hypothesis whose empirical error is close to its true error --- is surprisingly brittle. As has recently been noted, even if several algorithms have this guarantee in isolation, the guarantee need not hold if the algorithms are composed adaptively. In this paper, we study three notions of generalization --- increasing in strength --- that are robust to post-processing and amenable to adaptive composition, and examine the relationships between them.

Fri 6:50 a.m. - 7:35 a.m.
Posters (Poster Session)
Fri 7:35 a.m. - 7:55 a.m.

A common problem in modern statistical applications is to select, from a large set of candidates, a subset of variables which are important for determining an outcome of interest. For instance, the outcome may be disease status and the variables may be hundreds of thousands of single nucleotide polymorphisms on the genome. This talk introduces model-free knockoffs, a framework for finding dependent variables while provably controlling the false discovery rate (FDR) in finite samples. FDR control holds no matter the form of the dependence between the response and the covariates, which does not need to be specified in any way. What is required is that we observe i.i.d. samples (X,Y) and know something about the distribution of the covariates, although we have shown that the method is robust to unknown/estimated covariate distributions. This framework builds on the knockoff filter of Foygel Barber and Candès introduced a couple of years ago, which was limited to linear models with fewer variables than observations (p < n). In contrast, model-free knockoffs deal with a range of problems far beyond the scope of the original knockoff paper (e.g., they provide valid selections in any generalized linear model, including logistic regression) while being more powerful than the original procedure when it applies. Finally, we apply our procedure to data from a case-control study of Crohn’s disease in the United Kingdom, making twice as many discoveries as the original analysis of the same data.

Fri 7:55 a.m. - 8:15 a.m.

Recent developments in selective inference have provided a framework for valid inference after some information in the data has been used for model selection. However, most of the literature on selective inference requires practitioners to commit to a pre-specified procedure for model selection. This is rather stringent for applications. In many cases, multiple exploratory data analyses will be performed, and the outcome of each will inform the final model selected by the practitioners. Therefore, we want to develop a framework that allows multiple queries to the data. In a framework similar to that in differential privacy, we allow valid inference after multiple queries to the database. We seek to address this problem from the perspective of “multiple views of the data” and two concrete examples are considered below.

Joint work with Jonathan Taylor.

Fri 8:15 a.m. - 8:50 a.m.

Standard p-value based hypothesis testing is not at all adaptive: if our test result is promising but not conclusive (say, p = 0.07) we cannot simply decide to gather a few more data points. While the latter practice is ubiquitous in science, it invalidates p-values and error guarantees.

Here we propose an alternative test based on supermartingales; it has both a gambling and a data compression interpretation. This method allows us to freely combine results from different tests by multiplication (which would be a mortal sin for p-values!), and avoids many other pitfalls of traditional testing as well. If the null hypothesis is simple (a singleton), it also has a Bayesian interpretation, and essentially coincides with a proposal by Vovk (1993) and Berger et al. (1994). Here we work out, for the first time, the case of composite null hypotheses, which allows us to formulate safe, nonasymptotic versions of the most popular tests such as the t-test and the chi-square test. Safe tests for composite H0 are not Bayesian, and initial experiments suggest that they can substantially outperform Bayesian tests (which for composite nulls are not adaptive in general).
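
To make the simple-null case concrete, here is a minimal sketch of a likelihood-ratio test martingale for coin flips. It is an illustration under stated assumptions, not the composite-null construction the talk develops: the null is a fair coin, the alternative bias `p_alt=0.7` is an arbitrary choice, and all function names are hypothetical. Under the null, the running product of likelihood ratios is a nonnegative martingale starting at 1, so by Ville's inequality the probability that it ever reaches 1/alpha is at most alpha, whatever rule is used to stop collecting data.

```python
import random


def likelihood_ratio_process(flips, p_alt=0.7, p_null=0.5):
    """Running product of likelihood ratios (alternative over null) for coin flips."""
    value = 1.0
    path = []
    for heads in flips:
        numerator = p_alt if heads else 1 - p_alt
        denominator = p_null if heads else 1 - p_null
        value *= numerator / denominator
        path.append(value)
    return path


def rejects_with_optional_stopping(flips, alpha=0.05, p_alt=0.7, p_null=0.5):
    """Reject the null if the process reaches 1/alpha at any point along the way."""
    path = likelihood_ratio_process(flips, p_alt, p_null)
    return any(v >= 1 / alpha for v in path)


def false_rejection_rate(n_trials=2000, n_flips=200, alpha=0.05, seed=0):
    """Simulate fair coins and count how often the null is wrongly rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        flips = [rng.random() < 0.5 for _ in range(n_flips)]
        if rejects_with_optional_stopping(flips, alpha):
            rejections += 1
    return rejections / n_trials


if __name__ == "__main__":
    print(f"false rejection rate under the null: {false_rejection_rate():.3f}")
```

Because the data analyst may check the process after every flip and stop as soon as it crosses the threshold, this is the kind of "gather a few more data points" behaviour that invalidates a fixed-sample p-value. Multiplying the final values from two independent experiments gives another process of the same kind, which is the sense in which results can be combined by multiplication.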

Fri 8:50 a.m. - 9:00 a.m.
Aaron Roth (Discussion)

Author Information

Vitaly Feldman (Google Brain)
Aaditya Ramdas (UC Berkeley)
Aaron Roth (University of Pennsylvania)
Adam Smith (Pennsylvania State University)
