The Limits of Post-Selection Generalization
Jonathan Ullman · Adam Smith · Kobbi Nissim · Uri Stemmer · Thomas Steinke

Thu Dec 06 02:00 PM -- 04:00 PM (PST) @ Room 517 AB #146

While statistics and machine learning offer numerous methods for ensuring generalization, these methods often fail in the presence of post-selection---the common practice in which the choice of analysis depends on previous interactions with the same dataset. A recent line of work has introduced powerful, general-purpose algorithms that ensure a property called post hoc generalization (Cummings et al., COLT'16), which says that no analyst, given the output of the algorithm, should be able to find any statistic for which the data differ significantly from the population it came from.

In this work we show several limitations on the power of algorithms satisfying post hoc generalization. First, we show a tight lower bound on the error of any algorithm that satisfies post hoc generalization and answers adaptively chosen statistical queries, establishing a strong barrier to progress in post-selection data analysis. Second, we show that post hoc generalization is not closed under composition, even though many known algorithms satisfying it exhibit strong composition properties.
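The adaptive statistical-query setting studied here can be illustrated concretely. The sketch below (an illustrative example, not the paper's construction) shows the failure mode that post hoc generalization is meant to rule out: an analyst first asks one statistical query per attribute of a dataset drawn from a zero-mean population, then adaptively combines the answers into a new query whose empirical value is far from its population value.

```python
import random

random.seed(0)
n, d = 100, 1000  # n samples, d binary attributes; population mean of each attribute is 0

# Draw a dataset of n i.i.d. samples, each a vector of d uniform {-1, +1} attributes.
data = [[random.choice([-1, 1]) for _ in range(d)] for _ in range(n)]

def empirical_mean(query):
    """Answer a statistical (linear) query with its empirical mean over the dataset."""
    return sum(query(x) for x in data) / n

# Non-adaptive phase: one statistical query per attribute.
answers = [empirical_mean(lambda x, j=j: x[j]) for j in range(d)]

# Adaptive phase: build a new query from the previous answers, weighting each
# attribute by the sign of its empirical mean.
signs = [1 if a >= 0 else -1 for a in answers]
adaptive = lambda x: sum(s * v for s, v in zip(signs, x)) / d

# The population mean of the adaptive query is exactly 0, yet its empirical
# mean is biased upward (each term contributes |answers[j]|), so the naive
# mechanism fails to generalize under post-selection.
gap = empirical_mean(adaptive)
print(f"empirical mean of adaptive query: {gap:.3f} (population value: 0)")
```

Mechanisms satisfying post hoc generalization (for instance, differentially private ones) would add enough noise to the first round of answers that the adaptive query's empirical value stays close to its population value; the lower bound in this work limits how accurate any such mechanism can be.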

Author Information

Jonathan Ullman (Northeastern University)
Adam Smith (Boston University)
Kobbi Nissim (Georgetown University)
Uri Stemmer (Ben-Gurion University)
Thomas Steinke (IBM Research - Almaden)

Thomas Steinke is a Research Staff Member at the IBM Almaden Research Center. He has been at IBM since completing his PhD at Harvard University in 2016. His research focuses on data privacy (specifically, differential privacy) and its connections to machine learning (particularly, generalization when data is reused adaptively).
