Invited Talk (Breiman Lecture)
Veridical Data Science
Bin Yu
West Exhibition Hall C, B3
Data science is a field of evidence-seeking that combines data with domain information to generate new knowledge. It addresses key considerations in AI regarding when and where data-driven solutions are reliable and appropriate. Such considerations require involvement from humans who collectively understand the domain and the tools used to collect, process, and model data. Throughout the data science life cycle, these humans make judgment calls to extract information from data. Veridical data science seeks to ensure that this information is reliable, reproducible, and clearly communicated so that empirical evidence may be evaluated in the context of human decisions. Three core principles, predictability, computability, and stability (PCS), provide the foundation for veridical data science. In this talk, we will present a unified PCS framework for data analysis, consisting of both a workflow and documentation, illustrated through iterative random forests and case studies from genomics and precision medicine.