Conformal Frequency Estimation with Sketched Data
Matteo Sesia · Stefano Favaro


A flexible conformal inference method is developed to construct confidence intervals for the frequencies of queried objects in very large data sets, based on a much smaller sketch of those data. The approach is data-adaptive and requires no knowledge of the data distribution or of the details of the sketching algorithm; instead, it constructs provably valid frequentist confidence intervals under the sole assumption of data exchangeability. Although our solution is broadly applicable, this paper focuses on applications involving the count-min sketch algorithm and a non-linear variation thereof. The performance is compared to that of frequentist and Bayesian alternatives through simulations and experiments with data sets of SARS-CoV-2 DNA sequences and classic English literature.
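For readers unfamiliar with the count-min sketch mentioned above, the following is a minimal illustrative sketch of the standard (linear) version, not the authors' implementation and not the conformal procedure itself. The class name, the `width` and `depth` parameters, and the use of salted BLAKE2b hashes are all assumptions made for this example. It shows the property that makes the sketch useful for frequency queries: because counters can only be inflated by hash collisions, the minimum across rows never underestimates the true count.

```python
import hashlib
from collections import Counter


class CountMinSketch:
    """Illustrative count-min sketch (hypothetical helper, not from the paper)."""

    def __init__(self, width, depth):
        self.width = width  # number of counters per row
        self.depth = depth  # number of rows, one hash function each
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Salting the hash with the row number gives a different hash per row.
        digest = hashlib.blake2b(
            item.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, item):
        # Increment one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += 1

    def query(self, item):
        # Collisions only add to counters, so the row-wise minimum
        # is an upper bound on the true frequency of the item.
        return min(
            self.table[row][self._index(item, row)]
            for row in range(self.depth)
        )


words = "the cat and the dog and the bird".split()
cms = CountMinSketch(width=16, depth=4)
for w in words:
    cms.update(w)

true_counts = Counter(words)
for w, c in true_counts.items():
    assert cms.query(w) >= c  # the sketch never underestimates
```

The sketch alone yields only this one-sided deterministic bound; the method described in the abstract aims instead at two-sided confidence intervals that remain valid under data exchangeability.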

Author Information

Matteo Sesia (University of Southern California)

Matteo Sesia is an assistant professor in the Department of Data Sciences and Operations at the University of Southern California Marshall School of Business.

Stefano Favaro (University of Torino and Collegio Carlo Alberto)