NeurIPS Poster Optimal Sketching for Trace Estimation

Poster

Optimal Sketching for Trace Estimation

Shuli Jiang · Hai Pham · David Woodruff · Richard Zhang

[ Abstract ]

[ OpenReview]

Abstract: Matrix trace estimation is ubiquitous in machine learning applications and has traditionally relied on Hutchinson's method, which requires

O (log (1 / δ) / ϵ^{2})

$O(\log(1/\delta)/\epsilon^2)$ matrix-vector product queries to achieve a

$(1 \pm \epsilon)$ -multiplicative approximation to

$\text{trace}(A)$ with failure probability

$\delta$ on positive-semidefinite input matrices

$A$ . Recently, the Hutch++ algorithm was proposed, which reduces the number of matrix-vector queries from

$O(1/\epsilon^2)$ to the optimal

$O(1/\epsilon)$ , and the algorithm succeeds with constant probability. However, in the high probability setting, the non-adaptive Hutch++ algorithm suffers an extra

$O(\sqrt{\log(1/\delta)})$ multiplicative factor in its query complexity. Non-adaptive methods are important, as they correspond to sketching algorithms, which are mergeable, highly parallelizable, and provide low-memory streaming algorithms as well as low-communication distributed protocols. In this work, we close the gap between non-adaptive and adaptive algorithms, showing that even non-adaptive algorithms can achieve

$O(\sqrt{\log(1/\delta)}/\epsilon + \log(1/\delta))$ matrix-vector products. In addition, we prove matching lower bounds demonstrating that, up to a

$\log \log(1/\delta)$ factor, no further improvement in the dependence on

$\delta$ or

$\epsilon$ is possible by any non-adaptive algorithm. Finally, our experiments demonstrate the superior performance of our sketch over the adaptive Hutch++ algorithm, which is less parallelizable, as well as over the non-adaptive Hutchinson's method.

Chat is not available.