Optimization is at the heart of machine learning, and gradient computation is central to many optimization techniques. Stochastic optimization, in particular, has become the principal method for fitting many models, from deep neural networks to variational Bayesian posterior approximations. Typically, unbiased gradient estimators for stochastic optimization are constructed efficiently by subsampling the data (minibatching), but this is only one possibility. In this talk, I will discuss an alternative approach to constructing unbiased gradient estimates in machine learning problems. We will revisit the Jacobian accumulation problem that underlies automatic differentiation and observe that the linearized computational graph of, e.g., a deep neural network can be collapsed in a randomized way that reduces memory use with little loss in performance. This is joint work with students Alex Beatson, Deniz Oktay, Joshua Aduol, and Nick McGreivy.
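The following is a toy sketch of the general principle behind trading memory for variance in gradient computation. It is not the specific estimator from this work. It assumes a single linear layer y = W @ x, whose weight gradient is the outer product of the upstream gradient g = dL/dy and the stored activation x. Rather than storing x in full, it keeps each entry with probability p and rescales the kept entries by 1/p. This keeps the gradient estimate unbiased while storing only about a fraction p of the activation. The function names are hypothetical and exist only for illustration.

```python
import numpy as np


def exact_weight_grad(g, x):
    """Exact dL/dW for y = W @ x, given the upstream gradient g = dL/dy."""
    return np.outer(g, x)


def sparsified_activation(x, p, rng):
    """Keep each entry of x with probability p and rescale kept entries by 1/p.

    Only the kept entries (and their indices) would need to be stored,
    so memory scales roughly with p * len(x). Because E[mask / p] = 1,
    the expected value of the result equals x.
    """
    mask = rng.random(x.shape) < p
    return np.where(mask, x / p, 0.0)


def randomized_weight_grad(g, x, p, rng):
    """Unbiased estimate of outer(g, x) built from a sparsified copy of x."""
    return np.outer(g, sparsified_activation(x, p, rng))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=8)
    g = rng.normal(size=4)
    exact = exact_weight_grad(g, x)
    # Averaging many independent estimates should approach the exact gradient.
    n_trials = 20000
    avg = sum(randomized_weight_grad(g, x, 0.25, rng) for _ in range(n_trials)) / n_trials
    print("max abs deviation from exact gradient:", np.max(np.abs(avg - exact)))
```

Setting p = 1 recovers the exact gradient. Smaller p reduces the stored activation but increases the variance of each estimate. In a real network, this increased variance adds to the noise that minibatching already introduces.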