Multi-scale Autoregressive Models are Laplacian, Discrete, and Latent Diffusion Models in Disguise
Abstract
We revisit Visual Autoregressive (VAR) models through the lens of iterative refinement. Instead of viewing VAR solely as next-scale autoregression, we formalise a deterministic forward process that builds a Laplacian-like latent pyramid and a learned backward process that predicts residual code maps in a small number of coarse-to-fine steps. This perspective connects VAR to denoising diffusion, clarifies where supervision enters, and isolates three design choices that may explain its efficiency and fidelity: operating in a compact latent space, casting prediction as discrete classification over code indices, and partitioning the task by spatial frequency. Using small, controlled MNIST surrogates with matched budgets, we test these hypotheses and observe consistent trends favouring latent refinement, discrete targets, and two-stage coarse-to-fine specialisation. We also discuss how the same iterative-refinement template extends to permutation-invariant graph generation and to probabilistic, ensemble-style medium-range weather forecasting. The framework suggests practical ways to transfer tools from diffusion to VAR while keeping the few-step, scale-parallel generation that makes VAR appealing.
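To make the deterministic forward process concrete, the sketch below decomposes a 2D latent map into a coarse base plus per-scale high-frequency residuals. This is a minimal NumPy illustration, not the paper's implementation: the function names, the average-pooling/nearest-neighbour resamplers, and the assumption of power-of-two spatial sizes are our illustrative choices.

```python
# Minimal sketch of the deterministic forward process: a Laplacian-like
# pyramid over a 2D latent map. Assumes power-of-two spatial sizes;
# the resampling operators are illustrative stand-ins.
import numpy as np

def downsample(x: np.ndarray) -> np.ndarray:
    """Halve each spatial axis by 2x2 average pooling."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x: np.ndarray) -> np.ndarray:
    """Double each spatial axis by nearest-neighbour repetition."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def forward_pyramid(latent: np.ndarray, num_scales: int = 4) -> list:
    """Return [coarsest base, residual_next, ..., residual_finest]."""
    residuals, current = [], latent
    for _ in range(num_scales - 1):
        coarse = downsample(current)
        residuals.append(current - upsample(coarse))  # high-frequency detail
        current = coarse
    residuals.append(current)   # coarsest base map
    return residuals[::-1]      # coarse-to-fine order

def reconstruct(pyramid: list) -> np.ndarray:
    """Invert the pyramid exactly, mirroring the coarse-to-fine backward pass."""
    current = pyramid[0]
    for residual in pyramid[1:]:
        current = upsample(current) + residual
    return current
```

Because the decomposition is exactly invertible, each scale yields a well-defined residual target; in the hypothesised VAR analogy, quantising these residuals to code indices is what turns per-scale prediction into discrete classification.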