We formulate the problem of detecting collective anomalies in collider experiments as a Goodness of Fit test of a reference hypothesis (the Standard Model) against the observed data. Several well-established Goodness of Fit methods are available for one-dimensional problems, but their multivariate generalisation is still an object of study. We exploit machine learning to build a set of multivariate tests, starting from the output of a machine-learned binary classifier trained to distinguish the experimental data from the reference expectations, as prescribed in Refs. [1-4]. We compare typical one-dimensional test statistics computed on the output of the classifier with less common test statistics built out of standard classification metrics. In the considered setup, the likelihood-ratio test shows broader model-independent sensitivity across the landscape of signal benchmarks analysed than the other tests considered. A novel test that we define, based on event counting with an optimised classifier threshold, is found to perform slightly better than the likelihood-ratio test for resonant signals, but is exposed to strong failures for non-resonant ones.
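As a purely illustrative sketch, and not the implementation used in this work, the following Python snippet shows how two tests of the kind described above could be computed from classifier outputs: a one-dimensional two-sample Kolmogorov-Smirnov statistic, and a simple counting statistic above a fixed threshold. The function names `ks_statistic` and `counting_z`, the Gaussian-approximation form of the counting statistic, the fixed (non-optimised) threshold, and the synthetic beta-distributed scores standing in for classifier outputs are all assumptions made for illustration.

```python
import numpy as np


def ks_statistic(ref_scores, data_scores):
    """Two-sample Kolmogorov-Smirnov statistic on one-dimensional scores."""
    a = np.sort(np.asarray(ref_scores, dtype=float))
    b = np.sort(np.asarray(data_scores, dtype=float))
    # Evaluate both empirical CDFs at every observed score value.
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))


def counting_z(ref_scores, data_scores, threshold):
    """Excess of data events above `threshold`, in units of sqrt(expected).

    Assumption: the expected count is the reference fraction above the
    threshold rescaled to the data sample size (hypothetical normalisation).
    """
    ref = np.asarray(ref_scores, dtype=float)
    data = np.asarray(data_scores, dtype=float)
    n_exp = np.mean(ref > threshold) * data.size
    n_obs = np.sum(data > threshold)
    return float((n_obs - n_exp) / np.sqrt(max(n_exp, 1e-12)))


# Synthetic stand-ins for classifier outputs in [0, 1] (illustration only).
rng = np.random.default_rng(0)
ref_scores = rng.beta(2.0, 2.0, size=20000)   # reference-like scores
data_scores = np.concatenate([
    rng.beta(2.0, 2.0, size=1900),            # reference-like component
    rng.beta(8.0, 2.0, size=100),             # hypothetical signal-like excess
])

print("KS statistic:", ks_statistic(ref_scores, data_scores))
print("Counting Z at threshold 0.8:", counting_z(ref_scores, data_scores, 0.8))
```

In a real analysis the null distribution of either statistic would have to be calibrated, for example with reference-distributed toy samples, and any threshold optimisation would have to be included in that calibration; neither step is shown here.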