Gaussian process (GP) is a non-parametric model capitalizing probabilistic inference and can be used for Bayesian optimization. It is a powerful function approximator. Compared to neural networks which is a ‘black box’ function approximator, GP can perform accurate probabilistic inference and quantify the uncertainty of prediction, so it is widely used in classification and regression tasks in the field of machine learning. However, GP usually has a large computational complexity, which limits its application to large-scale datasets. Besides, most of the current GP-based approaches only focus on single-output regression, i.e., the dependent variable is univariate, and do not easily extend to multi-output regression tasks.In order to tackle the above issues, we propose an expert-based approximation method to develop a learnable model that can be applied to large-scale datasets and perform multi-output regression tasks. We propose the multi-output mixture of Gaussian processes (MOMoGP), which employs a deeply structured mixture of single-output GPs encoded via a Probabilistic Circuit. This allows one to accurately capture correlations between multiple outputs without introducing a cubic cost in the number of output dimensions. By recursively partitioning the covariate space and the output space, posterior inference in our model reduces to inference on single-output GP experts, which only need to be conditioned on a small subset of the observations.