Timezone: »

High Dimensional Semiparametric Scale-invariant Principal Component Analysis
Fang Han · Han Liu

Tue Dec 04 07:00 PM -- 12:00 AM (PST) @ Harrah’s Special Events Center 2nd Floor

We propose a high dimensional semiparametric scale-invariant principal component analysis, named Copula Component Analysis (COCA). The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. The COCA accordingly estimates the leading eigenvector of the correlation matrix of the latent Gaussian distribution. The robust nonparametric rank-based correlation coefficient estimator, Spearman’s rho, is exploited in estimation. We prove that, although the marginal distributions can be arbitrarily continuous, the COCA estimators obtain fast estimation rates and are feature selection consistent in the setting where the dimension is nearly exponentially large relative to the sample size. Careful numerical experiments on the simulated data are conducted under both ideal and noisy settings, which suggest that the COCA loses little even when the data are truely Gaussian. The COCA is also implemented on a large-scale genomic data to illustrate its empirical usefulness.

Author Information

Fang Han (Johns Hopkins University)
Han Liu (Princeton University)

More from the Same Authors