We propose a high-dimensional semiparametric scale-invariant principal component analysis method, named Copula Component Analysis (COCA). The semiparametric model assumes that, after unspecified marginally monotone transformations, the data follow a multivariate Gaussian distribution. COCA accordingly estimates the leading eigenvector of the correlation matrix of the latent Gaussian distribution. A robust nonparametric rank-based correlation coefficient estimator, Spearman's rho, is exploited in estimation. We prove that, although the marginal distributions can be arbitrary continuous distributions, the COCA estimators attain fast estimation rates and are feature-selection consistent in the setting where the dimension is nearly exponentially large relative to the sample size. Careful numerical experiments on simulated data, under both ideal and noisy settings, suggest that COCA loses little even when the data are truly Gaussian. COCA is also applied to a large-scale genomic dataset to illustrate its empirical usefulness.
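The abstract describes a rank-based pipeline: estimate pairwise Spearman's rho, convert it to an estimate of the latent Gaussian correlation matrix, and take its leading eigenvector. The following is a minimal illustrative sketch of that idea, not the authors' exact estimator: it assumes the standard 2*sin(pi*rho/6) bridge between Spearman's rho and the latent correlation, and it uses a crude hard-truncation step (the argument name "sparsity" is hypothetical) in place of the paper's sparse estimation procedure.

# Sketch of a COCA-style estimator: Spearman's rho -> latent correlation -> leading eigenvector.
import numpy as np
from scipy.stats import spearmanr

def coca_leading_eigenvector(X, sparsity=None):
    """X: (n, p) data matrix; sparsity: optional number of nonzero loadings to keep."""
    rho_s, _ = spearmanr(X)                    # p x p matrix of pairwise Spearman correlations
    R_hat = 2.0 * np.sin(np.pi * rho_s / 6.0)  # plug-in estimate of the latent Gaussian correlation
    np.fill_diagonal(R_hat, 1.0)
    eigvals, eigvecs = np.linalg.eigh(R_hat)   # eigenvalues in ascending order
    u = eigvecs[:, -1]                         # leading eigenvector
    if sparsity is not None:                   # crude hard truncation, for illustration only
        keep = np.argsort(np.abs(u))[-sparsity:]
        v = np.zeros_like(u)
        v[keep] = u[keep]
        u = v / np.linalg.norm(v)
    return u

# Usage: a monotone marginal transformation (here exp) leaves the rank-based estimate unchanged.
rng = np.random.default_rng(0)
Z = rng.multivariate_normal(np.zeros(5), 0.5 * np.eye(5) + 0.5, size=200)
X = np.exp(Z)
print(coca_leading_eigenvector(X, sparsity=3))

Because the estimate depends on the data only through ranks, applying any strictly increasing transformation to each coordinate of X gives the same output, which is the scale-invariance property emphasized in the abstract.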
Author Information
Fang Han (Johns Hopkins University)
Han Liu (Princeton University)
More from the Same Authors
- 2013 Poster: Robust Sparse Principal Component Regression under the High Dimensional Elliptical Model
  Fang Han · Han Liu
- 2013 Spotlight: Robust Sparse Principal Component Regression under the High Dimensional Elliptical Model
  Fang Han · Han Liu
- 2012 Poster: TCA: High Dimensional Principal Component Analysis for non-Gaussian Data
  Fang Han · Han Liu
- 2012 Poster: High Dimensional Transelliptical Graphical Models
  Han Liu · Fang Han
- 2012 Oral: TCA: High Dimensional Principal Component Analysis for non-Gaussian Data
  Fang Han · Han Liu