Spotlight
Minimax Localization of Structural Information in Large Noisy Matrices
Mladen Kolar · Sivaraman Balakrishnan · Alessandro Rinaldo · Aarti Singh

Wed Dec 14th 10:40 -- 10:44 AM @ None

We consider the problem of identifying a sparse set of relevant columns and rows in a large data matrix with highly corrupted entries. This problem of identifying groups from a collection of bipartite variables such as proteins and drugs, biological species and gene sequences, malware and signatures, etc is commonly referred to as biclustering or co-clustering. Despite its great practical relevance, and although several ad-hoc methods are available for biclustering, theoretical analysis of the problem is largely non-existent. The problem we consider is also closely related to structured multiple hypothesis testing, an area of statistics that has recently witnessed a flurry of activity. We make the following contributions: i) We prove lower bounds on the minimum signal strength needed for successful recovery of a bicluster as a function of the noise variance, size of the matrix and bicluster of interest. ii) We show that a combinatorial procedure based on the scan statistic achieves this optimal limit. iii) We characterize the SNR required by several computationally tractable procedures for biclustering including element-wise thresholding, column/row average thresholding and a convex relaxation approach to sparse singular vector decomposition.

Author Information

Mladen Kolar (University of Chicago)
Sivaraman Balakrishnan (CMU)
Alessandro Rinaldo (CMU)
Aarti Singh (CMU)

More from the Same Authors