We consider the problem of principal component analysis (PCA) in the presence of outliers. Given a matrix $A$ ($d \times n$) and parameters $k, m$, the goal is to remove a set of at most $m$ columns of $A$ (known as outliers), so as to minimize the rank-$k$ approximation error of the remaining matrix. While much of the work on this problem has focused on recovery of the rank-$k$ subspace under assumptions on the inliers and outliers, we focus on the approximation problem above. Our main result shows that sampling-based methods developed in the outlier-free case give non-trivial guarantees even in the presence of outliers. Using this insight, we develop a simple algorithm that has bi-criteria guarantees. Further, unlike similar formulations for clustering, we show that bi-criteria guarantees are unavoidable for the problem, under appropriate complexity assumptions.
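To make the objective concrete, here is a minimal sketch that evaluates the cost of one candidate outlier set. It is not the algorithm from the paper. It assumes the rank-$k$ approximation error is measured in squared Frobenius norm, which the abstract does not specify, and the helper names `rank_k_error` and `error_after_removal` are hypothetical.

```python
import numpy as np


def rank_k_error(A, k):
    """Squared Frobenius error of the best rank-k approximation of A.

    Assumption: the error is measured in squared Frobenius norm, so it
    equals the sum of the squared singular values beyond the k-th.
    """
    if A.shape[1] == 0:
        return 0.0
    s = np.linalg.svd(A, compute_uv=False)  # singular values, descending
    return float(np.sum(s[k:] ** 2))


def error_after_removal(A, outliers, k):
    """Rank-k error of A once the columns indexed by `outliers` are dropped."""
    removed = np.asarray(list(outliers), dtype=int)
    keep = np.setdiff1d(np.arange(A.shape[1]), removed)
    return rank_k_error(A[:, keep], k)


# Toy instance: 8 inlier columns on a 1-dimensional subspace, plus 1 outlier.
rng = np.random.default_rng(0)
u = rng.standard_normal((5, 1))
inliers = u @ rng.standard_normal((1, 8))  # rank-1 block
outlier = rng.standard_normal((5, 1))
A = np.hstack([inliers, outlier])          # d = 5, n = 9

err_all = error_after_removal(A, [], k=1)     # nothing removed
err_clean = error_after_removal(A, [8], k=1)  # outlier column removed
print(err_all, err_clean)
```

With $k = 1$ and $m = 1$, removing the outlier column (index 8) drives the error to numerically zero, while keeping it leaves a strictly positive error. The problem in the abstract asks for the best such set of at most $m$ columns over all possible choices, which is what makes it hard.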
from cs updates on arXiv.org https://ift.tt/2jnyNHT