Sampling algorithms and coresets for Lp regression and applications
Applied Math
Speaker: | Michael Mahoney, Yahoo! Inc. |
Location: | 1147 MSB |
Start time: | Fri, Apr 6 2007, 12:10PM |
L2 regression, also known as the least-squares approximation problem, and more generally Lp regression, are fundamental problems with numerous applications in mathematics and statistical data analysis. Recall that the over-constrained Lp regression problem takes as input an n x d matrix A, where n > d, an n-vector b, and a number p, and returns as output a d-vector x such that ||Ax - b||_p is minimized. Several randomized algorithms for this problem are described, each of which provides a relative-error approximation to the exact solution. The first algorithm applies novel "subspace sampling" probabilities to randomly sample constraints and thus construct a coreset for the L2 regression problem. The second algorithm (due to T. Sarlos) applies a random projection and uses a "fast" version of the Johnson-Lindenstrauss lemma to obtain an efficient approximation to the L2 regression problem. The third algorithm generalizes the concepts of a "well-conditioned" basis and of "subspace-preserving sampling" to efficiently obtain a coreset for Lp regression for all p in [1, ∞). Applications of these random projection and random sampling algorithms to internet data analysis will be briefly discussed.
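To make the row-sampling idea behind the first algorithm concrete, here is a minimal Python/NumPy sketch that samples rows of (A, b) with probabilities proportional to the leverage scores of A (the squared row norms of an orthonormal basis for its column space) and solves the smaller sampled least-squares problem. The function name, the sample size, and the use of exact leverage scores are illustrative assumptions; the talk's "subspace sampling" probabilities and relative-error analysis are more refined.

```python
import numpy as np

def leverage_score_sample_l2(A, b, num_samples, rng=None):
    """Approximate argmin_x ||Ax - b||_2 by sampling rows of (A, b)
    with probabilities proportional to the leverage scores of A.

    A minimal sketch of the subspace-sampling idea, not the exact
    algorithm from the talk.
    """
    rng = np.random.default_rng(rng)
    n, d = A.shape

    # Leverage scores: squared row norms of an orthonormal basis U
    # of the column space of A; they sum to d.
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    lev = np.sum(U**2, axis=1)
    probs = lev / lev.sum()

    # Sample rows with replacement and rescale by 1/sqrt(r * p_i)
    # so that the sampled quadratic form is unbiased.
    idx = rng.choice(n, size=num_samples, p=probs)
    scale = 1.0 / np.sqrt(num_samples * probs[idx])
    SA = A[idx] * scale[:, None]
    Sb = b[idx] * scale

    # Solve the much smaller least-squares problem (the "coreset").
    x_tilde, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
    return x_tilde
```

For instance, with A of size 10000 x 20, a few hundred sampled rows already yield a solution whose residual norm is close to that of the exact least-squares solution.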
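The second algorithm's random-projection approach can be imitated, at the cost of a slower sketch, by substituting a dense Gaussian projection for the structured fast Johnson-Lindenstrauss transform. The sketch below makes that substitution explicit; the Gaussian matrix is a stand-in assumption, not the transform used in Sarlos's algorithm.

```python
import numpy as np

def gaussian_sketch_l2(A, b, sketch_size, rng=None):
    """Approximate argmin_x ||Ax - b||_2 by solving the projected
    problem min_x ||S(Ax - b)||_2 for a random sketching matrix S.

    A dense Gaussian S is used here for simplicity; a "fast"
    Johnson-Lindenstrauss transform plays its role in Sarlos's
    algorithm and makes the projection step much cheaper to apply.
    """
    rng = np.random.default_rng(rng)
    n, _ = A.shape
    S = rng.standard_normal((sketch_size, n)) / np.sqrt(sketch_size)
    x_tilde, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x_tilde
```

The design point is that the projected problem has only sketch_size rows, so the dominant cost shifts from solving the full n x d problem to applying the projection, which is exactly the step the fast transform accelerates.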