Roman Vershynin | Commentary on papers


M. Rudelson, R. Vershynin, The least singular value of a random square matrix is O(n^{-1/2})

Here we settle the other direction of the conjectural bound on the least singular value of a random square matrix. Let A be a matrix whose entries are real i.i.d. centered random variables with unit variance and suitable moment assumptions. Then the smallest singular value of A is of order n^{-1/2} with high probability. We have proved with Mark Rudelson the lower estimate of this type; in this note we establish the matching upper estimate.


M. Rudelson, R. Vershynin, The smallest singular value of a random rectangular matrix

Here we further develop our inevrtibility method from our first paper on the topic. We consider a rectangular random matrix (N by n), and we show that with high probability its smallest singular value is at least of order N^{1/2} - (n-1)^{1/2}. Known results of random matrix theory show that in the limit, as the dimension n increases to infinity while the aspect ratio n/N converges to a constant, the smallest singular value converges almost surely to the quantity above. On the contrary, our estimate holds for all fixed dimensions (up to a constant factor), i.e. without taking the limit, and with very high probability. Estimates in fixed dimensions are crucial for various problems in geometric functional analysis, numerical analysis and other fields.

Our estimate above bridges the known results for tall matrices and square matrices. For square matrices, our estimate
is of order n^{-1/2}, which is our result from the first paper. For tall matrices, where n is any fixed proportion of N, our estimate is of order N^{1/2}, which yields the result of Litvak, Pajor, Rudelson and Tomczak, now with the optimal dependence on the proportion.

Our argument is a development of our method from the first paper for square matrices. However, dealing with rectangular matrices is in several ways considerably harder. Several new tools are developed in this paper, which are of independent interest.

One new key ingredient is a small ball probability bound for sums of independent random vectors in R^d. This is the probability that such a sum falls into a given small Euclidean ball in R^d. Our method is a development of the Littlewood-Offord theory, which connects the small ball probabilities to the additive structure of the sum. The less additive structure, the smaller is the small ball probability. We include a version of a new and simple argument sue to Sodin and Friedland, which allows to avoid any use of Halasz regularity lemma.

We use small the ball probability estimate to prove an optimal bound for the distance problem: how close is a random vector X to an independent random subspace H? We show that the distance is at least of order m^{1/2} with high probability, where m is the codimension of H. To prove this, we first show the intuitive (albeit nontrivial) fact that random subspaces have no additive structure. As said above, this fact makes our small ball probabiilty estimates powerful enough.

Finally, we use the distance estimate to bound below the smallest singular value of a random matrix A. Our random vector X is a column of A, and our random subspace H is the span of the other columns. The simple rank argument shows that the smallest singular valueis zero if and only if X lies in H for some column. A simple quantitative version of this argument is that a lower estimate on the smallest singular value yields a lower bound on the distance dist(X,H). We show how to reverse the rank argument, and thus derive bounds for the smallest singular value from the distance problem.


D. Needell, R. Vershynin, Signal recovery from incomplete and inaccurate measurements via Regularized Orthogonal Matching Pursuit

This is our second paper with my student Deanna Needell, where we continue to study alternatives to linear programming methods in Compressed Sensing. In our first paper,  we developed a greedy algorihtm called ROMP, which can perfectly recover any sparse signal from a small number of linear non-adaptive measurements. In this paper, we prove the stability of ROMP. It seems to be the first greedy algorithm in Compressed Sensing that is provably stable in the sense that the recovery error is proportional to the level of the noise. In particular, the recovery is exact in the absence of noise. Also, one can use ROMP to accurately recover arbitrary signals, not necessarily sparse. This kind of stability was known for the linear programming methods (a result of Candes, Romberg and Tao), but has been previously unknown for any greedy method (like OMP).



D. Needell, R. Vershynin, Uniform Uncertainty Principle and signal recovery via Regularized Orthogonal Matching Pursuit

This is the first paper we wrote with my student Deanna Needell. We settle there one problem in the area of Compressed Sensing: how to construct a greedy algorithm for sparse recovery with uniform guarantees.

Compressed Sensing is focused on the sparse recovery problem: how to reconstruct a vector X that has very few non-zero coordinates from few linear measurements of X. Such measurements can be described as a measurement vector AX, which is the product of some measurement matrix A by X. There are two major algorithmic approaches to the sparse recovery problem: methods based on Linear Programming (a.k.a. L1-minimization, Basis Pursuit) and greedy iterative methods (for example, Orthogonal Matching Pursuit, Tresholding Algorithms). Each of the two approaches has its own advantages.

L1 minimization method has strongest known guarantees. Once the measurement matrix A satisfies the abstract form of the Uncertainty Principle (in the form of the Restricted Isometry Condition), then a result of Candes and Tao states that L1-minimization yields exact recovery. Specifically, the unique vector with minimal L1-norm over all vectors with measurements AX is X. The L1-minimization can be described as a linear program, for which abundance of solvers is available.

An alternative approach is using greedy iterative algorithms. The measurement vector AX is a linear combination of very few columns of A (because X has few nonzero coefficients), and we need to know which columns. It would then be natural to select the columns that are most correlated with AX (i.e. whose inner products with AX is biggest in magnitude). However, the set selected this way will not in general recover the nonzeros of X exactly, even for very nice measurement matrices A. Instead, one could select only one column of A most correlated with X, subtract its contribution from AX (making the residual orthogonal to that column) and iterate. This simple algorithm is called the Orthogonal Matching Pursuit (OMP), see the paper by Tropp and Gilbert.

The advantage of OMP is its transparency, ease of implementation, and great speed (OMP is currently considerably faster than L1-mimimization). The disadvantage of OMP is its weaker guarantees. It is actually known that OMP does not have uniform guarantees (with high probability, correctly recovers all vectors) for natural random matrices. Furthermore, no analysis at all is known for some important classes of measurements, such as random Fourier measurements (where the measurements are random frequencies of the signal X).

Our paper essentially settles these concerns about OMP. We prove that a simple regularized version of OMP (ROMP) has essentially the same guarantees as L1-minimization. Once the measurement matrix A satisfies the Uncertainty Principle (a.k.a. Restricted Isometry Condition), then ROMP recovers every sparse vector X from its measurements AX exactly. ROMP is therefore the first greedy algorithm for sparse recovery with uniform guarantees.

ROMP is based on the following idea. To recover the signal X from its measurements Y = AX, we can use the "observation vector" U = A*Y as a good local approximation to X. Indeed, U encodes correlations of Y with the columns of A. By the Uniform Uncertainty Principle, every n columns form approximately an orthonormal system. Therefore, every n coordinates of U look like correlations of Y with the orthonormal basis and therefore are close to the coefficients of X.

This suggests to make use of the n biggest coordinates of the observation vector U, rather than one biggest coordinate as OMP does. ROMP thus selects n biggest coordinates, and performs regularization by further selecting only the coordinates with comparable sizes (to ensure that the information is spread evenly among them).


M. Rudelson, R. Vershynin, The Littlewood-Offord Problem and invertibility of random matrices

Here we prove two basic conjectures on the invertibility of random matrices. One of them has been a prediction of Von Neumann and his associates, who speculated that the condition number of a random n by n matrix with i.i.d. entries should be linear in n (with high probability). The condition number is the ratio of the largest to the smallest singlar values of the matrix, or equivalently the product of the norms of the matrix and its inverse.

The largest singular value of random matrices is well understood -- it is of order n^{1/2} with high probability, and it converges to a Tracy-Widom distribution as n increases. The hard part is the smallest singular value, conjectured to be of order n^{-1/2}. This was only known for Gaussian matrices (Smale's problem solved by Edelman in 1988). Rudelson and Tao and Vu proved weaker estimates in some more general cases. The full conjecture is proved in the present paper. Thus the norm of random matrix and its inverse are both of order n^{1/2} with high probability. This confirms Von Neumann's prediction in full generality.

The next question is: what is this high probability? For random Bernoulli matrices (whose entries are 1 and -1 with probability 1/2), Spielman and Teng conjectured at ICM 2002 the following asymptotics for the smallest singular value normalized (i.e. divided) by its mean n^{-1/2}. The probability that it is smaller than t should be linear in t for all t > c^n (where c<1 is some constant). This asymptotics must break down for smaller t because Bernoulli matrices are singular wiht nonzero probability. The general Spielman-Teng's conjecture is proved in this paper (for all subgaussian matrices).

One immediate consequence of the general Spielman-Teng's conjecture (now a theorem) for t=0 is that every subgaussian matrix is singular with probability exponentially small in n. For Bernoulli matrices, this was proved by Kahn, Komlos and Szemeredi in 1995. Even the fact that this singularity probability converges to zero as n increases is nontrivial; Komlos proved this in 1967. Erdos conjectured the singularity probability is (1/2 + o(1))^n. The best known bound (3/4 + o(1))^n is due to Tao and Vu.

Our random matrix results come as a consequence of our attempts to develop the Littlewood-Offord theory of anti-concentration inequalities.

The classical concentration inequalities of probability theory demonstrate that nice random variables, such as the sums of i.i.d.'s, are nicely concentrated about their means. Less studied but often needed is the reverse phenomenon -- that the concentration is not too tight. The Littlewood-Offord Problem asks, for i.i.d. random variables X_k and real numbers a_k, to put an upper bound on the probability P that the sum of a_k X_k lies near some number v.

Unlike the concentration inequalities, which are controlled only by the magnitude of the coefficients a_k (e.g. the norms of the coefficient vectors, like in Berstein's inequality), the anti-concentration inequalities are less stable; they are controlled by the arithmetic structure in the coefficients a_k. Tao and Vu put forth a program to show that if the concentration probability P is large then there is a strong additive structure in the coefficients a_1, ... , a_n: this sequence should be embedded in a short arithmetic progression (or a generalization thereof). This says that the only reason for a random sum to concentrate too much is due to (essential) cancellations, which can only be caused by a rich additive structure of the summands.

We give an optimal result for the Littlewood-Offord problem for arbitrary coefficients a_k of the same order of magnitude by showing that these coefficients essentially lie in an arithmetic progression of length 1/P.

This paper is written so that Von Neumann's prediction can be read separately from the rest. The Littlewood-Offord theory is not needed there; it becomes crucial in the proof of the stronger Spielman-Teng's conjecture.


M. Rudelson, R. Vershynin, Sampling from large matrices: an approach through geometric functional analysis

Here we study random submatrices of large matrices. The main problem is: how can one learn a matrix from a random sample of its entries?

We show how to approximately compute any matrix from its random submatrix of the smallest possible size O(r log r) with a small error in the spectral norm, where r is the numerical rank of the matrix. The numerical rank is the ratio of the Frobenius (a.k.a. Hilbert-Schmidt) norm squared to the operator (a.k.a. spectral) norm squared. The numerical rank is thus always bounded by the usual rank, and unlike the usual rank it is stable under perturbations. The stability is crucial for numerical applications, where the matrix is blurred by noise (e.g. roundoff errors), so the matrix is of full rank but can be of low numerical rank.

Our approximation result gives an asymptotically optimal guarantee for an algorithm to compute low-rank approximations of A.

We also give asymptotically optimal estimates on the spectral norm and the cut-norm of random submatrices of A. The result for the cut-norm yields a slight improvement on the best known sample complexity for an approximation algorithm for MAX-2CSP problems.

We use methods of Probability in Banach spaces, in particular the law of large numbers for operator-valued random variables.

This project started in the summer of 2002 and has undergone several major transformations since then.


Yu. Liubarskii, R. Vershynin, Uncertainty principles and vector quantization

Here we touch a computational aspect of the basic problem in asymptotic convex geometry: what matrices realize Euclidean projections of the cube? A sufficient condition for such matrices turns out to be some abstract form of the Uncertainty Principle (UP). This Uncertainty Principle was recently set forth by Candes and Tao in the context of the fast developing applied area of Compressed Sensing. For orthogonal matrices, UP says that all submatrices have norm slightly smaller than 1. For matrices that realize the (discrete) Fourier transform, this leads to variants of the classical Uncertainty Principle in harmonic analysis. Gluskin [unpublished] and Talagrand [Selection of proportion of characters] observed that some form of UP implies Euclidean sections of L_1 (thus Euclidean sections of the cube).

We use this idea to observe a new connection between the Uncertainty Principle and the vector quantization theory. We show that for frames in N dimensions that satisfy the Uncertainty Principle, one can quickly convert every frame representation into a more regular Kashin's representation, whose coefficients all have the same magnitude N^{-1/2}. Information tends to spread evenly among these coefficients. As a consequence, Kashin's representations have a great power for reduction of errors in their coefficients. In particular, scalar quantization of Kashin's representations yields robust vector quantizers.

This project started in the Fall of 2003 and has undergone several major transformations since then.


T. Strohmer, R. Vershynin, A randomized solver for linear systems with exponential convergence

The Kaczmarz method for solving linear systems of equations Ax=b is a simple iterative algorithm with many applications ranging from computer tomography to digital signal processing, but whose performance is poorly understood.

The algorithm starts with an arbitrary guess x_0 of the solution, and iteratively projects the running approximation x_k onto the solution spaces of each equation, in the cyclic order. Despite the simplicity and popularity of this method, there are no useful estimates for its rate of convergence.

It has been observed several times in tomography literature that Kaczmarz algorithm should become faster if one projects onto solution spaces in a random, rather than cyclic, order. However, no estimates on the convergence of such randomized version have been known either.

In this paper, we prove that a randomized Kaczmarz algorithm converges with expected exponential rate. This is the first solver whose rate does not depend on the number of equations in the system. The solver does not even need to know the whole system, but only its small random part. It thus outperforms all previously known methods on extremely overdetermined systems. Even for moderately overdetermined systems, numerical simulations reveal that our algorithm can converge faster than the celebrated conjugate gradient algorithm.


R. Vershynin, Beyond Hirsch Conjecture: walks on random polytopes and smoothed complexity of the simplex method

Here we introduce a new effective phase-I in linear programming, and significantly improve the smoothed analysis of the simplex method.

There exist algorithms which perform nicely in practice, but can be very slow in (specifically designed) worst cases. The celebrated simplex method is one of them. In practice, the simplex method is observed to solve linear programs in time linear in the dimension, but there exist linear programs for which it takes exponential time (Klee-Minty counterexample). 

To explain the discrepancy between the empirical evidence and theoretical analysis, Spielman and Teng introduced the concept of smoothed analysis of algorithms, where we measure the expected running time of an algorithm under slight random perturbations of arbitrary inputs. The idea is that the worst cases should be so few and isolated that one can rule them out by slightly perturbing the input.

Spielman and Teng proved that the (shadow-vertex) simplex method had polynomial smoothed complexity. On a slight random perturbation of arbitrary linear program, the simplex method finds the solution after a walk on the vertices of the feasible polytope with expected length polynomial in the number of constraints n, the number of variables d and the inverse standard deviation of the perturbation 1/sigma.

We show that the length of walk in the simplex method is actually polylogarithmic in the number of constraints n. Spielman-Teng's bound on the walk was O^*(n^86 d^55 sigma^{-30}), up to logarithmic factors. We improve this to O^*(d^9 sigma^{-4}). This shows that the tight conjectural bound n-d on the length of walk on polytopes (Hirsch Conjecture) is not a limitation for the smoothed Linear Programming. Random perturbations create short paths between vertices.

One ingredient of independent interest that is developed here is a new phase-I for solving arbitrary linear programs. Instead of finding a vertex of a feasible set, we add a vertex at random to the feasible set. This does not affect the solution of the linear program with constant probability. So, in expectation it takes a constant number of independent trials until a correct solution is found. This overcomes one of the major difficulties of smoothed analysis of the simplex method -- one can now statistically decouple the walk from the smoothed linear program. This yields a much better reduction of the smoothed complexity to a geometric quantity -- the size of planar sections of random polytopes. We also improve upon the known estimates for that size.


A. C. Gilbert, M. J. Strauss, J. A. Tropp, R. Vershynin, Algorithmic linear dimension reduction in the l_1 norm for sparse vectors

Compressible signals are pervasive in signal processing applications. The essential feature of a compressible signal is that it can be approximated well by a sparse signal. It has recently been observed that it is possible to collect the essential information from a compressible signal by projecting it onto a low-dimensional random subspace. This idea is called sketching in computer science and compressed sensing in signal processing.

The geometric functional analysis has studied non-algorithmic aspects of projections onto random low-dimensional subspaces since 1980s. Many tools are now available to check if such a projection is an almost isometry on a class of functions (a set of vectors in Rd). None of these, however, suggest reconstruction algorithms. Suppose we know that a certain projection is an almost isometry on a certain set in
Rd. Is it possible to reconstruct points in the set from their projections in polynomial time?

Much progress on this problem was made in 2004 by Candes-Tao, Donoho and others, including Rudelson-Vershynin. See our joint FOCS presentation Candes-Tao-Rudelson-Vershynin; also papers by Muthukrishnan and the Compressed Sensing Page. The reconstruction problem has been solved via linear programming, this beautiful idea going back to Donoho and his collaborators. Due to the development of interior point methods in the 1980s, the complexity of the linear program is polynomial in d. Howeverm this might be too slow in applications, especially when d is very large (in streaming algorithms, d=
232 is not uncommon).

This paper develops the first "superfast" reconstruction algorithm whose complexity is polylogarithmic, rather than polynomial, in dimension d (with one specifically designed projection that works for all vectors).


M.Rudelson, R.Vershynin, On sparse reconstruction from Fourier and Gaussian measurements

This is a journal version of the conference paper below. We added an (exponential) probability estimate for our result on random Fourier measurements, and we included a more accurate calculation of the constants for Gaussian measurements. Recently, Donoho and Tanner have been able to compute precise asymptotic behavior for these constants.


M.Rudelson, R.Vershynin, Sparse reconstruction by convex relaxation: Fourier and Gaussian measurements

Here we try to understand how to compute a sparse signal from few frequencies, and we prove the best known estimate on the number of frequencies so that one can compute f using linear programming.

So we want to compute a signal f, viewed as a vector in R^n with small support r, from k = k(r,n) frequencies of f -- the values of its discrete Fourier transform. A simple linear algebra shows that k = 2r frequencies suffice. But is there a formula or an effective (polynomial time) algorithm to compute f from these frequencies?

Donoho and his collaborators advocated that one should be able to "convexify" the reconstruction problem -- relax it to a convex problem with the same solution, which can then be solved using methods of linear programming. When exactly the convex relaxation is equivalent to the original problem is an open question. Candes and Tao found a sufficient condition for this equivalence -- the Restricted Isometry Condition (R.I.C.). This reduces the reconstruction problem to veryfying R.I.C. We prove the best known bound for the number of measurements where R.I.C. holds: k(r,n) =  O(r log(n) log^2(r) log(r log n)). The optimal bound should be O(r log n).

Our arguments are based on the techniques of Geometric Functional Analysis and Probability in Banach spaces, in particular on the development of Rudelson's sampling method for random vectors in the isotropic position.



R. Vershynin, Random sets of isomorphism of linear operators on Hilbert space

This note deals with a problem of the probabilistic Ramsey theory. Given a linear operator T on a Hilbert space with an orthogonal basis, we define the isomorphic structure Sigma(T) as the family of all finite subsets of the basis so that T restricted to their span is a nice isomorphism. We give a dimension-free optimal lower bound on the size of Sigma(T). This improves and extends in several ways the principle of restricted invertibility due to Bourgain and Tzafriri. With an appropriate notion of randomness, we obtain a randomized principle of restricted invertibility.


M.Rudelson, R.Vershynin, Geometric approach to error correcting codes and reconstruction of signals

We develop an approach through geometric functional analysis to error correcting codes and to reconstruction of signals from few linear measurements.

An error correcting code encodes an n-letter word x into an m-letter word y in such a way that x can be decoded correctly when any r letters of y are corrupted. We prove that most linear orthogonal transformations Q from R^n into R^m form efficient and robust robust error correcting codes over reals. The decoder (which corrects the corrupted components of y) is the metric projection onto the range of Q in the L_1 norm.

An equivalent problem arises in signal processing: how to reconstruct a signal that belongs to a small class from few linear measurements? We prove that for most sets of Gaussian measurements, all signals of small support can be exactly reconstructed by the L_1 norm minimization. This is a substantial improvement of recent results of Donoho and of Candes and Tao.

Yet one more equivalent problem in combinatorial geometry is the existence of neighborly polytopes -- polytopes with fixed number of facets and maximal number of lower-dimensional facets. We prove that most sections of the cube form such polytopes.


B.Klartag, R.Vershynin, Small ball probability and Dvoretzky theorem

Large deviation estimates are by now a standard tool in the Asymptotic Convex Geometry, contrary to  small deviation results. In this note we present a novel application of a small deviations inequality to a problem related to the diameters of random sections of high dimensional convex bodies. Our results imply an unexpected distinction between the lower and the upper inclusions in Dvoretzky Theorem.


R.Vershynin,  Frame expansions with erasures: an approach through the non-commutative operator theory

In modern communication systems such as the Internet, random losses of information can be mitigated by oversampling the source. This is equivalent to expanding the source using overcomplete systems of vectors (frames), as opposed to the traditional basis expansions.

Dependencies among the coefficients in frame expansions often allow for better performance comparing to bases under random losses of coefficients. We show that for any n-dimensional frame, any source can be linearly reconstructed from only n log n randomly chosen frame coefficients, with a small error and with high probability. Thus every frame expansion withstands random losses better (for worst case sources) than the orthogonal basis expansion, for which the n log n bound is attained.

The proof reduces the problem to M.Rudelson's selection technique for random vectors in the isotropic position, which is based on the non-commutative Khinchine's inequality.


R.Vershynin,  Isoperimetry of waists and local versus global asymptotic convex geometries (with an appendix by M.Rudelson and R.Vershynin)

Existence of nicely bounded sections of two symmetric convex bodies K and L implies that the intersection of random rotations of K and L is nicely bounded. For L = subspace, this main result immediately yields the unexpected phenomenon: "If K has one nicely bounded section, then most sections of K are nicely bounded". This 'existence implies randomness' consequence was proved independently in [Giannopoulos, Milman and Tsolomitis] and even before that, in [Mankiewicz-Tomczak]. Our main result represents a new connection between the local asymptotic convex geometry (study of sections of convex bodies) and the global asymptotic convex geometry (study of convex bodies as a whole). The method relies on the new 'isoperimetry of waists' on the sphere due to Gromov.



M.Rudelson, R.Vershynin,  Combinatorics of random processes and sections of convex bodies

We find a sharp combinatorial bound for the metric entropy of sets in R^n and general classes of functions. This solves two basic combinatorial conjectures on the empirical processes:

1. A class of functions satisfies the uniform Central Limit Theorem if the square root of its combinatorial dimension is integrable.

2. The uniform entropy is equivalent to the combinatorial dimension under minimal regularity.

Our method also constructs a nicely bounded coordinate section of a symmetric convex body in R^n. In the operator theory, this essentially proves for all normed spaces the restricted invertibility principle of Bourgain and Tzafriri.


R. Vershynin, Integer cells in convex sets

Here a version of the classical Minkowski's theorem is proved for integer cells rather than integer points. Namely, every convex body K in R^n admits a coordinate projection that contains at least vol(0.1 K) cells of the integer lattice.

Our proof is based on an extension of the combinatorial density theorem of Sauer, Shelah and Vapnik-Chervonenkis from the Boolean cube {0,1}^n to Z^n. This leads to a new approach to coordinate sections of convex bodies. In particular, fundamental results of the asymptotic convex geometry such as the Volume Ratio Theorem and Milman's duality of the diameters admit natural versions for coordinate sections.