Mathematics Colloquia and Seminars

In modern statistical practice, one often encounters $n\times p$ data matrices in which $n$ and $p$ are both large. Classical multivariate statistical analysis (T. W. Anderson, 1963) does not apply in this setting. Using random matrix theory, Johansson (2000) and Johnstone (2001) recently shed light on the behavior of the largest eigenvalue of a complex Wishart matrix when the true covariance is $\mathrm{Id}$. Specifically, when the entries of the $n\times p$ matrix $X$ are i.i.d. ${\cal N}(0,1/\sqrt{2})+i{\cal N}(0,1/\sqrt{2})$ and $n/p\rightarrow \rho \in (0,\infty)$, they showed, among other things, that $l_1^{(n,p)}$, the largest eigenvalue of the empirical covariance matrix $X^*X$, converges in distribution (after proper recentering and rescaling) to $W_2$, a random variable whose distribution is the Tracy-Widom law that appears in the study of the Gaussian Unitary Ensemble (GUE).

We will discuss two extensions of this result. First, we will explain that in this situation one can find centering and scaling sequences $\mu_{n,p}$ and $\sigma_{n,p}$ such that $P((l_1^{(n,p)}-\mu_{n,p})/\sigma_{n,p}\leq s)$ converges to its limit $P(W_2\leq s)$ with an error of order at most $n^{-2/3}$ (equivalently $p^{-2/3}$, since $n/p\rightarrow\rho$).

Second, we will consider the case where the rows of $X$ are independent $p$-dimensional vectors with distribution ${\cal N}(0,\Sigma_p/\sqrt{2})+i{\cal N}(0,\Sigma_p/\sqrt{2})$. For a fairly large class of matrices $\Sigma_p$ (including, for instance, well-behaved Toeplitz matrices), it turns out that $l_1^{(n,p)}$, properly recentered and rescaled, again converges to $W_2$. We will give numerically explicit formulas for the centering and scaling sequences in this setting and highlight connections between this result and the work of Bai, Silverstein and co-authors on the almost-sure behavior of the largest eigenvalue of random covariance matrices. Finally, time permitting, we will illustrate how these and related theoretical insights might be used in statistical practice.
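As a rough illustration of the white case ($\Sigma_p=\mathrm{Id}$), the Python sketch below simulates $l_1^{(n,p)}$ and standardizes it with the classical first-order sequences $\mu_{n,p}=(\sqrt{n}+\sqrt{p})^2$ and $\sigma_{n,p}=(\sqrt{n}+\sqrt{p})(1/\sqrt{n}+1/\sqrt{p})^{1/3}$ associated with Johansson (2000). These are not the refined sequences achieving the $n^{-2/3}$ rate discussed in the talk. The function names, the use of NumPy, and the convention that the real and imaginary parts of each entry have variance $1/2$ (so that $E|X_{ij}|^2=1$) are assumptions of this sketch.

```python
import numpy as np


def classical_center_scale(n, p):
    """First-order centering/scaling for the largest eigenvalue of X^*X,
    assuming X is n x p with i.i.d. complex Gaussian entries, E|X_ij|^2 = 1."""
    root_sum = np.sqrt(n) + np.sqrt(p)
    mu = root_sum ** 2
    sigma = root_sum * (1 / np.sqrt(n) + 1 / np.sqrt(p)) ** (1 / 3)
    return mu, sigma


def standardized_largest_eigenvalues(n, p, reps, seed=0):
    """Simulate (l_1 - mu) / sigma for `reps` independent white complex Wishart draws."""
    rng = np.random.default_rng(seed)
    mu, sigma = classical_center_scale(n, p)
    out = np.empty(reps)
    for r in range(reps):
        # Real and imaginary parts are each N(0, 1/2), i.e. standard deviation 1/sqrt(2).
        X = (rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))) / np.sqrt(2)
        # X^*X is p x p Hermitian; eigvalsh returns its eigenvalues in ascending order.
        l1 = np.linalg.eigvalsh(X.conj().T @ X)[-1]
        out[r] = (l1 - mu) / sigma
    return out


if __name__ == "__main__":
    z = standardized_largest_eigenvalues(200, 100, reps=300, seed=1)
    # Compare with W_2 (Tracy-Widom, GUE): mean about -1.77, variance about 0.81.
    print(z.mean(), z.var())
```

The sample mean and variance should land near those of $W_2$ for moderate $n$ and $p$; the exact values depend on the seed. Handling a general $\Sigma_p$ would amount to replacing $X$ by $X\Sigma_p^{1/2}$, but the centering and scaling formulas for that case are not reproduced here.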