# Mathematics Colloquia and Seminars

### Statistical and computational perspective of mixture and hierarchical models

Mathematics of Data & Decisions

 Speaker: Nhat Ho, UC Berkeley Related Webpage: https://people.eecs.berkeley.edu/~minhnhat/ Location: 1147 MSB Start time: Tue, Dec 3 2019, 4:10PM

The growth in scope and complexity of modern data sets presents the field of statistics and data science with numerous inferential and computational challenges, among them how to deal with various forms of heterogeneity. Mixture and hierarchical models provide a principled approach to modeling heterogeneous collections of data. However, it has been observed that statistical estimation methods, such as maximum likelihood estimator (MLE), or optimization techniques, such as expectation-maximization (EM), have non-standard statistical rates of convergence. In this talk, we provide new insight into these issues of mixture and hierarchical models.

From the statistical viewpoint, we propose a general framework for studying the convergence rates of parameter estimation in mixture models. Our study makes explicit the links between model singularities, parameter estimation convergence rates, and the algebraic geometry of the parameter space for mixtures of continuous distributions. Reposing on this framework, we develop a novel post-processing procedure, named Merge-Truncate-Merge algorithm, to determine the true number of components in mixture models.

From the computational side, we study the EM algorithm under the over-specified settings of mixture models in which the likelihood need not be strongly concave, or, equivalently, the Fisher information matrix might be singular. In such settings, it is known that a global maximum based on $n$ samples has a non-standard rate of convergence. Focusing on the simple setting of a two-component mixture fit with equal mixture weights to a multivariate Gaussian distribution, we demonstrate that EM updates converge to a fixed point at Euclidean distance $O((d/n)^{1/4})$ from the true parameter after $O((n/d)^{1/2})$ steps where d is the dimension. Analysis of this singular case requires the introduction of some novel analysis techniques, in particular we make use of a careful form of localization in the associated empirical process, and develop a recursive argument to progressively sharpen the statistical rate.

This talk features joint work with (alphabetically) Raaz Dwivedi, Aritra Guha, Michael Jordan, Koulik Khamaru, Long Nguyen, Ya’acov Ritov, Martin Wainwright, and Bin Yu.