Mathematics Colloquia and Seminars

Diffusion Geometries for Data Mining and Dimensionality Reduction

Applied Math

Speaker: Stephane Lafon, Google Inc.
Location: 693 Kerr
Start time: Fri, Nov 4 2005, 4:10PM

One of the main challenges of modern data mining is having to deal with high-dimensional data. In particular, classification, regression and pattern recognition become incredibly complex in high dimension because of the so-called curse of dimensionality. Over the past few years, new nonlinear dimension reduction techniques have emerged from the machine learning community. These new tools aim at finding parametrizations of data sets in order to improve the performance of typical machine learning tasks such as classification and regression. In this talk, I present a powerful framework for dimension reduction based on the spectral properties of certain Markov chains on the data. In particular, I describe how to construct coordinates parametrizing any data set that can be put in the form of a graph. These coordinates are used to learn the intrinsic geometry of the data, and therefore they allow dimensionality reduction. I also introduce an explicit metric on the data, the diffusion distance, that proves to be extremely useful for classification and regression purposes. This metric allows the design of inference algorithms based on the preponderance of evidence. All these ideas are illustrated with various examples on document classification, lexicon organization and automatic concept extraction, and audio and visual pattern recognition. This is joint work with R.R. Coifman (Yale), Y. Keller (Yale), I.G. Kevrekidis (Princeton), A.B. Lee (CMU), M. Maggioni (Yale), B. Nadler (Weizmann Institute).
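
For readers unfamiliar with the diffusion distance mentioned in the abstract, the following is a minimal sketch and not the speaker's implementation. It assumes a Gaussian kernel to turn points into a weighted graph, row-normalizes the weights into a Markov chain, and compares t-step transition probabilities weighted by the inverse stationary distribution, which is the form the definition commonly takes in the diffusion maps literature. The function names, the kernel width `eps`, the number of steps `t`, and the toy data are all illustrative assumptions.

```python
import math

def gaussian_kernel(points, eps):
    """Symmetric affinities w[i][j] = exp(-|x_i - x_j|^2 / eps) (assumed kernel)."""
    n = len(points)
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(points[i], points[j])) / eps)
             for j in range(n)] for i in range(n)]

def markov_matrix(w):
    """Row-normalize affinities into transition probabilities.

    Also returns the stationary distribution pi, which for a symmetric
    affinity matrix is the node degree divided by the total degree.
    """
    deg = [sum(row) for row in w]
    total = sum(deg)
    p = [[w_ij / d for w_ij in row] for row, d in zip(w, deg)]
    pi = [d / total for d in deg]
    return p, pi

def mat_mul(a, b):
    """Plain dense matrix product of two square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_power(p, t):
    """t-step transition matrix P^t for an integer t >= 1."""
    result = p
    for _ in range(t - 1):
        result = mat_mul(result, p)
    return result

def diffusion_distance(p_t, pi, i, j):
    """Distance between rows i and j of P^t, weighted by 1 / pi."""
    return math.sqrt(sum((p_t[i][k] - p_t[j][k]) ** 2 / pi[k]
                         for k in range(len(pi))))

# Toy data (hypothetical): two well-separated clusters in the plane.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (3.0, 3.0), (3.1, 3.0), (3.0, 3.1)]
p, pi = markov_matrix(gaussian_kernel(points, eps=1.0))
p_t = mat_power(p, t=3)
within = diffusion_distance(p_t, pi, 0, 1)   # same cluster
across = diffusion_distance(p_t, pi, 0, 3)   # different clusters
```

On this toy data, `within` should come out much smaller than `across`, because random walks started at points in the same cluster reach nearly the same distribution after a few steps. The coordinates described in the abstract would be obtained from the eigenvectors of the Markov matrix; that step is omitted here to keep the sketch dependency-free.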