Diffusion Geometries for Data Mining and Dimensionality Reduction
Applied Math
| Speaker: | Stephane Lafon, Google Inc. |
| Location: | 693 Kerr |
| Start time: | Fri, Nov 4 2005, 4:10PM |
Description
One of the main challenges of modern data mining is having to deal
with high-dimensional data. In particular, classification, regression,
and pattern recognition become extremely difficult in high dimensions
because of the so-called curse of dimensionality.
Over the past few years, new nonlinear dimension reduction techniques
have emerged from the machine learning community. These new tools aim
at finding parametrizations of data sets in order to improve the
performance of typical machine learning tasks such as classification
and regression. In this talk, I present a powerful framework for
dimension reduction based on the spectral properties of certain Markov
chains on the data. In particular, I describe how to construct
coordinates parametrizing any data set that can be put in the form of a
graph. These coordinates are used to learn the intrinsic geometry of
the data, and therefore they allow dimensionality reduction. I also
introduce an explicit metric on the data, the diffusion distance, that
proves to be extremely useful for classification and regression
purposes. This metric allows the design of inference algorithms based on
the preponderance of evidence. All these ideas are illustrated with
various examples from document classification, lexicon organization and
automatic concept extraction, and audio and visual pattern recognition,
among others.
This is joint work with R.R. Coifman (Yale), Y. Keller (Yale), I.G.
Kevrekidis (Princeton), A.B. Lee (CMU), M. Maggioni (Yale), B. Nadler
(Weizmann Institute).
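For readers unfamiliar with the construction, the sketch below illustrates the kind of computation the abstract describes: build a Gaussian-kernel graph on the data, normalize it into a Markov transition matrix, and use its leading eigenvectors, scaled by powers of the eigenvalues, as diffusion coordinates. The kernel width `epsilon`, the diffusion time `t`, the number of coordinates `n_coords`, and the function name `diffusion_map` are illustrative choices for this sketch, not details taken from the talk.

```python
import numpy as np

def diffusion_map(X, epsilon=1.0, n_coords=2, t=1):
    """Minimal diffusion-map sketch: Gaussian-kernel graph, Markov chain,
    and spectral embedding giving diffusion coordinates."""
    # Pairwise squared Euclidean distances between data points.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian kernel defines the edge weights of a graph on the data.
    W = np.exp(-sq_dists / epsilon)
    # Row sums give the degree; row-normalizing W yields the Markov
    # transition matrix P = D^{-1} W (not formed explicitly here).
    d = W.sum(axis=1)
    # Work with the symmetric conjugate S = D^{-1/2} W D^{-1/2}, which
    # has the same eigenvalues as P and is numerically better behaved.
    d_half = np.sqrt(d)
    S = W / np.outer(d_half, d_half)
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]          # sort eigenvalues descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Right eigenvectors of P; column 0 is the trivial constant direction.
    psi = eigvecs / d_half[:, None]
    # Diffusion coordinates: eigenvalues^t scale the nontrivial eigenvectors.
    coords = (eigvals[1:n_coords + 1] ** t) * psi[:, 1:n_coords + 1]
    return coords
```

Under these assumptions, the Euclidean distance between rows of the returned coordinates approximates the diffusion distance at scale t between the corresponding data points, which is the metric the abstract refers to.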
