High-dimensional pattern recognition using low-dimensional embedding and Earth Mover's Distance (with L. Lieu), submitted for publication, 2008, revised 2009.

Abstract

We propose an algorithm that combines existing techniques in a novel way to classify datasets consisting of high-dimensional data (e.g., sets of signals or images). Our algorithm also sets up a framework for applying the Earth Mover's Distance (EMD) [Rubner-Tomasi 1999, Rubner-Tomasi-Guibas 2000] as a discriminant measure between datasets. We show how to prepare a compact representation, called a signature, for each dataset so that the EMD between datasets can be computed efficiently. Constructing the signatures requires dimension reduction, automatic determination of the data's intrinsic dimensionality, out-of-sample extension, and point clustering. We show how to apply existing methods, including Laplacian eigenmaps [Belkin-Niyogi 2001, 2003, 2005], the diffusion maps framework [Coifman-Lafon 2006, Lafon 2004, Lafon-Keller-Coifman 2006], and elongated K-means [Sanguinetti-Laidler-Lawrence 2008], to perform these tasks successfully. We also provide two examples of applications of the proposed algorithm.
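
As a rough illustration of the final comparison step only, the EMD between two signatures can be posed as a transportation problem: move the weight of one signature onto the other at minimum total cost, then divide that cost by the amount of weight moved. The Python sketch below is not the implementation used in the paper. The function name `emd`, the representation of a signature as a list of (cluster center, weight) pairs, the Euclidean ground distance, and the choice of a successive-shortest-path min-cost-flow solver are all assumptions made for this example.

```python
import math


def emd(sig_p, sig_q, tol=1e-12):
    """Earth Mover's Distance between two signatures (illustrative sketch).

    Each signature is a list of (center, weight) pairs; a center is a tuple
    of coordinates in the embedding space (hypothetical layout).
    """
    n, m = len(sig_p), len(sig_q)
    source, sink = 0, n + m + 1
    # Residual graph: each edge is [target, capacity, cost, index_of_reverse_edge].
    graph = [[] for _ in range(n + m + 2)]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0.0, -cost, len(graph[u]) - 1])

    for i, (_, w) in enumerate(sig_p):
        add_edge(source, 1 + i, w, 0.0)          # supply of cluster i
    for j, (_, w) in enumerate(sig_q):
        add_edge(1 + n + j, sink, w, 0.0)        # demand of cluster j
    for i, (p, _) in enumerate(sig_p):
        for j, (q, _) in enumerate(sig_q):
            # Ground distance: Euclidean (an assumption of this sketch).
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
            add_edge(1 + i, 1 + n + j, math.inf, d)

    total_flow, total_cost = 0.0, 0.0
    while True:
        # Bellman-Ford shortest path from source in the residual graph.
        dist = [math.inf] * len(graph)
        prev = [None] * len(graph)               # (node, edge index) used to reach a node
        dist[source] = 0.0
        for _ in range(len(graph)):
            updated = False
            for u in range(len(graph)):
                if dist[u] == math.inf:
                    continue
                for k, (v, cap, cost, _) in enumerate(graph[u]):
                    if cap > tol and dist[u] + cost < dist[v] - tol:
                        dist[v] = dist[u] + cost
                        prev[v] = (u, k)
                        updated = True
            if not updated:
                break
        if prev[sink] is None:                   # no augmenting path left
            break
        # Find the bottleneck capacity along the path, then push flow.
        push, v = math.inf, sink
        while v != source:
            u, k = prev[v]
            push = min(push, graph[u][k][1])
            v = u
        v = sink
        while v != source:
            u, k = prev[v]
            graph[u][k][1] -= push
            graph[v][graph[u][k][3]][1] += push
            v = u
        total_flow += push
        total_cost += push * dist[sink]

    if total_flow <= 0.0:
        raise ValueError("signatures must carry positive weight")
    # Normalize by the total flow, following the usual EMD definition.
    return total_cost / total_flow


# Two toy signatures in a 2-D embedding: the second is the first shifted up by 1.
sig_a = [((0.0, 0.0), 0.5), ((1.0, 0.0), 0.5)]
sig_b = [((0.0, 1.0), 0.5), ((1.0, 1.0), 0.5)]
print(emd(sig_a, sig_b))
```

In this sketch the cost of comparing two datasets depends only on the number of clusters in each signature, not on the number of original data points, which is the reason for building a compact signature first. For signatures with many clusters, a dedicated linear programming or optimal transport solver would likely be a better choice than this simple solver.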

Get the full paper (Revised on 07/07/09): PDF file.



Please email me if you have any comments or questions!