Mathematics Colloquia and Seminars

Return to Colloquia & Seminar listing

Machine Learning and Data Mining with Combinatorial Optimization

Mathematics of Data & Decisions

Speaker: Dorit Hochbaum, UC Berkeley
Related Webpage: https://www.ieor.berkeley.edu/~hochbaum/
Location: 1147 MSB
Start time: Tue, Mar 12 2019, 4:10PM

The dominant algorithms for machine learning tasks fall most often in the realm of AI or continuous optimization of intractable problems. We present combinatorial algorithms for machine learning, data mining, and image segmentation that, unlike the majority of existing machine learning methods, utilize pairwise similarities. These algorithms are efficient and reduce the classification problem to a network flow problem on a graph. One of these algorithms addresses the problem of finding a cluster that is as dissimilar as possible from the complement, while having as much similarity as possible within the cluster. These two objectives are combined either as a ratio or with linear weights. This problem is polynomial time solvable and is a variant of normalized cut, which is intractable. The problem and the polynomial-time algorithm solving it are called HNC. It is demonstrated, via an extensive empirical study, that incorporating the use of pairwise similarities improves accuracy of classification and clustering. However, the use of similarities implies a quadratic rate of growth in the size of the data. A methodology called "sparse computation" has been devised to address and eliminate this quadratic growth. It is demonstrated that the technique of "sparse computation" enables the scalability of similarity-based algorithms to very large-scale data sets

while maintaining high levels of accuracy. We demonstrate several applications of variants of HNC for data mining, medical imaging, and image segmentation tasks, including a recent one in which HNC is among the top performing methods in a benchmark for cell identification in calcium imaging movies for neuroscience brain research.