Mathematics Colloquia and Seminars

Return to Colloquia & Seminar listing

Topic Modeling: A Provable Algorithm

Probability

Speaker: Ravi Kannan, Microsoft Research
Location: 2112 MSB
Start time: Wed, Mar 4 2015, 2:10PM

Topic Modeling is widely used. The general model inference problem is hard. Known provable algorithms need some strong assumptions on the model. After a from-first-principles introduction, the talk will formulate a model with two natural assumptions: (i) each document has a dominant topic and (ii) each topic has dominant words. We provably solve the model fitting problem. The algorithm is based on three natural components: a thresholding step, Singular Value Decomposition and clustering. The proof crucially draws upon recent results on the convergence of another widely used tool: Lloyd’s k-means algorithm. We test both the performance of the algorithm and the validity of the assumptions on real document corpora with success. The simple device of thresholding has other uses - we will see an exponential advantage for certain “planted” problems in terms of the signalto-noise ratio. Joint with T. Bansal and C. Bhattacharyya, Indian Institute of Science.