Mathematics Colloquia and Seminars

Towards a new toolbox of optimal statistical primitives

Mathematics of Data & Decisions

Speaker: Jasper Lee, UC Davis
Location: 1025 PDSB
Start time: Tue, Oct 1 2024, 3:10PM

Given society's increasing reliance on data, the collection of data and its processing into useful information is a technical problem of growing focus and, perhaps paradoxically, a critical bottleneck in many data science and machine learning applications. My research focuses on designing algorithms that push the limits of both statistical efficiency and computational efficiency. In particular, my work tackles the divide between the theory and practice of data science, which exists even for the most basic statistical problems, including mean and (co)variance estimation. Conventional methods such as the sample mean, while supported by theoretical results under strong assumptions, are often brittle in the presence of extreme data points. To counter such deficiencies, practitioners often use ad hoc and unprincipled "outlier removal" heuristics, revealing a marked gap between theory and practice even for these fundamental problems. In this talk, I will describe my work towards building a new toolbox of optimal statistical primitives, bridging the theory-practice divide. I will specifically highlight three works: (A) a statistically optimal and computationally efficient 1-dimensional mean estimator, whose estimation error is optimal even in the leading multiplicative constant, under bare-minimum distributional assumptions; (B) a rather different but optimal mean estimator for the "very high-dimensional" regime; and (C) a recent result showing that the estimator from (A) is robust even in the presence of adversarial data corruption.