Mathematics Colloquia and Seminars
Toward Easy and Efficient Optimization for Modern Supervised Learning
PDE and Applied Math Seminar
Start time: Wed, Nov 20 2019, 3:10 PM
Tuning hyper-parameters for efficient optimization can be challenging and costly. In this talk, I will show how to perform efficient, tuning-free optimization in a modern setting characterized by over-parameterized models. I will start by presenting a risk curve that reconciles the classical bias-variance trade-off with modern over-parameterized learning. I will then provide empirical and theoretical evidence that learning in the modern regime leads to strong generalization and easy optimization. In particular, we show that SGD achieves exponential convergence with a constant step size in this regime. Furthermore, our results provide explicit expressions for key optimization parameters. They also explain and make precise the so-called “linear scaling rule”, which has become an important heuristic for tuning optimization parameters in deep learning. Based on these findings, we design the first algorithmic solution that “extends” linear scaling to a class of kernel machines. The “extended” linear scaling allows the training computation to be adapted to any hardware with a massively parallel structure, such as a GPU. Our approach dramatically accelerates training of these methods without any loss of test accuracy; in some cases, training drops from hundreds of hours of CPU computation to minutes on a single GPU. We believe that the underlying idea of “extending” linear scaling is applicable to other models and is key to efficient optimization algorithms for models such as neural networks.
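To make the claims above concrete, here is a hypothetical sketch (not the speaker's code) of constant-step-size mini-batch SGD on an over-parameterized least-squares problem, i.e. a model that can interpolate its training data. The step-size formula below, with `beta` the largest per-sample curvature and `lam1` the top Hessian eigenvalue, is one explicit choice of the kind the abstract alludes to: for small batch sizes `m` it grows roughly linearly with `m`, illustrating the linear scaling rule. All constants and names here are illustrative assumptions.

```python
import numpy as np

# Over-parameterized (interpolating) least-squares setup: d > n, and the
# labels are exactly realizable by a linear model.
rng = np.random.default_rng(0)
n, d = 50, 200
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d)

# Quantities used to set a constant step size (illustrative choice):
beta = np.max(np.sum(X**2, axis=1))            # max_i ||x_i||^2
lam1 = np.linalg.eigvalsh(X.T @ X / n)[-1]     # top eigenvalue of the Hessian

def sgd(m, steps=3000):
    """Mini-batch SGD with batch size m and a constant step size."""
    # For small m this is approximately m / beta: linear in the batch size.
    lr = m / (beta + (m - 1) * lam1)
    w = np.zeros(d)
    for _ in range(steps):
        idx = rng.choice(n, size=m, replace=False)
        w -= lr * X[idx].T @ (X[idx] @ w - y[idx]) / m
    return np.mean((X @ w - y) ** 2)

loss = sgd(m=8)   # training loss driven to (near) zero: exponential convergence
```

In the interpolating regime the gradient noise vanishes at the minimizer, which is why a fixed step size suffices and the training loss decays geometrically rather than stalling at a noise floor.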