Mathematics Colloquia and Seminars
Neural Networks Learning: The Power of Initialization
PDE and Applied Math Seminar
Speaker: Amit Daniely, Google
Start time: Fri, May 13 2016, 4:10PM
Given real numbers w_1,...,w_d, called weights, an (artificial) neuron computes the function g(x_1,...,x_d) = s(w_1*x_1 + ... + w_d*x_d), where s: R -> R is some fixed (usually non-linear) function. A neural network is obtained by connecting many neurons; given weights for each of them, it computes a function f: R^n -> R.
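The definition above can be sketched in a few lines of NumPy. This is an illustrative toy, not code from the talk; the choice of tanh as the activation s and the two-layer wiring are assumptions.

```python
import numpy as np

def neuron(x, w, s=np.tanh):
    # A single artificial neuron: apply the fixed non-linearity s
    # to the weighted sum w_1*x_1 + ... + w_d*x_d.
    return s(np.dot(w, x))

def network(x, W1, w2, s=np.tanh):
    # A small network built by connecting neurons: each row of W1
    # is one hidden neuron's weight vector; w2 holds the weights of
    # a linear output neuron. The result is a function R^n -> R.
    hidden = s(W1 @ x)
    return float(np.dot(w2, hidden))
```

For instance, `network(x, W1, w2)` with a 3-by-n matrix `W1` wires three hidden neurons into one output.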
Neural networks are useful for supervised learning, where the goal is to approximate (learn) a function f*: R^n -> R from a sample (x_1, f*(x_1)), ..., (x_m, f*(x_m)). To this end, neural-network algorithms fix a network, initialize its weights at random, and then locally optimize the weights to fit the sample. Although this procedure optimizes a highly non-convex objective, neural networks have enjoyed exceptional success in recent years.
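The random-initialization-plus-local-optimization recipe can be sketched as follows, as a minimal gradient descent on a one-hidden-layer network. The target function, network width, step size, and iteration count are all arbitrary illustrative choices, not anything specified in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy supervised-learning task: a sample (x_i, f*(x_i)) for a
# hypothetical target f* (illustrative only).
f_star = lambda X: np.sin(X[:, 0]) + 0.5 * X[:, 1]
m, n, hidden = 200, 2, 32
X = rng.normal(size=(m, n))
y = f_star(X)

# Step 1: initialize the weights at random.
W = rng.normal(size=(hidden, n)) / np.sqrt(n)
v = rng.normal(size=hidden) / np.sqrt(hidden)

def forward(X, W, v):
    H = np.tanh(X @ W.T)          # hidden-neuron outputs
    return H @ v, H               # network output on each sample

pred0, _ = forward(X, W, v)
mse0 = np.mean((pred0 - y) ** 2)  # loss at the random starting point

# Step 2: locally optimize the weights to fit the sample
# (plain gradient descent on the non-convex squared loss).
lr = 0.1
for _ in range(500):
    pred, H = forward(X, W, v)
    err = pred - y
    grad_v = H.T @ err / m
    grad_W = ((err[:, None] * v) * (1 - H ** 2)).T @ X / m
    v -= lr * grad_v
    W -= lr * grad_W

pred_final, _ = forward(X, W, v)
mse_final = np.mean((pred_final - y) ** 2)
```

The loop decreases the sample loss even though nothing about the objective is convex, which is exactly the empirical phenomenon the abstract sets out to explain.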
We develop a general connection between neural networks and reproducing kernel Hilbert spaces. Concretely, we show that, with high probability over the initial choice of the weights, every function in the corresponding kernel space can be approximated by a simple and convex change of the network's weights. Hence, even though the training objective is non-convex, the initial random network often forms a good starting point for optimization.
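One concrete way to see a "convex change of the network's weights" is to freeze the randomly initialized hidden layer and fit only the output weights, which is a convex least-squares problem over the induced random features. This is a standard random-features sketch offered as an analogy to the result, not the paper's construction; all sizes, the ridge parameter, and the target function are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sample from a target function (illustrative only).
m, n, hidden = 300, 3, 200
X = rng.normal(size=(m, n))
y = np.sin(X @ np.ones(n))

# Randomly initialized hidden layer, then frozen: the hidden-neuron
# outputs H act as a (random) feature map into the kernel space.
W = rng.normal(size=(hidden, n)) / np.sqrt(n)
H = np.tanh(X @ W.T)

# Convex step: ridge-regularized least squares over the output
# weights only -- a "simple and convex change" of one weight layer.
lam = 1e-3
v = np.linalg.solve(H.T @ H + lam * np.eye(hidden), H.T @ y)
pred = H @ v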
Joint work with Roy Frostig and Yoram Singer.