On the complexity of Frank-Wolfe methods
Mathematics of Data & Decisions
|Speaker:||Luis Rademacher, UC Davis|
|Start time:||Tue, Sep 21 2021, 1:10PM|
Frank-Wolfe methods are popular for optimization over a polytope. One reason is that they require only linear optimization over the polytope, not projection onto it. This talk has two parts.
The first part will be about the complexity of Wolfe's method, an algorithm closely related to Frank-Wolfe methods. In 1974 Philip Wolfe proposed a method to find the minimum Euclidean-norm point in a convex polyhedron. The complexity of Wolfe's method has remained unknown since he proposed it. The method is important because it is used as a subroutine in one of the most practical algorithms for submodular function minimization. We present the first example on which Wolfe's method takes exponential time. Additionally, we improve previous results to show that linear programming reduces in strongly polynomial time to the minimum norm point problem over a simplex.
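To make the minimum norm point problem concrete, here is a minimal Python sketch of the plain Frank-Wolfe iteration applied to it. This is not Wolfe's method, and it is not taken from the talk; the function name `frank_wolfe_min_norm`, the vertex-list representation of the polytope, and the stopping tolerance are assumptions chosen for illustration. It shows the point made above: each step needs only a linear minimization over the vertices, never a projection.

```python
import numpy as np

def frank_wolfe_min_norm(vertices, iters=2000, tol=1e-12):
    """Approximate the minimum Euclidean-norm point of conv(vertices).

    Illustrative sketch only: the polytope is assumed to be given
    explicitly as a list of vertices (one per row).
    """
    V = np.asarray(vertices, dtype=float)
    x = V[0].copy()                      # start at an arbitrary vertex
    for _ in range(iters):
        grad = x                         # gradient of f(x) = 0.5 * ||x||^2
        s = V[np.argmin(V @ grad)]       # linear minimization oracle over the vertices
        d = s - x                        # Frank-Wolfe direction
        gap = -(grad @ d)                # Frank-Wolfe duality gap
        if gap <= tol:
            break                        # x is (near-)optimal
        step = min(1.0, gap / (d @ d))   # exact line search for the quadratic objective
        x = x + step * d
    return x

# Triangle that does not contain the origin; the closest point to the
# origin lies on the edge between (1, 0) and (0, 1).
x_star = frank_wolfe_min_norm([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(x_star)
```

The exact line search is available here only because the objective is a simple quadratic; for a general smooth objective one would typically fall back to a step-size schedule.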
The second part will be about the smoothed complexity of Frank-Wolfe methods. To understand the complexity of these methods, a fruitful approach in many works has been the use of condition measures of polytopes. Lacoste-Julien and Jaggi introduced a condition number for polytopes and showed linear convergence for several variants of the method. The actual running time can still be exponential in the worst case (when the condition number is exponential). We study the smoothed complexity of the condition number, namely the condition number of small random perturbations of the input polytope, and show that it is polynomial for any simplex and exponential for general polytopes. Our results also apply to other condition measures of polytopes that have been proposed for the analysis of Frank-Wolfe methods: vertex-facet distance (Beck and Shtern) and facial distance (Peña and Rodríguez). Our argument for polytopes is a refinement of an argument that we develop to study the conditioning of random matrices. The basic argument shows that for c>1, a d-by-n random Gaussian matrix with n >= cd has, with high probability, a d-by-d submatrix whose minimum singular value is exponentially small.
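The random-matrix statement can be explored numerically on tiny instances. The following Python sketch is a brute-force experiment under stated assumptions, not the argument from the talk: the helper name `min_sigma_over_square_submatrices`, the choice c = 2, and the dimensions tried are all hypothetical. It enumerates every d-by-d column submatrix, so it is feasible only for very small d and n, and it illustrates the quantity in question rather than proving anything about its asymptotic behavior.

```python
import itertools
import numpy as np

def min_sigma_over_square_submatrices(A):
    """Smallest 'minimum singular value' over all d-by-d column submatrices of A."""
    A = np.asarray(A, dtype=float)
    d, n = A.shape
    best = np.inf
    for cols in itertools.combinations(range(n), d):
        # Singular values are returned in descending order; take the last one.
        sigma_min = np.linalg.svd(A[:, list(cols)], compute_uv=False)[-1]
        best = min(best, sigma_min)
    return best

# Small experiment with n = 2d (i.e. c = 2, an arbitrary illustrative choice).
rng = np.random.default_rng(0)
for d in (2, 4, 6):
    A = rng.standard_normal((d, 2 * d))
    print(d, min_sigma_over_square_submatrices(A))
```

The number of submatrices grows as a binomial coefficient in n and d, which is why the enumeration is restricted to a handful of small dimensions.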