convergence_analysis_hypergradient_descent

Master's thesis submitted for the course in Mathematics and Foundations of Computer Science, academic year 2016/2017.

This is a Master's thesis submitted for the course in Mathematics and Foundations of Computer Science at the University of Oxford, academic year 2016/2017.

Abstract

This dissertation studies the convergence of an adaptive gradient descent method called Hypergradient Descent. We review several gradient descent methods and their convergence proofs for smooth and strongly convex functions. We show that the convergence proof of Adam, a popular modern gradient descent method, is incomplete. We derive a multiplicative rule for Hypergradient Descent and justify some of the choices made by, among other things, proposing a gradient descent method and proving its convergence. The core of the dissertation is the convergence analysis of an instance of Hypergradient Descent: we prove convergence for quadratic functions and for a simple family of one-dimensional functions, and we show that the method diverges if certain assumptions are dropped. We have implemented the algorithms reviewed in this dissertation, together with several Hypergradient Descent variants, and applied them to two large-scale optimization problems. Comparing these runs, we conclude that Hypergradient Descent performs well in practice, especially because it does not require tuning of the learning rate.
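For readers unfamiliar with the method, the sketch below illustrates the general idea behind Hypergradient Descent as it appears in the literature: the learning rate itself is updated by a gradient step, using the fact that the derivative of the objective with respect to the learning rate is the negative dot product of the current and previous gradients. This is a minimal sketch of the standard additive rule applied to plain gradient descent, not the multiplicative rule analysed in the dissertation; the function and parameter names (`hypergradient_descent`, `grad_f`, `alpha0`, `beta`) are illustrative assumptions.

```python
import numpy as np

def hypergradient_descent(grad_f, x0, alpha0=1e-3, beta=1e-4, n_steps=100):
    """Sketch of additive Hypergradient Descent on top of plain gradient descent.

    grad_f : callable returning the gradient of the objective at a point.
    alpha0 : initial learning rate (adapted online by the method).
    beta   : hypergradient step size, i.e. the learning rate of the learning rate.
    """
    x = np.asarray(x0, dtype=float)
    alpha = alpha0
    g_prev = np.zeros_like(x)  # gradient from the previous iteration
    for _ in range(n_steps):
        g = grad_f(x)
        # The hypergradient of f(x_t) with respect to alpha is -g . g_prev,
        # so this step increases alpha when consecutive gradients are aligned
        # and decreases it when they point in opposing directions.
        alpha = alpha + beta * np.dot(g, g_prev)
        x = x - alpha * g
        g_prev = g
    return x, alpha

# Example usage (illustrative): minimise f(x) = 0.5 * ||A x - b||^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
grad = lambda x: A.T @ (A @ x - b)
x_star, alpha_final = hypergradient_descent(grad, np.zeros(5), n_steps=500)
```

The multiplicative rule studied in the dissertation adjusts the learning rate by a multiplicative factor rather than an additive term; see the thesis itself for its exact form and convergence analysis.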