Accelerating Stochastic Gradient Descent
Nicol N. Schraudolph (Statistical Machine Learning, National ICT Australia)
MSI Advanced Computation Seminar
DATE: 2006-05-29
TIME: 11:00:00 - 12:00:00
LOCATION: GD 35 (John Dedman building)
ABSTRACT:
Machine learning typically involves minimizing a differentiable loss function defined over a data set. These data sets are currently exploding in size as the sensors that generate them become cheap, networked, and ubiquitous. Conventional optimization algorithms, whose innermost loop runs over the entire data set, are very inefficient in this setting.
This issue is addressed by optimization methods that can work with stochastic approximations of the loss (and its gradient) computed from small subsamples of the data. Unfortunately, many standard tools of optimization, such as Krylov subspaces and line searches, do not tolerate the noise inherent in stochastic approximation. Consequently, the algorithms employed in stochastic optimization tend to be primitive and inefficient: simple first-order gradient descent and evolutionary algorithms are the rule. Even these methods, however, beat sophisticated deterministic optimization by wide margins on large, redundant data sets.
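To make the contrast concrete, here is a minimal sketch of stochastic gradient descent that estimates the gradient from a small random subsample (a minibatch) at each step rather than looping over the full data set. The least-squares problem, the subsample size of 32, and the step size of 0.1 are illustrative assumptions, not details from the talk; the sketch is written in Python with JAX.

    import jax
    import jax.numpy as jnp

    def loss(w, X, y):
        # mean squared error over whatever subsample we are handed
        return jnp.mean((X @ w - y) ** 2)

    grad_loss = jax.grad(loss)

    key = jax.random.PRNGKey(0)
    X = jax.random.normal(key, (10000, 20))         # large, redundant data set
    w_true = jnp.arange(20.0)
    y = X @ w_true                                  # synthetic targets

    w = jnp.zeros(20)
    eta = 0.1                                       # fixed first-order step size
    for t in range(500):
        key, sub = jax.random.split(key)
        idx = jax.random.choice(sub, 10000, (32,))  # small random subsample
        w = w - eta * grad_loss(w, X[idx], y[idx])  # noisy gradient step

A deterministic method would evaluate grad_loss(w, X, y) over all 10000 points on every iteration; the stochastic version pays for only 32, at the price of the gradient noise discussed above.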
I am working on ways to accelerate first-order stochastic gradient descent by using second-order gradient information obtained cheaply and locally through implicit curvature matrix-vector products. In my stochastic meta-descent (SMD) algorithm, this cheap curvature information is built up iteratively into a stochastic approximation of Levenberg-Marquardt steps, which are then used to adapt individual gradient step sizes. SMD handles noisy, correlated, and non-stationary data well, and can approach the rapid convergence of second-order methods at only linear cost per iteration, thus scaling up to extremely large systems.
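The sketch below shows one common formulation of the SMD update; the sign conventions, the decay lam, the meta-step-size mu, and the max(1/2, .) safeguard follow published descriptions of SMD but should be treated as an illustration rather than the talk's definitive algorithm. Each parameter gets its own step size eta, adapted multiplicatively via an auxiliary vector v that accumulates curvature information; the curvature matrix-vector product Hv is obtained implicitly, here by forward-mode differentiation of the gradient (in the spirit of Pearlmutter's trick), so the cost per iteration stays linear in the number of parameters.

    import jax
    import jax.numpy as jnp

    def loss(w, X, y):
        return jnp.mean((X @ w - y) ** 2)

    grad_loss = jax.grad(loss)

    def smd_step(w, eta, v, X, y, mu=0.1, lam=0.99):
        # mu and lam are illustrative defaults, not values from the talk.
        # Gradient and implicit curvature matrix-vector product in one pass:
        # forward-mode differentiation of the gradient yields (g, H v) at
        # roughly the cost of one extra gradient -- no Hessian is ever formed.
        g, Hv = jax.jvp(lambda w_: grad_loss(w_, X, y), (w,), (v,))
        # Multiplicative per-parameter step-size adaptation, safeguarded so
        # that no step size shrinks by more than half in a single update.
        eta = eta * jnp.maximum(0.5, 1.0 - mu * g * v)
        # Ordinary gradient step, but with individual step sizes.
        w = w - eta * g
        # Iterative accumulation of Levenberg-Marquardt-like curvature
        # information; lam controls how quickly old information is forgotten.
        v = lam * v - eta * (g + lam * Hv)
        return w, eta, v

Dropping smd_step into the minibatch loop sketched earlier (initializing, say, eta = jnp.full(20, 0.1) and v = jnp.zeros(20)) replaces the single fixed step size with per-parameter adaptive ones at essentially the cost of one additional gradient evaluation per step.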
To date, SMD has been applied to machine learning in multi-layer perceptrons, 3-D projection pipelines, support vector machines, and conditional random fields, enabling adaptive applications in computational fluid dynamics, computer vision, reinforcement learning, and text data mining.
