Convergence of online policy gradient algorithms in reinforcement learning

Mr. Evan Greensmith (ANU)

CSL SEMINAR SERIES

DATE: 2003-09-02
TIME: 16:00:00 - 17:00:00
LOCATION: RSISE Seminar Room, ground floor, building 115, cnr. North and Daley Roads, ANU

ABSTRACT:
In reinforcement learning, an agent explores the state space of an environment and receives rewards when favourable situations arise. We wish to choose a policy (the way an agent chooses actions) such that the expected value of future rewards is high; this expected value is our measure of performance. Gradient-based methods improve this performance locally by taking steps in the policy's parameter space. In online methods these steps are taken after each action, rather than spending time to estimate the current policy gradient more accurately.
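
As a rough illustration of taking a parameter step after every action, the following is a minimal sketch and not the algorithm analysed in the talk. It assumes a hypothetical two-armed bandit environment, a softmax policy, and a discounted eligibility trace with discount `beta`; the names `run_online_policy_gradient`, `reward_prob`, `step_size` and `beta` are illustrative choices that do not come from the abstract.

```python
import math
import random


def softmax(theta):
    """Turn policy parameters into action probabilities."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def run_online_policy_gradient(steps=5000, step_size=0.05, beta=0.0, seed=0):
    """Update the policy parameters after every single action.

    Assumptions (not from the abstract): a two-armed bandit whose arms
    pay reward 1 with probabilities 0.2 and 0.8, and a softmax policy.
    """
    rng = random.Random(seed)
    reward_prob = [0.2, 0.8]   # hypothetical environment
    theta = [0.0, 0.0]         # policy parameters
    trace = [0.0, 0.0]         # discounted sum of past score vectors

    for _ in range(steps):
        probs = softmax(theta)
        action = 0 if rng.random() < probs[0] else 1
        reward = 1.0 if rng.random() < reward_prob[action] else 0.0

        for i in range(len(theta)):
            # Gradient of log pi(action) for a softmax policy.
            grad_log = (1.0 if i == action else 0.0) - probs[i]
            trace[i] = beta * trace[i] + grad_log
            # Online step: a noisy gradient estimate is applied immediately.
            theta[i] += step_size * reward * trace[i]

    return softmax(theta)


if __name__ == "__main__":
    print(run_online_policy_gradient())
```

In this toy setting the probability of choosing the better arm should drift towards 1, so the steps shrink as the performance gradient shrinks. With `beta=0.0` the trace holds only the most recent score vector, which is enough for a bandit; a nonzero `beta` would matter only where earlier actions influence later rewards.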

In this talk I will describe the reinforcement learning framework and the online policy gradient algorithms used within it. I will then describe work showing that such algorithms converge, i.e. that, in the limit, the gradient of the performance approaches zero.

BIO:
Evan obtained bachelor's degrees in computer science and computer systems engineering from RMIT University in 2000. He is currently a PhD student in the Machine Learning group of the CSL in the RSISE.

Updated: 2 September 2003