Convergence of online policy gradient algorithms in reinforcement learning
Mr. Evan Greensmith (ANU)
CSL SEMINAR SERIES
DATE: 2003-09-02
TIME: 16:00:00 - 17:00:00
LOCATION: RSISE Seminar Room, ground floor, building 115, cnr. North and Daley Roads, ANU
ABSTRACT:
In reinforcement learning, an agent explores the space of an environment and receives rewards when favourable situations arise. We wish to choose a policy (a rule for how the agent chooses actions) under which we expect a high value of future rewards; this expected value is our measure of performance. Gradient-based methods improve this performance locally by taking steps in the policy's parameter space. In online methods, a step is taken after each action, rather than spending time to estimate the current policy gradient more accurately.
In this talk I will describe the reinforcement learning framework, and
online policy gradient algorithms used therein. I will then describe
work showing the convergence of such online policy gradient
algorithms, i.e. work showing that, in the limit, the gradient of the
performance approaches zero.
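To make the idea of an online update concrete, here is a minimal sketch in Python. It is not the algorithm analysed in the talk. It assumes a toy problem with two actions and fixed, deterministic rewards, a softmax policy with one parameter per action, and a constant step size. All names (REWARDS, run_online, grad_norm) are hypothetical and chosen for illustration only. A step is taken in parameter space after every single action, using only that action's reward.

```python
import math
import random

# Assumption: a toy problem with two actions and fixed rewards.
REWARDS = [0.0, 1.0]


def softmax(theta):
    """Action probabilities of a softmax policy with parameters theta."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]


def performance_gradient(theta):
    """Exact gradient of J(theta) = sum_a pi(a) * REWARDS[a]."""
    pi = softmax(theta)
    j = sum(p * r for p, r in zip(pi, REWARDS))
    return [pi[a] * (REWARDS[a] - j) for a in range(len(theta))]


def grad_norm(theta):
    """Euclidean norm of the exact performance gradient."""
    return math.sqrt(sum(g * g for g in performance_gradient(theta)))


def run_online(steps=2000, step_size=0.1, seed=0):
    """Take one parameter step after every action (online updates)."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(steps):
        pi = softmax(theta)
        action = 0 if rng.random() < pi[0] else 1
        reward = REWARDS[action]
        # Single-sample gradient estimate: reward * grad log pi(action),
        # where d/d theta_k log pi(action) = 1[k == action] - pi[k].
        for k in range(len(theta)):
            indicator = 1.0 if k == action else 0.0
            theta[k] += step_size * reward * (indicator - pi[k])
    return theta


if __name__ == "__main__":
    theta = run_online()
    print("probability of the rewarded action:", softmax(theta)[1])
    print("gradient norm at start:", grad_norm([0.0, 0.0]))
    print("gradient norm at end:  ", grad_norm(theta))
```

In this toy setting the policy shifts toward the rewarded action, and the norm of the exact performance gradient ends up smaller than at the starting parameters. That mirrors, on a trivial example only, the sense of convergence described in the abstract.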
BIO:
Evan obtained bachelor's degrees in computer science and computer
systems engineering from RMIT University in 2000. He is currently a
PhD student in the Machine Learning group of the CSL in the RSISE.
