The Australian National University

Confident Bayesian Sequence Prediction -&&- Q-learning for History-Based Reinforcement Learning

Tor Lattimore -&&- Mayank Daswani (with pizza)

ARTIFICIAL INTELLIGENCE SEMINAR PhD Monitoring

DATE: 2013-05-15
TIME: 11:30:00 - 12:30:00
LOCATION: RSISE Common LHS Room A203 ANU

ABSTRACT:
Abstract 1: Sequence prediction is a large component of many problems in AI. If the symbols of a sequence are sampled independently from an identical distribution (IID), then standard statistical tools can be used to compute estimators of the distribution as well as confidence bounds on their accuracy. In this talk I will discuss an approach to constructing confidence bounds for the prediction of non-IID data using Bayesian methods.

Abstract 2: In this talk I extend the classical Q-learning algorithm to the history-based setting by using the temporal difference, together with an l_0 regulariser, as a cost function for selecting subsets of features. This fits nicely within the existing feature reinforcement learning framework and is also linked to the current literature on temporal difference learning methods. I show experimental results on some small domains, with a view to scaling to larger domains.


BIO:
http://people.cecs.anu.edu.au/user/4102

http://people.cecs.anu.edu.au/user/4025



Updated: 9 May 2013