Context Tree Maximizing Reinforcement Learning
Phuong Nguyen (ANU)
NICTA SML SEMINAR
DATE: 2012-06-21
TIME: 11:15:00 - 12:00:00
LOCATION: NICTA - 7 London Circuit
ABSTRACT:
Recent developments in reinforcement learning for non-Markovian problems have witnessed a surge in history-based methods, among which we are particularly interested in two frameworks, $\Phi$MDP and MC-AIXI-CTW. $\Phi$MDP attempts to reduce the general RL problem, where the environment's states and dynamics are both unknown, to an MDP, while MC-AIXI-CTW incrementally learns a mixture of context trees as its environment model. The main idea of $\Phi$MDP is to connect generic reinforcement learning with classical reinforcement learning. The first implementation of $\Phi$MDP relies on a stochastic search procedure for finding a tree that minimizes a certain cost function. Given limited search time, this does not guarantee finding the minimizing tree, or even a good one; as a consequence, the approach appears to have difficulty with large domains. MC-AIXI-CTW is attractive in that it can incrementally and analytically compute its internal model through interactions with the environment. Unfortunately, it is computationally demanding because it requires heavy planning simulations at every single time step. We devise a novel approach called CTMRL, which analytically and efficiently finds the cost-minimizing tree. Instead of the context-tree weighting method that MC-AIXI-CTW is based on, we use the closely related context-tree maximizing algorithm, which selects just one single tree. This approach falls under the $\Phi$MDP framework, which allows us to replace the costly planning component of MC-AIXI-CTW with simple Q-learning. Our empirical investigation shows that CTMRL finds policies as good as MC-AIXI-CTW's on six domains, including a challenging Pacman domain, in an order of magnitude less time.
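
To make the $\Phi$MDP idea behind CTMRL concrete, the sketch below shows history-based RL in its simplest form: a map from the recent history to a discrete state, followed by plain tabular Q-learning on the induced MDP. Everything here is an illustrative assumption rather than the speaker's implementation: the toy environment, the fixed-depth suffix map standing in for the learned context tree, and all parameter values. CTMRL would instead obtain the state map analytically via the context-tree maximizing algorithm.

```python
import random
from collections import defaultdict

class FlipEnv:
    """Tiny non-Markovian toy problem (assumed for illustration): the rewarding
    action equals the observation seen two steps ago, so the agent needs
    history, not just the latest observation, to act well."""
    actions = [0, 1]

    def __init__(self, horizon=20):
        self.horizon = horizon

    def reset(self):
        self.t = 0
        self.past = [random.randint(0, 1), random.randint(0, 1)]
        return self.past[-1]

    def step(self, action):
        reward = 1.0 if action == self.past[0] else 0.0
        obs = random.randint(0, 1)
        self.past = [self.past[1], obs]
        self.t += 1
        return obs, reward, self.t >= self.horizon

class SuffixStateMap:
    """Toy stand-in for the learned context tree: the 'state' is simply the
    last `depth` symbols of the observation-action history."""
    def __init__(self, depth=4):
        self.depth = depth

    def state(self, history):
        return tuple(history[-self.depth:])

def q_learning(env, state_map, episodes=2000, alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Q-learning on the states induced by `state_map`."""
    Q = defaultdict(float)  # Q[(state, action)] -> value estimate
    for _ in range(episodes):
        history = [env.reset()]
        done = False
        while not done:
            s = state_map.state(history)
            # epsilon-greedy action selection
            if random.random() < eps:
                a = random.choice(env.actions)
            else:
                a = max(env.actions, key=lambda act: Q[(s, act)])
            obs, reward, done = env.step(a)
            history += [a, obs]
            s_next = state_map.state(history)
            best_next = max(Q[(s_next, act)] for act in env.actions)
            # standard Q-learning update on the induced MDP
            Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
    return Q

if __name__ == "__main__":
    Q = q_learning(FlipEnv(), SuffixStateMap(depth=4))
    print(len(Q), "state-action values learned")
```

The design point the abstract makes is that once a single cost-minimizing tree fixes the history-to-state map, the expensive Monte-Carlo planning of MC-AIXI-CTW can be dropped in favour of this kind of lightweight tabular update.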
