
Reinforcement Learning Projects
Reinforcement learning is a method for automatically acquiring control
policies for agents in stochastic environments. I've been working
on several areas in reinforcement learning:
Hierarchical Reinforcement Learning
This is the main area of my thesis research, and more info can be
found here.
Generalized Prioritized Sweeping
Working with Ron Parr
and Nir Friedman,
I've developed a generalization of Andrew Moore and Chris
Atkeson's prioritized sweeping
algorithm.
Prioritized sweeping is a model-based reinforcement learning method
that attempts to focus the agent's limited computational resources to
achieve a good estimate of the value of environment states. The
classic account of prioritized sweeping uses an explicit, state-based
representation of the value, reward, and model parameters. Such a
representation is unwieldy for dealing with complex environments and
there is growing interest in learning with more compact
representations. We claim that classic prioritized sweeping is
ill-suited for learning with such representations. To overcome this
deficiency, we introduced generalized prioritized sweeping, a
principled method for generating representation-specific algorithms
for model-based reinforcement learning. We then applied this method to
several representations, including state-based models and generalized
model approximators (such as Bayesian networks).
We compared the
approach with standard prioritized sweeping in the NIPS paper, found
here.
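For orientation, the classic state-based prioritized sweeping that this work generalizes can be sketched roughly as follows. This is a minimal illustration, not the code from the paper: it assumes the model is already known (in the learning setting the model would be estimated from experience), and the names `prioritized_sweeping`, `transitions`, `rewards`, `theta` and `max_updates` are hypothetical choices made for this example.

```python
import heapq
from collections import defaultdict


def prioritized_sweeping(transitions, rewards, gamma=0.9, theta=1e-6,
                         max_updates=10_000):
    """Tabular prioritized sweeping over a known model (illustrative only).

    transitions[s][a] is a list of (probability, next_state) pairs and
    rewards[s][a] is the expected immediate reward. A state with no
    actions is treated as terminal with value 0. States must be mutually
    comparable, because they break ties inside the heap.
    """
    values = defaultdict(float)

    # predecessors[s2] holds every state with a nonzero chance of reaching s2.
    predecessors = defaultdict(set)
    for s, actions in transitions.items():
        for outcomes in actions.values():
            for prob, s2 in outcomes:
                if prob > 0:
                    predecessors[s2].add(s)

    def backup(s):
        """One-step Bellman optimality backup for state s."""
        actions = transitions.get(s)
        if not actions:
            return 0.0
        return max(
            rewards[s][a] + gamma * sum(p * values[s2] for p, s2 in outcomes)
            for a, outcomes in actions.items()
        )

    def bellman_error(s):
        return abs(backup(s) - values[s])

    # Max-priority queue via negated priorities. Stale entries are tolerated
    # because the error is recomputed whenever an entry is popped.
    queue = [(-bellman_error(s), s) for s in transitions]
    heapq.heapify(queue)

    updates = 0
    while queue and updates < max_updates:
        _, s = heapq.heappop(queue)
        if bellman_error(s) < theta:
            continue
        values[s] = backup(s)
        updates += 1
        # A change at s can only affect the backups of its predecessors.
        for pred in predecessors[s]:
            error = bellman_error(pred)
            if error >= theta:
                heapq.heappush(queue, (-error, pred))
    return dict(values)
```

The per-state value table and the per-state priority are exactly the explicit representation the text calls unwieldy: with a compact model, one parameter change can alter the backups of many states at once, which is the case the generalized method is meant to handle.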
With Jeff Forbes, I'm also
working to extend this algorithm to continuous spaces. This project
is described in more detail below.
Continuous Reinforcement Learning
Jeff Forbes and I
have been working to create a system that can do reinforcement
learning in continuous domains. He has invented an RL system that does
instance-based function approximation and instance husbandry, which
lets the agent learn the Q-value function lazily from stored examples
in bounded time and memory. Together we've
applied generalized prioritized sweeping (GenPS) to his function
approximators and a novel (for RL) kind
of model of the environment to create a model-based RL agent that can
learn in continuous spaces. An extended abstract on this topic can be
found here.
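To make the idea of lazy, instance-based Q-value approximation concrete, here is a generic sketch. It is not Jeff's system: it assumes discrete actions, a Gaussian kernel over the k nearest stored examples, and a naive drop-the-oldest rule standing in for real instance husbandry. The class name `InstanceBasedQ` and the parameters `capacity`, `k` and `bandwidth` are hypothetical.

```python
import heapq
import math
from collections import deque


class InstanceBasedQ:
    """Kernel-weighted k-nearest-neighbour Q estimate over stored instances."""

    def __init__(self, capacity=500, k=5, bandwidth=0.5):
        # deque(maxlen=...) silently drops the oldest instance when full;
        # this naive rule is only a placeholder for real instance management.
        self.memory = deque(maxlen=capacity)
        self.k = k
        self.bandwidth = bandwidth

    def add(self, state, action, q_target):
        """Store one example; no fitting happens here (lazy learning)."""
        self.memory.append((tuple(state), action, float(q_target)))

    def predict(self, state, action, default=0.0):
        """Estimate Q(state, action) from the k nearest stored instances."""
        state = tuple(state)
        candidates = [
            (math.dist(state, s), q)
            for s, a, q in self.memory
            if a == action
        ]
        if not candidates:
            return default
        nearest = heapq.nsmallest(self.k, candidates)
        weights = [math.exp(-(d / self.bandwidth) ** 2) for d, _ in nearest]
        total = sum(weights)
        if total == 0.0:
            # Every neighbour is too far for the kernel; use the closest one.
            return nearest[0][1]
        return sum(w * q for w, (_, q) in zip(weights, nearest)) / total
```

All the work happens at query time, and both the memory (at most `capacity` instances) and the cost of each query (one pass over the stored instances) are bounded, which is the property the description above emphasizes.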
Bayesian Exploration in Model-Based RL
With Nir Friedman and
Richard Dearden.
Abstract:
Reinforcement learning systems are often concerned with balancing
exploration of untested actions against exploitation of actions that
are known to be good. The benefit of exploration can be estimated
using the classical notion of Value of Information --- the expected
improvement in future decision quality arising from the information
acquired by exploration. Estimating this quantity requires an
assessment of the agent's uncertainty about its current value
estimates for states. In this paper we investigate ways of
representing and reasoning about this uncertainty in algorithms where
the system attempts to learn a model of its environment. We explicitly
represent uncertainty about the parameters of the model and build
probability distributions over Q-values based on these. These
distributions are used to compute a myopic approximation to the value
of information for each action and hence to select the action that
best balances exploration and exploitation. The paper can be found
here.
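The action-selection step described in the abstract can be illustrated with a small sketch. It assumes that samples of the Q-values at the current state are already available (for example, from solving models drawn from the posterior over model parameters), and it uses one common form of the myopic gain: new information about an action is only valuable if it would change which action looks best. The function names `myopic_vpi` and `select_action` are hypothetical, and the exact estimator used in the paper is not stated in the abstract.

```python
def myopic_vpi(q_samples):
    """Estimate mean Q-values and myopic value of information per action.

    q_samples is a list of rows, one per sample, each holding one sampled
    Q-value per action at the current state. At least two actions are
    required.
    """
    n_samples = len(q_samples)
    n_actions = len(q_samples[0])
    means = [
        sum(row[a] for row in q_samples) / n_samples
        for a in range(n_actions)
    ]
    ranked = sorted(range(n_actions), key=lambda a: means[a], reverse=True)
    best, second = ranked[0], ranked[1]

    vpi = []
    for a in range(n_actions):
        if a == best:
            # Gain if the apparently best action turns out worse than
            # the runner-up.
            gains = [max(means[second] - row[a], 0.0) for row in q_samples]
        else:
            # Gain if another action turns out better than the current best.
            gains = [max(row[a] - means[best], 0.0) for row in q_samples]
        vpi.append(sum(gains) / n_samples)
    return means, vpi


def select_action(q_samples):
    """Pick the action maximizing expected Q-value plus myopic VPI."""
    means, vpi = myopic_vpi(q_samples)
    scores = [m + v for m, v in zip(means, vpi)]
    return max(range(len(scores)), key=lambda a: scores[a])
```

With this rule, an action whose mean estimate is slightly lower but whose distribution is wide can still be chosen, because learning its true value might change the agent's decision.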
