-----

Reinforcement Learning Projects

-----

Reinforcement learning is a method for automatically acquiring control policies for agents in stochastic environments. I've been working on several areas of reinforcement learning:


    Hierarchical Reinforcement Learning
    This is the main area of my thesis research, and more information can be found here.

    Generalized Prioritized Sweeping
    Working with Ron Parr and Nir Friedman, I've developed a generalization of Andrew Moore and Chris Atkeson's prioritized sweeping algorithm. Prioritized sweeping is a model-based reinforcement learning method that focuses the agent's limited computational resources on achieving a good estimate of the values of environment states. The classic account of prioritized sweeping uses an explicit, state-based representation of the value, reward, and model parameters. Such a representation is unwieldy in complex environments, and there is growing interest in learning with more compact representations. We claim that classic prioritized sweeping is ill-suited for learning with such representations. To overcome this deficiency, we introduced generalized prioritized sweeping, a principled method for generating representation-specific algorithms for model-based reinforcement learning. We then applied this method to several representations, including state-based models and generalized model approximators (such as Bayesian networks). We compared the approach with standard prioritized sweeping in the NIPS paper, found here. With Jeff Forbes, I'm also working to extend this algorithm to continuous spaces; this project is described in more detail below.
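    For reference, the classic state-based idea that generalized prioritized sweeping builds on can be sketched roughly as follows. This is a minimal tabular sketch, not the generalized algorithm from the NIPS paper: it assumes discrete, hashable and orderable states, learns the model from transition counts and average rewards, and uses a simple variant in which a state's priority is its current Bellman error. The class and parameter names (`PrioritizedSweeping`, `theta`, `n_backups`) are hypothetical.

```python
import heapq
from collections import defaultdict


class PrioritizedSweeping:
    """Simplified tabular prioritized sweeping (hypothetical sketch).

    The model is transition counts plus summed rewards. After each real
    step the agent spends a fixed budget of Bellman backups on the states
    whose values are most likely to change.
    """

    def __init__(self, actions, gamma=0.9, theta=1e-4, n_backups=10):
        self.actions = list(actions)
        self.gamma = gamma
        self.theta = theta          # minimum priority worth queueing
        self.n_backups = n_backups  # computational budget per real step
        self.counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': n}
        self.reward_sum = defaultdict(float)                 # (s, a) -> total reward
        self.preds = defaultdict(set)                        # s' -> {(s, a)}
        self.V = defaultdict(float)
        self.queue = []  # max-heap implemented with negated priorities

    def q(self, s, a):
        # One-step lookahead using the learned (maximum-likelihood) model.
        n = sum(self.counts[(s, a)].values())
        if n == 0:
            return 0.0
        r = self.reward_sum[(s, a)] / n
        exp_v = sum(c / n * self.V[s2] for s2, c in self.counts[(s, a)].items())
        return r + self.gamma * exp_v

    def backup(self, s):
        return max(self.q(s, a) for a in self.actions)

    def push(self, s):
        # Priority = how much a backup would change V(s) right now.
        priority = abs(self.backup(s) - self.V[s])
        if priority > self.theta:
            heapq.heappush(self.queue, (-priority, s))

    def observe(self, s, a, r, s2):
        # Update the learned model with one real transition.
        self.counts[(s, a)][s2] += 1
        self.reward_sum[(s, a)] += r
        self.preds[s2].add((s, a))
        self.push(s)
        # Spend the backup budget on the highest-priority states first.
        for _ in range(self.n_backups):
            if not self.queue:
                break
            _, s_top = heapq.heappop(self.queue)
            self.V[s_top] = self.backup(s_top)
            # A change in V(s_top) can change the values of its predecessors.
            for sp, _ap in self.preds[s_top]:
                self.push(sp)
```

    The `n_backups` budget is what ties the method to limited computation: after each real step only a few backups are performed, and the queue directs them to the states the new experience affects most. In a two-step chain 0 → 1 → 2, observing the rewarding transition from state 1 also propagates value back to state 0 within the same step.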


    Continuous Reinforcement Learning
    Jeff Forbes and I have been working to create a system that can do reinforcement learning in continuous domains. He has invented an RL system for instance-based function approximation and instance husbandry, which lets the agent lazily learn stored examples for the Q-value function while using bounded time and memory. Together we've applied generalized prioritized sweeping to his function approximators, along with a kind of environment model that is novel for RL, to create a model-based RL agent that can learn in continuous spaces. An extended abstract on this topic can be found here.
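    To illustrate the general idea of instance-based Q-value approximation under a memory bound, here is a hypothetical sketch; it is not Jeff Forbes' actual system. The k-nearest-neighbour Gaussian kernel, the `max_instances` cap, and the eviction rule (drop the stored instance that its neighbours already predict best, so the least information is lost) are all assumptions chosen for illustration.

```python
import math


def _dist(x, y):
    # Euclidean distance between two state vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


class InstanceQ:
    """Lazy, instance-based Q(s, a) estimate with a hard memory cap (sketch)."""

    def __init__(self, max_instances=50, k=3, bandwidth=1.0):
        self.max_instances = max_instances
        self.k = k
        self.bandwidth = bandwidth
        self.instances = []  # list of (state tuple, action, q value)

    def predict(self, state, action, skip=None):
        # Kernel-weighted average over the k nearest stored instances of
        # `action`; `skip` excludes one index (used for leave-one-out checks).
        near = sorted(
            (_dist(state, s), q)
            for i, (s, a, q) in enumerate(self.instances)
            if a == action and i != skip
        )[: self.k]
        if not near:
            return 0.0
        w = [math.exp(-((d / self.bandwidth) ** 2)) for d, _ in near]
        total = sum(w)
        if total == 0.0:
            return 0.0
        return sum(wi * q for wi, (_, q) in zip(w, near)) / total

    def add(self, state, action, q_value):
        self.instances.append((tuple(state), action, q_value))
        if len(self.instances) > self.max_instances:
            # Assumed husbandry rule: evict the instance whose value the
            # remaining instances reproduce with the smallest error.
            errors = [
                abs(self.predict(s, a, skip=i) - q)
                for i, (s, a, q) in enumerate(self.instances)
            ]
            del self.instances[errors.index(min(errors))]
```

    Nothing is fitted ahead of time: work happens only when `predict` is called (lazy learning), and the eviction step keeps memory bounded by discarding redundant examples rather than the oldest ones.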


    Bayesian Exploration in Model-Based RL
    With Nir Friedman and Richard Dearden.
    Abstract: Reinforcement learning systems are often concerned with balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of exploration can be estimated using the classical notion of Value of Information: the expected improvement in future decision quality arising from the information acquired by exploration. Estimating this quantity requires an assessment of the agent's uncertainty about its current value estimates for states. In this paper we investigate ways of representing and reasoning about this uncertainty in algorithms where the system attempts to learn a model of its environment. We explicitly represent uncertainty about the parameters of the model and build probability distributions over Q-values based on these. These distributions are used to compute a myopic approximation to the value of information for each action and hence to select the action that best balances exploration and exploitation. The paper can be found here.
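    The myopic value-of-information step can be sketched as follows, assuming the distribution over each Q(s, a) is represented by samples (for instance, Q-values from solving MDPs drawn from the posterior over model parameters; how the samples are produced is left out here). Information about an action is counted as valuable only if it could change which action looks best. The function names are hypothetical, and this is an illustration rather than the paper's code.

```python
def myopic_vpi(q_samples):
    """Myopic value of perfect information for each action in one state.

    q_samples maps each action to a list of sampled Q(s, a) values.
    At least two actions are required.
    """
    means = {a: sum(v) / len(v) for a, v in q_samples.items()}
    ranked = sorted(means, key=means.get, reverse=True)
    best, second = ranked[0], ranked[1]
    vpi = {}
    for a, samples in q_samples.items():
        if a == best:
            # Information helps if the current best turns out worse than the runner-up.
            gains = [max(means[second] - q, 0.0) for q in samples]
        else:
            # Information helps if this action turns out better than the current best.
            gains = [max(q - means[best], 0.0) for q in samples]
        vpi[a] = sum(gains) / len(gains)
    return vpi


def select_action(q_samples):
    """Pick the action maximizing expected Q-value plus myopic VPI."""
    means = {a: sum(v) / len(v) for a, v in q_samples.items()}
    vpi = myopic_vpi(q_samples)
    return max(q_samples, key=lambda a: means[a] + vpi[a])
```

    For example, with `{"safe": [1.0, 1.0, 1.0, 1.0], "risky": [0.0, 0.0, 0.0, 3.6]}` the risky action has the lower mean (0.9 vs. 1.0) but a VPI of 0.65, so `select_action` explores it: the chance of discovering a better action outweighs the small expected loss.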


    -----

    Back to Dave's Home Page


    UCBStanford