David Andre's Dissertation Page
Programmable Reinforcement Learning Agents
I completed my dissertation in May of 2003. It is available here in
several formats. There's one-sided and two-sided versions, where both
are appropriately offset for binding. There's postscript, PDF, and
gzipped versions of those.
Coming soon: powerpoint slides of
talks I've given on this work are coming soon and animated gif movies
of the continuous taxi domain.
Enjoy.
Postscript, single sided
Postscript, single sided, gzipped
PDF, single sided
PDF, single sided, gzipped
Postscript, two sided
Postscript, two sided, gzipped
PDF, two sided
PDF, two sided, gzipped
Andre, David. (2003). Programmable reinforcement learning agents.
Ph.D. dissertation, University of Califronia, Berkeley, California.
Abstract
This dissertation examines the use of partial programming as a means
of designing agents for large Markov Decision Problems. In this
approach, a programmer specifies only that which they know to be
correct and the system then learns the rest from experience using
reinforcement learning.
In contrast to previous low-level languages for partial programming,
this dissertation presents ALisp, a Lisp-based high-level partial
programming language. ALisp allows the programmer to constrain the
policies considered by a learning process and to express his or her
prior knowledge in a concise manner. Optimally completing a partial
ALisp program is shown to be equivalent to solving a Semi-Markov
Decision Problem (SMDP). Under a finite memory-use condition,
online learning algorithms for ALisp are proved to converge to an
optimal solution of the SMDP and thus to an optimal completion of
the partial program.
This dissertation then presents methods for exploiting the
modularity of ALisp to speed the optimal completion of partial
programs. Safe state abstraction allows an agent to ignore aspects
of its current state that are irrelevant to its current decision,
and therefore speeds up reinforcement learning. By decomposing
representations of the value of actions along subroutine boundaries,
greater state abstraction can be obtained. Unlike previous methods
for state abstraction in reinforcement learning, the methods
presented in this research yield significant state abstraction while
maintaining hierarchical optimality , i.e., optimality among
all policies consistent with the partial program. These methods are
demonstrated on two simulated taxi tasks.
Function approximation, a method for representing the value of
actions, allows reinforcement learning to be applied to
problems where exact methods are intractable. Soft shaping is a
method for guiding an agent toward a solution without constraining
the search space. Both can be integrated with ALisp. ALisp
with function approximation and reward shaping is successfully
applied on a difficult continuous variant of the simulated taxi
task.
Together, the methods presented in this work comprise a system for
agent design that allows the programmer to specify what they know,
hint at what they suspect using soft shaping, and leave unspecified
that which they don't know; the system then optimally completes the
program through experience and takes advantage of the hierarchical
structure of the specified program to speed learning.
Go back to my home page.