DAGS Markov Decision Processes Home Page
|
In retrospect, it is interesting to note that
the original problem that started my research is still outstanding -- namely
the problem of planning or scheduling dynamically over time, particularly
planning under uncertainty. If such a problem could be successfully solved
it could eventually through better planning contribute to the well-being
and stability of the world.
George B. Dantzig, 1991
Overview
|
Planning under uncertainty is fundamental to solving many
important real-world problems, including applications in robotics,
network routing, scheduling, and
financial decision making.
Markov Decision Processes (MDPs) provide a formal framework for modeling
these tasks and for deriving optimal solutions. However, in practice
the computational effort of solving an MDP may be prohibitive
and, moreover, the model parameters of the MDP may be unknown.
Our work addresses these problems by exploiting application-specific
structure to simplify learning and optimization.
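As a small, self-contained illustration of the framework (and not of any particular system studied here), the Python sketch below computes an optimal policy for a tiny finite MDP by value iteration; the transition probabilities, rewards, and discount factor are made up for the example.

import numpy as np

# A tiny finite MDP with 3 states and 2 actions (all numbers invented for the example).
# P[a][s, s'] is the probability of moving from state s to s' under action a.
P = np.array([[[0.9, 0.1, 0.0],
               [0.0, 0.9, 0.1],
               [0.0, 0.0, 1.0]],
              [[0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5],
               [0.0, 0.0, 1.0]]])
R = np.array([[0.0, 0.0, 1.0],      # R[a, s]: immediate reward for taking action a in state s
              [0.1, 0.1, 1.0]])
gamma = 0.95                         # discount factor

V = np.zeros(3)
for _ in range(1000):
    Q = R + gamma * (P @ V)          # Bellman backup: Q[a, s] = R[a, s] + gamma * E[V(s')]
    V_new = Q.max(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new

policy = Q.argmax(axis=0)            # greedy policy with respect to the converged values
print(V, policy)

For the structured, high-dimensional problems described below, this kind of explicit enumeration over all states is exactly what becomes infeasible, which motivates the themes that follow.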
|
Current Themes
|
Factored MDPs
|
MDPs can be compactly represented using Bayesian Networks.
Unfortunately, this compact representation does not directly imply that
the optimal policy can be computed or represented efficiently. In this
work, we develop approximations that can exploit the compact
representation for efficient computation.
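One ingredient of these approximations is a value function represented as a weighted sum of basis functions, each of which depends only on a small subset of the state variables. A minimal Python sketch of this representation (the variables, basis functions, and weights are invented for illustration):

# A factored state is an assignment to a set of state variables (invented example).
state = {"X1": 1, "X2": 0, "X3": 1}

# Each basis function depends only on a small "scope" of variables.
basis = [
    (("X1",), lambda x: float(x[0])),                       # depends on X1 alone
    (("X2", "X3"), lambda x: 1.0 if x == (0, 1) else 0.0),  # depends on the pair (X2, X3)
]
weights = [0.7, 1.3]    # illustrative weights, e.g. the output of a projection step

def value(state):
    """Approximate value: a weighted sum of local basis functions."""
    return sum(w * h(tuple(state[v] for v in scope))
               for w, (scope, h) in zip(weights, basis))

print(value(state))     # 0.7 * 1 + 1.3 * 1 = 2.0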
|
Learning
|
When a model of the environment is not available,
we can learn a decision strategy by
experimenting with the system. Here one approach is to
search the space of strategies directly; an alternative
approach estimates the value function of the MDP and
derives a strategy from it. Our focus is on the construction
of efficient and theoretically sound algorithms that
exploit and combine these two approaches.
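As a concrete instance of the value-function route, the sketch below runs tabular Q-learning on a toy two-state simulator (the environment and all constants are made up for illustration); the learned value estimates then induce a greedy strategy.

import random

def step(state, action):
    """Toy dynamics: action 1 usually reaches state 1, which pays a reward of 1."""
    next_state = 1 if (action == 1 and random.random() < 0.8) else 0
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward

Q = [[0.0, 0.0], [0.0, 0.0]]          # Q[state][action], the estimated value function
alpha, gamma, epsilon = 0.1, 0.9, 0.1
state = 0
for _ in range(5000):
    if random.random() < epsilon:     # epsilon-greedy exploration
        action = random.randrange(2)
    else:
        action = max(range(2), key=lambda a: Q[state][a])
    next_state, reward = step(state, action)
    # Temporal-difference update toward the one-step Bellman target.
    target = reward + gamma * max(Q[next_state])
    Q[state][action] += alpha * (target - Q[state][action])
    state = next_state

# The strategy is derived from the learned value estimates.
policy = [max(range(2), key=lambda a: Q[s][a]) for s in range(2)]
print(Q, policy)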
|
Continuous States
|
An important statistical aspect of reinforcement learning
in MDPs with continuous state spaces
is the ability to generalize across system states.
In our work on
reinforcement learning in continuous-state MDPs
we focus on kernel-based methods to obtain statistical
accuracy while simultaneously guaranteeing the convergence of the
approximation procedure.
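A rough sketch of the kernel-based idea, using a one-dimensional, single-action system with made-up sample transitions: each Bellman backup is a kernel-weighted average over the stored transitions, which generalizes across states while keeping the backup operator well-behaved.

import numpy as np

# Made-up sample transitions (s_i, r_i, s_i') from a one-dimensional, single-action system.
rng = np.random.default_rng(0)
S = rng.uniform(0.0, 1.0, size=50)                        # sampled states
R = np.cos(2 * np.pi * S)                                 # illustrative rewards
S_next = np.clip(S + 0.1 + 0.05 * rng.standard_normal(50), 0.0, 1.0)

gamma, bandwidth = 0.9, 0.1

def weights(x, centers):
    """Normalized Gaussian kernel weights: nearby sample states get more influence."""
    k = np.exp(-0.5 * ((x - centers) / bandwidth) ** 2)
    return k / k.sum()

# Approximate value iteration on the sample points.  Each backup averages the one-step
# returns of the stored transitions, weighted by kernel similarity; because the weights
# form a proper average, the backup remains a contraction and the iteration converges.
V = np.zeros(50)
for _ in range(200):
    V_next = np.array([weights(s, S) @ V for s in S_next])           # interpolate V at successor states
    V = np.array([weights(s, S) @ (R + gamma * V_next) for s in S])  # kernel-averaged Bellman backup

print(V[:5])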
|
POMDPs
|
In many real-world control problems, the state of the system cannot be
fully observed; a robot, for example, cannot observe its state directly,
only noisy sensor readings. Partially Observable Markov Decision Processes
(POMDPs) provide a formal mathematical framework for modeling these
types of systems. Unfortunately, obtaining the optimal policy for a
POMDP is an intractable, in some cases undecidable, problem. We focus
on approximation algorithms that are efficient and can exploit problem
structure.
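Because the state is hidden, a POMDP controller acts on a belief state, a probability distribution over the hidden states that is updated by Bayes' rule after each action and observation. A minimal sketch with invented transition and observation models:

import numpy as np

# Two hidden states; both models are invented for the example.
T = np.array([[0.7, 0.3],         # T[s, s']: transition probabilities for the chosen action
              [0.2, 0.8]])
O = np.array([[0.9, 0.1],         # O[s', o]: probability of observation o in new state s'
              [0.3, 0.7]])

def belief_update(belief, observation):
    """Bayes filter: predict through the transition model, then weight by the observation likelihood."""
    predicted = belief @ T                     # prior over the next state
    unnormalized = predicted * O[:, observation]
    return unnormalized / unnormalized.sum()

b = np.array([0.5, 0.5])           # initially uncertain about the state
b = belief_update(b, observation=1)
print(b)                           # mass shifts toward the state that best explains the observation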
|
Multi-Agent Systems
|
We focus on cooperative multi-agent systems, where multiple agents must
coordinate to achieve a common goal, e.g., a team of robots mapping an
unknown environment. Our work addresses two issues: computation and
learning.
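On the computation side, a useful structural assumption is that the joint value decomposes into local terms, each involving only a few agents; the maximizing joint action can then be found by variable elimination rather than by enumerating all joint actions. A small sketch with three agents and invented local payoffs:

import numpy as np

# Each agent picks one of two actions.  The joint value decomposes into local
# terms over pairs of agents: Q(a1, a2, a3) = Q12[a1, a2] + Q23[a2, a3].
Q12 = np.array([[3.0, 1.0],
                [0.0, 2.0]])
Q23 = np.array([[2.0, 0.0],
                [1.0, 4.0]])

# Eliminate agent 3: for each choice of agent 2, record its best response.
m3 = Q23.max(axis=1)
best_a3_given_a2 = Q23.argmax(axis=1)

# Eliminate agent 1 the same way.
m1 = Q12.max(axis=0)
best_a1_given_a2 = Q12.argmax(axis=0)

# Agent 2 maximizes the sum of incoming messages; the choices then propagate back.
a2 = int(np.argmax(m1 + m3))
a1 = int(best_a1_given_a2[a2])
a3 = int(best_a3_given_a2[a2])
print((a1, a2, a3), Q12[a1, a2] + Q23[a2, a3])

The cost of this elimination scales with the size of the largest local term rather than with the full joint action space, which is what makes coordination tractable when the interaction structure is sparse.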
|
Publications
|
Coordinated Reinforcement Learning.
Carlos Guestrin, Michail Lagoudakis and Ronald Parr.
In AAAI Spring Symposium on Collaborative Learning Agents, Stanford, California, March 2002.
|
Context Specific Multiagent Coordination and Planning with Factored MDPs.
Carlos Guestrin, Shobha Venkataraman and Daphne Koller.
In AAAI Spring Symposium on Collaborative Learning Agents, Stanford, California, March 2002.
|
Multiagent Planning with Factored MDPs. Carlos Guestrin, Daphne Koller and Ronald Parr.
In Advances in Neural Information Processing Systems (NIPS-14), Vancouver, Canada, December 2001.
|
Robust Combination of Local Controllers. Carlos Guestrin and Dirk Ormoneit.
In the 17th Conference on Uncertainty in Artificial Intelligence (UAI-01), Seattle, Washington, August 2001.
|
Max-norm Projections for Factored MDPs. Carlos Guestrin, Daphne Koller and Ronald Parr.
In the International Joint Conference on Artificial Intelligence (IJCAI-01), Seattle, Washington, August 2001.
|
Solving Factored POMDPs with Linear Value Functions. Carlos Guestrin, Daphne Koller and Ronald Parr.
In the IJCAI-01 Workshop on Planning under Uncertainty and Incomplete Information (PRO-2), Seattle, Washington, August 2001.
|
Max-norm Projections for Factored MDPs.
Carlos Guestrin, Daphne Koller and Ronald Parr.
In AAAI Spring Symposium on Game Theoretic and Decision Theoretic Agents, Stanford, California, March 2001.
|
Kernel-Based Reinforcement Learning in Average-Reward Problems.
D. Ormoneit and P. Glynn.
In preparation, 2000.
|
Kernel-Based Reinforcement Learning in Average-Cost Problems: An Application to Optimal Portfolio Choice.
D. Ormoneit and P. Glynn.
In Advances in Neural Information Processing Systems 13, MIT Press, 2001.
|
Kernel-Based Reinforcement Learning.
D. Ormoneit and S. Sen.
Machine Learning, 2000. To appear.
An older technical report is also available.
|
Presentations
|
Max-norm Projections for Factored MDPs.
Carlos Guestrin, Daphne Koller and Ronald Parr.
International Joint Conference on Artificial Intelligence, Seattle, August 2001.
|
Robust Combination of Local Controllers.
Carlos Guestrin and Dirk Ormoneit.
17th Conference on Uncertainty in Artificial Intelligence, Seattle, August 2001.
|
Solving Factored POMDPs with Linear Value Functions.
Carlos Guestrin, Daphne Koller and Ronald Parr.
International Joint Conference on Artificial Intelligence Workshop on Planning under Uncertainty and Incomplete Information, Seattle, August 2001.
|
Max-norm Projections for Factored MDPs.
Carlos Guestrin, Daphne Koller and Ronald Parr.
AAAI Spring Symposium, Stanford, California, March 2001.
|