DAGS Markov Decision Processes Home Page

In retrospect, it is interesting to note that the original problem that started my research is still outstanding -- namely the problem of planning or scheduling dynamically over time, particularly planning under uncertainty. If such a problem could be successfully solved it could eventually through better planning contribute to the well-being and stability of the world.

George B. Dantzig - 1991




Overview
Planning under uncertainty is fundamental to solving many important real-world problems, including applications in robotics, network routing, scheduling, and financial decision making. Markov Decision Processes (MDPs) provide a formal framework for modeling these tasks and for deriving optimal solutions. In practice, however, the computational effort of solving an MDP may be prohibitive and, moreover, the model parameters of the MDP may be unknown. Our work addresses both problems by exploiting application-specific structure to simplify learning and optimization.



Current Themes
Factored MDPs
MDPs can be compactly represented using Bayesian Networks. Unfortunately, this compact representation does not directly imply that the optimal policy can be computed or represented efficiently. In this work, we develop approximations that can exploit the compact representation for efficient computation.
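To make the idea concrete, here is a minimal Python sketch (a hypothetical toy example, not the algorithms from our papers) of a factored linear value function, in which each basis function depends on only a small subset of the state variables:

    # Minimal sketch: V(x) = sum_i w_i * h_i(x), where each basis function
    # h_i looks at only one state variable. States, features and weights
    # below are invented for illustration.
    state = {"machine1_ok": 1, "machine2_ok": 0}

    basis_functions = [
        lambda x: 1.0,                      # constant offset
        lambda x: float(x["machine1_ok"]),  # local feature of machine 1
        lambda x: float(x["machine2_ok"]),  # local feature of machine 2
    ]
    weights = [0.5, 2.0, 2.0]               # hypothetical weights

    def value(x):
        """Linear value function: a weighted sum of local basis functions."""
        return sum(w * h(x) for w, h in zip(weights, basis_functions))

    print(value(state))  # 0.5 + 2.0*1 + 2.0*0 = 2.5

Because each basis function touches only a few variables, the representation stays compact even when the number of state variables, and hence the number of joint states, is very large.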
Learning
When a model of the environment is not available, we can learn a decision strategy by experimenting with the system. Here one approach is to search the space of strategies directly; an alternative approach estimates the value function of the MDP and derives a strategy from it. Our focus is on the construction of efficient and theoretically sound algorithms that exploit and combine these two approaches.
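As a rough illustration of the value-function route, the sketch below shows a generic tabular Q-learning update and the strategy derived from it; the parameters and the epsilon-greedy rule are standard textbook choices, not the specific algorithms studied here:

    # Minimal sketch of learning a value function from experience and
    # deriving a strategy from it (generic Q-learning; hypothetical parameters).
    import random
    from collections import defaultdict

    alpha, gamma = 0.1, 0.95          # learning rate and discount factor
    Q = defaultdict(float)            # Q-value table: (state, action) -> estimate

    def update(state, action, reward, next_state, actions):
        """One experience tuple improves the estimated value function."""
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

    def policy(state, actions, epsilon=0.1):
        """Strategy derived from the value estimate (epsilon-greedy)."""
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

Direct policy search would instead parameterize the strategy itself and adjust its parameters to improve observed performance; our interest is in combining the strengths of both views.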
Continuous States
An important statistical aspect of reinforcement learning in MDPs with continuous state spaces is the ability to 'generalize' across system states. In our work on reinforcement learning in continuous-state MDPs, we focus on kernel-based methods to obtain statistical accuracy while simultaneously guaranteeing the convergence of the approximation procedure.
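The sketch below conveys the flavor of kernel-based generalization: the value at a query state is a kernel-weighted average of a Bellman backup over sampled transitions, so nearby samples count more. The Gaussian kernel, bandwidth, and sample data are hypothetical placeholders, not the published method:

    # Minimal sketch: generalize from sampled transitions (s, r, s') to a new
    # state by kernel-weighted averaging (hypothetical data and parameters).
    import math

    samples = [(0.2, 1.0, 0.3), (0.5, 0.0, 0.7), (0.9, 2.0, 0.8)]  # (s, r, s')
    bandwidth, gamma = 0.3, 0.95

    def kernel(s, s_i):
        return math.exp(-((s - s_i) ** 2) / (2 * bandwidth ** 2))

    def value(s, V_next=lambda x: 0.0):
        """Kernel-weighted Bellman backup: nearby samples get more weight."""
        weights = [kernel(s, s_i) for s_i, _, _ in samples]
        total = sum(weights)
        return sum(w * (r + gamma * V_next(s2))
                   for w, (_, r, s2) in zip(weights, samples)) / total

    print(value(0.4))  # estimate at a state never observed exactly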
POMDPs
In many real-world control problems, the state of the system cannot be fully observed; for example, a robot cannot observe its state directly, only noisy sensor information. Partially Observable Markov Decision Processes (POMDPs) provide a formal mathematical framework for modeling these types of systems. Unfortunately, obtaining the optimal policy for a POMDP is an intractable, and in some cases undecidable, problem. We focus on approximation algorithms that are efficient and can exploit problem structure.
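A minimal sketch of the belief update that underlies POMDP control is shown below: the controller maintains a probability distribution over hidden states and revises it after each action and noisy observation. The two-state model and its probabilities are invented purely for illustration:

    # Minimal sketch of a POMDP belief update (hypothetical two-state model).
    states = ["door_left", "door_right"]
    belief = {"door_left": 0.5, "door_right": 0.5}

    # Transition model T[a][s][s'] and observation model O[s'][o] would come
    # from the POMDP specification; the numbers here are made up.
    T = {"stay": {"door_left":  {"door_left": 1.0, "door_right": 0.0},
                  "door_right": {"door_left": 0.0, "door_right": 1.0}}}
    O = {"door_left":  {"hear_left": 0.85, "hear_right": 0.15},
         "door_right": {"hear_left": 0.15, "hear_right": 0.85}}

    def update_belief(belief, action, observation):
        """b'(s') is proportional to O(obs | s') * sum_s T(s' | s, a) * b(s)."""
        new_belief = {}
        for s2 in states:
            predicted = sum(T[action][s][s2] * belief[s] for s in states)
            new_belief[s2] = O[s2][observation] * predicted
        norm = sum(new_belief.values())
        return {s: p / norm for s, p in new_belief.items()}

    print(update_belief(belief, "stay", "hear_left"))  # shifts toward door_left

The difficulty is that an optimal POMDP policy must map every such belief to an action, and the space of beliefs is continuous, which is what makes exact solutions intractable.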
Multi-Agent Systems
We focus on cooperative multi-agent systems, where multiple agents must coordinate to achieve a common goal, e.g., a team of robots mapping an unknown environment. Our work addresses two issues: computation and learning. 



People
Carlos Guestrin
Daphne Koller
Dirk Ormoneit
Ron Parr



Publications
Coordinated Reinforcement Learning. Carlos Guestrin, Michail Lagoudakis and Ronald Parr. In AAAI Spring Symposium on Collaborative Learning Agents, Stanford, California, March 2002.
Context Specific Multiagent Coordination and Planning with Factored MDPs. Carlos Guestrin, Shobha Venkataraman and Daphne Koller. In AAAI Spring Symposium on Collaborative Learning Agents, Stanford, California, March 2002.
Multiagent Planning with Factored MDPs. Carlos Guestrin, Daphne Koller and Ronald Parr. In Advances in Neural Information Processing Systems (NIPS-14), Vancouver, Canada, December 2001.
Robust Combination of Local Controllers. Carlos Guestrin and Dirk Ormoneit. To appear in 17th Conference on Uncertainty in Artificial Intelligence (UAI-01), Seattle, Washington, August 2001.
Max-norm Projections for Factored MDPs. Carlos Guestrin, Daphne Koller and Ronald Parr. To appear in International Joint Conference on Artificial Intelligence (IJCAI-01), Seattle, Washington, August 2001.
Solving Factored POMDPs with Linear Value Functions. Carlos Guestrin, Daphne Koller and Ronald Parr. To appear in the IJCAI-01 workshop on Planning under Uncertainty and Incomplete Information (workshop PRO-2), Seattle, Washington, August 2001.
Max-norm Projections for Factored MDPs. Carlos Guestrin, Daphne Koller and Ronald Parr. AAAI Spring Symposium on Game Theoretic and Decision Theoretic Agents, Stanford, California, March 2001.
Kernel-Based Reinforcement Learning in Average-Reward Problems. D. Ormoneit and P. Glynn. In preparation, 2000.
Kernel-Based Reinforcement Learning in Average-Cost Problems: An Application to Optimal Portfolio Choice. D. Ormoneit and P. Glynn. To appear in Advances in Neural Information Processing Systems 13 (NIPS-13). The MIT Press, 2001.
Kernel-Based Reinforcement Learning. D. Ormoneit and S. Sen. To appear in Machine Learning, 2000. An older technical report is available here.
 
Presentations
Max-norm Projections for Factored MDPs. Carlos Guestrin, Daphne Koller and Ronald Parr. International Joint Conference on Artificial Intelligence, Seattle, August 2001. (Html)
Robust Combination of Local Controllers. Carlos Guestrin and Dirk Ormoneit. 17th Conference on Uncertainty in Artificial Intelligence, Seattle, August 2001. (Html)
Solving Factored POMDPs with Linear Value Functions. Carlos Guestrin, Daphne Koller and Ronald Parr. International Joint Conference on Artificial Intelligence workshop on planning under uncertainty and incomplete information, Seattle, August 2001. (Html)
Max-norm Projections for Factored MDPs. Carlos Guestrin, Daphne Koller and Ronald Parr. AAAI Spring Symposium, Stanford, California, March 2001. (Html)