Friday, September 2, 2011

Equivalence Notions and Model Minimization in Markov Decision Processes

Abstract
Many stochastic planning problems can be represented using Markov Decision Processes (MDPs). A difficulty with using these MDP representations is that the common algorithms for solving them run in time polynomial in the size of the state space, where this size is extremely large for most real-world planning problems of interest. Recent AI research has addressed this problem by representing the MDP in a factored form. Factored MDPs, however, are not amenable to traditional solution methods that call for an explicit enumeration of the state space. One familiar way to solve MDP problems with very large state spaces is to form a reduced (or aggregated) MDP with the same properties as the original MDP by combining “equivalent” states. In this paper, we discuss applying this approach to solving factored MDP problems: we avoid enumerating the state space by describing large blocks of “equivalent” states in factored form, with the block descriptions being inferred directly from the original factored representation. The resulting reduced MDP may have exponentially fewer states than the original factored MDP, and can then be solved using traditional methods. The reduced MDP found depends on the notion of equivalence between states used in the aggregation. The notion of equivalence chosen will be fundamental in designing and analyzing algorithms for reducing MDPs. Optimally, these algorithms will be able to find the smallest possible reduced MDP for any given input MDP and notion of equivalence (i.e. find the “minimal model” for the input MDP). Unfortunately, the classic notion of state equivalence from non-deterministic finite state machines generalized to MDPs does not prove useful. We present here a notion of equivalence that is based upon the notion of bisimulation from the literature on concurrent processes. Our generalization of bisimulation to stochastic processes yields a non-trivial notion of state equivalence that guarantees the optimal policy for the reduced model immediately induces a corresponding optimal policy for the original model. With this notion of state equivalence, we design and analyze an algorithm that minimizes arbitrary factored MDPs and compare this method analytically to previous algorithms for solving factored MDPs. We show that previous approaches implicitly derive equivalence relations that we define here.
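As a rough, explicit-state sketch of the aggregation idea (this is not the paper's algorithm, which works directly on factored block descriptions), a partition-refinement loop can compute a bisimulation-style partition: first group states that agree on immediate reward, then repeatedly split any block whose states send different probability mass to some block under some action. The dictionary encodings R[s][a] and P[s][a][t] below are assumptions made purely for illustration.

from collections import defaultdict

def stochastic_bisimulation(states, actions, R, P):
    """Coarsest partition in which states in the same block agree on
    immediate reward and on the probability of reaching each block,
    for every action.  Explicit-state sketch only.

    R[s][a]    : reward for taking action a in state s
    P[s][a][t] : probability that action a moves the system from s to t
    """
    # Initial partition: group states by their reward signature.
    groups = defaultdict(list)
    for s in states:
        groups[tuple(R[s][a] for a in actions)].append(s)
    partition = [frozenset(g) for g in groups.values()]

    # Refine: split any block whose states disagree on the probability
    # mass they send to some block under some action.
    changed = True
    while changed:
        changed = False
        refined = []
        for block in partition:
            sigs = defaultdict(list)
            for s in block:
                sig = tuple(
                    sum(P[s][a].get(t, 0.0) for t in other)
                    for a in actions
                    for other in partition
                )
                sigs[sig].append(s)
            if len(sigs) > 1:
                changed = True
            refined.extend(frozenset(g) for g in sigs.values())
        partition = refined
    return partition

Once the partition is stable, each block can serve as a single state of a reduced MDP, and a policy computed on the reduced model applies unchanged to every member of a block.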


1 Introduction
Discrete state planning problems can be described semantically by a state-transition graph (or model), where the vertices correspond to the states of the system, and the edges are possible state transitions resulting from actions. These models, while often large, can be efficiently represented, e.g. with factoring, without enumerating the states.
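To make the factoring idea concrete, one common encoding describes each state by an assignment to a handful of state variables, so n Boolean variables implicitly define 2^n states without listing them, and a block of states can likewise be written as a partial assignment. The variable names below are purely illustrative and not taken from the paper.

from itertools import product

# Illustrative state variables (an assumption for this sketch); 3
# Boolean variables implicitly define 2**3 = 8 states.
STATE_VARS = ("robot_has_key", "door_open", "package_delivered")

def all_states():
    """Enumerate the implicit state space; factored algorithms avoid
    ever materializing this set."""
    for values in product((False, True), repeat=len(STATE_VARS)):
        yield dict(zip(STATE_VARS, values))

# A block of states in factored form: a partial assignment satisfied by
# many complete states (here, every state in which the door is open).
block = {"door_open": True}

def in_block(state, partial):
    return all(state[v] == val for v, val in partial.items())

print(sum(in_block(s, block) for s in all_states()))   # 4 of the 8 states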
Well-known algorithms have been developed to operate directly on these models, including methods for determining reachability, finding connecting paths, and computing
optimal policies. Some examples are the algorithms for solving Markov decision processes (MDPs) that are polynomial in the size of the state space [Puterman, 1994]. MDPs provide a formal basis for representing planning problems that involve actions with stochastic results [Boutilier et al., 1999]. A planning problem represented as an MDP is given by four objects: (1) a space of possible world states, (2) a space of possible actions that can be performed, (3) a real-valued reward for each action taken in each state, and (4) a transition probability model specifying, for each action α and each state p, the distribution over resulting states for performing action α in state p.
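The four objects above translate directly into a small data structure, and a standard dynamic-programming solver such as value iteration sweeps over every state on each iteration; that per-sweep cost, polynomial in the size of the state space, is exactly what becomes prohibitive for large models. A minimal sketch, assuming the same R[s][a] / P[s][a][t] dictionary encoding as above and a discount factor gamma (both assumptions for illustration):

def value_iteration(states, actions, R, P, gamma=0.9, eps=1e-6):
    """Solve an explicitly enumerated MDP by successive approximation.

    Each sweep touches every (state, action, successor) combination, so
    the per-iteration cost is polynomial in the number of states; this
    is the bottleneck that factored and reduced models aim to avoid.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(
                R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a].items())
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            break

    # Greedy policy with respect to the converged value function.
    def q(s, a):
        return R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a].items())

    policy = {s: max(actions, key=lambda a: q(s, a)) for s in states}
    return V, policy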




Robert Givan and Matthew Greig
School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907
(765) 494-9068, {givan, mgreig}@purdue.edu

Thomas Dean
Department of Computer Science, Brown University, Providence, RI 02912
(401) 863-7600, tld@cs.brown.edu
