Transcription

A Replanning Algorithm for Decision TheoreticHierarchical Planning: Principles and EmpiricalEvaluationGuido Boella [email protected] Damiano [email protected] di Informatica, Universita’ di TorinoC.so Svizzera 185, 10149 Torino ItalyAugust 4, 2005AbstractIn this paper, we present a replanning algorithm for a decision-theoretichierarchical planner, and illustrate the experimental methodology we designed to investigate its performance. The methodology relies on an agentbased framework, in which plan failures emerge from the interplay of theagent and the environment. Given this framework, the replanning algorithm is compared with planning from scratch by executing experimentsin different domains. The empirical evaluation shows the superiority ofreplanning with respect from planning from scratch. However, the observation of significant differences in the data collected across planningdomains confirm the importance of empirical evaluation in practical systems.1

1IntroductionReplanning is a central research issue for a variety of AI applications involvingthe execution of plans in realistic contexts, ranging from real-time architecturesfor automation to agent-based and multi-agent systems. Although Nebel andKoehler [Nebel and Koehler, 1993] have shown that some forms of replanningare harder than planning itself, the interest for replanning is motivated by boththeoretical and practical considerations. In real time systems, the interleavingof deliberation and execution is an effective strategy to tackle non-determinismand incomplete knowledge, but does not eliminate completely the need to replanfrom time to time.[Myers, 1999, Lemai and Ingrand, 2004]. More importantly,the stability of commitment remains a key property of agents in social contexts.For example, in multi-agent systems, a certain amount of a priori coordinationon individual plans cannot be set aside, together with the necessity for the agentsto keep to their plans as much as possible during the joint activity. So, efficientand effective replanning techniques are required to guarantee the stability andpredictability of the behaviour of agents.In this paper, we present a replanning algorithm and propose an agent-basedmethodology and framework for its empirical evaluation. The replanning algorithm extends the planning algorithm of the DRIPS system [Haddawy and Hanks, 1998],that conjugates hierarchical planning with Decision Theory [Luce and Raiffa, 1957].The evaluation methodology rests on an experimental framework that simulates the occurrence of plan failures as a result of the interaction between anautonomous agent and a non-deterministic environment. The replanning algorithm is compared with the alternative of planning from scratch based on timeperformance and quality of output plans.The paper is structured as follows: Section 2 describes the replanning algorithm; the evaluation framework and methodology are described in Section2

3. Section 4 reports the experiments performed according to the methodology,discussed in Section 5. Conclusions end the paper.2Extending Decision-Theoretic Hierarchical Planning to ReplanningHierarchical planning [Sacerdoti, 1977] lends itself to the design of resourcebounded agents as the levels of abstraction encoded in hierarchical plans canbe directly mapped to the elaboration of partial plans. Partial plans are a keynotion in the theory of practical reasoning [Bratman, 1990], since they allowavoiding premature commitment to overdetailed plans. In the context of replanning, they provide a useful instrument to manipulate commitment at thelower levels of detail, leaving it unmodified at the higher levels.Decision-theoretic planning allows an agent to make a distinction betweenthe achievement of a goal and the utility of the states in which the goal isachieved. States are represented as sets of attribute-value pairs and the individual preferences of an agent are expressed as functions which map the value of single attributes to real numbers, representing the utility values. By releasing the“all or nothing” goal satisfaction paradigm of classical planning [Blythe, 1999],in decision-theoretic planning the achievement of a goal can be weighted againstthe costs of achieving it, and goals can be partially satisfied.2.1The Planning AlgorithmThe representation of the world in DRIPS consists of attribute-value pairs.Attributes can have boolean values as well as continuous or discrete ones (e.g.,the consumption of fuel is represented by a continuous quantity). In order toaccount for the role of uncertain knowledge about the world in planning, each3

attribute is associated with a probability distribution on its values. Since theattributes are assumed to be independent, this representation is equivalent toa set of alternative worlds in which each attribute has a certain value with acertain probability. Formally, the representation of the current state of the worldWs is a set of pairs {Ai , P Di }, where Ai is an attribute and Pi is a probabilitydistribution on its values:Ws {{A1 , P D1 }, . . . , {An , P Dn }}For each attribute Ai , the probability distribution on its values is a set of pairs{Vij , Pij }, where Vij is one of the possible values, and Pij is the probability ofthat value:P Di {{Vi1 , Pi1 }, ., {Vin , Pin }}whereP(Pi1 , ., Pin ) 1.The effects of an action are non-deterministic and depend on a set mutuallyexclusive and exhaustive conditions. For example, the definition of the actiongo-X-room-4-to-2 in Figure 1 states that, when the action is executed while theagent is in room 4 and the door is open, there is one chance out of ten that theagent ends up being in the same room.1 Notice that there is no proper goalin the action representation: the DRIPS system provides an inter-subjectiverepresentation of actions, and relies on the agent’s utility function to representindividual goals. In order to account for the notion of action failure, we augmented the original representation with a field that specifies the action goal (seethe :goal field).1 The representation of effects in DRIPS allows representing the uncertainty associatedwith both complex effects, defined on multiple attributes (see the :effs field in Figure 1), andelementary effects (defined on individual attributes, see the :eff field), by posing separateprobability distributions on the two levels.4

:decompositiongo-4-2(at time fuel open-door):cond-eff :cond ((and (at 4)(door open)):effs :eff((at 2) prob 9/10(at 4) prob 1/10(time time 15) prob 1(fuel fuel - 0.5) prob 1)):prob 1:cond-eff :cond ((or (not (at 4))(door closed)):effs :eff((time time 15) 1(fuel fuel - 0.5) 1)):prob 1(at 2)nilnil)Figure 1: The definition of a primitive action, go-4-2, going from room 4 toroom 2, in the office domain (see Section 4.3 for description of the domains).A plan library is an action hierarchy that includes two kinds of abstraction relations, as described by [Haddawy and Suwandi, 1994, Haddawy and Hanks, 1998].The sequential abstraction relation is a task decomposition relation: an actiontype of this kind (complex action type) is a macro-operator that the planner canreplace by a sequence of action types. The specification relation describes howan abstract action type specializes into more specific action types. A primitiveaction is a directly executable action.The input to the planner is a plan that contains only the root of the actionhierarchy; this plan is recursively refined by substituting complex actions withtheir decomposition and abstract actions with the more specific actions theysubsume, until a set of primitive plans is obtained, i.e., a set of plans composedof primitive actions. At each cycle, the input to planning algorithm is the resultof the refinement performed in the previous cycle, i.e. a set of more detailedplans, which subsume a smaller set of alternatives. Before execution, the utilityof a plan is not known, but its expected utility can be computed on the setof states obtained by projecting the plan onto the current state of the world.5

The expected utility of a plan that has not been fully refined (a partial plan) isan interval that encompasses the expected utilities of all the alternatives plansit subsumes. Suboptimal plans are plans whose expected utility upper boundis lower than the lower bound of some other plan. In order to restrict thesearch space and to identify optimal plans, suboptimal plans are pruned at eachrefinement step.2.2The Replanning AlgorithmIn classical planning, the validity of a plan is evaluated in terms of success orfailure, intended as the truth or falsity of a logical expression. In decisiontheoretic planning, the outcome (or expected outcome) of a course of action ismapped to an utility value, whose acceptability cannot be established a priori.The replanning algorithm presented here delegates to the agent model the taskof defining the acceptable utility values, and assumes that a threshold is sufficient to identify the acceptable values. In this work, we consider as failurethe situation in which the expected utility of plan drops below the acceptabilitythreshold of the agent while the plan is being executed.When a plan fails, it is possible that the current plan is ‘close’ to a similar feasible solution, where “closeness” is represented by the fact that thecurrent plan and a new valid one are subsumed by the same partial plan atsome level of abstraction in the plan space. The key idea of the replanningalgorithm is to retract a refinement choice operated during planning (partialization), and then to restart the refinement process on the partial plan thusobtained, in the attempt to verify if an alternative refinement is acceptable inthe current context ([Boella and Damiano, 2002a, Boella and Damiano, 2002b,Boella and Damiano, 2003]). After a plan has been partialized, the expectedutility of the resulting plan is computed. Since this value encompasses the util-6

function replan (plan, world, threshold)FA : find-focused-action (plan);candidate-sub-plan : find-candidate-sub-plan (plan, FA);new-plan : partialize (plan, FA, candidate-sub-plan, world, threshold);if is-plan (new-plan)then return new-plan;else if (plan plan-root)then return false;elsebeginpartial-plan : collapse-plan (plan, candidate-sub-plan);replan (partial-plan, world, threshold);endend ifend ifend functionFigure 2: The main function of the replanning algorithm, replanity of all its refinements, it is possible to verify if it is a promising candidate, i.e.,if it encompasses a refinement whose utility is above the acceptability threshold.The abstraction and the decomposition hierarchies play complementary rolesin the replanning algorithm: the abstraction hierarchy allows identifying alternatives to the steps of the failed plan, while the decomposition hierarchy focussesthe partialization process on a sub-plan of the current plan, as described below.The partialization process is incremental: if a new valid refinement of the failedplan is not found after the first partialization step, the process is iterated, untileither a new valid plan is found or the root of the action hierarchy is reached.The functioning of the algorithm is the following:1. The starting point of the replanning process (see the replan procedure inFigure 2) is the first non-executed step whose preconditions do not hold.This step becomes the initially focused action (FA).22. find-candidate-sub-plan identifies the sub-plan of the input plan on2 Although the representation of actions does not explicitly report the preconditions of anaction, they can be univocally identified based on the action goal.7

function partialize (plan, FA, sub-plan, world, threshold)prioritized-steps : prioritize-steps (sub-plan);for each step in prioritized-steps dorevision-node : find-revision-node (step);candidate-plan : retract-step (plan, revision-node);if promising (candidate-plan)new-plan : refine (candidate-plan, world);then return new-plan;end ifend for eachreturn false;end functionFigure 3: The Partialize functionwhich it will focus the search for alternatives. In order to do so, it consultsthe derivation tree of the plan to identify the lowest-level decompositionthe FA appears into; then, it marks the steps which appear this decomposition as candidate sub-plan. The rationale is to ensure that only the stepsof the plan which are most directly related to the focused step (i.e. thatare part of the same, lowest-level decomposition) are potentially affectedby the revision process.The subtree of the action hierarchy dominated by the root of the candidate sub-plan constitutes the local context of the FA; the local context isthe portion of the action hierarchy that the algorithm will search for analternative refinement of the input plan (see next step).3. The partialization process (see the partialize function in Figure 3) triesto replace the steps of the candidate sub-plan with alternatives, in anincremental way, until no candidates are left or it succeeds. Each candidate step is replace with a more abstract action type, so as to retract arefinement choice operated by the planning algorithm when the plan wascreated; then, after each retraction, the resulting plan is refined to verifyif the retraction has made it possible to bring back the failed plan to a8

new valid plan. In detail:(a) The prioritize-steps function establishes the order in which thesteps of the candidate sub-plan will be examined for partialization:the FA is examined first; then, the steps on its right (the ones thattemporally follow it) are examined; finally, the steps on its left (theones that precede it) are examined. This order reflects the assumption that it is preferable to modify actions that have not been executed yet.(b) The revision cycle is the core of the algorithm: for each step in thecandidate sub-plan, the algorithm sea