Chapter 11 - Preplanning

C H A P T E R 11

Preplanning

11.1 Introduction

architecture must also be discussed, not just planning methods.

11.2 Sequential Planning

Compulsive deliberation is wasteful in that computations are repeated unnecessarily. Once a player is able to find a path to a terminal state with maximal reward, it should not have to repeat that computation on every step. Sequential planning is the antithesis of compulsive deliberation in which no work is repeated. Once a sequential planner finds a good path, it simply saves the sequence of actions along that path and then executes the actions step by step until the game is done without any further deliberation.

A sequential plan for a single player game is any sequence of feasible actions. A sequential plan is legal if and only if every action in the sequence is legal in the state in which that action is performed. A plan is complete if and only if it leads from the initial state of the game to a terminal state. A plan is minimal if and only if none of the intermediate states is terminal.

Consider the 8-puzzle game described above. The sequential plans shown below are all legal, complete, and minimal. The actions in the sequences shown are all legal. Each leads from the initial state to a terminal state. And none of the intermediate states produced during the execution is terminal.

`[right,down,right,down]`
`[right,down,left,right,right,down]`
`[right,down,left,right,left,right,right,down]`

Note that this definition does not require that the sequence of actions produces the best possible result. This allows us to compare good plans and bad plans. We say that one sequential plan is better than another if and only if the reward associated with the first plan is greater than the reward associated with the second plan; and we define a sequential plan to be optimal if and only if there is no better sequential plan.

The table below shows that rewards associated with the plans shown above. Clearly, the second plan is better than the first, and the last three plans are optimal.

`[right,left,right,left,right,left,right,left]`	40
`[right,right,left,right,left,right,left,right]`	50
`[right,down,right,down]`	100
`[right,down,left,right,right,down]`	100
`[right,down,left,right,left,right,right,down]`	100

Sequential planning is the process of finding sequential plans. A sequential planner is said to be admissible if and only if it returns an optimal sequential plan.

A sequential planning player is one that produces an optimal sequential plan and then executes the steps of that plan during game play. The planning is usually done during the start-up period of the game, but it can also be done during regular game play. It is also possible to mix sequential planning with other techniques. For example, in the case of large games, a player might play randomly during the initial part of a game and then switch to sequential planning once the game tree becomes small enough. Of course, in this last case, the player's ability to succeed depends on the strategy used before sequential planning commences.

Definitions for the basic methods for pure sequential planning are shown below. The start method invokes a procedure, called bestplan, to produce a sequential plan and stores this plan for later use. On each play of the game, the player performs the corresponding action in the plan. The stop and abort methods are the same as for the other players we have seen thus far.

var plan;
var step;

function start (id,player,rules,start,play)
 {game = rules;
  role = player;
  roles = findroles(game);
  state = findinits(game);
  plan = bestplan(role,state)[1];
  step = 0;
  return 'ready'}

function play (id,move)
 {var action = plan[step];
  step = step + 1;
  return action}

A depth-first search is conceptually the simplest approach to sequential planning. The procedure takes a player and a state as arguments and returns the corresponding utility and plan as value. The procedure takes the form of a recursive exploration of the game tree. If the state supplied as argument is terminal, the output is just the player's reward for that state and the empty plan. Otherwise, the output is the maximum of the utilities of the states resulting from executing any of the player's legal actions in the given state and the corresponding plan.

function bestplan (role,state)
 {if (terminalp(state)) {return [findreward(role,state,game),[]]};
  var actions = findlegals(role,state,game);
  var result = bestplan(role,findnext(roles,[actions[0]],state,game));
  var score = result[0];
  var plan = result[1];
  plan[plan.length] = actions[0];
  for (var i=1; i<actions.length; i++)
      {var result = bestplan(role,findnext(roles,[actions[i]],state,game));
       if (result[0]>score}
          {score = result[0];
           plan = result[1];
           plan[plan.length] = actions[i]};
  return [score,plan]}

Note that this procedure may not produce the shortest plan. However, it is guaranteed to produce an optimal plan as defined above.

11.3 Conditional Planning

While it is possible, in some multiple player games, to find sequential plans that achieve maximal rewards, this is rarely the case. In order to achieve an optimal reward, it is frequently necessary for a player to conditionalize its actions on the state of the game. The plans in such cases are often called conditional plans.

A conditional plan for a multiple-player game is a tree, with that leads from the initial state of the game to a terminal state. A conditional plan is legal if and only if every action in the sequence in legal in the state in which that action is performed. A conditional plan is minimal if and only if none of the intermediate states produced during the execution is terminal.

Consider the game of Tic-Tac-Toe. The expression shown below is a conditional plans for this game. ... Each leads from the initial state to a terminal state. All of the sequences shown are legal. And they are all minimal in the sense described above. While the last plan contains the same actions as the penultimate plan, it is still minimal because no intermediate state produced during the execution is terminal.

function start (id,role,rules,start,play)
 {description = rules;
  var result = bestplan(role,findinits(game));
  plan = result[1];
  return 'ready'}

function play (id,move)
 {return plan[step]}

A depth-first search is conceptually the simplest approach to sequential planning. The procedure takes a player and a state as arguments and returns the corresponding utility as value. The procedure via a recursive exploration of the game graph. If the state supplied as argument is terminal, the output is just the player's reward for that state. Otherwise, the output is the maximum of the utilities of the states resulting from executing any of the player's legal actions in the given state. To find a plan, the player uses a variation on the maxscore procedure used in incremental deliberation. The only difference is that, in addition to returning a score, it also returns a sequence of actions that produces that score.

function bestplan (role,state)
 {if (findterminalp(state,game)) {return seq(findreward(role,state,game),seq())};
  var actions = findlegals(role,state,game);
  var result = bestplan(role,findnexts(actions[0],state,game));
  var score = result[0];
  var plan = result[1];
  plan[plan.length] = actions[0];
  for (var i=1; i<actions.length; i++)
      {var result = bestplan(role,findnexts(actions[i],state,game));
       if (result[0]>score}
          {score = result[0];
           plan = result[1];
           plan[plan.length] = actions[i]};
  return seq(score,plan)}

11.1 Introduction

11.2 Sequential Planning

11.3 Conditional Planning

Problems