LIBRARY OFTHE MASSACHUSETTS INSTITUTE
CONSTRUCTION OF OPTIMAL SEQUENTIAL DECISION TREES
by CZrfhu^r WilliamA. Martin April 1971
MASSACHUSETTS
^UTE
OF
TECHNOLOGY
50MEMORIAL
DRIVE
[BRIDGE.
MASSACHUSETTS
(I
MASS.INST.TECH.
JUL
12
1971LIBRARY
DEWEr
CONSTRUCTION OF OPTIMAL SEQUENTIAL DECISIONTREES
by Cirthu.r WilliamA. Martin
ABSTRACT
The goal of a sequential decision problem is to identify one object selected from a knovm set of objects by applying aseries of tests to the object. The outcomes of tests already performed can be used to select the next test. Each test may be applied at most once. Given the a priori probabi-lities of the objects, the cost of each test, and the probabilities of the outcomes of each test given the object to vjhlch it is applied, an algorithm
is given for constructing the tree of tests v;hich minimizes the average
cost of identifying an object. It is assumed that the test results do not depend on the order inwhich the tests are applied. The algorithm uses a
branch and bound procedure to construct partial decision trees. A lov7er bound on the cost of completing a partial tree is taken as the minimum
cost of a set of tests V7hich will provide information about the objects
equal to their entropy, under the assum.ption that the tests are independent.
A Lower Bound
We are given a set X of objects and a series of K tests. The k test has outcomes in the set Y . We are also given the cost C of each test and
the joint probability distribution p(>:, y , ...,y, ) that object X occurs
and tests 1 to k produce outcomes y to y •
The entropy H(X) of the set of objects before any tests are performed is defined to be
Since each term in the sum is > 0, it is clear that the entropy is zero if and only if the probability of some object is equal to 1.
There-fore, if the entropy is not zero after several tests have been applied it
is not possible that the object has been identified. Similarly, if the
average decrease in entropy produced from several tests is not at least equal to the initial entropy then it is not possible for the objects to be identified in every case.
The decrease in entropy from applying testY, is
1(X;Y ) = H(X) -H(xIy ) =
-V
p(>:,y,,•••,y^) log p(x)^'^1
\
X.Y-..,Y^
S
^ p^'^-'^'iV
^°2 p(^'Iyi)S-^ PCxIy )
- 2
Similarly, the decrease in entropy from applying tv.'O tests Y and Y is
H(X) -HCXJY.Y ) =
-Z2
p(x,y^,...,y^,) log p(x)1 J YY Y
"*'2_/ p(x,y ,...,y ) log p(x|y y )
2^
^ pU.yi>--.>yK)1^2-:^—
+
2.
. p(>-'»yi.---.yK>^^^-YuWir
= I(X;YJ + I(X;Y^|y^)
But I(X;Y.|y.) < I(X;Y.)
since
EpCy.)p(y.)
V7hich using the inequality logV7 < (u - 1) becomes
Y^
pCy^pCyJ
l(X;Y.|Y^) - l(X;Y.) < (
2.
p(x,y,....y,)(Note that equality occurs when pCy.jy.) = p(y.!)p(y.).)
We use the above vrell-knovm result to compute a lower bound on the cost of testing the objects. The result says that each test yields at most an
entropy equal to that Which results when it is applied first. We give the
tests the benefit of the doubt and compute these entropies. Using the costs of the tests vje then compute the cost per unit of entropy hopefully to be received from each test and rank the tests in the order of increasing
cost per unit. Finally, using this ranking, we form the lox-?est cost set of tests v.-'hich could possibly yield entropy equal to that of the set of objects. A fractional part of one test may be included in this set in order
to get the exact amount of entropy needed. VJe then include the sair.e fraction of its cost in the cost of this set, which is the lower bound in question.
Branch and Bound
ArmedV7ith this lower boundwe can employ a branch and bound algorithm
like that described by Reinwald and Soland (2). We form a tree of partial
decision trees. The root node of the tree of partial decision trees corresponds to no tests applied. Anode is a complete node of a decision
tree if no tests remain which have not been applied in reaching that node, if one object has probability one at that node, or if branches extend from
that node. Anode of the tree of partial decision trees is a terminal node if the decision tree at that node contains no incomplete nodes. The
ave-rage cost of the decision tree of each non-terminal node is bounded by the costs of the tests applied so far plus the lovjer bounds on the cost of
- 4
each of the sub-decision trees v^hich has xiotbeen completed.
When the cost of a terminal node is less.than the lower bounds on all thenon-terninal nodes, this terminal node represents the optimum tree.
When this is not the case the non-terminal nodeV7hich contains the partial decision tree V7ith the lovrest cost bound is extended by adding abranch foreach test not yet applied.
It is possible that some uncertainty vjill remain after all of the tests
are applied. In this case, more than one order of applying the tests vill yield the same results. This situation can be detected by evaluatlns the result of applying all of the tests in a given order. If the reinaining un-certainty is small it may be desirable to find the best tree which reduces
References
1. Gallager, H.G., "InformationTlieory and Reliable Communication", Wiley (1968).
2. Reinwaldj L_. and Soland R., "Conversion of Limited-Entry Decision
Tables to Opt' ;! Computer Programs I: MinimumAverage
ProcessingTj . JACM (July 1966)
.
2. Reinv7ald, L. and Soland R., "Conversion of Limited-Entr>' Decision
Tables to Optimum Computer Programs II: Minimum Storage
Date
Due
^2-?-3 TDfiO
0D3
fl753b3
5A^-3 TOflO
0D3 TOb
40fi
||||||||ll|||||||((||||||n HD28 M.I.T. AlfredP. Sloan lllllillllllllllllllll 5'z5,ff
Mklk
school of Management3T0flDD03TDb3fl5
Nos.523-71-Working Papers.Nos. 532-71 Nos. 523-71 to Nos. 532-71 3 TOflO
DD3
fl7S3T7
I 3 TOflDDD3 TDb 3T0
3 TDflD0D3 TOb 3bb
3 TDfiDD03
fl7S413
3 ^OfiDDD3 TDb
fi4b 3S060 003
TOb
ttbJ. 3 TDfiO003 TOb 671
1527-7^BASFMENF
^ J, SIGNthiscardandpresentitwithbook
atthe
CIRCULATION
DESK
5>9'7| ^M-71 531