
4.3.3 Sequential Updating of the Partial Posterior

Trying to directly find the maximum of the posterior in equation 4.15 is computationally impossible for trees of depth on the order of several tens. The state space is huge, and even evaluating $Y_t$ along each segment $L_t$ corresponding to a node $t$ of $T$ is nontrivial. The idea behind the algorithm described below is to evaluate a sequence of partial posteriors, conditioning only on the data at a small subset of nodes of $T$, as opposed to conditioning on all the data, and evaluating the probabilities of a limited number of coarse subsets of $\Theta$. Gradually, the size of the conditioning set increases and the subsets at which the posterior is evaluated become more refined.

At step $m$, assume we have already chosen a subtree $T_m$ rooted at the root node of $T$, and $m$ nodes $t_1, \ldots, t_m \in T_m$ at which $Y_t$ has been computed, with values $y_1, \ldots, y_m$. Let $B_m$ denote the event $\{Y_{t_1} = y_1, \ldots, Y_{t_m} = y_m\}$. Each $t \in T_m$ determines a subset $\Theta_t$ of paths, and we calculate the partial posterior

$$\pi_t^{(m)} \doteq P(\theta \in \Theta_t \mid Y_{t_i} = y_i,\ i = 1, \ldots, m) \qquad (4.23)$$

for all $t \in T_m$. Typically there are more than $m$ nodes in $T_m$, but the number of segments at which we actually observe $Y_t$ is $m$, and it increases by 1 at each step. The subtree $T_m$ is sequentially updated, and typically the partial posterior in equation 4.23 becomes more and more peaked at some node $t$ of the current subtree.
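Before turning to the updates, it may help to fix a concrete representation. The following is a minimal sketch, assuming a ternary tree of nodes and a dictionary mapping each node of $T_m$ to its current partial posterior; the names `Node`, `pi`, and `queried` are illustrative, not the book's.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # identity-based hashing, so nodes can key a dict
class Node:
    depth: int
    index: int                        # position among the 3 siblings
    parent: "Node | None" = None
    children: list = field(default_factory=list)

    def expand(self):
        """Create the three children of this node (the tree is ternary)."""
        if not self.children:
            self.children = [Node(self.depth + 1, i, parent=self)
                             for i in range(3)]
        return self.children

root = Node(depth=0, index=0)
pi = {root: 1.0}   # pi_t^(m): every path passes through the root
queried = []       # the queried nodes t_1, ..., t_m defining the event B_m
```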

There is a recursive method to calculate $\pi_t^{(m+1)}$ in terms of $\pi_t^{(m)}$.

Moreover, as discussed in the previous section, the next segment to “query” (i.e., the choice of $t_{m+1}$ at which to observe $Y_{t_{m+1}}$) is the variable maximizing the mutual information with $\theta$, given the data already observed, that is, given the event $B_m$. At some stage the partial posterior is sufficiently peaked at, say, $t \in T_m$; the true curve can then be assumed to pass through $t$ (i.e., $\theta \in \Theta_t$), and the search is reinitialized at $t$.

Choosing the Next Segment to Query

Assume $m-1$ segments have been queried, determining an event $B_{m-1}$, and that the current set $T_m$ from which to choose $t_m$ has already been determined. We seek the $t \in T_m$ for which the mutual information of $\theta$ and $Y_t$ given $B_{m-1}$ is largest. Due to the conditional independence assumptions, the most informative node can be identified solely in terms of the current partial posterior $\pi_t^{(m-1)}$.

For $0 \le \pi \le 1$, define the mixture distribution $P_\pi = \pi P_1 + (1 - \pi) P_0$ and let

$$\phi(\pi) = H(P_\pi) - \pi H(P_1) - (1 - \pi) H(P_0) \qquad (4.24)$$

where $H(P)$ is the entropy of the distribution $P$. Because $-x \log x$ is concave, $\phi$ is a concave function of $\pi \in [0, 1]$ and has a unique maximum at some $\pi_{\max}$, which depends on the particular distributions $P_0$ and $P_1$. The mutual information is maximized at the node $t \in T_m$ for which $\pi_t^{(m-1)}$ is closest to $\pi_{\max}$.
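Since $\pi_{\max}$ depends only on $P_0$ and $P_1$, it can be computed once, up front. Here is a small numerical sketch of equation 4.24; the two distributions below are made up for illustration and stand in for the off-curve and on-curve response distributions of $Y_t$.

```python
import numpy as np

def entropy(p):
    """Entropy H(P) in nats, ignoring zero-probability entries."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def phi(pi, p0, p1):
    """phi(pi) = H(P_pi) - pi*H(P1) - (1-pi)*H(P0), P_pi the mixture."""
    mix = pi * np.asarray(p1) + (1 - pi) * np.asarray(p0)
    return entropy(mix) - pi * entropy(p1) - (1 - pi) * entropy(p0)

# phi is concave in pi, so a simple grid search recovers pi_max.
p0 = np.array([0.7, 0.2, 0.1])   # hypothetical off-curve distribution of Y_t
p1 = np.array([0.1, 0.3, 0.6])   # hypothetical on-curve distribution of Y_t
grid = np.linspace(0.0, 1.0, 1001)
pi_max = grid[np.argmax([phi(p, p0, p1) for p in grid])]
print(f"pi_max = {pi_max:.3f}")  # typically between .4 and .6, per the text
```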

This is seen as follows. The distribution of $Y_t$ given $\theta$ depends only on whether the curve defined by $\theta$ passes through $t$ ($\theta \in \Theta_t$), and therefore the mutual information between $Y_t$ and $\theta$ is the same as the mutual information between $Y_t$ and the indicator function $1_t(\theta)$. Setting $\pi_t^{(m-1)} = P(\theta \in \Theta_t \mid B_{m-1}) = P(1_t(\theta) = 1 \mid B_{m-1})$, and letting $I$ denote mutual information and $H$ the entropy, we have

$$I(Y_t, \theta \mid B_{m-1}) = I(Y_t, 1_t \mid B_{m-1}) = H(Y_t \mid B_{m-1}) - H(Y_t \mid 1_t, B_{m-1}) \qquad (4.25)$$

The conditional distribution of $Y_t$ given $B_{m-1}$ can be written as a mixture distribution

$$P(Y_t = k \mid B_{m-1}) = \pi_t^{(m-1)} P(Y_t = k \mid \theta \in \Theta_t, B_{m-1}) + \left(1 - \pi_t^{(m-1)}\right) P(Y_t = k \mid \theta \notin \Theta_t, B_{m-1}) = P_{\pi_t^{(m-1)}}(k)$$

where the second equality follows from the conditional independence. The second term in 4.25 is rewritten as

$$H(Y_t \mid 1_t, B_{m-1}) = \pi_t^{(m-1)} H(Y_t \mid 1_t = 1, B_{m-1}) + \left(1 - \pi_t^{(m-1)}\right) H(Y_t \mid 1_t = 0, B_{m-1}) = \pi_t^{(m-1)} H(P_1) + \left(1 - \pi_t^{(m-1)}\right) H(P_0)$$

where again the second equality follows from the conditional independence of $Y_t$ and $B_{m-1}$ given $1_t$. Therefore the mutual information reduces to

$$I(Y_t, \theta \mid B_{m-1}) = H\left(P_{\pi_t^{(m-1)}}\right) - \pi_t^{(m-1)} H(P_1) - \left(1 - \pi_t^{(m-1)}\right) H(P_0) = \phi\left(\pi_t^{(m-1)}\right)$$

with $\phi$ as defined in equation 4.24. We therefore need to pass through all the nodes in $T_m$ and set $t_m$ to be the one with $\pi_t^{(m-1)}$ closest to $\pi_{\max}$.
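In code, the query-selection rule is a one-line scan over the stored posteriors. A minimal sketch, assuming `pi` holds $\pi_t^{(m-1)}$ for every node of $T_m$ and `pi_max` was precomputed as above; names are illustrative.

```python
def choose_query(pi, pi_max, queried):
    """Return the not-yet-queried node whose partial posterior is closest
    to pi_max, i.e., the node maximizing the mutual information with theta."""
    candidates = [t for t in pi if t not in queried]
    return min(candidates, key=lambda t: abs(pi[t] - pi_max))
```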

Updating from $\pi_t^{(m-1)}$ to $\pi_t^{(m)}$

Once the next node $t_m \in T_m$ at which to observe the data is chosen, we update the partial posterior given the larger conditioning set $B_m$ for every node in $T_m$. In the next section, we will see how the set $T_{m+1}$ is chosen.

Recall that the partial posterior given $B_{m-1}$ (i.e., $\pi_t^{(m-1)}$) has already been computed and stored for every node $t \in T_m$. Let $t$ be a terminal node of $T_m$. Because all the nodes $t_1, \ldots, t_m$ are in $T_m$, knowing that the curve passes through $t$ (that is, $\theta \in \Theta_t$) completely determines whether or not each of the nodes $t_i$, $i = 1, \ldots, m$, is on the curve. Thus $Y_{t_m}$ is conditionally independent of $B_{m-1}$ given $\theta \in \Theta_t$. We can then write, by Bayes' rule,

$$\pi_t^{(m)} = P(\theta \in \Theta_t \mid Y_{t_m} = y_m, B_{m-1}) = \frac{P(Y_{t_m} = y_m \mid \theta \in \Theta_t)\, \pi_t^{(m-1)}}{P(Y_{t_m} = y_m \mid B_{m-1})}$$

The first factor can be expressed as

$$P(Y_{t_m} = y_m \mid \theta \in \Theta_t) = \begin{cases} P_1(y_m) & \text{if the paths through } t \text{ pass through } t_m, \\ P_0(y_m) & \text{otherwise,} \end{cases}$$

and the denominator is obtained by summing the numerator over the terminal nodes of $T_m$, whose subsets $\Theta_t$ partition the set of paths.

For every internal node $t$ of the tree $T_m$, write $\pi_t^{(m)} = \pi_{t_1}^{(m)} + \pi_{t_2}^{(m)} + \pi_{t_3}^{(m)}$, where $t_1, t_2, t_3$ here denote the three child nodes of $t$, so that $\pi_t^{(m)}$ can be recursively updated going from the terminal nodes upward.
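Continuing the sketch above (the `Node` class and `pi` dictionary), one full update after observing $Y_{t_m} = y_m$ might look as follows. `on_path(t, t_m)` should return True when the paths through terminal node `t` pass through `t_m`; `P0` and `P1` map an observed value to its probability off and on the curve. All names are illustrative.

```python
def ancestors(t):
    """Yield the proper ancestors of node t, up to the root."""
    while t.parent is not None:
        t = t.parent
        yield t

def update_posterior(pi, terminals, t_m, y_m, on_path, P0, P1):
    new_pi = {}
    # Bayes numerator at each terminal node of T_m.
    for t in terminals:
        likelihood = P1[y_m] if on_path(t, t_m) else P0[y_m]
        new_pi[t] = likelihood * pi[t]
    # Normalizer P(Y_{t_m} = y_m | B_{m-1}): the terminal subsets Theta_t
    # partition the set of paths, so summing their masses covers the event.
    z = sum(new_pi.values())
    for t in terminals:
        new_pi[t] /= z
    # Internal nodes: pi_t^(m) is the sum over the three children,
    # filled in bottom-up from the terminal nodes.
    internal = {a for t in terminals for a in ancestors(t)}
    for t in sorted(internal, key=lambda n: -n.depth):
        new_pi[t] = sum(new_pi[c] for c in t.children)
    return new_pi
```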

Defining the Subtree $T_{m+1}$

The partial posterior $\pi_t^{(m)}$ has been computed for all $t \in T_m$. The set $T_{m+1}$ is obtained from the set $T_m$, keeping in mind the fact that subsequently we will be looking for informative nodes, that is, nodes with $\pi_t^{(m)}$ close to $\pi_{\max}$. First include all nodes in $T_m$. For any terminal node $t \in T_m$ satisfying $\pi_t^{(m)} > \pi_{\max}$, add its three children to $T_{m+1}$. For each of these child nodes $t'$ write

$$\pi_{t'}^{(m)} = P(\theta \in \Theta_{t'} \mid B_m) = \frac{1}{3} P(\theta \in \Theta_t \mid B_m) = \frac{1}{3} \pi_t^{(m)} \qquad (4.29)$$

This is due to the uniform prior and the fact that we are conditioning only on observed data at nodes in $T_m$, so that $t' \neq t_1, \ldots, t_m$. On the other hand, if $\pi_t^{(m)} < \pi_{\max}$, there is no point in adding the children, because their values would be even further from $\pi_{\max}$. Even if $\pi_t^{(m)} > \pi_{\max}$, the values for the three children themselves will have to be less than $\pi_{\max}$, again because of the factor of 1/3 and because $\pi_{\max}$ is typically between .4 and .6. Therefore this extension occurs only for one level; that is, all new nodes are children of terminal nodes of $T_m$. An important conclusion is that the most informative segment in the entire tree, given the information in $B_m$, has to be in the set $T_{m+1}$.
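A sketch of this one-level extension, again assuming the `Node` class and `pi` dictionary from the earlier sketches; `terminals` is the list of terminal nodes of $T_m$.

```python
def extend_subtree(pi, terminals, pi_max):
    """Extend T_m to T_{m+1}: expand each terminal node whose partial
    posterior exceeds pi_max, giving each child a third of its parent's
    mass (equation 4.29). Returns the terminal nodes of T_{m+1}."""
    new_terminals = []
    for t in terminals:
        if pi[t] > pi_max:
            for child in t.expand():
                pi[child] = pi[t] / 3.0   # uniform prior over the 3 children
                new_terminals.append(child)
        else:
            new_terminals.append(t)
    return new_terminals
```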

The procedure now repeats. The next query $t_{m+1}$ will be that segment $t$ in $T_{m+1}$ for which $\pi_t^{(m)}$ is closest to $\pi_{\max}$. The new probability $\pi_t^{(m+1)}$ is computed from $\pi_t^{(m)}$ for all elements of $T_{m+1}$, and $T_{m+1}$ is extended to $T_{m+2}$. To start, we set $T_0$ as the root node, and therefore necessarily $T_1$ consists of the root node and its three children, one of which is picked at random as $t_1$. At some stage, the partial posterior is sufficiently peaked at some $t \in T_m$. We then assume the true curve passes through $t$ (i.e., $\theta \in \Theta_t$), and the search is reinitialized at $t$.

The entire algorithm is summarized as follows:
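As a stand-in for the algorithm box, here is a minimal sketch of one stage of the loop, assembling the pieces above (`choose_query`, `update_posterior`, `extend_subtree`, `ancestors`, and the `Node` class). The function `observe_segment`, which stands for computing $Y_t$ along the segment $L_t$, and the stopping threshold `peak` are hypothetical, not from the book.

```python
def track_one_stage(root, P0, P1, pi_max, observe_segment, peak=0.95):
    """Query segments until the partial posterior is sufficiently peaked
    at some terminal node; return that node (where the search would be
    reinitialized)."""
    pi = {root: 1.0}
    terminals = [root]
    queried = set()
    while True:
        # Extend T_m one level below the informative terminal nodes.
        terminals = extend_subtree(pi, terminals, pi_max)
        # Query the node whose partial posterior is closest to pi_max.
        t_m = choose_query(pi, pi_max, queried)
        queried.add(t_m)
        y_m = observe_segment(t_m)
        # Bayes update over terminal nodes, then bottom-up sums.
        pi = update_posterior(
            pi, terminals, t_m, y_m,
            on_path=lambda t, q: q is t or any(a is q for a in ancestors(t)),
            P0=P0, P1=P1)
        # Reinitialize when the posterior is sufficiently peaked.
        for t in terminals:
            if pi[t] > peak:
                return t
```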
