Solving the Quicksort Median-of-Three Recurrence with OGFs

As a detailed example of manipulating functional equations on generating functions, we revisit the recurrence given in §1.5 that describes the average number of comparisons taken by the median-of-three quicksort. This recurrence would be difficult to handle without generating functions:

$$C_N = N + 1 + \sum_{1\le k\le N}\frac{(N-k)(k-1)}{\binom{N}{3}}\,(C_{k-1} + C_{N-k}) \quad\text{for } N > 2 \text{ with } C_0 = C_1 = C_2 = 0.$$

We use $N+1$ as the number of comparisons required to partition $N$ elements for convenience in the analysis. The actual cost depends on how the median is computed and other properties of the implementation, but it will be within a small additive constant of $N+1$. Also, the initial condition $C_2 = 0$ (and the implied $C_3 = 4$) is used for convenience in the analysis, though different costs are likely in actual implementations.

As in §1.5, we can account for such details by taking linear combinations of the solution to this recurrence and other, similar, recurrences such as the one counting the number of partitioning stages (the same recurrence with cost 1 instead of $N+1$).

We follow through the standard steps for solving recurrences with generating functions. Multiplying by $\binom{N}{3}$ and removing the symmetry in the sum, we have

$$\binom{N}{3}C_N = (N+1)\binom{N}{3} + 2\sum_{1\le k\le N}(N-k)(k-1)C_{k-1}.$$

Then, multiplying both sides by $z^{N-3}$ and summing on $N$ eventually leads to the differential equation:

$$C'''(z) = \frac{24}{(1-z)^5} + 12\,\frac{C'(z)}{(1-z)^2}. \tag{5}$$
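The step from the multiplied recurrence to (5) rests on three routine sums. The following is a sketch of them, assuming the usual OGF $C(z) = \sum_{N\ge 0} C_N z^N$; the substitutions $j = k-1$ and $m = N-k$ in the last line are introduced here for illustration.

```latex
\begin{align*}
\sum_{N\ge 3}\binom{N}{3}C_N z^{N-3} &= \frac{C'''(z)}{6},\\
\sum_{N\ge 3}(N+1)\binom{N}{3}z^{N-3}
  &= 4\sum_{N\ge 3}\binom{N+1}{4}z^{N-3} = \frac{4}{(1-z)^5},\\
\sum_{N\ge 3}\sum_{1\le k\le N}(N-k)(k-1)C_{k-1}z^{N-3}
  &= \Bigl(\sum_{j\ge 0} jC_j z^{j-1}\Bigr)\Bigl(\sum_{m\ge 0} m z^{m-1}\Bigr)
   = \frac{C'(z)}{(1-z)^2}.
\end{align*}
```

Combining the three lines gives $C'''(z)/6 = 4/(1-z)^5 + 2C'(z)/(1-z)^2$, which is (5) after multiplying by 6.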

One cannot always hope to find explicit solutions for high-order differential equations, but this one is in fact of a type that can be solved explicitly.

First, multiply both sides by $(1-z)^3$ to get

$$(1-z)^3C'''(z) = 12(1-z)C'(z) + \frac{24}{(1-z)^2}. \tag{6}$$

Now, in this equation the degree equals the order of each term. Such a differential equation is known in the theory of ordinary differential equations as an Euler equation. We can decompose it by rewriting it in terms of an operator that both multiplies and differentiates. In this case, we define the operator

$$\Psi C(z) \equiv (1-z)\frac{d}{dz}C(z),$$

which allows us to rewrite (6) as

$$\Psi(\Psi+1)(\Psi+2)C(z) = 12\Psi C(z) + \frac{24}{(1-z)^2}.$$

Collecting all the terms involving $\Psi$ into one polynomial and factoring, we have

$$\Psi(\Psi+5)(\Psi-2)C(z) = \frac{24}{(1-z)^2}.$$
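The operator form can be checked by applying $\Psi = (1-z)\,d/dz$ one factor at a time. This is a sketch of that check; it uses only the product rule and the fact that polynomials in $\Psi$ commute.

```latex
\begin{align*}
\Psi C(z) &= (1-z)C'(z),\\
(\Psi+1)\Psi C(z) &= (1-z)\bigl[(1-z)C''(z) - C'(z)\bigr] + (1-z)C'(z) = (1-z)^2C''(z),\\
(\Psi+2)(\Psi+1)\Psi C(z) &= (1-z)\bigl[(1-z)^2C'''(z) - 2(1-z)C''(z)\bigr] + 2(1-z)^2C''(z)
   = (1-z)^3C'''(z),\\
\Psi(\Psi+1)(\Psi+2) - 12\Psi &= \Psi\bigl(\Psi^2 + 3\Psi - 10\bigr) = \Psi(\Psi+5)(\Psi-2).
\end{align*}
```

The first three lines turn the left-hand side of (6) into $\Psi(\Psi+1)(\Psi+2)C(z)$, and the last line is the factoring step.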

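The implication of this equation is that we can solve for $C(z)$ by successively solving three first-order differential equations:

$$\Psi U(z) = \frac{24}{(1-z)^2} \quad\text{or}\quad U'(z) = \frac{24}{(1-z)^3},$$

$$(\Psi+5)T(z) = U(z) \quad\text{or}\quad T'(z) = -\frac{5T(z)}{1-z} + \frac{U(z)}{1-z},$$

$$(\Psi-2)C(z) = T(z) \quad\text{or}\quad C'(z) = \frac{2C(z)}{1-z} + \frac{T(z)}{1-z}.$$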

Solving these first-order differential equations exactly as for the simpler case that we solved to analyze regular quicksort, we arrive at the solution.
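As an illustration of the first two steps, here is one way to carry them out with an integrating factor. The choice of method is an assumption made for this sketch; any technique for first-order linear equations would do. The initial values $U(0) = T(0) = 0$ follow from $C_0 = C_1 = C_2 = 0$.

```latex
\begin{align*}
U(z) &= \int_0^z \frac{24}{(1-t)^3}\,dt = \frac{12}{(1-z)^2} - 12,\\
\frac{d}{dz}\Bigl[(1-z)^{-5}\,T(z)\Bigr]
  &= \frac{(1-z)T'(z) + 5T(z)}{(1-z)^6} = \frac{U(z)}{(1-z)^6},\\
T(z) &= (1-z)^5\int_0^z\Bigl(\frac{12}{(1-t)^8} - \frac{12}{(1-t)^6}\Bigr)\,dt
      = \frac{12}{7}\,\frac{1}{(1-z)^2} - \frac{12}{5} + \frac{24}{35}(1-z)^5.
\end{align*}
```

The third equation is handled the same way with the factor $(1-z)^2$; the logarithm in $C(z)$ appears because the $1/(1-z)^2$ term of $T(z)$ resonates with the homogeneous solution.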

Theorem 3.5 (Median-of-three Quicksort). The average number of comparisons used by the median-of-three quicksort for a random permutation is given by

$$C_N = \frac{12}{7}(N+1)\Bigl(H_{N+1} - \frac{23}{14}\Bigr) \quad\text{for } N \ge 6.$$

Proof. Continuing the earlier discussion, we solve the differential equations to get the result

$$U(z) = \frac{12}{(1-z)^2} - 12;$$

$$T(z) = \frac{12}{7}\,\frac{1}{(1-z)^2} - \frac{12}{5} + \frac{24}{35}(1-z)^5;$$

$$C(z) = \frac{12}{7}\,\frac{1}{(1-z)^2}\ln\frac{1}{1-z} - \frac{54}{49}\,\frac{1}{(1-z)^2} + \frac{6}{5} - \frac{24}{245}(1-z)^5.$$

Expanding this expression for $C(z)$ (and ignoring the constant and the last term, which contribute only to the coefficients for $N \le 5$) gives the result (see the exercises in §3.1). The leading term in the OGF differs from the OGF for standard quicksort only by a constant factor.
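The theorem can be checked numerically by iterating the original recurrence in exact rational arithmetic and comparing with the closed form. The sketch below does this; the function names are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from math import comb


def median_of_three_costs(n_max):
    """Return [C_0, ..., C_{n_max}] computed from the median-of-three recurrence."""
    costs = [Fraction(0)] * (n_max + 1)  # C_0 = C_1 = C_2 = 0
    for n in range(3, n_max + 1):
        weighted = sum(
            (n - k) * (k - 1) * (costs[k - 1] + costs[n - k])
            for k in range(1, n + 1)
        )
        costs[n] = n + 1 + weighted / comb(n, 3)
    return costs


def theorem_3_5(n):
    """Closed form (12/7)(N+1)(H_{N+1} - 23/14), stated for N >= 6."""
    harmonic = sum(Fraction(1, j) for j in range(1, n + 2))  # H_{N+1}
    return Fraction(12, 7) * (n + 1) * (harmonic - Fraction(23, 14))


costs = median_of_three_costs(30)
agree = all(costs[n] == theorem_3_5(n) for n in range(6, 31))
```

The two agree exactly from $N = 6$ on, while at $N = 5$ they differ because of the $(1-z)^5$ term in $C(z)$.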

We can translate the decomposition into $U(z)$ and $T(z)$ into recurrences on the corresponding sequences. Consider the generating functions $U(z) = \sum U_N z^N$ and $T(z) = \sum T_N z^N$. In this case, manipulations on generating functions do correspond to manipulations on recurrences, but the tools used are more generally applicable and somewhat easier to discover and apply than would be a direct solution of the recurrence. Furthermore, the solution with generating functions can be used in the situation when a larger sample is used.

Further details may be found in [9] or [14].

Besides serving as a practical example of the use of generating functions, this rather detailed example illustrates how precise mathematical statements about performance characteristics of interest can be used to help choose proper values for controlling parameters of algorithms (in this case, the size of the sample). For instance, the above analysis shows that we save about 14% of the cost of comparisons by using the median-of-three variant for quicksort, and a more detailed analysis, taking into account the extra costs (primarily, the extra exchanges required because the partitioning element is nearer the middle), shows that bigger samples lead to marginal further improvements.

Exercise 3.49 Show that $(1-z)^tC^{(t)}(z) = \Psi(\Psi+1)\cdots(\Psi+t-1)C(z)$.

Exercise 3.50 Find the average number of exchanges used by median-of-three quicksort.

Exercise 3.51 Find the number of comparisons and exchanges used, on the average, by quicksort when modified to use the median of five elements for partitioning.

Exercise 3.52 [Euler] Discuss the solution of the differential equation

$$\sum_{0\le j\le r}(1-z)^{r-j}\frac{d^j}{dz^j}f(z) = 0$$

and the inhomogeneous version where the right-hand side is of the form $(1-z)^\alpha$.

Exercise 3.53 [van Emden, cf. Knuth] Show that, when the median of a sample of $2t+1$ elements is used for partitioning, the number of comparisons used by quicksort is

$$\frac{1}{H_{2t+2} - H_{t+1}}\,N\ln N + O(N).$$

3.8 Counting with Generating Functions. So far, we have concentrated on describing generating functions as analytic tools for solving recurrence relationships. This is only part of their significance: they also provide a way to count combinatorial objects systematically. The “combinatorial objects” may be data structures being operated upon by algorithms, so this process plays a fundamental role in the analysis of algorithms as well.

Our first example is a classical combinatorial problem that also corresponds to a fundamental data structure that will be considered in Chapter 6 and in several other places in the book. A binary tree is a structure defined recursively to be either a single external node or an internal node that is connected to two binary trees, a left subtree and a right subtree. Figure 3.1 shows the binary trees with five or fewer external nodes. Binary trees appear in many problems in combinatorics and the analysis of algorithms: for example, if internal nodes correspond to two-argument arithmetic operators and external nodes correspond to variables, then binary trees correspond to arithmetic expressions. The question at hand is, how many binary trees are there with $N$ external nodes?

[Figure 3.1: All binary trees with 1, 2, 3, 4, and 5 external nodes ($T_0 = 1$, $T_1 = 1$, $T_2 = 2$, $T_3 = 5$, $T_4 = 14$)]

Counting binary trees. One way to proceed is to define a recurrence. Let $T_N$ be the number of binary trees with $N+1$ external nodes. From Figure 3.1 we know that $T_0 = 1$, $T_1 = 1$, $T_2 = 2$, $T_3 = 5$, and $T_4 = 14$. Now, we can derive a recurrence from the recursive definition: if the left subtree in a binary tree with $N+1$ external nodes has $k$ external nodes (there are $T_{k-1}$ different such trees), then the right subtree must have $N-k+1$ external nodes (there are $T_{N-k}$ possibilities), so $T_N$ must satisfy

$$T_N = \sum_{1\le k\le N}T_{k-1}T_{N-k} \quad\text{for } N > 0 \text{ with } T_0 = 1.$$

This is a simple convolution: multiplying by $z^N$ and summing on $N$, we find that the corresponding OGF must satisfy the nonlinear functional equation

$$T(z) = zT(z)^2 + 1.$$

This formula for $T(z)$ is easily solved with the quadratic equation:

$$zT(z) = \frac{1}{2}\bigl(1 \pm \sqrt{1-4z}\bigr).$$

To get equality when $z = 0$, we take the solution with a minus sign.

Theorem 3.6 (OGF for binary trees). The number of binary trees with $N+1$ external nodes is given by the Catalan numbers:

$$T_N = [z^{N+1}]\,\frac{1-\sqrt{1-4z}}{2} = \frac{1}{N+1}\binom{2N}{N}.$$

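Proof. The explicit representation of the OGF was derived earlier. To extract coefficients, use the binomial theorem with exponent $1/2$ (Newton's formula):

$$zT(z) = -\frac{1}{2}\sum_{N\ge 1}\binom{1/2}{N}(-4z)^N.$$

Setting coefficients of $z^{N+1}$ equal gives

$$T_N = -\frac{1}{2}\binom{1/2}{N+1}(-4)^{N+1} = \frac{1}{N+1}\binom{2N}{N}.$$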

As we will see in Chapter 6, every binary tree has exactly one more external node than internal node, so the Catalan numbers $T_N$ also count the binary trees with $N$ internal nodes. In the next chapter, we will see that the approximate value is $T_N \approx 4^N/(N\sqrt{\pi N})$.
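The convolution recurrence and the Catalan closed form $\frac{1}{N+1}\binom{2N}{N}$ are easy to compare for small $N$. A minimal sketch, with function names chosen only for this illustration:

```python
from math import comb


def catalan_by_recurrence(n_max):
    """T_N = sum over 1 <= k <= N of T_{k-1} * T_{N-k}, with T_0 = 1."""
    counts = [1]
    for n in range(1, n_max + 1):
        counts.append(sum(counts[k - 1] * counts[n - k] for k in range(1, n + 1)))
    return counts


def catalan_closed_form(n):
    """binom(2N, N) / (N + 1); the division is always exact."""
    return comb(2 * n, n) // (n + 1)


trees = catalan_by_recurrence(12)
```

The first five values reproduce the counts read off from Figure 3.1: 1, 1, 2, 5, 14.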

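Counting binary trees (direct). There is a simpler way to determine the explicit expression for the generating function above, which gives more insight into the intrinsic utility of generating functions for counting. We define $\mathcal{T}$ to be the set of all binary trees, and adopt the notation $|t|$ to represent, for $t \in \mathcal{T}$, the number of internal nodes in $t$. Then we have the following derivation:

$$\begin{aligned}
T(z) &= \sum_{t\in\mathcal{T}} z^{|t|}\\
&= 1 + \sum_{t_L\in\mathcal{T}}\sum_{t_R\in\mathcal{T}} z^{|t_L|+|t_R|+1}\\
&= 1 + zT(z)^2.
\end{aligned}$$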

The first line is an alternative way to express $T(z)$ from its definition. Each tree with exactly $k$ internal nodes contributes exactly 1 to the coefficient of $z^k$, so the coefficient of $z^k$ in the sum “counts” the number of trees with $k$ internal nodes. The second line follows from the recursive definition of binary trees: either a binary tree has no internal nodes (which accounts for the 1), or it can be decomposed into two independent binary trees whose internal nodes comprise the internal nodes of the original tree, plus one for the root. The third line follows because the index variables $t_L$ and $t_R$ are independent.

Readers are advised to study this fundamental example carefully—we will be seeing many other similar examples throughout the book.

Exercise 3.54 Modify the above derivation to derive directly the generating function for the number of binary trees with $N$ external nodes.

Changing a dollar (Polya). A classical example of counting with generating functions, due to Polya, is to answer the following question: “How many ways are there to change a dollar, using pennies, nickels, dimes, quarters, and fifty-cent coins?” Arguing as in the direct counting method for binary trees, we find that the generating function is given by

$$D(z) = \sum_{p,n,d,q,f\ge 0} z^{p+5n+10d+25q+50f}.$$

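The indices of summation $p$, $n$, $d$, and so on, are the number of pennies, nickels, dimes, and other coins used. Each configuration of coins that adds up to $k$ cents clearly contributes exactly 1 to the coefficient of $z^k$, so this is the desired generating function. But the indices of summation are all independent in this expression for $D(z)$, so we have

$$D(z) = \sum_{p\ge 0}z^{p}\sum_{n\ge 0}z^{5n}\sum_{d\ge 0}z^{10d}\sum_{q\ge 0}z^{25q}\sum_{f\ge 0}z^{50f} = \frac{1}{(1-z)(1-z^5)(1-z^{10})(1-z^{25})(1-z^{50})}.$$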

By setting up the corresponding recurrence, or by using a computer algebra system, we find that $[z^{100}]D(z) = 292$.
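One way to set up the corresponding recurrence is to multiply in the factors $1/(1-z^c)$ one coin at a time: if $b(z) = a(z)/(1-z^c)$, then $b_m = a_m + b_{m-c}$. The sketch below applies this to truncated coefficient lists; the function name and its default coin tuple are assumptions made for this illustration.

```python
def count_change(total, coins=(1, 5, 10, 25, 50)):
    """Return [z^total] of the product over coins c of 1 / (1 - z^c)."""
    coeffs = [1] + [0] * total  # coefficients of the constant series 1
    for c in coins:
        # In-place update: after this loop, coeffs holds the old series
        # divided by (1 - z^c), since b_m = a_m + b_{m-c}.
        for m in range(c, total + 1):
            coeffs[m] += coeffs[m - c]
    return coeffs[total]


ways = count_change(100)  # 292
```

The running time is proportional to the amount times the number of coin types, and no individual configuration of coins is ever enumerated.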

Exercise 3.55 Discuss the form of an expression for $[z^N]D(z)$.

Exercise 3.56 Write an efficient computer program that can compute $[z^N]D(z)$, given $N$.

Exercise 3.57 Show that the generating function for the number of ways to express $N$ as a linear combination (with integer coefficients) of powers of 2 is

$$\prod_{k\ge 1}\frac{1}{1-z^{2^k}}.$$

Exercise 3.58 [Euler] Show that

$$\frac{1}{1-z} = (1+z)(1+z^2)(1+z^4)(1+z^8)\cdots.$$

Give a closed form for the product of the first $t$ factors. This identity is sometimes called the “computer scientist’s identity.” Why?

Exercise 3.59 Generalize the previous exercise to base 3.

Exercise 3.60 Express $[z^N](1-z)(1-z^2)(1-z^4)(1-z^8)\cdots$ in terms of the binary representation of $N$.

Binomial distribution. How many binary sequences of length $N$ have exactly $k$ bits that are 1 (and $N-k$ bits that are 0)? Let $\mathcal{B}_N$ denote the set of all binary sequences of length $N$ and $\mathcal{B}_{Nk}$ denote the set of all binary sequences of length $N$ with the property that $k$ of the bits are 1. Now we consider the generating function for the quantity sought:

$$B_N(z) = \sum_{k}|\mathcal{B}_{Nk}|z^k.$$

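But we can note that each binary string $b$ in $\mathcal{B}_N$ with exactly $k$ 1s contributes exactly 1 to the coefficient of $z^k$ and rewrite the generating function so that it “counts” each string:

$$B_N(z) \equiv \sum_{b\in\mathcal{B}_N} z^{\{\#\text{ of 1 bits in } b\}} = \sum_{k}\sum_{b\in\mathcal{B}_{Nk}} z^k \quad\Bigl(= \sum_{k}|\mathcal{B}_{Nk}|z^k\Bigr).$$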

Now the set of all strings of $N$ bits with $k$ 1s can be formed by taking the union of the set of all strings with $N-1$ bits and $k$ 1s (adding a 0 to the beginning of each string) and the set of all strings with $N-1$ bits and $k-1$ 1s (adding a 1 to the beginning of each string). Therefore,

$$\begin{aligned}
B_N(z) &= \sum_{k}\sum_{b\in\mathcal{B}_{(N-1)k}} z^k + \sum_{k}\sum_{b\in\mathcal{B}_{(N-1)(k-1)}} z^k\\
&= B_{N-1}(z) + zB_{N-1}(z)
\end{aligned}$$

so $B_N(z) = (1+z)^N$. Expanding this function with the binomial theorem yields the expected answer $|\mathcal{B}_{Nk}| = \binom{N}{k}$.
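For small $N$ the two views of $B_N(z)$ can be compared directly: one sums a contribution of $z^k$ per string by brute-force enumeration, the other iterates $B_N(z) = (1+z)B_{N-1}(z)$ on coefficient lists. A sketch, with hypothetical function names and the assumed starting point $B_0(z) = 1$ (the empty string):

```python
from itertools import product
from math import comb


def ones_gf_by_enumeration(n):
    """Coefficients of B_N(z): entry k counts the n-bit strings with k ones."""
    coeffs = [0] * (n + 1)
    for bits in product((0, 1), repeat=n):
        coeffs[sum(bits)] += 1  # each string adds 1 to the coefficient of z^k
    return coeffs


def ones_gf_by_recurrence(n):
    """Coefficients of B_N(z) from B_N(z) = (1 + z) B_{N-1}(z), B_0(z) = 1."""
    coeffs = [1]
    for _ in range(n):
        # coeffs + [0] is B_{N-1}(z); [0] + coeffs is z * B_{N-1}(z).
        coeffs = [a + b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return coeffs
```

Both return the row of binomial coefficients $\binom{N}{0}, \ldots, \binom{N}{N}$; the enumeration takes $2^N$ steps, while the recurrence takes about $N^2/2$ additions.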

To summarize informally, we can use the following method to “count” with generating functions:

• Write down a general expression for the GF involving a sum indexed over the combinatorial objects to be counted.

• Decompose the sum in a manner corresponding to the structure of the objects, to derive an explicit formula for the GF.

• Express the GF as a power series to get expressions for the coeﬃcients.

As we saw when introducing generating functions for the problem of counting binary trees at the beginning of the previous section, an alternative approach is to use the objects’ structure to derive a recurrence, then use GFs to solve the recurrence. For simple examples, there is little reason to choose one method over the other, but for more complicated problems, the direct method just sketched can avoid the tedious calculations that sometimes arise with recurrences. In Chapter 5, we will consider a powerful general approach based on this idea, and we will see many applications later in the book.

3.9 Probability Generating Functions. An application of generating functions that is directly related to the analysis of algorithms is their use for manipulating probabilities, to simplify the calculation of averages and variances.

Definition. Given a random variable $X$ that takes on only nonnegative integer values, with $p_k \equiv \Pr\{X = k\}$, the function $P(u) = \sum_{k\ge 0}p_ku^k$ is called the probability generating function (PGF) for the random variable.

We have been assuming basic familiarity with computing averages and standard deviations for random variables in the discussion in §1.7 and in the examples of average-case analysis of algorithms that we have examined, but we review the definitions here because we will be doing related calculations in this and the next section.

Definition. The expected value of $X$, or $E(X)$, also known as the mean value of $X$, is defined to be $\sum_{k\ge 0}kp_k$. In terms of $r_k \equiv \Pr\{X \le k\}$, this is equivalent to $E(X) = \sum_{k\ge 0}(1-r_k)$. The variance of $X$, or $\mathrm{var}(X)$, is defined to be $\sum_{k\ge 0}(k-E(X))^2p_k$. The standard deviation of $X$ is defined to be $\sqrt{\mathrm{var}(X)}$.

Probability generating functions are important because they can provide a way to find the average and the variance without tedious calculations involving discrete sums.

Theorem 3.7 (Mean and variance from PGFs). Given a PGF $P(z)$ for a random variable $X$, the expected value of $X$ is given by $P'(1)$ with variance $P''(1) + P'(1) - P'(1)^2$.

Proof. If $p_k \equiv \Pr\{X = k\}$, then

$$P'(1) = \sum_{k\ge 0}kp_ku^{k-1}\Big|_{u=1} = \sum_{k\ge 0}kp_k,$$

the expected value, by definition. Similarly, noting that $P(1) = 1$, the stated result for the variance follows directly from the definition:

$$\begin{aligned}
\sum_{k\ge 0}(k-P'(1))^2p_k &= \sum_{k\ge 0}k^2p_k - 2\sum_{k\ge 0}kP'(1)p_k + \sum_{k\ge 0}P'(1)^2p_k\\
&= \sum_{k\ge 0}k^2p_k - P'(1)^2 = P''(1) + P'(1) - P'(1)^2.
\end{aligned}$$

The quantity $E(X^r) = \sum_k k^rp_k$ is known as the $r$th moment of $X$. The expected value is the first moment and the variance is the difference between the second moment and the square of the first.
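For a distribution with finitely many outcomes, the PGF is a polynomial and Theorem 3.7 can be checked mechanically. In the sketch below a PGF is represented by its coefficient list `probs`, with `probs[k]` standing for $p_k$; this representation and the function names are assumptions made for illustration.

```python
from fractions import Fraction


def pgf_mean_variance(probs):
    """Return (mean, variance) via P'(1) and P''(1), where probs[k] = Pr{X = k}."""
    d1 = sum(k * p for k, p in enumerate(probs))            # P'(1)
    d2 = sum(k * (k - 1) * p for k, p in enumerate(probs))  # P''(1)
    return d1, d2 + d1 - d1 * d1


def direct_mean_variance(probs):
    """Return (mean, variance) straight from the definitions."""
    mean = sum(k * p for k, p in enumerate(probs))
    variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, variance


# Number of heads in three fair coin flips: P(u) = (1 + 3u + 3u^2 + u^3) / 8.
probs = [Fraction(1, 8), Fraction(3, 8), Fraction(3, 8), Fraction(1, 8)]
```

Both routes give mean $3/2$ and variance $3/4$ for this example.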

Composition rules such as the theorems that we will consider in §5.2 and §5.3 for enumeration through the symbolic method translate into statements about combining PGFs for independent random variables. For example, if $P(u)$, $Q(u)$ are probability generating functions for independent random variables $X$ and $Y$, then $P(u)Q(u)$ is the probability generating function for $X+Y$. Moreover, the average and variance of the distribution represented by the product of two probability generating functions is the sum of the individual averages and variances.
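The product rule amounts to convolving coefficient lists. The sketch below multiplies the PGFs of two small independent variables and confirms that means and variances add; the two example distributions and the helper names are assumptions made for illustration.

```python
from fractions import Fraction


def multiply(p, q):
    """Coefficient list of P(u) * Q(u), i.e. the convolution of p and q."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def mean(p):
    """P'(1) for a PGF given by its coefficient list."""
    return sum(k * c for k, c in enumerate(p))


def variance(p):
    """P''(1) + P'(1) - P'(1)^2 for a PGF given by its coefficient list."""
    second = sum(k * (k - 1) * c for k, c in enumerate(p))
    return second + mean(p) - mean(p) ** 2


die = [Fraction(1, 6)] * 6               # X uniform on 0, 1, ..., 5
coin = [Fraction(1, 2), Fraction(1, 2)]  # Y is 0 or 1 with equal probability
total = multiply(die, coin)              # PGF of X + Y
```

Here `total` sums to 1, as a PGF must, and `mean(total)` and `variance(total)` equal the sums of the corresponding values for `die` and `coin`.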

Exercise 3.61 Give a simple expression for $\mathrm{var}(X)$ in terms of $r_k = \Pr\{X \le k\}$.

Exercise 3.62 Define $\mathrm{mean}(P) \equiv P'(1)$ and $\mathrm{var}(P) \equiv P''(1) + P'(1) - P'(1)^2$. Prove that $\mathrm{mean}(PQ) = \mathrm{mean}(P) + \mathrm{mean}(Q)$ and $\mathrm{var}(PQ) = \mathrm{var}(P) + \mathrm{var}(Q)$ for any differentiable functions $P$ and $Q$ with $P(1) = Q(1) = 1$, not just PGFs.

Uniform discrete distribution. Given an integer $n > 0$, suppose that $X_n$ is a random variable that is equally likely to take on each of the integer values $0, 1, 2, \ldots, n-1$. Then the probability generating function for $X_n$ is

$$P_n(u) = \frac{1}{n} + \frac{1}{n}u + \frac{1}{n}u^2 + \cdots + \frac{1}{n}u^{n-1},$$

the expected value is

$$P_n'(1) = \frac{1}{n}\bigl(1 + 2 + \cdots + (n-1)\bigr) = \frac{n-1}{2},$$

and, since

$$P_n''(1) = \frac{1}{n}\bigl(1\cdot 2 + 2\cdot 3 + \cdots + (n-2)(n-1)\bigr) = \frac{1}{3}(n-2)(n-1),$$

the variance is

$$P_n''(1) + P_n'(1) - P_n'(1)^2 = \frac{n^2-1}{12}.$$

Exercise 3.63 Verify the above results from the closed form

$$P_n(u) = \frac{1-u^n}{n(1-u)},$$

using l’Hôpital’s rule to compute the derivatives at 1.

Exercise 3.64 Find the PGF for the random variable that counts the number of leading 0s in a random binary string, and use the PGF to find the mean and standard deviation.

Binomial distribution. Consider a random string of $N$ independent bits, where each bit is 0 with probability $p$ and 1 with probability $q = 1-p$. We can argue that the probability that exactly $k$ of the $N$ bits are 0 is

$$\binom{N}{k}p^kq^{N-k},$$

so the corresponding PGF is

$$P_N(u) = \sum_{0\le k\le N}\binom{N}{k}p^kq^{N-k}u^k = (pu+q)^N.$$

Alternatively, we could observe that the PGF for 0s in a single bit is $(pu+q)$ and the $N$ bits are independent, so the PGF for the number of 0s in the $N$ bits is $(pu+q)^N$. Now, the average number of 0s is $P'(1) = pN$ and the variance is $P''(1) + P'(1) - P'(1)^2 = pqN$, and so forth. We can make these calculations easily without ever explicitly determining individual probabilities.

One cannot expect to be so fortunate as to regularly encounter a full decomposition into independent PGFs in this way. In the binomial distribution, the count of the number of structures $2^N$ trivially factors into $N$ simple factors, and, since this quantity appears as the denominator in calculating the average, it is not surprising that the numerator decomposes as well. Conversely, if the count does not factor in this way, as for example in the case of the Catalan numbers, then we might not expect to find easy independence arguments like these. For this reason, as described in the next section, we emphasize the use of cumulative and bivariate generating functions, not PGFs, in the analysis of algorithms.

Quicksort distribution. Let $Q_N(u)$ be the PGF for the number of comparisons used by quicksort. We can apply the composition rules for PGFs to show that this function satisfies the functional equation

$$Q_N(u) = \frac{1}{N}\sum_{1\le k\le N}u^{N+1}Q_{k-1}(u)Q_{N-k}(u).$$

Though using this equation to find an explicit expression for $Q_N(u)$ appears to be quite difficult, it does provide a basis for calculation of the moments. For example, differentiating and evaluating at $u = 1$ leads directly to the standard quicksort recurrence that we addressed in §3.3. Note that the PGF corresponds to a sequence indexed by the number of comparisons; the OGF that we used to solve (1) in §3.3 is indexed by the number of elements in the file. In the next section we will see how to treat both with just one double generating function.
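For small $N$ the functional equation can simply be iterated on coefficient lists, which gives the full distribution of the number of comparisons and, through $Q_N'(1)$, the mean. The sketch below assumes $Q_0(u) = 1$ (an empty file uses no comparisons) and compares the mean with $2(N+1)(H_{N+1}-1)$, the solution of the standard recurrence $C_N = N+1+\frac{2}{N}\sum_{1\le k\le N}C_{k-1}$ with $C_0 = 0$; the function names are hypothetical.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (index = power of u)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def quicksort_pgfs(n_max):
    """Coefficient lists of Q_0(u), ..., Q_{n_max}(u); Q_0(u) = 1 is assumed."""
    pgfs = [[Fraction(1)]]
    for n in range(1, n_max + 1):
        acc = []
        for k in range(1, n + 1):
            term = poly_mul(pgfs[k - 1], pgfs[n - k])
            acc += [Fraction(0)] * (len(term) - len(acc))  # grow acc if needed
            for i, c in enumerate(term):
                acc[i] += c
        # Multiply by u^(n+1) (a shift) and divide by n (average over k).
        pgfs.append([Fraction(0)] * (n + 1) + [c / n for c in acc])
    return pgfs


def mean(pgf):
    """P'(1) for a PGF given by its coefficient list."""
    return sum(k * c for k, c in enumerate(pgf))


pgfs = quicksort_pgfs(8)
```

For instance, `pgfs[3]` represents $(u^8 + 2u^9)/3$, whose mean is $26/3$. The degree of $Q_N(u)$ grows quadratically in $N$, so this direct expansion is only practical for small files.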

Though it would seem that probability generating functions are natural tools for the average-case analysis of algorithms (and they are), we generally give this point of view less emphasis than the approach of analyzing parameters of combinatorial structures, for reasons that will become more clear in the next section. When dealing with discrete structures, the two approaches are formally related if not equivalent, but counting is more natural and allows for more flexible manipulations.

3.10 Bivariate Generating Functions. In the analysis of algorithms, we are normally interested not just in counting structures of a given size, but also in knowing values of various parameters relating to the structures.

We use bivariate generating functions for this purpose. These are functions of two variables that represent doubly indexed sequences: one index for the problem size, and one index for the value of the parameter being analyzed.

Bivariate generating functions allow us to capture both indices with just one generating function, of two variables.

Definition. Given a doubly indexed sequence $\{a_{nk}\}$, the function

$$A(z,u) = \sum_{n\ge 0}\sum_{k\ge 0}a_{nk}z^nu^k$$

is called the bivariate generating function (BGF) of the sequence. We use the notation $[z^nu^k]A(z,u)$ to refer to $a_{nk}$; $[z^n]A(z,u)$ to refer to $\sum_{k\ge 0}a_{nk}u^k$; and $[u^k]A(z,u)$ to refer to $\sum_{n\ge 0}a_{nk}z^n$.

As appropriate, a BGF may need to be made “exponential” by dividing by $n!$. Thus the exponential BGF of $\{a_{nk}\}$ is

$$A(z,u) = \sum_{n\ge 0}\sum_{k\ge 0}a_{nk}\frac{z^n}{n!}u^k.$$

Most often, we use BGFs to count parameter values in combinatorial structures as follows. For $p \in \mathcal{P}$, where $\mathcal{P}$ is a class of combinatorial structures, let $\mathrm{cost}(p)$ be a function that gives the value of some parameter defined for each structure. Then our interest is in the BGF

$$P(z,u) = \sum_{p\in\mathcal{P}}z^{|p|}u^{\mathrm{cost}(p)} = \sum_{n\ge 0}\sum_{k\ge 0}p_{nk}z^nu^k,$$

where $|p|$ denotes the size of $p$ and $p_{nk}$ is the number of structures of size $n$ and cost $k$. We also write

$$P(z,u) = \sum_{n\ge 0}p_n(u)z^n \quad\text{where}\quad p_n(u) = [z^n]P(z,u) = \sum_{k\ge 0}p_{nk}u^k$$

to separate out all the costs for the structures of size $n$, and

$$P(z,u) = \sum_{k\ge 0}q_k(z)u^k \quad\text{where}\quad q_k(z) = [u^k]P(z,u) = \sum_{n\ge 0}p_{nk}z^n$$

to separate out all the structures of cost $k$. Also, note that

$$P(z,1) = \sum_{p\in\mathcal{P}}z^{|p|} = \sum_{n\ge 0}p_n(1)z^n$$

is the ordinary generating function that enumerates $\mathcal{P}$.

Of primary interest is the fact that $p_n(u)/p_n(1)$ is the PGF for the random variable representing the cost, if all structures of size $n$ are taken as equally likely. Thus, knowing $p_n(u)$ and $p_n(1)$ allows us to compute average cost and other moments, as described in the previous section. BGFs provide a convenient framework for such computations, based on counting and analysis of cost parameters for combinatorial structures.

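Binomial distribution. Let $\mathcal{B}$ be the set of all binary strings, and consider the “cost” function for a binary string to be the number of 1 bits. In this case, $a_{nk}$ is the number of $n$-bit binary strings with $k$ 1s, so the associated BGF is

$$P(z,u) = \sum_{n\ge 0}\sum_{k\ge 0}\binom{n}{k}u^kz^n = \sum_{n\ge 0}(1+u)^nz^n = \frac{1}{1-(1+u)z}.$$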

BGF expansions. Separating out the structures of size $n$ as $[z^n]P(z,u) = p_n(u)$ is often called the “horizontal” expansion of the BGF. This comes from the natural representation of the full BGF expansion as a two-dimensional table, with powers of $u$ increasing in the horizontal direction and powers of $z$ increasing in the vertical direction. For example, the BGF for the binomial distribution may be written as follows:

z⁰(u⁰)+

z¹(u⁰+u¹)+

z²(u⁰+ 2u¹+u²)+

z³(u⁰+ 3u¹+ 3u²+u³)+

z⁴(u⁰+ 4u¹+ 6u²+ 4u³+u⁴)+

z⁵(u⁰+ 5u¹+ 10u²+ 10u³+ 5u⁴+u⁵) +. . . . .
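The rows of this table can be generated mechanically. Since each row of the binomial table is $p_n(u) = (1+u)^n$, the rows satisfy $p_n(u) = (1+u)\,p_{n-1}(u)$, and the average cost for size $n$ is $p_n'(1)/p_n(1)$. A sketch, with function names chosen only for this illustration:

```python
from fractions import Fraction


def horizontal_rows(n_max):
    """Rows p_0(u), ..., p_{n_max}(u) of the binomial BGF as coefficient lists."""
    rows = [[1]]  # p_0(u) = 1
    for _ in range(n_max):
        prev = rows[-1]
        # prev + [0] is p_{n-1}(u); [0] + prev is u * p_{n-1}(u).
        rows.append([a + b for a, b in zip(prev + [0], [0] + prev)])
    return rows


def average_cost(row):
    """p_n'(1) / p_n(1): the mean of the PGF p_n(u) / p_n(1)."""
    return Fraction(sum(k * c for k, c in enumerate(row)), sum(row))


rows = horizontal_rows(5)
```

Here `rows[5]` is the last row displayed in the table, and `average_cost(rows[n])` evaluates to $n/2$, the average number of 1 bits in a random $n$-bit string.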

Or, proceeding vertically through such a table, we can collect $[u^k]P(z,u) = q_k(z)$. For the binomial distribution, this gives

u⁰(z⁰+z¹+z²+z³+z⁴+z⁵+. . .)+

u¹(z¹+ 2z²+ 3z³+ 4z⁴+ 5z⁵+. . .)+

u²(z²+ 3z³+ 6z⁴+ 10z⁵+. . .)+

u³(z³+ 4z⁴+ 10z⁵+. . .)+

u⁴(z⁴+ 5z⁵+. . .)+

u⁵(z⁵+. . .) +. . . ,

the so-called vertical expansion of the BGF. As we will see, these alternate

From the document: AN INTRODUCTION TO THE ANALYSIS OF ALGORITHMS, Second Edition (pages 139-200)