
Contributions of This Work to the Analysis of the Greedy Heuristic's Performance

Previous results on the performance of the greedy heuristic by Cornuejols et al. [21] and Nemhauser et al. [52] do not allow its comparison to the Lin-Vitter approximation algorithm, as they do not consider a relaxation of the requirement on the desired size of the approximating set. Their analysis bounds only the ratio we denoted P_{k,k}, and not the more general P_{j,k}.

Yet another novelty of our work is the proof presented in this section. Historically, Cornuejols et al. derived a lower bound on the quality of a greedy approximation to the solution of an s-median problem, which was subsequently generalized to arbitrary concave and monotone games by Nemhauser et al. Our proof, by contrast, proceeds from the general to the specific.

2.4 Application to Memory-Based Learning

Having analyzed the performance of a greedy alternative to the approximation algorithm Lin and Vitter present for the s-median problem, this section compares the performance of the two approximation algorithms as tools in the construction of Voronoi systems that model Lipschitz functions. It begins with a review of the learning algorithm. Then it compares the size of the Voronoi system required by the original algorithm of Lin and Vitter to that required by the greedy alternative analyzed in the previous section, for the same user-specified accuracy and confidence. It concludes with a review of the proof that the Lin-Vitter algorithm indeed works (with either approximation subroutine).

2.4.1 The Learning Algorithm

Lin and Vitter [42] propose to learn classes of uniformly Lipschitz bounded functions by Voronoi systems of polynomial size with respect to the error measure

er_{P_X}(f,g) = E_X[d_Y(g(x), f(x))] = \int_X d_Y(g(x), f(x)) \, dP_X.

Let Q_{P_X}(X, \epsilon, d_X) denote the quantization number, defined to be the smallest integer s such that there exists a Voronoi encoder u of size s that satisfies E[d_X(x, u(x))] \le \epsilon. The algorithm draws

m = O\!\left( \frac{s \,\dim X \,\mathrm{diam}\,Y}{\epsilon} \log s \, \log\frac{\mathrm{diam}\,Y}{\epsilon} + \frac{\mathrm{diam}\,Y}{\epsilon} \log\frac{1}{\delta} \right) \qquad (2.3)

examples, where s = Q_{P_X}(X, \epsilon/4K, d_X). It runs an s-median approximation algorithm on the sample that was drawn. The resulting median set is used to build a Voronoi system, which is output by the algorithm.
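The overall scheme — draw a sample, run an s-median approximation on it, and answer queries with the value stored at the nearest median — can be sketched as follows. This is a minimal illustration under our own naming, not the authors' implementation; in particular `median_subroutine` stands in for either of the two approximation algorithms compared below.

```python
import numpy as np

def learn_voronoi_system(draw_example, s, m, median_subroutine):
    """Sketch of the learning scheme: draw m examples, run an s-median
    approximation on the sample, and answer queries with the f-value
    stored at the nearest chosen median (Voronoi decoding)."""
    sample = [draw_example() for _ in range(m)]          # pairs (x, f(x))
    xs = np.array([x for x, _ in sample])
    medians = median_subroutine(xs, s)                   # indices into xs
    centres = xs[medians]
    labels = [sample[i][1] for i in medians]             # stored f-values

    def h(x):
        # nearest-centre prediction
        i = int(np.argmin(np.linalg.norm(centres - x, axis=1)))
        return labels[i]

    return h
```

The returned function h is piecewise constant on the Voronoi cells of the chosen medians, which is exactly what the error analysis of Section 2.4.3 bounds.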

Section 2.4.3 reviews the proof that for any given \epsilon, \delta and any target function f in the class, the algorithm outputs a Voronoi system which implements a function h for which, with confidence at least 1 - \delta,

er_{P_X}(f, h) \le \epsilon.

2.4.2 Comparing the Two s-Median Approximation Subroutines

The size of a Voronoi system produced by the Lin-Vitter approximation algorithm is

O\!\left( \frac{s K \,\mathrm{diam}\,Y}{\epsilon} \log m \right). \qquad (2.4)

If a priori information on the distribution of the input points is available, a lower bound d \le \tilde{D} on \tilde{D} may hold almost everywhere, that is, for all of the space except, possibly, a set of measure zero. For example, for m input points drawn from the uniform distribution on a region of area A in the plane, with probability one the value of a solution to the s-median problem is lower bounded by \alpha (m - s) \sqrt{A/s}, for some constant \alpha, as shown by Fisher and Hochbaum [26]. Then the vanilla greedy algorithm may be used to produce a system of size O\!\left( s \log \frac{K \,\mathrm{diam}\,Y}{\epsilon \, d} \right).
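The vanilla greedy heuristic referred to above can be sketched as follows: starting from the empty set, it repeatedly adds the candidate median that gives the largest reduction in total assignment cost. A minimal illustration under assumptions of ours — candidates are restricted to the input points themselves, and the function name is hypothetical.

```python
import numpy as np

def greedy_s_median(points, s):
    """Greedy heuristic for the s-median problem: repeatedly add the
    candidate median that yields the largest drop in the total distance
    from every point to its nearest chosen median.  Returns the indices
    of the chosen medians."""
    pts = np.asarray(points, dtype=float)
    m = len(pts)
    # pairwise distance matrix, m x m
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    chosen = []
    best = np.full(m, np.inf)     # distance to nearest chosen median so far
    for _ in range(s):
        # cost after adding candidate j: sum_i min(best[i], dist[i, j])
        costs = np.minimum(best[:, None], dist).sum(axis=0)
        j = int(np.argmin(costs))
        chosen.append(j)
        best = np.minimum(best, dist[:, j])
    return chosen
```

The O(m^2) distance matrix keeps the sketch short; the bounds discussed in this section concern the quality of the selected set, not this particular implementation.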

Since a confidence parameter is inherent in the evaluation of the performance of PAC learning systems, the following simpler analysis suffices for a better comparison of the two approximation schemes in the context of learning. The value of \tilde{D} is lower bounded by the distance between the two nearest points in the sample. For an ordered set of m points drawn independently with respect to P_X, let N_X^m denote the distance between the first point and its nearest neighbour in the set

N_X^m = \min \{ d_X(x_1, x_i) : i = 2, 3, \ldots, m \}.

Consider, for example, the case of a region X \subseteq \Re^n where d_X is defined to be an l_p norm |x|_p = (\sum_{i=1}^n |x_i|^p)^{1/p}, and P_X a bounded density such that P_X < \bar{P}. Then, for any \gamma > 0,

\Pr\{ N_X^m < \gamma \} \le \bar{P} \, m \, (2\gamma)^n.
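This tail bound can be sanity-checked by simulation. The sketch below (ours, not part of the original analysis) uses the uniform density on the unit square, so \bar{P} = 1 and n = 2, and estimates the left-hand side by Monte Carlo.

```python
import numpy as np

# Monte Carlo check of Pr{N_X^m < gamma} <= Pbar * m * (2*gamma)^n
# for the uniform density on the unit square (Pbar = 1, n = 2).
rng = np.random.default_rng(0)
m, gamma, trials = 20, 0.02, 20000

hits = 0
for _ in range(trials):
    pts = rng.random((m, 2))
    # distance from the first point to its nearest neighbour in the sample
    nxm = np.min(np.linalg.norm(pts[1:] - pts[0], axis=1))
    hits += nxm < gamma

empirical = hits / trials
bound = m * (2 * gamma) ** 2      # Pbar = 1, n = 2
assert empirical <= bound
```

The empirical frequency falls below the bound, as expected, since the bound over-counts by replacing the l_2 ball with the enclosing cube.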

Now

\Pr\{ \tilde{D} < \gamma \} \le m \Pr\{ \tilde{D} < \gamma \ \wedge\ x_1 \in \arg\min_{x_i \neq x_j} d_X(x_i, x_j) \} \le m \Pr\{ N_X^m < \gamma \} \le \bar{P} \, m^2 \, (2\gamma)^n.

Thus, for a given \delta > 0, with probability at least 1 - \delta,

\tilde{D} \ge \frac{1}{2} \left( \frac{\delta}{\bar{P} m^2} \right)^{1/n}.

Since any two norms |\cdot|_1, |\cdot|_2 on \Re^n are equivalent, that is, a|x|_1 \le |x|_2 \le b|x|_1 for some positive constants a, b [45], for any norm on \Re^n:

\tilde{D} \ge C \left( \frac{\delta}{\bar{P} m^2} \right)^{1/n},

for some C > 0. For distribution-metric pairs \langle P_X, d_X \rangle for which the bound

\tilde{D} = \Omega\!\left( \left( \frac{\delta}{m^k} \right)^{1/n} \right) \qquad (2.5)

holds with confidence 1 - \delta, that is, for all but a \delta share of \langle P_X^m, d_X^m \rangle, a greedily chosen

memory-based learning system of size

O\!\left( s \left( \log \frac{K \,\mathrm{diam}\,Y}{\epsilon} + \frac{1}{\dim X} \log \frac{m^k}{\delta} \right) \right) \qquad (2.6)

can meet prespecified accuracy and confidence bounds given by parameters of \epsilon and 1 - \delta. To achieve this we choose m, as specified in (2.3), for a confidence parameter of 1 - \delta/2.

We also choose the size of the greedily selected approximating set specified in (2.6) with respect to confidence 1 - \delta/2. The asymptotic size of the memory-based learning system is then given by (2.6). This is a smaller system than that produced by the algorithm proposed by Lin and Vitter, the size of which is given by (2.4).

2.4.3 How to Prove That It Works

This section gives an outline of Lin and Vitter's correctness proof for the learning algorithm described in Section 2.4.1.

First we quote two definitions after Haussler [34, 35].

For r \in \Re let sign(r) = 1 iff r > 0, and zero otherwise.

Definition 2.4.1

For A \subseteq \Re^m, say A is full if there exists an x \in \Re^m such that the set of sign vectors of the following sums is of the maximum size possible:

|\{ \langle sign(x_i + y_i) \rangle_{i=1}^m : y \in A \}| = 2^m.
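As a concrete illustration of this definition (our example, not from the text): with the shift x = 0, the set A = \{-1, +1\}^m realizes all 2^m sign patterns and is therefore full.

```python
from itertools import product

def sign(r):
    # sign(r) = 1 iff r > 0, and zero otherwise, as defined above
    return 1 if r > 0 else 0

def is_full(A, x):
    """Does the shift x witness that A is full, i.e. do the sign vectors
    of x + y, over y in A, take all 2^m possible values?"""
    m = len(x)
    patterns = {tuple(sign(xi + yi) for xi, yi in zip(x, y)) for y in A}
    return len(patterns) == 2 ** m

m = 3
A = list(product([-1.0, 1.0], repeat=m))   # A = {-1, +1}^m
assert is_full(A, [0.0] * m)               # x = 0 realizes every pattern
```

By contrast, a set whose coordinates always move together (e.g. \{(-1, -1), (1, 1)\}) realizes only two patterns and is not full.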

Definition 2.4.2

Let F be a class of functions from a set X into \Re. For any sequence X = (x_1, \ldots, x_m) of points in X, let F(X) = \{ (f(x_1), \ldots, f(x_m)) : f \in F \}. If F(X) is full we say that X is shattered by F. The pseudo-dimension of F, denoted by \dim_P F, is the largest m such that there exists a sequence of m points in X that is shattered by F. If arbitrarily long sequences are shattered, then \dim_P F is infinite.

If F is a class of \{0, 1\}-valued functions then the definition of the pseudo-dimension is the same as that of the VC dimension. Haussler and Long [33] showed an upper bound on the sample complexity required to guarantee the uniform convergence, with confidence 1 - \delta, of the empirical estimates of a given family of functions with a bounded pseudo-dimension.

Lin and Vitter show that the pseudo-dimension of Voronoi encoders of size at most s is O(\dim X \cdot s \log s). Note that an \epsilon/K-good Voronoi encoder guarantees an \epsilon-good Voronoi system, by the Lipschitz condition.

Choosing s = Q_{P_X}(X, \epsilon/4K, d_X), they ensure that there exists an \epsilon/4K-good Voronoi encoder of size s. Then, by drawing a sample of the size required by Haussler and Long, they guarantee that with high confidence the empirically best Voronoi encoder of size s is \epsilon/2K accurate. Hence a solution to the s-median problem would produce an \epsilon/2-good Voronoi system. Since such a solution is generally NP-hard to find, they output an approximation that yields an \epsilon-good system.
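The Lipschitz step in this argument — an encoder with small average quantization error yields a system with at most K times that prediction error — can be observed numerically. The sketch below uses a hypothetical K-Lipschitz function on [0, 1] and a fixed encoder of our choosing; it illustrates only the inequality, not the full proof.

```python
import numpy as np

# If the Voronoi encoder u satisfies E[d_X(x, u(x))] <= eps/K and f is
# K-Lipschitz, the system g(x) = f(u(x)) satisfies E[d_Y(g(x), f(x))] <= eps.
rng = np.random.default_rng(1)
K = 3.0
f = lambda x: K * np.sin(x)              # a K-Lipschitz function (hypothetical)

xs = rng.random(10000)                   # sample from P_X, uniform on [0, 1]
centres = np.linspace(0.05, 0.95, 10)    # a fixed Voronoi encoder of size 10
u = centres[np.argmin(np.abs(xs[:, None] - centres), axis=1)]

encoder_err = np.abs(xs - u).mean()          # empirical E[d_X(x, u(x))]
system_err = np.abs(f(xs) - f(u)).mean()     # empirical E[d_Y(f(u(x)), f(x))]
assert system_err <= K * encoder_err + 1e-12
```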

2.5 Conclusion

One of the fundamental problems of AI is filtering out redundant information. Operations researchers have investigated this problem as modeled by a Coalitional Game. In this model a sufficient condition was found for the existence of a uniform bound on the performance of the greedy approximation heuristic. The same condition on the game, monotonicity and concavity, implies a uniform bound even when approximate rather than precise values of coalitions are known. An s-median problem can be mapped to a game satisfying the condition. We use this to derive bounds on the quality of a greedy approximate solution to the s-median problem. We argue that in the context of memory-based learning of Lipschitz functions the greedy approximation algorithm is an attractive alternative to the approximation technique proposed by Lin and Vitter [42].

Further exploration of the greedy heuristic, as well as of other simple data processing techniques, may contribute, we conjecture, to a better understanding of conscious intelligence.


Chapter 3