The Index Functions in Practice - The Basis of Exploratory Projection Pursuit

Exploratory Projection Pursuit

4.2 The Basis of Exploratory Projection Pursuit

4.2.3 The Index Functions in Practice

For our explanations we will use two three dimensional artificial datasets. The first dataset (CYL-data) contains 100 random points located on the surface of a cylinder with both length and radius equal 1.

The second dataset (TET-data) consists of 250 points located on the sup-porting hyperplanes of a tetrahedron given by its facets. Three hyperplanes contain 50 normally distributed random points each while the fourth hyper-plane accommodates 100 normally distributed random points.

In case of the CYL-data we find a situation with one striking two dimensional projection while in case of the TET-data there are six interesting views de-termined by the pairs of the supporting hyperplanes. The latter dataset will be used to demonstrate interesting effects concerning the choice of band-widths and polynomial orders. The distinguished projections of the datasets are displayed in Figure (4.4) and Figure (4.5). The lower row of Figure (4.5) contains the projection determined by the intersection of the basis plane with one of the others. The upper row contains projections determined by pairs of the first three planes, i.e. the structure displayed contains 100 points in case of the upper row and 150 in case of the lower.

For every two dimensional projection of a three dimensional data set, the projection plane can be identified uniquely by the normal vector. The normal vector can be expressed in terms of two angles If' and f). The interesting projections correspond to the following values of If' and f):

If'

=

^-1.27,^f)

=

^0.55

If'

=

^-0.79,^f)

=

^-1.35 ^If'If' ⁼

=

^-1.2,0.53, f) ^f)

=

⁼-0.49 ^-0.13

If' = -0.56, f) = 0.22 If' = 0.77, f) = 0.53 The uniform kernel leads to noncontinuous piecewise constant indices while triangle and epanechnikov kernels provide continuity but no differentiability.

Because of the maximization of the index function involved in the projection pursuit approach the uniform kernel is not helpful although being easy to compute, while triangle and epanechnikov kernels do not provide the wanted qualitative properties of the indices.

We restricted ourselves to the triweight kernel which secures the existence of second derivatives of the indices and which is much easier to compute than the alternative gaussian kernel. The kernel has limited support, a circle with a radius equal to the selected bandwidth.

The essential question in exploratory projection pursuit seems to select an appropriate bandwidth. For a small bandwidth we would expect the index function to have numerous local maxima corresponding to small clusters of observations in the projection plane.

110 Exploratory Projection Pursuit

FIGURE 4.4. Optimal projection of CYL-data

. .

FIGURE 4.5. Optimal projection of TET-data

If the bandwidth increases the number of maxima will decrease as the local maxima will melt together. Further increase of the bandwidth will lead the index function to show just a few, maybe only one local maximum.

Finding one global maximum is a complicated task in case of many local

Exploratory Projection Pursuit 111 maxima. In exploratory projection pursuit the aim usually is not to pick up the global maximum but to identify a set of distinct local maxima. Projections corresponding to a local maximum provide an "interesting" view to the data, too.

Silverman (1986) gives the following formulas for an optimal bandwidth min-imizing the asymptotic mean integrated squared error (AMISE). If the kernel is a radially symmetric probability density function and the unknown density is bounded and has continuous second derivatives, the AMISE for a given h results in

~

}(2(t)dt ²

AMISE

⁼ ^IRdnhd

+ ~h4 (fIR

t~}(t)dt») (f~

tr(V2/)2(t)dt) which leads to

htf:,4 = dn-¹

(f ^~

^}(2(t)dt)

(fIR

^t~

^}(t)dt»)^-2

(fIR

^dtr(\72 f)2(t)dt) -1 ,

(see Scott (1992),6.48) where d stands for the dimensionality of the projected data. Analog to the "rule-of-thumb" we can plug in the multivariate normal density instead of

I,

which simplifies the last term to

f~

tr(V2¢)2(t)dt = (41r)d/2(d/2

+

d2/4).

Plugging-in the data of our example, the "rule-of-thumb" reference band-width becomes:

h~"tt4

⁼

~R(}() (fIR

t~}(t)dt)

^-2^(41r)-d/2

d(d~2)'

^(4.4)

and plugging in the triweight kernel

hrot

=

3.12n-1/ 6

which leads to a "rule-of-thumb" bandwidth hrot ~ 1.45 in case of the CYL-data and to hrot ~ 1.24 in case of the tetrahedron data.

The polynomial based indices under consideration measure differences be-tween the underlying density of the projected data and a standard normal density. In order to detect differences which are not caused by location and covariance structure the data should be standardized in a first step.

Even small deviations from a standardized situation lead to strong changes of the index functions. The effect can be regarded as a mapping of information by location and covariance effects.

112 Exploratory Projection Pursuit

The problem of bandwidth selection for kernel indices relates with the prob-lem of specification of an appropriate order J of the orthogonal series approx-imations in case of the polynomial indices. Selection of a small order usually corresponds to a consideration of global effects, while a high order will allow a more local modeling of the structure contained in the data. The situation is similar to case of kernel indices in that sense that a high order J will cause numerous local maxima of the index function while a small J will correspond to only a few but dim maxima.

A very unpleasant effect occurs when using the Legendre index which turns out to be not invariant with respect to rotation inside the projection plane.

This effect is avoided by use of rotation symmetric kernels in case of kernel indices.

Figures 4.6 - 4.10 illustrate the behaviour ofthe index functions for the CYt-Data. The figures show contour plots of the index functions computed on a net of 53 x 53 points for ('P, 19). In case of the polynomial indices the maximum was used with respect to rotation inside the projection plane. The levels used correspond to the 0.5,0.67,0.8,0.9,0.95,0.975,0.99 and 1-quantiles of index values on the net. Figures 4.11 - 4.15 show analogous contour plots for the tetrahedron data.

Figures 4.6 and 4.7 illustrate the expected behaviour of the kernel based indices. Small bandwidths h lead to a huge number of local maxima corre-sponding to occasional clusters of observations in meaningless projections.

Because of the outstanding structure most of the local maxima are small compared with the global maximum but nevertheless they cause tremendous problems in numerical maximization of the index functions.

Bandwidths in the magnitude of the "rule-of-thumb" bandwidth behave well in this example while large bandwidths lead to oversmoothing effects blurring the structure for h = 1.6 and hiding the structure completely for h = 3.2.

Figures 4.8, 4.9 and 4.10 show the corresponding plots for the polynomial based indices. The Legendre index and, to some extent, the Hermite index show evident problems when handling even this clear structure in case of a small order J.

The Natural-Hermite index behaves much better in this situation as expected in Cook et al. (1993). Even for larger values of J the estimated criteria are sufficiently smooth providing only a small number of local minima.

The situation is much more complicated for the second dataset. Figures 4.11 and 4.12 show contourplots of the Friedman-Tukey and the entropy index.

The projections shown in Figure 4.5 are marked by a star. In case of the smallest bandwidth h = 0.1 we get local maxima corresponding to all inter-esting projections, although for this bandwidth there are 141 local maxima on the net in case of the Friedman-Tukey index and 93 local maxima in case

Exploratory Projection Pursuit 113 of the Entropy index.

Increasing the bandwidth leads to smearing effects still presenting inter-esting projections of the basis plane though melting together the maxima corresponding to the wanted projections. The "rule-of-thumb" bandwidth is clearly too large to show the underlying structure of a tetrahedron. The maxima corresponding to the projections in the upper row of Figure 4.5 are smoothed away resulting in a local maximum without much information while the basis plane still can be identified. The larger bandwidths h = 1.6 and h = 3.2 lead to strong oversmoothing effects hiding the structure com-pletely.

The corresponding results for the polynomial indices are displayed in Figure 4.13, 4.14 and 4.15. In case of small order J ~ 3 all indices fail completely to find the structure. For J

>

3 the basis plane is identified. In case of the Legendre index J

=

10 gives sufficient information about the structure while the Hermite and Natural-Hermite index require a further increase of J. It can be observed that for a medium J a sufficient number of local maxima exists but global information considered in the polynomial approximation leads to a smoothing sufficient enough to hide the structure. A substantial increase of the order J of the approximation leads to longer computing times than needed as in case of the kernel based indices.

The optimal bandwidth h or order of approximation J seems to depend strongly on the underlying structure of the data. We would suggest to use smaller bandwidths and larger values of J rather than "optimal" bandwidths or orders because of hiding effects in case of complicated structures. Using a small h or large J will lead to a considerable amount of local maxima especially in case of the kernel based indices. This will cause a lot of numer-ical complications, requiring stochastic or deterministic search algorithms combined with a numerical maximization in order to improve accuracy. For simple structures it makes sense to follow an idea of Hall (1989a) to start with larger bandwidths or lower order of approximation in a initial step to identify the maxima and then to decrease the bandwidth or to increase the order for a numerical maximization step, although some interesting projec-tions might be lost in case of complicated structures. Usually it is interesting to determine all local maxima exceeding a certain level of the criteria. The projections corresponding to these maxima have to be inspected visually to evaluate the information contained. This indicates that a stochastic search algorithm concentrating on areas of this high index values could be used.

4.2.4

The "MSE-FT"-optimal Bandwidth

As another choice for a reference bandwidth we can use a bandwidth which is computed from the minimization of the approximated mean integrated

114 Exploratory Projection Pursuit

..

0 • ....

• =

~ _H _'8

"

"f ^H ^:i

I I

~ ~

,l( If! _I If! _I

J I

... .. ..

.. .. ..

N ID

0 _•

...

_•

= =

i

"f ^H ^"lE

I I

I

If! _I _I

I I

... .. ..

.. .. ..

S'O- 0'1-

S'1-;:;

0 • 0 •

= =

i

_0\ i

~f ^H "f

I I> .. I

~ If! _I .'~~

. ), ... J ..

^,l(If! ^~^I

I ^... ^.. ^.. ^.

^Oil

rJJ ^.. ^I

^.. ^.. ^..

FIGURE 4.6. CYL-Data Friedman-Tukey index for different bandwidths

Exploratory Projection Pursuit 115

FIGURE 4.7. CYL-Data entropy index for different bandwidths

116 Exploratory Projection Pursuit

FIGURE 4.8. CYL-Data Legendre index for different order J

M •

Exploratory Projection Pursuit 117

.... o _~•

FIGURE 4.9. CYL-Data Hermite index for different order J

118 Exploratory Projection Pursuit .,-FIGURE 4.10. CYL-Data Natural-Hermite index for different order J

Exploratory Projection Pursuit 119

..

^;;;

0 .;

" "

5 5

i

.... _:f ^~.... ^><

I I

~

...

i'!

~

I I

J J

"

... ..

_... .

.. ..

_...

n ,., ,.,

FIGURE 4.11. TET-Data Friedman-Tukey index for different band-widths

120 Exploratory Projection Pursuit

.. .. ~

...

e ^,;

j

I

^:i

fA

I I

f

^~^If

.. ..

Ii Ii

...

_'"' _'"' _" _'"'

..

"t 01 S'O 0" ,'0- o't-

'Ii'I-FIGURE 4.12. TET-Data entropy index for different bandwidths

Exploratory Projection Pursuit 121

FIGURE 4.13. TET-Data Legendre index for different order J

122 Exploratory Projection Pursuit

5'0

--

"0 5'0- 0"-

5'1-S', 0'1 5'0 O'D 5'0- 0"-

S',-51 0'1 S'O 00 S'O- 0"- " '

-I

~

so 0" 51'0- 0'1-

S',-FIGURE 4.14. TET-Data Hermite index for different order J

Exploratory Projection Pursuit 123

N ;;;

" ^H

~ ~

II II

~ ... _I 'B ... ^II_I

.E

^II a

.,

·S ·S

:I!

~

^:.:^"

...

_II

.. ...

_II

..

" "

., .,

:: ::

n '" '" ^,., ^~'O' ^0"1- ^'i-t- ,', '" ^,., '" ^S'O- ^0-1-

S-l-;::;

©I

^:;

" ^H

~ ~

II II

~ ^c ~

...

_I ... _I

II .2 ^II

.E

., .,

... e ·S

~

:I! ^:.:

"

...

_II

.. ...

_II

.. ~

" "

., .,

:: ::

'" '" '" '" ^S'o- ,', _'" '" '"

S-D-FIGURE 4.15. TET-Data Natural-Hermite index for different order J

124 Exploratory Projection Pursuit

squared error. We will see that we can derive a smaller reference bandwidth.

It holds for the Friedman-Tukey index

M SE(iFT ) =

+ +

+

C3h2

+

O(h4, h⁴n-1 , h⁴n- 2, ... ) with constants Co

=

^0(n-^{2 ),}C1

=

^0(n-^{2 ),}C2

=

^{0(1) and C}³

=

0(1) which can be found in Klinke & Cook (1995). To get an estimate for the "MSE-FT"-optimal bandwidth we drop the 0(h⁴)-term, and to find the minimum for the approximated MSE (AMSE) we compute the derivative

dAM SE(iFT) _ -4Co -2C1 20 h - 0

d h - h⁵

+

-and multiply with h⁵

By replacing h²by 9 we get a reduced polynomial of degree 3, which can be solved by the formula of Cardano (Gellert, Kiistner, Hellwich & Kastner 1977). The discriminant is always positive and converges to zero (n -+ 00).

The positive discriminant allows only one unique solution of the equation.

We can compute a minimum by plugging in the triweight kernel and the standard normal gaussian density for the unknown density. As result we get Table 4.4.

The reference bandwidth for the CYL-data is hm6e ~ 0.83 and for the TET-data hm • e ~ 0.61. As we see in the Figures 4.6, 4.7, 4.11 and 4.12 the "MSE-FT" -optimal bandwidth is still to large.

In principle the same could be done with the entropy index. But the compara-ble calculations would be much more complicated. We would have to replace the log-function by its Taylor-expansion. Obviously a linear approximation is not sufficient if n (or h) varies. This can be seen by the following inequality

Exploratory Projection Pursuit 125

The projection pursuit indices are involving, as a main task, the estimation of a density. With the kernel based indices this will be done with kernel density estimates.

One technique to improve the computational speed of kernel density estimates is binning. If the dimension of the data grows, however, this technique looses more and more of its advantage.

Another idea is to replace float-operations in the density estimation by integer-operations, which are faster. This can be seen in the comparison of Table 4.5 and Table 4.6.

Operation

16 bit integer - addition 16 bit integer - subtraction 16 bit integer - multiplication 16 bit integer - division

TABLE 4.5. Execution time in cycles for different mathematical opera-tions in the 80386

The computational speed of the kernels in Table 4.3 differ a lot. For all the

Dans le document Contributions Statistics (Page 115-131)

The Index Functions in Practice

Exploratory Projection Pursuit

4.2 The Basis of Exploratory Projection Pursuit

4.2.3 The Index Functions in Practice

=

=

=

=

=

=

. .

~

AMISE

+ ~h4 (fIR

t~}(t)dt») (f~

(f ~

(fIR

t~

(fIR

I,

f~

+

h~"tt4

~R(}() (fIR

t~}(t)dt)

d(d~2)'

=

>

=

4.2.4

..

•

=

=

"

J I

... .. ..

.. .. ..

...

= =

i

i

I

I I

... .. ..

.. .. ..

= =

i

0\ i

. ), ... J ..

I ... .. .. .

rJJ .. I

.. .. ..

..

i

~

...

~

J J

"

... ..

.. ..

.. .. ~

...

e ,;

j

I

fA

f

.. ..

...

..

--

-I

~

.E

.,

·S ·S

~

...

(f ^~

^t~

_0\ i

I ^... ^.. ^.. ^.

rJJ ^.. ^I

^.. ^.. ^..

e ^,;

+ +