
Exploratory Projection Pursuit

4.1 Motivation and History

4.1.3 Grand Tour

The grand tour was developed by Asimov (1985). Multivariate data are represented by showing a sequence of bivariate projections of the data. The Cramér-Wold theorem is the basis for this method. Asimov proposed the following important properties for the sequence of projections:

• The sequence of projections should become dense in the space of all projections. G2,p (the "Grassmannian manifold") stands for the space of all unoriented planes in p-dimensional space.

• The sequence of projections should become dense rapidly in G2,p.

• The sequence of projections should become dense uniformly in G2,p.

• The sequence of projections should be continuous, which means that the planes before and after the current projection should be close.

• The sequence of projections should incorporate a degree of flexibility to optimize the goals mentioned above.

• The sequence of projections should be easily reconstructible.

To fulfill these goals Asimov described three methods to choose a path through G2,p.

Torus method

A curve is wound through the torus of rotation angles, and the final rotation matrix is composed as R(t) = R_1(t) ··· R_{p(p-1)/2}(t), where each R_i(t) is a rotation in one coordinate plane. If the step size is small we achieve continuity. Of course every projection is reconstructible.
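A minimal sketch of the torus method in Python (an illustration, not the book's implementation; composing Givens rotations with incommensurate frequencies, here square roots of primes, is one common way to make the path dense):

```python
import numpy as np

def givens(p, i, j, theta):
    """Rotation by theta in the (i, j) coordinate plane of R^p."""
    R = np.eye(p)
    c, s = np.cos(theta), np.sin(theta)
    R[i, i] = c; R[j, j] = c
    R[i, j] = -s; R[j, i] = s
    return R

def torus_projection(t, p, freqs):
    """Compose the p(p-1)/2 plane rotations R_1(t)...R_{p(p-1)/2}(t)
    and return the plane spanned by (R(t)e1, R(t)e2) as a 2 x p matrix."""
    R = np.eye(p)
    k = 0
    for i in range(p):
        for j in range(i + 1, p):
            R = R @ givens(p, i, j, freqs[k] * t)
            k += 1
    return R[:, :2].T

p = 4
rng = np.random.default_rng(0)
# incommensurate rotation speeds (an assumption for this sketch)
freqs = np.sqrt([2, 3, 5, 7, 11, 13])[: p * (p - 1) // 2]
X = rng.normal(size=(100, p))            # some data in R^4
P = torus_projection(t=0.5, p=p, freqs=freqs)
Y = X @ P.T                              # bivariate view at "time" t
```

Stepping `t` in small increments yields the continuous sequence of projections described above.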

At-random method

Generate two random vectors and orthonormalize them with the Gram-Schmidt method. This can be done in such a way that the sequence of projections is uniformly distributed in G2,p. The sequence of projections will become dense uniformly and rapidly, but will not be continuous.

Since we use random seeds to initialize the random generator, we can also reconstruct a single projection.
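The at-random method can be sketched as follows (standard normal vectors plus Gram-Schmidt; the rotation invariance of the normal distribution makes the resulting plane uniform on G2,p):

```python
import numpy as np

def random_plane(p, rng):
    """Draw a uniformly distributed 2-plane in R^p: orthonormalize
    two standard normal vectors by Gram-Schmidt."""
    v1, v2 = rng.normal(size=p), rng.normal(size=p)
    u1 = v1 / np.linalg.norm(v1)
    w = v2 - (v2 @ u1) * u1          # remove the component along u1
    u2 = w / np.linalg.norm(w)
    return np.vstack([u1, u2])       # 2 x p matrix with orthonormal rows

# a fixed seed makes every projection in the sequence reconstructible
rng = np.random.default_rng(seed=42)
P = random_plane(5, rng)
```

Rerunning with the same seed reproduces the same sequence of planes, which is exactly the reconstructibility property mentioned in the text.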

Buja, Asimov & Hurley (1989) have shown a way to obtain continuity.

They constructed a subspace interpolation between two projections in p-space. The idea is to construct a rotation between the two planes in a four-dimensional space, given by the projection planes.

This is the preferred method for implementing the grand tour in statistical software.

At-random walk

Asimov also proposed a mixture of these two methods, as follows:

• Choose a measure μ on all rotations of the p-dimensional coordinate system such that it generates a dense subset in the space of rotations.

• Start with R(0) = I_p.

• Generate a rotation Θ_t according to the law μ.

• Compute R(t) as Θ_t R(t-1) and generate the projection plane as (R(t)e_1, R(t)e_2).

He described two measures that fulfill these conditions.
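A minimal sketch of such a random walk (a rotation by a small random angle in a random coordinate plane serves here as an illustrative stand-in; Asimov's actual measures differ):

```python
import numpy as np

def random_small_rotation(p, max_angle, rng):
    """A rotation by a small random angle in a random coordinate plane
    (a simple stand-in for a law generating a dense set of rotations)."""
    i, j = rng.choice(p, size=2, replace=False)
    theta = rng.uniform(-max_angle, max_angle)
    R = np.eye(p)
    c, s = np.cos(theta), np.sin(theta)
    R[i, i] = c; R[j, j] = c; R[i, j] = -s; R[j, i] = s
    return R

p = 4
rng = np.random.default_rng(1)
R = np.eye(p)                                    # R(0) = I_p
planes = []
for t in range(10):
    R = random_small_rotation(p, 0.1, rng) @ R   # R(t) = Theta_t R(t-1)
    planes.append(R[:, :2].T)                    # plane (R(t)e1, R(t)e2)
```

Because each step rotates by at most 0.1 radians, consecutive planes stay close, giving the continuity that the plain at-random method lacks.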

The problem of the grand tour is that we have to review many planes to find any structure. Huber (1985) used the RANDU dataset to show that a rotation by five degrees can hide the structure. The RANDU dataset was generated by a random generator proposed by IBM in the early seventies. Like any linear congruential random number generator, it has the property that p-dimensional data generated from it lie on hyperplanes. But a bad choice of the constants for computing the random numbers had been made, so that exactly 15 hyperplanes are generated for three-dimensional data (see Figure 2.17). Obviously this is not a good random generator.
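The RANDU recursion and the lattice identity behind its 15 hyperplanes can be checked directly (the constants, multiplier 65539 and modulus 2^31, are the historical ones):

```python
# RANDU: x_{n+1} = 65539 * x_n mod 2^31
def randu(seed, n):
    x, out = seed, []
    for _ in range(n):
        x = (65539 * x) % 2**31
        out.append(x / 2**31)        # scale to [0, 1)
    return out

u = randu(seed=1, n=3000)
# Since 65539^2 = 6 * 65539 - 9 (mod 2^31), consecutive triples satisfy
# 9*u_k - 6*u_{k+1} + u_{k+2} = integer, i.e. they fall on at most
# 15 parallel hyperplanes in the unit cube.
vals = [9 * u[k] - 6 * u[k + 1] + u[k + 2] for k in range(len(u) - 2)]
```

Plotting the triples (u_k, u_{k+1}, u_{k+2}) and rotating the cube appropriately makes the 15 planes visible, which is exactly the structure a five-degree rotation can hide.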

Table 4.1 shows how many planes we have to examine if the distance between two planes is at most five degrees. If we wanted to inspect all planes of a six-dimensional dataset to find a structure, watching every plane for just one second, we would already need 23 days of uninterrupted watching of the screen.

As a consequence we need other methods to pick out the interesting projections; this leads to exploratory projection pursuit.

For a detailed overview of grand tours see Buja, Cook, Asimov & Hurley (1996).


Dimension    No. of planes
    3                  263
    4                51684
    5             ~ 9x10^6
    6             ~ 2x10^9
    7             ~ 2x10^11
    8             ~ 4x10^13
    9             ~ 6x10^15
   10             ~ 8x10^17
   12             ~ 2x10^22
   14             ~ 4x10^26
   16             ~ 7x10^31
   20             ~ 3x10^39

TABLE 4.1. Number of planes we have to look at if the distance between two planes is at most five degrees.

4.1.4 Multidimensional Scaling

The aim of multidimensional scaling (MDS) is to find a low-dimensional (d = 1, 2, 3) space in which the distances d_rs between the objects r and s match the original dissimilarities δ_rs of a higher-dimensional configuration space as closely as possible. For an analysis see Cox & Cox (1994).

Several MDS models will be examined: classical (metric) scaling, least squares scaling, and nonmetric scaling.

In metric scaling the dissimilarities δ_rs are taken directly as Euclidean distances. A simple algorithm is given by

1. Compute A = (a_rs) = (-0.5 δ_rs²).

2. Compute B = (a_rs - a_r. - a_.s + a_..), with a_r. the row means, a_.s the column means, and a_.. the overall mean of A.

3. Find the eigenvalues λ_i and the eigenvectors v_i of B and normalize so that v_i' v_i = λ_i.

4. Take the first eigenvectors corresponding to the largest eigenvalues and show them in a plot.
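The steps above can be sketched in Python (a sketch, not the book's implementation; numpy's symmetric eigensolver stands in for step 3, and double centering with the centering matrix J is equivalent to subtracting row, column, and overall means):

```python
import numpy as np

def classical_mds(delta, d=2):
    """Classical (metric) scaling of a dissimilarity matrix delta."""
    n = delta.shape[0]
    A = -0.5 * delta**2                    # step 1: A = (-1/2 delta_rs^2)
    J = np.eye(n) - np.ones((n, n)) / n
    B = J @ A @ J                          # step 2: double centering
    lam, V = np.linalg.eigh(B)             # step 3: eigendecomposition
    order = np.argsort(lam)[::-1]          # largest eigenvalues first
    lam, V = lam[order], V[:, order]
    X = V[:, :d] * np.sqrt(lam[:d])        # scale so v_i' v_i = lambda_i
    return X, lam

# usage: recover a 2-D configuration from its Euclidean distances
rng = np.random.default_rng(0)
Y = rng.normal(size=(20, 2))
delta = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
X, lam = classical_mds(delta, d=2)
explained = lam[:2].sum() / lam.sum()      # proportion of variance in 2-D
```

For dissimilarities that really are Euclidean distances of a 2-D configuration, only the first two eigenvalues are nonzero and the recovered distances match the input exactly, up to rotation and reflection.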

The algorithm shows one weakness: we have to choose the dimension of the projection. If the dissimilarities come from Euclidean distances, then B is a positive semidefinite matrix, and its eigenvalues are therefore positive or zero. Thus a first choice would be to take the number of nonzero eigenvalues as the dimension d.

FIGURE 4.2. Nonmetric multidimensional scaling on a subset of the Swiss banknote data (every fourth observation of the data is included).

Since it holds that Σ_i λ_i = trace(B), we can use the proportion of the variance explained by using d dimensions,

    (Σ_{i=1}^d λ_i) / (Σ_i λ_i),

as a measure if B is positive semidefinite. If B is not positive semidefinite,

    Σ_{r<s} w_rs (d_rs - δ_rs)²

is minimized (w_rs being appropriate weights).

FIGURE 4.3. Metric multidimensional scaling on the Swiss banknote data with Euclidean distances. Table 4.2 shows that we recover the dimensionality of the dataset exactly.

i        λ_i
1      597.061
2      186.188
3       48.439
4       38.737
5       16.957
6        7.067
7        4.743e-13
8        5.247e-14
9        4.105e-14
10       3.710e-14
200     -3.167e-13

TABLE 4.2. The eigenvalues computed by MDS. The eigenvalues beyond the sixth are zero; only rounding effects make them different from zero. We discover that the Swiss banknote dataset is six dimensional (as expected).

Nonmetric scaling assumes that the level of measurement is at the nominal or, at best, at the ordinal scale. The transformation function f is a monotone function such that

    f(δ_rs) ≤ f(δ_tu)  if  δ_rs < δ_tu,

which means that the dissimilarity influences the stress function only indirectly.

The stress function S and its minimization were proposed in Kruskal (1964a, 1964b). The algorithm is

1. Choose an initial configuration X.

2. Normalize the configuration so that mean(X) = 0 and var(X) = 1.

3. Compute the distances d_rs.

4. Fit f(d_rs), e.g. by monotonic least squares regression.

5. Compute a new configuration X by minimizing the stress function.

6. Go to 2.
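The Kruskal iteration can be sketched as follows (a simplified sketch: the monotonic least squares regression of step 4 is the pool-adjacent-violators algorithm, and step 5 is approximated by a single gradient step on the squared raw stress rather than a full minimization):

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: monotonic least squares fit to y."""
    merged = []
    for v in y.astype(float):
        merged.append([v, 1])                 # [block mean, block count]
        while len(merged) > 1 and merged[-2][0] > merged[-1][0]:
            m2, m1 = merged.pop(), merged.pop()
            cnt = m1[1] + m2[1]
            merged.append([(m1[0] * m1[1] + m2[0] * m2[1]) / cnt, cnt])
    fit = []
    for mean, cnt in merged:
        fit.extend([mean] * cnt)
    return np.array(fit)

def nonmetric_mds_step(X, delta, lr=0.2):
    """One iteration of a simplified Kruskal-style update."""
    n = X.shape[0]
    r, s = np.triu_indices(n, 1)
    d = np.linalg.norm(X[r] - X[s], axis=1)       # step 3: distances
    order = np.argsort(delta[r, s])
    dhat = np.empty_like(d)
    dhat[order] = pava(d[order])                  # step 4: monotone fit
    coef = (d - dhat) / np.maximum(d, 1e-12)      # step 5: gradient step
    diff = X[r] - X[s]
    G = np.zeros_like(X)
    np.add.at(G, r,  coef[:, None] * diff)
    np.add.at(G, s, -coef[:, None] * diff)
    X = X - lr * G / n
    X -= X.mean(0); X /= X.std()                  # step 2: renormalize
    return X, np.sum((d - dhat)**2) / np.sum(d**2)

rng = np.random.default_rng(0)
Y = rng.normal(size=(15, 2))
delta = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
X = rng.normal(size=(15, 2))                      # step 1: initial config
for _ in range(200):                              # step 6: iterate
    X, stress = nonmetric_mds_step(X, delta)
```

The returned quantity is the squared normalized stress; a value near zero means the monotone fit of the configuration distances reproduces the ordering of the dissimilarities well.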

