

From the document Contributions Statistics (Page 174-178)

Data Structures


Summary

In the first section we will show that graphical objects can be generated in three steps. Then we will develop a hierarchy for the graphical data structures (dataparts, windows, displays). In the second section we will explain why matrices are not a sufficient structure for storing statistical data, so that we need multidimensional arrays, and we will discuss their impact on mathematical and statistical operations. The second section closes with an explanation of why we need hierarchical objects. In the third section several forms of linking will be discussed. First we will give examples of linked plots from this thesis, then we will show further forms of linking, namely querying the data themselves or linked data, linking events with subroutines, and finally linking different datasets. The fourth section briefly describes some statistical packages and indicates which data structure features are available in these programs.

5.1 For Graphical Objects

5.1.1 Generating Graphical Objects

We have seen in section 2 that we have to distinguish between two kinds of graphical objects:

• points, curves and surfaces

• glyphs

In the first class we have boxplots, quantile-quantile plots, histograms, regressograms, scatterplots, 3D-scatterplots, sunflower plots, scatterplot matrices, parallel coordinate plots, Andrews curves and dendrograms. The second class consists of piecharts, star diagrams, Chernoff faces and so on.


The graphical tools of the first class can be decomposed into three steps:

1. do the mathematical computation,

2. create the graphical objects,

3. show the graphical objects in a plot to the user.

The advantage of this approach is not only that we can offer the standard tools, but also that we remain open to new graphical methods. The graphics mentioned above can be decomposed as follows:

Boxplot

1. Compute the 3 quartiles, the mean and the ranges.

2. Create the graphical objects, such as boxes, lines and the datapoints that are outside values.

3. Show the graphical objects in a plot. If we want to show several boxplots, we can shift the graphical objects by adding a point (an offset) to them.
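As a rough illustration of the three boxplot steps, the following Python sketch computes the quartiles, the mean and the whisker range, turns them into line segments plus outside points, and shifts everything by adding a point. The 1.5 times IQR rule for outside values, the "inclusive" quartile method and all function names are assumptions of this sketch, not taken from XploRe.

```python
from statistics import mean, quantiles

def boxplot_stats(data):
    """Step 1: quartiles, mean and ranges (1.5 * IQR fences are an assumption)."""
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if lo_fence <= v <= hi_fence]
    return {"q1": q1, "median": med, "q3": q3, "mean": mean(data),
            "whisker_lo": min(inside), "whisker_hi": max(inside),
            "outside": [v for v in data if v < lo_fence or v > hi_fence]}

def boxplot_objects(s, width=0.5):
    """Step 2: graphical objects as line segments ((x0, y0), (x1, y1)) and points."""
    h = width / 2
    lines = [
        ((-h, s["q1"]), (h, s["q1"])), ((h, s["q1"]), (h, s["q3"])),   # box outline
        ((h, s["q3"]), (-h, s["q3"])), ((-h, s["q3"]), (-h, s["q1"])),
        ((-h, s["median"]), (h, s["median"])),                          # median line
        ((-h, s["mean"]), (h, s["mean"])),                              # mean line
        ((0, s["q1"]), (0, s["whisker_lo"])),                           # lower whisker
        ((0, s["q3"]), (0, s["whisker_hi"])),                           # upper whisker
    ]
    points = [(0, v) for v in s["outside"]]                             # outside values
    return lines, points

def shift(lines, points, offset):
    """Move all objects by adding the point `offset`, e.g. to place boxplots side by side."""
    dx, dy = offset

    def move(p):
        return (p[0] + dx, p[1] + dy)

    return [(move(a), move(b)) for a, b in lines], [move(p) for p in points]
```

Step 3 (drawing) is left to whatever window receives the segments and points; a second boxplot would simply be shifted by, say, `(1, 0)` before drawing.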

Quantile-Quantile Plot

1. Compute the theoretical quantiles for one or two distributions.

2. Do nothing.

3. Plot the observed values against the theoretical quantiles.
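For the one-distribution case, step 1 can be sketched as below, assuming a standard normal reference distribution and the common plotting positions (i - 0.5)/n; both are choices made for this illustration only. Since step 2 does nothing, the returned pairs are already the datapoints to be shown.

```python
from statistics import NormalDist

def normal_qq(sample):
    """Pair the i-th theoretical normal quantile with the i-th smallest observation."""
    ys = sorted(sample)
    n = len(ys)
    dist = NormalDist()  # standard normal; an assumed reference distribution
    xs = [dist.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(xs, ys))
```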

Histogram, regressogram

1. Bin the data.

2. Create the graphical objects (lines or outlined boxes).

3. Show the graphical objects in a plot.
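A minimal sketch of steps 1 and 2 for both graphics, assuming equal-width bins over the data range (the bin rule and function names are illustrative, not XploRe's): the histogram yields outlined boxes, the regressogram one horizontal line per non-empty bin at the mean response.

```python
def bin_index(v, lo, width, k):
    """Index of the equal-width bin containing v; the maximum goes into the last bin."""
    return min(int((v - lo) // width), k - 1)

def histogram(xs, k):
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / k or 1.0                      # guard against a constant sample
    counts = [0] * k
    for v in xs:                                      # step 1: bin the data
        counts[bin_index(v, lo, width, k)] += 1
    boxes = [[(lo + i * width, 0), (lo + (i + 1) * width, 0),   # step 2: outlined boxes
              (lo + (i + 1) * width, c), (lo + i * width, c)]
             for i, c in enumerate(counts)]
    return counts, boxes

def regressogram(xs, ys, k):
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / k or 1.0
    sums, counts = [0.0] * k, [0] * k
    for x, y in zip(xs, ys):                          # step 1: bin x, accumulate y
        i = bin_index(x, lo, width, k)
        sums[i] += y
        counts[i] += 1
    # step 2: one horizontal line per non-empty bin at the bin mean of y
    return [((lo + i * width, sums[i] / counts[i]), (lo + (i + 1) * width, sums[i] / counts[i]))
            for i in range(k) if counts[i]]
```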

Scatterplot, 3D-scatterplot

1. Do nothing.

2. Do nothing.

3. Show the dataset in the plot and, if necessary, draw lines.

Andrews curve

1. Compute a curve for every observation.

2. Do nothing.

3. Show the curves in a plot.
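Step 1 for Andrews curves can be sketched with the usual Fourier form f_x(t) = x1/√2 + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ..., evaluated on a grid over [-π, π]; the grid size `m` is an arbitrary choice of this sketch. Each returned curve is simply a sequence of datapoints to be joined by lines in step 3.

```python
import math

def andrews(x, t):
    """f_x(t) = x1/sqrt(2) + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ..."""
    total = x[0] / math.sqrt(2)
    for j, xj in enumerate(x[1:], start=1):
        freq = (j + 1) // 2                               # 1, 1, 2, 2, 3, 3, ...
        total += xj * (math.sin(freq * t) if j % 2 == 1 else math.cos(freq * t))
    return total

def andrews_curves(data, m=101):
    """Step 1: one curve per observation, sampled at m points of [-pi, pi]."""
    ts = [-math.pi + 2 * math.pi * k / (m - 1) for k in range(m)]
    return [[(t, andrews(x, t)) for t in ts] for x in data]
```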

Parallel coordinate plot

1. Compute a density for all axes and within axes lines.

2. Do nothing.

3. Show the dataset in the standard plot and combine the datapoints into curves.

Tree

1. Compute a tree of merging clusters based on the sum of the within-cluster variances.

2. Transform the tree into graphical objects.

3. Show the dataset in the standard plot and combine the datapoints into a dendrogram.
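The tree steps can be sketched as follows. Step 1 is a naive agglomerative clustering that, in the spirit of Ward's method (an assumption; the text only says the merging depends on the sum of the within-cluster variance), always merges the pair of clusters that least increases the total within-cluster sum of squares and records that total as the merge height. Step 2 turns the merge list into the line segments of a dendrogram. All names are hypothetical.

```python
def sse(cluster):
    """Within-cluster sum of squared distances to the cluster mean."""
    dim = len(cluster[0])
    centre = [sum(p[d] for p in cluster) / len(cluster) for d in range(dim)]
    return sum((p[d] - centre[d]) ** 2 for p in cluster for d in range(dim))

def merge_tree(points):
    """Step 1: list of merges (i, j, height); new clusters get ids n, n+1, ..."""
    clusters = {i: [p] for i, p in enumerate(points)}
    merges, total, next_id = [], 0.0, len(points)
    while len(clusters) > 1:
        ids, best = sorted(clusters), None
        for a, i in enumerate(ids):
            for j in ids[a + 1:]:
                cost = sse(clusters[i] + clusters[j]) - sse(clusters[i]) - sse(clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        total += cost                                  # total within-cluster SS after merge
        clusters[next_id] = clusters.pop(i) + clusters.pop(j)
        merges.append((i, j, total))
        next_id += 1
    return merges

def dendrogram_segments(n, merges):
    """Step 2: turn the merge tree into line segments ((x0, y0), (x1, y1))."""
    children = {n + k: (i, j) for k, (i, j, _) in enumerate(merges)}
    height = {n + k: h for k, (_, _, h) in enumerate(merges)}
    height.update({leaf: 0.0 for leaf in range(n)})
    x, leaves = {}, []

    def place(node):
        if node < n:                                   # leaves get consecutive x positions
            x[node] = float(len(leaves))
            leaves.append(node)
        else:                                          # inner nodes sit above their children
            i, j = children[node]
            x[node] = (place(i) + place(j)) / 2
        return x[node]

    place(n + len(merges) - 1 if merges else 0)
    segments = []
    for node, (i, j) in children.items():
        top = height[node]
        segments.append(((x[i], height[i]), (x[i], top)))   # left vertical
        segments.append(((x[j], height[j]), (x[j], top)))   # right vertical
        segments.append(((x[i], top), (x[j], top)))         # horizontal bar
    return segments
```

The quadratic search per merge keeps the sketch short; it is only meant for small datasets.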

Apart from some special commands to produce the appropriate datasets, we need only one type of standard plot, which is basically a 2D- or 3D-scatterplot.

5.1.2 Dataparts

In a scatterplot we may want to show several datasets (e.g. in the case of regression). This leads to the concept of dataparts, each of which contains exactly one dataset together with all the attributes necessary to plot it. A datapart has the following structure:

• data about observations
  - location of the observation
  - colour of the observation
  - size of the observation
  - form of the observation (the form can be a string)

• data about lines between observations
  - which observations form one line
  - type of the line
  - colour of the line
  - thickness of the line

The structure can easily be extended to incorporate areas built up from datapoints. The shells on the cover of Scott's (1992) book could be produced in this way.
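The datapart structure above maps naturally onto a few record types. The following Python sketch is one possible encoding; the class names, field names and default values are assumptions for illustration, and `AreaSpec` is a hypothetical version of the area extension just mentioned.

```python
from dataclasses import dataclass, field

@dataclass
class Observation:
    location: tuple            # 2D or 3D coordinates of the observation
    colour: str = "black"
    size: int = 1
    form: str = "o"            # the form may be any string, e.g. a symbol or a label

@dataclass
class LineSpec:
    members: list              # indices of the observations that form one line
    line_type: str = "solid"
    colour: str = "black"
    thickness: int = 1

@dataclass
class AreaSpec:                # hypothetical extension: an area built up from datapoints
    members: list
    colour: str = "grey"

@dataclass
class Datapart:
    """Exactly one dataset plus everything needed to plot it."""
    observations: list = field(default_factory=list)
    lines: list = field(default_factory=list)
    areas: list = field(default_factory=list)

    def polylines(self):
        """Coordinates of every line, ready to be drawn by a window."""
        return [[self.observations[i].location for i in ln.members] for ln in self.lines]

# Regression example: one datapart for the raw data, one for the fitted line.
data = Datapart([Observation((0, 1.0)), Observation((1, 2.9)), Observation((2, 5.1))])
fit = Datapart([Observation((0, 1.0), size=0), Observation((2, 5.0), size=0)],
               lines=[LineSpec([0, 1], colour="red", thickness=2)])
```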


5.1.3 Graphical Windows

A window can be composed of several dataparts. In addition, we need some parameters that are global to the window. For a 3D-scatterplot or a projection of higher-dimensional data we need no box, but we have to plot the axes into the data. The global parameters include:

• how the axes are scaled: the minimum value, the maximum value, the number of tickmarks and the output format of the tickmark values

• how an axis appears: as a tripod in the upper right corner of the window, or as a tripod in the data, with or without tickmarks, text, etc.
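A window's global parameters could be held as sketched below. To stay self-contained, the dataparts are reduced here to plain lists of coordinate tuples; the style names in `AXIS_STYLES`, the evenly spaced tickmarks and `autoscale` are hypothetical, not XploRe's actual options.

```python
from dataclasses import dataclass, field

@dataclass
class AxisScale:
    minimum: float
    maximum: float
    n_ticks: int = 5                         # must be at least 2
    fmt: str = "{:.2f}"                      # output format of the tickmark values

    def ticks(self):
        step = (self.maximum - self.minimum) / (self.n_ticks - 1)
        return [self.minimum + i * step for i in range(self.n_ticks)]

    def labels(self):
        return [self.fmt.format(t) for t in self.ticks()]

AXIS_STYLES = {"box", "tripod-corner", "tripod-data"}   # hypothetical names

@dataclass
class Window:
    dataparts: list = field(default_factory=list)       # simplified: lists of coordinate tuples
    scales: list = field(default_factory=list)          # one AxisScale per axis
    axis_style: str = "box"
    show_ticks: bool = True

    def __post_init__(self):
        if self.axis_style not in AXIS_STYLES:
            raise ValueError(f"unknown axis style: {self.axis_style}")

    def autoscale(self, n_ticks=5):
        """Derive the global axis scales from all dataparts of the window."""
        pts = [p for part in self.dataparts for p in part]
        self.scales = [AxisScale(min(p[d] for p in pts), max(p[d] for p in pts), n_ticks)
                       for d in range(len(pts[0]))]
```

A 3D-scatterplot would typically use `axis_style="tripod-data"` together with three scales.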

5.1.4 Glyph Windows

Incorporating glyphs is a difficult task. Examples of glyph objects are piecharts, star diagrams and Chernoff faces; du Toit, Steyn & Stumpf (1986) show many other glyphs, such as tree diagrams. We have two choices for putting them into a standard 2D-window: either we use the form parameter of a datapoint together with a special language that produces the desired glyph, or we create a separate window for this type of graphics. If we compare, for example, star diagrams and Chernoff faces, we see that all necessary operations, such as sorting and reordering, are the same; only the appearance changes. One disadvantage of incorporating Chernoff faces into the standard window is that the computation given by Flury & Riedwyl (1981) is very intensive and produces large datasets that have to be transmitted to the window. It therefore seems reasonable to program a special window with a parameter that determines the appearance of the glyphs.
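The idea of a special glyph window whose operations are shared while a single parameter selects the appearance can be sketched as below. Only a simple star diagram renderer is implemented; the min-max scaling, the `RENDERERS` registry and all names are assumptions, and a Chernoff-face renderer is merely indicated as a hypothetical entry.

```python
import math

def star(values):
    """Star diagram: ray k has length values[k] and angle 2*pi*k/p; returns its vertices."""
    p = len(values)
    return [(v * math.cos(2 * math.pi * k / p), v * math.sin(2 * math.pi * k / p))
            for k, v in enumerate(values)]

RENDERERS = {"star": star}   # a Chernoff-face renderer would be registered here (hypothetical)

class GlyphWindow:
    def __init__(self, data, appearance="star"):
        self.data = data                              # one row of variables per observation
        self.appearance = appearance                  # parameter that selects the glyph type
        self.order = list(range(len(data)))           # display order of the observations
        self.variables = list(range(len(data[0])))    # order of the variables in each glyph

    # Operations shared by all glyph types
    def sort_by(self, var):
        self.order.sort(key=lambda i: self.data[i][var])

    def reorder_variables(self, perm):
        self.variables = [self.variables[k] for k in perm]

    def _scaled(self, row):
        """Min-max scale each variable to [0, 1] (0.5 for a constant variable)."""
        out = []
        for v in self.variables:
            col = [r[v] for r in self.data]
            lo, hi = min(col), max(col)
            out.append((row[v] - lo) / (hi - lo) if hi > lo else 0.5)
        return out

    def glyphs(self):
        render = RENDERERS[self.appearance]           # only this step depends on the appearance
        return [render(self._scaled(self.data[i])) for i in self.order]
```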


5.1.5 Displays

As mentioned in section 1.2.2, the flood of windows is a real problem in practical work. We need different windows to display information about the same task; many figures in this thesis show examples of this.

Creating a display therefore means creating a group of windows, possibly of different types. Operations in one window affect all other windows of the display.
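The rule that an operation in one window affects all windows of its display can be sketched with a simple broadcast (observer) pattern. Highlighting a set of observations stands in for "an operation" here; that choice and all class and method names are assumptions of this sketch.

```python
class LinkedWindow:
    def __init__(self, name):
        self.name = name
        self.highlighted = set()
        self.display = None

    def brush(self, indices):
        """An operation performed by the user in this window; the display forwards it."""
        self.display.broadcast(set(indices))

    def receive(self, indices):
        self.highlighted = set(indices)       # a real window would also redraw here

class Display:
    """A group of (possibly different) windows that share operations."""
    def __init__(self, *windows):
        self.windows = list(windows)
        for w in self.windows:
            w.display = self

    def broadcast(self, indices):
        for w in self.windows:
            w.receive(indices)

scatter, boxplot = LinkedWindow("scatterplot"), LinkedWindow("boxplot")
display = Display(scatter, boxplot)
scatter.brush([2, 5])                         # the boxplot window now highlights 2 and 5 too
```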

While only one display is visible at a time in XploRe 3.2, window systems can show several displays simultaneously. The implementation in XploRe 4.0 will place each display in its own window, so that several displays can overlap each other.

5.1.6 Updating of Windows

Since interactivity is essential in statistical work, we must be able to influence every component of a window. Likewise, it must be possible to change the content and appearance of a window from outside, so a programming language needs a command that allows such updates. As an example we can again use the teachware for regression smoothing. Most of the methods involve a smoothing parameter that users should choose themselves in order to see the effect of over- and undersmoothing. A change of the smoothing parameter affects the regression line, so we have to recompute the line and plot it into the window.
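The update cycle for the smoothing example might look like the following sketch. It assumes a Nadaraya-Watson smoother with a Gaussian kernel and bandwidth `h` (one possible choice; the text does not name a method), and `update` is a hypothetical stand-in for the update command: it recomputes the curve, replaces only that datapart and triggers a redraw.

```python
import math

def nw_smooth(xs, ys, h, grid):
    """Nadaraya-Watson estimate with a Gaussian kernel at each grid point."""
    out = []
    for g in grid:
        w = [math.exp(-0.5 * ((g - x) / h) ** 2) for x in xs]
        out.append(sum(wi * yi for wi, yi in zip(w, ys)) / sum(w))
    return out

class SmootherWindow:
    def __init__(self, xs, ys, h):
        self.xs, self.ys = xs, ys
        self.redraws = 0
        self.dataparts = {"data": list(zip(xs, ys))}   # the raw data stay unchanged
        self.update(h)

    def update(self, h):
        """Hypothetical update command: new smoothing parameter, new curve, redraw."""
        self.h = h
        grid = sorted(self.xs)                          # evaluating at the data keeps sum(w) > 0
        self.dataparts["curve"] = list(zip(grid, nw_smooth(self.xs, self.ys, h, grid)))
        self.redraw()

    def redraw(self):
        self.redraws += 1                               # placeholder for replotting the window
```

A small `h` makes the curve follow the data closely (undersmoothing), while a large `h` flattens it towards the overall mean (oversmoothing), which is exactly the effect the teachware is meant to show.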
