
5.4 Existing Computational Environments

5.4.1 Classification of Software

As mentioned at the beginning, a lot of statistical software is available nowadays. To get an overview it is necessary to classify it. The basic classification scheme is based on the aims of the software (see Table 5.3).

Not only the aims themselves are important, but also how many of them are satisfied.

This leads to the classification used by Koch & Haag (1995) in their yearly overview of statistical software:

• Statistical software systems, which try to satisfy a lot of aims simultaneously. Examples of these programs are S-Plus, SAS, GAUSS and XploRe 3.2.

• Special purpose programs, which aim to do just one task very well. Examples are XGobi and XploRe 2.0.

• Subroutine libraries, which can be used in other programs.

• Teachware, i.e. special programs for teaching statistics.

TABLE 5.3. Examples for general aims which statistical software tries to satisfy. [Table body not recoverable from the source; surviving program entries: GAUSS, SAS, S-Plus, X-Lisp-Stat, Mathematica, Maple V, XGobi.]

Another classification is given by user groups: students, consultants and researchers. Each group has its own needs:

[Table entries; their assignment to rows and columns is not recoverable from the source.]

Needs and aims named: low price, easy interface, extensibility, speed, online analysis, special topic, completeness, modern software engineering, teachware capabilities.

Programs named: SPSS, SYSTAT, XGobi, XploRe 2.0, DataDesk, SAS, GAUSS, S-Plus, XploRe 3.2, X-Lisp-Stat.

The more aims a program wants to fulfill, the more work is necessary to do all the tasks. One of the big advantages of S-Plus is that a lot of people write additional software for it (see e.g. XGobi).

5.4.2 DataDesk

DataDesk is software developed for, and only available on, Macintosh computers. Its main aim is to visualize relationships between variables, where relationship is meant in the sense of exploratory data analysis.

The data in DataDesk are stored as variables (vectors). A dataset consists of a set of variables. But DataDesk is only able to handle one dataset at a time.

If we want to handle two datasets we have to merge the two datasets into one.
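The merging step can be sketched in a few lines of Python. This is only an illustration of the idea, not DataDesk's own mechanism: the function `merge_datasets`, the prefixes `a.` and `b.` and the padding with `None` are assumptions made for this example.

```python
def merge_datasets(first, second, fill=None):
    """Merge two datasets (dicts of variable vectors) into one dataset.

    Observations are matched by their index number. The shorter variables
    are padded with `fill` so that all vectors end up with equal length.
    """
    lengths = [len(v) for d in (first, second) for v in d.values()]
    n = max(lengths, default=0)
    merged = {}
    # Prefix the variable names so that equal names do not collide.
    for prefix, dataset in (("a", first), ("b", second)):
        for name, values in dataset.items():
            merged[f"{prefix}.{name}"] = list(values) + [fill] * (n - len(values))
    return merged


heights = {"height": [1.70, 1.82, 1.65]}
weights = {"weight": [68.0, 90.5]}
merged = merge_datasets(heights, weights)
```

After the merge there is one dataset whose variables all share the same index, which is what a program restricted to a single dataset needs.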

The data are linked by their index number. DataDesk allows linking between all graphical objects and supports subgroup analysis (brushing). It offers all the standard graphical tools of statistics, such as boxplots, scatterplots and 3D-scatterplots. But the linking extends beyond the graphics. In DataDesk we can run a multiple linear regression and get the result in an output window, as in Figures 5.7 and 5.8. If we now drop one of the variables from the regression, the output window immediately shows the recomputed values.
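Linking by index number can be illustrated with a small Python sketch. It is a simplified model and makes no claim about how DataDesk is implemented; the class `LinkedViews` and its methods are hypothetical names chosen for this example.

```python
class LinkedViews:
    """Several scatterplots of one dataset sharing a set of selected indices."""

    def __init__(self, data):
        self.data = data        # variable name -> vector of values
        self.selected = set()   # index numbers of the brushed observations
        self.views = []         # one (x variable, y variable) pair per plot

    def add_scatterplot(self, x, y):
        self.views.append((x, y))

    def brush(self, variable, low, high):
        """Select all observations whose `variable` lies in [low, high]."""
        self.selected = {
            i for i, value in enumerate(self.data[variable])
            if low <= value <= high
        }

    def highlighted_points(self):
        """Return, for every view, the (x, y) points of the selected rows."""
        return {
            (x, y): [(self.data[x][i], self.data[y][i])
                     for i in sorted(self.selected)]
            for x, y in self.views
        }


data = {"age": [23, 35, 47, 52], "income": [20, 41, 58, 39], "rent": [5, 9, 12, 8]}
views = LinkedViews(data)
views.add_scatterplot("age", "income")
views.add_scatterplot("income", "rent")
views.brush("age", 30, 50)                # brush on one variable
highlighted = views.highlighted_points()  # same rows highlighted in all plots
```

Because the selection is stored as index numbers rather than as coordinates, every view of the same dataset can show the brushed subgroup, including views that do not contain the brushed variable.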

Up to version 4 DataDesk offered no programming language, so as a consequence we can neither extend it with new algorithms nor see the data structure. The new version 5 now also has a programming language (Theus 1996).

The help system is integrated into the help system of the Macintosh. As Macintosh computers still offer a good user interface, DataDesk is easy to use.

Missing values are handled in the "standard" way: an observation is deleted if a missing value appears in one variable. As an effect, the scatterplots can contain different numbers of points when we show scatterplots of three variables.
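The effect on the scatterplots is easy to reproduce. The following Python sketch assumes that missing values are coded as `None` and that an observation is dropped from a plot whenever one of the two plotted variables is missing; the function name and the data are made up for this illustration.

```python
from itertools import combinations


def points_per_scatterplot(data):
    """Count the observations that can be drawn in each pairwise scatterplot.

    An observation is dropped from a plot if either of the two plotted
    variables is missing (None) for it.
    """
    counts = {}
    for x, y in combinations(data, 2):
        counts[(x, y)] = sum(
            1 for a, b in zip(data[x], data[y])
            if a is not None and b is not None
        )
    return counts


data = {
    "x": [1.0, 2.0, None, 4.0, 5.0],
    "y": [2.1, None, 3.3, None, 5.0],
    "z": [0.5, 0.7, 0.9, 1.1, 1.3],
}
counts = points_per_scatterplot(data)
```

With these five observations the three scatterplots show two, four and three points respectively, although they all come from the same dataset.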


5.4.3 GAUSS

GAUSS is a matrix-oriented programming language. It has evolved from a DOS program into a program for both UNIX and DOS. Since GAUSS was mainly written in assembler, which makes it very fast, the port to UNIX took plenty of time. A lot of routines are now written in C. Nevertheless there is no perfect compatibility between DOS and UNIX. One example is the random number generators. They do not give the same results on DOS and UNIX even when the same seed is used for initialization. UNIX-GAUSS, however, offers an option to make the random number generator behave as under DOS.
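Why the same seed can give different results is shown by the following Python sketch. The generators actually used by GAUSS are not described here, so the example uses two linear congruential generators with illustrative constants; `make_lcg`, `platform_a` and `platform_b` are hypothetical names.

```python
def make_lcg(multiplier, increment, modulus):
    """Return a factory for linear congruential generators with fixed constants."""
    def seeded(seed):
        state = seed % modulus

        def next_uniform():
            nonlocal state
            state = (multiplier * state + increment) % modulus
            return state / modulus   # uniform number in [0, 1)

        return next_uniform
    return seeded


# Two hypothetical platforms whose generators use different constants.
platform_a = make_lcg(1103515245, 12345, 2**31)
platform_b = make_lcg(1664525, 1013904223, 2**32)

seed = 42
gen_a = platform_a(seed)
gen_b = platform_b(seed)
gen_a_again = platform_a(seed)

draws_a = [gen_a() for _ in range(3)]
draws_b = [gen_b() for _ in range(3)]
draws_a_again = [gen_a_again() for _ in range(3)]
```

A seed only fixes the starting state. The sequence is reproducible on the same generator (`draws_a` equals `draws_a_again`), but a generator with a different recurrence produces a different sequence from the same seed.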

The basic data object in GAUSS is a matrix. The graphical possibilities of GAUSS are rather poor in terms of interactivity: it offers only static graphics. Since there are no dynamic graphics, brushing and linking are not possible.

Although GAUSS is a programming language, it does not support any programmable menu-driven environment. Nevertheless there are programs on top of GAUSS, e.g. MULTI, which is menu driven.

GAUSS allows user-defined error handling and supports the user with a help system. But some help, e.g. on the parameters of a command, is not easily accessible.

Mathematica and Maple V

Mathematica and Maple V do almost the same. They are not statistical programs; their emphasis is more on general mathematics. The main advantage for statisticians is the possibility of symbolic computation, which allows us to manipulate formulas.

It seems that Mathematica is slightly better at handling symbolic computations, whereas Maple V gives better numerical results. Mathematica does not provide interactive or statistical graphics. Only the basic statistical functions are implemented (Wolfram 1991), in six standard packages:

• Descriptive statistics

• Continuous distributions

• Discrete distributions

• Hypothesis tests

• Confidence intervals

• Linear regression

Both Mathematica and Maple V run on different platforms. They use lists and arrays to handle objects. It is possible to build up larger units like matrices and multi-dimensional arrays.

Neither of them supports dynamic graphics, therefore we have no linking or brushing. They do not offer statistical graphics at all.

The programming language of both programs is very large, but directed more towards symbolic computation. Nevertheless, handling the programming language can be complicated. When I tried to compute the mean squared error of a kernel estimator, one of the tasks was to replace a function f by its Taylor expansion of order J. Only with some tricky handling of Mathematica commands was I able to achieve this.
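The replacement itself is simple to state: f(x) is approximated by the sum over j = 0, ..., J of f^(j)(x0) (x - x0)^j / j!. The Mathematica commands used for the symbolic version are not given here, so the following Python sketch only evaluates such a polynomial numerically from known derivatives; the function name `taylor_polynomial` and the choice of the exponential function as test case are assumptions for this example.

```python
from math import exp, factorial


def taylor_polynomial(derivatives_at_x0, x0):
    """Return p(x) = sum_j f^(j)(x0) / j! * (x - x0)**j.

    `derivatives_at_x0[j]` is the j-th derivative of f at x0, so the
    order J of the expansion is len(derivatives_at_x0) - 1.
    """
    def p(x):
        return sum(
            d / factorial(j) * (x - x0) ** j
            for j, d in enumerate(derivatives_at_x0)
        )
    return p


# Test case: f = exp, whose derivatives at x0 = 0 are all equal to 1.
J = 4
p = taylor_polynomial([1.0] * (J + 1), x0=0.0)
approximation_error = abs(p(0.1) - exp(0.1))
```

Close to the expansion point the polynomial of order J differs from f only by a remainder of order J + 1, which is what makes the replacement useful in the mean squared error calculation.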

Since both programs run under a GUI, all possibilities of a good help system are offered. But Mathematica at least has a very short help system. Maple V offers topic-oriented help: we can click on a topic and get to see the subtopics or the appropriate command.

5.4.4 S-Plus

S-Plus is one of the latest statistical packages and runs on different platforms (UNIX, PC).

It offers a modern object-oriented programming language. The object orientation supports hierarchical objects (see PPR).
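What a hierarchical object looks like can be sketched in Python. The sketch assumes that PPR stands for projection pursuit regression and invents a plausible result object for it; the classes `RidgeTerm` and `PPRFit` and their fields are hypothetical and do not reproduce the S-Plus object.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RidgeTerm:
    """One additive term: a projection direction and a fitted curve."""
    direction: List[float]
    curve_x: List[float]
    curve_y: List[float]


@dataclass
class PPRFit:
    """Hypothetical result object of a projection pursuit regression."""
    response_mean: float
    terms: List[RidgeTerm] = field(default_factory=list)

    def n_terms(self):
        return len(self.terms)


fit = PPRFit(response_mean=2.5)
fit.terms.append(RidgeTerm([0.6, 0.8], [-1.0, 0.0, 1.0], [0.2, 0.0, 0.3]))
fit.terms.append(RidgeTerm([1.0, 0.0], [-1.0, 0.0, 1.0], [-0.1, 0.0, 0.1]))
first_direction = fit.terms[0].direction
```

The object contains a list of objects which in turn contain vectors. A flat matrix cannot hold such a result without additional bookkeeping, since the number of terms and the lengths of the vectors may differ.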

Since a lot of people use it, a lot of routines are available. The integration in a GUI environment allows good graphics, although the handling could be improved. Linking and brushing are available in a limited form (scatterplot matrix), but a programmable environment for this will come soon.

The help system is not very convincing, but a lot of literature about S-Plus is available.

5.4.5 SAS

SAS is an old batch-oriented programming language. Nowadays it runs on a lot of different platforms. Over time SAS has grown into an all-round program, which means you will find an appropriate SAS function for a lot of (standard) statistical problems.

The data objects in SAS are variables (vectors). They can be built up to datasets (matrices). A special DATA step is used to declare these objects.

Although SAS now offers dynamic graphics, we have only poor possibilities for brushing and linking. The programming language allows us to build modules which encapsulate the whole SAS code, so that the inexperienced user will only see a module in which he can use a menu.

The help system is part of the GUI help system. The paper documentation of SAS is excellent: it not only describes the commands but also introduces the topic they belong to.

5.4.6 SYSTAT and SPSS

SYSTAT and SPSS are menu-driven programs. SYSTAT runs on Macintosh computers and under Windows, SPSS only under Windows. Both programs provide the standard routines.

Both programs use spreadsheets for the data input, which indicates that a matrix structure is used to store the data. But as in DataDesk both programs can only handle one dataset.

They offer mainly static graphics and some dynamic graphics (e.g. 3D-scatterplot, smoothing in scatterplots). But linking and brushing are only possible in a very limited form, e.g. SYSTAT refuses to brush more than 50 datapoints.

Although both programs have a programming language, the language is hidden. It seems to be difficult to introduce new algorithms into either program package. Adding new menu items or redefining existing ones is impossible.

The help system is again integrated in the GUI help system. Nevertheless the help system is quite short. SPSS tries to introduce the user to the topic, but often the information is too sparse. In particular, some of the tests used are not described sufficiently.

5.4.7 X-Lisp-Stat

X-Lisp-Stat offers all graphical possibilities of a statistical program. This includes linking, brushing and dynamic graphics. Since linking is done through Lisp, it should be possible to link different datasets and to handle them appropriately.

Most of the routines are available in Lisp and we can have a look at them.

The programming language allows us to build up completely menu driven environments. The help system is integrated in the GUI help system.

The main drawback from my point of view is that the programming language is based on X-Lisp. It is very difficult to change from procedural languages like Pascal, C or Fortran to Lisp which is based on the manipulation of lists.

When I tried to install X-Lisp-Stat for Windows I had a lot of problems. I could solve them, but I was still very disappointed by what is delivered together with the system. The documentation was not too good.

5.4.8 XGobi and XploRe 2.0

XGobi and XploRe 2.0 are examples of highly specialized software. XGobi runs only under UNIX and its main aim is to visualize multivariate data. XploRe 2.0 runs under DOS and is specialized in nonparametric smoothing methods.

Both programs are menu driven.

The data objects used are matrices. In XploRe 2.0 these matrices are called "workspaces". Whereas XploRe 2.0 can handle several datasets (matrices), XGobi can handle only one dataset. But it is possible to start several instances of XGobi which communicate with each other.

Both programs offer dynamic and interactive graphics with the possibility of linking and brushing. It is not possible to link different graphics.

In principle XGobi and XploRe 2.0 can be extended, which nevertheless turned out to be quite difficult. For example, Prof. Schimek and his group tried to use the programming interface to extend XploRe 2.0, but he admitted that they had a lot of problems. I tried to include some faster EPP indices in XGobi myself, but found it quite difficult. Finally one of the authors of XGobi, Dianne Cook, did it for me, so that I only had to deliver the routines.

5.4.9 XploRe 3.2

XploRe 3.2 is an extension of XploRe 2.0 and runs only under DOS. It has a programming language which allows us to create and use menus, though the focus is still on smoothing methods.

The standard data object is a matrix. In some sense it is possible to build hierarchical objects. The program offers interactive and dynamic graphics including linking and brushing. In some aspects the linking is not as good as in X-Lisp-Stat or DataDesk. Since the basis is a programming language, this program is more difficult to handle than DataDesk. Like X-Lisp-Stat it offers programmable links, but the handling is much easier. The help system is topic-oriented and context-sensitive.

                      Data    GAUSS   Mathe-  Maple   S-Plus  SAS     SYSTAT  SPSS    X-Lisp- XGobi   XploRe  XploRe  XploRe
                      Desk            matica  V                                       Stat            2.0     3.2     4.0
Multi-platform        n       y       y       y       y       y       y       n       y       n       n       n       ?
Data objects
  Vector              y       y       y       y       y       y       y       y       y       y       y       y       y
  Matrix              n       y       y       y       y       y       y       y       y       y       y       y       y
  Matrices            n       y       y       y       y       y       n       n       y       n       y       y       y
  Arrays              n       n       y       y       y       n       n       n       y       n       n       n       y
  Hier. objects       n       n       n       n       y       n       n       n       y       n       n       n       y
Graphics
  Stat. graphics      y       y       y       y       y       y       y       y       y       y       y       y       y
  Dyn. graphics       y       n       n       n       y       y       y       y       y       y       y       y       ?
Linking
  Scatterplot matrix  y       n       n       n       y       n       y       n       y       n       y       y       ?
  Graph. windows      y       n       n       n       n       n       n       n       y       n       n       y       ?
  Diff. datasets      y       n       n       n       n       n       n       n       y       n       n       n       ?
Programming language
  Menu driven         y       n       n       n       n       y       y       y       n       y       y       n       n
  Available           n       y       y       y       y       y       y       y       y       n       n       y       y
  Visible             n       y       y       y       y       y       n       n       y       n       n       y       y
  Programmable menus  n       n       n       n       n       y       n       n       y       n       n       y       ?
  User def. error     n       y       y       y       y       y       n       n       y       n       n       y       ?
Help system
  on paper            y       y       y       y       y       y       y       y       n       y       y       y       ?
  online              (entries not recoverable from the source)
  topic oriented      (entries not recoverable from the source)
  context-sensitive   y       n       n       n       n       n       y       y       n       n       y       y       ?
Missing treatment     y       y       y       y       y       y       n       n       y       y       y       y       ?

From the document Contributions Statistics (pages 192-200)