
Extending Mathematical Operations

In the document Contributions Statistics (Page 181-185)

Data Structures

5.2 For Data Objects

5.2.2 Extending Mathematical Operations

It is obvious that the mathematical operations have to be extended for multidimensional operations. The following extensions are suggested:

FIGURE 5.2. How the inverse matrix operation will work on a three-dimensional array: INV is applied to each layer, INV(A[.,.,1]), INV(A[.,.,2]), INV(A[.,.,3]), INV(A[.,.,4]).

Unary mathematical functions.

Unary mathematical functions are mainly scientific functions like cosine, sine, exponential and logarithm, but also cumulative distribution functions and their inverses, etc. Since these functions have only one argument and one result, the extension is that we produce a multi-dimensional array of the same size as the input array, and the function is applied to each element.

Unary mathematical operators.

Unary mathematical operations are the unary minus, the logical not, the factorial and so on. These operators also affect each element of the array and can be treated in the same way as the unary mathematical functions.
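Both kinds of unary extension can be illustrated with NumPy, which already applies unary functions and operators elementwise to arrays of any dimension; a sketch, not part of the proposed language:

```python
import numpy as np

# A 2 x 3 x 2 array: the unary extension applies the operation to every element.
A = np.arange(1.0, 13.0).reshape(2, 3, 2)

logA = np.log(A)      # unary function: elementwise logarithm
negA = -A             # unary operator: elementwise minus
notA = ~(A > 6.0)     # elementwise logical not on a boolean condition

# Every result has the same size as the input array.
assert logA.shape == negA.shape == notA.shape == A.shape
```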

Vector operations.

We also have vector operations, which are mainly operations on a single variable. Typical operations are the mean, the median, the variance, the sum, etc.

Often it is of interest to look only at the conditional means, conditional medians, conditional variances, conditional sums, etc. The result will be a multi-dimensional array that has only one element in the working dimension (or k elements if we condition on k classes).
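A sketch of the working-dimension idea in NumPy; the `axis`/`keepdims` spelling and the label-vector encoding of the classes are my assumptions, not the text's syntax:

```python
import numpy as np

# 5 observations x 3 variables; reduce along the working (observation) dimension.
X = np.arange(15.0).reshape(5, 3)

m = X.mean(axis=0, keepdims=True)   # shape (1, 3): one element in working dim
s = X.sum(axis=0, keepdims=True)    # shape (1, 3)

# Conditional means: k = 2 classes given by a label vector.
labels = np.array([0, 0, 1, 1, 1])
cond_means = np.stack([X[labels == c].mean(axis=0) for c in (0, 1)])  # (2, 3)
```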

Matrix operations.

Operations which are specific to matrices are the multiplication of matrices, the inversion of matrices, the transposition of matrices, the calculation of moment matrices, etc. Since the general assumption is that the rows represent the observations and the columns represent the variables, a matrix operation will work on those rows and columns which contain one data matrix. The parallel matrices will be seen as layers, so that the operation is executed on each layer (see Figure 5.2). Optional parameters will allow the operation to be executed on dimensions other than rows and columns.
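NumPy's batched linear algebra behaves like Figure 5.2, except that it expects the layer index first rather than last; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
# Four 3 x 3 layers, i.e. A[.,.,l] for l = 1..4 in the text's notation.
# NumPy's linalg routines expect the matrix dimensions last, so the
# layers come first: shape (4, 3, 3).
A = rng.normal(size=(4, 3, 3)) + 3.0 * np.eye(3)

Ainv = np.linalg.inv(A)   # inverts each 3 x 3 layer independently
check = A @ Ainv          # each layer multiplied by its own inverse
assert np.allclose(check, np.eye(3))
```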

Binary mathematical operators.

Binary mathematical operators are the elementwise plus, the elementwise minus and the elementwise logical comparisons. We have to take into account that we already have some kind of extension in the standard statistical languages. In EPP we need to center the data matrix by

y = X - mean(X)

On the left side of the minus we have an n x p matrix, on the right side a 1 x p vector, so for a strict elementwise operation these matrices are not compatible. Nevertheless this operation is allowed, since the meaning is that we want to subtract the mean of X from each observation. As Table 5.1 shows, the programming languages do allow elementwise operations for special sizes.
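NumPy's broadcasting implements exactly this special case; a sketch of the centering step:

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [3.0, 20.0],
              [5.0, 30.0]])          # n x p = 3 x 2 data matrix

mean = X.mean(axis=0, keepdims=True) # 1 x p vector of column means
y = X - mean                         # the 1 x p operand is expanded to n x p
assert np.allclose(y.mean(axis=0), 0.0)
```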

A possible generalization for a multi-dimensional array is to allow an elementwise operation only if the sizes are the same or equal to 1 in each dimension. For

to each block of 2 x 2 matrices. This possibility can be included if we redefine the result size of an elementwise operation on two arrays: we define the size in the i-th dimension as the maximum of the sizes in the i-th dimension of the operands. With this definition we still have the possibilities given by Table 5.1, and we would be able to add blocks to the matrices as needed for the plotting of histograms. This way of dimensioning the result matrix will confuse the inexperienced user, so the first method is better as a standard operation. For the experienced user the second method simplifies the programming task.

TABLE 5.1. Allowed sizes for elementwise operations (fragment):

left argument    right argument    result
n x p            n x p             n x p
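The "maximum size" rule goes beyond ordinary broadcasting, which requires one of the two sizes to be 1. A sketch under the assumption that the smaller size always divides the larger, so the smaller operand can be repeated cyclically; `broadcast_max` is a hypothetical helper, not part of any proposed language:

```python
import numpy as np

def broadcast_max(a, b):
    """Prepare two arrays for an elementwise operation under the rule that
    the result size in each dimension is the MAX of the operand sizes,
    repeating the smaller operand cyclically (its size must divide the max)."""
    a, b = np.asarray(a), np.asarray(b)
    nd = max(a.ndim, b.ndim)
    a = a.reshape(a.shape + (1,) * (nd - a.ndim))
    b = b.reshape(b.shape + (1,) * (nd - b.ndim))
    for axis in range(nd):
        target = max(a.shape[axis], b.shape[axis])
        if a.shape[axis] != target:
            assert target % a.shape[axis] == 0
            a = np.concatenate([a] * (target // a.shape[axis]), axis=axis)
        if b.shape[axis] != target:
            assert target % b.shape[axis] == 0
            b = np.concatenate([b] * (target // b.shape[axis]), axis=axis)
    return a, b

# Add a 2 x 2 block to each 2 x 2 block of a 4 x 4 matrix.
A = np.zeros((4, 4))
block = np.array([[1.0, 2.0], [3.0, 4.0]])
a, b = broadcast_max(A, block)
result = a + b
```

Note that standard NumPy would reject `A + block` here, since neither 4 nor 2 is 1; the cyclic repetition is precisely what the second method adds.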

Binary or n-ary mathematical functions.

Binary or n-ary functions are functions like the cumulative χ²-distribution with d degrees of freedom, the normal random generator, univariate regression smoothers, etc. Often we already have relationships between the parameters of a function given by the definition. A general rule to extend the parameters cannot be given, but great care should be taken to find an appropriate solution to the problem. Let us take two examples:

• the normal random generator

The form might be

y = normgen(n, μ, Σ)

where μ is a 1 x p x q1 x ... x qk array and Σ is a p x p x r1 x ... x rk array. In this case we can regard everything above the second dimension of the array as layers. The resulting array y would be n x p x max(q1, r1) x ... x max(qk, rk), which means we have generated max(q1, r1) x ... x max(qk, rk) normally distributed random samples in one step. Since it will be a condition that qi = ri or one of both has to be 1, we are able to compute a lot of different random samples in one step. Especially for simulations this will be very helpful.
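A minimal sketch of such a generator for a single layer dimension (k = 1); `normgen` is the text's hypothetical name, and the per-layer loop is my assumption about its semantics:

```python
import numpy as np

def normgen(n, mu, sigma, rng=None):
    """Sketch: mu has shape (1, p, q), sigma has shape (p, p, r), with
    q == r or one of them equal to 1; returns an (n, p, max(q, r)) array
    of n p-variate normal samples per layer."""
    rng = np.random.default_rng() if rng is None else rng
    q, r = mu.shape[2], sigma.shape[2]
    assert q == r or 1 in (q, r)
    layers = max(q, r)
    y = np.empty((n, mu.shape[1], layers))
    for l in range(layers):
        m = mu[0, :, min(l, q - 1)]       # reuse the single layer if q == 1
        s = sigma[:, :, min(l, r - 1)]    # reuse the single layer if r == 1
        y[:, :, l] = rng.multivariate_normal(m, s, size=n)
    return y

# q = 3 mean layers combined with a single (r = 1) covariance layer.
p, q, r = 2, 3, 1
mu = np.zeros((1, p, q))
sigma = np.repeat(np.eye(p)[:, :, None], r, axis=2)
y = normgen(1000, mu, sigma, rng=np.random.default_rng(42))
assert y.shape == (1000, p, max(q, r))
```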

• the Nadaraya-Watson estimator

The standard form of the call to compute a Nadaraya-Watson estimator will be

(zr, yr) = regest(z, y, bandwidth)

with

z            n x 1 x ... array
y            n x m x ... array
bandwidth    1 x 1 x ... array

The result matrices will be

zr    k x 1 x max(...) array
yr    k x m x max(...) array

With this definition we can carry out a lot of tasks:

- a univariate Nadaraya-Watson estimator,

- a set of univariate Nadaraya-Watson estimators for different sets of y, e.g. to calculate confidence intervals and

- a set of univariate Nadaraya-Watson estimators for different sets of bandwidths, e.g. to calculate the cross-validation function.
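A sketch of how one extra bandwidth dimension could be vectorized; `regest` is the text's hypothetical name, and the Gaussian kernel and the k-point evaluation grid are my assumptions:

```python
import numpy as np

def regest(z, y, bandwidth, k=50):
    """Nadaraya-Watson sketch: z is (n,), y is (n,), bandwidth is (b,);
    returns a k-point grid zr and one estimate per bandwidth, yr of shape (k, b)."""
    zr = np.linspace(z.min(), z.max(), k)                    # (k,)
    u = (zr[:, None, None] - z[None, :, None]) / bandwidth   # (k, n, b)
    w = np.exp(-0.5 * u**2)                                  # Gaussian kernel weights
    yr = (w * y[None, :, None]).sum(axis=1) / w.sum(axis=1)  # (k, b)
    return zr, yr

rng = np.random.default_rng(1)
z = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.sin(2 * np.pi * z) + rng.normal(scale=0.1, size=200)
# Three bandwidths evaluated in one step, e.g. for cross-validation.
zr, yr = regest(z, y, np.array([0.02, 0.05, 0.1]))
```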

It is easy to build a multivariate form

(zr, yr) = regestp(z, y, bandwidth)

with

z            n x p x ... array
y            n x m x ... array
bandwidth    1 x 1 x ... array
zr           k x p x max(...) array
yr           k x p x max(...) array


Further modifications can be done by including different kernels or a matrix of binwidths, e.g.

(zr, yr) = regestpkd(z, y, bandwidth, kernel, binwidth)

The aim of the two examples is to compute as much as possible in one step.

We should keep in mind that this is just an offer to the user; we could still use loops to do it.
