
SPECIAL TOPICS

CHAPTER 9 TESTING FOR CHANGE IN VARIABILITY AND PERSISTENCE IN TIME SERIES

9.2 Techniques employed

This section details suggested statistical methods, tests and graphical approaches that may be of use when testing for change in variability and persistence in time series. Use of the methods is illustrated via examples in the final section.

9.2.1 Symmetrizing transform (Box-Cox)

It is often useful to symmetrize the data before testing (e.g. prior to application of Moolman’s test (Moolman, 1985)). This is important for parametric tests that are often based on the assumption of normality, but is less important for non-parametric tests. The transform is:

y_i = (x_i^λ − 1)/λ for λ ≠ 0, and y_i = ln(x_i) for λ = 0 (9.1)

where the exponent λ is chosen to symmetrize the sample, for example so that the median of the transformed data lies at an equal distance between the lower and upper quartiles.
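For illustration, a minimal Python sketch of the transform is given below; it uses scipy's maximum-likelihood choice of the exponent rather than the quartile-based criterion, and the synthetic data are purely illustrative.

```python
import numpy as np
from scipy import stats

# A minimal sketch of symmetrizing a positive-valued sample with a Box-Cox
# transform. scipy estimates lambda by maximum likelihood; the quartile-based
# choice of lambda described above would have to be coded separately.
rng = np.random.default_rng(42)
x = rng.lognormal(mean=0.0, sigma=0.8, size=200)   # skewed, strictly positive data

y, lam = stats.boxcox(x)                           # transformed values and exponent
print(f"fitted lambda = {lam:.3f}")
print(f"skewness before = {stats.skew(x):.3f}, after = {stats.skew(y):.3f}")
```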

9.2.2 Moolman's modification of the t-test

It is not obvious how to test for change in mean or trend in a time series exhibiting serial correlation. Moolman’s (1985) contribution was to adapt the usual t-test for shift in means or trend development to normal AR(1) processes with parameter φ. He derived the relation that

α(φ) = 1 − Φ[z_α (1 − φ)] (9.2)

where z_α is the upper critical 100α% value under independence, Φ(.) is the cumulative standard normal distribution and α(φ) is the significance level for the dependent series. He found the t-test more powerful than the U-test (Wilcoxon, 1945; Mann & Whitney, 1947) in detecting shift and trend, and that normality was approached by n = 70 (see also the example in the final section of this chapter). This technique provides an alternative to the non-parametric tests presented elsewhere in this report.
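A minimal sketch of this adjustment, following equation (9.2) as reconstructed above, is given below; the function name and the illustrative values of z_α and φ are not from the original text.

```python
from scipy.stats import norm

def moolman_adjusted_alpha(z_alpha: float, phi: float) -> float:
    """Significance level of the dependent AR(1) series per equation (9.2):
    alpha(phi) = 1 - Phi[z_alpha * (1 - phi)]."""
    return 1.0 - norm.cdf(z_alpha * (1.0 - phi))

# Example: a result significant at the nominal 5% level under independence
# (z_alpha = 1.645) weakens to roughly 16% once phi = 0.4 is allowed for.
print(moolman_adjusted_alpha(1.645, 0.4))
```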

9.2.3 Testing for variability and persistence

The sample variance and first serial covariance of a zero-mean sequence are given by:

v = Σ x_i² / n (9.3)

and

c = Σ x_i x_{i+1} / n (9.4)

where φ = c/v is the first serial correlation coefficient. These are summary (global) statistics and are averages of individual elements computed from the sample.

A possible approach to testing for changes in variability and persistence is to consider the terms v_i = x_i² and c_i = x_i x_{i+1}. The {v} and {c} series are plotted, and non-parametric tests are applied to the segments of the series that are considered to be different.

An additional statistic for testing persistence, especially useful when the sample is suspected of being drawn from a non-stationary process, is based on the lag-one variogram, which is given by:

g = Σ (x_i − x_{i+1})² / n (9.5)

where the sequence {x} does not need to be zero-mean. As with the {v} and {c} series, the {g} series, made up from the individual elements of g, can be plotted and then non-parametric tests used to detect change. Indications are that, in tests on persistence, the elemental variogram g is more powerful than the covariance c for detecting change in correlation in a time series (see example in the final section of this chapter).
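The elemental series defined above can be computed in a few lines. The sketch below is illustrative only (function names and random test data are not from the original text); averaging the elements recovers the global statistics.

```python
import numpy as np

def elemental_series(x: np.ndarray):
    """Elemental statistics of a series x (zero-mean assumed for v and c):
    v_i = x_i**2, c_i = x_i*x_{i+1}, g_i = (x_i - x_{i+1})**2."""
    v = x ** 2                     # elemental variances
    c = x[:-1] * x[1:]             # elemental lag-one covariances
    g = (x[:-1] - x[1:]) ** 2      # elemental lag-one variogram values
    return v, c, g

rng = np.random.default_rng(0)
x = rng.standard_normal(244)
v, c, g = elemental_series(x - x.mean())
# Averaging the elements recovers the global statistics (up to the n vs n-1 divisor).
print(v.mean(), c.mean(), g.mean())
print(c.mean() / v.mean())         # estimate of the first serial correlation phi
```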

A multivariate extension, particularly useful for testing for change in a basin or region, is to compute the sequence V_t = cov(x_t, x_{t+1}), the covariance between the vectors of successive observations (for example monthly or annual streamflows at a set of sites). This can be generalised to the variance and the variogram if desired.

9.2.4 Grouping by threes to visualize and test for variability

A fairly simple method of visualizing the variation within a time series is to successively group, and average by threes, the elemental statistics such as {v}, {c} or {g}. This is easily achieved using a spreadsheet. The average of each successive sequence equals the global variance, covariance or variogram of the sample. The successive grouping of the elements assists in visually detecting shifts and trends in these statistics.

The approach is illustrated below for the case of the covariance, c. Let

c_i = x_i x_{i+1}, i = 1, 2, ..., n−1 (9.6)

and define

R3_j = (c_{3j−1} + c_{3j} + c_{3j+1})/3, j = 1, 2, ..., (n−1)/3 (9.7)

R9_j = (R3_{3j−1} + R3_{3j} + R3_{3j+1})/3, j = 1, 2, ..., (n−1)/(3×3) (9.8)

etc.,

with the final sequence containing at least 3 groups. For example, if n = 244, the last grouping will be R81 with three elements in it. Of course, the cascade can be computed with any number of values forming the sequences of separate partial sums, but the symmetry associated with the odd numbers centres the successive sums at the mid-points of the intervals they summarize, which is nice for plotting.
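A minimal sketch of this grouping cascade, assuming the indexing of equations (9.6) to (9.8), is given below; the function name and the random test series are illustrative only.

```python
import numpy as np

def group_by_threes(values) -> list:
    """Successively average a sequence in non-overlapping groups of three, as in
    equations (9.7)-(9.8), stopping once fewer than three groups would remain."""
    levels = [np.asarray(values, dtype=float)]
    while len(levels[-1]) // 3 >= 3:
        prev = levels[-1]
        usable = (len(prev) // 3) * 3                # drop any trailing remainder
        levels.append(prev[:usable].reshape(-1, 3).mean(axis=1))
    return levels

# For n = 244 the covariance series has 243 elements, so the cascade yields
# R3 (81 values), R9 (27), R27 (9) and R81 (3).
rng = np.random.default_rng(1)
x = rng.standard_normal(244)
c = x[:-1] * x[1:]
for level in group_by_threes(c):
    print(len(level), round(level.mean(), 4))
```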

Once such a plot has been made, it is relatively straightforward to decide whether φ is stationary or not. Where the R values appear to exhibit a trend, this can be treated by splitting the series in two and comparing the two estimates of the serial correlation from the two sub-samples (found by averaging the covariances c_i), using a t-test with Moolman's modification for correlation between the c_i values. A more convenient way may be to use a non-parametric test on parts of the sequence {c}.

9.2.5 Windowing

An alternative to grouping by threes is a moving window wherein the statistic of interest is calculated. This is particularly appropriate in the calculation of the Hurst coefficient, for example. A development and exposition of this technique is given by Radziejewski & Kundzewicz (1997).
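A generic moving-window sketch is shown below; for brevity the sample variance stands in for the Hurst-coefficient estimator of Radziejewski & Kundzewicz (1997), and all names and data are illustrative.

```python
import numpy as np

def moving_window_stat(x, window, stat=np.var):
    """Evaluate a statistic in a window that slides one step at a time.
    A Hurst-coefficient estimator could be passed in place of np.var."""
    x = np.asarray(x, dtype=float)
    return np.array([stat(x[i:i + window]) for i in range(len(x) - window + 1)])

rng = np.random.default_rng(2)
x = rng.standard_normal(500)
w = moving_window_stat(x, window=60)   # sample variance in 60-value windows
print(len(w), w[:3])
```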

9.2.6 Exponential filter

To examine change in the occurrence of flood-producing rainfall, one needs to extract the annual maxima of the accumulation of rain, not the 1-hour or 1-day maxima. Exponential filtering, with a mean of 5 days for example, will yield maximum accumulations that are more appropriate for flood analysis. These can then be examined for change.
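The following sketch illustrates the idea on synthetic daily rainfall; interpreting "a mean of 5 days" as an exponential kernel with a 5-day time constant is an assumption here, as are the data and parameter values.

```python
import numpy as np
import pandas as pd

# Synthetic daily rainfall (mm); the gamma parameters are purely illustrative.
rng = np.random.default_rng(3)
dates = pd.date_range("1961-01-01", "1990-12-31", freq="D")
rain = rng.gamma(shape=0.4, scale=8.0, size=len(dates))

# Recursive exponential filter; "a mean of 5 days" is interpreted here as an
# exponential kernel with a 5-day time constant (an assumption).
alpha = 1.0 - np.exp(-1.0 / 5.0)
filtered = np.empty_like(rain)
acc = 0.0
for i, r in enumerate(rain):
    acc = alpha * r + (1.0 - alpha) * acc
    filtered[i] = acc

# Annual maxima of the filtered accumulation, ready for change testing.
annual_max = pd.Series(filtered, index=dates).groupby(dates.year).max()
print(annual_max.head())
```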

9.2.7 The bootstrap

A method which has promise when used in combination with the sequences of elemental variances, covariances or variograms is the bootstrap. It is used here to compare the slopes of the linear trend lines fitted to the {v}, {c} or {g} values. Because of linearity, these trends are almost identical to the trend lines fitted through the derived R values (equations 9.7 and 9.8).

Again, the method is illustrated for the case of the covariance.

1 The first step is to take the series in question and estimate the slope b of the linear trend line of {c} by least squares.

2 The next step is to analyse the time series, assuming that it is stationary, and fit an appropriate ARMA(p,q) model (p, q = 0, 1 or 2) using the usual Box-Jenkins approach. Take the residuals {a_i}, which under the null hypothesis form an independent sequence. From these, by sampling with replacement, generate a large odd number (say 101) of bootstrap sequences {x_i}* of the same length as the original sample, using the parameters of the ARMA model estimated from the original sample. From here on one can choose how to test for trend (see below for an example based on pairwise covariances using the grouped-by-threes approach).

3 In exactly the same way as in the original sample, calculate the bootstrapped {c}* series for each generated sequence and fit its linear trend line by least squares to obtain the slope b*.

4 Accumulate all the 101 b* values and compute their mean, median, standard deviation and fourth spread (interquartile range).

The resulting statistics give the distribution of b under the null hypothesis of the sequence being stationary and provide a guide as to whether b measured in the original sample is significantly different from zero.
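A compact sketch of steps 1 to 4 is given below; for brevity an AR(1) fit stands in for the full ARMA(p,q) selection, and the function names and test data are illustrative only.

```python
import numpy as np

def trend_slope(y):
    """Least-squares slope of y against its index."""
    return np.polyfit(np.arange(len(y)), y, 1)[0]

def bootstrap_covariance_slope(x, n_boot=101, seed=0):
    """Sketch of the bootstrap test on the slope of the pairwise covariances.
    An AR(1) model stands in for the full ARMA(p,q) selection of step 2."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float) - np.mean(x)

    b_obs = trend_slope(x[:-1] * x[1:])                   # step 1: observed slope of {c}

    phi = np.sum(x[:-1] * x[1:]) / np.sum(x[:-1] ** 2)    # step 2: fit AR(1)
    resid = x[1:] - phi * x[:-1]                          # residual sequence {a}

    b_star = []
    for _ in range(n_boot):                               # steps 2-3: resample and refit
        a = rng.choice(resid, size=len(x), replace=True)
        xb = np.empty(len(x))
        xb[0] = a[0]
        for t in range(1, len(x)):
            xb[t] = phi * xb[t - 1] + a[t]
        b_star.append(trend_slope(xb[:-1] * xb[1:]))
    b_star = np.asarray(b_star)

    # step 4: summary of the null distribution of the slope
    return b_obs, b_star.mean(), b_star.std(ddof=1), np.quantile(b_star, 0.95)

rng = np.random.default_rng(4)
x = rng.standard_normal(244)
print(bootstrap_covariance_slope(x))
```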

9.3 Examples

In this section, the methods described above are illustrated using a set of examples.

9.3.1 An artificial sequence

Fig. 9.1 is a plot of an artificial, standardized sequence of length 244, which was generated with a first serial correlation coefficient, r, that increases linearly from 0.0 to 0.4 over the length of the sequence. This sequence will be used to examine how to detect trend. It can be seen from the trace in the figure that the differences between successive values reduce as the correlation increases, as expected.

Fig. 9.1. Artificial standardized normal time series generated with r increasing linearly from 0.0 to 0.4.

Fig. 9.2 shows the pairwise covariances c_i = x_i x_{i+1} for the sequence shown in Fig. 9.1. The trend line is fitted using linear least squares. Fig. 9.2 also shows the c_i values successively grouped by threes (3, 9, 27, 81) with a linear trend fitted through the R27 values (averaged values of the successive 27-long sub-sequences). This gives a close approximation to the underlying trend.

Bootstrap resampling of the trend-line slope, using 101 sequences generated by the algorithm outlined above and fitting a line through the pairwise covariances of each, yields the following statistics for the 101 slopes:

• Mean 0.00015

• Standard deviation 0.00128

• Upper 95% confidence limit 0.00277

The observed value for the original series is 0.0017. Comparing this with the above statistics shows that the trend of the covariances is not significant at this level. This is confirmed by examining the list of slopes of the bootstrap samples, for which 10 of the 101 slopes exceeded the sample value.

Fig. 9.3 shows the pairwise lag-one elements (all positive) of the variogram grouped by threes. It can be seen that the slope of the trend line is negative, confirming that the higher the persistence, the smaller the differences between successive values. The bootstrap samples of the slopes of the elemental variogram values give a lower 95% confidence limit of -0.00449, which is just above the sample value of -0.0048. Only 2 of the 101 bootstrapped slopes were below this value, lending support to its significance.

From this exploratory calculation on an artificial sample, it seems that the sequence of variogram elements is more sensitive for detecting trend than is that of the covariance.

Fig. 9.2. Pairwise covariances grouped by threes for the sequence in Fig. 9.1.

Fig. 9.3. Pairwise lag-one variogram grouped by threes for sequence in Fig. 9.1.

9.3.2 Tree ring data from South Africa

Figure 9.4 presents a sequence of raw tree ring data that Moolman used in his thesis (1985).

Performing a Box-Cox symmetrizing transform with exponent 0.459 yields a value of 0.619 for the serial correlation. To test whether the mean of the first 20 years is the same as that of the remainder under persistence, Student's t-test is calculated assuming unequal variances. This yields a one-tailed probability of exceedance of 0.011, corresponding to a z-value of 2.29. Multiplying this z-value by (1 - 0.619) gives 0.872, which has a probability of exceedance of 0.192; the difference is therefore not significant.
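The arithmetic of this example can be checked directly with a short sketch; the figures used are those quoted above.

```python
from scipy.stats import norm

z, phi = 2.29, 0.619
z_adj = z * (1.0 - phi)                                   # Moolman's adjustment
print(round(z_adj, 3), round(1.0 - norm.cdf(z_adj), 2))   # ~0.872 and ~0.19
```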

References

Mann, H.B. & Whitney, D.R., 1947. On a test of whether one of two random variables is stochastically larger than the other. Ann. Math. Statist., 18, pp. 50-60.

Moolman, W.H., 1985. A Study of Tests for First Order Stationarity. Unpublished PhD thesis, University of Durban-Westville, Durban, South Africa.

Radziejewski, M. & Kundzewicz, Z.W., 1997. Fractal analysis of flow in the river Warta. J. of Hydrol., vol. 200, pp. 280-294.

Wilcoxon, F., 1945. Individual comparisons by ranking methods. Biometrics, 1, pp. 80-83.

CHAPTER 10 SEGMENTATION

Pierre Hubert

10.1 Introduction

A possible manifestation of nonstationarity in a time series is a change in its statistical parameters, especially a sudden change of the mean. Series with such a change may exhibit strong temporal persistence, with high values of the Hurst coefficient, yet fit any autoregressive model poorly.

Some classical tests (Pettitt, 1979; Buishand, 1982) help detect a possible change point of the mean, so that the original nonstationary series can be split into two stationary sub-series. The Bayesian procedure defined by Lee & Heghinian (1977) supposes the a priori existence of a change of the mean somewhere in the series and yields, at each time step, an a posteriori probability of a mean change.

Yet these classical approaches seek a single change point in the original series. To go further and explore multiple singularities, a segmentation procedure for time series has been developed (Hubert, 1997). It yields an optimal partition, from a least-squares point of view, of the original series into as many sub-series as possible. The Scheffé test of contrasts ensures that all differences between two contiguous means remain simultaneously significant. The main problem has been to master the combinatorial explosion while exploring the tree of all possible segmentations of a series.
