

5.2 Setting the scene

5.2.1 Some basics of statistical testing for change

This section provides an overview of some of the main statistical concepts and terminology required for the statistical testing of change. For further details on statistical testing the reader should refer to a standard introductory statistical textbook (e.g. Chatfield, 1970) or to texts such as Hirsch et al. (1992) and Helsel & Hirsch (1992).

Types of change

Change in a series can occur in numerous ways: e.g. steadily (a trend), abruptly (a step- change) or in a more complex form. It may affect the mean, median, variance, autocorrelation or almost any other aspect of the data.

The most widely used tests for change look for one of the following:

• Trend in the mean or median of a series

• Step-change in the mean or median of a series.

There are also some tests that look for a general change in distribution (Section 5.6).

Trend and step change are special cases of a change in distribution. Tests for a change in distribution are generally not particularly powerful: i.e. if trend is present it would be best detected by a test for trend. However, such tests may be useful as a general check for evidence of change.

Testing for more complex types of change, and for measures other than the mean/median, generally requires the use of advanced techniques such as Maximum Likelihood (see also Section 5.3). Typically these techniques can be difficult to apply and are beyond the scope of this report.

Hypotheses

The starting point for a statistical test is to define the null and alternative hypotheses; these are statements that describe what the test is investigating. The null and alternative hypotheses are usually framed in terms of the types of change described above. For example, to test for trend in the mean of a series the null hypothesis (H0) would be that there is no change in the mean of the series, and the alternative hypothesis (H1) would be that the mean is either increasing or decreasing over time. To test for step-change in the mean of a series, the null hypothesis would again be that there is no change in the mean of the series, but the alternative hypothesis would be that the mean has changed abruptly at some point.

The starting point for statistical testing is to assume that the null hypothesis is true, and then to check whether the observed data are consistent with this hypothesis. The null hypothesis is rejected if the data are not consistent.

Test statistic

The test statistic is a means of comparing the null and alternative hypotheses. It is simply a numerical value calculated from the data series being tested. A good test statistic is designed to highlight the difference between the two hypotheses. A simple example of a test statistic is the linear regression gradient: this can be used to test for a trend in the mean. If there is no trend (the null hypothesis) then the regression gradient should have a value near to zero. If there is a large trend in the mean (the alternative hypothesis) then the value of the regression gradient would be very different from zero. More formally, to carry out a statistical test it is necessary to compare the observed test statistic with the expected distribution of the test statistic under the null hypothesis. The significance level of a test statistic expresses this concept more formally.
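To make this concrete, the short sketch below (Python with NumPy; the series and its values are invented for illustration and do not come from the report) computes the regression gradient of a series as a test statistic for trend in the mean:

```python
import numpy as np

def regression_gradient(values):
    """Least-squares slope of a series against time (index 0, 1, 2, ...).

    Under the null hypothesis of no trend the slope should be close to zero;
    a marked trend pushes it well away from zero.
    """
    t = np.arange(len(values), dtype=float)
    # np.polyfit returns the polynomial coefficients, highest degree first.
    slope, _intercept = np.polyfit(t, np.asarray(values, dtype=float), 1)
    return slope

# Invented example: 30 annual values with a weak upward drift plus noise.
rng = np.random.default_rng(42)
series = 100 + 0.5 * np.arange(30) + rng.normal(0, 5, size=30)
print(regression_gradient(series))  # expected to be roughly the true slope, 0.5
```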

Significance level

The significance level is a means of measuring whether a test statistic is very different from the values that would typically occur under the null hypothesis. Specifically, the significance level is the probability of obtaining a value as extreme as, or more extreme than, the observed value, assuming “no change” (the null hypothesis). In other words, the significance level is the probability that the test appears to detect a change (e.g. a trend) when none is actually present.

A possible interpretation of the significance level might be:

• Significance level > 10 % - very little evidence against the null hypothesis (H0)

• 5 % to 10 % - possible evidence against H0

• 1 % to 5 % - strong evidence against H0

• below 1 % - very strong evidence against H0.

Note that when reporting results the actual significance levels should normally be quoted (e.g. a significance level of 6.5 %).

For many traditional statistical methods, significance levels can be looked up in reference tables or calculated from simple formulae, providing the required test assumptions apply. In general, the significance level can be found if the distribution of the test statistic under the null hypothesis (i.e., assuming the null hypothesis is true) is known or can be estimated. One case where this distribution is usually easy to determine is where the data are independent and normally distributed. Resampling methods provide an alternative, robust method of estimating the test statistic distribution in a general case.
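As an illustration of the resampling idea, the sketch below estimates a significance level for the regression-gradient statistic by permutation, assuming the data are independent; the function name and settings are hypothetical choices for this example:

```python
import numpy as np

def permutation_significance(values, n_resamples=9999, seed=0):
    """Two-sided significance level for trend, estimated by permutation.

    The observed regression gradient is compared with the distribution of
    gradients obtained when the time ordering is destroyed by shuffling,
    which mimics the null hypothesis of no trend (assuming independence).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(values, dtype=float)
    t = np.arange(len(y), dtype=float)

    def gradient(v):
        return np.polyfit(t, v, 1)[0]

    observed = abs(gradient(y))
    exceedances = sum(abs(gradient(rng.permutation(y))) >= observed
                      for _ in range(n_resamples))
    # The "+1" counts the observed arrangement itself, keeping the estimated
    # significance level away from exactly zero.
    return (exceedances + 1) / (n_resamples + 1)

# Hypothetical usage on an invented series with a clear trend:
# rng = np.random.default_rng(1)
# series = 100 + 0.5 * np.arange(30) + rng.normal(0, 5, size=30)
# print(permutation_significance(series))
```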

Power and errors

There are two possible types of error that can occur in a test result. The first is that the null hypothesis is incorrectly rejected (a type I error); the significance level expresses the probability of this error. The second is that the null hypothesis is accepted when the alternative hypothesis is true (a type II error). A test with a low type II error probability is said to be powerful; the power of a test is the probability of correctly detecting a change (e.g. a trend) when one is present. In general, more powerful tests are to be preferred.
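Power is often easiest to appreciate through simulation. The sketch below estimates the power of an ordinary least-squares regression test against a known, assumed trend; the trend size, noise level and significance threshold are illustrative assumptions only:

```python
import numpy as np
from scipy import stats

def estimated_power(n_years=30, slope=0.3, noise_sd=5.0,
                    alpha=0.05, n_simulations=2000, seed=1):
    """Fraction of simulated trended series in which the test rejects H0.

    Each synthetic series has a known linear trend plus independent normal
    noise; the test applied is least-squares regression against time
    (scipy.stats.linregress), rejecting H0 when its p-value is below alpha.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(n_years)
    rejections = 0
    for _ in range(n_simulations):
        y = slope * t + rng.normal(0.0, noise_sd, size=n_years)
        if stats.linregress(t, y).pvalue < alpha:
            rejections += 1
    return rejections / n_simulations

print(estimated_power())  # fraction of runs in which the trend was detected
```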

5.2.2 Cautionary words

Poorly understood data gives poor results

Statistical tests can easily be misapplied unless the data are thoroughly understood. A prerequisite before undertaking any formal statistical testing is that the data should have been quality controlled (Chapters 2 and 3) and that an exploratory analysis (Chapter 4) should have been carried out. Statistical testing is a clear case where the “Garbage In, Garbage Out” principle applies.

Inappropriate test assumptions are dangerous

If the assumptions made in a statistical test are not fulfilled by the data then test results can be meaningless. For example, many statistical tests are founded on an assumption that the data being tested are normally distributed. If the data follow a strongly non-normal distribution then the test results cannot be trusted. Another common assumption that can lead to highly misleading test results if ignored is that data values are independent. Many hydrological data sets either show autocorrelation (correlation from one time value to the next: also referred to as serial correlation or temporal correlation) or spatial correlation (correlation between sites) and therefore data values are not independent. It is very important to understand what restrictions apply to a particular statistical test, and in what situations it is valid to apply the test.
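One simple precaution is to estimate the lag-1 (serial) correlation of a series before applying a test that assumes independence. A minimal sketch follows; the rule of thumb used to flag a possible problem is a common large-sample guide, offered here only as an illustration:

```python
import numpy as np

def lag1_autocorrelation(values):
    """Sample lag-1 autocorrelation coefficient of a series."""
    y = np.asarray(values, dtype=float)
    y = y - y.mean()
    return np.sum(y[1:] * y[:-1]) / np.sum(y * y)

# Hypothetical usage: a value well outside +/- 2/sqrt(n) (a rough 95 % band
# for white noise) warns that the independence assumption behind many
# standard tests may not hold.
# r1 = lag1_autocorrelation(series)
# if abs(r1) > 2 / np.sqrt(len(series)):
#     print("serial correlation may distort significance levels")
```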

A statistical test provides evidence not proof

Statistical tests give results that are expressions of probability, not certainty. There is always a chance that the null hypothesis was in fact true when a test result suggests it should be rejected. Similarly, if the null hypothesis is accepted, this result says only that the available evidence does not contradict the null hypothesis; it is not proof that the null hypothesis is true.

Each statistical test frames only a very specific question

There is no universal test that can prove that a series is truly free of any change. For example, a test result showing no conclusive evidence of a trend in the mean does not establish that the variance of the same series is unchanged, or that the frequency and magnitude of extremes are unchanged.

Tests can be significant for the ‘wrong’ reason

Even if a test result is significant it does not prove that the hypothesised change has taken place. For example, if there has been a marked step change in a data series then it is likely that a test for trend will give significant results — even though there is no trend. Often a test can only be correctly interpreted if it is examined alongside plots of the data, and with some understanding of possible causes of change.

Significance is not the same as importance

A test result may be highly significant (i.e. provide strong evidence against the null hypothesis), but the size of the observed change may be so small that it is of no practical importance. Conversely, an important level of change might not be statistically significant because noise in the data means it cannot be distinguished from the null hypothesis.

5.2.3 The components of testing for change

The main stages in statistical testing are as follows (a brief illustrative sketch is given after the list):

• Decide what type of series/variable to test, depending on the issues of interest (e.g. monthly averages, annual maxima, deseasonalized data, etc.)

• Decide what types of change are of interest (trend/step-change)

• Check the data assumptions (Section 5.5)

• Select one or more tests/test statistics that are appropriate for each type of change (using more than one is good practice: Section 5.6).

• Evaluate significance levels, using resampling methods if needed (Section 5.3, 5.4)

• Investigate and interpret results (Section 5.7).
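The brief sketch below walks through these stages on a synthetic series; the data, the chosen tests (Kendall's tau and linear regression) and the checks are assumptions made for illustration rather than recommendations from this report:

```python
import numpy as np
from scipy import stats

# 1. Series/variable to test: an invented set of annual maxima.
rng = np.random.default_rng(7)
years = np.arange(1980, 2020)
annual_maxima = 50 + 0.4 * np.arange(years.size) + rng.normal(0, 6, years.size)

# 2. Type of change of interest: trend in the mean/median.

# 3. Check a key data assumption (independence) via the lag-1 autocorrelation.
d = annual_maxima - annual_maxima.mean()
r1 = np.sum(d[1:] * d[:-1]) / np.sum(d * d)

# 4./5. Apply more than one test and evaluate their significance levels:
#       a rank-based test (Kendall's tau) and ordinary linear regression.
tau, p_kendall = stats.kendalltau(years, annual_maxima)
reg = stats.linregress(years, annual_maxima)

# 6. Interpret the results alongside plots of the data.
print(f"lag-1 autocorrelation: {r1:.2f}")
print(f"Kendall's tau: {tau:.2f} (significance level {p_kendall:.3f})")
print(f"regression slope: {reg.slope:.2f} (significance level {reg.pvalue:.3f})")
```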
