
SPECIFY, HYPOTHESIZE, ASSUME, OBTAIN, TEST, OR PROVE?

I am constantly amazed that many researchers don't understand the differences among the six verbs "specify", "hypothesize", "assume", "obtain", "test", and "prove".

An example

Consider the following example: You're interested in the relationship between height and weight, and you would like to carry out a study of that relationship for a simple random sample of some large population.

What do you SPECIFY? If you plan to use traditional statistical inference (significance testing) you need to specify the magnitudes of tolerable probabilities of Type I (alpha) and Type II (beta) errors before you see the data. (For the latter you can specify the power you want rather than the tolerable probability of a Type II error, where power = 1 - beta.) If you plan to use interval estimation you need to specify how confident you want to be in the finding you'll get and the tolerable margin of error (half-width of the confidence interval), also before you see the data.

What do you HYPOTHESIZE? If you plan to use significance testing you need to hypothesize both a null value (or set of values) for a particular parameter and an alternative value (or set of values) for that parameter. If you plan to use interval estimation you need not, nay cannot, hypothesize any values beforehand.

What do you ASSUME? For significance testing you need to assume the independence of the observations and random sampling (which you have), and you might need to assume a normal distribution of the observations in the population from which the sample is to be drawn. You might also need to assume homogeneity of variance, homogeneity of regression, and/or other things. For interval estimation the assumptions are the same. For Bayesian inference you need to consult your local friendly statistician.

What do you OBTAIN? For both significance testing and interval estimation the first thing you obtain is the appropriate sample size necessary for your specifications, before you embark upon the study. Upon completion of the study you obtain the relevant descriptive statistics, p-values, actual confidence intervals, and the like.
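To make that first OBTAIN step concrete, here is a minimal Python sketch (an illustration only, based on the common Fisher z approximation rather than any particular package) of obtaining the sample size for a significance test of a correlation, given pre-specified alpha and power and a particular alternative value; the function name and its default values are merely illustrative.

    # Sketch: approximate n for testing H0: rho = 0 against a specific
    # alternative value, at pre-specified alpha and power (Fisher z approximation).
    from math import ceil
    import numpy as np
    from scipy.stats import norm

    def n_for_correlation_test(rho_alt, alpha=0.05, power=0.80, two_sided=True):
        z_alpha = norm.ppf(1 - alpha / 2) if two_sided else norm.ppf(1 - alpha)
        z_beta = norm.ppf(power)               # power = 1 - beta
        effect = np.arctanh(rho_alt)           # Fisher z of the alternative rho
        return ceil(((z_alpha + z_beta) / effect) ** 2 + 3)

    # e.g., alpha = .05 (two-sided), power = .80, alternative rho = .60
    print(n_for_correlation_test(0.60))        # about 20 observations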

What do you TEST? For significance testing you test the null hypothesis against the alternative hypothesis. For interval estimation there is no hypothesis to test.

What do you PROVE? Nothing. (More about that below.)

So what's the problem?

1. Some people say you calculate (obtain) power for a study. No, you specify the power you want (directly; or indirectly by specifying the tolerable probability of a Type II error, which is 1 minus power). There is such a thing as post hoc power in which power is calculated after the fact for the effect size actually obtained, but it is a worthless concept. See below for more about post hoc power.

2. Some people say you specify the sample size. No, unless you're stuck with a particular sample size. As indicated above, you determine (calculate, obtain) the appropriate sample size.

3. Some people say you assume the null hypothesis to be true until, or unless, rejected. No, you hypothesize it to be true (although you usually hope that it isn't!), along with an alternative hypothesis, which you usually hope to be true.

4. Some people say you hypothesize that the population distribution is normal. No, you assume that (sometimes).

5. Some people say you prove the null hypothesis to be true if you don't reject it. No, you calculate the probability of getting the statistic you got, or anything more discrepant from the null-hypothesized parameter, if the null hypothesis is true. If that conditional probability is greater than your pre-specified alpha level, you cannot reject the null-hypothesized parameter. But that doesn't mean you've proven it to be true.

Some of those same people say you prove the null hypothesis to be false if you reject it. No; if the conditional probability is less than your pre-specified alpha you reject the null-hypothesized parameter. But that doesn't mean you've proven it to be false.
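As a small illustration of that decision rule (with simulated heights and weights standing in for real data; nothing below comes from an actual study), here is a Python sketch that compares the p-value for a sample correlation to a pre-specified alpha:

    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(1)
    height = rng.normal(170, 10, size=40)                 # simulated data only
    weight = 0.9 * height + rng.normal(0, 12, size=40)

    alpha = 0.05                                          # specified before seeing the data
    r, p = pearsonr(height, weight)
    if p < alpha:
        print(f"r = {r:.2f}, p = {p:.4f} < alpha: reject the null (not 'proven false')")
    else:
        print(f"r = {r:.2f}, p = {p:.4f} >= alpha: cannot reject the null (not 'proven true')")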

Back to the example

What should you do?

a. If you're going to use significance testing, you should first SPECIFY alpha and beta. The conventional specifications are .05 for alpha and .20 for beta (power of .80), but you should preferably base your choices on the consequences of making Type I and Type II errors. For example, suppose your null hypothesis will be that the population correlation is equal to zero and you subsequently reject that hypothesis but it's true. There should be no serious consequence of being wrong, other than your running around thinking that there is a non-zero relationship between height and weight when there isn't. In that case you should feel free to specify a value for alpha that is more liberal than the traditional .05 (perhaps .10, double that probability?). If the null hypothesis of zero is pitted against an alternative hypothesis of, say, .90 (a strong relationship) and you subsequently do not reject the null but it's false, you will have missed a golden opportunity to be able to accurately predict weight from height. Therefore, you should feel free to decrease beta to .05 (increase power to .95) or even less.

b. If you're going to use interval estimation, you should first SPECIFY the maximum margin of error you will be able to tolerate when you make your inference from sample to population, along with the associated specification of how confident you want to be in making that inference. The former might be something like .10 (you'd like to come that close to the population correlation). The latter is conventionally taken to be 95% but, like alpha and beta in significance testing, is always "researcher's choice".

c. Once those specifications have been made, the next step is to use one of the various formulas and tables that are available for determining (OBTAINING) the sample size that will satisfy the specifications. If you've intellectualized things properly, it will be a "Goldilocks sample" (not too large, not too small, but just right).
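For the interval estimation specifications in (b), here is a comparable Python sketch of obtaining the smallest n whose Fisher z confidence interval half-width, around a planning value of the correlation, does not exceed the specified margin of error. The function and the planning value of zero are illustrative assumptions, not a standard routine:

    import numpy as np
    from scipy.stats import norm

    def n_for_margin(margin, confidence=0.95, planning_r=0.0, n_max=100_000):
        z_crit = norm.ppf(1 - (1 - confidence) / 2)
        zr = np.arctanh(planning_r)
        for n in range(4, n_max):
            lo = np.tanh(zr - z_crit / np.sqrt(n - 3))    # Fisher z interval limits
            hi = np.tanh(zr + z_crit / np.sqrt(n - 3))
            if max(hi - planning_r, planning_r - lo) <= margin:
                return n
        raise ValueError("margin not attainable within n_max")

    # e.g., 95% confidence and a tolerable margin of .10 around a planning value of 0
    print(n_for_margin(0.10))                             # a few hundred observations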

d. For significance testing you are now ready to HYPOTHESIZE: one value (or set of values) for a parameter for the null hypothesis, and a competing value (or set of values) for the alternative hypothesis. For a study of the relationship between two variables (in your case, height and weight), the null-hypothesized parameter is almost always zero, i.e., the conservative claim that there is no relationship. Things are much trickier for the alternative hypothesis. You might want to hypothesize a particular value other than zero, e.g., .60, if you believe that the relationship is positive and reasonably large. (You probably would not want to hypothesize something like .98 because you can't imagine the relationship to be that strong.) Or you might not want to stick your neck out that far, so you might merely hypothesize that the correlation in the population is positive. (That is the conventional alternative hypothesis for a relationship study, whether or not it is actually stated.) There are other possibilities for the alternative hypothesis, but those should do for the present.

e. For interval estimation you get off easy, because there are no values to hypothesize. You have made your specifications regarding tolerable margin of error and degree of confidence, but you are uninterested, unwilling, or unable to speculate about what the direction or the magnitude of the relationship might be.

No matter whether you choose to use significance testing or interval estimation, if the Pearson product-moment correlation coefficient is to be the statistic of principal interest you will need to ASSUME that in the population there is a bivariate normal distribution of height and weight.

f. You're now ready to draw (OBTAIN) your sample, collect (OBTAIN) the actual heights and weights, and calculate (OBTAIN) the sample correlation. If you've chosen the significance testing approach you can TEST the null hypothesis of no relationship against whatever alternative hypothesis you thought to be relevant, and see whether the p-value corresponding to the sample correlation is less than or greater than your pre-specified alpha. If it is less, the sample correlation is statistically significant; if it is greater, the sample correlation is not. If you've chosen the interval estimation approach, you can construct (OBTAIN) the confidence interval around the sample correlation and make the inference that you are X% confident that the interval "captures" the unknown population correlation.
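Here is a sketch of step (f) in Python, with simulated heights and weights standing in for the data you would actually collect: obtain r and its p-value, and construct a Fisher z confidence interval around r.

    import numpy as np
    from scipy.stats import pearsonr, norm

    rng = np.random.default_rng(7)
    n = 100
    height = rng.normal(170, 10, size=n)                  # simulated data only
    weight = 0.9 * height + rng.normal(0, 12, size=n)

    r, p = pearsonr(height, weight)                       # OBTAIN r and its p-value

    conf = 0.95
    z_crit = norm.ppf(1 - (1 - conf) / 2)
    zr = np.arctanh(r)
    lo, hi = np.tanh([zr - z_crit / np.sqrt(n - 3), zr + z_crit / np.sqrt(n - 3)])

    print(f"r = {r:.3f}, p = {p:.4g}")
    print(f"{conf:.0%} CI for the population correlation: ({lo:.3f}, {hi:.3f})")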

g. You will not have PROVEN anything. If you've chosen the significance testing route you will have made the correct inference, or you will have made either a Type I error (by rejecting a true null) or a Type II error (by not rejecting a false null); you cannot make both, because you can't both reject and not reject the null. Alas, you will never know for sure whether you're right or not, but "the odds" will usually be in your favor. Similarly, if you've chosen interval estimation your inference that the parameter has been captured or has not been captured can be either right or wrong, and you won't know which. But once again "the odds" will be in your favor. That should be comforting.
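If "the odds will be in your favor" sounds mysterious, a small simulation can make it concrete. The sketch below (a toy Monte Carlo, not anything from the example above) draws repeated samples from a population in which the correlation really is zero: the test wrongly rejects about alpha of the time, and the 95% interval captures zero about 95% of the time.

    import numpy as np
    from scipy.stats import pearsonr, norm

    rng = np.random.default_rng(0)
    alpha, conf, n, reps = 0.05, 0.95, 50, 5000
    z_crit = norm.ppf(1 - (1 - conf) / 2)

    rejections = captures = 0
    for _ in range(reps):
        x = rng.normal(size=n)
        y = rng.normal(size=n)                            # truly unrelated to x
        r, p = pearsonr(x, y)
        rejections += p < alpha                           # a Type I error when rejected
        zr = np.arctanh(r)
        lo, hi = np.tanh([zr - z_crit / np.sqrt(n - 3), zr + z_crit / np.sqrt(n - 3)])
        captures += lo <= 0 <= hi                         # interval captured rho = 0

    print(f"Type I error rate: {rejections / reps:.3f} (should be near {alpha})")
    print(f"CI capture rate:   {captures / reps:.3f} (should be near {conf})")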

What you should not do

The first thing you should not do is use both significance testing AND interval estimation. As you might already know, a confidence interval consists of all of the values of a parameter that are "unrejectable" with a significance test, so the two convey essentially the same information. There is an unfortunate tendency these days to report the actual p-value, e.g., .003, from a significance test ALONG WITH a confidence interval (usually 95%) around the obtained statistic.
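To see the redundancy, here is a Python sketch (using the Fisher z test and interval, with toy data) showing that a hypothesized value lies inside the 95% confidence interval exactly when the corresponding two-sided test fails to reject it at alpha = .05:

    import numpy as np
    from scipy.stats import pearsonr, norm

    rng = np.random.default_rng(3)
    n = 80
    x = rng.normal(size=n)
    y = 0.5 * x + rng.normal(size=n)                      # toy data only

    r, _ = pearsonr(x, y)
    z_crit = norm.ppf(0.975)
    zr, se = np.arctanh(r), 1 / np.sqrt(n - 3)
    lo, hi = np.tanh([zr - z_crit * se, zr + z_crit * se])

    for rho0 in (0.0, 0.2, 0.4, 0.6, 0.8):
        p = 2 * norm.sf(abs(zr - np.arctanh(rho0)) / se)  # Fisher z test of rho = rho0
        print(f"rho0 = {rho0:.1f}: p = {p:.3f}, inside 95% CI: {lo <= rho0 <= hi}")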

The second thing you should not do is report the so-called post hoc (or retrospective or observed) power, along with or (worse yet) instead of the a priori (ordinary) power. Post hoc power adds no important information, but has unfortunately been incorporated into some computer packages, e.g., SPSS's Analysis of Variance routines. It is perfectly inversely related to the p-value.
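The sketch below shows why observed power merely restates the p-value. It uses a normal-theory approximation rather than SPSS's own routine, and the function is illustrative; for a fixed alpha, observed power is a one-to-one, decreasing function of the p-value, equal to about .50 whenever p equals alpha.

    from scipy.stats import norm

    def posthoc_power_from_p(p, alpha=0.05):
        """Observed ('post hoc') power implied by a two-sided p-value."""
        z_crit = norm.ppf(1 - alpha / 2)
        z_obs = norm.isf(p / 2)                           # |test statistic| implied by p
        return norm.sf(z_crit - z_obs) + norm.cdf(-z_crit - z_obs)

    for p in (0.50, 0.20, 0.05, 0.01, 0.001):
        print(f"p = {p:<6} observed power = {posthoc_power_from_p(p):.3f}")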

Both things drive me up a wall. Please don't do either of them. Thank you.

CHAPTER 22: THE INDEPENDENCE OF OBSERVATIONS