Hypothesis: The Research Page
Part 3: Power, sample size, and clinical significance

Marshall Godwin, MD, CCFP, FCFP

VOL 47: JULY • JUILLET 2001 | Canadian Family Physician | Le Médecin de famille canadien | 1441

Resources

Ressources

This third article in a series on basic statistics deals with power and sample size, α and β errors, and clinical and statistical significance.1-3

In plain English

Confidence intervals (CIs) and P values help us determine the likelihood that a difference in a study is due to chance. Power deals with the opposite issue: if we do not see a statistical difference, how can we be sure there really is no difference?

The chance factor (P value), set arbitrarily at 5% or .05 and accepted as the standard, is also called α. If we set α too high, say at 10% or .1, we run the risk of making an "α error," where we say a difference exists when in reality it was due to chance. If we set α too low, we run the risk of missing a difference that does exist. Concluding that a difference does not exist when it does is called a "β error."

By convention, a β of .2, or 20%, is considered the maximum acceptable. It seems we are more willing to risk making a β error (incorrectly concluding that a difference does not exist) than an α error (incorrectly concluding that a difference exists).

The power of a study is the degree to which we are certain that, if we conclude a difference does not exist, it in fact does not exist. It is calculated as 1 minus β (power = 1 − β). Since β is generally set at .2, the accepted level of power is 1 − .2 = .8, or 80%. The way to ensure a power of 80% is to do a sample size calculation, which answers the question, "If I want to be 95% certain that any difference I see is not due to chance, and 80% certain that, if I conclude there is no difference, I am correct, how many people do I need in this study?"
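The article does not give a formula, but the question above can be sketched with the standard normal-approximation sample size calculation for comparing two proportions; exact textbook formulas vary slightly, so treat the result as illustrative.

```python
from math import ceil
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group sample size to detect a change from
    prevalence p1 to p2 (normal-approximation sketch, not the
    article's own calculation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = .05
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for power = .80
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# The angelitis example discussed below: detecting a drop from 10% to 6%
n = sample_size_two_proportions(0.10, 0.06)
print(n)  # roughly 720 per group with this approximation
```

Different variants of the formula (pooled versus unpooled variance, continuity corrections) give somewhat different answers, which is why published tables and software do not always agree.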

If an article concludes no difference was found, the authors should tell you the level of certainty (power) with which they can make that conclusion.

Of course, if a statistical difference is seen (P < .05, or the 95% CI does not include 1), then by definition there was sufficient power. If the power is very high (ie, if the sample is huge compared with the number actually needed, so that power is, say, 99%), statistical differences can be seen even when the real clinical differences are very small. For instance, a relative risk (RR) of 1.2 with a 95% CI of 1.1 to 1.3 might be highly statistically significant; the CI is very narrow because of the large sample size. This means the difference is likely to be real and not due to chance, but is the difference clinically significant? Sometimes it is, depending on the seriousness of the issue. If the RR indicates a child is 1.2 times more likely to die within the next 3 months if exposed to X, it is highly clinically significant. If it indicates that people are 1.2 times more likely to get a runny nose if they go out in cold weather without a hat, it is less important.
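The narrow-CI point can be illustrated with the usual log-scale approximation for a risk ratio CI. The counts below are hypothetical, chosen only so the RR comes out near 1.2 with a very large sample.

```python
from math import exp, log, sqrt

def rr_confidence_interval(a, n1, c, n2, z=1.96):
    """Risk ratio and its approximate 95% CI via the log-scale method.
    a/n1 = risk in the exposed group, c/n2 = risk in the unexposed group.
    Counts here are hypothetical, for illustration only."""
    rr = (a / n1) / (c / n2)
    se = sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)  # SE of log(RR)
    return rr, exp(log(rr) - z * se), exp(log(rr) + z * se)

# A very large (hypothetical) study: 1200/10000 exposed vs 1000/10000 unexposed
rr, lo, hi = rr_confidence_interval(1200, 10000, 1000, 10000)
print(round(rr, 2), round(lo, 2), round(hi, 2))  # 1.2 with a CI of about 1.11 to 1.30
```

With tens of thousands of participants, the interval hugs 1.2 tightly, so the result is statistically significant even though the effect may or may not matter clinically.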

In statistical terms

Suppose the prevalence of angelitis in your city is 10%.

A new drug, angel dust, has been discovered that seems to be effective for treating angelitis. You want to do a study to determine how effective it actually is.

The first thing you need to consider is what a clinically significant decrease in angelitis would be. How much would the drug have to decrease the prevalence of angelitis to make it useful? You decide that, if angel dust can reduce the prevalence of angelitis from 10% to 6%, it would be worthwhile. You now have to determine how many people you need in your study to make your results statistically significant should they show a decrease to 6%.

You know that true prevalence in the untreated population is 10% (Figure 1A). If your city had a population of 100 000 and you randomly sampled 500 of these people, you might not get exactly 10% with angelitis. You might get 9% or 8%, or perhaps 12%. If you kept taking different 500-person samples, you would get a range of prevalences that followed a normal curve (Figure 1B), and 95% of all results would fall within two standard deviations (SD) of the middle of that curve; 5% would be outside (2.5% above and 2.5% below) those two SDs. The larger the samples (eg, 1000-person samples), the narrower and taller this normal curve would look (Figure 1C). If you took smaller samples, the curve would be spread out and flatter (Figure 1D).
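The width of that sampling curve is governed by the standard error of a proportion, which shrinks as the square root of the sample size grows. A minimal sketch of the Figure 1 idea:

```python
from math import sqrt

def prevalence_standard_error(p, n):
    """Standard error of a sample prevalence under simple random sampling."""
    return sqrt(p * (1 - p) / n)

# True prevalence 10%: the curve narrows as samples get larger
for n in (200, 500, 1000):
    se = prevalence_standard_error(0.10, n)
    lo, hi = 0.10 - 2 * se, 0.10 + 2 * se
    print(f"n={n}: about 95% of samples fall between {lo:.1%} and {hi:.1%}")
```

For 500-person samples this gives roughly 7% to 13%, which is why individual samples of 8%, 9%, or 12% are entirely unsurprising.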

How does this affect your study? Figure 2A shows the 10% prevalence rate with a normal curve from taking many 200-person samples. The 6% mark comes well within the normal curve of the population with a mean prevalence of 10%. This means that, if you see prevalence reduced to 6% after treatment with angel dust, it could be due to chance, because a sample taken from the population not taking angel dust could also give you the same result. To be 95% certain that angel dust is having an effect, we need to change the shape of that curve so that the 6% mark comes below 2 SDs of the normal curve. To do that, we increase the sample size to 750 and get the result shown in Figure 2B. Now, if the people taking angel dust have a 6% prevalence rate, we can say with 95% certainty that it is truly a difference and not likely to be a chance occurrence. The possibility that we are wrong is less than 5% (P < .05).
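The Figure 2 reasoning amounts to asking how many standard deviations the 6% result lies from the 10% mean; a short sketch using the same normal approximation:

```python
from math import sqrt

def z_for_observed_prevalence(p0, p_obs, n):
    """How many SDs an observed prevalence lies from the population mean p0,
    given the sampling curve for samples of size n."""
    return (p_obs - p0) / sqrt(p0 * (1 - p0) / n)

# 6% observed against a true 10% prevalence
print(z_for_observed_prevalence(0.10, 0.06, 200))  # about -1.9: inside 2 SDs, could be chance
print(z_for_observed_prevalence(0.10, 0.06, 750))  # about -3.7: well beyond 2 SDs
```

At n = 200 the 6% mark sits just inside the curve around 10%, so chance cannot be ruled out; at n = 750 it falls far outside, matching the article's argument.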

There is another problem, however. If, as we hoped, the true prevalence of angelitis in people taking angel dust is 6%, then the results we will get as we do more studies will not always be exactly 6%. They will follow a normal curve around 6% (Figure 3A).

At a sample size of 750, we thought we were safe, and we are if the result we get is 6% or very close to it, because that will show a statistical difference. But what if we get a result of 7.5%? It is inside 2 SD of the population curve where 10% is the mean (the curve on the right), so we say it is not significant. But it is also well within the normal distribution of the curve where 6% is the mean (the curve on the left). It could be coming from either population, so we could be making a mistake, a β error (where we say a difference does not exist when it does).

To avoid this, we must make the normal curves even narrower to decrease the overlap. We want the overlap to be 20% or less (remember the β of 20% and the power of 80% discussed above). Figure 3B shows the effect of increasing the sample size in both populations to 1200. A result of 7.5% is still not statistically significant, but the likelihood that we are making a mistake is smaller; there is less chance that the 7.5% result belongs to the population curve with 6% as its mean. The formulas for calculating sample size tell us exactly when the degree of overlap is 20% or less.
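The shrinking-overlap idea corresponds to power increasing with sample size. The sketch below uses a standard two-sample normal approximation; the article reasons instead from the overlap of the two bell curves, so its figure of 1200 need not match the numbers this particular formula produces.

```python
from math import sqrt
from statistics import NormalDist

def power_two_proportions(p1, p2, n, alpha=0.05):
    """Approximate power of a two-sample test of proportions, n per group
    (a normal-approximation sketch, not the article's overlap method)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    se0 = sqrt(2 * p_bar * (1 - p_bar) / n)              # SE if no true difference
    se1 = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n)      # SE under the true difference
    return NormalDist().cdf((abs(p1 - p2) - z_alpha * se0) / se1)

# Power to detect 10% vs 6% grows as both groups get bigger
print(power_two_proportions(0.10, 0.06, 750))
print(power_two_proportions(0.10, 0.06, 1200))
```

Whatever the exact convention, the direction is the same: more people per group, narrower curves, less overlap, fewer β errors.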

Figure 1. Estimating prevalence: A) True prevalence is 10%. B) Results of repeated samples (N=500) follow a normal bell curve. C) Larger samples (N=1000) increase height and decrease width of curve. D) Smaller samples (N=200) decrease height and increase width of curve. [Figure not reproduced; x-axes show prevalence, 1% to 17%.]

Figure 2. What happens when sample size is increased? A) Normal curve from many 200-person samples. B) Taller, narrower curve from 750-person samples. [Figure not reproduced; x-axes show prevalence, 1% to 17%.]


Sometimes it is not possible, for logistical reasons, to increase the sample size. The solution, apparent from the figures, is to accept being able to show statistical significance for a larger difference. If you decide to look for a decrease to 4% prevalence, you would need a smaller sample size because the means of the curves are further apart. You would have to accept the fact that, if you found that angel dust decreased the prevalence to 6%, you might not be able to say it was statistically significant because of the lower power.
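The trade-off between effect size and sample size follows directly from the sample size formula, since the difference between the means appears squared in the denominator. A self-contained sketch (same hedges as before: a normal-approximation formula, not the article's own calculation):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group sample size for two proportions
    (illustrative normal-approximation sketch)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_a + z_b) ** 2 * variance / (p1 - p2) ** 2)

# Looking for a bigger drop (10% -> 4%) needs far fewer people than 10% -> 6%
print(n_per_group(0.10, 0.06), n_per_group(0.10, 0.04))
```

Halving the target prevalence gap roughly quadruples the required sample size, which is why chasing small clinical differences is so expensive.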

Dr Godwin is an Associate Professor and Director of Research in the Department of Family Medicine at Queen’s University in Kingston, Ont.

References

1. Norman GR, Streiner DL. PDQ statistics. Philadelphia, Pa: B.C. Decker Inc; 1986.

2. Abramson JH. Making sense of data: a self-instruction manual on the interpretation of epidemiologic data. 2nd ed. New York, NY: Oxford University Press; 1994.

3. Sackett DL, Straus SE, Richardson WS, Rosenberg W, Haynes RB. Evidence-based medicine: how to practice and teach EBM. 2nd ed. Toronto, Ont: Churchill Livingstone; 2000.

Figure 3. Reducing β error: A) Results that fall within the overlap of the bell curves (N=750 each) could be a mistake. B) Increasing sample size (N=1200 each) reduces likelihood of making a mistake. [Figure not reproduced; x-axes show prevalence, 1% to 14%.]
