Hypothesis: the research page. Part 2: Confidence intervals and P values.



1044 Canadian Family Physician • Le Médecin de famille canadien, VOL 47: MAY • MAI 2001




This second article in a series on basic statistics discusses confidence intervals (CI) and P values.1-3 Odds ratios (OR) and relative risks (RR) tell us how much more likely an event is to occur in one situation than another. But how accurate are OR and RR? If we repeated a study, would we get the same result? We need to know how close the result is to the truth, or at least what the range of possible “truths” might be.

Confidence intervals

The range of possible truths is the confidence interval. The smaller the range, the closer we are to the truth. Generally, researchers use a 95% CI. We can be 95% certain that the true value lies within the range of values given by the 95% CI. This is not a strictly correct definition of a CI, but “little is lost [using this] less pure interpretation.”3 A 100% CI is not used because its limits would be infinity on either side, and that would be the same as not having a CI at all.
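As an illustration of where such a range comes from, the sketch below computes a relative risk and an approximate 95% CI from a two-by-two table. It assumes the common log-based approximation for the standard error of ln(RR), which the article itself does not specify; the function name and the event counts are hypothetical.

```python
import math

def relative_risk_ci(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.96):
    """Relative risk with an approximate CI (log method; z = 1.96 gives 95%)."""
    rr = (events_exposed / n_exposed) / (events_unexposed / n_unexposed)
    # Approximate standard error of ln(RR); an assumption, not the article's method.
    se_log_rr = math.sqrt(
        1 / events_exposed - 1 / n_exposed
        + 1 / events_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Hypothetical counts: 30 of 200 exposed and 15 of 200 unexposed had the event.
rr, lower, upper = relative_risk_ci(30, 200, 15, 200)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

With these made-up counts the RR is 2.0 and the interval runs from roughly 1.1 to 3.6. A larger z would widen the interval, which is why a 100% CI would stretch out without limit.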

P values and confidence intervals

If a study finds a difference between two groups, there are three possible explanations: the difference is real; the difference is due to chance; or the difference is due to bias or confounding caused by study design. We want the first explanation to be correct. To be certain, we have to rule out the other two possibilities. Bias and confounding are addressed in the study design or by controlling for factors statistically.

To determine whether the difference was due purely to chance, CI and P values are used. P values tell us only the probability that the difference occurred by chance. Confidence intervals, in addition to providing an estimate of the probability that the difference occurred by chance, also provide the range of possible differences. They give us a sense of how uncertain we are about the results.

If we are 95% sure that the difference between two groups is real and not due to chance, then that difference is said to be “significant.” Of course, if we are 95% sure the difference is not due to chance, there remains a 5% (.05) possibility that the difference is due to chance. This possibility is the P value. It seems that 5% (P < .05) has been chosen by convention as the level of risk we are willing to tolerate. We could use 10% (P < .10), but we would be taking a greater risk that the difference was due to chance. We could use 1% (P < .01), but then we would need to have a much bigger difference to be able to “prove” it was significant, so we might miss some important differences. We use 95% CI and we choose 95% (P < .05) as the level of certainty that a result is not due to chance.

Marshall Godwin, MD, CCFP, FCFP

Figure 1. Results of five fictitious studies given as relative risks, 95% confidence intervals, and P values (horizontal axis: relative risk, marked at .05, 0.1, 1, 10, and 20)

Study 1: RR = 2.3, 95% CI 0.9-4.8, P = .12, N = 400
Study 2: RR = 8.0, 95% CI .97-17.3, P = .07, N = 79
Study 3: RR = 1, 95% CI 0.5-2.0, P = 1, N = 800
Study 4: RR = 3.4, 95% CI 1.4-5.4, P = .009, N = 700
Study 5: RR = 0.9, 95% CI 0.3-1.9, P = .23, N = 500

It turns out that, if the equivalency point is included within the 95% CI, then the P value is not significant (P ≥ .05). The equivalency point is the value for a statistical measure when two groups are considered to be the same. For RR and OR this value is 1; for means and relative risk reductions (RRR), the equivalency point is 0. Suppose we have the following result: RR = 2.8, 95% CI 1.7 to 3.9. Unless we do more calculations, we do not know the exact P value. We do know that it is < .05 because the interval from 1.7 to 3.9 does not include the value 1, the equivalency point for RR.
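The rule above, and one version of the "more calculations," can be sketched as follows. The first function only checks whether the equivalency point falls inside the interval. The second estimates a two-sided P value from a ratio and its 95% CI by assuming the interval is symmetric on the log scale and normally distributed; that assumption is not stated in the article, and the example interval is not exactly symmetric, so the result is only approximate. Both function names are hypothetical.

```python
import math

def is_significant(lower, upper, equivalency_point):
    """True if the CI excludes the equivalency point (1 for RR/OR, 0 for differences)."""
    return not (lower <= equivalency_point <= upper)

def approx_p_from_ratio_ci(estimate, lower, upper, z_crit=1.96):
    """Approximate two-sided P value for a ratio measure from its 95% CI.

    Assumes the CI was built as exp(ln(estimate) +/- z_crit * SE).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = abs(math.log(estimate)) / se
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(z / math.sqrt(2))

# The article's example: RR = 2.8, 95% CI 1.7 to 3.9.
significant = is_significant(1.7, 3.9, equivalency_point=1)
p_value = approx_p_from_ratio_ci(2.8, 1.7, 3.9)
print(significant, f"approximate P = {p_value:.2g}")
```

For the example, the interval excludes 1 and the approximate P value comes out far below .05, in agreement with the rule.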

Consider the series of fabricated RR, 95% CI, and P values shown in Figure 1. Study 1 says that smokers are 2.3 times as likely to develop hypertension as non-smokers. The result is not significant, however, because the 95% CI includes the number 1 and actually goes down to 0.9, meaning that smokers might be less likely to develop hypertension. Nevertheless, most of the CI is above 1, and smokers might be as much as 4.8 times as likely to develop hypertension as non-smokers. The study was relatively small with a sample size of 400. A larger study would likely “tighten up” this CI and give a significant result with a P value of < .05 and a CI that does not include 1.
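The effect of sample size on the width of a CI can be illustrated with a small sketch. It assumes the log-based approximation for the CI of a relative risk (not specified in the article) and uses hypothetical counts in which the event rates stay the same while each group is made four times larger.

```python
import math

def rr_with_ci(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.96):
    """Relative risk with an approximate 95% CI (log method, an assumption here)."""
    rr = (events_exposed / n_exposed) / (events_unexposed / n_unexposed)
    se = math.sqrt(
        1 / events_exposed - 1 / n_exposed
        + 1 / events_unexposed - 1 / n_unexposed
    )
    return rr, math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se)

# Hypothetical small study: 12% vs 6% event rates, 100 people per group.
small = rr_with_ci(12, 100, 6, 100)
# Same event rates, 400 people per group.
large = rr_with_ci(48, 400, 24, 400)

for label, (rr, lower, upper) in (("small", small), ("large", large)):
    includes_one = lower <= 1 <= upper
    print(f"{label}: RR = {rr:.1f}, 95% CI {lower:.2f} to {upper:.2f}, includes 1: {includes_one}")
```

Both studies estimate the same RR of 2.0, but only the larger one yields an interval that excludes 1. In practice a larger study would not reproduce the same event rates exactly, so this shows the tendency rather than a guarantee.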

The sample size issue is demonstrated well in study 2, which has a large effect size (RR = 8). The sample size is so small, however, that statistical significance is not achieved. Study 3 is the perfect non-significant study. Study 4 has highly significant results that can be accepted with great confidence. Its sample size is adequate, its CI is small, its effect size is large, and its P value is tiny. The results of study 5 are not significant, and the main effect is opposite to what was expected.
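This reading of the five fictitious studies can be checked mechanically. The sketch below takes the values reported in Figure 1 and confirms, for each study, that the 95% CI excludes 1 exactly when P < .05. The data structure and variable names are illustrative only.

```python
# Values as reported in Figure 1 (fictitious studies).
studies = [
    {"study": 1, "rr": 2.3, "ci": (0.9, 4.8), "p": 0.12, "n": 400},
    {"study": 2, "rr": 8.0, "ci": (0.97, 17.3), "p": 0.07, "n": 79},
    {"study": 3, "rr": 1.0, "ci": (0.5, 2.0), "p": 1.0, "n": 800},
    {"study": 4, "rr": 3.4, "ci": (1.4, 5.4), "p": 0.009, "n": 700},
    {"study": 5, "rr": 0.9, "ci": (0.3, 1.9), "p": 0.23, "n": 500},
]

EQUIVALENCY_POINT = 1  # for RR and OR
significant_studies = []
all_consistent = True

for s in studies:
    lower, upper = s["ci"]
    ci_excludes_one = not (lower <= EQUIVALENCY_POINT <= upper)
    p_significant = s["p"] < 0.05
    # The CI rule and the P value should lead to the same conclusion.
    all_consistent = all_consistent and (ci_excludes_one == p_significant)
    if ci_excludes_one:
        significant_studies.append(s["study"])
    verdict = "significant" if ci_excludes_one else "not significant"
    print(f"Study {s['study']}: RR = {s['rr']}, N = {s['n']}, {verdict}")

print("Significant studies:", significant_studies)
```

Only study 4 is significant by either criterion, and the CI rule and the P value agree for all five studies.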

The RRs in Figure 1 are from cohort studies looking at exposures causing increases in some condition. The RRs in randomized controlled trials tend to have 95% CIs to the left of the equivalency point (1) because a decrease in the event rate is expected. In fact, study 5 might resemble the result of a randomized controlled trial if the treatment tended toward an effect but did not reach significance.


Relative risk and OR tell us the size of the difference between groups, CIs tell us how precise the result is and whether other conclusions are possible, and P values tell us the likelihood that the difference is due to chance. Having all three pieces of information allows us to interpret results more clearly.

Dr Godwin is an Associate Professor and Research Director in the Department of Family Medicine at Queen’s University in Kingston, Ont, and is currently on sabbatical at the Centre for Evidence-Based Medicine in Oxford, England.


1. Norman GR, Streiner N. PDQ statistics. Philadelphia, Pa: B.C. Decker Inc; 1986.

2. Abramson JH. Making sense of data: a self instruction manual on the interpretation of epidemiologic data. 2nd ed. New York, NY: Oxford University Press; 1994.

3. Sackett DL, Straus SE, Richardson WS, Rosenberg W, Haynes RB. Evidence-based medicine. How to practice and teach EBM. 2nd ed. Toronto, Ont: Churchill Livingstone; 2000.



