
SEVEN: A COMMENTARY REGARDING CRONBACH’S COEFFICIENT ALPHA

A population of seven people took a seven-item test, for which each item is scored on a seven-point scale. Here are the raw data:

ID   item1  item2  item3  item4  item5  item6  item7  total
 1       1      1      1      1      1      1      1      7
 2       2      2      2      2      2      3      3     16
 3       3      4      6      7      7      4      5     36
 4       4      7      5      3      5      7      6     37
 5       5      6      4      6      4      5      2     32
 6       6      5      7      5      3      2      7     35
 7       7      3      3      4      6      6      4     33

Here are the inter-item correlations and the correlations between each of the items and the total score:

        item1  item2  item3  item4  item5  item6  item7
item2   0.500
item3   0.500  0.714
item4   0.500  0.536  0.750
item5   0.500  0.464  0.536  0.714
item6   0.500  0.643  0.214  0.286  0.714
item7   0.500  0.571  0.857  0.393  0.464  0.286
total   0.739  0.818  0.845  0.772  0.812  0.673  0.752

The mean of each of the items is 4 and the standard deviation is 2 (with division by N, not N-1; these are data for a population of people as well as a population of items). The inter-item correlations range from .214 to .857 with a mean of .531.

[The largest eigenvalue is 4.207. The next largest is 1.086.] The range of the item-to-total correlations is from .673 to .845. Cronbach’s alpha is .888. Great test (at least as far as internal consistency is concerned)? Perhaps; but there is at least one problem. See if you can guess what that is before you read on.
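The summary statistics just quoted can be checked directly from the raw data (a minimal sketch using NumPy; the N vs. N-1 scaling choice cancels out of correlations):

```python
import numpy as np

# Artificial data: rows are the seven people, columns the seven items.
X = np.array([[1, 1, 1, 1, 1, 1, 1],
              [2, 2, 2, 2, 2, 3, 3],
              [3, 4, 6, 7, 7, 4, 5],
              [4, 7, 5, 3, 5, 7, 6],
              [5, 6, 4, 6, 4, 5, 2],
              [6, 5, 7, 5, 3, 2, 7],
              [7, 3, 3, 4, 6, 6, 4]])

R = np.corrcoef(X, rowvar=False)       # 7 x 7 inter-item correlation matrix
off_diag = R[np.triu_indices(7, k=1)]  # the 21 distinct inter-item r's

print(R[1:, 0].round(3))               # items 2-7 each correlate .500 with item 1
print(round(off_diag.mean(), 3))       # mean inter-item correlation: 0.531
print(np.linalg.eigvalsh(R)[-2:][::-1])  # the two largest eigenvalues
```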

While you’re contemplating, let me call your attention to seven interesting sources that discuss Cronbach’s alpha (see References for complete citations):

1. Cronbach’s (1951) original article (naturally).

2. Knapp (1991).

3. Cortina (1993).

4. Cronbach (2004).

5. Sijtsma (2009).

6. Tan (2009).

7. Gadermann, Guhn, and Zumbo (2012).

OK. Now back to our data set. You might have already suspected that the data are artificial (all of the items having exactly the same means and standard deviations, and all of items 2-7 correlating .500 with item 1). You’re right; they are; but that’s not what I had in mind. You might also be concerned about the seven-point scales (ordinal rather than interval?). Since the data are artificial, those scales can be anything we want them to be. If they are Likert-type scales they are ordinal. But they could be something like “number of days per week” that something happened, in which case they are interval. In any event, that’s also not what I had in mind. You might be bothered by the negative skewness of the total score distribution. I don’t think that should matter. And you might not like the smallness (and the “seven-ness”? I like sevens…thus the title of this chapter) of the number of observations. Don’t be. Once the correlation matrix has been determined, the N is not of direct relevance. (The “software” doesn’t know or care what N is at that point.) Had this been a sample data set, however, and had we been interested in statistical inference from a sample Cronbach’s alpha to the Cronbach’s alpha in the population from which the sample had been drawn, the N would be of great importance.

What concerns me is the following:

The formula for Cronbach’s alpha, when all of the items have equal variances (which they do in this case), is k·r_avg / [1 + (k-1)·r_avg], where k is the number of items and r_avg is the average (mean) inter-item correlation; it is often a good approximation to Cronbach’s alpha even when the variances are unequal. (More about this later.) Those r’s are Pearson r’s, which are measures of the direction and magnitude of the LINEAR relationship between variables. Are the relationships linear?
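As a quick check, plugging k = 7 and the mean inter-item correlation of .531 into that formula reproduces the reported coefficient (a minimal sketch):

```python
# Standardized (equal-variance) form of Cronbach's alpha:
# alpha = k * r_avg / (1 + (k - 1) * r_avg)
k = 7          # number of items
r_avg = 0.531  # mean inter-item correlation from the matrix above

alpha = k * r_avg / (1 + (k - 1) * r_avg)
print(round(alpha, 3))  # 0.888
```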

I have plotted the data for each of the items against the other items. There are 21 plots (the number of combinations of seven things taken two at a time). Here is the first one.

[Character plot of item2 (vertical axis) against item1 (horizontal axis, scaled 1.2 to 7.2): the seven points (1,1), (2,2), (3,4), (4,7), (5,6), (6,5), and (7,3) rise to a peak at item1 = 4 and then fall.]

I don’t know about you, but that plot looks non-linear, almost parabolic, to me, even though the linear Pearson r is .500. Is it because of the artificiality of the data, you might ask? I don’t think so. Here is a set of real data, item scores that I have excerpted from my daughter Katie’s thesis (Knapp, 2010): [They are the responses by seven female chaplains in the Army Reserves to the first seven items of a 20-item test of empathy.]

ID   item1  item2  item3  item4  item5  item6  item7  total
 1       5      7      6      6      6      6      6     42
 2       1      7      7      5      7      7      7     41
 3       6      7      6      6      6      6      6     43
 4       7      7      7      6      7      7      6     47
 5       2      6      6      6      7      6      5     38
 6       1      1      3      4      5      6      5     25
 7       2      5      3      6      7      6      6     35

Here are the inter-item correlations and the correlation of each item with the total score:

        item1  item2  item3  item4  item5  item6  item7
item2   0.566
item3   0.492  0.826
item4   0.616  0.779  0.405
item5   0.060  0.656  0.458  0.615
item6   0.156  0.397  0.625 -0.062  0.496
item7   0.138  0.623  0.482  0.175  0.439  0.636
total   0.744  0.954  0.855  0.746  0.590  0.506  0.566

Except for the -.062 these correlations look a lot like the correlations for the artificial data. The inter-item correlations range from that -.062 to .826, with a mean of .456. [The largest eigenvalue is 3.835 and the next-largest eigenvalue is 1.479] The item-to-total correlations range from .506 to .954. Cronbach’s alpha is .854. Another great test?
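As with the artificial data, these summary figures can be reproduced from the raw scores, again using the equal-variance (standardized) form of alpha (a minimal sketch with NumPy):

```python
import numpy as np

# Real data (Knapp, 2010): seven chaplains by seven empathy items.
Y = np.array([[5, 7, 6, 6, 6, 6, 6],
              [1, 7, 7, 5, 7, 7, 7],
              [6, 7, 6, 6, 6, 6, 6],
              [7, 7, 7, 6, 7, 7, 6],
              [2, 6, 6, 6, 7, 6, 5],
              [1, 1, 3, 4, 5, 6, 5],
              [2, 5, 3, 6, 7, 6, 6]])

R = np.corrcoef(Y, rowvar=False)
r_avg = R[np.triu_indices(7, k=1)].mean()
alpha_std = 7 * r_avg / (1 + 6 * r_avg)

print(round(r_avg, 3))      # about 0.456, the mean inter-item correlation
print(round(alpha_std, 3))  # about 0.854
```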

But how about linearity? Here is the plot for item2 against item1 for the real data.

[Character plot of item2 against item1 for the real data (same axes): six of the seven points are bunched at the top of the scale, at (1,7), (5,7), (6,7), (7,7), (2,6), and (2,5), with a single outlier at (1,1).]

That plot is even less linear than the one for the artificial data, even though the linear Pearson r is a respectable .566.

Going back to the formula for Cronbach’s alpha that is expressed in terms of the inter-item correlations: it is not the most general formula, nor is it the one that Cronbach generalized from the Kuder-Richardson Formula #20 (Kuder & Richardson, 1937) for dichotomously-scored items. The formula that always “works” is α = [k/(k-1)][1 - (∑σi²/σ²)], where k is the number of items, σi² is the variance of item i (for i = 1, 2, …, k), and σ² is the variance of the total scores. For the artificial data, that formula yields the same value for Cronbach’s alpha as before, i.e., .888, but for the real data it yields a value of .748, which is lower than the .854 previously obtained. That happens because the item variances are not equal, ranging from a low of .204 (for item #6) to a high of 5.387 (for item #1).

The item variances for the artificial data were all equal to 4.
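The contrast in item variances between the two data sets can be made concrete with a short function (a sketch; `cronbach_alpha` is my own name for it, and the variances divide by N, as in the text):

```python
import numpy as np

def cronbach_alpha(data):
    """Covariance-based ("always works") Cronbach's alpha.

    data: people-by-items array. Variances divide by N (population data),
    though the N vs. N-1 choice cancels out of the ratio anyway.
    """
    k = data.shape[1]
    item_vars = data.var(axis=0)        # the sigma_i^2
    total_var = data.sum(axis=1).var()  # sigma^2, variance of the total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Artificial data (first table) and real data (second table).
X = np.array([[1, 1, 1, 1, 1, 1, 1],
              [2, 2, 2, 2, 2, 3, 3],
              [3, 4, 6, 7, 7, 4, 5],
              [4, 7, 5, 3, 5, 7, 6],
              [5, 6, 4, 6, 4, 5, 2],
              [6, 5, 7, 5, 3, 2, 7],
              [7, 3, 3, 4, 6, 6, 4]])
Y = np.array([[5, 7, 6, 6, 6, 6, 6],
              [1, 7, 7, 5, 7, 7, 7],
              [6, 7, 6, 6, 6, 6, 6],
              [7, 7, 7, 6, 7, 7, 6],
              [2, 6, 6, 6, 7, 6, 5],
              [1, 1, 3, 4, 5, 6, 5],
              [2, 5, 3, 6, 7, 6, 6]])

print(round(cronbach_alpha(X), 3))  # 0.888: matches the standardized value
print(X.var(axis=0))                # artificial item variances: all 4.0
print(Y.var(axis=0).round(3))       # real item variances: unequal, about .204 to 5.388
```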

So what? Although the most general formula was derived in terms of inter-item covariances rather than inter-item correlations, there is still the (hidden?) assumption of linearity. The same appears to hold for the ordinal reliability coefficient advocated by Gadermann et al. (2012). There are tests of linearity for sample data, but this chapter is concerned solely with the internal consistency of a measuring instrument when data are available for an entire population of people and an entire population of items (however rare that situation might be).

References

Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78, 98-104.

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.

Cronbach, L. J. (2004). My current thoughts on coefficient alpha and successor procedures. Educational and Psychological Measurement, 64, 391-418. [This article was published after Lee Cronbach’s death, with extensive editorial assistance provided by Richard Shavelson.]

Gadermann, A.M., Guhn, M., & Zumbo, B.D. (2012). Estimating ordinal reliability for Likert-type and ordinal item response data: A conceptual, empirical, and practical guide. Practical Assessment, Research, & Evaluation, 17 (3), 1-13.

Knapp, K. (2010). The metamorphosis of the military chaplaincy: From hierarchy of minister-officers to shared religious ministry profession. Unpublished D.Min. thesis, Barry University, Miami Shores, FL.

Knapp, T.R. (1991). Coefficient alpha: Conceptualizations and anomalies. Research in Nursing & Health, 14, 457-460. [See also Errata, op. cit., 1992, 15, 321.]

Kuder, G.F., & Richardson, M.W. (1937). The theory of the estimation of test reliability. Psychometrika, 2, 151-160.

Sijtsma, K. (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74, 107-120.

Tan, S. (2009). Misuses of KR-20 and Cronbach's Alpha reliability coefficients. Education and Science, 34 (152), 101-112.
