Conference Presentation
Reference
On the Myth and the Reality of the Long-Term Stability of French WISC-IV Scores
KIENG, Sotta, et al.
Abstract
Tests of intelligence are often used for diagnostic and intervention purposes. Beyond these goals, they are used to identify cognitive strengths and weaknesses. These diagnostic applications rest on the hypothesis that intelligence is an enduring trait.
While several studies have investigated the short-term stability of intelligence test scores, few have assessed their long-term stability. However, it is essential that diagnosis and intervention be based on stable intelligence test scores. The objective of this study was to investigate the long-term stability of the French Wechsler Intelligence Scale for Children – Fourth Edition (WISC-IV) in non-clinical children. To achieve this goal, a test-retest procedure was used. The WISC-IV was administered twice to 250 non-clinical children aged 7 to 12 years, with an average test-retest interval of 1.84 years (range 1.09 – 3.33). Long-term stability was analyzed in terms of interindividual stability (mean level of change and stability coefficient: correlation between test and retest scores) and intra-individual stability [...]
KIENG, Sotta, et al. On the Myth and the Reality of the Long-Term Stability of French WISC-IV Scores. In: The 9th Conference of the International Test Commission (ITC), San Sebastian (Spain), 2-5 July 2014, 2014, p. 2-29
Available at:
http://archive-ouverte.unige.ch/unige:39021
ON THE MYTH AND THE REALITY OF THE LONG-TERM STABILITY OF FRENCH WISC-IV SCORES*
Sotta Kieng (University of Geneva)
Nicolas Favez (University of Geneva)
Jérôme Rossier (University of Lausanne)
Sophie Geistlich (University of Geneva)
Thierry Lecerf (University of Geneva)
*This work was supported by Grant 100014_135406 awarded by the Swiss
National Science Foundation (Long-term stability of the WISC-IV: Standard and
CHC composite scores, Lecerf, Favez & Rossier)
OVERVIEW
Introduction
Objectives of the study
Method
Results
Comparisons between studies
Conclusion
INTRODUCTION
As intelligence is presumed to be an enduring trait, intelligence test scores should be stable over time
Individual tests of intelligence are often used to diagnose and to guide interventions
While several studies have investigated the short-term stability of intelligence test scores, only a few have assessed their long-term stability
The Wechsler Intelligence Scale for Children – Fourth Edition (WISC-IV) is among the most widely used tests for assessing the cognitive abilities of children
STRUCTURE OF THE WISC-IV (1)
FSIQ
  VCI (Verbal Comprehension Index): verbal reasoning and comprehension
    Similarities, Vocabulary, Comprehension, Information*, Word reasoning*
  PRI (Perceptual Reasoning Index): fluid reasoning in the perceptual domain
    Block design, Picture concepts, Matrix reasoning, Picture completion*
  WMI (Working Memory Index): ability to hold information in mind temporarily and perform some operation with it
    Digit span, Letter-number sequencing, Arithmetic*
  PSI (Processing Speed Index): ability to quickly perform simple cognitive tasks
    Coding, Symbol search, Cancellation*
* Supplemental subtest (set in italics on the original slide)
STRUCTURE OF THE WISC-IV (2)
FSIQ: VCI, PRI, WMI and PSI core subtests
GAI (General Ability Index): knowledge and problem-solving index (Prifitera, Weiss & Saklofske, 1998; Raiford, Weiss, Rolfhus & Coalson, 2005; Sattler & Ryan, 2009)
  VCI: Similarities, Vocabulary, Comprehension
  PRI: Block design, Picture concepts, Matrix reasoning
CPI (Cognitive Proficiency Index): sustained attention and psychomotor speed index (Sattler & Ryan, 2009; Weiss & Gabel, 2006; Weiss & al., 2006)
  WMI: Digit span, Letter-number sequencing
  PSI: Coding, Symbol search
OBJECTIVES OF THE STUDY (1)
Investigating the long-term stability of the French Wechsler Intelligence Scale for Children – Fourth Edition (WISC-IV)
Test-retest procedure
Non-clinical children
Test-retest interval > 1 year
OBJECTIVES OF THE STUDY (2)
To our knowledge, only 2 studies have focused on the long-term stability of WISC-IV scores:
Lander (2010)
  N = 131
  American children aged 6 to 13 years
  With learning and/or emotional disabilities
  Average test-retest interval of 2.89 years
Watkins & Smith (2013)
  N = 344
  American children aged 6 to 16 years
  With difficulties making them eligible for special education
  Average test-retest interval of 2.84 years
METHOD (1)
Participants:
  250 non-clinical French-speaking Swiss children aged 7 to 12 years
Instrument:
  French WISC-IV
  Individually administered in a classroom
  Administration of the 10 core subtests
METHOD (2)
Sample of 250 non-clinical children
         First Testing    Second Testing    N
         Mean Age (SD)    Mean Age (SD)
Boys     8.37 (.90)       10.25 (1.18)      120
Girls    8.48 (.80)       10.35 (1.12)      130
Total    8.42 (.85)       10.30 (1.14)      250
Test-retest interval (years)
                        M (SD)        Min     Max
Test-retest interval    1.84 (.54)    1.09    3.33
A test-retest interval shorter than 1 year is generally considered “short-term stability” (Sattler, 2008)
EVALUATION OF LONG-TERM STABILITY
Mean level of change (group level)
  Stability of the group mean across time?
Rank order consistency (differential stability)
  Stability of interindividual differences across time?
Intraindividual differences in change
  Stability of the intraindividual level of performance across time?
  What percentage of children are stable across time?
MEAN LEVEL OF CHANGE - RESULTS (1)
Mean level of change
Interindividual comparisons
Are there significant mean differences between the first and second testing?
Is there a practice effect?
MEAN LEVEL OF CHANGE - RESULTS (2)
        Test              Retest            Δ M      p        d
        M (SD)            M (SD)
VCI     105.20 (14.70)    104.80 (14.83)    -.40     .51      -.03
PRI      99.28 (14.62)    100.54 (13.71)    +1.26    .06       .09
WMI      94.89 (14.38)     97.42 (14.20)    +2.53    < .05     .18
PSI     101.56 (14.00)    107.50 (13.97)    +5.94    < .01     .42
FSIQ    100.97 (13.91)    103.26 (12.69)    +2.65    < .05     .20
GAI     102.85 (14.41)    103.26 (13.31)    +.41     .43       .03
CPI      97.73 (13.94)    103.09 (13.61)    +5.36    < .01     .39
Significant increase between the two assessments for WMI, PSI, FSIQ and CPI
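The d column is consistent with a standardized mean difference in which the mean change is divided by the root mean square of the test and retest standard deviations. The slides do not state the formula, so the following is only a minimal sketch under that assumption; the function name `cohens_d` is illustrative.

```python
import math


def cohens_d(delta_mean: float, sd_test: float, sd_retest: float) -> float:
    """Standardized mean difference between retest and test.

    Assumption: the denominator is the root mean square of the two SDs.
    The slides report d but do not say how it was computed.
    """
    pooled_sd = math.sqrt((sd_test ** 2 + sd_retest ** 2) / 2)
    return delta_mean / pooled_sd


# Mean change and SDs taken from the PSI and CPI rows of the results table.
print(round(cohens_d(5.94, 14.00, 13.97), 2))  # 0.42
print(round(cohens_d(5.36, 13.94, 13.61), 2))  # 0.39
```

Under this assumption the sketch reproduces the reported values for PSI and CPI, the two indexes with moderate effect sizes.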
MEAN LEVEL OF CHANGE - RESULTS (3)
Children with a higher FSIQ at the first assessment benefit less from the second assessment than children with a lower FSIQ
  FSIQ T1 x Test-Retest interval = -.45**
Is the duration of the test-retest interval related to the magnitude of the practice effect?
  Test-Retest interval x Δ FSIQ = -.19**
The longer the test-retest interval, the smaller the gain in FSIQ scores
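A negative coefficient such as the one between the test-retest interval and Δ FSIQ can be illustrated with a plain product-moment correlation. The slides do not name the coefficient used, so Pearson's r is an assumption here, and the data below are hypothetical values chosen only to show the direction of the relation.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: interval in years and FSIQ gain (retest minus test).
interval_years = [1.1, 1.5, 1.8, 2.4, 3.0]
delta_fsiq = [6, 4, 3, 1, -2]

r = pearson_r(interval_years, delta_fsiq)
print(round(r, 2))  # negative: longer intervals go with smaller gains
```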
MEAN LEVEL OF CHANGE - RESULTS (4)
[Figure: scatterplot of Δ FSIQ (vertical axis, -30 to 30) against a horizontal axis running from 12 to 37]
MEAN LEVEL OF CHANGE - RESULTS (5)
Is there a practice effect after 12 months?
Test-retest interval    13-18 months    19-26 months    27-39 months
Mean age T1             8.07            8.80            8.48
Mean age T2             9.37            10.63           11.09
Boys                    44              36              40
Girls                   49              42              39
Total                   93              78              79
RESULTS (6)
[Figure: three bar charts comparing First Testing and Second Testing mean scores for VCI, PRI, WMI, PSI, FSIQ, GAI and CPI, one panel per interval group: 13-18 months (N = 93), 19-26 months (N = 78), 27-39 months (N = 79). Asterisks mark two sets of four significant differences, with effect sizes d = .15, .39, .21, .33 and d = .41, .74, .42, .77.]
In our data, it seems that more than 2 years is necessary to [...]
RANK ORDER CONSISTENCY - RESULTS (1)
Rank order consistency (differential stability)
Interindividual comparisons
Is the stability coefficient r ≥ .70 (threshold for research purposes)?
RANK ORDER CONSISTENCY - RESULTS (2)
       VCI    PRI    WMI    PSI    FSIQ   GAI    CPI
r_c    .80    .71    .67    .65    .81    .82    .69
Note: correlations corrected for the variability of the standardization sample (Allen & Yen, 1979; Magnusson, 1967).
VCI, PRI, FSIQ and GAI index scores have satisfactory stability coefficients
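The slides cite Allen & Yen (1979) and Magnusson (1967) for the correction but do not give the formula. A minimal sketch, assuming the widely used correction for a difference in variability between the study sample and a reference population, is shown below. The reference SD of 15 is the index-score metric; the sample SD and the uncorrected r in the example are hypothetical, as is the function name.

```python
import math


def correct_for_variability(r: float, sd_sample: float,
                            sd_reference: float = 15.0) -> float:
    """Adjust a correlation for a difference in score variability.

    Assumed formula: r_c = r*k / sqrt(1 - r^2 + r^2 * k^2),
    with k = sd_reference / sd_sample.
    """
    k = sd_reference / sd_sample
    return (r * k) / math.sqrt(1 - r ** 2 + (r ** 2) * (k ** 2))


# Hypothetical example: uncorrected r = .50 in a sample with SD = 10.
print(round(correct_for_variability(0.50, 10.0), 2))  # 0.65
```

When the sample SD equals the reference SD the coefficient is unchanged; a sample that is less variable than the reference population yields a corrected value above the observed one.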
INTRAINDIVIDUAL DIFFERENCES IN CHANGE - RESULTS (1)
Intraindividual differences in change
Intraindividual comparisons
Is performance at the second testing within ±2 standard errors of measurement (SEM) of the first testing?
  Performances are considered “stable” across the two assessments if more than 70% of children are within ±2 SEM
Are performances at the first and second testing in the same normative category?
  Performances are considered “stable” across the two assessments if more than 70% of children remain in the same normative category
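The ±2 SEM criterion can be sketched as follows. The score pairs and the SEM value are hypothetical (the SEM for each index would come from the test manual), and the function names are illustrative.

```python
def within_two_sem(score_t1: float, score_t2: float, sem: float) -> bool:
    """True if the retest score lies within +/- 2 SEM of the first score."""
    return abs(score_t2 - score_t1) <= 2 * sem


def percent_stable(pairs, sem: float) -> float:
    """Percentage of (test, retest) pairs that meet the +/- 2 SEM criterion."""
    stable = sum(within_two_sem(t1, t2, sem) for t1, t2 in pairs)
    return 100 * stable / len(pairs)


# Hypothetical (test, retest) index scores and a hypothetical SEM of 4 points.
pairs = [(100, 104), (92, 101), (110, 108), (85, 97)]
sem = 4.0

pct = percent_stable(pairs, sem)
print(pct)       # 50.0
print(pct > 70)  # False: below the 70% threshold used in the study
```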
INTRAINDIVIDUAL DIFFERENCES IN CHANGE - RESULTS (2)
Traditional descriptive system for the WISC-IV (7 categories)
Standard score range    Description of performance
≥ 130                   Very superior
120-129                 Superior
110-119                 High average
90-109                  Average
80-89                   Low average
70-79                   Borderline
≤ 69                    Extremely low
Source: Table 6.3, Wechsler, 2003.
INTRAINDIVIDUAL DIFFERENCES IN CHANGE - RESULTS (3)
Normative descriptive system (3 categories)
Standard score range    Descriptive classification    Description of performance
≥ 116                   Above average                 Normative Strength (16% of population)
85-115                  Average range                 Within Normal Limits (68% of population)
≤ 84                    Below average                 Normative Weakness (16% of population)
Source: Rapid Reference 4.3, Flanagan & Kaufman, 2009.
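The two descriptive systems can be compared on the same pair of scores. The sketch below encodes the cut-offs from the two tables; the function names and the example scores are hypothetical.

```python
def normative_category(score: int) -> str:
    """Three-category normative system (Flanagan & Kaufman, 2009)."""
    if score >= 116:
        return "Above average"
    if score >= 85:
        return "Average range"
    return "Below average"


def traditional_category(score: int) -> str:
    """Seven-category traditional system (Wechsler, 2003)."""
    bands = [
        (130, "Very superior"),
        (120, "Superior"),
        (110, "High average"),
        (90, "Average"),
        (80, "Low average"),
        (70, "Borderline"),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "Extremely low"


# Hypothetical child scoring 108 at the first testing and 112 at the second.
t1, t2 = 108, 112
print(normative_category(t1) == normative_category(t2))      # True
print(traditional_category(t1) == traditional_category(t2))  # False
```

A 4-point change keeps this child in the same normative category but moves the traditional label from Average to High average, which illustrates why narrower bands tend to produce lower agreement between testings.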
INTRAINDIVIDUAL DIFFERENCES IN CHANGE - RESULTS (4)
Percentage of “stable” individuals
        Within ±2 SEM    Normative descriptive    Traditional descriptive
                         system (3 cat.)          system (7 cat.)
VCI     68.8             71.6                     49.6
PRI     70.4             73.2                     48.0
WMI     65.2             74.0                     47.2
PSI     57.6             72.0                     40.8
FSIQ    58.4             76.8                     54.4
GAI     72.8             80.4                     57.6
CPI     56.4             68.8                     44.0
SEM: standard error of measurement.
By the ±2 SEM criterion, only the PRI and GAI index scores are stable at the intraindividual level between the first and second assessments
INTRAINDIVIDUAL DIFFERENCES IN CHANGE - RESULTS (5)
Focus on “below average” children (FSIQ < 85 at the first testing; N = 32)
                  Second Testing
First Testing     FSIQ ≤ 84    85 ≤ FSIQ ≤ 115    FSIQ ≥ 116    Total
FSIQ ≤ 84         11           21                 0             32
For FSIQ scores:
  34.4% of children remain in the “below average” category at both assessments
  65.6% of children move to the “average range” at the second testing
INTRAINDIVIDUAL DIFFERENCES IN CHANGE - RESULTS (6)
Focus on “below average” children (GAI < 85 at the first testing; N = 26)
                  Second Testing
First Testing     GAI ≤ 84    85 ≤ GAI ≤ 115    GAI ≥ 116    Total
GAI ≤ 84          14          12                0            26
For GAI scores:
  53.8% of children remain in the “below average” category at both assessments
  46.2% of children move to the “average range” at the second testing
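The percentages for the “below average” groups follow directly from the counts in the two cross-tabulations. A minimal sketch, with an illustrative helper name:

```python
def row_percentages(counts: dict) -> dict:
    """Convert one row of a cross-tabulation into percentages (1 decimal)."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Second-testing category of children who were below average at first testing.
fsiq_row = {"below average": 11, "average range": 21, "above average": 0}
gai_row = {"below average": 14, "average range": 12, "above average": 0}

print(row_percentages(fsiq_row))
print(row_percentages(gai_row))
```

With these counts, 34.4% (FSIQ) and 53.8% (GAI) of the children stay below average, and 65.6% and 46.2% respectively move to the average range.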
COMPARISONS BETWEEN STUDIES (1)
Stability coefficients for WISC-IV index scores
        Short-term stability (< 1 year)                       Long-term stability (> 1 year)
        Wechsler (2003)  Wechsler (2005)  Ryan & al. (2010)  Lander (2010)¹  Watkins & Smith (2013)  Kieng & al. (2014)
        (US, N=243)      (FR, N=93)       (US, N=43)         (US, N=131)     (US, N=344)             (CH, N=250)
VCI     .93              .88              .76                .65             .78                     .80
PRI     .89              .83              .68                .62             .76                     .71
WMI     .89              .78              .75                .54             .70                     .67
PSI     .86              .83              .54                .52             .65                     .65
FSIQ    .93              .91              .88                .70             .84                     .81
¹ Uncorrected test-retest correlations.
COMPARISONS BETWEEN STUDIES (2)
Percentage of stable children (within ±2 SEM)
        Lander (2010)        Kieng & al. (2014)
        (US, N=131)          (CH, N=250)
        Clinical children    Non-clinical children
VCI     78                   68.8
PRI     78                   70.4
WMI     73                   65.2
PSI     70                   57.6
FSIQ    73                   58.4
GAI     -                    72.8
CPI     -                    56.4
COMPARISONS BETWEEN STUDIES (3)
Percentage of stable performances (±10 points)
        Watkins & Smith (2013)    Kieng & al. (2014)
        (US, N = 344)             (CH, N = 250)
        Clinical children         Non-clinical children
VCI     71                        68.8
PRI     61                        62
WMI     63                        64
PSI     56                        44.8
FSIQ    75                        72.8
GAI     -                         74.8
CPI     -                         56.4
CONCLUSION
These results from a group of 250 non-clinical children suggest that the Perceptual Reasoning Index (PRI) and the General Ability Index (GAI) have acceptable predictive value
Potential clinical relevance of using the GAI rather than the FSIQ
Like Watkins & Smith (2013), our results show that almost 30% of children earned FSIQ scores that differed by 10 or more points between the two assessments
According to Canivez & Watkins (2001), effect sizes are quite small when a practice effect is observed in long-term stability (> 1 year)
However, we find moderate effect sizes for PSI and CPI, suggesting a practice effect even when the test-retest interval exceeds 1 year
Unlike the test’s publisher, we do not recommend the traditional descriptive system; the normative descriptive system is more useful since it allows better predictions
Thank you for your attention!
REFERENCES
Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.
Canivez, G. L., & Watkins, M. W. (2001). Long-term stability of the Wechsler Intelligence Scale for Children--Third Edition among students with disabilities. School Psychology Review, 30(3), 438-453.
Deary, I. J., Whalley, L. J., Lemmon, H., Crawford, J. R., & Starr, J. M. (2000). The Stability of Individual Differences in Mental Ability from Childhood to Old Age: Follow-up of the 1932 Scottish Mental Survey. Intelligence, 28(1), 49-55.
Flanagan, D. P., & Kaufman, A. S. (2009). Essentials of WISC®-IV Assessment. Second Edition. USA: Wiley.
Golay, P., & Lecerf, T. (2011). Orthogonal higher order structure and confirmatory factor analysis of the French Wechsler Adult Intelligence Scale (WAIS-III). Psychological Assessment, 23(1), 143-152. doi: 10.1037/a0021230
Kieng, S., Rossier, J., Favez, N., & Lecerf, T. (2013). Étude exploratoire de la stabilité à long terme des indices standard du WISC-IV. Pratiques Psychologiques, 19(3), 163-178. doi: http://dx.doi.org/10.1016/j.prps.2013.07.003
Lander, J. (2010). Long-term stability of scores on the Wechsler Intelligence Scale for Children-Fourth Edition in children with learning disabilities. (71), ProQuest Information & Learning, US. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&db=psyh&AN=2010-99220-484&site=ehost-live
Lecerf, T., Reverte, I., Coleaux, L., Favez, N., & Rossier, J. (2010). Indice d’aptitude général pour le WISC-IV: Normes francophones. Pratiques Psychologiques, 16(1), 109-121. doi: 10.1016/j.prps.2009.04.001
Lecerf, T., Reverte, I., Coleaux, L., Maillard, F., Favez, N., & Rossier, J. (2011). Indice d’aptitude général et indice de compétence cognitive pour le WISC-IV: Normes empiriques versus normes statistiques. European Review of Applied Psychology/Revue Européenne de Psychologie Appliquée, 61(2), 115-122. doi: 10.1016/j.erap.2011.01.001
Magnusson, D. (1967). Test theory. Reading, MA: Addison-Wesley Publishing.
Prifitera, A., Weiss, L. G., & Saklofske, D. H. (1998). The WISC-III in context. In A. Prifitera & D. H. Saklofske (Eds.), WISC-III Clinical Use and Interpretation: Scientist-practitioner perspectives. San Diego: Elsevier Academic Press.
Raiford, S. E., Weiss, L. G., Rolfhus, E., & Coalson, D. (2005). General Ability Index: Harcourt Assessment, Technical Report.
Reverte, I., Golay, P., Favez, N., Rossier, J., & Lecerf, T. (2014). Structural validity of the Wechsler Intelligence Scale for Children (WISC-IV) in a French-speaking Swiss sample. Learning and Individual Differences, 29(0), 114-119. doi: http://dx.doi.org/10.1016/j.lindif.2013.10.013
Ryan, J. J., Glass, L. A., & Bartels, J. M. (2010). Stability of the WISC-IV in a sample of elementary and middle school children. Applied Neuropsychology, 17(1), 68-72. doi: 10.1080/09084280903297933
Sattler, J. M. (2008). Assessment of children. Cognitive foundations. Fifth Edition. San Diego: Jerome M. Sattler, Publisher, Inc.
Sattler, J. M., & Ryan, J. J. (2009). Assessment with the WAIS-IV. La Mesa: Jerome M. Sattler, Publisher, Inc.
Watkins, M. W., & Smith, L. G. (2013). Long-term stability of the Wechsler Intelligence Scale for Children—Fourth Edition. Psychological Assessment, 25(2), 477-483. doi: 10.1037/a0031653
Wechsler, D. (2003). Manual for the Wechsler Intelligence scale for children - Fourth edition. San Antonio: Psychological Corporation.
Wechsler, D. (2005). Manuel de l'Echelle d'Intelligence de Wechsler pour Enfants - 4e édition. Paris: Editions du Centre de Psychologie Appliquée.
Weiss, L., & Gabel, A. D. (2008). WISC-IV technical report #6: Using the Cognitive Proficiency Index in psychoeducational assessment. Upper Saddle River, NJ: Pearson Education, Inc.