On the Myth and the Reality of the Long-Term Stability of French WISC-IV Scores


Conference Presentation

Reference

On the Myth and the Reality of the Long-Term Stability of French WISC-IV Scores

KIENG, Sotta, et al.

Abstract

Tests of intelligence are often used for diagnostic and intervention purposes. Beyond these goals, they are also used to identify cognitive strengths and weaknesses. These applications rest on the hypothesis that intelligence is an enduring trait.

While several studies have investigated the short-term stability of intelligence test scores, few have assessed their long-term stability. Yet diagnosis and intervention should be based on stable intelligence test scores. The objective of this study was to investigate the long-term stability of the French Wechsler Intelligence Scale for Children – Fourth Edition (WISC-IV) in non-clinical children. To this end, a test-retest procedure was used. The WISC-IV was administered twice to 250 non-clinical children aged 7 to 12 years, with an average test-retest interval of 1.84 years (range 1.09 – 3.33). Long-term stability was analyzed in terms of interindividual stability (mean-level change and stability coefficient, i.e. the correlation between test and retest scores) and intra-individual stability [...]

KIENG, Sotta, et al. On the Myth and the Reality of the Long-Term Stability of French WISC-IV Scores. In: The 9th Conference of the International Test Commission (ITC), San Sebastian (Spain), 2-5 July 2014, 2014, p. 2-29

Available at:

http://archive-ouverte.unige.ch/unige:39021

Disclaimer: layout of this document may differ from the published version.

ON THE MYTH AND THE REALITY OF THE LONG-TERM STABILITY OF FRENCH WISC-IV SCORES *

Sotta Kieng (University of Geneva)
Nicolas Favez (University of Geneva)
Jérôme Rossier (University of Lausanne)
Sophie Geistlich (University of Geneva)
Thierry Lecerf (University of Geneva)

*This work was supported by Grant 100014_135406 awarded by the Swiss National Science Foundation (Long-term stability of the WISC-IV: Standard and CHC composite scores, Lecerf, Favez & Rossier)

OVERVIEW

• Introduction
• Objectives of the study
• Method
• Results
• Comparisons between studies
• Conclusion

INTRODUCTION

• As intelligence is presumed to be an enduring trait, intelligence test scores should be stable over time
• Individual tests of intelligence are often used to diagnose and to guide interventions
• While several studies have investigated the short-term stability of intelligence test scores, only a few have assessed long-term stability
• The Wechsler Intelligence Scale for Children – Fourth Edition (WISC-IV) is among the most widely used tests for assessing children's cognitive abilities

STRUCTURE OF THE WISC-IV (1)

FSIQ
  VCI: Similarities, Vocabulary, Comprehension (supplemental: Information, Word reasoning)
  PRI: Block design, Picture concepts, Matrix reasoning (supplemental: Picture completion)
  WMI: Digit span, Letter-number sequencing (supplemental: Arithmetic)
  PSI: Coding, Symbol search (supplemental: Cancellation)

• Verbal Comprehension Index (VCI): verbal reasoning and comprehension
• Perceptual Reasoning Index (PRI): fluid reasoning in the perceptual domain
• Working Memory Index (WMI): ability to hold information in mind temporarily and perform some operation with it
• Processing Speed Index (PSI): ability to quickly perform simple cognitive tasks

STRUCTURE OF THE WISC-IV (2)

FSIQ
  GAI
    VCI: Similarities, Vocabulary, Comprehension
    PRI: Block design, Picture concepts, Matrix reasoning
  CPI
    WMI: Digit span, Letter-number sequencing
    PSI: Coding, Symbol search

• General Ability Index (GAI): knowledge and problem-solving index (Prifitera, Weiss & Saklofske, 1998; Raiford, Weiss, Rolfhus & Coalson, 2005; Sattler & Ryan, 2009)
• Cognitive Proficiency Index (CPI): sustained attention and psychomotor speed index (Sattler & Ryan, 2009; Weiss & Gabel, 2006; Weiss & al., 2006)

Note: Supplemental subtests are in italics

OBJECTIVES OF THE STUDY (1)

Investigating the long-term stability of the French Wechsler Intelligence Scale for Children – Fourth Edition (WISC-IV)

• Test-retest procedure
• Non-clinical children
• Test-retest interval > 1 year

OBJECTIVES OF THE STUDY (2)

To our knowledge, only 2 studies have focused on the long-term stability of WISC-IV scores:

• Lander (2010)
  - N = 131
  - American children aged 6 to 13 years
  - With learning and/or emotional disabilities
  - Average test-retest interval of 2.89 years
• Watkins & Smith (2013)
  - N = 344
  - American children aged 6 to 16 years
  - With difficulties making them eligible for special education
  - Average test-retest interval of 2.84 years

METHOD (1)

• Participants
  - 250 non-clinical French-speaking Swiss children aged 7 to 12 years
• Instrument
  - French WISC-IV
  - Individually administered in a classroom
  - Administration of the 10 core subtests

METHOD (2)

• Sample of 250 non-clinical children

          First Testing    Second Testing   Total
          Mean Age (SD)    Mean Age (SD)
Boys      8.37 (.90)       10.25 (1.18)     120
Girls     8.48 (.80)       10.35 (1.12)     130
Total     8.42 (.85)       10.30 (1.14)     250

• Test-retest interval (in years)
  - A test-retest interval shorter than 1 year is generally considered “short-term stability” (Sattler, 2008)

                        M (SD)       Min    Max
Test-Retest Interval    1.84 (.54)   1.09   3.33

EVALUATION OF LONG-TERM STABILITY

• Mean level of change (group level)
  - Is the group mean stable across time?
• Rank order consistency (differential stability)
  - Are interindividual differences stable across time?
• Intraindividual differences in change
  - Is each child's level of performance stable across time?
  - What percentage of children are stable across time?

MEAN LEVEL OF CHANGE - RESULTS (1)

• Mean level of change
  - Interindividual comparisons
  - Are there significant mean differences between the first and second testing?
  - Is there a practice effect?

MEAN LEVEL OF CHANGE - RESULTS (2)

        Test              Retest            Δ M      p        d
        M (SD)            M (SD)
VCI     105.20 (14.70)    104.80 (14.83)    -.40     .51      -.03
PRI      99.28 (14.62)    100.54 (13.71)    +1.26    .06       .09
WMI      94.89 (14.38)     97.42 (14.20)    +2.53    < .05     .18
PSI     101.56 (14.00)    107.50 (13.97)    +5.94    < .01     .42
FSIQ    100.97 (13.91)    103.26 (12.69)    +2.65    < .05     .20
GAI     102.85 (14.41)    103.26 (13.31)    +.41     .43       .03
CPI      97.73 (13.94)    103.09 (13.61)    +5.36    < .01     .39

• Significant increase between the two assessments for WMI, PSI, FSIQ and CPI
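The d column can be reproduced to two decimals by dividing the reported mean change by the root mean square of the test and retest standard deviations. This is one common convention for a standardized mean difference; the slides do not state which formula was used, so treating it as the authors' formula is an assumption. A minimal Python sketch (the function name `cohens_d` is illustrative), using the Δ M and SD values reported in the table:

```python
import math

def cohens_d(mean_change, sd_test, sd_retest):
    """Standardized mean change: delta M divided by the root mean square of the two SDs."""
    pooled_sd = math.sqrt((sd_test ** 2 + sd_retest ** 2) / 2)
    return mean_change / pooled_sd

# (delta M, SD at test, SD at retest), copied from the table
rows = {
    "VCI": (-0.40, 14.70, 14.83),
    "PRI": (1.26, 14.62, 13.71),
    "WMI": (2.53, 14.38, 14.20),
    "PSI": (5.94, 14.00, 13.97),
    "FSIQ": (2.65, 13.91, 12.69),
    "GAI": (0.41, 14.41, 13.31),
    "CPI": (5.36, 13.94, 13.61),
}

effect_sizes = {name: round(cohens_d(*vals), 2) for name, vals in rows.items()}
for index, d in effect_sizes.items():
    print(f"{index}: d = {d:+.2f}")
```

Under this convention the largest effects are those of PSI and CPI, the two speed-related scores.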

MEAN LEVEL OF CHANGE - RESULTS (3)

• Children with a higher FSIQ at the first assessment benefit less from the second assessment than children with a lower FSIQ
  - FSIQ T1 x Test-Retest interval = -.45**
• Is the duration of the test-retest interval related to the magnitude of the practice effect?
  - Test-Retest interval x Δ FSIQ = -.19**
  - The longer the test-retest interval, the smaller the gain in FSIQ scores

MEAN LEVEL OF CHANGE - RESULTS (4)

[Figure: delta FSIQ (y-axis, -30 to 30) plotted against an x-axis with ticks from 12 to 37]

MEAN LEVEL OF CHANGE - RESULTS (5)

• Is there a practice effect after 12 months?

                Test-Retest interval
                13-18 months   19-26 months   27-39 months
Mean age T1     8.07           8.80           8.48
Mean age T2     9.37           10.63          11.09
Boys            44             36             40
Girls           49             42             39
Total           93             78             79

RESULTS (6)

[Figure: three panels comparing First Testing and Second Testing scores for VCI, PRI, WMI, PSI, FSIQ, GAI and CPI, one panel per test-retest interval: 13-18 months (N = 93), 19-26 months (N = 78) and 27-39 months (N = 79). Two sets of four comparisons are marked with an asterisk, with effect sizes d = .15, .39, .21, .33 and d = .41, .74, .42, .77.]

• In our data, it seems that more than 2 years is necessary to [...]

RANK ORDER CONSISTENCY - RESULTS (1)

• Rank order consistency (differential stability)
  - Interindividual comparisons
  - Is the stability coefficient r ≥ .70 (the threshold for research purposes)?

RANK ORDER CONSISTENCY - RESULTS (2)

        VCI    PRI    WMI    PSI    FSIQ   GAI    CPI
r_c     .80    .71    .67    .65    .81    .82    .69

Note: correlations corrected for the variability of the standardization sample (Allen & Yen, 1979; Magnusson, 1967).

• VCI, PRI, FSIQ and GAI index scores have satisfactory stability coefficients
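The note says the coefficients were corrected for the variability of the standardization sample. A widely used correction of this kind rescales the observed correlation by the ratio of the reference SD to the sample SD. Whether the authors applied exactly this formula is an assumption, and the input values below are hypothetical, chosen only to show the direction of the adjustment. A minimal Python sketch:

```python
import math

def correct_for_variability(r_observed, sd_sample, sd_reference=15.0):
    """Correct a correlation for restricted (or inflated) sample variability.

    r_c = r * k / sqrt(1 - r**2 + r**2 * k**2), with k = sd_reference / sd_sample.
    The default reference SD of 15 is the SD of Wechsler index scores.
    """
    k = sd_reference / sd_sample
    return r_observed * k / math.sqrt(1 - r_observed ** 2 + (r_observed * k) ** 2)

# Hypothetical inputs for illustration only (not values from the study):
# an observed test-retest r of .78 in a sample whose SD is 13.9.
r_c = correct_for_variability(0.78, sd_sample=13.9)
print(f"corrected r = {r_c:.2f}")
```

When the sample SD is smaller than 15 the corrected coefficient is larger than the observed one, and when the two SDs are equal the coefficient is unchanged.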

INTRAINDIVIDUAL DIFFERENCES IN CHANGE - RESULTS (1)

• Intraindividual differences in change
  - Intraindividual comparisons
• Is performance at the second testing within ±2 standard errors of measurement (SEM) of the first testing?
  - Performance is considered “stable” across the two assessments if more than 70% of children are within ±2 SEM
• Are performances at the first and second testing in the same normative category?
  - Performance is considered “stable” across the two assessments if more than 70% of children remain in the same normative category
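The ±2 SEM criterion can be sketched as follows. The SEM is conventionally derived from the scale SD and a reliability coefficient, SEM = SD × sqrt(1 − r_xx). The slides do not say which SEM values were used, so the reliability of .90 and the score pairs below are hypothetical placeholders, not values from the study.

```python
import math

def sem(sd, reliability):
    """Standard error of measurement from the scale SD and a reliability coefficient."""
    return sd * math.sqrt(1 - reliability)

def percent_within_2_sem(score_pairs, sem_value):
    """Percentage of (test, retest) pairs whose retest lies within ±2 SEM of the test score."""
    stable = sum(1 for first, second in score_pairs if abs(second - first) <= 2 * sem_value)
    return 100 * stable / len(score_pairs)

# Hypothetical reliability and scores, for illustration only.
sem_value = sem(sd=15, reliability=0.90)
pairs = [(100, 104), (92, 105), (110, 101), (85, 96), (120, 118)]
pct = percent_within_2_sem(pairs, sem_value)

print(f"SEM = {sem_value:.2f}, children within ±2 SEM = {pct:.0f}%")
# The 70% threshold is the one stated on the slide.
print("index counted as stable" if pct > 70 else "index not counted as stable")
```

Because the band width depends on the reliability of each index, the same absolute change can count as stable on one index and unstable on another.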

INTRAINDIVIDUAL DIFFERENCES IN CHANGE - RESULTS (2)

Traditional descriptive system for WISC-IV (7 categories)

Standard score range    Description of performance
≥ 130                   Very superior
120-129                 Superior
110-119                 High average
90-109                  Average
80-89                   Low average
70-79                   Borderline
≤ 69                    Extremely low

Source: Table 6.3, Wechsler, 2003.

INTRAINDIVIDUAL DIFFERENCES IN CHANGE - RESULTS (3)

Normative descriptive system (3 categories)

Standard score range    Descriptive classification    Description of performance
≥ 116                   Above average                 Normative Strength (16% of population)
85-115                  Average range                 Within Normal Limits (68% of population)
≤ 84                    Below average                 Normative Weakness (16% of population)

Source: Rapid Reference 4.3, Flanagan & Kaufman, 2009.
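The category-based criterion amounts to mapping each score to a band and checking whether the band is unchanged at retest. A minimal Python sketch using the cut-offs of the traditional 7-category system (Wechsler, 2003) and the normative 3-category system (Flanagan & Kaufman, 2009); the function names and the example score pairs are illustrative, not taken from the study:

```python
def traditional_category(score):
    """Traditional 7-category system (Wechsler, 2003, Table 6.3)."""
    bands = [(130, "Very superior"), (120, "Superior"), (110, "High average"),
             (90, "Average"), (80, "Low average"), (70, "Borderline")]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "Extremely low"

def normative_category(score):
    """Normative 3-category system (Flanagan & Kaufman, 2009)."""
    if score >= 116:
        return "Above average"
    if score >= 85:
        return "Average range"
    return "Below average"

def percent_same_category(score_pairs, categorize):
    """Percentage of (test, retest) pairs that fall in the same category."""
    same = sum(1 for first, second in score_pairs if categorize(first) == categorize(second))
    return 100 * same / len(score_pairs)

# Hypothetical score pairs, for illustration only.
pairs = [(88, 93), (104, 108), (112, 109), (84, 90)]
print(percent_same_category(pairs, normative_category))
print(percent_same_category(pairs, traditional_category))
```

Wider bands leave fewer boundaries to cross, which is one reason a 3-category system can be expected to classify more children as stable than a 7-category system applied to the same scores.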

INTRAINDIVIDUAL DIFFERENCES IN CHANGE - RESULTS (4)

Percentage of “stable” individuals under each criterion

        Within ±2 SEM ¹    Normative descriptive system (3 cat.)    Traditional descriptive system (7 cat.)
VCI     68.8               71.6                                     49.6
PRI     70.4               73.2                                     48.0
WMI     65.2               74.0                                     47.2
PSI     57.6               72.0                                     40.8
FSIQ    58.4               76.8                                     54.4
GAI     72.8               80.4                                     57.6
CPI     56.4               68.8                                     44.0

¹ Standard error of measurement.

• Only PRI and GAI index scores are stable at the intraindividual level (more than 70% of children within ±2 SEM) between the first and second assessment

INTRAINDIVIDUAL DIFFERENCES IN CHANGE - RESULTS (5)

• Focus on «below average» children (FSIQ < 85 at the first testing; N = 32)

                  Second Testing
First Testing     FSIQ ≤ 84    85 ≤ FSIQ ≤ 115    FSIQ ≥ 116    Total
FSIQ ≤ 84         11           21                 0             32

• For FSIQ scores:
  - 34.4% of children remain in the “below average” category across the two assessments
  - 65.6% of children move to the “average range” at the second testing
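The row percentages can be recomputed from the counts in the cross-tabulation (11, 21 and 0 children out of 32). A quick check in Python; the dictionary labels are shorthand for the three normative categories:

```python
# Counts for children with FSIQ ≤ 84 at the first testing,
# by normative category at the second testing.
counts = {"below average": 11, "average range": 21, "above average": 0}
total = sum(counts.values())  # 32 children

percentages = {label: round(100 * n / total, 1) for label, n in counts.items()}
for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```

The shares are 11/32 and 21/32, so the two non-empty cells account for all 32 children.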

INTRAINDIVIDUAL DIFFERENCES IN CHANGE - RESULTS (6)

• Focus on «below average» children (GAI < 85 at the first testing; N = 26)

                  Second Testing
First Testing     GAI ≤ 84    85 ≤ GAI ≤ 115    GAI ≥ 116    Total
GAI ≤ 84          14          12                0            26

• For GAI scores:
  - 53.8% of children remain in the “below average” category across the two assessments
  - 46.2% of children move to the “average range” at the second testing

COMPARISONS BETWEEN STUDIES (1)

Stability coefficients for WISC-IV index scores

        Short-term stability (< 1 year)                          Long-term stability (> 1 year)
        Wechsler (2003)   Wechsler (2005)   Ryan & al. (2010)    Lander (2010) ¹   Watkins & Smith (2013)   Kieng & al. (2014)
        (US, N=243)       (FR, N=93)        (US, N=43)           (US, N=131)       (US, N=344)              (CH, N=250)
VCI     .93               .88               .76                  .65               .78                      .80
PRI     .89               .83               .68                  .62               .76                      .71
WMI     .89               .78               .75                  .54               .70                      .67
PSI     .86               .83               .54                  .52               .65                      .65
FSIQ    .93               .91               .88                  .70               .84                      .81

¹ Uncorrected test-retest correlations.

COMPARISONS BETWEEN STUDIES (2)

Percentage of stable children (within ±2 SEM)

        Lander (2010)        Kieng & al. (2014)
        (US, N=131)          (CH, N=250)
        Clinical children    Non-clinical children
VCI     78                   68.8
PRI     78                   70.4
WMI     73                   65.2
PSI     70                   57.6
FSIQ    73                   58.4
GAI     -                    72.8
CPI     -                    56.4

COMPARISONS BETWEEN STUDIES (3)

Percentage of stable performances (±10 points)

        Watkins & Smith (2013)    Kieng & al. (2014)
        (US, N = 344)             (CH, N = 250)
        Clinical children         Non-clinical children
VCI     71                        68.8
PRI     61                        62
WMI     63                        64
PSI     56                        44.8
FSIQ    75                        72.8
GAI     -                         74.8
CPI     -                         56.4

CONCLUSION

• These results from a group of 250 non-clinical children suggest that the Perceptual Reasoning Index (PRI) and the General Ability Index (GAI) have acceptable predictive value
  - Potential clinical relevance of using the GAI rather than the FSIQ
• Like Watkins & Smith (2013), our results show that almost 30% of children earned FSIQ scores that differed by 10 or more points between the two assessments
• According to Canivez & Watkins (2001), effect sizes are quite small when a practice effect is observed in long-term stability (> 1 year)
  - However, we find moderate effect sizes for PSI and CPI, suggesting a practice effect even when the test-retest interval exceeds 1 year
• Unlike the test's publisher, we do not recommend the traditional descriptive system; the normative descriptive system is more useful since it allows better predictions

Thank you for your attention!

REFERENCES

• Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.
• Canivez, G. L., & Watkins, M. W. (2001). Long-term stability of the Wechsler Intelligence Scale for Children-Third Edition among students with disabilities. School Psychology Review, 30(3), 438-453.
• Deary, I. J., Whalley, L. J., Lemmon, H., Crawford, J. R., & Starr, J. M. (2000). The stability of individual differences in mental ability from childhood to old age: Follow-up of the 1932 Scottish Mental Survey. Intelligence, 28(1), 49-55.
• Flanagan, D. P., & Kaufman, A. S. (2009). Essentials of WISC®-IV Assessment. Second Edition. USA: Wiley.
• Golay, P., & Lecerf, T. (2011). Orthogonal higher order structure and confirmatory factor analysis of the French Wechsler Adult Intelligence Scale (WAIS-III). Psychological Assessment, 23(1), 143-152. doi: 10.1037/a0021230
• Kieng, S., Rossier, J., Favez, N., & Lecerf, T. (2013). Étude exploratoire de la stabilité à long terme des indices standard du WISC-IV. Pratiques Psychologiques, 19(3), 163-178. doi: http://dx.doi.org/10.1016/j.prps.2013.07.003
• Lander, J. (2010). Long-term stability of scores on the Wechsler Intelligence Scale for Children-Fourth Edition in children with learning disabilities. (71), ProQuest Information & Learning, US. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&db=psyh&AN=2010-99220-484&site=ehost-live Available from EBSCOhost psyh database.
• Lecerf, T., Reverte, I., Coleaux, L., Favez, N., & Rossier, J. (2010). Indice d’aptitude général pour le WISC-IV: Normes francophones. Pratiques Psychologiques, 16(1), 109-121. doi: 10.1016/j.prps.2009.04.001
• Lecerf, T., Reverte, I., Coleaux, L., Maillard, F., Favez, N., & Rossier, J. (2011). Indice d’aptitude général et indice de compétence cognitive pour le WISC-IV: Normes empiriques versus normes statistiques. European Review of Applied Psychology/Revue Européenne de Psychologie Appliquée, 61(2), 115-122. doi: 10.1016/j.erap.2011.01.001
• Magnusson, D. (1967). Test theory. Reading, MA: Addison-Wesley Publishing.
• Prifitera, A., Weiss, L. G., & Saklofske, D. H. (1998). The WISC-III in context. In A. Prifitera & D. H. Saklofske (Eds.), WISC-III Clinical Use and Interpretation: Scientist-practitioner perspectives. San Diego: Elsevier Academic Press.
• Raiford, S. E., Weiss, L. G., Rolfhus, E., & Coalson, D. (2005). General Ability Index: Harcourt Assessment, Technical Report.
• Reverte, I., Golay, P., Favez, N., Rossier, J., & Lecerf, T. (2014). Structural validity of the Wechsler Intelligence Scale for Children (WISC-IV) in a French-speaking Swiss sample. Learning and Individual Differences, 29(0), 114-119. doi: http://dx.doi.org/10.1016/j.lindif.2013.10.013
• Ryan, J. J., Glass, L. A., & Bartels, J. M. (2010). Stability of the WISC-IV in a sample of elementary and middle school children. Applied Neuropsychology, 17(1), 68-72. doi: 10.1080/09084280903297933
• Sattler, J. M. (2008). Assessment of children: Cognitive foundations. Fifth Edition. San Diego: Jerome M. Sattler, Publisher, Inc.
• Sattler, J. M., & Ryan, J. J. (2009). Assessment with the WAIS-IV. La Mesa: Jerome M. Sattler, Publisher, Inc.
• Watkins, M. W., & Smith, L. G. (2013). Long-term stability of the Wechsler Intelligence Scale for Children-Fourth Edition. Psychological Assessment, 25(2), 477-483. doi: 10.1037/a0031653
• Wechsler, D. (2003). Manual for the Wechsler Intelligence Scale for Children - Fourth Edition. San Antonio: Psychological Corporation.
• Wechsler, D. (2005). Manuel de l'Echelle d'Intelligence de Wechsler pour Enfants - 4e édition. Paris: Editions du Centre de Psychologie Appliquée.
• Weiss, L., & Gabel, A. D. (2008). WISC-IV technical report #6: Using the Cognitive Proficiency Index in psychoeducational assessment. Upper Saddle River, NJ: Pearson Education, Inc.
