
2. Methodology

2.4 Administration of tests - Training of examiners

In order to ensure optimal testing conditions (increasing accuracy and reducing examiner bias), it was decided to use one examiner for each type of test, namely the vocabulary-phonology section, the Lise-DaZ and NWR tests, and the tests of non-verbal reasoning (CPM and DNVR). The training of the three undergraduate Speech and Language students at the University of Fribourg lasted one full day; it included presentation of materials, explanations, and practice by means of role-play, and was delivered by the researcher and one of the supervisors. The three examiners were then responsible for organising and administering the corresponding type of test to the children in different settings (schools, homes, SLT clinics). For their participation, all three examiners were paid a basic fee for each administration.

Due to practical and logistical issues, four students had to be tested by the researcher (vocabulary and phonology subtests). Finally, the examiners did not receive any prior information regarding whether or not the children received any speech therapy. Nonetheless, this information was hard to conceal for a few caseload children (N = 6), as the administration had to take place at the speech therapy setting where they received support, this being most practical for all parties. The examiners were instructed to avoid discussing any case history information with the teachers and carers, and to ensure that the children could be seated in a quiet room without distractions. The only information the examiners could discuss was whether the children's preferred language of instruction was High German or Swiss German, so that they could proceed accordingly.

In all cases, the student examiners were asked to set aside some time at the beginning of each session to build up a rapport with the child, usually by means of a quick game. Each testing session lasted approximately 25 to 35 minutes. When necessary, short breaks were provided. Also, whenever necessary, the presence of a parent, SLT, or teacher was allowed in the testing area.


3. Quantitative Results

3.1 Results concerning Hypothesis 1

Our initial assumption was that dynamic measures would provide a more valid differentiation of the two groups' abilities compared to the static ones. To examine this assumption, we set out to compare static and dynamic performance by looking into the mean differences of each group across all different tasks. To do so, we used one-way ANOVAs and Mann-Whitney tests, as appropriate.

These differences are presented and commented upon in small clusters of different linguistic areas (vocabulary, phonology), as well as based on the instrument that was used (see tables 10, 11, 12 and 13). Regarding the assumptions for the use of the ANOVA, as already mentioned these were overall met; in cases of considerable departures from normality, non-parametric tests were used.

Normality of variables was mainly assessed with Q-Q plots and normality tests, which indicated normality in most cases. The assumption of homogeneity of variance was also checked and was found to hold in most cases. All reported significant results were also confirmed through non-parametric tests, wherever necessary. It is, however, worth remembering that ANOVA is robust enough to overcome modest violations of parametric requirements (Mayers, 2013).

Therefore, comparison of mean differences across groups provides an initial indication to this end.

The results are presented in the following tables ensuring that the sample size is always mentioned.

With regard to the ANOVA tests, where appropriate, eta squared effect sizes were also calculated. As for the variable "Sentence Assembly", which is an ordinal scale, the Mann-Whitney test was employed, along with a non-parametric method to calculate the effect size (see below).

Cohen's (1988) benchmarks of eta squared values are taken as reference, whereby eta squared = .01 is considered to correspond to a small effect size, eta squared = .06 to a medium effect size, and eta squared = .14 to a large effect size.
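As an illustration of how these benchmarks apply, eta squared for a one-way design (the ratio of between-groups to total sum of squares) and its Cohen (1988) classification can be sketched as follows. This is a minimal sketch with invented example scores, not the analysis script used in the study:

```python
# Sketch: eta squared = SS_between / SS_total for a one-way design,
# classified with Cohen's (1988) benchmarks (.01 small, .06 medium, .14 large).
# The score lists below are invented illustration data, not study data.

def eta_squared(*groups):
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_between = sum(
        len(g) * ((sum(g) / len(g)) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total

def cohen_label(eta2):
    if eta2 >= .14:
        return "large"
    if eta2 >= .06:
        return "medium"
    if eta2 >= .01:
        return "small"
    return "negligible"

control = [10, 12, 9, 11, 13, 10]   # invented scores
caseload = [8, 9, 7, 10, 8, 9]      # invented scores
eta2 = eta_squared(control, caseload)
print(round(eta2, 2), cohen_label(eta2))
```

Note that for a two-group one-way ANOVA, eta squared can also be recovered from the reported statistics as F / (F + df error), which is a useful consistency check on published tables.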

3.1.1 Vocabulary measures

The following table (Table 10) presents the mean differences of performance of each group on the different vocabulary measures, along with the standard deviations, as well as the differences (ANOVA) and effect sizes thereof, and the size of each group.


Table 10. Means (SDs), ANOVAs, and effect sizes of both groups' performance on vocabulary measures

* p < .05, ** p < .01 (1-tailed)

In terms of group size, the control group comprised 27 children, whereas the caseload group comprised 18.

As regards mean performance on the static vocabulary subtest, the control group performed only slightly better (M = 10.7) than the caseload group (M = 9.2), and the standard deviations of the two groups were almost identical (SD control = 3.8; SD caseload = 3.6). Scores ranged from 0 to 20.

With regard to the Mean Dynamic Mediation, the observed difference in performance was larger, with the control group scoring higher on average (M = 2.6) than the caseload group (M = 2.2), for scores ranging from 0 to 3. The standard deviations were almost equal for the two groups (SD control = 4.7; SD caseload = 4.5). For all other dynamic tasks, the possible scores also ranged from 0 to 3.

Concerning the dynamic task of immediate expressive recall, the control group scored higher on average (M = 1.5) than the caseload group (M = 0.8), but both groups presented similar variability (SD control = 1.3, SD caseload = 1). The same pattern was observed for both subsequent dynamic retention tasks: the control group performed better on average both on the expressive retention task (M = 1.7) and on the receptive retention task (M = 3.7) compared to the caseload group (M = 0.9 and M = 2.9, respectively). For both tasks, the variability of responses of the two groups was somewhat larger than in the previous tasks; for instance, for expressive retention the standard deviation of the control group's responses was 1.4, whereas that of the caseload group was 0.9.

It is evident that both groups performed better on average on the final receptive retention task, which is in line with the pilot study and might logically be explained by the increased exposure to the test items.

In terms of the statistical significance and effect size of the observed differences, these were the following:

Static vocabulary: The static test showed little differentiation of the receptive skills of the two groups (small effect size), and the difference was non-significant, as determined by a one-way ANOVA [F(1, 43) = 1.71, p = .09], η2 = .04.

Dynamic vocabulary: The mean dynamic mediation that children received in order to identify the unknown words during the vocabulary task differentiated the abilities of the two groups (large effect size), and this effect was significant, as determined by a one-way ANOVA [F(1, 43) = 7.08, p = .006], η2 = .14.

Performance on the other dynamic tasks (immediate recall, expressive and receptive retention of hard words) provided some differentiation of the skills of both groups (medium effect sizes), as indicated by the respective one-way ANOVAs: a) immediate recall: [F(1, 43) = 1, p = .048], η2 = .06, b) expressive retention: [F(1, 43) = 3.41, p = .04], η2 = .07, c) receptive retention: [F(1, 43) = 2.71, p = .054], η2 = .06. The differences in performance on the immediate recall and expressive retention tasks were significant, but on the receptive retention task the difference was not.

The large effect size associated with the Mean Dynamic Mediation indicates that it provides a strong differentiation of the skills of the two groups, and the medium effect sizes of the other dynamic measures confirm this trend, even though the observed differences between the groups were smaller.

Conclusion: Children's receptive skills as measured through a standardized test did not differ between the two groups. On the other hand, the control group's ability to make use of the provided cues was higher compared to the caseload group, leading to better overall performance on the expressive retention tasks. Interestingly, the control group's receptive retention was less affected than its performance on the expressive tasks. This is rather surprising and will be discussed later.

3.1.2 Phonology Measures

Table 11 presents the mean differences of performance of each group on the different phonology measures, along with the standard deviations, as well as the differences (ANOVA) and effect sizes thereof, and the size of each group.

Table 11. Means (SDs), ANOVAs, and effect sizes of both groups' performance on phonology measures

* p < .05, **p < .01, (1-tailed)

The phonology measures were completed by 16 children of the caseload group and 27 of the control group; for 2 caseload children the specific task was too demanding. On all tasks, the control children performed better on average than the caseload children. It is important to note that for the Stimulability and Inconsistency tests, lower scores represent higher (less deviant) competences.

More specifically, in the static phase, the control group on average correctly produced 6.8 words out of 10, whereas the caseload group produced only 3.8; variability was almost equal (SD control = 2.4, SD caseload = 2.2). Similarly, the control group correctly produced 26.9 sounds on average out of 29, whereas the caseload group produced 21.7. The variability of the caseload group's responses was much higher here (SD = 5.3) compared to the control group's (SD = 2.2). A note should be made regarding the maximum score of this subtest, i.e., 29: this score represents the sum of all possible sounds included in the words of the static test, including those in blends, apart from the sound /ts/, which is counted as one phoneme.

In the dynamic phase, i.e., following mediation, both groups showed improvement but, again, the control group produced considerably more words correctly on average (M = 8.4) compared to the caseload group (M = 4.8); variability of responses was almost the same for both groups (SD control = 1.8, SD caseload = 2). A similar pattern was observed in relation to the sounds produced following mediation: the control group on average produced more sounds correctly (M = 27; SD = 1.9) and showed less variable scores overall than the caseload group (M = 23; SD = 3.5).


As regards the Stimulability test, which measured how many sounds were not stimulable following cueing (lower scores represent better outcomes), once again the caseload group performed less well on average (M = 1.5) than the control group (M = 0.5), presenting slightly higher variability of responses (SD caseload = 0.89, SD control = 0.76).

Finally, the Inconsistency rate between the two elicited productions (a percentage calculated as the proportion of all inaccurate words to the total of all produced words across the two trials, ranging from 0 to 100, whereby a percentage over 40% indicates a possible phonological disorder and warrants further assessment) confirmed the previously observed patterns: the caseload children on average obtained a much higher, i.e., more "inconsistent", score (M = 35.7), which came close to the 40% "threshold", compared to the control group (M = 18.1). There was no large difference in the observed variability of responses (SD caseload = 13.9; SD control = 15).
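The inconsistency rate as described here (inaccurate productions over all productions pooled across the two trials) can be sketched as a small helper. The helper name, the 40% decision message, and the trial data are invented for illustration:

```python
# Sketch: inconsistency rate across two elicitation trials, computed as
# the proportion of inaccurate word productions over all productions
# (both trials pooled), expressed as a percentage (0-100), following
# the description in the text. A rate above 40% is treated as warranting
# further phonological assessment. All data below are invented.

def inconsistency_rate(trial1_correct, trial2_correct):
    """Each argument is a list of booleans: was word i produced accurately?"""
    total = len(trial1_correct) + len(trial2_correct)
    inaccurate = sum(not ok for ok in trial1_correct + trial2_correct)
    return 100 * inaccurate / total

trial1 = [True, True, False, True, False, True, True, True, True, True]
trial2 = [True, False, False, True, True, True, True, True, False, True]
rate = inconsistency_rate(trial1, trial2)
print(f"{rate:.1f}%", "- assess further" if rate > 40 else "- within range")
```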

All differences were significant. The results of the ANOVAs, as well as the effect sizes, are presented below.

Static phonology tests: Children's ability to produce both sounds and words at the static level differentiated the two groups (large effect sizes), and this effect was significant as determined by one-way ANOVA tests: a) static test words: [F(1, 41) = 17.66, p < .001], η2 = .30, b) static test sounds: [F(1, 41) = 15.23, p < .001], η2 = .27.

Dynamic phonology tests: Both groups' ability to produce sounds and words following mediation, i.e., dynamically, also differentiated the two groups very well and showed even larger effect sizes, as determined by one-way ANOVA tests: a) dynamic test words: [F(1, 41) = 35.89, p < .001], η2 = .47, b) dynamic test sounds: [F(1, 41) = 19.95, p < .001], η2 = .33.

As will be discussed later, children's ability to produce words and sounds following mediation (dynamic tests) seems to better differentiate the two groups, as for both variables the respective effect sizes are larger than those of the static measures. This seems to be particularly true for the item "Phonology Words" (static phonology words test η2 = .30 compared to dynamic phonology words test η2 = .47).

Dynamic stimulability and inconsistency (Phonology): The dynamic ability of children to imitate difficult-to-produce sounds following a model (stimulability), as well as the inconsistency rate of their productions before and after mediation (inconsistency), differentiated the two groups well, as indicated by the large effect sizes, and these differences were significant as determined by the respective one-way ANOVA tests: a) Stimulability: [F(1, 40) = 13.84, p = .003], η2 = .26, b) Inconsistency: [F(1, 40) = 11.61, p = .002], η2 = .23.

Conclusion: Large effect sizes were noted with regard to the vocabulary Dynamic Mediation task and all phonological tasks. It is interesting to note that the effect sizes of performance on the dynamic measures of phonology and vocabulary were larger than those on the respective static tasks.

3.1.3 General conclusion: According to Hypothesis 1 of this study, dynamic measures would provide a more valid differentiation of the two groups' abilities compared to the static ones. An initial conclusion is that the effect sizes of the group differences on all dynamic measures are higher than the corresponding effect sizes of the differences on the static measures, which is an indication that dynamic measures better differentiate the abilities of the two groups. This was especially true, i.e., significant, for the assistance children required in order to identify an unknown word (vocabulary) and to produce "hard" words and sounds following modelling (phonology).

Due to the small sample sizes, however, not all differences were significant.

3.2 Results regarding Hypothesis 2

The second hypothesis, namely that there would be no differences between the two groups' dynamic nonverbal reasoning performance (as measured by means of the DNVR) but small differences (effect sizes) with regard to their static reasoning performance (as measured by means of the CPM), was investigated using a one-way Analysis of Variance. With regard to the Dynamic Nonverbal Reasoning (DNVR) task, after observing that the group means were actually identical, as illustrated in Table 12 (M control = 7.3, SD = 2.9; M caseload = 7.3, SD = 2.4), it was decided that there was no need to actually run an ANOVA, given that this part of the hypothesis was already confirmed.

Table 12 illustrates the mean differences of performance of each group on these measures, i.e., the CPM and the DNVR, along with the standard deviations, the differences (ANOVA) and effect size thereof, as well as the size of each group. As can be observed, the control group performed better on average (M = 13.5, SD = 2.9) than the caseload group (M = 11.6, SD = 3.2) on the static test of nonverbal reasoning, the CPM.

Table 12. Means (SDs), ANOVAs, and effect sizes of both groups' outcomes on CPM

* p < .05, (2-tailed)

Based on a one-way ANOVA, performance on the CPM was found to be marginally different between the two groups, [F(1, 39) = 3.16, p = .08] (Table 12).

Conclusion: These results confirm that there were small differences between the groups regarding their nonverbal reasoning abilities as measured with a static test, but no differences between the groups when using a dynamic measure of reasoning.

3.3 Results regarding Hypotheses 3-5

Research Questions 3-5: These questions aim to address the relationship of the dynamic measures with static instruments and procedures that have already been validated, such as the Lise-DaZ, and thus explore possible indications of criterion (concurrent) validity of the DA (Law & Camilleri, 2007), as well as convergent validity. This was performed through Spearman's correlations, computed separately for the two groups. For research questions and hypotheses 4 and 5, the instrument used as the reference for all correlations was the Lise-DaZ assessment. As a general observation, it needs to be clarified that, as mentioned earlier, the Lise-DaZ (Tracy & Schulz, 2011) was selected on the grounds that it has been standardised on bilingual children, even though it focuses on the ability to use and understand linguistic morphology rather than semantics. Due to this factor, as well as to its static nature, it is important to note from the outset that any anticipated correlations with the dynamic tasks would be low to moderate.

These research questions are mentioned and analysed separately in each case, as follows:

Research question 3: What is the relationship between performance on the static vocabulary test and dynamic measures of vocabulary? Despite the fact that the PDSS is not a "gold standard" with regard to the tested sample, i.e., it was not standardised with bilingual children, the fact that it is an established measure of lexical ability enables its use to gauge how far the DA may indeed be used as a measure of lexical learning potential (see Law & Camilleri, 2007), i.e., how far it measures a similar construct. In this sense, we would expect a rather high correlation between the described tasks. However, this idea is immediately challenged once one considers the different nature of said tests (static versus dynamic), which will naturally affect the magnitude of the expected relationship. Therefore, Hypothesis 3 might be expressed as follows: We expect moderate correlations between static and dynamic tests of vocabulary for both groups.

3.3.1 Results regarding Hypothesis 3

Hypothesis 3, i.e., the assumption that static vocabulary performance would correlate moderately with dynamic vocabulary performance, was investigated through Spearman's correlations (rho).

These correlations are illustrated in Table 13. As a reference, the caseload group comprised 17 children, whereas the control group comprised 28.

Table 13. Spearman rho correlations between static (PDSS) vocabulary test and Dynamic measures


Hypothesis 3, as can be observed in Table 13, appears to be confirmed, as there were indeed moderate significant correlations between almost all dynamic tasks and the static vocabulary measure. For instance, the static vocabulary performance of the caseload group was moderately correlated with Mean Mediation (rs = .44, p = .05), with immediate recall (rs = .44, p = .048), and with receptive retention (rs = .50, p = .05). Similar patterns were observed for the correlations of the control group's static vocabulary performance with Mean Mediation (rs = .38, p = .05) and immediate recall (rs = .47, p = .01). The following relationships particularly stand out: a) the high correlation with the expressive retention performance of the caseload group (rs = .72, p = .01), and b) the lack of a significant correlation between receptive retention performance and the static measure (rs = .14, p = .25), given that the static measure is a test of receptive vocabulary. Possible explanations are discussed in the following chapter.
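For reference, Spearman's rho as used in these analyses is the Pearson correlation computed on tie-averaged ranks. The following minimal sketch implements it from scratch; the paired scores are invented for illustration and are not the study data:

```python
# Sketch: Spearman's rho as the Pearson correlation of (tie-averaged) ranks.
# The paired scores at the bottom are invented illustration data.

def average_ranks(xs):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values equal to the run's first value
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

static_vocab = [10, 14, 7, 12, 9, 15, 8, 11]            # invented
mediation = [2.1, 2.8, 1.5, 2.4, 2.0, 3.0, 1.8, 2.3]    # invented
print(round(spearman_rho(static_vocab, mediation), 2))
```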

3.3.2 Results regarding Research Questions 4 (a & b) and Hypothesis 4 (Lise-DaZ)

Research question 4a: How did performance on the already validated Lise-DaZ differentiate the skills of both groups?

As already mentioned, the Lise-DaZ is a validated instrument for bilingual children with German as an additional language, and there is therefore substantial evidence that it can provide a valid differentiation of children's linguistic abilities. For this reason, we chose one-tailed statistical tests.

Table 14 presents the mean differences of performance of each group on the different Lise-DaZ measures, along with the standard deviations, as well as the differences (ANOVA) and effect sizes thereof, and the size of each group. It should be noted that for the subtest/variable Sentence Assembly, which is an ordinal scale (of four levels), the difference between the groups was measured through the Mann-Whitney test, and the calculation of the effect size followed the procedure described by Field (2005) for non-parametric tests (designated by r), i.e., the formula r = Z / √N. The reported effect size benchmarks for this measure are 0.3 for a medium and 0.5 for a large effect size.
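Field's (2005) non-parametric effect size can be sketched from first principles: compute rank sums, derive the Mann-Whitney U, approximate its sampling distribution with a normal Z, and take r = |Z| / √N. The ordinal scores below are invented, and this sketch omits the tie correction to the variance of U that full implementations apply:

```python
# Sketch: Mann-Whitney U via rank sums, normal-approximation Z,
# and Field's (2005) effect size r = Z / sqrt(N).
# The ordinal Sentence-Assembly-style scores below are invented.
import math

def mann_whitney_r(group_a, group_b):
    combined = group_a + group_b
    # tie-averaged 1-based ranks over the pooled sample
    order = sorted(range(len(combined)), key=lambda i: combined[i])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and combined[order[j + 1]] == combined[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(group_a), len(group_b)
    r1 = sum(ranks[:n1])                    # rank sum of group_a
    u1 = r1 - n1 * (n1 + 1) / 2             # Mann-Whitney U for group_a
    mu = n1 * n2 / 2                        # mean of U under H0
    # SD of U under H0 (no tie correction in this sketch)
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return abs(z) / math.sqrt(n1 + n2)

control = [3, 2, 3, 4, 2, 3, 3, 4]   # invented ordinal levels
caseload = [2, 1, 2, 3, 2, 1]        # invented ordinal levels
print(round(mann_whitney_r(control, caseload), 2))
```

Identical groups yield r = 0, and complete separation pushes r well above the 0.5 "large" benchmark.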

Firstly, it should be mentioned that not all children were able to complete all subtests of the Lise-DaZ, as this proved too hard for many participants in both groups; this is further analysed below. The Comprehension task appeared to be the easiest of all, as 15 out of 18 caseload and 26 out of 27 control children completed it. The control group (M = 9, SD = 2.1) performed only slightly better than the caseload group (M = 8.8, SD = 2.7).

Table 14. Means (SDs), ANOVAs, and effect sizes of both groups’ performance on Lise-DaZ subtests (receptive and expressive)

* p < .05, **p< .01, (1-tailed), a: SVA ranges from 0-1, b: Non-parametric Mann-Whitney U and r (effect size)


As mentioned in the Method section, all other (expressive) Lise-DaZ tasks (Sentence Assembly, SV Agreement, and all Word classes subtests) were not completed by the children directly but were scored by the examiners based on a sample of expressive language elicited through the story of "Lise and her friends". Overall, as can be observed in Table 14, not all children from both groups were able to produce any, or sufficient, expressive responses to "qualify" for the scoring of specific tasks. For instance, as per the Lise-DaZ manual, scoring of a specific "level" on the Sentence Assembly task requires responses in the form of verbs, and specifically at least 3 occurrences thereof; therefore, for that task the responses of only 11 caseload and 23 control children were considered valid. As expected, the control group produced more complex phrases (M = 2.5, SD = 1) than the caseload group (M = 2, SD = 0.77). For reference, as mentioned earlier (Section 2.3.6.1), level 2 (II) corresponds to the use of two-word phrases following the typical word placement, whereas level 3 (III) corresponds to a typical main clause; the highest attainable level is IV (4) and corresponds to the use of subordinate clauses. It should be noted that as this