CHAPTER IV. GENERALIZATION TO OTHER TYPES OF SCHWA WORDS

IV.2 Experiment 4: Initial schwa words in Swiss French

The aim of this first experiment is to confirm, with a completely different task, our preceding finding, i.e., that words with a schwa in their first syllable are represented in the production lexicon of Swiss speakers with two different lexemes. Pseudohomophone and pseudoword immediate and delayed naming tasks are used for this purpose. As discussed above, we predict an advantage for pseudohomophones (e.g., “seriziez”) over pseudowords (e.g., “serasien”), both for pseudohomophones corresponding to schwa variants (e.g., cerisier ‘cherry tree’ realized as [səʁizje]) and for pseudohomophones corresponding to non-schwa variants (e.g., cerisier realized as [sʁizje]).

IV.2.1 Method

IV.2.1.1 Participants

Twenty-four participants took part in the experiment. They were all monolingual French speakers, aged between 18 and 35 years, with no reported hearing, reading or language impairment. They were all born in the French-speaking part of Switzerland (in the cantons of Neuchâtel, Fribourg, Jura or Vaud) and had always lived there. They were paid for their participation.

IV.2.1.2 Material

Fifty French polysyllabic nouns were selected (see APPENDIX 6). Each word has a schwa in the first syllable and can be produced either with schwa (e.g., [səʁizje] ‘cherry tree’) or without schwa (e.g., [sʁizje]). The cumulative lexical frequencies of the phonological forms corresponding to these words vary between 0 and 140.2. Here and in the following analyses of this chapter, lexical frequency always refers to the frequencies for films given by the database Lexique (New, Pallier, Ferrand & Matos, 2001).

For each of these 50 schwa words, two pseudohomophones were generated: one for the schwa variant and one for the non-schwa variant. For instance, the pseudohomophone “seriziez” was associated with the schwa variant of the word cerisier and the pseudohomophone “srizier” with the non-schwa variant of this same word.

In addition, two pseudowords were generated per schwa word, one for its schwa variant and another for its non-schwa variant. The two pseudowords only differed in the presence/absence of the vowel in the initial syllable (hence mirroring the presence/absence of schwa in the two variants of the schwa word). Consonantal phonemes were identical in the variants and their corresponding pseudowords; only some of the vowels were changed.

For instance, the pseudoword “serasien” (i.e., [səʁazjɛ̃]) was associated with the schwa variant of the word cerisier and “srazien” (i.e., [sʁazjɛ̃]) with its non-schwa variant.

All pseudowords and pseudohomophones respected French orthotactic rules and were consistent (i.e., had only one possible pronunciation). Furthermore, for each variant type, the two groups of stimuli (pseudohomophones and pseudowords) were balanced with respect to many variables known to affect the reading of words, pseudohomophones and pseudowords, as well as the oral naming of words and pseudowords, as shown by unpaired t-tests (see APPENDIX 7 for statistical values). The following variables were balanced between pseudohomophones and pseudowords for each variant type: number of letters (e.g., Spinelli et al., 2005), the ratio between the number of letters and the number of phonemes (i.e., graphemic complexity; see Rastle & Coltheart, 1998; Ray & Schiller, 2005), number of phonemes (Roelofs, 2002), number of syllables (Santiago, MacKay, Palma & Rho, 2000), positional letter frequency (Grainger & Jacobs, 1993), positional bigram frequency (Seidenberg et al., 1996), bigram frequency (Massaro, Venezky & Taylor, 1979, in Seidenberg et al., 1996), orthographic neighborhood density (i.e., number of orthographic neighbors; following Coltheart, Davelaar, Jonasson & Besner, 1977, we defined an orthographic neighbor as a word differing in the identity of one letter; see McCann & Besner, 1987; Laxon, Masterson, Pool & Keating, 1992), orthographic neighborhood frequency (i.e., number of more frequent orthographic neighbors; Grainger, 1990), initial syllable frequency (Laganaro & Alario, 2006; Cholin, Levelt & Schiller, 2006), positional segment and diphone frequency (Vitevitch, Armbruster & Cho, 2004), phonological neighborhood frequency (i.e., mean frequency of all phonological neighbors; Vitevitch & Sommers, 2003) and identity of the first phoneme (Rastle, Croot, Harrington & Coltheart, 2005). Unfortunately, we were not able to match the pseudohomophones and pseudowords associated with non-schwa variants on phonological neighborhood density (i.e., number of phonological neighbors; Vitevitch, 2002). Table 17 shows the values of these properties for the pseudohomophones and matched pseudowords for both variant types.

We also included fillers in the experiment to prevent our participants from focusing on schwa. For each variant of each schwa word, we chose a non-schwa word with exactly the same number of phonemes and the same syllabic structure as the variant. For instance, the word clavier ‘keyboard’ was chosen for the non-schwa variant of the word cerisier and the word financier ‘financier’ for its schwa variant. For each of these filler words, a pseudohomophone and a pseudoword were generated according to the same criteria as for the target items.

Our final set of stimuli therefore contained 400 tokens: 50 schwa-variant pseudohomophones and their 50 matched pseudowords, 50 non-schwa-variant pseudohomophones and their 50 matched pseudowords, and 100 filler pseudohomophones and their 100 matched pseudowords. The stimuli were divided into two lists, each with an equal number of schwa-variant pseudohomophones and pseudowords, non-schwa-variant pseudohomophones and pseudowords, and filler pseudohomophones and pseudowords. The pseudohomophones corresponding to the two variants of a given schwa word were always part of different lists. Furthermore, a given pseudohomophone was never in the same list as its matched pseudoword. For instance, the items “seriziez” and “srazien” were part of one list, whereas the items “srizier” and “serasien” were part of the other list. Items were then randomized within each list.
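The list constraints just described can be illustrated with a short Python sketch. This is not the procedure actually used to build the lists. The tuple labels, the alternation rule across words and the random seed are assumptions made for illustration, and fillers are left out.

```python
import random

def build_lists(words, seed=0):
    """Split target stimuli into two lists under the constraints in the text.

    Each stimulus is a hypothetical (word, variant, stimulus_type) tuple.
    """
    list_a, list_b = [], []
    for i, word in enumerate(words):
        # Pairing taken from the example in the text: the schwa pseudohomophone
        # shares a list with the non-schwa pseudoword, and vice versa.
        first = [(word, "schwa", "pseudohomophone"),
                 (word, "non-schwa", "pseudoword")]
        second = [(word, "non-schwa", "pseudohomophone"),
                  (word, "schwa", "pseudoword")]
        # Assumption: alternate the assignment across words so that every
        # stimulus category is equally frequent in both lists.
        if i % 2 == 0:
            list_a += first
            list_b += second
        else:
            list_a += second
            list_b += first
    # Randomize item order within each list.
    rng = random.Random(seed)
    rng.shuffle(list_a)
    rng.shuffle(list_b)
    return list_a, list_b

words = [f"word{i}" for i in range(50)]  # placeholder labels for the 50 schwa words
list_a, list_b = build_lists(words)
```

With 50 words, each list receives 25 items of each of the four target categories, and no list contains both pseudohomophones of a word or a pseudohomophone together with its matched pseudoword.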

Table 17. Properties of pseudohomophones and pseudowords associated with schwa and non-schwa variants of initial schwa words in Experiment 4.

                                     Schwa variant                 Non-schwa variant
Variable                             Pseudohomophone  Pseudoword   Pseudohomophone  Pseudoword
Nb of letters                        7.46             7.46         6.36             6.36
Graphemic complexity                 1.32             1.32         1.37             1.37
Nb of phonemes                       5.72             5.72         4.72             4.72
Nb of syllables                      2.34             2.34         1.34             1.34
Positional letter frequency          345540.30        326682.50    194601.60        200657.20
Positional bigram frequency          27118.90         27181.22     11145.21         11334.59
Bigram frequency                     159077.30        163049.10    98024.32         103796.10
Orthographic neighborhood density    0.48             0.46         0.26             0.26
Orthographic neighborhood frequency  0.46             0.42         0.24             0.18
Initial syllable frequency           17144.93         14408.88     1.08             6.29
Positional segment frequency         0.31             0.33         0.213            0.215
Positional diphone frequency         0.0365           0.0333       0.011            0.010
Phonological neighborhood frequency  20.39            20.08        85.83            29.78
Phonological neighborhood density    4.04             3.22         3.54             2.30

IV.2.1.3 Design and procedure

The experiment took place in a soundproof cabin at the Neuropsycholinguistic Laboratory in Neuchâtel. Participants performed three tasks in one experimental session: an immediate pseudohomophone and pseudoword naming task, a delayed pseudohomophone and pseudoword naming task, and a variant relative frequency estimation task. Tasks were separated by short pauses. The whole session lasted about 1h15.

Variant relative frequency estimation task

This task was identical to the one used in Experiment 1 except in two respects. Firstly, the relative frequencies were only collected for words in isolation. Secondly, the estimation task involved 50 fillers, which were words with a schwa outside their first syllable (as in casserole ‘pot’). We included these fillers in order to generate more variation in our participants’ responses and to prevent them from feeling obliged to use the whole scale even though they usually only used one variant type for most words.

Pseudohomophone and pseudoword naming tasks

The immediate and delayed naming tasks were run with DMDX (Forster & Forster, 2003).

Immediate naming task: The stimuli appeared one by one on a computer screen. Participants were told to pronounce them as fluently and quickly as possible. They were informed about the nature of the stimuli. Vocal responses were recorded with a voice key. Each trial had the following structure: a fixation cross was shown at the center of the screen for 300 ms, followed by the presentation of the letter string in lowercase. The letter string disappeared at the onset of the response or after 2000 ms if no response was given. The next trial started after a 750 ms blank screen interval. The voice key was activated at the onset of the letter string presentation.

Each participant was presented with the two lists, separated by a short pause. The order of the lists was counterbalanced across participants. The experiment started with a few training items.

Delayed naming task: After a short pause, participants performed a delayed naming task with the same stimuli. The delayed naming task was identical to the immediate naming task, except that participants had to wait for the appearance of a response cue before providing their response. Each trial had the following structure: the letter string was presented on the screen for 1500 ms, followed by a blank screen interval of 1000 or 1500 ms. A cue then appeared on the screen and participants had to produce their response as quickly as possible. The cue stayed on the screen for 2000 ms or until a response was given. A 300 ms blank screen interval separated trials. The voice key was activated at the onset of the cue. The variable blank screen interval before the appearance of the cue was introduced so that participants could not anticipate when the response had to be given. The experiment again started with a few training items.

IV.2.2 Results

IV.2.2.1 Variant relative frequency estimation task

The mean relative frequency for non-schwa variants is 4.83 (95% confidence interval: ± 0.28) and the median is 5. The ratings are correlated with those obtained in Experiment 1 for the 19 items present in both experiments (Spearman rho = 0.68, S = 362.3, p < 0.01). In addition, the mean relative frequency for non-schwa variants does not differ between the two experiments for these 19 items (mean difference = 0.17, t(27) = 0.52, p > 0.1).
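For readers unfamiliar with the statistics reported here, the following Python sketch shows how a Spearman correlation and the accompanying S statistic can be computed. It assumes that S is the sum of squared rank differences, which in the absence of ties equals (n³ − n)(1 − rho)/6. The function names are illustrative and the code is not the analysis script used for this experiment.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def s_statistic(n, rho):
    """S derived from rho under the assumed relation (n^3 - n)(1 - rho) / 6."""
    return (n ** 3 - n) * (1 - rho) / 6
```

Under this assumption, `s_statistic(19, 0.68)` gives 364.8, close to the reported S = 362.3; the small gap is what one would expect if rho was rounded to two decimals.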

As in Experiment 1, we find a correlation between our estimations (values averaged over speakers) and Racine’s values (Swiss speakers: Spearman rho = 0.45, S = 30210.7, p < 0.01; French speakers: Spearman rho = 0.41, S = 29291.9, p < 0.001). There is also a small but significant correlation between the ratings for the non-schwa variants and the words’ frequencies in films (Spearman rho = 0.33, S = 27723.3, p < 0.05).

IV.2.2.2 Pseudohomophone and pseudoword naming tasks

Each vocal response was checked for accuracy. Two participants were excluded because their error rate exceeded 35% in a given condition (either pseudohomophone or pseudoword naming, in the immediate or delayed naming task). For the remaining 8800 responses, hesitations, disfluencies, reading errors, productions of the wrong variant, responses with an uncertain onset measurement or an uncertain variant, and anticipations (i.e., responses given before the onset of the cue in the delayed naming task) were counted as errors and removed from the analysis.

Delayed naming task

In the delayed naming task, there were 324 errors (7%), most of them reading errors (n = 163, 50% of errors). Latencies below 100 ms (13 data points) were automatically removed. A visual inspection of the distribution further led us to disregard the 19 data points above 1200 ms. Latencies for the 4044 remaining correct responses ranged from 104 to 1186 ms, with an overall mean of 427 ms. No further analyses were conducted on these latencies; they were used to control for differences in ease of articulation in the statistical model for the immediate naming task.
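The latency trimming applied to the delayed naming data can be sketched as follows. This Python fragment only illustrates the two cut-offs given in the text (100 ms and 1200 ms); the function name and the example values are hypothetical.

```python
def trim_latencies(latencies_ms, lower=100, upper=1200):
    """Separate latencies within [lower, upper] ms from those outside it."""
    kept = [t for t in latencies_ms if lower <= t <= upper]
    dropped = [t for t in latencies_ms if t < lower or t > upper]
    return kept, dropped

# Hypothetical latencies: one below 100 ms and one above 1200 ms are dropped.
kept, dropped = trim_latencies([50, 104, 427, 1186, 1500])
```

The reported counts are consistent with this procedure if one assumes that half of the 8800 responses (4400) come from the delayed task: 4400 − 324 errors − 13 fast responses − 19 slow responses = 4044.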

Immediate naming task

Analysis of responses

In the immediate naming task, the number of errors totaled 547 (12%). Most errors were reading errors (n = 371, 68% of errors). An analysis of error type for pseudohomophones showed that six non-schwa variants were produced with the schwa and 16 schwa variants were produced without the schwa.

Excluding errors due to measurements (i.e., uncertainty in the onset of the response or in the variant produced) we ran a generalized mixed-effects model on errors. Participant and item were entered as random terms, variant type and stimulus type (pseudohomophone versus pseudoword) were entered as fixed effects. Results show that the probability of making an error was higher for pseudowords (17% of errors) than for pseudohomophones (6% of errors, β = -1.35, F(1,4343) = 135.7, p < 0.0001) and for schwa variants (14%) than for non-schwa variants (9%, β = 0.58, F(1,4343) = 31.0, p < 0.0001). There was no interaction between these two predictors. The partial effects (i.e., effect of each predictor while other predictors of the model are held constant) of this model are shown in Figure 18.

Figure 18. Partial effects of the statistical model for errors in Experiment 4 (initial schwa words produced by Swiss speakers).

Analysis of latencies: pseudohomophones and pseudowords

We further withdrew the 217 data points for which we did not have a value for the delayed latencies (including the 32 outliers described above). The latencies for the 3636 remaining responses were adjusted whenever necessary using the software CheckVocal (Protopapas, 2007). Visual inspection of the resulting latencies showed that the distribution was right-skewed. Most of this skewness was removed by performing a reciprocal transformation (following the Box-Cox test, Box & Cox, 1964) and taking out the one data point above 2000 ms. Further analyses were restricted to the 3635 remaining correct responses. Figure 19 gives the mean latencies and 95% confidence intervals for the 3635 correct responses in the immediate naming task as a function of stimulus type and variant type.
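To illustrate why a reciprocal transformation reduces right skew, the Python sketch below compares the skewness of a small set of invented latencies before and after transformation. The data are hypothetical, and the sign convention (−1/RT, which preserves the ordering of latencies) is an assumption, since the text only states that a reciprocal transformation was applied.

```python
def skewness(values):
    """Population skewness: third central moment over the cubed SD."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def reciprocal_transform(latencies_ms, negate=True):
    """Return -1/RT for each latency (or 1/RT when negate is False)."""
    sign = -1.0 if negate else 1.0
    return [sign / t for t in latencies_ms]

# Invented right-skewed latencies in ms (not data from the experiment).
raw = [400, 450, 500, 550, 600, 700, 900, 1500]
transformed = reciprocal_transform(raw)
print(skewness(raw), skewness(transformed))
```

On these invented values, the long right tail is compressed and the skewness of the transformed latencies is much closer to zero than that of the raw latencies.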

[Figure 18, two panels: probability of correct response (y-axis, from 0.86 to 0.98) as a function of stimulus type (Pseudohomophone vs. Pseudoword) and of variant type (With schwa vs. Without schwa).]

Figure 19. Mean production latencies as a function of stimulus type for each variant type in Experiment 4 (initial schwa words produced by Swiss participants). The bars represent the 95% confidence intervals (n = 3635).

We analyzed the data by means of a mixed-effects model with the reciprocal latencies as the dependent variable and with word and participant as crossed random effects. Four different models were run. In Model 1, only the variables related to our research questions were entered as predictors (i.e., Delayed latencies, Variant type and Stimulus type), together with the order of presentation of the stimuli. In Model 2, we additionally included other variables known to influence latencies in written and oral naming tasks. Model 3 is identical to Model 2, but was applied only to the words that alternate according to our participants’ estimations. Model 4 is identical to Model 2 with an additional predictor accounting for the phonological similarity of the pseudohomophones with existing words.

Model 1

In this first model, we entered the following predictors: stimulus type, variant type, the interaction between these two variables, the latencies obtained for the given items in the delayed naming task, and whether the stimulus was the first or second pseudohomophone or pseudoword of a given schwa word to be produced (i.e., Order of presentation).

Residuals larger than 2.5 times the standard deviation (51 data points, forming 1.4% of the data) were considered outliers and removed. The random terms for participant and for word significantly improve the final model according to likelihood-ratio tests (participant: χ2(1) = 1920.7, p < 0.0001; word: χ2(1) = 785.6, p < 0.0001). The final model is summarized in Table 18.

Table 18. Summary of Model 1 for Experiment 4. The intercept represents a schwa variant pseudohomophone, being the first pseudohomophone of a given schwa word to be produced by the speaker in the experiment. For categorical variables, the statistical values correspond to the contrast between the intercept and the level of the variable indicated in round brackets.

Variable                       β               F        p
Delayed latencies              9.00 × 10⁻⁸     12.28    <0.001
Order of presentation          -4.87 × 10⁻⁵    30.32    <0.0001
Stimulus type (Pseudoword)     1.12 × 10⁻⁴     163.83   <0.0001
Variant type (Without schwa)   -2.35 × 10⁻⁵    7.43     <0.01

The model shows main effects for all predictors. There is no interaction between stimulus type and variant type. Latencies increase as latencies for delayed naming increase, and they decrease when a given item is the second pseudohomophone or pseudoword of a given schwa word to be produced. Importantly, latencies are shorter for pseudohomophones than for pseudowords (i.e., the pseudohomophone effect) and shorter for non-schwa variants than for schwa variants.

Model 2

We conducted an additional analysis in order to examine to what extent the advantage for pseudohomophones and for non-schwa variants could be explained by phonological and/or orthographic differences between pseudohomophones and pseudowords and between schwa and non-schwa variants, respectively. While the pseudohomophone and pseudoword lists were balanced on most variables known to affect written and oral naming latencies, the removal of data points may have led to an unbalanced data set. Furthermore, even though statistical tests showed no significant differences between pseudohomophones and pseudowords on most variables, we cannot exclude the possibility that some small differences played a role once applied to so many data points.

One way to examine whether the effects of stimulus and variant type are independent of the many variables known to influence latencies in our paradigm is to include these variables as predictors in the regression model. Because of the large number of variables and their potentially high collinearity, we first applied two procedures in parallel in order to restrict these variables to the most important ones. We first ran a random forest (e.g., Lunetta, Hayward, Segal & Van Eerdewegh, 2004; Breiman, 2001) using the function cforest of the package party (Hothorn, Hornik & Zeileis, 2006) in R. Random forest is a statistical technique used to identify relevant predictors in settings with a large number of predictors. It is based on classification tree building (see footnote 23). A random forest is a collection of classification trees which provides a single measure of importance for each explanatory variable. We included the following variables in the random forest analysis: number of letters, number of phonemes, number of syllables, bigram frequency (positional and non-positional), positional letter frequency, phonological neighborhood density, phonological neighborhood frequency, positional segment frequency, positional diphone frequency, orthographic neighborhood density, orthographic neighborhood frequency and initial syllable frequency, together with variant type and stimulus type. Results for the random forest analysis are detailed in APPENDIX 8.

Along with variable importance, we examined the correlations among variables, as some of them are likely to be highly correlated (e.g., positional diphone frequency and positional segment frequency, or orthographic neighborhood density and orthographic neighborhood frequency). We conducted a hierarchical cluster analysis using the varclus function of the Hmisc package in R. Such analyses are usually used to assess collinearity between variables and to group them into clusters that can be considered as a single variable (see the R documentation). Results for this analysis are presented in APPENDIX 9. Highly correlated variables cannot be entered together in linear regression models such as mixed-effects models, as they are likely to generate collinearity. Thus, when the cluster analysis showed that several variables formed a cluster (i.e., were highly correlated), we selected the variable in the cluster with the highest importance in the random forest analysis to represent the cluster. For instance, positional segment frequency and positional diphone frequency appeared highly correlated; we selected Positional diphone frequency as a measure of phonotactic probability since it obtained a higher importance value in the random forest analysis than Positional segment frequency.
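The selection rule, one representative per cluster chosen by variable importance, can be sketched in Python as follows. The clusters and importance scores below are invented for illustration; the actual values are those reported in APPENDIX 8 and APPENDIX 9, and the actual analyses were run in R.

```python
def pick_cluster_representatives(clusters, importance):
    """Keep, for each cluster of correlated variables, the most important one."""
    return [max(cluster, key=lambda name: importance[name]) for cluster in clusters]

# Hypothetical clusters of highly correlated variables.
clusters = [
    ["positional_segment_frequency", "positional_diphone_frequency"],
    ["orthographic_neighborhood_density", "orthographic_neighborhood_frequency"],
]
# Hypothetical importance scores (not the values from the random forest).
importance = {
    "positional_segment_frequency": 0.8,
    "positional_diphone_frequency": 1.4,
    "orthographic_neighborhood_density": 2.1,
    "orthographic_neighborhood_frequency": 0.9,
}
selected = pick_cluster_representatives(clusters, importance)
```

With these invented scores, the sketch keeps positional diphone frequency and orthographic neighborhood density, mirroring the example given in the text.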

Random forests and hierarchical clustering allowed us to select the following variables: number of letters, phonemes and syllables, orthographic neighborhood density, phonological neighborhood density, positional bigram frequency, bigram frequency, positional diphone frequency, and initial syllable frequency.

23 Classification tree building is a statistical method in which trees are constructed by recursively partitioning the data into homogeneous subgroups. At each node, the variable giving the most homogeneous subgroups is selected (e.g., Lunetta et al., 2004).

We then ran a linear mixed-effects model with participant and item as random terms and the reciprocal of the naming latencies as the response. We entered the selected variables as fixed effects in the model together with Order of presentation, Delayed naming latencies, Stimulus type, Variant type and the interaction between Stimulus type and Variant type. The selected variables were always entered in the model before Stimulus type and Variant type. We systematically tested for correlation between predictors before entering them jointly in the statistical model. The predictors that were correlated above 0.3 were orthogonalized.

Orthogonalization between two variables (e.g., A and B) was performed as follows. We first ran a linear model in which variable A was predicted by variable B. We then used the residuals of this linear model instead of the raw values for variable A as fixed effect in the mixed-effects model. This way, both variables could be included in the model without introducing collinearity.
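The orthogonalization step can be illustrated with a minimal Python sketch of a simple linear regression of A on B, followed by extraction of the residuals. The function name and the example values are hypothetical; the original analyses were carried out in R.

```python
def residualize(a, b):
    """Return the residuals of the simple regression a ~ b (with intercept)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sxy = sum((x - mean_b) * (y - mean_a) for x, y in zip(b, a))
    sxx = sum((x - mean_b) ** 2 for x in b)
    slope = sxy / sxx
    intercept = mean_a - slope * mean_b
    # What remains of A once the part predictable from B is removed.
    return [y - (intercept + slope * x) for x, y in zip(b, a)]

# Hypothetical correlated predictors.
b = [1.0, 2.0, 3.0, 4.0, 5.0]
a = [2.1, 3.9, 6.2, 8.1, 9.7]
a_orthogonal = residualize(a, b)
```

By construction, the residuals have zero covariance with B, which is why the residualized version of A can enter the model alongside B without reintroducing their shared variance.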

Residuals larger than 2.5 times the standard deviation (53 data points, forming 1.5% of the data) were considered outliers and removed. The random terms for participant and for word significantly improved the final model according to likelihood-ratio tests (participant: χ2(1) = 2135.3, p < 0.0001; word: χ2 (1) = 168.7, p < 0.0001).
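The residual-based outlier criterion used for both models can be sketched in Python as below. The sketch assumes that raw residuals are compared with 2.5 times their sample standard deviation; whether the model was refitted after trimming is not stated in the text, and the example residuals are invented.

```python
import statistics

def flag_large_residuals(residuals, threshold=2.5):
    """Return a mask that is True where |residual| exceeds threshold * SD."""
    sd = statistics.stdev(residuals)
    return [abs(r) > threshold * sd for r in residuals]

# Invented residuals: twenty small values and one extreme value.
residuals = [0.1, -0.1] * 10 + [5.0]
mask = flag_large_residuals(residuals)
kept = [r for r, is_outlier in zip(residuals, mask) if not is_outlier]
```

In this invented example, only the extreme value is flagged and the twenty small residuals are retained.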

Results for this second analysis replicate the results for Model 1; latencies were shorter for pseudohomophones compared to pseudowords and for non-schwa variants compared to schwa variants. Hence, the effects of stimulus type and variant type in Model 1 were not due to structural differences between pseudohomophones and pseudowords or between schwa and non-schwa variants. In addition, results show that latencies increased with the latencies