
2.2 Abstract knowledge of canonical and non-canonical word order

2.2.4 Study 3: Corpus analysis

We analyzed the French Lyon corpus (Demuth & Tremblay, 2008) from the CHILDES database. The recordings stem from a longitudinal study of French language development and were made over a 28-month period (2002–2005) during spontaneous, interactive face-to-face conversations in the child’s family. The age of the children (Anaïs, Marie, Théotime, and Nathan) during this period ranged from 1;0 to 3;0 years, but we analyzed only recordings up to age 1;11 in order to investigate the input children are exposed to before their second birthday, corresponding to the ages tested in our experiments.

Methods

A total of 33,168 utterances of child-directed speech were analyzed. For the sake of our analysis focusing on word order, constituents were coded S for phrasal and pronominal subjects, O for phrasal and pronominal direct or indirect objects (including reflexives), and V for verbs.

Wh-elements were also coded as S or O according to their function. A total of 11 different structures were identified and grouped into three word order categories: VO structures (VO, SVO, VSO, VOS), OV structures (OV, SOV, OSV, OVS) and structures without an object (V, SV, and VS).
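The three-way categorization described above can be sketched as a simple rule over the coded structure strings. This is only an illustration of the scheme as stated in the text, not the procedure actually used for the corpus; the names `CATEGORIES` and `word_order_category` are hypothetical.

```python
# The 11 structures and their categories, as listed in the text.
CATEGORIES = {
    "VO": ("VO", "SVO", "VSO", "VOS"),
    "OV": ("OV", "SOV", "OSV", "OVS"),
    "no object": ("V", "SV", "VS"),
}


def word_order_category(structure: str) -> str:
    """Assign a coded structure (e.g. "SVO") to a word order category.

    Assumed rule: a structure without O has no object; otherwise the
    category depends on whether V precedes or follows O.
    """
    if "V" not in structure:
        raise ValueError(f"structure without a verb: {structure!r}")
    if "O" not in structure:
        return "no object"
    return "VO" if structure.index("V") < structure.index("O") else "OV"


# The rule reproduces the listing given in the text.
for category, structures in CATEGORIES.items():
    for structure in structures:
        print(structure, "->", word_order_category(structure))
```

The position of the subject plays no role in this rule, which is why VSO and VOS fall into the same category as SVO.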

We excluded all utterances that did not contain a conjugated verb (13,320 utterances, 40.2 %) and special cases where the verb was not used productively, as in s’il te plaît ‘please’ (lit. ‘if it you pleases’), in injunctions like allez ‘come on’, or in songs (349 utterances, 1.1 %).

When utterances contained more than one clause, each one was analyzed separately in order to record each instance of word order (since one utterance can result in two or three clauses that can involve different word orders). In these cases, the clausal complement was coded as O in the main clause, and subordinate clauses were themselves coded separately. For example, tu veux que je t’aide ‘you want that I you help’ was coded as [1] SVO (tu veux que X ‘you want that X’) and [2] SOV (je t’aide ‘I you help’). When arguments were realized through multiple instances within a clause (an NP and a clitic), we considered the first instance for attributing the clause to one of the three word order categories. This is the case with subject and object dislocations, illustrated in (4)–(7).

(4) Elle fait le tour, la locomotive. (subject right dislocation)
she does the round the locomotive
‘It goes around, the locomotive.’

(5) Le bébé, il prend sa douche. (subject left dislocation)
the baby he takes her shower
‘The baby, she is taking her shower.’

(6) Je les regarde, les petits poissons. (object right dislocation)
I them watch the littles fishes
‘I’m watching them, the little fishes.’

(7) Le dessin, on le fait sur la feuille. (object left dislocation)
the drawing, we it do on the sheet
‘The drawing, we are doing it on the sheet.’

Clauses like (4) were coded as sVOS (i.e., SVO sentence type and VO category), (5) were coded as SsVO (i.e., SVO sentence type and VO category), (6) were coded as soVO (i.e., SOV sentence type and OV category), and (7) were coded as OsoV (i.e., OSV sentence type and OV category).
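The "first instance" rule for dislocations can be illustrated with a short sketch that collapses a dislocation coding to its sentence type and word order category. The function names are hypothetical, and the sketch assumes only that the two instances of an argument share the same letter and differ in case, as in the four codings given above.

```python
def sentence_type(coding: str) -> str:
    """Collapse a dislocation coding (e.g. "sVOS") to its sentence type.

    Only the first instance of each argument is kept, regardless of
    case; later instances of the same argument are ignored.
    """
    seen = set()
    kept = []
    for symbol in coding.upper():
        if symbol not in seen:
            seen.add(symbol)
            kept.append(symbol)
    return "".join(kept)


def category(sentence: str) -> str:
    """Return "VO" or "OV" for a sentence type that contains an object."""
    return "VO" if sentence.index("V") < sentence.index("O") else "OV"


# The four codings from examples (4)-(7).
for coding in ("sVOS", "SsVO", "soVO", "OsoV"):
    stype = sentence_type(coding)
    print(coding, "->", stype, "/", category(stype))
```

Applied to the four examples, the sketch yields the same sentence types and categories as those stated in the text (SVO/VO for (4) and (5), SOV/OV for (6), OSV/OV for (7)).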

Results

Of the 33,168 utterances, 19,848 (59.8 %) contain a conjugated verb. A total of 15,117 (76.2 %) of these sentences are composed of one clause, 4,000 (20.2 %) of two clauses and 731 (3.7 %) of three clauses, resulting in a total of 25,310 clauses. The proportions of each sentence type illustrated in Figure 5, as well as all other frequencies reported hereafter, were calculated over these 25,310 clauses unless mentioned otherwise.
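As a quick check of how the clause total follows from the utterance counts, the arithmetic can be reproduced as below. The figures are those reported in the paragraph above; the variable names are illustrative.

```python
# Utterances containing a conjugated verb, by number of clauses.
utterances_by_clause_count = {1: 15_117, 2: 4_000, 3: 731}
all_utterances = 33_168

verbal_utterances = sum(utterances_by_clause_count.values())
total_clauses = sum(n * count for n, count in utterances_by_clause_count.items())

print("utterances with a conjugated verb:", verbal_utterances)
print("share of all utterances (%):",
      round(100 * verbal_utterances / all_utterances, 1))
for n, count in utterances_by_clause_count.items():
    print(f"{n}-clause utterances (%):",
          round(100 * count / verbal_utterances, 1))
print("total clauses:", total_clauses)
```

Each two-clause utterance contributes two clauses and each three-clause utterance three, which is how 19,848 utterances yield 25,310 clauses.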

A total of 15,602 clauses contain both arguments (61.6 %), 3,057 (12.1 %) contain only a subject, 3,573 (14.1 %) contain only an object, and 3,078 (12.2 %) only a verb. Structures without objects involve sentences with intransitive verbs that for the most part have SV order (2,872 clauses, 11.3 %, e.g., la lune disparaît ‘the moon disappears’), and a few VS (185 clauses, 0.7 %, e.g., continue, Nathan! ‘continue, Nathan!’). Among the 15,602 clauses containing both arguments, the vast majority follow the subject-object order (13,471 clauses, 86.3 %), the canonical SVO order being the most frequent (10,417 clauses, i.e., 66.8 % of the two-argument clauses and 41.2 % of all clauses, e.g., le sapin cache le tout ‘the tree hides everything’). Overall, 64.5 % of all clauses start with the subject (representing 87.5 % of the clauses that contain a subject).

Among those clauses that contain an object and for which information regarding head-direction is therefore available (19,175 clauses, 75.8 %), about two-thirds follow the canonical VO order (13,048 clauses, 68.0 %), either in declarative SVO sentences (10,417 clauses) or in various types of imperatives (VO: 2,538 clauses, 10.0 %, e.g., lance la balle! ‘throw the ball!’; VOS: 80 clauses, 0.3 %, e.g., pousse le tracteur, Anna! ‘push the tractor, Anna’; VSO: 13 clauses, 0.1 %, e.g., regarde, Tim, le tambour! ‘Watch, Tim, the drum!’). Focusing on the remaining third with the non-canonical OV order (6,127 clauses, 32.0 %), about half consist of preverbal object clitics within SOV structures (3,041 clauses, e.g., je l’ai lancé ‘I it have thrown’).

The other half all instantiate object fronting in sentence-initial position, which is found in imperatives (OV: 1,035 clauses, 4.1 %, e.g., les hérissons, regarde! ‘the hedgehogs, watch!’), object questions (OVS: 69 clauses, e.g., que prépares-tu? ‘what prepare you?’), and OSV sentences (1,982 clauses) involving object questions (e.g., qu’est-ce que tu lui dis? ‘what you him tell?’), object relatives (e.g., c’est Marie que l’on voit ‘it is Mary that we see’) and object topicalization (e.g., ta poupée, tu la mets dedans ‘your doll, you it put inside’). Across all clauses, object fronting represents about 7.8 %.
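The per-structure counts reported in this section can be tabulated to confirm that the category totals and main percentages are mutually consistent. The sketch below copies the counts from the text; the helper names `total` and `pct` are illustrative and not part of the original analysis.

```python
# Clause counts per structure, as reported in the text.
COUNTS = {
    "SVO": 10_417, "VO": 2_538, "VOS": 80, "VSO": 13,    # VO category
    "SOV": 3_041, "OV": 1_035, "OVS": 69, "OSV": 1_982,  # OV category
    "SV": 2_872, "VS": 185, "V": 3_078,                  # no object
}


def total(structures):
    """Sum the clause counts of the given structures."""
    return sum(COUNTS[s] for s in structures)


def pct(part, whole):
    """Percentage rounded to one decimal, as in the text."""
    return round(100 * part / whole, 1)


all_clauses = sum(COUNTS.values())
with_object = total(s for s in COUNTS if "O" in s)
vo = total(s for s in COUNTS if "O" in s and s.index("V") < s.index("O"))
ov = with_object - vo
two_args = total(s for s in COUNTS if "S" in s and "O" in s)
subject_first = total(
    s for s in COUNTS
    if "S" in s and "O" in s and s.index("S") < s.index("O")
)

print("all clauses:", all_clauses)
print("with object:", with_object, pct(with_object, all_clauses))
print("VO:", vo, pct(vo, with_object), "| OV:", ov, pct(ov, with_object))
print("two arguments:", two_args, pct(two_args, all_clauses))
print("subject before object:", subject_first, pct(subject_first, two_args))
```

The structure counts sum to the 25,310 clauses, and the derived totals can be compared directly with the figures given for clauses containing an object, VO and OV orders, two-argument clauses, and subject-object order.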

Figure 5

Distribution of sentence types in French child-directed speech

The object topicalization structure used in Study 2 occurs in only 27 clauses (0.1 %, see example in (7)), all with a pronominalized subject. Other types of dislocations are also found: 52 object right dislocations (see example in (6)), 103 subject left dislocations (see example in (5)) and 228 subject right dislocations (see example in (4)). Taken together, dislocations occur in about 1.6 % of the clauses, a rate which is lower than that observed in French child-directed speech corpora with slightly older children (1–3 years), where 4.8 % of the clauses involved a dislocation (Dautriche, 2012). Still, both corpus analyses show that object left dislocations are the least frequent, while subject right dislocations are the most frequent.

Discussion

The analysis of word order distributional characteristics in French child-directed speech reveals four important facts. First, 62 % of the sentences contain both an overt subject and an overt object. Second, among two-argument sentences, the subject is positioned before the object (SVO, SOV, VSO) in the vast majority of the sentences (86 %) and the canonical SVO order occurs in 67 %. Third, with respect to head-direction, 76 % of all sentences contain an object. Among those sentences, 68 % have it in its canonical, post-verbal position. These percentages go down to 52 % VO and 24 % OV sentences when considering their distribution over the whole corpus. Fourth, the structure tested in Study 2, object topicalization, is extremely rare, representing 0.1 % of all clauses.

A first important conclusion from this corpus study is that before their second birthday, children are rarely exposed to object topicalization in French, yet they can already comprehend this sentence type. Moreover, the few instances of object topicalization in the corpus were not even the same as the one used in Study 2, as these instances all contain a pronominalized subject whereas full subject NPs were used in our experimental sentences. This suggests that, in contrast to what is assumed in usage-based approaches, the kind of knowledge that underlies children’s comprehension of object fronting does not consist in a stored template of that particular structure.

Furthermore, object topicalization violates some major distributional facts of French word order: it starts with the object, while 65 % of all corpus sentences start with a subject, and the object precedes the subject, while 86 % of the two-argument sentences in our corpus have subject-object order. This suggests that children’s comprehension of OSproV sentences can go against distributional regularities of the language. We elaborate on the nature of their knowledge in the General discussion.

A second line of reflection concerns the cross-linguistic picture that arises from the various studies using the same intermodal preferential looking paradigm with pseudo-verbs and conducted on languages for which distributional information from corpus analyses is available (Hindi-Urdu: Gavarró et al., 2015; Mandarin: Yeh, 2015; Japanese: Matsuo et al., 2012, and Omaki et al., 2012). An interesting cross-linguistic difference is that while French, Hindi-Urdu and Mandarin children comprehend sentences with canonical word order as early as 17–19 months of age, Japanese children only appear to succeed at age 28–32 months (Matsuo et al., 2012; Omaki et al., 2012). We examine the languages with respect to (a) the rate of two-argument structures, (b) the rate of sentences containing an object, (c) the rate of sentences displaying the canonical order over all sentences containing an object, and (d) the rate of object drop in transitive contexts. With respect to (a), only 12 % of the sentences contain both arguments in Japanese, which is much lower than in Mandarin (43 %), French (62 %) and Hindi-Urdu (69 %). With respect to (b), 43 % of the Japanese sentences contain an overt object against 61 % in Mandarin, 76 % in French and 85 % in Hindi-Urdu. With respect to (c), 62 % of the sentences containing an object have the canonical position in Mandarin, 68 % in French, 85 % in Hindi-Urdu, while here Japanese shows a higher stability with 98 % of the sentences adopting the canonical OV order. With respect to (d), object drop is found in 34 % of transitive contexts in Mandarin, 22 % in Japanese, and less than 15 % in Hindi-Urdu. French does not have the option of object drop.

In sum, the most striking difference between Japanese and the three other languages at hand concerns the rate of two-argument structures: Japanese children have considerably fewer opportunities than other children to selectively assign both subject and object roles within a sentence, since only 12 % of the sentences contain both arguments. Moreover, they also have fewer opportunities to encode the canonical position of the object, given that less than half of the sentences contain an object. The fact that Japanese provides highly reliable word order information, more reliable than in the other languages, does not seem to compensate for the fact that this information is rarely available in the input. In addition, among the 48 % single-argument sentences, 65 % are OV but 35 % are SV: thus, in one third of these sentences, the preverbal single argument is actually not the object but the subject. Of course, Japanese contains case markers that allow disambiguation, but case markers are often dropped (available in only 9 % of the transitive sentences; Matsuo et al., 2012).

It is thus plausible that the delay in assigning correct agent and patient roles found in Japanese children is due to the impoverished input. Yet, the poor performance found in intermodal preferential looking experiments does not necessarily mean that Japanese children have not developed grammatical knowledge about head-directionality; it may also be due to difficulties in efficiently assigning thematic interpretations to sentences with two overt arguments (Mazuka, 1998).

In the General discussion, we address the possibility that the impoverished Japanese input may be responsible for either a delay in grammatical development or a delay in parsing strategies.

2.2.5 General discussion

We reported two eye-tracking experiments using the intermodal preferential looking paradigm to explore French-learning 19–22-month-olds’ ability to comprehend sentences with canonical and non-canonical word orders and pseudo-verbs. Taken together with other observations collected on French-exposed children at that age with the same paradigm, data can be summarized as follows:

i) Interpretation of grammatical NVN sentences: The SVO preference over S1VS2 shows that when there is a NP after the verb, it is interpreted as its object, and not as a subject (Franck et al., 2013).

ii) Interpretation of grammatical NVN sentences: The SVO preference over OVS shows that the postverbal NP, and not the preverbal NP, is interpreted as the object (Study 1).

iii) Interpretation of ungrammatical NNV sentences: The absence of preference for SOV or S1S2V shows that when the sentence is ungrammatical, the preverbal NP is not interpreted as the object (Franck et al., 2013).

iv) Interpretation of grammatical NNproV sentences: The OSproV preference over SOproV shows that when the sentence is grammatical, the preverbal NP is not interpreted as the object, while the fronted NP is interpreted as the object, despite occupying a non-canonical position (Study 2).

The combination of these data points provides consistent evidence that French-exposed children aged 19–22 months have abstract knowledge not only of the canonical position of the object, but also of the non-canonical position it can occupy in structures involving object movement. Two radically different views have been developed in the literature to account for word order development: the lexical hypothesis (usage-based approach; e.g., Tomasello, 2000), and the grammatical hypothesis (parametric approach to grammar; e.g., Hyams, 1986; Wexler, 1998). The two hypotheses differ on two main tenets: the assumption of an early stage of lexicalized word order representation, and the role of distributional information in the input. We discuss them in turn in light of the empirical evidence, and end with a brief discussion of the mechanism that best fits the data.

Is word order represented in a lexical format in young children?

The hypothesis that, during the first years of life, word order is represented in terms of lexicalized knowledge tied to each verb that the child knows has received most of its support from experimental work using the WWO paradigm (Abbot-Smith et al., 2001; Aguado-Orea et al., 2019; Akhtar, 1999; Matthews et al., 2005, 2007). In this paradigm, the experimenter presents children with sentences violating the word order of the language and then asks them to describe new scenes. Two key empirical findings arose from these studies. First, when confronted with novel verbs, children around age 2–3 tended to reproduce the ungrammatical orders more often than children around age 4–5. Second, the younger children tended to correct ungrammatical orders less often than the older children. Both arguments have been found to be unfounded upon critical inspection of various aspects of those studies, including logical reasoning, methods, and data analyses (Franck & Lassotta, 2012). The authors pointed out that the rate of missing data is extremely high in some studies (sometimes more than 90 %), and that, critically, the data are not missing at random. As a result, when missing data are taken into account, younger children appear to reproduce ungrammatical word orders at a rate similar to older children, and both groups reproduce the grammatical word order more often than ungrammatical word orders. When children (younger and older) modified the ungrammatical word order presented, that modification systematically resulted in producing the grammatical word order, and never another possible ungrammatical word order, which would be expected if they had no representation of what is grammatical and what is not. Importantly, while children (younger as well as older) show the hallmarks of grammatical productivity (like pronominalizations, past tense markers, auxiliaries) in their sentences with grammatical order, they never did so when producing ungrammatical orders, which they consistently reproduced rigidly. This suggests that very different mechanisms underlie grammatical and ungrammatical productions: while the former are generated on the basis of the grammatical machinery, the latter reflect a mechanism of inflexibly imitating the experimenter’s sentence structure, a possibility which was further supported by the finding that children’s performance is modulated by the social context (Chang et al., 2009).

Beyond the fact that the arguments from WWO studies themselves are untenable, results from Studies 1 and 2 suggest that if such a lexical stage of representation were to take place, this would be well before the child’s second birthday.

What is the role of distributional information in the development of word order?

The lexical hypothesis also assumes that word order knowledge is dependent on the frequency with which verbs are encountered in the input, and that massive input is necessary for that knowledge to generalize into abstract rules around age 4. Studies within that framework have for the most part focused on one specific type of statistical information: verb frequency. Frequent verbs are assumed to involve stronger representations that will generalize into abstract knowledge earlier than rare verbs. In line with that prediction, studies using the WWO paradigm have found that frequent verbs give rise to fewer reproductions of the ungrammatical word orders presented to children, and to more corrections (see Aguado-Orea et al., 2019, for a recent review). However, and critically, lexical frequency effects are typically found in both younger and older children (Franck & Lassotta, 2012). More generally, substantial evidence has been reported for lexical frequency effects on sentence processing in adults (ever since the seminal work by MacDonald et al., 1994). The fact that lexical frequency affects not only younger children but also older children as well as adults, who are assumed to have mature grammatical representations, suggests that lexical frequency effects cannot be taken as a signature of a lack of grammatical knowledge, but are plausibly grounded in parsing mechanisms.

Lexical frequency is not the only distributional factor affecting children’s performance: word order frequency itself, both within languages and across languages, plays a role in children’s sentence comprehension. With respect to within-language effects, a wide range of studies have shown that sentences with frequent word orders are comprehended earlier and better (e.g., subject vs. object relatives; Guasti et al., 2018). Along the same lines, a cross-linguistic study revealed that SVO sentences (simple canonical transitives) are mastered earlier and processed faster by learners of a language where this sentence type has a higher frequency (like English) than by learners of a language where it is less frequent (like Mandarin, due to frequent argument drop; Candan et al., 2012). Importantly, psycholinguistic research has consistently reported that adults are also sensitive to word order frequency, showing not only slower processing times with infrequent orders (e.g., Trueswell et al., 1993), but also a substantial rate of comprehension errors with infrequent orders (e.g., Ferreira, 2003). Again, the fact that the frequency with which a particular word order appears in the input affects not only children’s performance but also adults’ suggests that this factor plays a role in parsing mechanisms, rather than in the mechanisms by which word order knowledge develops.

The current study offers further insight into how word order frequency affects very young children, since data were collected both on a frequent SVO structure in Study 1 (41 % of all clauses in the corpus of Study 3) and on a very infrequent OSproV structure in Study 2 (< 1 %). While the preference for the correct interpretation showed up already at the first SVO sentence-video pairing, it only showed up at the third OSproV sentence-video pairing. Hence, the data show that children need more time to parse the non-canonical OSproV structure than the canonical SVO structure, but succeed with both, suggesting that they do have the grammatical knowledge to build the correct parse, and that differences in timing reflect differences at the level of parsing mechanisms.

Furthermore, our cross-linguistic comparison of word order frequency in French, Japanese, Hindi-Urdu and Mandarin (see Discussion of Study 3) pointed out that simple canonical transitives are understood earlier by learners of languages where this sentence type is frequently available in the input (SVO in French and Mandarin as well as SOV in Hindi-Urdu) than by learners of languages with only a few instances of this structure in the input (SOV in Japanese).

Both the frequency with which children are exposed to such two-argument sentences, and the frequency with which they are exposed to sentences containing an object are particularly low in the Japanese input in comparison to the three other languages, suggesting that they play a role in the delay of their performance. We consider that this delay reflects processing difficulties in Japanese children, but not delayed acquisition of grammatical knowledge for two reasons. First, Japanese 8-month-olds already know the surface statistical/prosodic correlates of head-direction (Gervain et al., 2008). Although this does not attest to their ability to use these surface characteristics of heads and complements to assign thematic roles, it suggests that grammatical knowledge is on its way. Second, results from our intermodal preferential looking studies have shown that children need more time to process a structure that is rare in the input (the interpretative preference for the object topicalization only appeared after five sentence presentations, among which three were sentence-video pairings, in Study 2) than to process a frequent structure (the preference for SVO already appeared after two sentence presentations, among which one was a sentence-video pairing, in Study 1). In the Japanese experiments, while 27-month-olds presented with only three test sentences (i.e., three sentence-video pairings; Omaki et al., 2012) failed to reach the correct parse of the canonical (but rare) SOV structure, 28-month-olds exposed to six sentence presentations (among which three were sentence-video pairings; Matsuo et al., 2012) processed it successfully. Again, these observations show that when children have sufficient time to process experimental sentences, they reach the correct interpretation, which strongly suggests
