
Linguistic Complexity and Information Rate: Quantitative Approaches

Yoon Mi OH

Laboratoire Dynamique du Langage - UMR 5596, Université de Lyon 2, yoon-mi.oh@univ-lyon2.fr

References

Ferrer i Cancho, R., & Díaz-Guilera, A. (2007). The global minima of the communicative energy of natural communication systems. Journal of Statistical Mechanics: Theory and Experiment, 2007, P06009.

Pellegrino, F., Coupé, C., & Marsico, E. (2011). A cross-language perspective on speech information rate. Language, 87(3).

Conclusions and further work

By adding more languages with distinctive phonological features to the project, we aim to observe a negative correlation (trade-off) between information density and syllable rate, which, in our hypothesis, regulates information rate.

In future work, the notion of complexity, currently limited to the phonological level, will be extended to the morphosyntactic level.

Objective and Hypothesis

What?

The main goal of the present project is to investigate the relations between linguistic complexity and information rate.

Why?

According to cross-linguistic research carried out at the DDL laboratory (Pellegrino et al., 2011), there is a negative correlation between syllable complexity and speech rate: the more complex the syllables, the slower the transmission of information.

How?

By adding more languages with varied syllable structures and phonological inventories.

By analyzing multilingual oral and text corpora of 12 languages.

About the corpus

For the analysis of each language, two types of corpora are required.

① Oral corpus

▷ Made up of 20 texts translated, with slight modifications where necessary, from original English texts, so that all versions carry the same semantic information.

▷ 10 native speakers (5 male and 5 female) are recorded for each language.

② Text corpus

▷ A large plain-text corpus containing more than 60k words, used to derive a usage-based syllable frequency list.

Methodology

① Oral corpus

▷ For calculating syllable rate (SR: number of syllables uttered per second), information density (ID: amount of linguistic information per syllable) and information rate (IR: amount of information transmitted per unit of time).

▷ Silence intervals longer than 150 ms were removed.

❕ Information density and information rate were calculated by pairwise comparisons of the length of the data (number of syllables) and of its mean duration, respectively, using Vietnamese as an external reference (see the sketch below).
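A minimal sketch of these measures, assuming hypothetical data structures (one speaker-averaged `Reading` per text, with fields `text_id`, `n_syllables`, `duration_s`) and one possible interpretation of the pairwise-comparison procedure described above; it is not the project's actual pipeline.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical record for one recorded text; the field names and the
# assumption of one speaker-averaged value per text are illustrative only.
@dataclass
class Reading:
    text_id: str
    n_syllables: int    # syllables in the transcription of this text
    duration_s: float   # speech duration in seconds (pauses > 150 ms removed)

def syllable_rate(readings: list[Reading]) -> float:
    """SR: mean number of syllables uttered per second."""
    return mean(r.n_syllables / r.duration_s for r in readings)

def information_density(lang: list[Reading], vietnamese: list[Reading]) -> float:
    """ID: pairwise comparison of text lengths (numbers of syllables)
    against the Vietnamese version of the same text."""
    ref = {r.text_id: r.n_syllables for r in vietnamese}
    return mean(ref[r.text_id] / r.n_syllables for r in lang)

def information_rate(lang: list[Reading], vietnamese: list[Reading]) -> float:
    """IR: pairwise comparison of mean durations against Vietnamese,
    i.e. how fast the same semantic content is transmitted."""
    ref = {r.text_id: r.duration_s for r in vietnamese}
    return mean(ref[r.text_id] / r.duration_s for r in lang)
```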

② Text corpus

▷ For calculating syllabic inventory (SI), syllable complexity (SC) and syllabic entropy (H: the cognitive cost of using a syllable; Ferrer i Cancho & Díaz-Guilera, 2007).

▷ Automatic syllabification by language-specific rules: syllable frequency list → syllable inventory and syllable complexity.

Syllable complexity (SC): number of syllable constituents.

Syllabic entropy: $H_L = -\sum_{i=1}^{N_L} p_i \log_2 p_i$

(where $N_L$ is the syllable inventory of language $L$, $i$ indexes each syllable, and $p_i$ is the frequency of each syllable)
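A minimal sketch of the text-corpus measures, assuming the syllable frequency list is a plain mapping from syllable strings to counts and that the number of constituents per syllable is provided by the language-specific syllabifier; the frequency weighting used for SC is an assumption of this sketch.

```python
from math import log2

def syllable_inventory(freq: dict[str, int]) -> int:
    """SI: number of distinct syllable types in the frequency list."""
    return len(freq)

def mean_syllable_complexity(freq: dict[str, int], constituents: dict[str, int]) -> float:
    """SC: mean number of syllable constituents, weighted here by usage
    frequency (the exact averaging is an assumption of this sketch)."""
    total = sum(freq.values())
    return sum(freq[s] * constituents[s] for s in freq) / total

def syllabic_entropy(freq: dict[str, int]) -> float:
    """H_L = -sum_i p_i * log2(p_i), with p_i the relative usage frequency
    of syllable i."""
    total = sum(freq.values())
    return -sum((c / total) * log2(c / total) for c in freq.values() if c > 0)

# Toy example (not real data): three syllable types with made-up counts.
freq = {"ka": 120, "kan": 40, "ta": 80}
constituents = {"ka": 2, "kan": 3, "ta": 2}
print(syllable_inventory(freq), mean_syllable_complexity(freq, constituents), syllabic_entropy(freq))
```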

❕ You can also download a PDF version of this poster from my personal page on the DDL website at www.ddl.ish-lyon.cnrs.fr.

Preliminary results

① Comparison of information density, syllable rate and information rate


Figure 1 compares information density and syllable rate (left axis for both) with information rate (right axis). It shows similar information rate values across languages despite clear differences in their information density and syllable rate values, i.e. a trade-off between information density and syllable rate.

Figure 1: Comparison of information density (ID), syllable rate (SR) and information rate (IR) of 12 languages (JA: Japanese, SP: Spanish, BAS: Basque, CAT: Catalan, TUR: Turkish, KOR: Korean, IT: Italian, FR: French, GE: German, WO: Wolof, MA: Mandarin, EN: English)

② Relation between information density and syllabic entropy

The strongest correlation is observed between information density (ID, x-axis) and syllabic entropy (H, y-axis): Pearson's r = 0.86, p-value = 0.0004; Spearman's rho = 0.81, p-value = 0.002.

This reveals a close correlation between the syntagmatic dimension of linguistic complexity (information density: the encoding of linguistic information) and its paradigmatic dimension (syllabic entropy: the distribution of syllable frequencies).


Figure 2: Correlation between information density and syllabic entropy
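For reference, correlations of this kind can be computed with standard tools; the sketch below uses toy per-language ID and H values (not the project's data) with scipy.stats.

```python
from scipy.stats import pearsonr, spearmanr

# Toy per-language values (not the project's data): information density (ID)
# and syllabic entropy (H) for a handful of languages.
id_values = [0.49, 0.63, 0.68, 0.74, 0.91, 1.00]
h_values = [6.2, 7.0, 7.3, 7.9, 8.8, 9.4]

pearson_r, pearson_p = pearsonr(id_values, h_values)
spearman_rho, spearman_p = spearmanr(id_values, h_values)
print(f"Pearson r = {pearson_r:.2f} (p = {pearson_p:.4f})")
print(f"Spearman rho = {spearman_rho:.2f} (p = {spearman_p:.4f})")
```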

