BALI: a tool to build experimental materials in psycholinguistics

Academic year: 2021


Christophe Coupé (1, 2)

(1) Laboratoire Dynamique du Langage, CNRS - Université Lyon 2; (2) Institut Rhône-Alpin des systèmes complexes

Target

Offer flexible and intuitive software that assists scientists in designing experimental materials involving sets of items and multiple constraints.

Background

Many experiments require building materials that i) contrast specific modalities and ii) average out factors outside the focus of the study.

• E.g. building lists of words contrasted in terms of emotional valence, but similar in terms of frequency, number of letters or syllables etc.

• As the number of factors increases, picking and switching items ‘manually’ becomes difficult.

• This is an NP-complete problem in theoretical computer science, meaning that finding the best solution is too computationally demanding in practice; heuristic approaches are therefore favored.

• Software already exists to assist psycholinguists in creating lists (Van Casteren & Davis, 2006; 2007), but is often limited in functionality.
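To give a rough sense of why exhaustive search is impractical, the number of candidate designs can be counted for a small case. The sketch below assumes a simple setting not taken from the poster: two disjoint lists drawn from one pool, with made-up pool and list sizes.

```python
import math

def count_two_list_designs(pool_size: int, list_size: int) -> int:
    """Count the ways to pick two disjoint, labeled lists of `list_size`
    items from a pool of `pool_size` items (order within a list ignored)."""
    first = math.comb(pool_size, list_size)
    second = math.comb(pool_size - list_size, list_size)
    return first * second

# Hypothetical sizes: 500 candidate words, two lists of 20 words each.
n_designs = count_two_list_designs(500, 20)
print(f"number of digits in the count: {len(str(n_designs))}")
```

Even before any constraint is evaluated, the count is far beyond what can be enumerated, which is the practical motivation for a heuristic search.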

Theoretical approach

A heuristic optimization paradigm mimicking natural selection in biology: genetic algorithms (Holland, 1975)

• Successive generations of genomes, describing the parameters of solutions to a given problem

• A fitness function mathematically translating this problem, and used to evaluate the genomes of each generation

• The selection of the highest ranked genomes, and their mutation to produce the next generation of ‘offspring’

• A repetition of this cycle until a satisfying solution is reached
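The cycle described above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not BALI's implementation: the function `evolve`, its parameters, the mutation rule (replace one item by an unused one) and the toy fitness are all hypothetical, and lower fitness values are treated as better.

```python
import random
from statistics import mean

def evolve(pool, k, fitness, generations=200, pop_size=30, n_keep=10, seed=0):
    """Return the best k-item selection found (lower fitness value = better)."""
    rng = random.Random(seed)

    def score(genome):
        return fitness([pool[i] for i in genome])

    # First generation: random genomes, each a list of k distinct pool indices.
    population = [rng.sample(range(len(pool)), k) for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluate and rank the genomes, then select the best ones.
        population.sort(key=score)
        parents = population[:n_keep]
        # Produce offspring by mutating copies of the selected genomes.
        children = []
        while len(parents) + len(children) < pop_size:
            child = list(rng.choice(parents))
            unused = [i for i in range(len(pool)) if i not in child]
            child[rng.randrange(k)] = rng.choice(unused)
            children.append(child)
        # Parents are kept, so the best genome is never lost.
        population = parents + children
    return [pool[i] for i in min(population, key=score)]

# Toy problem: pick 5 numbers from 1..100 whose mean is as close to 50 as possible.
pool = list(range(1, 101))

def distance_to_50(picked):
    return abs(mean(picked) - 50)

best = evolve(pool, k=5, fitness=distance_to_50)
print(best, distance_to_50(best))
```

Keeping the selected parents in the next generation is a design choice of this sketch; it guarantees that the best score never gets worse from one cycle to the next.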

Find how to turn the design of experimental materials into a constraint satisfaction problem (CSP)

Rely on genetic algorithms to find good solutions to this CSP.

Features of BALI (BAlancing LIsts)

Algorithm

Each item can be defined by any number of open or closed variables

• E.g. frequency in a corpus, syntactic category, number of syllables etc.

Genomes consist of 2-dimensional arrays of items

• Items are arranged in columns and rows (i.e. not only 1-dimensional lists)

• This allows for more freedom when preparing experimental materials

A number of constraints can be applied to the genomes

• Scope: a single column or row, each column or row, all columns or rows, the whole array, or between columns or rows (depending on the constraint)

• Type: avoid or require specific items; minimize, maximize or equalize the values of an open variable; equalize variances; request a specific class of a closed variable; etc.
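As an illustration of how such constraints might be scored on a 2-dimensional array, here is a small sketch. The item table, the variable names (`freq`, `cat`) and both penalty functions are hypothetical; the poster does not specify how BALI computes these scores. Each function returns a penalty, with 0 meaning the constraint is fully satisfied.

```python
from statistics import mean

# Hypothetical items: one open variable (freq) and one closed variable (cat).
ITEMS = {
    "dog": {"freq": 5.0, "cat": "noun"},
    "cat": {"freq": 4.0, "cat": "noun"},
    "sun": {"freq": 3.0, "cat": "noun"},
    "run": {"freq": 6.0, "cat": "verb"},
    "eat": {"freq": 5.0, "cat": "verb"},
    "see": {"freq": 7.0, "cat": "verb"},
}

def column(array, j):
    """Items of column j; the array is a list of equally long rows."""
    return [row[j] for row in array]

def equalize_means_between_columns(array, var):
    """Between-columns constraint on an open variable:
    penalty = spread (max - min) of the column means."""
    n_cols = len(array[0])
    means = [mean(ITEMS[w][var] for w in column(array, j)) for j in range(n_cols)]
    return max(means) - min(means)

def require_class_in_column(array, j, var, wanted):
    """Single-column constraint on a closed variable:
    penalty = share of items in column j whose class is not `wanted`."""
    col = column(array, j)
    return sum(ITEMS[w][var] != wanted for w in col) / len(col)

# Two candidate arrays (3 rows x 2 columns).
by_category = [["dog", "run"], ["cat", "eat"], ["sun", "see"]]
one_swap = [["dog", "run"], ["cat", "eat"], ["see", "sun"]]

for name, array in [("by_category", by_category), ("one_swap", one_swap)]:
    print(name,
          equalize_means_between_columns(array, "freq"),
          require_class_in_column(array, 0, "cat", "noun"))
```

Swapping a single pair of items brings the column means closer together but introduces a verb into the noun column, a small example of constraints pulling in opposite directions.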

Identical or different weights can be attributed to the constraints

• Each constraint is first assessed independently to normalize its contribution to the fitness function

• Each constraint can be further weighted in the optimization process
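One possible reading of these two steps is sketched below. The poster does not give BALI's formula, so the min-max scaling, the constraint names and all numeric values are assumptions: each raw penalty is rescaled using the best and worst values seen when that constraint is assessed on its own, and the rescaled penalties are then combined as a weighted mean.

```python
def normalize(raw, best, worst):
    """Map a raw penalty to [0, 1]: 0 at `best`, 1 at `worst` (clamped)."""
    if worst == best:
        return 0.0
    return min(1.0, max(0.0, (raw - best) / (worst - best)))

def fitness(raw_scores, ranges, weights):
    """Weighted mean of normalized penalties (lower is better)."""
    total_weight = sum(weights[name] for name in raw_scores)
    weighted = sum(
        weights[name] * normalize(raw_scores[name], *ranges[name])
        for name in raw_scores
    )
    return weighted / total_weight

# Hypothetical raw penalties of one genome on two constraints.
raw_scores = {"freq_spread": 0.5, "length_spread": 3.0}
# Hypothetical (best, worst) values found for each constraint taken alone.
ranges = {"freq_spread": (0.0, 2.0), "length_spread": (1.0, 5.0)}

equal = fitness(raw_scores, ranges, {"freq_spread": 1.0, "length_spread": 1.0})
favor_length = fitness(raw_scores, ranges, {"freq_spread": 1.0, "length_spread": 3.0})
print(equal, favor_length)
```

Normalizing first keeps a constraint measured on a large scale from dominating the total merely because of its units; the weights then express how much each constraint matters to the experimenter.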

Whether a good solution can be obtained depends on whether each constraint can be satisfied on its own and on how the constraints interact.

Software

• Written in a low-level language (C++) for better performance

• GTK-based graphical interface: no command line needed, and portable

Procedure

• Load a set of items from which to pick the components of the desired array

• If needed, apply a preliminary filtering (e.g. to retain only nouns)

• Define the dimensions of the array

• Define the constraints

• Launch the optimization process (usually lasts around 1 minute)
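The first three steps of this procedure can be mimicked in a few lines. BALI itself is driven through its graphical interface, so the following is only an analogy: the tab-separated format, the column names and the helper functions are hypothetical. The result is a random starting array of the kind an optimizer would then improve.

```python
import csv
import io
import random

# Hypothetical item file: tab-separated, one item per line.
RAW = """word\tcategory\tfrequency
dog\tnoun\t5.0
cat\tnoun\t4.0
sun\tnoun\t3.0
tree\tnoun\t4.5
run\tverb\t6.0
eat\tverb\t5.0
"""

def load_items(text):
    """Step 1: load the items, converting the open variable to a number."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [{**row, "frequency": float(row["frequency"])} for row in reader]

def random_array(items, n_rows, n_cols, seed=0):
    """Step 3: fill an n_rows x n_cols array with distinct, randomly picked items."""
    needed = n_rows * n_cols
    if needed > len(items):
        raise ValueError("not enough items to fill the array")
    picked = random.Random(seed).sample(items, needed)
    return [picked[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]

items = load_items(RAW)
# Step 2: preliminary filtering, here retaining only nouns.
nouns = [item for item in items if item["category"] == "noun"]
array = random_array(nouns, n_rows=2, n_cols=2)
for row in array:
    print([item["word"] for item in row])
```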

References

1. Holland, J. (1975). Adaptation in Natural and Artificial Systems. The MIT Press.

2. Van Casteren, M. & Davis, M. H. (2006). Mix, a program for pseudorandomization. Behavior Research Methods 38(4):584-589.

3. Van Casteren, M. & Davis, M. H. (2007). Match: A program to assist in matching the conditions of factorial experiments. Behavior Research Methods 39(4):973-978.


Contact: Christophe COUPE, ccoupe@ish-lyon.cnrs.fr

Main panel

Optimal array obtained after a few thousand cycles of the genetic algorithm; cells can be colored according to the values of the different variables

Choice of constraints

Panel presenting the constraints, their scores when evaluated independently, and how well they interact with one another
