HAL Id: hal-00624905
https://hal.archives-ouvertes.fr/hal-00624905
Submitted on 20 Sep 2011
A cognitive computational model of eye movements investigating visual strategies on textual material
Benoît Lemaire, Anne Guérin-Dugué, Thierry Baccino, Myriam Chanceaux, Léa Pasqualotti
To cite this version:
Benoît Lemaire, Anne Guérin-Dugué, Thierry Baccino, Myriam Chanceaux, Léa Pasqualotti. A cognitive computational model of eye movements investigating visual strategies on textual material. CogSci 2011 - 33rd annual meeting of the Cognitive Science Society, Jul 2011, Boston, MA, United States. pp.1146-1151. ⟨hal-00624905⟩
Benoît Lemaire, LPNC, CNRS & University of Grenoble, France
Anne Guérin-Dugué, Gipsa-lab, CNRS & University of Grenoble, France
Thierry Baccino, Lutin Userlab, CNRS & Cité des Sciences et de l'Industrie, Paris, France
Myriam Chanceaux, LPC, CNRS & University of Provence, Marseille, France
Léa Pasqualotti, Lutin Userlab, CNRS & Cité des Sciences et de l'Industrie, Paris, France
Abstract
This article presents a computational model of the visual strategies involved in processing textual material. We present an experiment in which participants performed different tasks on a multi-paragraph page (searching for a target word, searching for the paragraph most relevant to a goal, memorizing paragraphs). The proposed model predicts eye movements based on 5 parameters. The weighting of these parameters is determined for each task by means of a multidimensional comparison between participants' scanpaths and artificial scanpaths.
Keywords: Computational model; Eye movements; Visual strategy; Text.
Introduction
Reading a text is a complex task which has been widely studied in cognitive science. Several models have been proposed to account for the peculiarities of human eye movements, especially the sequence of fixations and saccades, which can nowadays be easily observed and recorded. For instance, EZ-Reader (Reichle, 2003) proposes a detailed model of how low-level processes such as oculomotor control, attention, visual processing and word identification combine to produce a relevant scanpath. In addition to a theoretical framework, EZ-Reader offers a computational model which can be run on a specific text.
These are models of reading. A typical reading scanpath is a sequence of short forward saccades followed by a long backward saccade to the beginning of the next line, then short forward saccades again, and so on until the end of the text. Not all words are fixated, and there can be short regressive saccades (up to 20% of all fixations), but the overall pattern is as described. However, texts can be processed in other ways: when you are searching for information on a web page, not all the words of all the lines are processed. Sometimes a specific word tells you that the current sentence is probably not relevant, and you jump a few lines. You can also quickly decide to abandon the current paragraph and move to another one.
Another way to process a text is to search for a particular word. The scanpath is then even more different: only a few words are fixated during a very fast pass through the text.
Conversely, if you read in order to learn the text, your scanpath will show the usual short forward saccades, but also a high proportion of regressive saccades, sometimes to previous lines, to make sure that the information is correctly stored in memory.
Simola et al. (2008) showed that different tasks on textual material produce different kinds of scanpaths.
Carver (1990) distinguished five kinds of processes (visual strategies) based on differences in reading rate:
• Scanning (600 words/min) is used when readers are looking for a particular word;
• Skimming (450 words/min) is used when readers need to get a quick overview of the content of the text;
• Rauding (300 words/min) is normal reading;
• Learning (200 words/min) is used when readers try to acquire knowledge from the text;
• Memorizing (138 words/min) is used when readers want to memorize the text, constantly verifying that the information has been memorized.
These processes differ in reading rates, but also in the length of saccades, fixation durations and number of regressions.
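As a simple worked conversion of Carver's figures (not part of the original study), each rate can be turned into an average time budget per word: 60,000 ms divided by the rate. This is a time per word of text, not a fixation duration, since not every word is fixated. The Python sketch below assumes only the rates listed above.

```python
# Carver's (1990) reading rates, in words per minute, as listed above.
CARVER_RATES_WPM = {
    "scanning": 600,
    "skimming": 450,
    "rauding": 300,
    "learning": 200,
    "memorizing": 138,
}


def ms_per_word(rate_wpm: float) -> float:
    """Average time budget per word of text, in milliseconds, at a given rate."""
    return 60_000 / rate_wpm


for process, rate in CARVER_RATES_WPM.items():
    relative = rate / CARVER_RATES_WPM["rauding"]
    print(f"{process:>10}: {ms_per_word(rate):6.1f} ms/word, "
          f"{relative:.2f}x rauding speed")
```

Scanning thus leaves about 100 ms per word, against roughly 435 ms per word for memorizing.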
The aim of the present study was to design a cognitive computational model of eye movements that would account for all these strategies. The idea is to base this model on a very small number of parameters that, when appropriately tuned, can generate this variety of scanpaths. The first goal is to determine the contribution of each of these variables to the production of the scanpath. For example, the spatial distance to the next fixation (saccade amplitude) is a key variable in rauding (words that are spatially close are much more likely to be selected than distant words), whereas it is less important in scanning.
The second goal is to produce a general model of eye movements on texts that can easily adapt to high-level changes. For instance, a user looking for some information may first engage in a skimming task, then switch to a learning process for a while, then move to a scanning process because a specific word that occurred previously has to be reread in context. Our claim is that these processes lie along a continuum. It is therefore worthwhile to model this behavior in a continuous way.
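One way to make the continuum claim concrete, assuming (as described in the Model section) that each process corresponds to a particular weighting of the model's variables, is to treat a change of strategy as a movement between weight vectors. The following Python sketch linearly interpolates between two strategies; the variable names and every numeric weight are hypothetical placeholders, not values estimated in this study, and linear interpolation is only one possible choice.

```python
from typing import Dict

Weights = Dict[str, float]

# Hypothetical weight vectors over the model's variables (illustrative
# placeholders, NOT parameters fitted in this study).
SKIMMING: Weights = {"shape_similarity": 0.0, "distance": 0.4,
                     "word_length": 0.8, "horizontality": 0.3}
RAUDING: Weights = {"shape_similarity": 0.0, "distance": 0.9,
                    "word_length": 0.2, "horizontality": 0.9}


def blend(a: Weights, b: Weights, t: float) -> Weights:
    """Linear interpolation between two strategies: t=0 gives a, t=1 gives b."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return {k: (1 - t) * a[k] + t * b[k] for k in a}


# A gradual drift from skimming towards rauding.
for t in (0.0, 0.5, 1.0):
    print(t, blend(SKIMMING, RAUDING, t))
```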
In order to build the model, we first gathered experimental data on different ways of processing a text.
Experiment
Procedure
We designed an experiment in which participants would generate various kinds of scanpaths. Three tasks were defined:
• Searching for a particular word on the page. This task is likely to generate scanning scanpaths.
• Searching a set of paragraphs for the one that best matches a given goal. For instance, if the goal is “planet observation”, the participant has to select the paragraph about that topic, although the paragraph may not contain those words: the search has to be based on semantics. To obtain rich scanpaths, several paragraphs may correspond to the goal; participants have to select the closest one. This task is likely to generate skimming scanpaths.
• Reading paragraphs in order to be able to answer comprehension questions afterwards. This task is likely to generate memorizing scanpaths.
Only 3 of the 5 processes defined by Carver were used, but, as we show later, the proposed model is not limited to them.
Materials
Twenty pages were generated in French. Each page was associated with a specific goal (for the skimming task).
Examples of goals were tribunal international (international tribunal), réhabilitation des logements (housing renovation), associations humanitaires (humanitarian associations), etc.
One target word per page was defined for the scanning task.
Seven paragraphs were produced for each page. To control the semantic relatedness of paragraphs to goals, we used Latent Semantic Analysis (LSA; Landauer et al., 2007), a method for computing semantic similarities between texts.
LSA was trained on a 24-million-word French corpus composed of all articles published in the newspaper “Le Monde” in 1999. A 300-dimensional space was generated from the corpus by means of a singular value decomposition of the word × paragraph occurrence matrix (see Martin & Berry, 2007, for more details). Since each word of the corpus is represented as a 300-dimensional vector, new texts can also be represented as vectors, simply by summing the vectors of their words. The cosine between vectors was used as the similarity measure: the higher the cosine, the more similar the two texts.
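The LSA pipeline described above (singular value decomposition of a word × paragraph matrix, texts as sums of word vectors, cosine as similarity) can be sketched in Python with NumPy. This is a toy illustration under stated assumptions: the six-word corpus is invented, only k = 2 dimensions are kept instead of 300, and raw counts are used directly, whereas many LSA implementations apply a term-weighting step before the SVD.

```python
import numpy as np


def lsa_word_vectors(counts: np.ndarray, k: int) -> np.ndarray:
    """Truncated SVD of a word x paragraph count matrix.

    Returns one k-dimensional vector per word (rows of U_k scaled by the
    singular values). The study used k = 300; this toy example uses k = 2.
    """
    u, s, _vt = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]


def text_vector(words, vocab, word_vecs):
    """A new text is represented as the plain sum of its known words' vectors."""
    idx = [vocab[w] for w in words if w in vocab]
    return word_vecs[idx].sum(axis=0)


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy corpus (invented): rows are words, columns are paragraphs.
words = ["planet", "telescope", "star", "court", "judge", "trial"]
vocab = {w: i for i, w in enumerate(words)}
counts = np.array([
    [1, 1, 0, 0],  # planet
    [1, 0, 0, 0],  # telescope
    [1, 2, 0, 0],  # star
    [0, 0, 1, 0],  # court
    [0, 0, 1, 1],  # judge
    [0, 0, 1, 3],  # trial
], dtype=float)

vecs = lsa_word_vectors(counts, k=2)
goal = text_vector(["planet", "telescope"], vocab, vecs)
print(cosine(goal, text_vector(["star"], vocab, vecs)))           # close to 1.0
print(cosine(goal, text_vector(["court", "judge"], vocab, vecs)))  # close to 0.0
```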
Of the seven paragraphs designed for each page, two were highly related to the goal (cosine with the goal above .40), two were moderately related (cosine between .15 and .30) and three were unrelated (cosine below .10). To make the situation more realistic, an image and a banner were also included on the page. Figure 1 presents an example of a page. Paragraphs were arranged on the page according to a randomly selected layout. There were eight versions of each page, to ensure that paragraphs were not processed in the same order.
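The selection criteria for the seven paragraphs can be written as a small check. In this hypothetical Python sketch, whether the moderate band's bounds are inclusive is an assumption, and cosines falling in the gaps between bands (e.g. .35) are simply rejected; the example cosine values are invented.

```python
def relatedness_band(cos: float) -> str:
    """Classify a paragraph by its LSA cosine with the page goal."""
    if cos > 0.40:
        return "high"
    if 0.15 <= cos <= 0.30:  # inclusive bounds are an assumption
        return "moderate"
    if cos < 0.10:
        return "unrelated"
    return "rejected"  # falls between bands: not usable as designed


def page_is_balanced(cosines) -> bool:
    """True if a page has 2 high, 2 moderate and 3 unrelated paragraphs."""
    bands = [relatedness_band(c) for c in cosines]
    return (len(bands) == 7
            and bands.count("high") == 2
            and bands.count("moderate") == 2
            and bands.count("unrelated") == 3)


# Invented cosine values for one candidate page.
print(page_is_balanced([0.52, 0.44, 0.22, 0.18, 0.05, 0.02, -0.03]))  # True
```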
Because the exact coordinates of words were needed for the simulations, all pages were generated by our own software, which produces both the image file and the word coordinates. The font was BitstreamVeraSans 12pt.
Participants
Thirteen participants were recruited for the scanning condition, 8 for the memorizing condition and 34 for the skimming condition.
All participants saw the 20 pages in random order. All scanpaths were recorded using an SR Research Eyelink 2 eye tracker. The images were presented on a 19-inch CRT monitor at a viewing distance of 50 cm.
Model
The main issue addressed by the model is selecting which word to fixate next among all the words in the paragraph, using a limited number of variables. This problem can be viewed as the iteration of two steps: weighting all words, then selecting the word with the highest weight.
A variable can weight words in two ways: either by increasing the weights of words likely to be fixated, or by decreasing the weights of words that will probably not be fixated. Some variables thus aim at selecting interesting words, while others decrease the weight of uninteresting words.
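The two-step loop can be sketched in Python as follows. The text does not specify how the variables are combined, so this sketch assumes a weighted sum of variable scores, in which a negative score lowers a word's weight; it also assumes, for simplicity, that a word is never refixated, which would not suit the memorizing strategy. `Word`, `simulate_scanpath` and `proximity` are hypothetical names, not part of the original model.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Word:
    text: str
    x: float  # horizontal centre of the word, in pixels
    y: float  # vertical centre of the word, in pixels


# A variable scores a candidate word, given the currently fixated word.
Variable = Callable[[Word, Word], float]


def simulate_scanpath(words: Sequence[Word], variables: Sequence[Variable],
                      weights: Sequence[float], start: int,
                      n_fixations: int) -> List[int]:
    """Iterate the two steps: weight every word, then fixate the best one.

    Assumed (not from the source): a word's weight is the weighted sum of
    its variable scores, and already fixated words are never refixated.
    """
    path = [start]
    visited = {start}
    for _ in range(n_fixations - 1):
        current = words[path[-1]]
        best, best_weight = None, -math.inf
        for i, word in enumerate(words):
            if i in visited:
                continue
            # Step 1: weight the candidate word.
            weight = sum(k * f(word, current) for k, f in zip(weights, variables))
            # Step 2: keep the best-weighted candidate so far.
            if weight > best_weight:
                best, best_weight = i, weight
        if best is None:  # every word has already been fixated
            break
        path.append(best)
        visited.add(best)
    return path


def proximity(word: Word, current: Word) -> float:
    """Hypothetical variable: closer words get higher (less negative) scores."""
    return -math.hypot(word.x - current.x, word.y - current.y)


line = [Word(t, 50.0 * i, 0.0) for i, t in enumerate("the cat sat down".split())]
print(simulate_scanpath(line, [proximity], [1.0], start=0, n_fixations=4))
# [0, 1, 2, 3]
```

With proximity as the only variable, the loop reproduces a strictly word-by-word left-to-right pass, which is the rauding-like extreme of the model.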
Figure 1: Example of a page used in the experiments.
To present the variables used, let us describe how the different processes operate. Each process will correspond to a specific combination of these variables.
Scanning
Scanning is the fastest strategy. Since the aim is to find a particular word (the target), users are likely to prefer words that match that target. Because almost all words are seen only in peripheral vision, weighting can only rely on shape similarity, through a kind of pattern-matching process. Shape similarity with the target is therefore the first variable. This variable will probably not be used by the other processes, which do not rely on a target word. In addition, the scanning process is likely to show longer saccades than rauding. The hypothesis is that the closer the process is to classical reading, the shorter the saccades. Distance to the current fixation is therefore our second variable: words spatially close to the current fixation will be preferred.
Scanning will probably not need a high weight on that variable, as opposed to rauding, for example.
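The two scanning variables might be computed as below. The text does not say how shape similarity is measured; this Python sketch assumes a coarse word envelope (ascending, descending or x-height letters) compared position by position, which is only one plausible proxy for what can be perceived peripherally. The exponential decay with a 100-pixel scale for the distance variable is likewise an assumption.

```python
import math

# Coarse letter classes (ASCII-centric assumption; accented vowels count as x-height).
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")


def envelope(word: str) -> str:
    """Shape code per letter: 'A' ascender/capital/digit, 'D' descender, 'x' other."""
    code = []
    for ch in word:
        if ch.isupper() or ch.isdigit() or ch in ASCENDERS:
            code.append("A")
        elif ch in DESCENDERS:
            code.append("D")
        else:
            code.append("x")
    return "".join(code)


def shape_similarity(word: str, target: str) -> float:
    """Fraction of letter positions whose shape class matches the target's.

    Dividing by the longer length also penalizes a difference in word length.
    """
    a, b = envelope(word), envelope(target)
    n = max(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(p == q for p, q in zip(a, b)) / n


def distance_score(x: float, y: float, cur_x: float, cur_y: float,
                   scale: float = 100.0) -> float:
    """Decreases from 1.0 (at the current fixation) towards 0 with distance in pixels."""
    return math.exp(-math.hypot(x - cur_x, y - cur_y) / scale)


print(shape_similarity("planète", "planète"))  # 1.0
print(shape_similarity("cat", "cats"))         # 0.75
```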
Skimming
Skimming differs from scanning in that it takes the content into account. However, to keep processing speed high, not all words can be fixated. For the same reason as before, the decision whether or not to select a word can only be made in peripheral vision. Although the general shape of a word is certainly not related to its meaning, users probably tend to prefer long words, which are known to carry more meaning. Word length is our third variable. Other processes may also rely on that variable, but probably to a lesser extent than skimming.
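A word-length variable can be computed from the word strings alone, which fits the idea that length is available in peripheral vision even when meaning is not. In this Python sketch, normalizing by the longest word on the page and stripping punctuation before counting are both assumptions.

```python
import re


def word_length_scores(words):
    """Score each word by its number of letters, scaled so the longest word scores 1.0.

    Punctuation and apostrophes are stripped before counting (an assumption).
    """
    lengths = [len(re.sub(r"\W", "", w)) for w in words]
    longest = max(lengths, default=0)
    if longest == 0:
        return [0.0] * len(words)
    return [n / longest for n in lengths]


print(word_length_scores(["les", "associations", "humanitaires", "de"]))
```

Short function words such as "les" and "de" receive low scores, so a skimming-like weighting favors content words such as "associations".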
Rauding
Rauding is normal reading, in which almost all words have to be fixated. The linear sequence of words therefore becomes important for preserving the meaning of sentences.
Saccades towards the next word tend to be the rule, so these saccades are mostly horizontal (including the long saccade going to the beginning of the next line). Saccade horizontality is therefore our fourth variable: it gives higher weights to words that can be reached with a horizontal saccade. Scanning would probably give a low weight to that variable, because saccades may jump from one line to another. Alternatively, the number of intervening words between the previously fixated word and the currently fixated word could have been used as a variable. That value would be close to 0 in rauding, larger but still positive in skimming, and sometimes negative in memorizing. However, that variable would not capture the fact that, in a 2D layout, fixating a word that is distant in the text (for example, the word directly below on the next line) may require only a short saccade.
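The contrast between saccade horizontality and the number of intervening words can be illustrated on a hypothetical two-line layout. In this Python sketch, horizontality is assumed to be the absolute cosine of the saccade angle, and the layout dimensions are invented. A word directly below the current fixation is only 20 pixels away, yet nine words are skipped in reading order, which is the 2D case the intervening-words variable misses.

```python
import math
from typing import NamedTuple


class Word(NamedTuple):
    index: int  # position in reading order
    x: float    # horizontal centre, pixels
    y: float    # vertical centre, pixels


def horizontality(cur: Word, cand: Word) -> float:
    """|cos| of the saccade angle: 1.0 for a horizontal saccade, 0.0 for a vertical one."""
    dx, dy = cand.x - cur.x, cand.y - cur.y
    length = math.hypot(dx, dy)
    return abs(dx) / length if length else 0.0


def intervening_words(cur: Word, cand: Word) -> int:
    """Alternative variable: words skipped in reading order.

    0 for the next word, positive for forward skips, negative for regressions.
    """
    return cand.index - cur.index - 1


# Hypothetical layout: 10 words per line, 50 px apart, lines 20 px apart.
page = [Word(i, 50.0 * (i % 10), 20.0 * (i // 10)) for i in range(20)]
cur = page[3]
for cand in (page[4], page[13]):
    print(cand.index, round(horizontality(cur, cand), 2),
          intervening_words(cur, cand),
          math.hypot(cand.x - cur.x, cand.y - cur.y))
# 4 1.0 0 50.0
# 13 0.0 9 20.0
```

The return sweep from the last word of a line to the first word of the next (page[9] to page[10]) still scores close to 1.0 on horizontality, consistent with treating it as a mostly horizontal saccade.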