• Aucun résultat trouvé

poster (PDF)

N/A
N/A
Protected

Academic year: 2022

Partager "poster (PDF)"

Copied!
1
0
0

Texte intégral

(1)

FOR FURTHER INFORMATION

Dilution and user behaviour

Tasks, systems, and participants

Our experiment used:

• Six tasks of controlled difficulty, from Wu et al, across three types: “remember”, “understand”, and “analyse”.

• And two search systems of controlled quality. The “full”

system presented results from Yahoo!. The “diluted”

system had bad, but plausible, results at ranks 1, 3, 5, ... . Users were given the first query but allowed to issue new queries, move to later pages, or view results in a pop-up window.

34 participants completed each of the six tasks, three each (“remember”, “understand”, “analyse”) on each of the

systems (“good”, “diluted”).

Measures

We measured several aspects of user behaviour:

• Time, • Clicks, • Clicks, • Scrolls, • Queries, and • Gaze.

Faced with a poorly-performing search system, users might react in different ways: for example they might issue more queries, read more deeply, or take more time.

We controlled system quality and task difficulty to measure these reactions. Although users could tell the difference between “good” and “bad” systems, their behaviour was very similar. This

means we need to be careful interpreting user behaviour to infer system quality.

Paul Thomas (CSIRO & ANU), Falk Scholer (RMIT University), Alistair Moffat (University of Melbourne)

Results

Our users could tell the difference between the good and bad results, and did not often click on the bad ones:

But other aspects of behaviour were very similar indeed between the two systems, including views:

And queries (median 1 query for full/2 for diluted, difference is not significant);

And everything else (rate of saves 79%/75% ; reading time per snippet 0.59s/0.59s; pagination 7%/10%; reported

satisfaction 4/4 out of 5; reported difficulty 2/2 out of 5;

and so on).

So what?

Our results are consistent with observations that small

differences in e.g. MAP don’t make a real difference in user performance (Turpin & Scholer).

We need to be careful using changes in behaviour to

evaluate search systems. It’s possible to have significantly lower system performance with no noticable changes in

user behaviour at all.

Paul Thomas

e paul.thomas@csiro.au w es.csiro.au

REFERENCES

Turpin and Scholer. User performance versus precision measures for simple web search tasks. In Proc. SIGIR, 2006.

Wu, Kelly, Edwards, and Arguello. Grannies, tanning beds, tattoos and NASCAR: Evaluation of search tasks with varying levels of cognitive complexity. In Proc. IIiX, 2012.

Diluted: every odd- numbered result is bad Full: all results are normal

Final rank viewed Deepest rank viewed

Références

Documents relatifs

The adenosinergic system appears to have a key role in the adaptive response of the cardiovascular system in pulmonary hypertension and heart failure, the most relevant effects

Data extracted from patient records included num- ber of fall victims, height (floor), site of accident (window or balcony, at home or away from home), particularities in

IV-A could be used at least when T is Base (defined in Sec.II-B). Following the same scheme as in Sec. IV-A, we want to add all Coq functions from natural numbers to closed terms,

The mathematicians’ Digital mathematics library (DML), which is not to be confused with libraries of mathematical objects repre- sented in some digital format, is the generous idea

Nacher M, Singhasivanon P, Yimsamran S, Manibunyong W, Thanyavanich N, Wuthisen R, Looareesuwan S: Intestinal helminth infections are associated with increased incidence of

The Organic User Interfaces OUI vision[10, 11] describes a world where everyday objects are equipped with computing capabilities computational objects – COs.. Holman

phagocytophilum (Sinclair et al., 2014) and oncogenic factors (González- Herrero et al., 2018) behave as “epigenators” (Berger et al., 2009; Cheeseman and Weitzman, 2015) because

Those parameters are principally: invasion dynamics (determined by some characteristics of the invasive species and its intrinsic growth rate, as well as the