

HAL Id: hal-01497088

https://hal.archives-ouvertes.fr/hal-01497088

Submitted on 28 Mar 2017

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Why do we Need Games? Analysis of the Participation on a Crowdsourcing Annotation Platform

Alice Millour, Karën Fort

To cite this version:

Alice Millour, Karën Fort. Why do we Need Games? Analysis of the Participation on a Crowdsourcing Annotation Platform. Games4NLP, Apr 2017, Valencia, Spain. ⟨hal-01497088⟩


Why do we Need Games?

Analysis of the Participation on a Crowdsourcing Annotation Platform

Alice Millour
Université Paris-Sorbonne / STIH
alice.millour@paris-sorbonne.fr

Karën Fort
Université Paris-Sorbonne / STIH
karen.fort@paris-sorbonne.fr

1 Introduction

Annotated corpora are necessary both to develop and to evaluate natural language processing tools. However, building such corpora is notoriously expensive. For less-resourced languages, the (lack of) availability of language experts represents yet another obstacle to overcome. To address both issues, we developed a lightweight crowdsourcing platform, Bisame¹ (Millour et al., 2017), which aims at collecting part-of-speech (POS) annotations for less-resourced languages, and tested it on a regional language of France: Alsatian (about 550,000 speakers). Crowdsourcing can take several forms, including microworking, citizen science, and games with a purpose. We hypothesized that the speakers of a less-resourced language like Alsatian would be motivated to help develop linguistic resources for their own language, without the need to develop a full-fledged game.

2 Overview of the Platform

The tagset used for annotation is the Universal POS tagset, defined by Petrov et al. (2012). The annotation task is performed on a whole sentence. Participants must train on four reference sentences before producing annotations on the raw corpus. Depending on the results of the pre-annotation (performed by TreeTagger (Schmid, 1997) for German, following the methodology of Bernhard and Ligozat (2013), and by MElt (Denis and Sagot, 2010), which is regularly trained on the collected annotations), participants have to approve or reject a suggested annotation, or pick the correct tag in a shortlist of the most probable tags.² The only two gamified features that we introduced in this experiment are a leader board (since November 2016) and a progress bar (since January 2017), indicating the annotation state of the current corpus. The total numbers of participants and annotations are also displayed. As of the end of February 2017, 42 participants had produced 8,833 annotations over 59 days (between May 2016 and February 2017). Participation peaks were due to communication on Facebook or reminders by email.

¹See http://bisame.herokuapp.com.

²The probability of a tag is based on the confidence score associated with each annotation, which is equal to the confidence score of the participant (the percentage of words correctly annotated on the reference sentences).
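To make the scoring mechanism of footnote 2 concrete, here is a minimal sketch of how a participant's confidence score could weight the tags proposed for a word. This is our own illustration in Python, not the platform's code: all names are hypothetical, and we assume at most one annotation per participant and word.

from collections import defaultdict

def participant_confidence(answers, gold):
    # Share of words correctly tagged on the reference (training)
    # sentences; this is the participant's confidence score.
    return sum(a == g for a, g in zip(answers, gold)) / len(gold)

def tag_scores(annotations, confidences):
    # Turn the annotations of one word (participant -> proposed tag)
    # into a normalised score per tag, each proposal being weighted
    # by its annotator's confidence score.
    scores = defaultdict(float)
    for participant, tag in annotations.items():
        scores[tag] += confidences[participant]
    total = sum(scores.values())
    return {tag: s / total for tag, s in scores.items()}

def shortlist(scores, k=3):
    # The k most probable tags, e.g. to display to the next annotator.
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Example: a participant who tagged 9 of 10 reference words correctly
# (confidence 0.9) disagrees with a weaker participant (0.6) on a word.
confidences = {"anna": participant_confidence(["NOUN"] * 9 + ["VERB"],
                                              ["NOUN"] * 10),
               "ben": 0.6}
annotations = {"anna": "NOUN", "ben": "VERB"}
print(shortlist(tag_scores(annotations, confidences)))  # ['NOUN', 'VERB']

Under such a scheme, the suggested annotation a participant has to approve or reject would simply be the top of the shortlist.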

3 Analysis of the Participation

Out of the 64 registered users who completed the training phase, only 42 actually produced annotations. Among them, 56% (10% of the annotations) spent one day on the platform, 33% (24% of the annotations) spent two or three days, and 10% (66% of the annotations) spent more than three days. These observations show that, besides the complexity of attracting participants (motivation), we struggle to retain them on the platform (volition). They also tend to confirm a phenomenon that has already been described (Chamberlain et al., 2013): a minority of participants produces a lot. Nevertheless, while the total number of productive users has increased by 30% in the last nine months (from 31 to 42), the number of annotations per week has grown by more than 150%. Figure 1 presents the numbers of active users and produced annotations per week. Putting aside the peak observed in November, due to a unique user who produced more than 3,000 annotations in two days, annotations are nearly equally distributed between participants. Thus, we observe that in May 2016, 29 users produced 2,944 annotations, while in January 2017, half as many produced 7,302 annotations: the improvement in the interface design (easing the annotation task from choosing a tag within a list to approving or rejecting a suggested tag) and the features described earlier are most probably responsible for the increase in the average number of annotations per user per week (from 87 to 316).

Figure 1: Number of productive participants and annotations per week (note the different scales).

This progress is encouraging and confirms the potential of gamification to retain participants. We noticed that participants tend to contribute to reach a numeric goal (leaving fewer than 100 unannotated words on the current corpus, for instance), and we observed that the leader board generated ephemeral competition between participants. Yet, the positive effects of the gamification features tend to vanish as the activity on the platform decreases, as the results of late February reveal.
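The retention breakdown given at the beginning of this section (participants bucketed by number of active days, with their share of the annotations) can be computed in a few lines over the annotation log. The sketch below is ours, in Python, and assumes a hypothetical log of (participant, date) pairs, one per annotation.

from collections import defaultdict

# Hypothetical log: one (participant id, date) pair per annotation.
log = [("u1", "2016-05-02"), ("u1", "2016-05-02"),
       ("u2", "2016-05-02"), ("u2", "2016-05-03"),
       ("u2", "2016-05-04"), ("u2", "2016-05-07")]

n_annotations = defaultdict(int)  # annotations per participant
active_days = defaultdict(set)    # distinct active days per participant
for user, day in log:
    n_annotations[user] += 1
    active_days[user].add(day)

def bucket(n_days):
    # Buckets used in the analysis above: 1 day, 2-3 days, >3 days.
    return "1 day" if n_days == 1 else "2-3 days" if n_days <= 3 else ">3 days"

users = defaultdict(int)        # participants per bucket
annotations = defaultdict(int)  # annotations per bucket
for user, days in active_days.items():
    b = bucket(len(days))
    users[b] += 1
    annotations[b] += n_annotations[user]

total_u, total_a = len(active_days), sum(n_annotations.values())
for b in ("1 day", "2-3 days", ">3 days"):
    print(f"{b}: {users[b] / total_u:.0%} of participants, "
          f"{annotations[b] / total_a:.0%} of the annotations")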

4 Conclusion

We observed that helping develop language resources (and therefore natural language processing applications) for one's own language is not enough of an incentive to produce the quantity of annotations we need to train a POS tagger. Previous experiments using crowdsourcing for natural disaster relief (Munro, 2013) showed that it is difficult to maintain the motivation of participants in voluntary crowdsourcing, even for potentially life-saving actions. However, we showed that gamification helps keep users participating. The quality of the collected annotations³ and the progress made are promising. We therefore plan to tackle the issues discussed in Section 3 by introducing new gamification features and allowing users to create their own texts on the platform, following a suggestion from Liberman (2016).

³The annotations produced by the participants reach an average accuracy of 93%.

References

Delphine Bernhard and Anne-Laure Ligozat. 2013. Es esch fàscht wie Ditsch, oder net? Étiquetage morphosyntaxique de l’alsacien en passant par l’allemand. In Actes de TALARE 2013 : Traitement Automatique des Langues Régionales de France et d’Europe, pages 209–220, Les Sables d’Olonne, France.

Jon Chamberlain, Karën Fort, Udo Kruschwitz, Mathieu Lafourcade, and Massimo Poesio. 2013. Using games to create language resources: Successes and limitations of the approach. In Iryna Gurevych and Jungi Kim, editors, The People’s Web Meets NLP, Theory and Applications of Natural Language Processing, pages 3–44. Springer Berlin Heidelberg.

Pascal Denis and Benoît Sagot. 2010. Exploitation d’une ressource lexicale pour la construction d’un étiqueteur morphosyntaxique état-de-l’art du français. In Traitement Automatique des Langues Naturelles : TALN 2010, Montreal, Canada.

Mark Liberman. 2016. Oral histories: Linguistic documentation as social media. In Actes de NIEUW: Novel Incentives and Engineering Unique Workflows, Philadelphia, USA.

Alice Millour, Karën Fort, Delphine Bernhard, and Lucie Steiblé. 2017. Vers une solution légère de production de données pour le TAL : création d’un tagger de l’alsacien par crowdsourcing bénévole. In TALN - Traitement Automatique des Langues Naturelles : TALN 2017, Orléans, France.

Robert Munro. 2013. Crowdsourcing and the crisis-affected community: Lessons learned and looking forward from Mission 4636. Information Retrieval, 16(2).

Slav Petrov, Dipanjan Das, and Ryan McDonald. 2012. A universal part-of-speech tagset. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC’12), Istanbul, Turkey.

Helmut Schmid. 1997. Probabilistic part-of-speech tagging using decision trees. In New Methods in Language Processing, Studies in Computational Linguistics, pages 154–164.
