
Exploratory Search Missions for TREC Topics

Martin Potthast, Matthias Hagen, Michael Völske, Benno Stein
Bauhaus-Universität Weimar, 99421 Weimar, Germany

Corpus Overview

We report on the construction of a new text reuse corpus comprising writing interactions and exploratory search missions.

[Figure: corpus construction setup. Labels: Topic, Author, Editor, ChatNoir SE, ClueWeb API, ClueWeb, Revision log, Query log.]

- 150 essays (based on TREC Web Track topics 2009-2011)
- 12 professional writers hired on a crowdsourcing platform
- Long essay writing task, researching sources using a custom ClueWeb09 search engine
- Writing and search engine interactions recorded in high detail

Data Collection

Authors

Writer Demographics

  Age:                  minimum 24, median 37, maximum 65
  Gender:               female 67%, male 33%
  Academic degree:      postgraduate 41%, undergraduate 25%, none 17%, n/a 17%
  Country of origin:    UK 25%, Philippines 25%, USA 17%, India 17%, Australia 8%, South Africa 8%
  Years of writing:     minimum 2, median 8, standard deviation 6, maximum 20
  Search engines used:  Google 92%, Bing 33%, Yahoo 25%, others 8%
  Native language(s):   English 67%, Filipino 25%, Hindi 17%
  Second language(s):   English 33%, French 17%; Afrikaans, Dutch, German, Spanish, Swedish each 8%; none 8%
  Search frequency:     daily 83%, weekly 8%, n/a 8%

Query log

                                   Distribution
  Corpus characteristic       Σ    min     avg    max   stdev
  Writers                    12
  Topics                    150
  Topics / Writer                    1    12.5     33     9.3
  Queries                13 651
  Queries / Topic                    4    91.0    616    83.1
  Clicks                 16 739
  Clicks / Topic                    12   111.6    443    80.3
  Clicks / Query                     0     0.8     76     2.2
  Sessions                  931
  Sessions / Topic                   1    12.3    149    18.9
  Days                      201
  Days / Topic                       1     4.9     17     2.7
  Hours                   2 068
  Hours / Writer                     3   129.3    679   167.3
  Hours / Topic                      3     7.5     10     2.5

Topics

Example topic: Obama’s family.
Write about President Barack Obama’s family history, including genealogy, national origins, places and dates of birth, etc. Where did Barack Obama’s parents and grandparents come from? Also include a brief biography of Obama’s mother.

Original topic 001 of the TREC Web Track 2009:
  Query: obama family tree
  Description: Find information on President Barack Obama’s family history, including genealogy, national origins, places and dates of birth, etc.
  Sub-topic 1: Find the TIME magazine photo essay “Barack Obama’s Family Tree.”
  Sub-topic 2: Where did Barack Obama’s parents and grandparents come from?
  Sub-topic 3: Find biographical information on Barack Obama’s mother.

Search mission data will be made available as the Webis-Query-Log-12 (http://www.webis.de/research/corpora).

Main Findings

[Figure: 10 × 15 grid of plots indexed A to J and 1 to 15, one per search mission. Each panel is labeled with its topic number and query count, ranging from 4 queries (topic 130) to 616 queries (topic 049).]
Spectrum of search behavior

- Percentage of queries submitted over time for all 150 search missions
- Ranges from majority of queries issued at the start of the task (A1) to most queries towards the end (J15)
- In between, sets of queries submitted in bursts (e.g., F9) or linear increase (A10)

Correlation of searching and writing

- Evidence of distinct text reuse strategies (build-up and boil-down)
- Only the former clearly reflected in the query log

[Figure: two panels, Author 24 (13 topics) and Author 5 (18 topics), each plotting the average query distribution and the average text length over time; both axes run from 0 to 1.]

First Conclusions

- Query frequency by itself poor predictor of task completion
- Heavy reliance on search engine indicates need to better support exploratory tasks

Web Technology and Information Systems, www.webis.de, Bauhaus-Universität Weimar
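The "percentage of queries submitted over time" curves described under Spectrum of search behavior could be computed along the following lines. This is a minimal sketch, not the authors' procedure: the poster does not say how time was normalized or sampled, so the function name `cumulative_query_fraction`, the `steps` parameter, and the two example missions with their timestamps are all hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def cumulative_query_fraction(
    query_times: Sequence[float], start: float, end: float, steps: int = 5
) -> List[float]:
    """Fraction of a mission's queries submitted by each normalized time point.

    Assumption: time is rescaled linearly so that `start` maps to 0 and `end`
    maps to 1. The result has steps + 1 values for t = 0, 1/steps, ..., 1.
    """
    if end <= start:
        raise ValueError("end must be later than start")
    if not query_times:
        raise ValueError("need at least one query")
    # Normalize and sort the timestamps so bisect can count queries up to t.
    normalized = sorted((t - start) / (end - start) for t in query_times)
    total = len(normalized)
    return [bisect_right(normalized, i / steps) / total for i in range(steps + 1)]


# Hypothetical missions; timestamps are hours since the mission started.
early = cumulative_query_fraction([0.1, 0.2, 0.3, 0.5, 7.0], start=0.0, end=10.0)
late = cumulative_query_fraction([4.5, 8.5, 9.0, 9.5, 9.9], start=0.0, end=10.0)
print(early)  # most queries near the start, as in the A1 pattern
print(late)   # most queries near the end, as in the J15 pattern
```

A curve that rises steeply near t = 0 corresponds to a mission where most queries come at the start, and one that stays flat until close to t = 1 corresponds to most queries coming towards the end. Averaging such curves over all topics of one writer would give a per-author curve comparable to the "average query distribution" panels, again under the normalization assumed here.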
