• Aucun résultat trouvé

Building Interaction Profiles for Better Search Tools in DLs

N/A
N/A
Protected

Academic year: 2022

Partager "Building Interaction Profiles for Better Search Tools in DLs"

Copied!
1
0
0

Texte intégral

(1)

Building Interaction Profiles for Be�er Search Tools in DLs

Maram Barifah

Università della Svizzera italiana (USI) Faculty of Informatics, Lugano, Switzerland

maram.barifah@usi.ch This research starts by considering users of a digital library (DL) and

aims at using the data extracted from logged�les, including search strategies, and queries, to build e�ective user-interaction pro�les.

and use them to guide designers and systems developers in the production of more usable, useful and e�ective interaction pro�les.

It is important to stress that the proposed interaction pro�les are built by extracting a number of features from the log�les. Thus, they contain information about real searching experiences, including usage patterns, user familiarity with the system, and time intervals.

This study is conducted in collaboration with RERO Doc digi- tal library1. RERO Doc is the network of the libraries of Western Switzerland. Users from di�erent parts of the world can search on di�erent domains: Nursing, Economics, Computer Science and others. RERO Doc provides various document types such as books, articles, theses, periodicals,etc. Thus, the research questions are:

(1) What are the most suitable techniques to produce rich/realistic groups of data extracted from log�les in order to build inter- action pro�les?

(2) What are the main features to characterise interaction pro�les?

(3) What is the minimum size of data to produce robust groups to use for building interaction pro�les?

Data preparing and processing:

Interface analysis:the interface is inspected in order to understand di�erent search options.

Preprocessing phase:we follow the framework of [2] as the follow- ing:

Data loading:the dataset consists of 59 million records 20 GB col- lected over a six-month period.

Data cleaning:including users identi�cations hidden, elimination of the erroneous, and corrupted records.

Data parsing:consists of sessions recognition and removing the non- human sessions e.g. Googlebot, SemanticScolarBot. The session "is a common unit of interaction that is used in search log analysis"

[3]. Session recognition depends on the user interactions and on the features of the interface. This phase is crucial for identifying distinct classes of searching patterns [2, 3]. We identify a session by the combinations of user IP, time stamp, and user agent extracted from the log�les. Also, the non-human requests are removed in this stage and so data is reduced to 9 GB.

Data coding:in this phase, the URL requests were analysed and di- vided into meaningful parts including user IP, time stamp, request, referrer, user agent, session IP. Researchers follow di�erent strate- gies to analyse the URLs embodied on the log�les. For example,[1]

1https://doc.rero.ch

DESIRES, August 2018, Bertinoro, Italy

©2018 Copyright held by the owner/author(s). Publication rights licensed to Associa- tion for Computing Machinery.

build a hierarchical taxonomy of the website, and [2] de�ne a code schema of the all types of the interactions based on analysing and understanding the structure of the website. In this research we consider both strategies.

Features engineering:Based on the interface and log�les analy- sis, the meaningful features are identi�ed.

Mining user behaviour:The remaining sessions are further anal- ysed and grouped based on the available variables in the records.

We started the analysis of the�rst million record which consists of 125000 session records. 72095 sessions were detected with only one record. The aim of this phase is to identify di�erent usage pat- terns among user interactions and group them accordingly. Two main di�erent grouping techniques were used: topic modelling and K-Means. For 6 topic model, the Coherence Score of the topic modelling is 0.35. For K-Means the estimated number of clusters is 6 with Silhouette Coe�cient of 0.95. So far, six di�erent usage patterns have been identi�ed and interpreted qualitatively:

(1) Single sessions or known-item, where searchers visit RERO for downloading documents without any interactions.

(2) Complicated sessions, where the usage pattern is charac- terised by heavily interactions including submitting queries, browsing, and using di�erent functions.

(3) Light navigators who navigate the library for navigating without using di�erent functions on the interface.

(4) Advance navigators whose navigations are characterised by using di�erent functions and many iterations.

(5) Light browsers whose searching is simple, short without using di�erent functions.

(6) Advanced browsers, their interactions is long and including advance search functions.

In conclusion, log�le analysis is an unobtrusive method to de- tect usage patterns of digital library. The aim of this research in progress is to build interactions pro�les in the digital library context in order to gain more insights into users searching experiences. We plan more experiments to test and compare the e�ectiveness of the techniques used to group data for preparing interaction pro�les.

Then, we will involve experts to assess the quality of these inter- action pro�les and how e�ective these are in assisting designers and system developers in the production of more usable, useful and e�ective tools to support searchers.

REFERENCES

[1] Hui-Min Chen and Michael D Cooper. 2002. Stochastic modeling of usage patterns in a web-based information system. Journal of the Association for Information Science and Technology53, 7 (2002), 536–548.

[2] Yu Chi, Tingting Jiang, Daqing He, and Rui Meng. 2017. Towards an integrated clickstream data analysis framework for understanding web users’ information behavior.iConference 2017 Proceedings(2017).

[3] Tony Russell-Rose, Paul Clough, and Elaine G Toms. 2014. Categorising search ses- sions: some insights from human judgments. InProceedings of the 5th Information Interaction in Context Symposium. 251–254.

Références

Documents relatifs

The study of events from real data in the region of kinematic variables where a signal is expected may introduce biases in the event selection resulting in incorrect

Motile cilia, laterality and primary ciliary dyskinesia The plenary speaker Martina Brueckner (Yale University, USA) opened the scientific part of the meeting with an introduction

In our previous research we focus on top-k searching in tree-oriented data structures and we developed various top-k algorithms, which solve top-k problem efficiently and support

To be interesting as a semantic web user interface, I argue the characteristics of the underlying semantic information store should be manifest through the interface, either through

More precisely, he says the following: there exists non-real eigenvalues of singular indefinite Sturm-Liouville operators accumulate to the real axis whenever the eigenvalues of

2, we have plotted the complex quasiparticle spectrum for strong axial tilt while neglecting the ∝ ω terms in the self-energy, resulting in a flat bulk Fermi ribbon with a

Appreciation is expressed to the Social Sciences and Humanities Research Council of Canada (SSHRC) in support of the AIRS Major Collaborative Research Initiative (MCRI) (2008

Overall, this work contributes to an orientation for designers and researchers regarding the interplay be- tween user, system and context in a factory environment by pointing