
Metadata Visualization of Scholarly Search Results

by

© Taraneh Khazaei

A thesis submitted to the School of Graduate Studies

in partial fulfilment of the requirements for the degree of

Master of Sciences

Department of Computer Science
Memorial University of Newfoundland

October 2012

St. John's, Newfoundland

Abstract

Studies of online search behaviour have found that searchers often face difficulties formulating queries and exploring the search results sets. These shortcomings may be especially problematic in digital libraries since library searchers employ a wide variety of information-seeking methods (with varying degrees of support), and the corpus to be searched is often more complex than simple textual information. To address these problems, an interactive Web-based library search interface is presented, which has been designed to support strategic retrieval behaviour of library searchers. This system takes advantage of the rich metadata associated with academic documents and employs information visualization techniques to provide searchers with additional information-seeking tools. These tools are designed to facilitate visual and interactive query refinement, search results exploration, and citation navigation. User evaluations illustrate the potential benefits of the design choices in comparison to a list-based digital library search interface.


Acknowledgements

Firstly, I would like to sincerely thank my supervisor, Dr. Orland Hoeber, for the support and guidance he offered me throughout my master's program. This research would not have been possible without his insightful advice and patient encouragement. I would also like to acknowledge the financial, academic, and technical support of the Department of Computer Science and Memorial University. This research is financially supported by my supervisor's NSERC Discovery Grant and the scholarship that I have received from the School of Graduate Studies of Memorial University.

Finally, I would like to thank my best friend, my husband, who gave me unconditional support and encouragement throughout this process. My love and thanks also go to my parents and my brother for always being supportive of my academic pursuits, regardless of whether abroad or at home.

Part of this research was presented at the 2012 International Conference on Knowledge Management and Knowledge Technologies [71].

Contents

Abstract
Acknowledgements
List of Tables
List of Figures

1 Introduction
1.1 Motivation
1.2 Approach
1.3 Research Questions
1.4 Organization of the Thesis

2 Related Work
2.1 Information Visualization
2.2 Citation Visualization
2.3 Search Results Visualization
2.3.1 List Augmentation
2.3.2 Document Spatialization
2.3.3 Document Cluster Visualization
2.3.4 Document Hierarchy Visualization
2.3.5 Visual Representation of Auxiliary Information
2.4 Query Representation and Visualization
2.4.1 Faceted Navigation Visualization
2.4.2 Candidate Terms Visualization
2.5 Discussion

3 Interactive Search Results Exploration and Discovery
3.1 Motivation
3.2 Approach
3.2.1 Data Source
3.2.2 List Augmentation using Bow Ties
3.2.2.1 Visual Representation of Citation Metadata
3.2.2.2 Document Selection for Detailed Evaluation
3.2.3 Document Focus with Detailed Bow Ties
3.2.3.1 Detailed Visualization of Citation Metadata
3.2.3.2 Interactive Citation Exploration
3.3 Example
3.4 Implementation Details
3.4.1 Platform and Web Technologies
3.4.2 System Architecture
3.5 Discussion

4 Interactive Query Refinement
4.1 Motivation
4.2 Approach
4.2.1 Visual Representation of Keyword Metadata
4.2.1.1 Interactive Search Results Exploration
4.2.1.2 Interactive Query Refinement
4.3 Example
4.4 System Architecture
4.5 Discussion

5 User Study
5.1 Purpose
5.2 Hypotheses
5.3 Methodology
5.3.1 Tasks
5.3.2 Procedure
5.3.3 Analysis
5.4 Results
5.4.1 Participant Demographics
5.4.2 Retrieval Efficiency
5.4.3 Retrieval Effectiveness
5.4.3.1 Selected Documents Quality
5.4.3.2 Refined Query Quality
5.4.4 Confidence
5.4.5 Perceived Ease of Use and Perceived Usefulness
5.4.5.1 List-based interface vs. Bow Tie Academic Search
5.4.5.2 Bow Tie Academic Search Components
5.4.6 Preference
5.4.7 Open-ended Questions
5.5 Discussion

6 Conclusions and Future Work
6.1 Research Contributions
6.1.1 Design of Bow Tie Academic Search
6.1.2 User Study Findings
6.2 Future Work
6.2.1 Further Enrichment of the Proposed Tools
6.2.2 Exploring Visualization Techniques to Represent Other Metadata Elements
6.2.3 Further Evaluations

A User Study Documentation

List of Tables

3.1 Metadata elements that MAS API provides to describe document features.
3.2 Different forms of bow tie representations along with their corresponding meanings.
5.1 Graeco-Latin Square rotation of search tasks and search interfaces.
5.2 The relevance scores used to rate document surrogates.
5.3 Features of the participant demographics.
5.4 Statistical analysis (ANOVA) of the responses for time to completion data.
5.5 Statistical analysis (ANOVA) of the quality of the selected documents.
5.6 Statistical analysis (Wilcoxon-Mann-Whitney tests) of the responses for the perception of the quality of the refined query in comparison to the initial one.
5.7 Statistical analysis (Wilcoxon-Mann-Whitney tests) of the responses for the degree of confidence in selecting a good set of documents.
5.8 Statistical analysis (Wilcoxon-Mann-Whitney tests) of the responses for the perceived ease of use and perceived usefulness.

List of Figures

2.1 A screenshot from Butterfly [81].
2.2 A screenshot from PaperLens [77].
2.3 A screenshot from TileBars [45].
2.4 A screenshot from HotMap [54, 58].
2.5 A screenshot from Hieraxes [100].
2.6 A screenshot from Citiviz [68].
2.7 A screenshot from Envision [90].
2.8 A screenshot from the enhanced version of Envision [114].
2.9 A screenshot from Cat-a-Cone [44].
2.10 A screenshot from ResultMaps [24].
2.11 A screenshot from WordBars [53, 57].
2.12 A screenshot from Flamenco [121].
2.13 A screenshot from the work of Joho et al. [66].
3.1 Features of the bow tie representation.
3.2 A screenshot of the augmented list of search results.
3.3 A screenshot of the detailed representation of a document.
3.4 Features of the detailed bow tie representation.
3.5 A screenshot of the filtering method in effect.
3.6 An example for using the augmented list of search results to evaluate and compare documents.
3.7 An example for using the detailed bow tie representation to evaluate an individual document.
3.8 An example for using the filtering operation to explore within citations.
3.9 Bow Tie Academic Search framework.
4.1 A screenshot of the enhanced histogram.
4.2 An example for using the enhanced histogram integrated with the augmented list.
4.3 An example for using the coordination between the list and the enhanced histogram.
4.4 An example for refining the initial query using the enhanced histogram.
4.5 Bow Tie Academic Search updated framework.
5.1 A screenshot of the list-based interface.
5.2 The participants' familiarity with the assigned search topic regardless of the assigned interface.
5.3 The participants' familiarity with the assigned search topic for each interface.
5.4 The average time to complete the tasks, along with the average time to complete their sub-tasks.
5.5 The percentage of the relevant documents to the total selected documents.
5.6 The participants' perceptions of the quality of the refined query in comparison to the initial query.
5.7 The participants' confidence degree in selecting a good set of documents.
5.8 The frequency of response to the different statements of the perceived ease of use and perceived usefulness for the two interfaces.
5.9 The frequency of response to the different statements of the perceived ease of use and perceived usefulness for individual interface features.

Chapter 1

Introduction

1.1 Motivation

The ongoing development of automated search technology, along with the rapid growth of available information, has made search a fundamental part of people's lives [51]. Search systems are now capable of providing direct access to large information spaces, and are continuing to evolve and grow at a rapid pace. As a result, searching for information is now an integral task undertaken by people daily and is regarded as the second most frequently used online application [43]. As such, research on information retrieval systems has been an active area of study for years.

In the last few decades, improving retrieval and ranking algorithms has been a primary focus of information retrieval research while less attention has been paid to the human aspect of search systems [51]. Nevertheless, research on search behavior [104, 64, 101, 63] has shown that the current interfaces are not well matched with retrieval behavior, providing limited support for searchers during their search process [49]. This lack of support can be especially problematic in digital libraries, where the existence of diverse metadata elements (e.g., title, authors, venues, keywords, and abstracts) has led to a wide variety of information-seeking methods in library searches.

Meanwhile, digital libraries are becoming the main repository of mankind's knowledge, and the primary means of accessing them are search systems [12]. It has been reported that almost 70% of faculty and students use Web-based search engines daily to access North American research libraries, while about 90% of college students start their information seeking process with search engines [26]. Thus, in order to take advantage of the wealth of information existing in digital libraries, further research to improve digital library search interfaces is required.

In studies on search behaviour, it has become evident that queries crafted by searchers are often poorly formulated and do not reflect their information needs properly and accurately. This may be due to searchers' tendency to formulate short queries [67, 83, 104], their incomplete knowledge about their information needs [38], or their inability to express their information needs due to a lack of terminology [38].

Searchers' inability to specify accurate queries has long been documented as one of the main issues in traditional libraries, with reference librarians reporting that few people know how to ask reference questions [69]. In traditional libraries, the active engagement of reference librarians in the search process may enable searchers to subsequently express their information needs properly. However, in the context of digital libraries, little assistance is provided for searchers to craft and reformulate their queries.

Similar problems exist with the interfaces for presenting search results, where the simple list-based format is commonplace. Such a representation requires searchers to extensively utilize their cognitive abilities to evaluate and compare result items by reading document surrogates (i.e., titles, snippets or abstracts, and URLs) one-by-one. In addition, since the list-based representation provides only a few rudimentary interaction mechanisms, there is little support for navigation and exploration within retrieved documents. This lack of support is even more problematic for large and complex information structures such as content-rich metadata-enhanced digital libraries, where more advanced exploration features are needed to support different search activities.

Although an effective ranking method can help searchers for targeted queries, there is still a cognitive burden for exploratory searches. Exploratory search tasks are often motivated by a complex information need, a poor understanding of terminology and the information space [117], or a desire to learn [84]. Such conditions are common starting points for library searchers, resulting in their desire to initiate a search process.

Given that current search interfaces lack the ability to support exploratory search tasks, the design and study of useful interfaces which aid searchers in their search activities is a vital concern in the digital library context. Hence, with the aim of promoting searchers' information-seeking behaviour in an academic digital library, this thesis outlines a new approach called Bow Tie Academic Search. The main purpose of this approach is to support the key search activities of search results exploration and query reformulation. Bow Tie Academic Search moves beyond the traditional information retrieval interfaces, and instead provides a "metadata-enhanced visual interface" that aims to incorporate the human element into library search systems [99]. The main goal is to allow searchers to take an active role within the search process, which is particularly beneficial for exploratory search tasks [49].

1.2 Approach

In this research, a structured approach has been employed to alleviate the problems associated with the current list-based search interfaces. As the first step, the existing literature on online search behaviour as well as library search strategies has been reviewed and surveyed to identify the difficulties searchers may encounter during their library search process. Next, the previous approaches that provide visual and interactive support for information retrieval tasks have been surveyed. Then, the knowledge gained from prior literature has been used to design a library search interface intended to promote searchers' information-seeking behaviour by utilizing effective visualization and interaction techniques.

Finally, an empirical evaluation was conducted with users in order to explore the effectiveness of the specific design choices in comparison to the current search interfaces. Moreover, this user study was designed to validate the hypotheses made regarding the use of visualization and interaction in search interfaces to support library retrieval tasks. Overall, a design-oriented research methodology [36] was employed in which the main contribution is the knowledge gained from studying and evaluating a designed artifact. This methodology is in contrast to a research-oriented process in which the product development is the focus of study.

The resulting interface, called Bow Tie Academic Search, has been designed and developed to facilitate exploratory searches by providing interactive query refinement and interactive search results exploration support. Considering the significant role of metadata elements for retrieval purposes [99] and the great potential of information visualization to improve user experience [110], Bow Tie Academic Search makes use of the rich metadata associated with academic documents and employs information visualization techniques to support searchers during their search process.

This system provides additional interface extensions to the conventional list-based format. Visual representations of citation metadata are added to the search results list, allowing searchers to visually scan, evaluate, and compare search result items in order to find the potential documents based on their citation characteristics. In addition, when a particular document is selected, searchers are provided with an interactive exploration tool to navigate within the citations of the document. The interactive query refinement is supported by providing a visual and interactive overview of the keywords associated with the top search results. These representations provide concise and compact visual encodings of the metadata, and were designed to be lightweight representations in order to prevent searchers from getting overwhelmed with visual complexity.

Incorporating such metadata visualizations into the library search interface supports searchers in their retrieval tasks by enhancing their abilities to perceive, interpret, and understand features and relationships among the search results and their associated metadata. In addition, by providing interaction methods, an effective combination of searcher control and system retrieval power can be achieved. This is an example of next-generation information retrieval systems supporting the search process beyond the simple query box and search results list [49, 51].

1.3 Research Questions

The main objective of this research is to address the fundamental issues associated with the library search interfaces that follow the traditional paradigm of query box and search results list through the use of visualization and interaction techniques. Since this approach moves beyond the traditional search paradigm, it leads to some fundamental research questions related to the value of the design and prototype implementation of Bow Tie Academic Search:

How can metadata visualizations and metadata-based interactions be designed to support interactive search results exploration and interactive query refinement in digital library search interfaces?

The value of information visualization to improve different aspects of digital libraries is well-recognized [123, 11, 8], with information access and retrieval being one of the potential directions. Even though a variety of interactive and visual tools have been designed and proposed to support information retrieval activities, not all of them have been shown to be effective [43]. These differences can be attributed to the specific design choices used in these approaches, indicating the challenges and complexities associated with the design of useful and effective visual representations for search interfaces. Therefore, it is necessary to investigate the best ways to design interactive and visual tools that can effectively support the search activities of search results exploration and query refinement. This research question will be answered via the design of Bow Tie Academic Search (Chapters 3 and 4).

What is the impact of Bow Tie Academic Search on efficiency, effectiveness, and subjective impressions of the search process?

Bow Tie Academic Search includes some interactive and visual tools that allow searchers to take an active role in their information seeking process. As such, it can be classified as an interactive information retrieval system. In human-computer interaction studies, a key research purpose is to investigate the usability of interactive systems [61], which is defined as "the capability to be used by humans easily and effectively" [98]. Therefore, one of the research questions of this thesis is to explore the impact of Bow Tie Academic Search on the usability measures of efficiency and effectiveness. In this research, "retrieval efficiency" refers to the time it takes for searchers to complete their retrieval tasks, and "retrieval effectiveness" refers to the quality of the search outcome.

Sometimes quantitative measures may not reflect the user experience, particularly when the search tasks are vague or ambiguous [50]. Since the approach for representation and exploration of search results is particularly targeted at exploratory tasks, subjective measures can also be helpful in validating the potential benefits that the system may provide. This research question will be answered by the user study (Chapter 5).

What are the searchers' perceptions of the usefulness and ease of use of each specific component of Bow Tie Academic Search?

Besides the possible impacts of the entire system on the retrieval process, it is critical to explore the potential benefits and drawbacks of each of the specific components of the proposed interface. The Technology Acceptance Model (TAM) [28] has been tested in many empirical studies and is considered a robust and reliable instrument to measure users' acceptance of an information technology [47]. As such, TAM measures are used to assess the searchers' perception of ease of use and usefulness of each of the components of Bow Tie Academic Search. This research question will be answered via the user evaluation as well (Chapter 5).

1.4 Organization of the Thesis

The remainder of this thesis is organized as follows: An overview of the related work is provided in the next chapter. Chapter 3 explains the approach used to support search results visualization and exploration, and Chapter 4 outlines the techniques used to support visual and interactive query refinement. The design and results of the user evaluation are provided in Chapter 5. The thesis concludes in Chapter 6 with a summary of the research contributions, along with an overview of future research activities.

Chapter 2

Related Work

2.1 Information Visualization

Information visualization is defined as the use of computer-supported, interactive, visual representations of abstract data to reinforce human cognition [17]. The main purpose of information visualization is to transform abstract information into a visual representation that takes advantage of the rapid processing capabilities of human visual perception. Considering the abilities of the human visual system in rapid interpretation of specific visual features, information visualization can be seen as an effective and useful tool. As Ware [116] stated, "Combining a computer-based information system with flexible human cognitive capabilities, such as pattern finding, and using the visualization as an interface between the two is far more powerful than an unaided human cognitive process." Information visualization can thus be seen as a visual tool that aims at supporting the cognitive system of the user.

The human visual processing system operates so fast that it is, for all intents and purposes, processing information in parallel. Information visualization systems use some basic visual features to represent multiple dimensions or attributes of the data. Examples of such features include colour, size, shape, and spatial proximity.

These visual attributes can be used to devise an information-intensive visualization; however, it should be noted that not every possible permutation of these features can be easily and separately decoded by the human visual processing system [115]. For instance, since colour and spatial location can be easily separated by the visual system, they can be utilized simultaneously to represent different attributes of the data, but the use of colour and shape for the same purpose is not as easily decoded [115].

There are a number of well-known theories and design principles that can be used to design effective visual representations of information. Based on human perceptual capabilities, Cleveland and McGill [25] have provided a ranking of the most effective visual features to represent quantitative information. Mackinlay [80] has extended this ranking to include more data types (i.e., ordinal and nominal). For instance, according to this ranking, position and length are the most effective visual features to encode quantitative data, while position and colour hue are the most effective ones to represent nominal information. When designing visual representations of information, this ranking can be used to assess the relative effectiveness and accuracy of alternative design choices.
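As a rough illustration of how such a ranking might guide a design, the sketch below assigns each data attribute the most effective visual feature that is still unused. It encodes only the two examples mentioned above (quantitative and nominal data); the table `CHANNEL_RANKING`, the function `assign_channels`, and the attribute names are hypothetical and are not taken from the cited work.

```python
# Illustrative only: this table holds just the two examples given in the text.
# The published rankings cover more features and also ordinal data.
CHANNEL_RANKING = {
    "quantitative": ["position", "length"],
    "nominal": ["position", "colour hue"],
}


def assign_channels(attributes):
    """Greedily give each attribute the best still-unused feature for its type.

    `attributes` is a list of (name, data_type) pairs, most important first.
    Returns a dict mapping each attribute name to a visual feature, or to
    None when no ranked feature is left for that data type.
    """
    used = set()
    assignment = {}
    for name, data_type in attributes:
        choice = None
        for channel in CHANNEL_RANKING.get(data_type, []):
            if channel not in used:
                choice = channel
                break
        if choice is not None:
            used.add(choice)
        assignment[name] = choice
    return assignment


# Hypothetical attributes of a scholarly search result.
example = assign_channels([
    ("citation_count", "quantitative"),
    ("publication_year", "quantitative"),
    ("venue", "nominal"),
])
print(example)
```

The greedy order is a simplifying assumption: a real design would also weigh how well the chosen features can be decoded together, as discussed above for colour and shape.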

In information visualization, the use of colour is extremely useful as it supports visual distinction of objects based on their colour differences [115]. Drawn from physiological features of human vision, the opponent process theory of colour provides a solid foundation for effective use of colour to represent data attributes [115]. According to this theory, there are six elementary colours, which are perceptually opponent pairs along three axes. These colour pairs are black-white, red-green, and yellow-blue [115]. As such, these six distinct colours are the most effective choices when encoding nominal information with colour hue. In addition to this theory, suggestions by Tufte [109] can guide the colour selection process as he recommends the use of soft colours instead of using strong and bright ones.

The pre-attentive processing principle introduces a specific set of visual features that can be identified even after a brief exposure by human visual perception [115]. These features include visual form (e.g., line length, line width, and size), colour (hue and intensity), motion (flashing and direction of motion) [115], and spatial position (e.g., 2D position and stereoscopic depth) [115]. These visual attributes can be used to encode the most important aspects of information, facilitating their easy and instantaneous identification and distinction from the surrounding area.

The other fundamental aspect of computer-mediated visualizations, in addition to envisioning information, is interaction. Interaction is highly intertwined with visual representation as an interaction with a system may trigger a change in representation [122]. Interaction techniques can enhance users' cognitive abilities by allowing them to manipulate and control information being visually represented. Visualizations without interaction would become a static image which can only address a very limited number of tasks [31], and its usefulness may be adversely affected as the underlying collection becomes large and dense [122].

Yi et al. [122] categorized the interaction methods in information visualization into seven categories based on users' intention of performing a method. The first method is called select, by which the user is able to mark a set of data items to keep track of them in different stages of interaction. The explore method allows users to navigate and examine different subsets of large information spaces. Reconfigure provides users with the ability to view information from different perspectives by re-arranging the data points. The encode technique allows users to change the way that information is encoded, such as transforming a pie chart to a histogram. Abstract/Elaborate enables users to specify the level of detail to be represented in the visualization. The filter method allows users to narrow the represented dataset down to a subset based on a criterion.

Although filtering is categorized as a different technique, it can be seen as an exploration method since iterative filtering allows the investigation and exploration of different subsets of a dataset. Finally, connect shows relationships among data items or shows the hidden items related to a specified one. An example of this category is the brushing and linking technique, which allows a selected data item to be identified in different views of a dataset.
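The filter and connect categories can be made concrete with a small sketch. The code below is a minimal illustration under stated assumptions, not an implementation from the cited work: the `View` class, the `filter_items` and `brush` helpers, and the sample documents are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical search results; the field names are illustrative only.
DOCUMENTS = [
    {"id": 1, "title": "Paper A", "year": 2008, "citations": 40},
    {"id": 2, "title": "Paper B", "year": 2010, "citations": 5},
    {"id": 3, "title": "Paper C", "year": 2011, "citations": 17},
]


@dataclass
class View:
    """One visual view of the dataset, e.g. a result list or a histogram."""
    name: str
    visible_ids: set
    highlighted: set = field(default_factory=set)


def filter_items(items, predicate):
    """Filter: narrow the dataset down to the items that meet a criterion."""
    return [item for item in items if predicate(item)]


def brush(views, selected_ids):
    """Connect: highlight the selected items in every view that shows them."""
    selected = set(selected_ids)
    for view in views:
        view.highlighted = view.visible_ids & selected


# The list shows every document; the histogram shows only a filtered subset.
recent = filter_items(DOCUMENTS, lambda d: d["year"] >= 2010)
list_view = View("result list", {d["id"] for d in DOCUMENTS})
chart_view = View("histogram", {d["id"] for d in recent})

# Brushing documents 1 and 3 in one view links the selection across views.
brush([list_view, chart_view], [1, 3])
print(list_view.highlighted, chart_view.highlighted)
```

Applying `filter_items` repeatedly with different criteria corresponds to the iterative filtering described above, which is why filtering can also serve as an exploration method.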

In order to design any visual interface, special care should be taken to select and use both effective visual features and useful interaction techniques as two core components of information visualization. Even with the guidance of the aforementioned theories and principles, designing useful interactive and visual tools is a challenging task that requires careful consideration and investigation of the potential effects of choosing a design alternative on the effectiveness of the other design choices. Due to the complex interaction among design choices, in addition to design development, conducting thorough evaluations through user studies is now an expectation in the information visualization field.

2.2 Citation Visualization

In library collections, academic documents are often linked together since scientific publications refer to or cite one another. The information associated with such relationships has always been considered valuable as it can be used to assess the importance of authors, documents, and topics, and to interpret how these elements relate to each other over time. Although displaying citation information is a challenging task, a number of approaches have been proposed to visually represent documents' citations, their relations, and their characteristics.

The most common way of representing citation information is the use of node-link diagrams, which have been used for the purpose of citation analysis for years [39]. Despite efforts to create more innovative views of node-link diagrams to represent citation information [102, 7], there are inherent issues associated with such views for large and dense datasets, and it has been shown that users are not particularly comfortable with them [113]. In addition, citation graphs fail to provide other critical metadata, such as documents' titles and authors, within the graph representation, and they normally require user actions to provide such information on demand. As such, users' initial evaluations and comparisons can only be based on a small set of metadata, which may adversely affect information seeking efficiency and effectiveness.

Therefore, some researchers have moved beyond node-link diagrams, and employed other visualization methods to represent citation metadata. For example, Butterfly [81], which is a 3D search interface, uses a butterfly layout, where a document's metadata is shown at its head in a textual format (see Figure 2.1). One wing of the butterfly includes a list of backward citations (i.e., documents referenced by the original one), while the other wing lists the document's forward citations (i.e., documents that cited the original one). Colour is used to encode the source database as well as the number of forward citations for each document, and to show if a document was already visited. In addition to the general problems users may face in understanding 3D representations [37], Butterfly has scalability issues as each wing can only represent 22 citations.

Figure 2.1: A screenshot from Butterfly [81].

In the PaperLens visualization [77], rather than explicitly showing the relations among documents, simple views of different pieces of information, such as "year by year top 10 cited documents" and "popularity of topic by year", are provided (see Figure 2.2). These views are tightly coupled with brushing and linking interaction methods, providing searchers with a powerful tool to recognize influential documents and to understand trends and topics in a field. However, searchers are not able to understand the citation characteristics of individual documents that are not in the top 10 cited documents, or to navigate within the citations of the documents.

Figure 2.2: A screenshot from PaperLens [77].

Another attempt is the CiteWiz [35] interface, which consists of three different views, including timeline displays that show the general chronology and importance of documents and authors in a citation network, and a node-link diagram of keyword and authorship metadata to let searchers gain insight into this metadata. In addition, the growing polygons technique [34] is adopted and enhanced to represent the citation information of a particular subset of documents. These visualizations are augmented with some interaction techniques to support navigation among the citation network and details-on-demand of the entire citation chain for a document of interest. Even though preliminary user studies of CiteWiz provided positive results, the citation visualization component becomes difficult to understand as the selected document set grows.

Even though these approaches replace common node-link diagrams with novel visualization methods to represent citation metadata, there are still complexity and scalability issues that need to be addressed. Considering the rapid growth of scientific literature, and the value of citation metadata for evaluation, comparison, and navigation purposes, compact and scalable visualizations are required to support document discovery and navigation in digital library collections.

2.3 Search Results Visualization

In recent years, many visual interfaces have been designed and developed to provide a better representation of search results and to support exploration and navigation within the search results set. These interfaces employ a number of different approaches when visualizing the search results, including augmentation of the list with visual representations that encode query-document relationships, spatialization of documents in a 2D or 3D interface, representation of document collections (flat or hierarchical), and visualization of auxiliary information derived from the search results set. Each of these approaches will be explained in more detail in the sections that follow.

2.3.1 List Augmentation

In some studies on search results visualization, researchers have proposed adding interface extensions to the conventional list-based interface in order to address its shortcomings to some degree. In most of these approaches, the search results list is augmented by adding small visual representations alongside each document, with each representation visualizing the relation of query terms with that document.

One of the early attempts at list augmentation is the TileBars interface [45], which simultaneously and compactly shows relative document length, query term frequency, and explicit query term distribution in a full-text information access environment (see Figure 2.3). In the TileBars visualization, a rectangular icon is shown beside each search result item. The rectangle length represents the relative length of the document. It is further subdivided into columns representing document segments (using paragraphs, section breaks, or units chosen by the TextTiling algorithm [42]), and each row represents one query term. The frequency of each query term within each segment is then represented using grey scale encoding.

Figure 2.3: A screenshot from TileBars [45].
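The grid described above can be illustrated with a minimal Python sketch. It is not the TileBars implementation: the function names `tilebar_grid` and `grey_levels`, the whitespace tokenization, and the choice of four grey levels are assumptions made only for illustration, and the segments are taken as given rather than computed by a segmentation algorithm.

```python
from typing import List


def tilebar_grid(segments: List[str], query_terms: List[str]) -> List[List[int]]:
    """Return one row per query term and one column per document segment.

    Each cell holds how often the term occurs in that segment. Tokenization
    is a naive lowercase whitespace split (an assumption of this sketch).
    """
    grid = []
    for term in query_terms:
        needle = term.lower()
        row = [segment.lower().split().count(needle) for segment in segments]
        grid.append(row)
    return grid


def grey_levels(grid: List[List[int]], levels: int = 4) -> List[List[int]]:
    """Quantize counts to grey levels from 0 (white) to levels - 1 (darkest)."""
    peak = max((count for row in grid for count in row), default=0)
    if peak == 0:
        return [[0 for _ in row] for row in grid]
    # Integer ceiling of count * (levels - 1) / peak, so any hit is visible.
    return [[(count * (levels - 1) + peak - 1) // peak for count in row]
            for row in grid]


segments = [
    "visual search interface",
    "search results and search logs",
    "citation graph",
]
grid = tilebar_grid(segments, ["search", "citation"])
shades = grey_levels(grid)
```

The number of columns in each row equals the number of segments, so the row length also conveys relative document length, as in the description above.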

In the work by Heiman and Jhaveri [46], a small document-shaped icon is presented on the left side of the result item. This icon contains four equal-sized rows representing document sections (independent of document size). The number of occurrences of all the query terms within a 20-word window for each section is depicted using the same method as in TileBars [45]. Both of these approaches require access to the full textual contents of the document, which is inefficient when the underlying search engine is inaccessible.

Another work intended to support interactive exploration of search results is HotMap [54, 58], which visualizes the frequency of each query term in each document surrogate by a colour-coded square located alongside the corresponding result item (see Figure 2.4). In addition, each query term is shown vertically at the top of the corresponding column of squares, enabling the searcher to easily understand which square is related to which query term. Furthermore, searchers are able to perform a nested sorting on the search results by clicking on the query term labels. It also offers a zoomed-out representation of search results, allowing searchers to see and compare how their query terms are being used across the set of documents. In similar approaches, the relations between query terms and documents are shown using a colour-coded pie chart [2], in which segment size indicates the relative frequency of a query term, and a bar chart [112], in which overall and single keyword relevance is shown using the length of bars.

Figure 2.4: A screenshot from HotMap [54, 58].
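One plausible reading of the nested sorting mentioned above is a multi-key sort in which the first selected query term is the primary key and later terms break ties. The following Python sketch illustrates that reading only; the record layout (`id`, `freq`) and the function name `nested_sort` are hypothetical, and the actual HotMap behaviour may differ.

```python
from typing import Dict, List


def nested_sort(results: List[Dict], term_order: List[str]) -> List[Dict]:
    """Sort results by descending frequency of each term in term_order.

    The first term is the primary key and later terms break ties. Results
    that tie on every term keep their original order, because Python's
    sort is stable.
    """
    def key(result: Dict):
        freqs = result["freq"]
        return tuple(-freqs.get(term, 0) for term in term_order)

    return sorted(results, key=key)


results = [
    {"id": "a", "freq": {"search": 2, "visual": 0}},
    {"id": "b", "freq": {"search": 3, "visual": 1}},
    {"id": "c", "freq": {"search": 2, "visual": 5}},
]
by_search_then_visual = [r["id"] for r in nested_sort(results, ["search", "visual"])]
by_visual = [r["id"] for r in nested_sort(results, ["visual"])]
```

Relying on a stable sort means the original ranking is preserved among documents that the chosen terms cannot distinguish.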

WaveLens [92] also intends to let searchers take an active role during their search process by allowing them to dynamically zoom into the document surrogate of interest through a focus+context representation. Focus+context visualization allows searchers to see the result item of interest in full detail, while at the same time the surrounding items are shown in less detail.

Another variation on the idea of enriching results lists by displaying query-document relationships is to show miniaturized versions of the visual appearance of the document, known as thumbnails. Thumbnails typically include highlighted colour-coded query terms [91]. While this approach provides little support for the manipulation and exploration of search results, it has been shown to be beneficial when searchers want to re-find a document from a previous session [119].

These approaches take advantage of the simplicity, consistency, and scalability of the list, and add extra visual representations in order to better support search activities. For search interfaces, which are being used by millions of people daily, a drastic change may cause problems of adoption [43]. One of the notable advantages of list augmentation methods is that they avoid this issue by keeping the list as the main part of the search interface. However, in these approaches, designing information-bearing visual representations is a challenging task due to the space limitations.

2.3.2 Document Spatialization

Many researchers have proposed spatialization of documents or document surrogates to a 2D or 3D visual overview, wherein spatial proximity indicates documents' similarity [43]. The main differences between these approaches are how they specify and calculate document similarity and how they organize and represent the documents in 2D and 3D spaces. In most of these interfaces, documents are represented as small glyphs mapped to a specific point in the spatially-oriented interface based on two or three of their attributes. In addition, more attributes of the documents can be depicted through visual features of the glyph itself, such as its colour, shape, and/or size.


Approaches using spatialization mainly use a 2D scatterplot where documents are plotted based on two of their attributes corresponding to the x and y axes. Early attempts include systems such as xFind [3], Envision [90], and FilmFinder [1]. As these interfaces may overwhelm some searchers, Shneiderman et al. proposed a simplified display called Hieraxes [100] (see Figure 2.5). This display uses categorical and hierarchical axes in which documents are represented in a grid using either a set of colour-coded dots or a bar chart. Also, the mapping of particular attributes to visual representations such as the x-axis, y-axis, icon size, and icon shape is customizable using drop-down menus.

Figure 2.5: A screenshot from Hieraxes [100].

In Citiviz [68], an interactive animated scatterplot is used with a hyperbolic tree in a single interface, wherein each document is represented in the scatterplot by a tower of colour-coded cylinders (see Figure 2.6). Each level of a tower represents a category to which a document belongs; therefore, the taller a tower is, the more categories the document belongs to. Citiviz uses a city skyline metaphor in which documents' metadata can be shown simultaneously; however, there are problems with occlusion and navigation within the space.

Figure 2.6: A screenshot from Citiviz [68].

In two other related approaches, documents are organized and mapped around the center of a circle based on their similarity to the query, with their proximity to the center indicating their relevance to the query. In DART [23], the circular space contains several concentric circles so that the searcher can easily evaluate and compare the distance of a document from the query in the center. This space is divided into pie-shaped sections, each representing a predefined cluster. As such, documents are mapped to the display based on both their similarity to the query and the cluster to which they belong.
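The placement rule described for this kind of query-centred layout can be sketched as a polar mapping: the radius decreases with similarity and the angle is determined by the cluster's pie-shaped section. The Python sketch below is an illustration under stated assumptions, not DART's implementation; the function name `dart_position`, the similarity range of 0 to 1, and the `slot` parameter (where a document sits inside its section) are all hypothetical.

```python
import math
from typing import Tuple


def dart_position(similarity: float, cluster: int, n_clusters: int,
                  slot: float = 0.5, radius: float = 1.0) -> Tuple[float, float]:
    """Map a document to (x, y) in a circle centred on the query.

    similarity is assumed to lie in [0, 1]; a value of 1 places the document
    at the centre. cluster selects one of n_clusters equal pie-shaped
    sections, and slot in [0, 1] positions the document inside that section.
    """
    r = (1.0 - similarity) * radius
    sector_angle = 2.0 * math.pi / n_clusters
    theta = (cluster + slot) * sector_angle
    return (r * math.cos(theta), r * math.sin(theta))


centre = dart_position(1.0, cluster=0, n_clusters=4)
rim = dart_position(0.0, cluster=0, n_clusters=4, slot=0.0)
halfway = dart_position(0.5, cluster=1, n_clusters=4, slot=0.0)
```

Concentric guide circles, as described above, would simply be drawn at fixed fractions of `radius` so that distances from the query can be compared across sections.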

RankSpiral [105] uses the average rank issued by multiple search engines as an indication of relevance. Documents are then mapped and organized in a spiral in which the most relevant documents are closer to the spiral center. Also, other visual features of the glyph representing each document are utilized to depict which search engine the document is coming from.
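A rough Python sketch of this idea follows. It is not RankSpiral's algorithm: averaging only over the engines that returned a document, the use of an Archimedean spiral, and the names `average_ranks`, `spiral_layout`, `spacing`, and `step` are assumptions introduced for illustration.

```python
import math
from typing import Dict, List, Tuple


def average_ranks(rankings: Dict[str, List[str]]) -> Dict[str, float]:
    """Average the 1-based rank of each document over the engines returning it."""
    totals: Dict[str, int] = {}
    counts: Dict[str, int] = {}
    for ranked_docs in rankings.values():
        for position, doc in enumerate(ranked_docs, start=1):
            totals[doc] = totals.get(doc, 0) + position
            counts[doc] = counts.get(doc, 0) + 1
    return {doc: totals[doc] / counts[doc] for doc in totals}


def spiral_layout(avg_rank: Dict[str, float], spacing: float = 0.5,
                  step: float = 0.6) -> Dict[str, Tuple[float, float]]:
    """Place documents along an Archimedean spiral, best average rank first.

    The i-th best document gets angle i * step and radius spacing * angle,
    so better-ranked documents lie closer to the centre.
    """
    ordered = sorted(avg_rank, key=lambda doc: (avg_rank[doc], doc))
    layout = {}
    for i, doc in enumerate(ordered):
        theta = i * step
        r = spacing * theta
        layout[doc] = (r * math.cos(theta), r * math.sin(theta))
    return layout


rankings = {"engine1": ["a", "b", "c"], "engine2": ["b", "a", "d"]}
avg = average_ranks(rankings)
layout = spiral_layout(avg)
```

Ties in average rank are broken alphabetically here only to make the layout deterministic.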

Document spatialization approaches have been proposed frequently, and they are intended to help searchers gain insight into the relationships between documents in the search results set, grasp potentially unexpected patterns in document collections, and find important documents that otherwise might be missed [43]. However, these kinds of graphical overviews of large document spaces still have to be proven useful and understandable to searchers. Evaluations conducted so far mostly indicate that searchers face difficulties understanding such spatialized representations [43].

2.3.3 Document Cluster Visualization

Some attempts have been made to organize search results into meaningful groups, and then visually represent these collections in order to help searchers gain an overview of the search results and easily determine their next step in the search process. One way to classify a document space is through clustering. Document clustering refers to the grouping of documents based on some measure of similarity. Some clustering algorithms create hierarchical clusters. This section deals with non-hierarchical clustering, while the following section deals with hierarchical clustering as well as other approaches for generating hierarchies from search results.

In most of the systems described in the previous section, in addition to document spatialization, the thematic groups or topics are derived from the text based on some measurements and are further displayed graphically by adding visual cues. For example, in Envision [90], similarly sized elliptical icons represent a set of relevant documents while the number of documents is shown inside the ellipse, and labels below these icons indicate the rank of the two most relevant documents in the cluster (see Figure 2.7). In the enhanced version of Envision for digital libraries [114], the size of the cluster ellipse is related to the number of documents it contains (see Figure 2.8). In addition, document icons are shown in the cluster icon, allowing searchers to select an individual document from within the cluster. Documents are placed in the cluster icon by locating them on concentric ellipses with diameter differences of equal magnitude from outside in.

Figure 2.7: A screenshot from Envision [90].

Figure 2.8: A screenshot from the enhanced version of Envision [114].

In xFind [3], the Visisland interface was developed in addition to the scatterplot and a list-based representation to display thematic clustering of search results. First, cluster centroids are randomly mapped to the display, and then documents are located in a ring around the corresponding cluster, while documents that are more similar are placed closer to each other. This procedure finally leads to a representation in which related topics are displayed as islands.

Kohonen's self-organizing feature map algorithm [72] is used to organize document collections in a number of studies [21, 78]. For example, in [78], self-organizing clusters are shown as adjacent polygons in a 2D map in which their size and shape indicate how frequently documents are assigned to the corresponding cluster. The adjacency of regions reflects semantic relations of clusters within the collection. Moreover, when searchers hover the mouse cursor above any polygon, a pop-up window will be displayed showing the titles of documents closely linked to the corresponding cluster.

Clustering can reveal new and interesting patterns and trends hidden in the document space, and it can be done automatically on any text collection. However, choosing understandable and descriptive labels for clusters is a challenging task, which can be problematic in search results exploration [43]. Moreover, clustering may make it difficult for searchers to compare documents within different clusters, and searchers may neglect very small clusters even though they might contain the most relevant documents.

2.3.4 Document Hierarchy Visualization

Another method to classify document collections is the use of category systems. In category systems, documents are assigned to organized and meaningful labels that represent the domain concepts. Category system structures are usually either hierarchical or faceted (which is discussed further in Section 2.4.1). As mentioned, some clustering algorithms also build a hierarchy of clusters. Hierarchical clustering is often considered the better quality clustering approach, but it becomes computationally expensive as the size of the collection increases [106]. Although category systems are only applicable in well-structured collections and their automated methods are about 75% correct on average [97], their superiority to clustering methods has been shown in terms of usability [43].

Traditional methods of presenting hierarchical information, namely listings, outlines, and static tree diagrams, are not feasible because extracting information from large hierarchies is quite difficult: navigation is a great burden, and the contents are often hidden within nodes [65]. In addition, visualization of large hierarchies on a limited-size screen is a serious challenge.

One of the fundamental ways of hierarchy representation is the table-of-contents view used in books and other information systems [43]. Such a tree-based hierarchy outline has been used with hyperlinked Web pages to support both search results organization and navigation [22, 20]. More sophisticated variations of tree diagrams have also been proposed and used in search results visualization. For example, Cat-a-Cone [44] makes use of available subject headings in library systems and represents categories associated with highly ranked documents in the search results, as well as their ancestors and siblings, using a 3D ConeTree [94] (see Figure 2.9). In Citiviz [68], a hyperbolic tree is used to show the hierarchical structure of documents using the ACM Computing Classification System.

Figure 2.9: A screenshot from Cat-a-Cone [44].

One of the well-known techniques for representing hierarchical information is the use of a TreeMap [65]. The original algorithm for TreeMap creation divides a rectangle with vertical partitions into as many parts as there are branches from the root, and then performs the same process on the resulting rectangles, but with horizontal partitions. This recursive approach continues, alternating the partition direction at each level, until the leaves of the tree are reached. Further, the area of each leaf is specified based on the amount of information stored there. In the search results visualization literature, TreeMap is used and evaluated in a number of studies [41, 13, 24].
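The recursive partitioning described above is commonly called slice-and-dice. The following Python sketch illustrates it under stated assumptions: the tree is a nested dictionary with hypothetical keys `name`, `size`, and `children`, and a node's weight is taken to be the sum of the sizes of its leaves.

```python
from typing import Dict, List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # x, y, width, height


def weight(node: Dict) -> float:
    """A leaf's weight is its size; an inner node's weight sums its children."""
    children = node.get("children")
    if not children:
        return float(node.get("size", 0))
    return sum(weight(child) for child in children)


def slice_and_dice(node: Dict, rect: Rect, vertical: bool = True,
                   out: Optional[List[Tuple[str, Rect]]] = None
                   ) -> List[Tuple[str, Rect]]:
    """Split rect among the children in proportion to their weights.

    The partition direction alternates at each level: vertical cuts at the
    root, horizontal cuts one level down, and so on. Only leaves are emitted.
    """
    if out is None:
        out = []
    children = node.get("children")
    if not children:
        out.append((node["name"], rect))
        return out
    x, y, w, h = rect
    total = weight(node)
    offset = 0.0
    for child in children:
        # Fall back to equal shares if every leaf below this node has size 0.
        share = weight(child) / total if total else 1.0 / len(children)
        if vertical:
            child_rect = (x + offset * w, y, share * w, h)
        else:
            child_rect = (x, y + offset * h, w, share * h)
        slice_and_dice(child, child_rect, not vertical, out)
        offset += share
    return out


tree = {"name": "root", "children": [
    {"name": "A", "size": 1},
    {"name": "B", "children": [
        {"name": "B1", "size": 1},
        {"name": "B2", "size": 2},
    ]},
]}
leaves = dict(slice_and_dice(tree, (0.0, 0.0, 4.0, 2.0)))
```

In this example the leaf areas come out in the ratio 1:1:2, matching the leaf sizes, which is the property that lets area encode the amount of information in each leaf.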

For example, in ResultMaps [24], a TreeMap has been used to encode a digital library's full content, rather than encoding only the items relevant to a query, according to the available hierarchical taxonomy classification (see Figure 2.10). Presenting the entire repository hierarchy lets searchers gain knowledge about the whole library content as a side effect, which can be beneficial for their future information seeking tasks. The TreeMap view and search results list are linked via brushing techniques, allowing the searcher to highlight documents in the search results list to see where they are in the digital library hierarchy, or to highlight regions of the digital library hierarchy to
