Text and Documents 1

CS 7450 - Information Visualization March 16, 2004

John Stasko

Text is Everywhere

• We use documents as the primary information artifact in our lives

• Our access to documents has grown tremendously in recent years due to networking infrastructure

WWW

Digital libraries


Spring 2004 CS 7450 3

Big Question

• What can information visualization provide to help users in gathering information from text and document collections?


InfoVis Tasks

• Two main tasks that Information Visualization can assist with in this area

Enhance a person’s ability to read, understand and gain knowledge from a document

Understand the contents of a document or collection of documents without reading them


More Specific Tasks

• Which documents contain text on topic XYZ?

• Which documents are of interest to me?

• Are there other documents that might be close enough to be worthwhile?

• What are the main themes of a document?

• How are certain words or themes distributed through a document?

We’re Not Doing...

• Information Retrieval

Active search process that brings back particular entities

• InfoVis (on the other hand)

Perhaps not sure precisely what you’re looking for


Challenge

• Text is nominal data

Does not seem to map to geometric presentation as easily as ordinal and quantitative data

• The “Raw data --> Data Table” mapping now becomes more important
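As one illustration of the raw data to data table step, the sketch below turns a few strings into a table of term counts, one row per document. The tokenizer, the function names and the sample sentences are assumptions made for illustration; they do not come from the lecture.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_data_table(documents):
    """Return (vocabulary, rows) with one row of term counts per document."""
    counts = [Counter(tokenize(doc)) for doc in documents]
    # Column order: every distinct term across the collection, sorted
    vocabulary = sorted(set().union(*counts))
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows


# Hypothetical sample collection
docs = [
    "Text is nominal data",
    "Quantitative data maps to geometry; text does not",
]
vocab, table = build_data_table(docs)
for row in table:
    print(dict(zip(vocab, row)))
```

Once the text is in this form, each column is a quantitative variable that can be mapped to position, length or darkness like any other data table.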


Simple Taxonomy

Single document

Collection of documents

Enhanced presentation (syntax)

Concepts and relationships (semantics)


Today

• Focus more on syntactic (enhanced presentation of a document) issues

Excentric labeling

Fluid text

Document lens

Tilebars

• Next time: focus more on presentation of concepts and themes

What’s wrong with this picture?


Problem

• Where are the labels?

Labeling is difficult to do when so many entities exist

Can add to ball of string problem


Objectives

• Each label for a data point should:

Be readable

Non-ambiguously relate to its graphical object

Not hide other pertinent information

• Completeness (labeling of all objects) is desired but not always possible


Two types of techniques

• Static

Road maps

Physical presentations

Used in cartography

• Dynamic

Interactive data points

Cartography


Dynamic Techniques

Tool tip or “cursor sensitive balloon label”

SeeIT

“All or Nothing” technique, zooming (show all labels when the set is small enough)

FilmFinder

Magic lens, shows labels inside

Dynamic Sampling, only 1 to 3 labels shown at any one time, context

Chalmers


Excentric Labeling

Area of focus

Line and box color match the color of the data point

Description boxes containing the name of the data point

Fekete and Plaisant CHI ‘99


Being Excentric

• “Invisible” – Does not appear until user hovers over data points

• Describes data points using the name field

• Visually connects labels with data points

• Can order labels to indicate graph position

Different Techniques

• Radial Labeling

No intersections

Position does not indicate value

• Vertical Labeling

Relative vertical position indicates Y value

• Horizontal Labeling

Relative horizontal position indicates X value
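The vertical variant can be sketched roughly as follows: collect the points that fall inside the focus circle around the cursor, split them into a left and a right column, and stack the labels in each column in the same top-to-bottom order as the points. This is a minimal sketch rather than Fekete and Plaisant's implementation; the function name, the `gap` and `line_height` parameters and the returned dictionary layout are all assumptions.

```python
import math


def excentric_labels(points, cursor, radius, line_height=14, gap=10):
    """Lay out labels for the points inside the focus circle.

    points: iterable of (x, y, name) in screen coordinates (y grows downward).
    cursor: (cx, cy) center of the focus circle.
    Returns a list of dicts giving each label's text, side, position,
    and the data point it should be connected to by a line.
    """
    cx, cy = cursor
    in_focus = [p for p in points if math.hypot(p[0] - cx, p[1] - cy) <= radius]

    placements = []
    for side in ("left", "right"):
        if side == "left":
            column = [p for p in in_focus if p[0] < cx]
            label_x = cx - radius - gap
        else:
            column = [p for p in in_focus if p[0] >= cx]
            label_x = cx + radius + gap
        # Vertical labeling: label order mirrors the points' y order
        column.sort(key=lambda p: p[1])
        # Center the stack of labels vertically on the cursor
        top = cy - line_height * (len(column) - 1) / 2
        for i, (x, y, name) in enumerate(column):
            placements.append({
                "text": name,
                "side": side,
                "label_pos": (label_x, top + i * line_height),
                "point": (x, y),
            })
    return placements
```

Sorting each column by x instead of y would give the horizontal ordering; the radial variant would instead order labels by angle around the cursor so that connecting lines do not cross.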


Demos

• Different examples at

www.cs.umd.edu/hcil/excentric


Other Issues

• Dealing with long labels

Just do truncation

• Limiting discontinuities when scanning

Users “figure it out”

• Facilitating selection of objects

Right click to lock current label display and select one


Problems/Questions

• Still dependent on mouse movement rather than showing all

• Can more variables per data point be shown?

• Disorientation is greater when there are no other cues besides position

Augmentation

• Previous work augmented visualizations with helpful text

• How about augmenting text with helpful text?

• Example: Tell me a little more about what I’ll get if I follow a hyperlink


Fluid Text

• Objective: Annotations to text material that expand on some item of interest

Example: For hyperlink of a few words, annotation explains more about it

• Important for deciding whether to follow

• Do not want to clutter up text with more noise

Zellweger, Chang and Mackinlay Hypertext ‘98


Fluid Link

• When user moves mouse cursor over hyperlink, more explanatory material is presented

• Gloss - Expanded material presented

• Click on gloss follows link

• Moving away makes it disappear


Framework

• Visual cue

Underline of anchor

• Animated transition

Gloss effects are smooth animations, allowing viewer to track changes more easily

• Accommodation

Primary material needs to shift in some way to make room for gloss

Accommodation Techniques

• Interline expansion

Main text takes on tighter line spacing to make more room for gloss, or main text moves into margins more

• Margin callout

Gloss drawn out in margins, pointed to by line

• Textual overlay


Show Technique

• Must really see it work to understand it fully

• Shows examples of different techniques

Video


Key Points

• Trying to keep main document free of clutter

• Animation is crucial

• One area where the vision of the electronic book seems a positive

• Comments?


Problem

• Want to view a page of a document but see context of surrounding text and pages

Example

Text is too small to read


Potential Solutions

Magnifying lens

Fisheye view

Bifocal display

Perspective wall


The Document Lens

• Focus + Context display

• Use 3D to make more effective use of available screen space

• Certain region of text in focus and readable, rest is projected onto 3D pyramid

Robertson and Mackinlay UIST ‘93


The Document Lens

Video

(Saw a little of it last time with WebBook)

Features

• Movable, rectangular magnifying lens

Mouse - x,y

Space,alt - z

• Truncated viewing pyramid

• Readable highlighted regions

• Effective use of most of the screen space


Problem

Lens pyramid can leave the viewing frustum

So, couple movement of lens with viewpoint movement


Application


Application

Improving Text Searches

• What’s wrong with the common search?


What Hearst Thinks is Wrong

• Query responses do not include:

How strong the match is

How frequent each term is

How each term is distributed in the document

Overlap between terms

Length of document

• Document ranking is opaque

• Inability to compare between results

• Input limits term relationships


TileBars

• Goal

Minimize time and effort for deciding which documents to view in detail

• Idea

Show the role of the query terms in the retrieved documents, making use of document structure

Hearst CHI ‘95


TileBars

• Graphical representation of term distribution and overlap

• Simultaneously indicate:

Relative document length

Frequency of term sets in document

Distribution of term sets with respect to the document and each other

Interface

Search terms

Presentation


Technique

Relative length of document

Two search terms

Blocks indicate “chunks” of text, such as paragraphs

Blocks are darkened according to the frequency of the term in the document
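A rough sketch of how such a bar could be computed: split the document into chunks, count the hits for each term set in each chunk, and map each count to a darkness level. This is not Hearst's implementation. It assumes chunks are blank-line separated paragraphs and that darkness is computed per chunk; the `SHADES` characters and function names are hypothetical stand-ins for gray levels and real drawing code.

```python
import re

# Lightest to darkest; an ASCII stand-in for gray levels (assumption)
SHADES = " .:*#"


def tilebar(document, term_sets):
    """Return one row of per-chunk hit counts for each term set.

    document: text whose chunks are separated by blank lines.
    term_sets: list of lists of query words, one row per set.
    """
    chunks = [c for c in re.split(r"\n\s*\n", document) if c.strip()]
    rows = []
    for terms in term_sets:
        wanted = {t.lower() for t in terms}
        row = []
        for chunk in chunks:
            words = re.findall(r"[a-z0-9]+", chunk.lower())
            row.append(sum(1 for w in words if w in wanted))
        rows.append(row)
    return rows


def render(rows):
    """Draw each row as a string; darker characters mean more hits."""
    darkest = len(SHADES) - 1
    return ["".join(SHADES[min(count, darkest)] for count in row) for row in rows]


# Hypothetical three-paragraph document and two term sets
doc = "Cats chase mice.\n\nDogs chase cats and cats.\n\nNothing here."
for line in render(tilebar(doc, [["cats"], ["dogs", "mice"]])):
    print("|" + line + "|")
```

The length of each row reflects the relative length of the document, and columns where both rows are dark show chunks in which the term sets overlap.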


See It Work

• Video

• TileBar web page with demo

http://elib.cs.berkeley.edu/tilebars/about.html


Issues

• Horizontal alignment doesn’t match mental model

• May not be the best solution for web searches

Non-linear material

Images? Java apps?

• Anything else?

Still to Come

• Visualizing themes and semantics of document collections


Reminder

• Project mid-way progress report due on the 30th

2 hardcopies


HWs

• HW4 (hierarchy vizs) not yet graded

• HW5 (graph drawing) returned at end


HW6

• Commercial tools 2

• Due in 2 weeks

• 10 pages or less

• Stress cognitive tasks again

• InfoZoom & EZChooser

Demos

Upcoming

• Documents & Text 2

Reading: Chapter 10; Salton et al.; Lin

• Software visualization


References

• Spence and CMS texts

• All referred to papers

• Fall ‘99 slides

Lewis

Kim and Ho
