Text and Documents 1
CS 7450 - Information Visualization March 16, 2004
John Stasko
Text is Everywhere
• We use documents as the primary information artifact in our lives
• Our access to documents has grown tremendously in recent years due to networking infrastructure
− WWW
− Digital libraries
Spring 2004 CS 7450 3
Big Question
• What can information visualization provide to help users in gathering information from text and document collections?
InfoVis Tasks
• Two main tasks that Information Visualization can assist with in this area
− Enhance a person’s ability to read, understand and gain knowledge from a document
− Understand the contents of a document or collection of documents without reading them
More Specific Tasks
• Which documents contain text on topic XYZ?
• Which documents are of interest to me?
• Are there other documents that might be close enough to be worthwhile?
• What are the main themes of a document?
• How are certain words or themes distributed through a document?
We’re Not Doing...
• Information Retrieval
− Active search process that brings back particular entities
• InfoVis (on the other hand)
− Perhaps not sure precisely what you’re looking for
Challenge
• Text is nominal data
− Does not seem to map to geometric presentation as easily as ordinal and quantitative data
• The “Raw data --> Data Table” mapping now becomes more important
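One common way to turn raw text into a data table is a term-count table, with one row per document and one column per word. The slide does not name a specific method, so the sketch below is only an illustration; the function name `to_data_table` and the simple word pattern are assumptions.

```python
import re
from collections import Counter


def to_data_table(docs):
    """Map raw documents to (vocabulary, table of word counts).

    Hypothetical helper: each row of the table is one document,
    each column is one vocabulary word, each cell is a count.
    """
    # Count lowercase words in each document.
    counts = [Counter(re.findall(r"[a-z']+", text.lower())) for text in docs]
    # The columns are the sorted union of all words seen.
    vocab = sorted(set().union(*counts))
    table = [[c[word] for word in vocab] for c in counts]
    return vocab, table


vocab, table = to_data_table(["the cat sat", "the dog sat down"])
```

Once the text is in this quantitative form, the counts can be mapped to position, size, or color like any other data table.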
Simple Taxonomy
Single document
Collection of documents
Enhanced presentation (syntax)
Concepts and relationships (semantics)
Today
• Focus more on syntactic (enhanced presentation of a document) issues
− Excentric labeling
− Fluid text
− Document lens
− Tilebars
• Next time: focus more on presentation of concepts and themes
What’s wrong with this picture?
Problem
• Where are the labels?
− Labeling is difficult to do when so many entities exist
− Can add to ball of string problem
Objectives
• Each label for a data point should:
− Be readable
− Non-ambiguously relate to its graphical object
− Not hide other pertinent information
• Completeness (labeling of all objects) is desired but not always possible
Two types of techniques
• Static
− Road maps
− Physical presentations
− Used in cartography
• Dynamic
− Interactive data points
Cartography
Dynamic Techniques
• Tool tip or “cursor sensitive balloon label”
− SeeIT
• “All or Nothing” technique, zooming (show all labels when the number of objects is small enough)
− FilmFinder
• Magic lens, shows labels inside
• Dynamic Sampling, only 1 to 3 labels shown at any one time, context
− Chalmers
Excentric Labeling
Area of focus
Line and box color match the color of the data point
Description boxes containing the name of the data point
Fekete and Plaisant CHI ‘99
Being Excentric
• “Invisible” – Does not appear until user hovers over data points
• Describes data points using the name field
• Visually connects labels with data points
• Can order labels to indicate graph position
Different Techniques
• Radial Labeling
− No intersections
− Position does not indicate value
• Vertical Labeling
− Relative vertical position indicates Y value
• Horizontal Labeling
− Relative horizontal position indicates X value
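The vertical variant can be sketched roughly as follows: collect the points inside the focus circle, split them into those left and right of the cursor, and stack each group's labels in a column ordered by Y. This is an illustrative sketch, not the Fekete and Plaisant implementation; the function name, the `line_height`, `gap`, and `max_labels` parameters, and the rule of keeping the closest points are all assumptions.

```python
import math


def vertical_excentric_labels(points, cursor, radius,
                              line_height=12, gap=8, max_labels=10):
    """Place labels beside a circular focus area, ordered by Y.

    points: list of (name, x, y) tuples in screen coordinates.
    Returns one dict per label with its position and the data
    point it should be connected to by a line.
    """
    cx, cy = cursor

    def distance(p):
        return math.hypot(p[1] - cx, p[2] - cy)

    # Keep only points under the focus circle; if there are too many,
    # keep the closest ones (an assumed policy).
    in_focus = sorted((p for p in points if distance(p) <= radius),
                      key=distance)[:max_labels]

    placed = []
    columns = (("left", cx - radius - gap), ("right", cx + radius + gap))
    for side, column_x in columns:
        column = [p for p in in_focus if (p[1] < cx) == (side == "left")]
        column.sort(key=lambda p: p[2])  # relative position shows Y value
        top = cy - line_height * (len(column) - 1) / 2
        for row, (name, x, y) in enumerate(column):
            placed.append({
                "name": name,
                "side": side,
                "label_pos": (column_x, top + row * line_height),
                "anchor": (x, y),  # draw the connecting line to here
            })
    return placed
```

Sorting each column by X instead of Y would give the horizontal variant in the same spirit. The radial variant would instead order labels by angle around the cursor, which avoids crossing lines but no longer encodes value in label position.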
Demos
• Different examples at
www.cs.umd.edu/hcil/excentric
Other Issues
• Dealing with long labels
− Just do truncation
• Limiting discontinuities when scanning
− Users “figure it out”
• Facilitating selection of objects
− Right click to lock current label display and select one
Problems/Questions
• Still dependent on mouse movement rather than showing all
• Can more variables per data point be shown?
• Disorientation is greater when there are no other cues besides position
Augmentation
• Previous work augmented visualizations with helpful text
• How about augmenting text with helpful text?
• Example: Tell me a little more about what I’ll get if I follow a hyperlink
Fluid Text
• Objective: Annotations to text material that expand on some item of interest
− Example: For hyperlink of a few words, annotation explains more about it
• Important for deciding whether to follow
• Do not want to clutter up text with more noise
Zellweger, Chang and Mackinlay Hypertext ‘98
Fluid Link
• When user moves mouse cursor over hyperlink, more explanatory material is presented
• Gloss - Expanded material presented
• Click on gloss follows link
• Moving away makes it disappear
Framework
• Visual cue
− Underline of anchor
• Animated transition
− Gloss effects are smooth animations, allowing viewer to track changes more easily
• Accommodation
− Primary material needs to shift in some way to make room for gloss
Accommodation Techniques
• Interline expansion
− Main text takes on tighter line spacing to make more room for gloss, or main text moves into margins more
• Margin callout
− Gloss drawn out in margins, pointed to by line
• Textual overlay
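The arithmetic behind interline expansion can be sketched as below: tighten the spacing of the surrounding lines just enough to free the height the gloss needs, without going under a readable minimum. This is a simplified model, not the Fluid Documents implementation; the function name, the parameters, and the idea of reporting leftover height as overflow are assumptions.

```python
def interline_expansion(n_lines, line_spacing, gloss_height, min_spacing):
    """Return (new_spacing, overflow) for a block of n_lines lines.

    new_spacing: tightened line spacing of the main text.
    overflow: gloss height that could not be freed by tightening,
              so the text below would have to be pushed down by it.
    """
    total = n_lines * line_spacing
    # Spacing that would free exactly gloss_height inside the block.
    wanted = (total - gloss_height) / n_lines
    new_spacing = max(wanted, min_spacing)
    freed = n_lines * (line_spacing - new_spacing)
    overflow = max(0.0, gloss_height - freed)
    return new_spacing, overflow


# Ten lines at 16 px spacing making room for a 20 px gloss.
print(interline_expansion(10, 16, 20, 12))
```

In an animated transition, the spacing would be interpolated from `line_spacing` to `new_spacing` over several frames rather than changed at once.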
Show Technique
• Must really see it work to understand it fully
• Shows examples of different techniques
Video
Key Points
• Trying to keep main document free of clutter
• Animation is crucial
• One way in which the vision of the electronic book seems a positive
• Comments?
Problem
• Want to view a page of a document but see context of surrounding text and pages
Example
Text is too small to read
Potential Solutions
Magnifying lens
Fisheye view
Bifocal display
Perspective wall
The Document Lens
• Focus + Context display
• Use 3D to make more effective use of available screen space
• Certain region of text in focus and readable, rest is projected onto 3D pyramid
Robertson and Mackinlay UIST ‘93
The Document Lens
Video
(Saw a little of this last time with WebBook)
Features
• Movable, rectangular magnifying lens
− Mouse - x,y
− Space,alt - z
• Truncated viewing pyramid
• Readable highlighted regions
• Effective use of most of the screen space
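The geometry of the truncated pyramid can be illustrated in one dimension: the lens region is lifted toward the viewer, the rest of the page is stretched along the sloping sides, and everything is perspective-projected back onto the screen. This is a simplified sketch, not the Robertson and Mackinlay implementation; the function names, the eye position centered over the lens, and the linear slope of the sides are assumptions.

```python
def project(x, z, eye_x, eye_dist):
    """Perspective-project a point at position x, lifted by z toward
    an eye at (eye_x, eye_dist), back onto the z = 0 screen plane."""
    scale = eye_dist / (eye_dist - z)
    return eye_x + (x - eye_x) * scale


def lens_x(x, doc_lo, doc_hi, lens_lo, lens_hi, lens_z, eye_dist):
    """Screen position of document coordinate x under a 1D lens.

    The lens spans [lens_lo, lens_hi] and sits at height lens_z.
    Text outside the lens lies on a side that slopes from the
    document edge (z = 0) up to the lens edge (z = lens_z).
    """
    eye_x = (lens_lo + lens_hi) / 2  # assumed: eye centered over lens
    if lens_lo <= x <= lens_hi:
        z = lens_z                   # focus region: fully lifted
    elif x < lens_lo:
        z = lens_z * (x - doc_lo) / (lens_lo - doc_lo)
    else:
        z = lens_z * (doc_hi - x) / (doc_hi - lens_hi)
    return project(x, z, eye_x, eye_dist)


# Document spans 0..100, lens covers 40..60 and is lifted halfway
# to an eye at distance 100.
for x in (0, 20, 40, 50, 60, 100):
    print(x, lens_x(x, 0, 100, 40, 60, 50, 100))
```

With these numbers the lens region doubles in width on screen while the context on each side is compressed but stays visible, which is the focus + context effect.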
Problem
Lens pyramid can leave the viewing frustum
So, couple movement of lens with viewpoint movement
Application
Application
Improving Text Searches
• What’s wrong with the common search?
What Hearst Thinks is Wrong
• Query responses do not include:
− How strong the match is
− How frequent each term is
− How each term is distributed in the document
− Overlap between terms
− Length of document
• Document ranking is opaque
• Inability to compare between results
• Input limits term relationships
TileBars
• Goal
− Minimize time and effort for deciding which documents to view in detail
• Idea
− Show the role of the query terms in the retrieved documents, making use of document structure
Hearst CHI ‘95
TileBars
• Graphical representation of term distribution and overlap
• Simultaneously indicate:
− Relative document length
− Frequency of term sets in document
− Distribution of term sets with respect to the document and each other
Interface
Search terms
Presentation
Technique
Relative length of document
Two search terms
Blocks indicate “chunks” of text, such as paragraphs
Blocks are darkened according to the frequency of the term in that chunk of the document
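The idea can be sketched in a few lines: split a document into chunks, count each term set's hits per chunk, and shade one tile per chunk. This is an illustrative sketch, not Hearst's implementation; it assumes paragraphs separated by blank lines as chunks, and the function names and the text shading scale `SHADES` are hypothetical.

```python
import re

SHADES = " .:#"  # assumed scale: blank = no hits, '#' = three or more


def tilebar(document, term_sets):
    """Return one row of per-chunk hit counts for each term set."""
    chunks = [p for p in document.split("\n\n") if p.strip()]
    rows = []
    for terms in term_sets:
        wanted = {t.lower() for t in terms}
        row = []
        for chunk in chunks:
            words = re.findall(r"[a-z0-9']+", chunk.lower())
            row.append(sum(1 for w in words if w in wanted))
        rows.append(row)
    return rows


def render(rows):
    """Draw each row as text; darker characters mean more hits."""
    return ["".join(SHADES[min(c, len(SHADES) - 1)] for c in row)
            for row in rows]


doc = "cats chase mice\n\ndogs chase cats and cats\n\nbirds sing"
for line in render(tilebar(doc, [["cats"], ["dogs", "birds"]])):
    print("|" + line + "|")
```

The width of each bar shows relative document length, and columns where both rows are dark mark chunks in which the term sets overlap.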
See It Work
• Video
• TileBar web page with demo
http://elib.cs.berkeley.edu/tilebars/about.html
Issues
• Horizontal alignment doesn’t match mental model
• May not be the best solution for web searches
− Non-linear material
− Images? Java apps?
• Anything else?
Still to Come
• Visualizing themes and semantics of document collections
Reminder
• Project mid-way progress report due on the 30th
− 2 hardcopies
HWs
• HW4 (hierarchy vizs) not yet graded
• HW5 (graph drawing) returned at end
HW6
• Commercial tools 2
• Due in 2 weeks
• 10 pages or less
• Stress cognitive tasks again
• InfoZoom & EZChooser
− Demos
Upcoming
• Documents & Text 2
− Reading: Chapter 10, Salton et al., Lin
• Software visualization
References
• Spence and CMS texts
• All referred to papers
• Fall ‘99 slides
− Lewis
− Kim and Ho