WWW and Internet
CS 7450 - Information Visualization March 4, 2004
John Stasko
Internet and WWW
• By nature, abstract, so good target for visualization
• Often described in terms of metaphors
− “Information Superhighway”
Spring 2004 CS 7450 3
Agenda
• Two main topics
− Presentations of the Internet and WWW
Focus on topology and navigation, similar to the graph visualization work
− Visual aids for browsing and using the WWW and the Internet
Assistive visualizations not focusing on presenting net structure and connectivity
1. Internet and WWW Topology
• Fundamentally, the Internet is a graph with some existing physical topology, though that is often not how we want to conceptualize it
− Might think of it as having a structure
• Our discussions from graph visualization are germane here
Mukherjea & Foley WWW ‘95
The Problem
• Websites are simply too big
• Huge graphs
• Layout is challenging
Step Back
• Why would someone want to visualize the WWW?
Some Reasons
• Aid authors and webmasters with production and organization of content
• Assist Web surfers making sense of the information
• Help researchers understand the Web
Depictions of the Web
• Great web site that presents many different conceptualizations of cyberspace
− Atlas of Cyberspace
http://www.cybergeography.org/atlas/
• Let’s take a few minutes to browse...
Mapping the Internet
• Bill Cheswick at ATT
• Interesting visualizations plus the data sets are available
• www.cs.bell-labs.com/who/ches/map/index.html
Internet Traffic Paths
www.caida.org/tools/measurement/skitter/
Mbone Map
www.cs.berkeley.edu/~elan/mbone/map.html
Immersive Systems
www.pnl.gov/remote/projects/starlight/
View of Web Site’s Pages
Web Site
www.mos.ics.keio.ac.jp/NattoView
Web Site Visitations
www.inventix.com
Task Analysis
• Potential web-related tasks
− How and when has info been accessed?
− Where do people enter and spend time?
− How do they move about?
− What paths aren’t traversed?
− Where are they coming from?
− What has been added, changed, deleted?
− Do changes affect navigation patterns?
− Do we need to do a redesign?
Data Set
• Each server request is a data case
• Example variables
− IP Address/Client host
− Timestamp
− URL requested
− HTTP status (success, not found, …)
− Bytes delivered
− Referring URL (HTTP Referer header)
− User agent (browser and OS info)
− ...
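As a rough illustration of the variables above, here is a minimal sketch that turns one server request into a data case. It assumes the server writes the common Apache "combined" log format, which the slides do not specify; the sample line, host, and URLs are made up for the example.

```python
import re
from datetime import datetime

# Assumed layout: Apache "combined" log format (an assumption, not from the slides).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_request(line):
    """Return one data case (a dict) for a log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    # A "-" in the size field means no body was delivered.
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    return record

# Hypothetical sample line for illustration only.
sample = ('192.0.2.7 - - [04/Mar/2004:10:15:32 -0500] '
          '"GET /products/index.html HTTP/1.0" 200 5120 '
          '"http://example.com/" "Mozilla/4.0 (compatible; MSIE 6.0)"')
record = parse_request(sample)
```

Each resulting dict carries the client host, timestamp, requested URL, status, bytes, referrer, and user agent, so a list of them can be loaded into a table-oriented InfoVis tool.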
One Approach
• Use existing InfoVis tool (Eureka, Spotfire, InfoZoom, etc.), load the data set, and analyze it
• Get all the strengths and weaknesses of the InfoVis tool for supporting particular analysis tasks
Web Ecology
• Problem: Most visualizations of the web fail to present the dynamically changing ecology of users and documents on the web
• What do we mean by ecology metaphor?
Chi, et al CHI ‘98
Web Ecology
• By understanding the set of relationships (ecology) among users and their information environment, and how it changes through time (evolution), individuals can better understand
− Web Content
− Layout of physical and topological space
− Usage through time
Existing Visualizations
• Despite useful functions, existing visualizations have problems
− Difficulty visualizing large number of documents
− Considerable amount of screen real-estate used
− Only permit visualization of a site at a particular point in time, so comparisons across times are very difficult
− No mechanisms provided that allow differences in usage to be identified
Techniques
• Disk Tree
− Center-rooted tree that represents the hyperlink structure of a web site
• Time Tube
− Set of disk trees that organizes and visualizes the evolution of web sites
Task Application
• Visualizations designed to be useful for
− Local - Finding specific content
− Comparison - Comparing info at two places
− Global - Discovering a trend or pattern in the site
Analysis Domain
• www.xerox.com, April ‘97
− 7,588 items across a 30-day period
− 889 new items
− Daily log kept of additions, modifications, and deletions of content
− Base data comes from link info, usage log from web servers
− Topological info from custom hyperlink database
Disk Trees
• Interested in the smallest number of hops from one document to another
• Breadth-first traversal transforms the web graph into a tree by placing each node as close to the root node as possible
• After obtaining this tree we then visualize the structure using the Disk Tree technique
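The breadth-first step can be sketched as follows. This is a minimal illustration, not the authors' implementation; the `links` dictionary and the page names are hypothetical.

```python
from collections import deque

def bfs_tree(links, root):
    """Turn a hyperlink graph into a tree by breadth-first traversal.

    links maps each page to the pages it links to. Each page keeps only the
    link through which it was first reached, so its depth in the tree is the
    smallest number of hops from the root.
    """
    parent = {root: None}
    depth = {root: 0}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in parent:  # first visit is the shortest path
                parent[target] = page
                depth[target] = depth[page] + 1
                queue.append(target)
    return parent, depth

# Hypothetical four-page site with cross links and a back link.
site = {
    "/": ["/products", "/about"],
    "/products": ["/products/printers", "/about"],
    "/about": ["/products/printers", "/"],
    "/products/printers": [],
}
parent, depth = bfs_tree(site, "/")
```

Links that are not the first route to a page (for example `/about` to `/products/printers` here) are dropped from the tree, which is what makes the structure drawable as a hierarchy.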
Disk Tree
Lines - tree links
Line size & brightness - page access frequency
Color - page lifecycle stage
new: red
continued: green
deleted: yellow
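A center-rooted layout of this kind might be computed as in the sketch below. It assumes that each tree level sits on its own concentric ring and that a subtree's angular share is proportional to its number of leaves; the slides do not state the allocation rule, so treat both as assumptions. The `children` mapping and `ring_gap` parameter are hypothetical names.

```python
import math

def count_leaves(children, node):
    """Number of leaf pages under node (a leaf counts as 1)."""
    kids = children.get(node, [])
    if not kids:
        return 1
    return sum(count_leaves(children, kid) for kid in kids)

def disk_layout(children, root, ring_gap=1.0):
    """Return {page: (x, y)} with the root at the center of the disk."""
    positions = {}

    def place(node, depth, start, end):
        # Put the node in the middle of its angular wedge, on ring `depth`.
        angle = (start + end) / 2
        radius = depth * ring_gap
        positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
        kids = children.get(node, [])
        total = sum(count_leaves(children, kid) for kid in kids)
        cursor = start
        for kid in kids:
            # Assumed rule: wedge width proportional to leaf count.
            span = (end - start) * count_leaves(children, kid) / total
            place(kid, depth + 1, cursor, cursor + span)
            cursor += span

    place(root, 0, 0.0, 2 * math.pi)
    return positions

tree = {"/": ["a", "b"], "a": ["a1", "a2", "a3"], "b": []}
positions = disk_layout(tree, "/")
```

Recounting leaves at every level keeps the sketch short but is quadratic in the worst case; a real tool handling thousands of pages would cache the counts. Line width, brightness, and color would then be applied when drawing each parent-child link.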
Advantages
• Structure is compact, with pattern easily recognizable
• When viewed straight on or at slight angles, no occlusion problems, since entire layout is on a 2-D plane
• Unlike cone trees, this 2-D representation can utilize a third dimension for other information, such as time
• Circularity pleasing to the eye
Time Tubes
• Time Tubes are multiple disk trees laid out along a spatial axis
• Advantages
− By using a spatial axis to represent time, we see information space-time in a single visualization
− Focus and Context
− Possibility for Animation
Time Tubes
Key Point
• Pages that exist at any time during the studied period are shown in all disk trees for the period, even in slices where they did not exist yet
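One way to picture this is to give every time slice the same set of pages and label each page with a lifecycle stage. The sketch below is illustrative only: the `snapshots` input (one set of existing pages per slice) and the "absent" stage are assumptions, while the colors follow the Disk Tree legend.

```python
# Colors follow the Disk Tree legend; "absent" pages keep a slot but are not drawn.
STAGE_COLORS = {"new": "red", "continued": "green", "deleted": "yellow", "absent": None}

def time_tube_stages(snapshots):
    """Return one {page: stage} dict per time slice, all sharing the same keys."""
    all_pages = set().union(*snapshots)  # every page seen during the period
    slices = []
    previous = None
    for current in snapshots:
        stages = {}
        for page in all_pages:
            now = page in current
            # Assumption: the first slice has no predecessor, so its pages count as continued.
            before = now if previous is None else page in previous
            if now and before:
                stages[page] = "continued"
            elif now:
                stages[page] = "new"
            elif before:
                stages[page] = "deleted"
            else:
                stages[page] = "absent"
        slices.append(stages)
        previous = current
    return slices

# Hypothetical three-week history of a tiny site.
weeks = [{"/", "/old"}, {"/", "/new"}, {"/", "/new"}]
slices = time_tube_stages(weeks)
```

Because every slice has the same keys, each page can keep the same position in every disk tree, which is what makes slices comparable along the time axis.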
Real Use
• Time Tube answers the following questions:
− What devolved into dead wood? When did it? Was there a correlation with the restructuring of the Web site?
Product safety pages got darker and darker, indicating lower usage
Doesn’t tell why a page is less popular, just raises a flag to explore the page further
Real Use
• What evolved into a popular page? When did it? Was there a correlation with the restructuring of the Web site?
− Redesign of site called attention to Fact Book page
− Became more popular, and the corresponding Disk Trees became greener and greener in successive weeks
Real Use
• How was usage affected by items added over time?
− Press release issued for new family of products, shown as red links
− Usage in the third week jumped from 1 access to 871 accesses, suggesting that this was probably a well-received product line
Real Use
• How was usage affected by items deleted over time?
− Removing the direct link from the home page to the main driver page did not negatively affect the overall use of driver information
− Info stayed green indicating usage, but link from home page was black, showing not much traffic
E-Commerce Applications
• What if your focus is on understanding user access patterns for web sites selling products to consumers?
• What tasks are important?
One Approach
• Blue Martini Software
• Aggregate web data and visualize simplified graph of user movements through web site
• Highlight places where people leave before purchasing
• ...
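The aggregation idea above can be sketched roughly as follows. This is not Blue Martini's method; it assumes the log has already been split into per-visitor sessions (ordered lists of page names), and the page names and `purchase_page` parameter are hypothetical.

```python
from collections import Counter

def summarize_sessions(sessions, purchase_page):
    """Aggregate page-to-page movements and count where non-buyers leave."""
    transitions = Counter()  # (from_page, to_page) -> number of moves
    exits = Counter()        # last page of sessions that never purchased
    for pages in sessions:
        transitions.update(zip(pages, pages[1:]))
        if pages and purchase_page not in pages:
            exits[pages[-1]] += 1
    return transitions, exits

# Hypothetical sessions for illustration.
sessions = [
    ["home", "product", "cart", "purchase"],
    ["home", "product"],
    ["home", "search", "product"],
    ["home"],
]
transitions, exits = summarize_sessions(sessions, "purchase")
```

`transitions.most_common(n)` gives the heaviest edges for a simplified graph, and `exits.most_common(n)` gives the pages to highlight as places where people leave before purchasing.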
Brainerd & Becker InfoVis ‘01
Different icons represent different kinds of pages
Only show most-used pages
E-Commerce mimics mall shopping :^)
Gender differences in purchase paths at websites
2. Aiding WWW Browsing
• Can we utilize information visualization techniques to help people interact with the WWW and the Internet?
• Battle “lost in hyperspace” problem
• Help us know what’s there
• Help us find things
WebBook and Web Forager
• Personal computers were formerly viewed as knowledge processors
− Spreadsheets and calculators
• Now viewed as knowledge sources, portals to vast information worlds
− Networking and WWW
Card, Robertson and York CHI ‘96
WWW Problems
• Pages are hard to find
• Users get lost, can’t relocate pages
• Difficulty organizing things once found
• Difficulty doing knowledge processing on found things
• Interacting with the web is too slow to incorporate gracefully into other activities
Information Foraging Theory
• From Ecological Biology
• Idea: user stalks certain types of information
• Users have tendency to interact repeatedly with small clusters of information (locality of reference)
• Information encountered at certain rate
− Users evolve to increase finding rate
− Sources evolve to be more attractive
Mechanisms Evolved
• 3 mechanisms in the evolution of the web on the server side
− Indexes - Lycos search
− Table of contents - Yahoo
− Home pages provided by users with big lists of related links
Assisting People
• To provide insight
− must support sensemaking
− restructuring
− recoding
• Hotlists are one mechanism in this direction
Improvements
• WebBook and Web Forager try to do two things to foster information sensemaking
− Move away from a single web page, and group and manipulate related pages
− Move from a work environment containing a single element to a workspace in which the page is contained with multiple other entities, including WebBooks
WebBook
Features
• WebBook allows rapid interaction with objects at a higher level of aggregation than pages
• 3D book representation, uses animation
• Can ruffle through pages, leave bookmarks
Applications
• Hot List books
• Topic books
• Search reports
• Book books
• ...
Web Forager
Web Forager
• Application that embeds the WebBook and other objects in a hierarchical 3D information workspace
• Workspace is intended to create patches from the web where a high density of relevant pages (grouped together in WebBooks) can be combined with rapid access
Constituents
• Hierarchical Workspace - 3 levels
− Focus Place - full page shown, direct interaction
− Intermediate memory space - books or pages placed when they are in use but not immediate focus
− Tertiary space - Storage (bookcase)
Video
Discussion
• Strengths/Weaknesses
Data Mountain
• 3D document management system
• Prototype is an alternative to web browser “bookmarks” or “favorites”
• Could be used for any kind of document management
Robertson, et al
Make-Up
• 3D inclined plane in which thumbnails of web pages are placed to serve as favorites
• User is responsible for organization
• Uses smooth animation and audio to assist interaction
Video
User Study
• Data Mountain versus IE4 “Favorites”
• Experienced IE4 users
• Stored 100 pages, then retrieved them
• DM fared about as well with the “title” cue
• DM fared better for all other cues
Leveraging Human Capabilities
• Spatial memory: analogy with paper placed on a pile on your desk
− User is responsible for personal organization
• 3D perception: minimal cognitive load, good utilization of screen space
Interaction Techniques
• Placing pages: confinement to inclined plane makes normal 2D drag-and-drop sufficient; no unfamiliar 3D navigation needed
• Continuous feedback: both audio and visual feedback are natural; minimized unexpected interactions/surprises
Limitations/Future
• Limits number of pages stored
• No explicit support for grouping
• Landmarks/contours as helpers
Discussion
• Strengths/Weaknesses
• Could it be used elsewhere?
Upcoming
• Spring Break
− Woo-woo
• Text & documents (2 days)
− Reading
Chapter 10 Salton et al
• Mid-project reports due March 25
References
• Spence and CMS texts
• All referred to papers and websites
• McNamara & Defnet and Craighill, Robeson & Sheridan F ‘99 slides