Multivariate Data & Representations
CS 7450 - Information Visualization Jan. 15, 2004
John Stasko
Agenda
• Data forms and representations
• Basic representation techniques
• Multivariate (>3) techniques
Spring 2004 CS 7450 3
Data Sets
• Data comes in many different forms
• Typically, not in the way you want it
• How is it stored (in the raw)?
Example
• Cars
−make
−model
−year
−miles per gallon
−cost
−number of cylinders
−weight
−...
Example
• Web pages
Data Tables
• Often, we take raw data and transform it into a form that is more workable
• Main idea:
−Individual items are called cases
−Cases have variables (attributes)
Data Table Format
            Case1     Case2     Case3     ...
Variable1   Value11   Value21   Value31
Variable2   Value12   Value22   Value32
Variable3   Value13   Value23   Value33
...
Think of it as a function: f(case1) = <Val11, Val12, …>
Example
People in class
        Mary    Jim     Sally   Mitch   ...
SSN     145     294     563     823
Age     23      17      47      29
Hair    brown   black   blonde  red
GPA     2.9     3.7     3.4     2.1
...
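A minimal Python sketch of the class table, treating the table as a function from a case to its tuple of values. The names `variables`, `table`, `f`, and `column` are illustrative choices, not part of the lecture.

```python
# Cases are people; variables are SSN, Age, Hair, GPA (values from the slide).
variables = ("SSN", "Age", "Hair", "GPA")
table = {
    "Mary":  (145, 23, "brown", 2.9),
    "Jim":   (294, 17, "black", 3.7),
    "Sally": (563, 47, "blonde", 3.4),
    "Mitch": (823, 29, "red", 2.1),
}

def f(case):
    """Return the tuple of values for one case, as in f(case1) = <Val11, Val12, ...>."""
    return table[case]

def column(variable):
    """Return one variable's value for every case."""
    i = variables.index(variable)
    return {case: values[i] for case, values in table.items()}
```

Here `f("Mary")` gives Mary's full row, while `column("Age")` gives the ages of all four people.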
Example
Baseball statistics
Variable Types
• Three main types of variables
−N-Nominal (equal or not equal to other values)
Example: gender
−O-Ordinal (obeys < relation, ordered set)
Example: fr, so, jr, sr
−Q-Quantitative (can do math on them)
Example: age
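One way to make the three types concrete is to list which operations are meaningful for each. The sketch below is only an illustration; the names `VarType`, `ALLOWED`, `YEAR_ORDER`, `ordinal_less`, and `supports` are assumptions introduced here.

```python
from enum import Enum

class VarType(Enum):
    NOMINAL = "N"
    ORDINAL = "O"
    QUANTITATIVE = "Q"

# Operations that are meaningful for each type, following the slide:
# nominal supports equality only, ordinal adds ordering, quantitative adds arithmetic.
ALLOWED = {
    VarType.NOMINAL: {"=="},
    VarType.ORDINAL: {"==", "<"},
    VarType.QUANTITATIVE: {"==", "<", "-"},
}

# Ordinal example from the slide: class year.
YEAR_ORDER = ["fr", "so", "jr", "sr"]

def ordinal_less(a, b, order=YEAR_ORDER):
    """Compare two ordinal values by their position in the ordered set."""
    return order.index(a) < order.index(b)

def supports(var_type, op):
    """Report whether an operation is meaningful for a variable type."""
    return op in ALLOWED[var_type]
```

For instance, `ordinal_less("fr", "jr")` is true, but subtracting "fr" from "jr" has no defined meaning, which is why `"-"` appears only under the quantitative type.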
Metadata
• Descriptive information about the data
−Might be something as simple as the type of a variable, or could be more complex
−For times when the table itself just isn’t enough
−Example: if variable1 is “l”, then variable3 can only be 3, 7 or 16
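The constraint in the example could be recorded as metadata and checked against each case. This is a hedged sketch: the function name and the dictionary layout of a case are assumptions, and the value "l" is copied as it appears on the slide.

```python
def satisfies_constraint(case):
    """Check the example rule: if variable1 is "l", variable3 must be 3, 7 or 16."""
    if case["variable1"] == "l":
        return case["variable3"] in {3, 7, 16}
    # The rule says nothing about cases where variable1 has another value.
    return True
```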
How Many Variables?
• Data sets of dimensions 1, 2, 3 are common
• Number of variables per class
−1 - Univariate data
−2 - Bivariate data
−3 - Trivariate data
−>3 - Hypervariate data
Representation
• What’s a common way of visually representing multivariate data sets?
• Graphs!
Good Example
www.nationmaster.com
Basic Symbolic Displays
• Graphs ←
• Charts
• Maps
• Diagrams
From:
S. Kosslyn, “Understanding charts and graphs”, Applied Cognitive Psychology, 1989.
1. Graph
Showing the relationships between variables’ values in a data table
[Figure: example graph of East, West, and North series across the 1st through 4th quarters; value axis 0-100]
Properties
• Graph
−Visual display that illustrates one or more relationships among entities
−Shorthand way to present information
−Allows a trend, pattern or comparison to be easily comprehended
Issues
• Critical to remain task-centric
−Why do you need a graph?
−What questions are being answered?
−What data is needed to answer those questions?
−Who is the audience?
money
Graph Components
• Framework
−Measurement types, scale
• Content
−Marks, lines, points
• Labels
−Title, axes, ticks
Other Symbolic Displays
• Chart
• Map
• Diagram
2. Chart
• Structure is important, relates entities to each other
• Primarily uses lines, enclosure, position to link entities
Examples: flowchart, family tree, org chart, ...
3. Map
• Representation of spatial relations
• Locations identified by labels
Choropleth Map
Areas are filled and colored differently to indicate some attribute of that region
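A choropleth needs some rule that turns a region's attribute value into a fill color. One common approach, sketched here under assumptions, is to bin values into classes with fixed break points. The function name, break points, and color names are all hypothetical.

```python
import bisect

def color_class(value, breaks, colors):
    """Pick a fill color by finding which class interval the value falls in.

    breaks must be sorted ascending, and colors must have one more entry
    than breaks (one color per interval).
    """
    if len(colors) != len(breaks) + 1:
        raise ValueError("need exactly len(breaks) + 1 colors")
    return colors[bisect.bisect_right(breaks, value)]
```

With `breaks = [10, 20]` and `colors = ["light", "medium", "dark"]`, values below 10 map to "light", values from 10 up to (but not including) 20 map to "medium", and the rest map to "dark".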
Cartography
• Cartographers and map-makers have a wealth of knowledge about the design and creation of visual information artifacts
−Labeling, color, layout, …
• Information visualization researchers should learn from this older, existing area
4. Diagram
• Schematic picture of object or entity
• Parts are symbolic
Examples: figures, steps in a manual, illustrations,...
Details
• What are the constituent pieces of these four symbolic displays?
• What are the building blocks?
Visual Structures
• Composed of
−Spatial substrate
−Marks
−Graphical properties of marks
Space
• Visually dominant
• Often put axes on space to assist
• Use techniques of composition, alignment, folding, recursion, overloading to
1) increase use of space
2) do data encodings
Marks
• Things that occur in space
−Points
−Lines
−Areas
−Volumes
Graphical Properties
• Size, shape, color, orientation...
                        Spatial properties   Object properties
Expressing extent       Position, Size       Grayscale
Differentiating marks   Orientation          Color, Shape, Texture
Back to Data
• What were the different types of data sets?
• Number of variables per class
−1 - Univariate data
−2 - Bivariate data
−3 - Trivariate data
−>3 - Hypervariate data
Univariate Data
• Representations
[Figures: univariate displays with axis ticks 1-7 and 0-20, a point labeled Bill, and a Tukey box plot marking low, middle 50%, high, and mean]
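The quantities a box plot marks (low, the middle 50%, high, and the mean) can be computed directly. This is a minimal sketch using Python's standard `statistics` module; the function name and the sample values are illustrative, and quartile conventions vary, so Tukey's original hinges may differ slightly from the "inclusive" method used here.

```python
import statistics

def box_plot_summary(values):
    """Return the values a box plot marks for one quantitative variable."""
    data = sorted(values)
    # quantiles(n=4) returns the three cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "low": data[0],
        "q1": q1,          # lower edge of the middle 50%
        "median": median,
        "q3": q3,          # upper edge of the middle 50%
        "high": data[-1],
        "mean": statistics.mean(data),
    }

summary = box_plot_summary([7, 5, 3, 1])  # illustrative values only
```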
What goes where
• In univariate representations, we often think of the data case as being shown along one dimension, and the value in another
Line graph
−Y-axis is quantitative variable
−See changes over consecutive values
Bar graph
−Y-axis is quantitative variable
−Compare relative point values
Alternative View
• We may think of graph as representing independent (data case) and dependent (value) variables
• Guideline:
−Independent vs. dependent variables
−Put independent on x-axis
−See resultant dependent variables along y-axis
Bivariate Data
• Representations
Scatter plot is common
[Figure: scatter plot with axes price and mileage]
Two variables, want to see relationship
Is there a linear, curved or random pattern?
Each mark is now a data case
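A numeric companion to eyeballing the scatter plot is the Pearson correlation coefficient, which measures how close the points are to a straight line. The sketch below is a plain-Python version; the function name is an assumption, and a value near 0 does not separate a curved pattern from a random one, so the plot is still needed.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: +1 or -1 for a perfect line, near 0 for no linear trend."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)
```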
Trivariate Data
• Representations
3D scatter plot is possible
[Figure: 3D scatter plot with axes horsepower, mileage, and price]
Alternative Representation
Still use 2D, but have a mark property represent the third variable
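As a rough sketch of this idea, the third variable can be mapped linearly onto a mark property such as size. The function names and the radius range below are assumptions chosen for illustration.

```python
def linear_scale(value, domain_min, domain_max, range_min, range_max):
    """Map a value from the data domain onto a visual range."""
    if domain_max == domain_min:
        # All values are equal, so use the middle of the visual range.
        return (range_min + range_max) / 2
    t = (value - domain_min) / (domain_max - domain_min)
    return range_min + t * (range_max - range_min)

def sized_marks(cases, min_radius=2.0, max_radius=10.0):
    """Turn (x, y, z) cases into (x, y, radius) marks; z drives the mark size."""
    zs = [z for _, _, z in cases]
    lo, hi = min(zs), max(zs)
    return [(x, y, linear_scale(z, lo, hi, min_radius, max_radius))
            for x, y, z in cases]
```

The same `linear_scale` helper could drive grayscale or another mark property instead of radius.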
Alternative Representation
Represent each variable in its own explicit way
Hypervariate Data
• Ahhh, the tough one
• A number of well-known visualization techniques exist for data sets of 1-3 dimensions
−line graphs, bar graphs, scatter plots OK
−We see a 3-D world (4-D with time)
• What about data sets with more than 3 variables?
−Often the interesting, challenging ones
Multiple Views
Give each variable its own display
     A  B  C  D  E
1    4  1  8  3  5
2    6  3  4  2  1
3    5  7  2  4  3
4    2  6  3  1  5
[Figure: a separate display for each variable A-E]
Scatterplot Matrix
Represent each possible pair of variables in their own 2-D scatterplot
Useful for what?
Misses what?
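To get a feel for how the matrix grows, the panels can be enumerated as unordered pairs of variables. This is a small sketch; the function name is an assumption.

```python
from itertools import combinations

def scatterplot_pairs(variables):
    """List every unordered pair of variables, one pair per distinct scatterplot."""
    return list(combinations(variables, 2))

pairs = scatterplot_pairs(["A", "B", "C", "D", "E"])
```

For n variables there are n(n-1)/2 distinct pairs, so five variables already need ten panels, and each panel shows only two variables at a time.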
Chernoff Faces
Encode different variables’ values in characteristics of human face
Star Plots
[Figure: star plot with spokes Var 1 through Var 5; position along each spoke shows the value]
Space out the n variables at equal angles around a circle
Each “spoke” encodes a variable’s value
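The geometry described above can be sketched as follows: each variable gets an angle, and its value sets how far along the spoke the point sits. The function name, the scaling by a per-variable maximum, and the assumption of non-negative values are choices made for this illustration.

```python
import math

def star_plot_points(values, max_values, radius=1.0):
    """Return one (x, y) point per variable for a single case.

    Spoke i sits at angle 2*pi*i/n; the point's distance from the center
    is the value scaled by that variable's maximum (assumed positive).
    """
    n = len(values)
    points = []
    for i, (value, vmax) in enumerate(zip(values, max_values)):
        angle = 2 * math.pi * i / n
        length = radius * value / vmax
        points.append((length * math.cos(angle), length * math.sin(angle)))
    return points

points = star_plot_points([1, 2, 3, 4], [4, 4, 4, 4])
```

Connecting the returned points in order, and closing the loop back to the first, gives the star outline for that case.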
Star Plot examples
Star Coordinates
E. Kandogan, “Star Coordinates: A Multi-dimensional Visualization Technique with Uniform Treatment of Dimensions”, InfoVis 2000 Late-Breaking Hot Topics, Oct. 2000
Demo
Parallel Coordinates
• What are they?
−Explain…
Parallel Coordinates
[Figure: five parallel vertical axes, V1 through V5]
Encode variables along a horizontal row
A vertical line for each variable specifies its values
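A minimal sketch of the layout: the axes are spaced evenly from left to right, and a data case becomes a polyline that crosses each axis at the height of its value. The function name, the unit-square drawing area, and the per-axis minimum and maximum arguments are assumptions made for illustration.

```python
def polyline(case, mins, maxs, width=1.0, height=1.0):
    """Return the (x, y) vertices of one case's polyline across the axes."""
    n = len(case)
    if n < 2:
        raise ValueError("parallel coordinates need at least two variables")
    points = []
    for i, (value, lo, hi) in enumerate(zip(case, mins, maxs)):
        x = width * i / (n - 1)               # axis i, evenly spaced
        if hi == lo:
            y = height / 2                    # constant variable: center it
        else:
            y = height * (value - lo) / (hi - lo)
        points.append((x, y))
    return points

# Illustrative case with five variables, each scaled to the range 0-8.
vertices = polyline([4, 1, 8, 3, 5], [0] * 5, [8] * 5)
```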
Parallel Coords Example
[Figures: the same parallel coordinates example shown in basic, grayscale, and color versions]
Application
• System that uses parallel coordinates for information analysis and discovery
• Interactive tool
−Can focus on certain data items
−Color
Taken from:
A. Inselberg, “Multidimensional Detective”, InfoVis ’97, 1997.
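Focusing on certain data items can be thought of as selecting the cases whose values fall inside chosen intervals on one or more axes. The sketch below is not the tool's actual implementation; the function name, the dictionary layout, and the batch values are hypothetical.

```python
def brush(cases, ranges):
    """Keep the cases that fall inside every (low, high) interval given per axis."""
    return [
        case for case in cases
        if all(low <= case[axis] <= high for axis, (low, high) in ranges.items())
    ]

# Hypothetical batches with made-up values, for illustration only.
batches = [
    {"X1": 0.90, "X2": 0.80},
    {"X1": 0.40, "X2": 0.90},
    {"X1": 0.95, "X2": 0.30},
]
selected = brush(batches, {"X1": (0.8, 1.0), "X2": (0.7, 1.0)})
```

The selected cases could then be drawn in a highlight color while the rest stay in the background.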
Discuss
• What was their domain?
• What was their problem?
• What were their data sets?
The Problem
• VLSI chip manufacture
• Want high quality chips (high speed) and a high yield batch (% of useful chips)
• Able to track defects
• Hypothesis: No defects gives desired chip types
• 473 batches of data
The Data
• 16 variables
−X1 - yield
−X2 - quality
−X3-X12 - # defects (inverted)
−X13-X16 - physical parameters
Parallel Coordinate Display
Yikes!
But not that bad
[Figure: parallel coordinate display; axis groups: yield & quality, defects, parameters]
Distributions: X1 - normal, X2 - bipolar
Top Yield & Quality
defects split
Minimal Defects
Not the highest yields and quality
Best Yields
It appears that some defects are necessary to produce the best chips
Non-intuitive!
Xmdv
Toolsuite created by Matthew Ward of WPI
Includes parallel coordinate views
Demo
Parallel Coordinate Tree
Demo
Parallel Coordinates
• Technique
−Strengths?
−Weaknesses?
Sliding Rods
T. Lanning, K. Wittenburg, et al,
Administrivia
• Computer accounts
• HW 2 due Tuesday
Upcoming
• Multivariate vis tools
−Reading: Eick paper
• Visual perception
• Tufte (please be reading)
Sources Used
CMS book
Referenced articles
Marti Hearst SIMS 247 lectures
Kosslyn ’89 article
A. Marcus, Graphic Design for Electronic Documents and User Interfaces
M. Monmonier, How to Lie with Maps
W. Cleveland, The Elements of Graphing Data
C. H. Yu, Visualization Techniques of Different Dimensions
http://seamonkey.ed.asu.edu/~behrens/asu/reports/compre/comp1.html
http://www.csc.ncsu.edu/faculty/healey/PP/PP.html