• Aucun résultat trouvé

Multivariate Data & Representations

N/A
N/A
Protected

Academic year: 2022

Partager "Multivariate Data & Representations"

Copied!
32
0
0

Texte intégral

(1)

Multivariate Data &

Representations

CS 7450 - Information Visualization Jan. 15, 2004

John Stasko

Agenda

• Data forms and representations

• Basic representation techniques

• Multivariate (>3) techniques

(2)

Spring 2004 CS 7450 3

Data Sets

• Data comes in many different forms

• Typically, not in the way you want it

• How is stored (in the raw)?

Example

• Cars

make

model

year

miles per gallon

cost

number of cylinders

weights

...

(3)

Spring 2004 CS 7450 5

Example

• Web pages

Data Tables

• Often, we take raw data and transform it into a form that is more workable

• Main idea:

Individual items are called cases

Cases have variables (attributes)

(4)

Spring 2004 CS 7450 7

Data Table Format

Case1 Case2 Case3 ...

Variable1 Variable2 Variable3 ...

Value11 Value21 Value31 Value12 Value22 Value32 Value13 Value23 Value33

Think of as a function f(case1) = <Val11, Val12,…>

Example

People in class

Mary Jim Sally Mitch ...

SSN Age Hair GPA ...

145 294 563 823 23 17 47 29 brown black blonde red

2.9 3.7 3.4 2.1

(5)

Spring 2004 CS 7450 9

Example

Baseball statistics

Variable Types

• Three main types of variables

N-Nominal (equal or not equal to other values)

Example: gender

O-Ordinal (obeys < relation, ordered set) Example: fr,so,jr,sr

Q-Quantitative (can do math on them) Example: age

(6)

Spring 2004 CS 7450 11

Metadata

• Descriptive information about the data

Might be something as simple as the type of a variable, or could be more complex

For times when the table itself just isn’t enough

Example: if variable1 is “l”, then variable3 can only be 3, 7 or 16

How Many Variables?

• Data sets of dimensions 1, 2, 3 are common

• Number of variables per class

1 - Univariate data

2 - Bivariate data

3 - Trivariate data

>3 - Hypervariate data

(7)

Spring 2004 CS 7450 13

Representation

• What’s a common way of visually representing multivariate data sets?

• Graphs!

Good Example

www.nationmaster.com

(8)

Spring 2004 CS 7450 15

Basic Symbolic Displays

• Graphs Å

• Charts

• Maps

• Diagrams

From:

S. Kosslyn, “Understanding charts and graphs”, Applied Cognitive Psychology, 1989.

1. Graph

Showing the relationships between variables’

values in a data table

0 20 40 60 80 100

1st Qtr

2nd Qtr

3rd Qtr

4th Qtr

East West North

(9)

Spring 2004 CS 7450 17

Properties

• Graph

Visual display that illustrates one or more relationships among entities

Shorthand way to present information

Allows a trend, pattern or comparison to be easily comprehended

Issues

• Critical to remain task-centric

Why do you need a graph?

What questions are being answered?

What data is needed to answer those questions?

Who is the audience?

money

(10)

Spring 2004 CS 7450 19

Graph Components

• Framework

Measurement types, scale

• Content

Marks, lines, points

• Labels

Title, axes, ticks

Other Symbolic Displays

• Chart

• Map

• Diagram

(11)

Spring 2004 CS 7450 21

2. Chart

Structure is important, relates entities to each other

• Primarily uses lines, enclosure, position to link entities

Examples: flowchart, family tree, org chart, ...

3. Map

Representation of spatial relations

• Locations identified by labels

(12)

Spring 2004 CS 7450 23

Choropleth Map

Areas are filled and colored differently to indicate some attribute of that region

Cartography

• Cartographers and map-makers have a wealth of knowledge about the design and creation of visual information artifacts

Labeling, color, layout, …

• Information visualization researchers should learn from this older, existing area

(13)

Spring 2004 CS 7450 25

4. Diagram

Schematic picture of object or entity

• Parts are symbolic

Examples: figures, steps in a manual, illustrations,...

Details

• What are the constituent pieces of these four symbolic displays?

• What are the building blocks?

(14)

Spring 2004 CS 7450 27

Visual Structures

• Composed of

Spatial substrate

Marks

Graphical properties of marks

Space

• Visually dominant

• Often put axes on space to assist

• Use techniques of

composition, alignment, folding, recursion, overloading to

1) increase use of space 2) do data encodings

(15)

Spring 2004 CS 7450 29

Marks

• Things that occur in space

Points

Lines

Areas

Volumes

Graphical Properties

• Size, shape, color, orientation...

Spatial properties Object properties Expressing

extent

Differentiating marks

Position

Size Grayscale

Orientation Color

Shape Texture

(16)

Spring 2004 CS 7450 31

Back to Data

• What were the different types of data sets?

• Number of variables per class

1 - Univariate data

2 - Bivariate data

3 - Trivariate data

>3 - Hypervariate data

Univariate Data

• Representations

7 5 3 1

Bill

0 20

Mean

low Middle 50% high

Tukey box plot

(17)

Spring 2004 CS 7450 33

What goes where

In univariate representations, we often think of the data case as being shown along one dimension, and the value in another

Linegraph Bar

graph

Y-axis is quantitative variable

See changes over consecutive values

Y-axis is quantitative variable

Compare relative point values

Alternative View

• We may think of graph as representing independent (data case) and dependent (value) variables

• Guideline:

Independent vs. dependent variables Put independent on x-axis

See resultant dependent variables along y-axis

(18)

Spring 2004 CS 7450 35

Bivariate Data

• Representations

Scatter plot is common

price

mileage

Two variables, want to see relationship

Is there a linear, curved or random pattern?

Each mark is now a data case

Trivariate Data

• Representations

3D scatter plot is possible

horsepower

mileage

price

(19)

Spring 2004 CS 7450 37

Alternative Representation

Still use 2D but have mark property

represent third variable

Alternative Representation

Represent each variable in its own explicit way

(20)

Spring 2004 CS 7450 39

Hypervariate Data

• Ahhh, the tough one

• Number of well-known visualization techniques exist for data sets of 1-3 dimensions

line graphs, bar graphs, scatter plots OK

We see a 3-D world (4-D with time)

• What about data sets with more than 3 variables?

Often the interesting, challenging ones

Multiple Views

Give each variable its own display

A B C D E 1 4 1 8 3 5 2 6 3 4 2 1 3 5 7 2 4 3 4 2 6 3 1 5

1

2

3

4

(21)

Spring 2004 CS 7450 41

Scatterplot Matrix

Represent each possible pair of variables in their own 2-D scatterplot

Useful for what?

Misses what?

Chernoff Faces

Encode different variables’ values in characteristics of human face

(22)

Spring 2004 CS 7450 43

Star Plots

Var 1

Var 2

Var 3 Var 4

Var 5

Value

Space out the n variables at equal angles around a circle

Each “spoke” encodes a variable’s value

Star Plot examples

(23)

Spring 2004 CS 7450 45

Star Coordinates

E. Kandogan, “Star Coordinates: A Multi-dimensional Visualization Technique with Uniform Treatment of Dimensions”, InfoVis 2000

Late-Breaking Hot Topics, Oct. 2000 Demo

Parallel Coordinates

• What are they?

Explain…

(24)

Spring 2004 CS 7450 47

Parallel Coordinates

V1 V2 V3 V4 V5

Encode variables along a horizontal row

Vertical line specifies values

Parallel Coords Example

Basic

Grayscale

Color

(25)

Spring 2004 CS 7450 49

Application

• System that uses parallel coordinates for information analysis and discovery

• Interactive tool

Can focus on certain data items

Color

Taken from:

A. Inselberg, “Multidimensional Detective”

InfoVis ‘97, 1997.

Discuss

• What was their domain?

• What was their problem?

• What were their data sets?

(26)

Spring 2004 CS 7450 51

The Problem

• VLSI chip manufacture

• Want high quality chips (high speed) and a high yield batch (% of useful chips)

• Able to track defects

• Hypothesis: No defects gives desired chip types

• 473 batches of data

The Data

• 16 variables

X1 - yield

X2 - quality

X3-X12 - # defects (inverted)

X13-X16 - physical parameters

(27)

Spring 2004 CS 7450 53

Parallel Coordinate Display

Yikes!

But not that bad

yield &

quality defects parameters

Distributions x1 - normal x2 - bipolar

Top Yield & Quality

defects split

(28)

Spring 2004 CS 7450 55

Minimal Defects

Not the highest yields and quality

Best Yields

Appears that some defects are necessary to produce the best chips Non-intuitive!

(29)

Spring 2004 CS 7450 57

Xmdv

Toolsuite created by Matthew Ward of WPI

Includes parallel coordinate views

Demo

Parallel Coordinate Tree

Demo

(30)

Spring 2004 CS 7450 59

Parallel Coordinates

• Technique

Strengths?

Weaknesses?

Sliding Rods

T. Lanning, K. Wittenburg, et al,

(31)

Spring 2004 CS 7450 61

Administratia

• Computer accounts

• HW 2 due Tuesday

Upcoming

• Multivariate vis tools

Reading:

Eick paper

• Visual perception

• Tufte (please be reading)

(32)

Spring 2004 CS 7450 63

Sources Used

CMS book

Referenced articles

Marti Hearst SIMS 247 lectures Kosslyn ‘89 article

A. Marcus, Graphic Design for Electronic Documents and User Interfaces

M. Monmonier, How to Lie with Maps

W. Cleveland, The Elements of Graphing Data

C. H. Yu, Visualization Techniques of Different Dimensions

http://seamonkey.ed.asu.edu/~behrens/asu/reports/compre/comp1.html http://www.csc.ncsu.edu/faculty/healey/PP/PP.html

Références

Documents relatifs

(1.21) Cette construction peut ˆ etre vue dans le cadre plus g´ en´ eral de la th´ eorie des valeurs extrˆ emes multivari´ ees comme le produit entre une composante radiale et

[r]

Using the table headers and the data in its rows and columns, we query existing knowledge bases (KBs) including Wikitology [1] and DBpedia to select the best class labels for

Using this representation, users and applications are enabled to navigate through the virtual directories that represent re- sources and properties, and to access property values

For some classes of planar maps (triangulations and 3-connected pla- nar maps) there exists an optimal succinct representation, supporting navigation in O(1) time. The memory

The whole processing of multivariate functions is expressed on graphs to have a formalism that can be applied on images, region adjacency graphs, and image databases.. Several

Saffron connects to data governance centres to extract relevant metadata, uplifts it to a data value knowledge graph, and presents analysis and semantic driven traceable explanations

Flair: An easy-to-use framework for state-of-the-art NLP. Contextual string embeddings for se- quence labeling. To- wards Scalable Hybrid Stores: Constraint-Based Rewriting to