Ggplot2


Academic year: 2022


(1)

ADVANCED GRAPHICS

Julie Scholler

MÉcEn

(2)

Welcome to the tidyverse

A coherent system of packages for data science that work in harmony

tidyverse.org

Figure 1: The tidyverse packages

(3)

Graphics with ggplot2

Why?

elegant, versatile

mature and complete graphics system

very flexible

carefully chosen default behaviour

theme system for polishing plot appearance

How?

grammar of graphics (Wilkinson, 2005)

(4)

The Grammar Of Graphics

The basic idea for building a plot

specify blocks/layers

combine them

get any kind of graphic

Blocks/layers

data

aesthetic mapping

geometric object

statistical transformations

scales

coordinate system

position adjustments

faceting
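The sketch below writes out every block listed above in a single ggplot2 call. It is not from the original slides. It uses the built-in mtcars data instead of the housing data. Most of these arguments are already the defaults; they are spelled out only to show where each block lives.

```r
library(ggplot2)

# Every grammar-of-graphics block written out explicitly (built-in mtcars data)
p <- ggplot(data = mtcars) +                     # data
  aes(x = wt, y = mpg, color = factor(cyl)) +    # aesthetic mapping
  geom_point(stat = "identity",                  # geometric object + statistical transformation
             position = "identity") +            # position adjustment
  scale_color_discrete(name = "Cylinders") +     # scale
  coord_cartesian() +                            # coordinate system
  facet_wrap(~ am)                               # faceting
print(p)
```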

(5)

Setup: install the tidyverse package

# install.packages("ggplot2")
library(ggplot2)

Or

# install.packages("tidyverse")
library(tidyverse)

## -- Attaching packages --- tidyverse 1.2.1 --

## v ggplot2 3.0.0 v purrr 0.2.4

## v tibble 1.4.2 v dplyr 0.7.5

## v tidyr 0.8.1 v stringr 1.3.0

## v readr 1.1.1 v forcats 0.3.0

## -- Conflicts --- tidyverse_conflicts() --

## x dplyr::filter() masks stats::filter()

## x dplyr::lag() masks stats::lag()

(6)

Example Data: Housing prices

housing <- read_csv("dataSets/landdata-states.csv")
class(housing)

## [1] "tbl_df" "tbl" "data.frame"

head(housing[1:5])

## # A tibble: 6 x 5

## State region Date Home.Value Structure.Cost

## <chr> <chr> <dbl> <int> <int>

## 1 AK West 2010. 224952 160599

## 2 AK West 2010. 225511 160252

## 3 AK West 2010. 225820 163791

## 4 AK West 2010. 224994 161787

## 5 AK West 2008. 234590 155400

## 6 AK West 2008. 233714 157458

(7)

Housing

str(housing)

## Classes 'tbl_df', 'tbl' and 'data.frame': 7803 obs. of 11 variables:

## $ State : chr "AK" "AK" "AK" "AK" ...

## $ region : chr "West" "West" "West" "West" ...

## $ Date : num 2010 2010 2010 2010 2008 ...

## $ Home.Value : int 224952 225511 225820 224994 234590 233714 232999 232164 231039 229395 ...

## $ Structure.Cost : int 160599 160252 163791 161787 155400 157458 160092 162704 164739 165424 ...

## $ Land.Value : int 64352 65259 62029 63207 79190 76256 72906 69460 66299 63971 ...

## $ Land.Share..Pct.: num 28.6 28.9 27.5 28.1 33.8 32.6 31.3 29.9 28.7 27.9 ...

## $ Home.Price.Index: num 1.48 1.48 1.49 1.48 1.54 ...

## $ Land.Price.Index: num 1.55 1.58 1.49 1.52 1.88 ...

## $ Year : int 2010 2010 2009 2009 2007 2008 2008 2008 2008 2009 ...

## $ Qrtr : int 1 2 3 4 4 1 2 3 4 1 ...

## - attr(*, "spec")=List of 2

## ..$ cols :List of 11

## .. ..$ State : list()

## .. .. ..- attr(*, "class")= chr "collector_character" "collector"

## .. ..$ region : list()

## .. .. ..- attr(*, "class")= chr "collector_character" "collector"

## .. ..$ Date : list()

## .. .. ..- attr(*, "class")= chr "collector_double" "collector"

## .. ..$ Home.Value : list()

## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"

## .. ..$ Structure.Cost : list()

## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"

## .. ..$ Land.Value : list()

## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"

## .. ..$ Land.Share..Pct.: list()

## .. .. ..- attr(*, "class")= chr "collector_double" "collector"

## .. ..$ Home.Price.Index: list()

## .. .. ..- attr(*, "class")= chr "collector_double" "collector"

## .. ..$ Land.Price.Index: list()

## .. .. ..- attr(*, "class")= chr "collector_double" "collector"

## .. ..$ Year : list()

## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"

## .. ..$ Qrtr : list()

## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"

## ..$ default: list()

## .. ..- attr(*, "class")= chr "collector_guess" "collector"

## ..- attr(*, "class")= chr "col_spec"

(8)

ggplot2 VS Base for simple graphs

hist(housing$Home.Value)

[Figure: base R histogram titled "Histogram of housing$Home.Value"; x axis housing$Home.Value, y axis Frequency]

(9)

ggplot2 VS Base for simple graphs

ggplot(housing) + aes(x = Home.Value) + geom_histogram()

[Figure: ggplot2 histogram; x axis Home.Value, y axis count]

(10)

ggplot2 VS Base graphics - round 2

plot(Home.Value ~ Date, col = factor(State),
     data = filter(housing, State %in% c("MA", "TX")))
legend("topleft", legend = c("MA", "TX"),
       col = c("black", "red"), pch = 1)

[Figure: base R scatterplot of Home.Value against Date, MA in black and TX in red, with a manually added legend]

(11)

ggplot2 VS Base graphics - round 2

ggplot(filter(housing, State %in% c("MA", "TX")))+ aes(x=Date, y=Home.Value, color=State)+ geom_point()

[Figure: ggplot2 scatterplot of Home.Value against Date, colored by State (MA, TX), with the legend added automatically]

(12)

Syntax

ggplot(data=...) + aes(x=..., y=...) + geom_...()

Data: what is being visualized

Aesthetic Mappings: mappings between variables in the data and components of the chart

Geometric Objects: geometric objects that are used to display the data, such as points, lines, or shapes

(13)

First try

ggplot(housing)

(14)

Aesthetic Mapping

In ggplot2: aesthetic = “something you can see”

Examples

position (on the x and y axes)

color (“outside” color)

fill (“inside” color)

shape (of points)

linetype

size

Aesthetic mappings are set with the aes() function.

(15)

Second try

ggplot(housing) + aes(x = Land.Value,y = Structure.Cost)

[Figure: empty panel with axes Land.Value (x) and Structure.Cost (y); no points are drawn because no geom has been added]

(16)

Geometric Objects (geom)

Examples

points: geom_point

lines: geom_line

bar: geom_bar

histogram: geom_histogram

boxplot: geom_boxplot

List of available geometric objects: see the ggplot2 reference list, or search the help:

help.search("geom_", package = "ggplot2")

(17)

Points (Scatterplot)

hp2001Q1 <- filter(housing, Date == 2001.25)
ggplot(hp2001Q1) +
  aes(y = Structure.Cost, x = Land.Value) + geom_point()

[Figure: scatterplot of Structure.Cost (y) against Land.Value (x)]

(18)

Points (Scatterplot)

ggplot(hp2001Q1) +

aes(y = Structure.Cost, x = log(Land.Value), color=region) + geom_point()

[Figure: Structure.Cost against log(Land.Value), points colored by region (Midwest, N. East, South, West, NA)]

(19)

Aesthetic Mapping VS Assignment

ggplot(hp2001Q1) +

aes(y = Structure.Cost, x = log(Land.Value),color=region) + geom_point(aes(size=2),color="red")

[Figure: every point is drawn in red and a "size" legend shows the single value 2; no region legend is shown]
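In the code above, size = 2 sits inside aes(). ggplot2 therefore treats the constant 2 as data to be mapped through a size scale, which is why a "size" legend appears. By contrast, color = "red" sits outside aes(): it simply sets the color of every point and overrides the global color = region mapping for that layer. The sketch below shows the same distinction on the built-in mtcars data and uses layer_data() to inspect what ggplot2 computes.

```r
library(ggplot2)

# Setting: constants OUTSIDE aes() apply to every point and create no legend
p_set <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "red", size = 2)

# Mapping: a variable INSIDE aes() goes through a scale and gets a legend
p_map <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl)))

# layer_data() returns the values ggplot2 actually draws
unique(layer_data(p_set)$colour)          # a single value: "red"
length(unique(layer_data(p_map)$colour))  # one colour per cylinder count (3)
```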

(20)

Points (Scatterplot)

ggplot(hp2001Q1) +

aes(y = Structure.Cost, x = log(Land.Value), shape=region) + geom_point()

[Figure: Structure.Cost against log(Land.Value), point shape by region (Midwest, N. East, South, West, NA)]

(21)

Points (Scatterplot)

ggplot(hp2001Q1) +

aes(y = Structure.Cost, x = log(Land.Value), color=Home.Value) + geom_point()

[Figure: Structure.Cost against log(Land.Value), points colored on a continuous gradient by Home.Value]

(22)

Points (Scatterplot)

ggplot(hp2001Q1) +

aes(y = Structure.Cost, x = log(Land.Value), size=Home.Value) + geom_point()

[Figure: Structure.Cost against log(Land.Value), point size by Home.Value]

(23)

Points (Scatterplot)

ggplot(hp2001Q1) +

aes(y = Structure.Cost, x = log(Land.Value), size=Home.Value,color=region) + geom_point()

[Figure: Structure.Cost against log(Land.Value), point size by Home.Value and color by region, with two legends]

(24)

Points (Scatterplot)

ggplot(hp2001Q1)+

aes(y = Structure.Cost, x = log(Land.Value), size=Home.Value,shape=region,

color=Home.Price.Index) + geom_point()

[Figure: Structure.Cost against log(Land.Value), with shape by region, size by Home.Value and color by Home.Price.Index, giving three legends]

(25)

Faceting

Creates separate graphs for subsets of data

Two solutions

1. facet_wrap(): subsets as the levels of a single grouping variable

2. facet_grid(): subsets as the crossing of two grouping variables

Facilitates comparison among plots

(26)

Faceting

ggplot(filter(housing, Date == 2001.25) ) +

aes(y = Structure.Cost, x = log(Land.Value),

size=Home.Value,color=region) + geom_point() + facet_wrap(~ region,ncol=3)

[Figure: one panel per region (Midwest, N. East, South, West, NA) arranged in 3 columns; x axis log(Land.Value), y axis Structure.Cost; legends for region and Home.Value]

(27)

Faceting

ggplot(filter(housing, Date == 2000.25| Date == 2008.25) ) + aes(y = Structure.Cost, x = log(Land.Value),

size=Home.Value,color=region) + geom_point() + facet_grid(Date~ region)

[Figure: grid of panels with rows Date (2000.25, 2008.25) and columns region (Midwest, N. East, South, West, NA); x axis log(Land.Value), y axis Structure.Cost; legends for Home.Value and region]

(28)

Faceting

ggplot(filter(housing, Date == 2001.25) ) +

aes(y = Structure.Cost, x = log(Land.Value),

size=Home.Value,color=region) + geom_point() + facet_grid(~ region)

[Figure: a single row of panels, one per region (Midwest, N. East, South, West, NA); x axis log(Land.Value), y axis Structure.Cost; legends for region and Home.Value]

(29)

Syntax

ggplot(data=...) + aes(x=..., y=...) + geom_...() + facet_...(...)

Data: what is being visualized

Aesthetic Mappings: mappings between variables in the data and components of the chart

Geometric Objects: geometric objects that are used to display the data, such as points, lines, or shapes

Facets: describe how the data is partitioned into subsets and how these different subsets are plotted

(30)

Adding geom - Local/Global aesthetics

hp2001Q1$pred.SC <- predict(
  lm(Structure.Cost ~ log(Land.Value), data = hp2001Q1))
p1 <- ggplot(hp2001Q1) +
  aes(x = log(Land.Value), y = Structure.Cost)
p1 + geom_point(aes(color = Home.Value)) +
  geom_line(aes(y = pred.SC))

[Figure: points colored by Home.Value with the fitted regression line drawn over them; x axis log(Land.Value), y axis Structure.Cost]
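Aesthetics given in the initial ggplot()/aes() call are global, and every layer inherits them. Aesthetics given inside a geom are local to that layer and can override a global one, as aes(y = pred.SC) does above. The sketch below reproduces the same pattern on the built-in mtcars data. The data copy d and the column name pred are hypothetical.

```r
library(ggplot2)

# Add model predictions to a copy of the data before building the plot
d <- mtcars
d$pred <- predict(lm(mpg ~ wt, data = d))   # hypothetical column name

p <- ggplot(d) + aes(x = wt, y = mpg) +     # global: shared by all layers
  geom_point(aes(color = hp)) +             # local: only the points use hp
  geom_line(aes(y = pred))                  # local: overrides the global y

pts  <- layer_data(p, 1)   # points layer
line <- layer_data(p, 2)   # fitted-line layer
length(unique(pts$colour)) > 1    # points are coloured by hp
length(unique(line$colour)) == 1  # the line keeps a single default colour
```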

(31)

Smoothers

p1 + geom_point(aes(color = Home.Value)) + geom_smooth(se=FALSE)

[Figure: points colored by Home.Value with a smoothed curve and no confidence band]

(32)

Smoothers

p1 + geom_point(aes(color = Home.Value)) + geom_smooth()

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

[Figure: the same plot with a loess smoother and its confidence band]


(34)

Statistical Transformations

Some plot types, such as boxplots, histograms and prediction lines, require statistical transformations:

for a boxplot, the y values must be transformed into quantiles

for a histogram, the x values must be binned and counted to produce the y values

Each geom has a default statistic.
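Each geom_*() function is paired with a stat_*() function, and the pairing can be written from either side. The sketch below uses the built-in mtcars data. geom_bar() uses the "count" statistic by default, so it gives the same computed counts as stat_count() drawn with a bar geom.

```r
library(ggplot2)

# geom_bar() computes its y values with its default stat, "count"
p_geom <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()

# The same layer written from the stat side
p_stat <- ggplot(mtcars, aes(x = factor(cyl))) + stat_count(geom = "bar")

# The transformed data: one row per bar, with the computed count
layer_data(p_geom)[, c("x", "count")]
```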

(35)

Setting Statistical Transformation Arguments

Arguments to stat_ functions can be passed through geom_ functions.

p2 <- ggplot(housing, aes(x = Home.Value))
p2 + geom_histogram()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

[Figure: histogram of Home.Value with the default 30 bins; y axis count]

(36)

Setting Statistical Transformation Arguments

We can change it by passing the binwidth argument to the stat_bin function:

p2 + geom_histogram(stat = "bin", binwidth=4000)

[Figure: histogram of Home.Value with bins 4000 wide; y axis count]
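stat_bin() accepts either bins (the number of bins, 30 by default) or binwidth (the width of each bin, in data units). When both are given, binwidth overrides bins. The sketch below uses the built-in faithful data and reads the computed bins back with layer_data().

```r
library(ggplot2)

p <- ggplot(faithful, aes(x = eruptions))

# Fixed number of bins
few <- layer_data(p + geom_histogram(bins = 10))

# Fixed bin width (in the units of x, here minutes)
wide <- layer_data(p + geom_histogram(binwidth = 0.5))

nrow(few)                                # number of bins drawn
unique(round(wide$xmax - wide$xmin, 6))  # every bin is 0.5 wide
sum(wide$count) == nrow(faithful)        # each observation falls in exactly one bin
```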

(37)

Changing The Statistical Transformation

housing.sum <- aggregate(housing["Home.Value"], housing["State"], FUN=mean)
ggplot(housing.sum) + aes(x=State,y=Home.Value) +
  geom_bar(stat="identity")

[Figure: bar chart of mean Home.Value per State, with states in alphabetical order on the x axis]
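When the y values are already computed, as in housing.sum, geom_col() is a shorthand for geom_bar(stat = "identity"). The sketch below uses mtcars, and mpg_sum is a hypothetical name. It shows that both functions draw the same bar heights.

```r
library(ggplot2)

# Pre-computed summary: mean mpg per number of cylinders
mpg_sum <- aggregate(mtcars["mpg"], mtcars["cyl"], FUN = mean)

p_bar <- ggplot(mpg_sum) + aes(x = factor(cyl), y = mpg) + geom_bar(stat = "identity")
p_col <- ggplot(mpg_sum) + aes(x = factor(cyl), y = mpg) + geom_col()

layer_data(p_col)[, c("x", "y")]   # bar heights are the means themselves
```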

(38)

Bar charts

ggplot(cars)+aes(x=cyl)+geom_bar()

[Figure: bar chart of counts for cyl = 4, 6 and 8]
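The bar-chart examples use a data frame called cars with cyl and transmission columns, but the slides never define it. R's built-in cars dataset only has speed and dist. The counts in the figures appear consistent with mtcars. The sketch below is therefore a hypothetical reconstruction: it assumes cars is mtcars with cyl as a factor and am recoded as transmission. It is not the author's code.

```r
library(ggplot2)

# Hypothetical reconstruction of the slides' `cars` data (assumption);
# assigning it masks R's built-in `cars` dataset in the global environment
cars <- data.frame(
  cyl = factor(mtcars$cyl),
  transmission = factor(mtcars$am, levels = c(0, 1),
                        labels = c("auto", "manual"))
)

p <- ggplot(cars) + aes(x = cyl, fill = transmission) + geom_bar()
p
```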

(39)

Bar charts

ggplot(cars)+aes(x=cyl,color=transmission)+geom_bar()

[Figure: stacked bars per cyl in which transmission (auto, manual) is shown only by the bar outline color]

(40)

Bar charts

ggplot(cars)+aes(x=cyl,color=transmission, fill=transmission)+geom_bar()

[Figure: stacked bars per cyl, filled by transmission (auto, manual)]

(41)

Bar charts

ggplot(cars)+aes(x=cyl,color=transmission, fill=transmission)+geom_bar(position="fill")

[Figure: bars of height 1 showing the proportion of auto and manual within each cyl; y axis from 0 to 1, still labelled count]

(42)

Bar charts

ggplot(cars)+aes(x=cyl,color=transmission,

fill=transmission)+geom_bar(position="dodge")

[Figure: auto and manual bars side by side within each cyl; y axis count]

(43)

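Position adjustment

Set with the position argument inside geom_...():

identity: no adjustment

stack: stack elements on top of each other

fill: stack, then scale each stack to a height of 1

dodge: side by side

jitter: add a little random noise, useful for points (geom_jitter())

nudge: shift points by a fixed offset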
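The sketch below shows the jitter and nudge adjustments on the built-in mtcars data. position_jitter() adds random noise, bounded by width and height, so that overlapping points separate; its seed argument makes the result reproducible. position_nudge() shifts every element by a fixed offset, which is handy for placing labels next to points.

```r
library(ggplot2)

# Many cars share the same (cyl, gear) pair, so plain points overplot
p_identity <- ggplot(mtcars, aes(x = cyl, y = gear)) + geom_point()

# Jitter: random shifts of at most 0.2 horizontally and 0.1 vertically
p_jitter <- ggplot(mtcars, aes(x = cyl, y = gear)) +
  geom_point(position = position_jitter(width = 0.2, height = 0.1, seed = 1))

# Nudge: move every label 0.1 units to the right of its point
p_nudge <- ggplot(mtcars, aes(x = wt, y = mpg, label = rownames(mtcars))) +
  geom_point() +
  geom_text(position = position_nudge(x = 0.1), hjust = 0, size = 3)
```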

(44)

Syntax

ggplot(data=...) + aes(x=..., y=...) + geom_...() + facet_...(...)

Data: what is being visualized

Aesthetic Mappings: mappings between variables in the data and components of the chart

Geometric Objects: geometric objects that are used to display the data, such as points, lines, or shapes

Statistical Transformations: applied to the data to summarize it

Facets: describe how the data is partitioned into subsets and how these different subsets are plotted

(45)

Create Good and Effective Graphics

Labels

Annotations

Coordinates

Scales

Themes

Interactivity

(46)

Labels

All in one

+ labs(title=..., subtitle=..., caption=..., x=..., y=..., color=..., etc.)

Alternative forms

+ ggtitle(...) + xlab(...) + ylab(...) + etc.

Graphic to improve

l1<-ggplot(hp2001Q1) +

aes(y = Structure.Cost, x = log(Land.Value), size=Home.Value,color=region) + geom_point()

(47)

Labels

l2<-l1 + labs(title="Structure Cost and Land Value", subtitle="I don't know what to say", caption="Still no idea")

l2

[Figure: the scatterplot with title "Structure Cost and Land Value", subtitle "I don't know what to say" and caption "Still no idea"]

(48)

Labels

l3<-l2 + labs(x="Log-transformation of land value", y="Structure cost")

l3

[Figure: the same plot with the new x and y axis labels]

(49)

Labels

l3+ labs(color="Region",size="Home value")

[Figure: the same plot with legend titles "Region" and "Home value"]

(50)

Annotations

Each geom accepts a particular set of mappings; for example, geom_text() accepts a label mapping.

p1 + geom_point() +

geom_text(aes(label=State), size = 3)

[Figure: Structure.Cost against log(Land.Value) with each point labelled by its State abbreviation; several labels overlap (e.g. KS and KY)]

(51)

Annotations

# install.packages("ggrepel")
library("ggrepel")

p1 + geom_point() +

geom_text_repel(aes(label=State), size = 3)

[Figure: the same plot with geom_text_repel(); the State labels are pushed apart so they no longer overlap]

(52)

Annotations

l3 + geom_text_repel(aes(label=State), size = 3)

[Figure: plot l3 with repelled State labels; the text layer inherits the region color mapping, so the region legend keys show the letter "a"]

(53)

Scales: Controlling Aesthetic Mapping

Aesthetic mappings only say which variable is mapped to which aesthetic, not how it is rendered.

Describing which colors/shapes/sizes etc. to use is done by modifying the corresponding scale.

Scale examples

x, y

color and fill

size

shape

line type

Scale syntax

scale_<aesthetic>_<type>
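Every scale follows the scale_<aesthetic>_<type> pattern. The available types for one aesthetic can therefore be listed by searching the names that ggplot2 exports, as in this small sketch.

```r
library(ggplot2)

# All exported ggplot2 functions whose name starts with scale_color_
color_scales <- grep("^scale_color_", ls("package:ggplot2"), value = TRUE)
color_scales

# The same pattern works for any aesthetic, e.g. shape
grep("^scale_shape_", ls("package:ggplot2"), value = TRUE)
```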

(54)

Some available Scales

Scale | Types | Examples
scale_color_ | identity | scale_fill_continuous
scale_fill_ | manual | scale_color_discrete
scale_size_ | continuous | scale_size_manual
 | discrete | scale_size_discrete
scale_shape_ | discrete | scale_shape_discrete
scale_linetype_ | identity | scale_shape_manual
 | manual | scale_linetype_discrete

(55)

Some available Scales

Scale | Types | Examples
scale_x_ | continuous | scale_x_continuous
scale_y_ | discrete | scale_y_discrete
 | reverse | scale_x_log10
 | log10 | scale_y_reverse
 | date | scale_x_date
 | datetime | scale_y_datetime
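The slides log-transform a variable in two ways: inside aes(), as in x = log(Land.Value), or with a scale such as scale_x_log10(). The sketch below uses mtcars and base 10 in both cases, so that the positions match exactly. Both approaches put the points at the same positions; the scale version labels the axis in the original units.

```r
library(ggplot2)

# Transform inside aes(): the axis shows log10(disp) values
p_aes <- ggplot(mtcars, aes(x = log10(disp), y = mpg)) + geom_point()

# Transform with a scale: same positions, axis labelled in original disp units
p_scale <- ggplot(mtcars, aes(x = disp, y = mpg)) + geom_point() + scale_x_log10()

# Both layers store the same transformed x positions
all.equal(layer_data(p_aes)$x, layer_data(p_scale)$x)
```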

(56)

Scale Modification Examples - color

ggplot(hp2001Q1) +

aes(y = Structure.Cost, x = log(Land.Value), color=region) + geom_point()+

scale_color_manual(name = "Region of the world", values = c("#24576D","#099DD7","#28AADC",

"#248E84","#F2583F","#96503F","white"))

[Figure: points colored with the manual palette; legend titled "Region of the world" (Midwest, N. East, South, West, NA)]

(57)

Scale Modification Examples - color

ggplot(hp2001Q1) +

aes(y = Structure.Cost, x = log(Land.Value), color=region) + geom_point()+

scale_color_brewer(name = "Region of the world", palette="Dark2")

[Figure: points colored with the ColorBrewer "Dark2" palette; legend titled "Region of the world"]

(58)

Scale Modification Examples - color

ggplot(hp2001Q1) +

aes(y = Structure.Cost, x = log(Land.Value), color=Home.Value) + geom_point()+

scale_color_continuous(breaks = c(100000,200000, 300000,400000))

[Figure: points colored by Home.Value on the default continuous gradient; legend breaks 1e+05, 2e+05 and 3e+05 (400000 lies outside the data range)]

(59)

Scale Modification Examples - color

ggplot(hp2001Q1) +

aes(y = Structure.Cost, x = log(Land.Value), color=Home.Value) + geom_point()+

scale_color_gradient(breaks = c(100000,200000,300000, 400000),low="blue",high="red")

[Figure: points colored by Home.Value on a blue-to-red gradient; legend breaks 1e+05 to 3e+05]

(60)

Scale Modification Examples - color

ggplot(hp2001Q1) +

aes(y = Structure.Cost, x = log(Land.Value), color=Home.Value) + geom_point()+

scale_color_gradient2(breaks = c(100000,200000,300000, 400000),low="blue",high="red",mid="green",midpoint=200000)

[Figure: points colored by Home.Value on a blue-green-red diverging gradient with its midpoint at 200000]

(61)

Scale Modification Examples - shape

ggplot(hp2001Q1) +

aes(y = Structure.Cost, x = log(Land.Value), shape=region) + geom_point()+

scale_shape_manual(values=c(4,8,11,10,43))

[Figure: points drawn with the manually chosen shapes, one per region (Midwest, N. East, South, West, NA)]

(62)

Scale Modification Examples - size

ggplot(hp2001Q1) +

aes(y = Structure.Cost, x = log(Land.Value), size=Home.Value) + geom_point()

[Figure: Structure.Cost against log(Land.Value), point size by Home.Value with the default size scale]
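The code above keeps the default size scale. The sketch below shows one way of actually modifying it, using mtcars rather than the housing data. scale_size() takes a range argument that gives the smallest and largest point sizes, and a name argument for the legend title.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, size = hp)) + geom_point()

# Smallest and largest point sizes used for the hp mapping, plus a legend title
p_range <- p + scale_size(range = c(1, 8), name = "Horsepower")

range(layer_data(p_range)$size)   # the extreme hp values get sizes 1 and 8
```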

(63)

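lims()

lims(...): set the limits of one or more scales at once

xlim(...): set the limits of the x scale

ylim(...): set the limits of the y scale

Values outside the limits are replaced by NA and are not drawn (with a warning).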
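Scale limits and coordinate limits behave differently. xlim(), like lims() and the limits argument of a scale, drops observations outside the range before any statistic is computed. coord_cartesian(xlim = ...) only zooms in and keeps all the data. The sketch below shows the difference on mtcars.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

# Scale limits: points with wt outside [2, 4] become NA and are not drawn
p_lims <- p + xlim(2, 4)

# Coordinate limits: zoom on [2, 4] without removing any observation
p_zoom <- p + coord_cartesian(xlim = c(2, 4))

sum(!is.na(layer_data(p_lims)$x))   # only the cars with 2 <= wt <= 4
nrow(layer_data(p_zoom))            # all 32 cars are kept
```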

(64)

Coordinates

housing.sum <- aggregate(housing["Home.Value"], housing["State"], FUN=mean)
ggplot(housing.sum) + aes(x=reorder(State,Home.Value),
  y=Home.Value) + geom_bar(stat="identity") + coord_flip()

[Figure: horizontal bar chart (coord_flip()) of mean Home.Value per State, with the states sorted by their mean value; axis titles Home.Value and reorder(State, Home.Value)]

(65)

Guides

(66)

Themes

The ggplot2 theme system handles non-data plot elements such as

Axis labels

Plot background

Facet label background

Legend appearance

Built-in themes include:

theme_gray() (default)

theme_bw()

theme_classic()

(67)

Themes

g1<-ggplot(hp2001Q1)+

aes(y = Structure.Cost, x = Land.Value,

color=Home.Value,shape=region) + geom_point() + scale_x_log10()
g1

[Figure: Structure.Cost against Land.Value on a log10 x axis, colored by Home.Value and shaped by region, in the default theme]

(68)

Themes

g1 + theme_linedraw()

## Warning: Removed 1 rows containing missing values (geom_point).

[Figure: plot g1 drawn with theme_linedraw()]

(69)

Themes

g1 + theme_light()

## Warning: Removed 1 rows containing missing values (geom_point).

[Figure: plot g1 drawn with theme_light()]

(70)

Themes

g1 + theme_minimal()

## Warning: Removed 1 rows containing missing values (geom_point).

[Figure: plot g1 drawn with theme_minimal()]

(71)

Overriding theme defaults

g1 + theme_minimal() +

theme(text = element_text(color = "turquoise"))

## Warning: Removed 1 rows containing missing values (geom_point).

[Figure: plot g1 with theme_minimal() and all text drawn in turquoise]

(72)

Legends

ggplot(hp2001Q1) +

aes(y = Structure.Cost, x = log(Land.Value), color=region) + geom_point()+

theme(legend.position="bottom")

[Figure: Structure.Cost against log(Land.Value) colored by region, with the region legend placed below the plot]

(73)

Creating and saving new themes

theme_new <- theme_bw() +
  theme(plot.background = element_rect(size = 1, color = "blue", fill = "gray"), # ggplot2 >= 3.4.0: use linewidth = 1 instead of size = 1
        text = element_text(size = 12, color = "red"),
        axis.text.y = element_text(colour = "purple"),
        axis.text.x = element_text(colour = "green"),
        panel.background = element_rect(fill = "pink"))

(74)

Result

g1 + theme_new

[Figure: plot g1 drawn with theme_new: gray plot background with a blue border, pink panel, red text, purple y axis labels and green x axis labels]

(75)

ADVANCED GRAPHICS - 2

(76)

Example Data: Housing prices

housing <- read_csv("dataSets/landdata-states.csv")
hp2001Q1 <- filter(housing, Date == 2001.25)

p1<- ggplot(hp2001Q1) + aes(x = log(Land.Value), y = Structure.Cost) + geom_point()

p1

[Figure: scatterplot of Structure.Cost against log(Land.Value)]

(77)

Faceting

ggplot(filter(housing, Date==2000.25|Date==2008.25)) + aes(y = Structure.Cost, x = log(Land.Value),

size=Home.Value,color=region) + geom_point() + facet_grid(Date~ region)

[Figure: grid of panels with rows Date (2000.25, 2008.25) and columns region (Midwest, N. East, South, West, NA); x axis log(Land.Value), y axis Structure.Cost; legends for Home.Value and region]

(78)

Syntax

ggplot(data=...) + aes(x=..., y=...,

color=...,size=...,group=...) + geom_...() + facet_...(...)

Data: what is being visualized

Aesthetic Mappings: mappings between variables in the data and components of the chart

Geometric Objects: geometric objects that are used to display the data, such as points, lines, or shapes

Statistical Transformations: applied to the data to summarize it

Facets: describe how the data is partitioned into subsets and how these different subsets are plotted

(79)

Create Good and Effective Graphics

Labels

+ labs(title=..., subtitle=..., caption=..., x=..., y=..., color=..., etc.)

Annotations + geom_text()

+ geom_text_repel()

Coordinates + coord_flip()

Scales, Guides, Themes

Interactivity

(80)

Scales: Controlling Aesthetic Mapping

gg<- ggplot(hp2001Q1) +

aes(y = Structure.Cost, x = Land.Value, size=Home.Value,color=region) + geom_point()

gg

[Figure: Structure.Cost against Land.Value, point size by Home.Value and color by region; default x axis breaks every 50000]

(81)

Scales

gg+scale_x_continuous(breaks=seq(0,250000,25000))

[Figure: the same plot with x axis breaks every 25000 from 0 to 200000]

(82)

Scales

gg+scale_x_continuous(breaks=seq(0,250000,25000), minor_breaks = NULL)

[Figure: x axis breaks every 25000 with the minor grid lines removed]

(83)

Scales

gg+scale_x_continuous(breaks=seq(0,250000,50000),

minor_breaks = seq(0,250000,10000))

[Figure: major x axis breaks every 50000 and minor grid lines every 10000]

(84)

Scales

gg+scale_x_continuous(limits=c(25000,300000))

## Warning: Removed 30 rows containing missing values (geom_point).

[Figure: x axis restricted to the range 25000 to 300000; points outside this range are removed (see the warning)]

(85)

Scales

gg+scale_x_continuous(expand=c(0.25,0.25))

[Figure: the same plot with extra padding on both sides of the x axis, which now extends to 250000]

(86)

Scales

gg+scale_x_continuous(breaks=seq(0,250000,50000), minor_breaks = NULL,trans="log")

[Figure: log-transformed x axis with breaks from 50000 to 250000 and no minor grid lines]

(87)

Guides, Themes & Legends

Change legend order

gg + guides(color = guide_legend(order = 2), size = guide_legend(order = 1))

[Figure: the Home.Value (size) legend now appears above the region (color) legend]

(88)

Legends

gg + theme(legend.position="bottom")

[Figure: both legends placed below the plot]

(89)

Legends

gg + theme(legend.position="bottom")+

guides(color=guide_legend(nrow=2), size=guide_legend(nrow=2))

[Figure: legends below the plot, each laid out in two rows]

(90)

Drawing maps with the maps package and ggplot2

Available Maps

Name | Description
county | American counties
france | France
italy | Italy
nz | New Zealand
state | United States with all states
usa | United States
world | World map
world2 | World map centered on the Pacific

(91)

Worldmap

world<-map_data("world")

world_map <- ggplot(world) + aes(x = long, y = lat, group = group)+geom_polygon()+

scale_y_continuous(breaks = (-3:3) * 20)+

scale_x_continuous(breaks = (-9:9) * 20)
world_map+coord_equal()

[Figure: world map drawn with geom_polygon() and coord_equal(); x axis long with breaks every 20 from -180 to 180, y axis lat from -60 to 60]

(92)

Change of Map projection

library(mapproj)

world_map+coord_map(projection = "orthographic")

[Figure: world map in orthographic projection; axes long and lat]

(93)

Change of Map projection

world_map+coord_map(projection = "orthographic", orientation=c(40,50,0))

[Figure: world map in orthographic projection with orientation = c(40, 50, 0)]

(94)

France Map

france<-map_data("france")

france_map<-ggplot(france)+aes(x=long,y=lat,group=group)
france_map+geom_polygon()

[Figure: map of the French départements, all filled in the default dark grey; axes long and lat]

(95)

France Map

france_map+geom_polygon(aes(fill=region))+

scale_fill_discrete(guide="none")

[Figure: map of France with each département filled in its own color and the legend hidden]

(96)

One part

centre<-france[france$region %in% c("Cher","Eure-et-Loir",

"Indre","Indre-et-Loire","Loir-et-Cher","Loiret" ),]

centre_map<-ggplot(centre)+aes(x=long,y=lat,group=group)+

geom_polygon(aes(fill=region))
centre_map

[Figure: the six départements Cher, Eure-et-Loir, Indre, Indre-et-Loire, Loir-et-Cher and Loiret, each filled by name; axes long and lat]

(97)

Choropleth Maps

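infos_centre <- data.frame(region = c("Cher", "Eure-et-Loir",
                                      "Indre", "Indre-et-Loire", "Loir-et-Cher", "Loiret"),
                           densite = c(42, 74, 33, 99, 52, 100))
centre <- merge(centre, infos_centre, by = "region", all = TRUE)
# merge() sorts the rows by region, which can scramble the point order of each polygon:
# restore the drawing order with the order column returned by map_data()
centre <- centre[order(centre$order), ]
centre_map <- ggplot(centre) + aes(x = long, y = lat, group = group) +
  geom_polygon(aes(fill = densite)) +
  scale_fill_gradient(low = "green", high = "blue")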
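map_data() returns the points of each polygon in drawing order, recorded in its order column. merge() sorts its result by the key column. Merging attribute data into map data can therefore scramble the polygons unless the rows are put back in order afterwards. The base-R sketch below demonstrates this on a small hypothetical data frame shaped like map_data() output.

```r
# Hypothetical polygon data in drawing order, shaped like map_data() output
poly <- data.frame(
  long   = c(0, 1, 1, 0, 2, 3, 3),
  lat    = c(0, 0, 1, 1, 0, 0, 1),
  group  = c(1, 1, 1, 1, 2, 2, 2),
  order  = 1:7,
  region = c("B", "B", "B", "B", "A", "A", "A")
)
info <- data.frame(region = c("A", "B"), densite = c(42, 74))

merged <- merge(poly, info, by = "region", all = TRUE)
merged$order                             # rows come sorted by region, not in drawing order
merged <- merged[order(merged$order), ]  # restore the drawing order
merged$order
```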

(98)

Choropleth Maps

centre_map +theme_minimal()

[Figure: choropleth map of the six départements colored by densite, from green (low) to blue (high), with theme_minimal()]

(99)

Draw multiple plots within one figure

library(gridExtra)

p2<-p1+labs(x=NULL,y=NULL,title="Title")
grid.arrange(p2,gg,nrow=1)

[Figure: p2 (titled "Title", no axis labels) and gg side by side in one row]

(100)

Draw multiple plots within one figure

grid.arrange(p2,gg,nrow=2)

[Figure: p2 stacked above gg]

(101)

Draw multiple plots within one figure

library(ggpubr)

ggarrange(gg,p2,align="h")

[Figure: gg and p2 side by side, aligned horizontally]

See also: the ggarrange tutorial and the ggpubr documentation

(102)

Packages using ggplot2

factoextra: factorial analysis, unsupervised classification

ggRandomForests

ggplot2 extensions: plotROC, ggpmisc, gganimate, ggiraph

(103)

Animation

R package: gganimate

needs the gifski package (just install it)

creates a GIF by default

to_animate<-ggplot(housing) +

aes(y = Structure.Cost, x = Land.Value, color=region)+geom_point()+scale_x_log10()

(104)

Animation

# needs the gifski package
library(gganimate)

to_animate+transition_time(Year)

[Figure (animated GIF): Structure.Cost against Land.Value on a log10 x axis, colored by region, with one frame per Year]

(105)

Interactive Plot

ggiraph

htmlwidgets

rAmCharts

plotly

dygraphs

highcharter
