Advanced Graphics
Julie Scholler
M Éc E n
Welcome to the tidyverse
• coherent system of packages that work in harmony
• for Data Science
• tidyverse.org
Figure 1: Packages of the tidyverse
Graphics with ggplot2
Why?
• elegant, polyvalent
• mature and complete graphics system
• very flexible
• default behaviour carefully chosen
• theme system for polishing plot appearance
How?
The grammar of graphics (Wilkinson, 2005)
The Grammar Of Graphics
The basic idea for building a plot
• specify blocks/layers
• combine them
• get any kind of graphic
Blocks/layers
• data
• aesthetic mapping
• geometric object
• statistical transformations
• scales
• coordinate system
• position adjustments
• faceting
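As a rough illustration of how these blocks combine, here is a minimal sketch in which each line adds one block. It assumes the built-in `mtcars` data set as a stand-in for the course data; the variable choices are purely illustrative.

```r
library(ggplot2)

# Assumption: mtcars (shipped with R) stands in for the course data.
p <- ggplot(mtcars) +                          # data
  aes(x = wt, y = mpg, color = factor(cyl)) +  # aesthetic mapping
  geom_point(position = "jitter") +            # geometric object + position adjustment
  stat_smooth(method = "lm", se = FALSE) +     # statistical transformation
  scale_color_brewer(palette = "Dark2") +      # scale
  coord_cartesian() +                          # coordinate system
  facet_wrap(~ am)                             # faceting
p
```

Removing or swapping any single line changes one aspect of the plot and leaves the others untouched.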
Setup: install the tidyverse package
# install.packages("ggplot2")
library(ggplot2)
Or
# install.packages("tidyverse")
library(tidyverse)
## -- Attaching packages --- tidyverse 1.2.1 --
## v ggplot2 3.0.0 v purrr 0.2.4
## v tibble 1.4.2 v dplyr 0.7.5
## v tidyr 0.8.1 v stringr 1.3.0
## v readr 1.1.1 v forcats 0.3.0
## -- Conflicts --- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Example Data: Housing prices
housing <- read_csv("dataSets/landdata-states.csv")
class(housing)
## [1] "tbl_df" "tbl" "data.frame"
head(housing[1:5])
## # A tibble: 6 x 5
## State region Date Home.Value Structure.Cost
## <chr> <chr> <dbl> <int> <int>
## 1 AK West 2010. 224952 160599
## 2 AK West 2010. 225511 160252
## 3 AK West 2010. 225820 163791
## 4 AK West 2010. 224994 161787
## 5 AK West 2008. 234590 155400
## 6 AK West 2008. 233714 157458
Housing
str(housing)
## Classes 'tbl_df', 'tbl' and 'data.frame': 7803 obs. of 11 variables:
## $ State : chr "AK" "AK" "AK" "AK" ...
## $ region : chr "West" "West" "West" "West" ...
## $ Date : num 2010 2010 2010 2010 2008 ...
## $ Home.Value : int 224952 225511 225820 224994 234590 233714 232999 232164 231039 229395 ...
## $ Structure.Cost : int 160599 160252 163791 161787 155400 157458 160092 162704 164739 165424 ...
## $ Land.Value : int 64352 65259 62029 63207 79190 76256 72906 69460 66299 63971 ...
## $ Land.Share..Pct.: num 28.6 28.9 27.5 28.1 33.8 32.6 31.3 29.9 28.7 27.9 ...
## $ Home.Price.Index: num 1.48 1.48 1.49 1.48 1.54 ...
## $ Land.Price.Index: num 1.55 1.58 1.49 1.52 1.88 ...
## $ Year : int 2010 2010 2009 2009 2007 2008 2008 2008 2008 2009 ...
## $ Qrtr : int 1 2 3 4 4 1 2 3 4 1 ...
## - attr(*, "spec")=List of 2
## ..$ cols :List of 11
## .. ..$ State : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ region : list()
## .. .. ..- attr(*, "class")= chr "collector_character" "collector"
## .. ..$ Date : list()
## .. .. ..- attr(*, "class")= chr "collector_double" "collector"
## .. ..$ Home.Value : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ Structure.Cost : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ Land.Value : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ Land.Share..Pct.: list()
## .. .. ..- attr(*, "class")= chr "collector_double" "collector"
## .. ..$ Home.Price.Index: list()
## .. .. ..- attr(*, "class")= chr "collector_double" "collector"
## .. ..$ Land.Price.Index: list()
## .. .. ..- attr(*, "class")= chr "collector_double" "collector"
## .. ..$ Year : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## .. ..$ Qrtr : list()
## .. .. ..- attr(*, "class")= chr "collector_integer" "collector"
## ..$ default: list()
## .. ..- attr(*, "class")= chr "collector_guess" "collector"
## ..- attr(*, "class")= chr "col_spec"
ggplot2 VS Base for simple graphs
hist(housing$Home.Value)
[Figure: base R histogram of housing$Home.Value; y-axis: Frequency]
ggplot2 VS Base for simple graphs
ggplot(housing) + aes(x = Home.Value) + geom_histogram()
[Figure: ggplot2 histogram of Home.Value; y-axis: count]
ggplot2 VS Base graphics - round 2
plot(Home.Value ~ Date, col = factor(State),
     data = filter(housing, State %in% c("MA", "TX")))
legend("topleft", legend = c("MA", "TX"),
       col = c("black", "red"), pch = 1)
[Figure: base R scatterplot of Home.Value against Date for MA (black) and TX (red)]
ggplot2 VS Base graphics - round 2
ggplot(filter(housing, State %in% c("MA", "TX"))) +
  aes(x = Date, y = Home.Value, color = State) +
  geom_point()
[Figure: ggplot2 scatterplot of Home.Value against Date, colored by State (MA, TX)]
Syntax
ggplot(data=...) + aes(x=..., y=...) + geom_...()
• Data: what is being visualized
• Aesthetic Mappings: mappings between variables in the data and components of the chart
• Geometric Objects: geometric objects that are used to display the data, such as points, lines, or shapes
First try
ggplot(housing)
Aesthetic Mapping
In ggplot2, an aesthetic is “something you can see”.
Examples
• position (on the x and y axes)
• color (“outside” color)
• fill (“inside” color)
• shape (of points)
• linetype
• size
Aesthetic mappings are set with the aes() function.
Second try
ggplot(housing) + aes(x = Land.Value, y = Structure.Cost)
[Figure: empty plot with axes Land.Value (x) and Structure.Cost (y); no geom has been added yet]
Geometric Objects (geom)
Examples
• points: geom_point
• lines: geom_line
• bar: geom_bar
• histogram: geom_histogram
• boxplot: geom_boxplot
List of available geometric objects: see the ggplot2 reference list, or search the help:
help.search("geom_", package = "ggplot2")
Points (Scatterplot)
hp2001Q1 <- filter(housing, Date == 2001.25)
ggplot(hp2001Q1) +
  aes(y = Structure.Cost, x = Land.Value) +
  geom_point()
[Figure: scatterplot of Structure.Cost against Land.Value]
Points (Scatterplot)
ggplot(hp2001Q1) +
  aes(y = Structure.Cost, x = log(Land.Value), color = region) +
  geom_point()
[Figure: scatterplot of Structure.Cost against log(Land.Value), colored by region (Midwest, N. East, South, West, NA)]
Aesthetic Mapping VS Assignment
ggplot(hp2001Q1) +
  aes(y = Structure.Cost, x = log(Land.Value), color = region) +
  geom_point(aes(size = 2), color = "red")
[Figure: same scatterplot with every point red; the legend shows a single size key labelled 2]
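The difference between mapping and assignment can be sketched as follows. The built-in `mtcars` data set is an assumption used so that the example runs without the housing file; the object names are illustrative.

```r
library(ggplot2)

# Assumption: mtcars stands in for the housing data.
base <- ggplot(mtcars) + aes(x = wt, y = mpg)

# Mapping: color depends on a variable, so a scale and a legend are created.
mapped <- base + geom_point(aes(color = factor(cyl)))

# Assignment: a constant given outside aes() is applied as is, with no legend.
fixed <- base + geom_point(color = "red", size = 2)

# Pitfall: a constant inside aes() is treated as data. The points get the
# first color of the default palette and a legend with a single key "red".
pitfall <- base + geom_point(aes(color = "red"))
```

This is the same effect as `aes(size = 2)` in the slide above: the value 2 is treated as a variable, which is why a size legend appears.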
Points (Scatterplot)
ggplot(hp2001Q1) +
  aes(y = Structure.Cost, x = log(Land.Value), shape = region) +
  geom_point()
[Figure: scatterplot of Structure.Cost against log(Land.Value), point shape by region]
Points (Scatterplot)
ggplot(hp2001Q1) +
  aes(y = Structure.Cost, x = log(Land.Value), color = Home.Value) +
  geom_point()
[Figure: scatterplot of Structure.Cost against log(Land.Value), colored by Home.Value on a continuous scale]
Points (Scatterplot)
ggplot(hp2001Q1) +
  aes(y = Structure.Cost, x = log(Land.Value), size = Home.Value) +
  geom_point()
[Figure: scatterplot of Structure.Cost against log(Land.Value), point size by Home.Value]
Points (Scatterplot)
ggplot(hp2001Q1) +
  aes(y = Structure.Cost, x = log(Land.Value), size = Home.Value, color = region) +
  geom_point()
[Figure: scatterplot of Structure.Cost against log(Land.Value), point size by Home.Value, colored by region]
Points (Scatterplot)
ggplot(hp2001Q1) +
  aes(y = Structure.Cost, x = log(Land.Value),
      size = Home.Value, shape = region, color = Home.Price.Index) +
  geom_point()
[Figure: scatterplot of Structure.Cost against log(Land.Value), point size by Home.Value, shape by region, colored by Home.Price.Index]
Faceting
• Creates separate graphs for subsets of data
• Two solutions
1. facet_wrap(): subsets as the levels of a single grouping variable
2. facet_grid(): subsets as the crossing of two grouping variables
• Facilitates comparison among plots
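The two faceting functions can be compared on a small example. This sketch assumes the built-in `mtcars` data set rather than the housing data.

```r
library(ggplot2)

# Assumption: mtcars stands in for the housing data.
p <- ggplot(mtcars) + aes(x = wt, y = mpg) + geom_point()

# One grouping variable: panels are wrapped into rows and columns.
wrapped <- p + facet_wrap(~ cyl, ncol = 2)

# Two grouping variables: rows are levels of am, columns are levels of cyl.
gridded <- p + facet_grid(am ~ cyl)
```

With `facet_wrap()` the layout is controlled by `ncol` or `nrow`; with `facet_grid()` it is determined by the two variables.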
Faceting
ggplot(filter(housing, Date == 2001.25)) +
  aes(y = Structure.Cost, x = log(Land.Value),
      size = Home.Value, color = region) +
  geom_point() +
  facet_wrap(~ region, ncol = 3)
[Figure: scatterplot of Structure.Cost against log(Land.Value), one panel per region (Midwest, N. East, South, West, NA), three panels per row]
Faceting
ggplot(filter(housing, Date == 2000.25 | Date == 2008.25)) +
  aes(y = Structure.Cost, x = log(Land.Value),
      size = Home.Value, color = region) +
  geom_point() +
  facet_grid(Date ~ region)
[Figure: grid of scatterplots of Structure.Cost against log(Land.Value); rows: Date (2000.25, 2008.25), columns: region]
Faceting
ggplot(filter(housing, Date == 2001.25)) +
  aes(y = Structure.Cost, x = log(Land.Value),
      size = Home.Value, color = region) +
  geom_point() +
  facet_grid(~ region)
[Figure: scatterplot of Structure.Cost against log(Land.Value), a single row of five region panels]
Syntax
ggplot(data=...) + aes(x=..., y=...) + geom_...() + facet_...(...)
• Data: what is being visualized
• Aesthetic Mappings: mappings between variables in the data and components of the chart
• Geometric Objects: geometric objects that are used to display the data, such as points, lines, or shapes
• Facets: describe how the data is partitioned into subsets and how these different subsets are plotted
Adding geom - Local/Global aesthetics
hp2001Q1$pred.SC <- predict(
  lm(Structure.Cost ~ log(Land.Value), data = hp2001Q1))
p1 <- ggplot(hp2001Q1) +
  aes(x = log(Land.Value), y = Structure.Cost)
p1 + geom_point(aes(color = Home.Value)) +
  geom_line(aes(y = pred.SC))
[Figure: scatterplot of Structure.Cost against log(Land.Value), colored by Home.Value, with the fitted regression line]
Smoothers
p1 + geom_point(aes(color = Home.Value)) + geom_smooth(se=FALSE)
[Figure: same scatterplot with a smoothed curve and no confidence band]
Smoothers
p1 + geom_point(aes(color = Home.Value)) + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
[Figure: same scatterplot with a loess smoothed curve and its confidence band]
Statistical Transformations
Some plot types, such as boxplots, histograms, and prediction lines, require statistical transformations:
• for a boxplot, the y values must be summarized by their median and quartiles
• for a histogram, the values must be binned and counted
Each geom has a default statistic.
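One way to see what a default statistic computes is to inspect the data a layer actually draws with `layer_data()`. The sketch below assumes the built-in `mtcars` data set; the object names are illustrative.

```r
library(ggplot2)

# Assumption: mtcars stands in for the housing data.
h <- ggplot(mtcars) + aes(x = mpg) + geom_histogram(bins = 10)
hist_data <- layer_data(h)   # one row per bin, with a count column

b <- ggplot(mtcars) + aes(x = factor(cyl), y = mpg) + geom_boxplot()
box_data <- layer_data(b)    # one row per group: ymin, lower, middle, upper, ymax

hist_data[, c("xmin", "xmax", "count")]
box_data[, c("ymin", "lower", "middle", "upper", "ymax")]
```

The raw observations no longer appear in either table: the geom draws the summary produced by its stat (`stat_bin` for the histogram, `stat_boxplot` for the boxplot).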
Setting Statistical Transformation Arguments
Arguments to stat_ functions can be passed through geom_ functions.
p2 <- ggplot(housing, aes(x = Home.Value))
p2 + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
[Figure: histogram of Home.Value with the default 30 bins; y-axis: count]
Setting Statistical Transformation Arguments
We can change it by passing the binwidth argument to the stat_bin function:
p2 + geom_histogram(stat = "bin", binwidth = 4000)
[Figure: histogram of Home.Value with a bin width of 4000; y-axis: count]
Changing The Statistical Transformation
housing.sum <- aggregate(housing["Home.Value"], housing["State"], FUN = mean)
ggplot(housing.sum) + aes(x = State, y = Home.Value) +
  geom_bar(stat = "identity")
[Figure: bar chart of mean Home.Value by State]
Bar charts
ggplot(cars)+aes(x=cyl)+geom_bar()
[Figure: bar chart of counts by cyl (4, 6, 8)]
Bar charts
ggplot(cars)+aes(x=cyl,color=transmission)+geom_bar()
[Figure: bar chart of counts by cyl, bar outlines colored by transmission (auto, manual)]
Bar charts
ggplot(cars)+aes(x=cyl,color=transmission, fill=transmission)+geom_bar()
[Figure: stacked bar chart of counts by cyl, bars filled by transmission (auto, manual)]
Bar charts
ggplot(cars)+aes(x=cyl,color=transmission, fill=transmission)+geom_bar(position="fill")
[Figure: bar chart by cyl with position "fill"; each bar shows the proportions of auto and manual, from 0 to 1]
Bar charts
ggplot(cars)+aes(x=cyl,color=transmission,
fill=transmission)+geom_bar(position="dodge")
[Figure: bar chart by cyl with position "dodge"; auto and manual bars side by side]
Position adjustment
Set inside the geom, with the position argument:
• identity
• stack
• fill
• dodge: side by side
• jitter: useful for points (geom_jitter())
• nudge: shift points
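The bar chart slides use a `cars` data frame with `cyl` and `transmission` columns, which R's built-in `cars` (only `speed` and `dist`) does not have. The sketch below assumes a comparable table derived from `mtcars`; the name `cars2` and this derivation are hypothetical, not taken from the slides.

```r
library(ggplot2)

# Assumption: a stand-in for the slides' `cars` table, built from mtcars
# (am is 0 for automatic and 1 for manual).
cars2 <- transform(mtcars,
                   cyl = factor(cyl),
                   transmission = factor(am, labels = c("auto", "manual")))

base <- ggplot(cars2) + aes(x = cyl, fill = transmission)

stacked <- base + geom_bar()                   # default: position = "stack"
filled  <- base + geom_bar(position = "fill")  # proportions, each bar sums to 1
dodged  <- base + geom_bar(position = "dodge") # side by side
```

Only the `position` argument changes between the three plots; the data, mapping and geom are the same.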
Syntax
ggplot(data=...) + aes(x=..., y=...) + geom_...() + facet_...(...)
• Data: what is being visualized
• Aesthetic Mappings: mappings between variables in the data and components of the chart
• Geometric Objects: geometric objects that are used to display the data, such as points, lines, or shapes
• Statistical Transformations: applied to the data to summarize it
• Facets: describe how the data is partitioned into subsets and how these different subsets are plotted
Create Good and Effective Graphics
• Labels
• Annotations
• Coordinate
• Scales
• Themes
• Interactivity
Labels
All in one
+ labs(title=..., subtitle=..., caption=..., x=..., y=..., color=..., etc.)
Alternate forms
+ ggtitle(...) + xlab(...) + ylab(...) + etc.
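Both forms give the same labels. A minimal sketch, assuming the built-in `mtcars` data set and made-up label text; the helper functions are `ggtitle()`, `xlab()` and `ylab()`:

```r
library(ggplot2)

# Assumption: mtcars and the label text are illustrative only.
p <- ggplot(mtcars) + aes(x = wt, y = mpg) + geom_point()

# All in one call
all_in_one <- p + labs(title = "Fuel economy",
                       x = "Weight", y = "Miles per gallon")

# One helper per label
step_by_step <- p + ggtitle("Fuel economy") +
  xlab("Weight") + ylab("Miles per gallon")
```

`labs()` is usually the more convenient form because it also handles legend titles such as `color` or `size`.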
Graphic to improve
l1 <- ggplot(hp2001Q1) +
  aes(y = Structure.Cost, x = log(Land.Value), size = Home.Value, color = region) +
  geom_point()
Labels
l2 <- l1 + labs(title = "Structure Cost and Land Value",
                subtitle = "I don't know what to say",
                caption = "Still no idea")
l2
[Figure: plot l1 with the title "Structure Cost and Land Value", the subtitle "I don't know what to say" and the caption "Still no idea"]
Labels
l3 <- l2 + labs(x = "Log transformation of land value", y = "Structure cost")
l3
[Figure: plot l2 with the x-axis labelled "Log transformation of land value" and the y-axis labelled "Structure cost"]
Labels
l3 + labs(color = "Region", size = "Home value")
[Figure: plot l3 with the legends titled "Region" and "Home value"]
Annotations
Each geom accepts a particular set of mappings; for example, geom_text() accepts a label mapping.
p1 + geom_point() +
  geom_text(aes(label = State), size = 3)
[Figure: scatterplot of Structure.Cost against log(Land.Value), each point labelled with its State abbreviation; the labels sit on top of the points]
Annotations
## install.packages("ggrepel")
library("ggrepel")
p1 + geom_point() +
  geom_text_repel(aes(label = State), size = 3)
[Figure: same scatterplot with the State labels moved away from the points and from each other]
Annotations
l3 + geom_text_repel(aes(label=State), size = 3)
[Figure: plot l3 with repelled State labels; the region legend keys now show a letter "a"]
Scales: Controlling Aesthetic Mapping
An aesthetic mapping says which variable drives an aesthetic, not how it is rendered.
Describing which colors, shapes, sizes, etc. to use is done by modifying the corresponding scale.
Scale examples
• x, y
• color and fill
• size
• shape
• line type
Scale syntax
scale_<aesthetic>_<type>
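The naming pattern can be read directly off a plot with several aesthetics. This sketch assumes the built-in `mtcars` data set; the colors and legend title are arbitrary choices.

```r
library(ggplot2)

# Assumption: mtcars stands in for the housing data.
p <- ggplot(mtcars) +
  aes(x = wt, y = mpg, color = factor(cyl), size = hp) +
  geom_point()

styled <- p +
  scale_x_log10() +                              # aesthetic x, type log10
  scale_color_manual(name = "Cylinders",         # aesthetic color, type manual
                     values = c("4" = "steelblue",
                                "6" = "orange",
                                "8" = "firebrick")) +
  scale_size_continuous(range = c(1, 5))         # aesthetic size, type continuous
styled
```

Each `scale_*` call changes how one aesthetic is rendered; the mapping in `aes()` stays the same.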
Some available Scales
Scale             Types        Examples
scale_color_      identity     scale_fill_continuous
scale_fill_       manual       scale_color_discrete
scale_size_       continuous   scale_size_manual
                  discrete     scale_size_discrete
scale_shape_      discrete     scale_shape_discrete
scale_linetype_   identity     scale_shape_manual
                  manual       scale_linetype_discrete
Some available Scales
Scale      Types        Examples
scale_x_   continuous   scale_x_continuous
scale_y_   discrete     scale_y_discrete
           reverse      scale_x_log10
           log10        scale_y_reverse
           date         scale_x_date
           datetime     scale_y_datetime
Scale Modification Examples - color
ggplot(hp2001Q1) +
  aes(y = Structure.Cost, x = log(Land.Value), color = region) +
  geom_point() +
  scale_color_manual(name = "Region of the world",
                     values = c("#24576D", "#099DD7", "#28AADC",
                                "#248E84", "#F2583F", "#96503F", "white"))
[Figure: scatterplot of Structure.Cost against log(Land.Value), colored by region with the manual palette; legend title "Region of the world"]
Scale Modification Examples - color
ggplot(hp2001Q1) +
  aes(y = Structure.Cost, x = log(Land.Value), color = region) +
  geom_point() +
  scale_color_brewer(name = "Region of the world", palette = "Dark2")
[Figure: same scatterplot, colored by region with the Brewer "Dark2" palette]
Scale Modification Examples - color
ggplot(hp2001Q1) +
  aes(y = Structure.Cost, x = log(Land.Value), color = Home.Value) +
  geom_point() +
  scale_color_continuous(breaks = c(100000, 200000, 300000, 400000))
[Figure: scatterplot colored by Home.Value; color bar with breaks at 1e+05, 2e+05 and 3e+05]
Scale Modification Examples - color
ggplot(hp2001Q1) +
  aes(y = Structure.Cost, x = log(Land.Value), color = Home.Value) +
  geom_point() +
  scale_color_gradient(breaks = c(100000, 200000, 300000, 400000),
                       low = "blue", high = "red")
[Figure: scatterplot colored by Home.Value on a blue-to-red gradient]
Scale Modification Examples - color
ggplot(hp2001Q1) +
  aes(y = Structure.Cost, x = log(Land.Value), color = Home.Value) +
  geom_point() +
  scale_color_gradient2(breaks = c(100000, 200000, 300000, 400000),
                        low = "blue", high = "red", mid = "green", midpoint = 200000)
[Figure: scatterplot colored by Home.Value on a blue-green-red gradient with midpoint 200000]
Scale Modification Examples - shape
ggplot(hp2001Q1) +
  aes(y = Structure.Cost, x = log(Land.Value), shape = region) +
  geom_point() +
  scale_shape_manual(values = c(4, 8, 11, 10, 43))
[Figure: scatterplot of Structure.Cost against log(Land.Value), with manually chosen point shapes by region]
Scale Modification Examples - size
ggplot(hp2001Q1) +
  aes(y = Structure.Cost, x = log(Land.Value), size = Home.Value) +
  geom_point()
[Figure: scatterplot of Structure.Cost against log(Land.Value), point size by Home.Value]
lims()
• lims(...)
• xlim(...)
• ylim(...)
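A point worth keeping in mind with these helpers: limits set through a scale discard observations outside the range, whereas `coord_cartesian()` only zooms. The sketch below assumes the built-in `mtcars` data set; `coord_cartesian()` is not covered on this slide and is shown only for contrast.

```r
library(ggplot2)

# Assumption: mtcars stands in for the housing data.
p <- ggplot(mtcars) + aes(x = wt, y = mpg) + geom_point()

# Scale limits: points outside [2, 4] are treated as missing and not drawn
# (ggplot2 warns about removed rows when the plot is printed).
zoom_scale <- p + xlim(2, 4)

# Coordinate limits: every observation is kept, only the view is cropped.
zoom_coord <- p + coord_cartesian(xlim = c(2, 4))
```

The distinction matters as soon as a statistic is involved, for example a smoother fitted on the remaining points only.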
Coordinates
housing.sum <- aggregate(housing["Home.Value"], housing["State"], FUN = mean)
ggplot(housing.sum) +
  aes(x = reorder(State, Home.Value), y = Home.Value) +
  geom_bar(stat = "identity") +
  coord_flip()
[Figure: horizontal bar chart of mean Home.Value by State, with states ordered by Home.Value]
Guides
Themes
The ggplot2 theme system handles non-data plot elements such as
• Axis labels
• Plot background
• Facet label background
• Legend appearance
Built-in themes include:
• theme_gray() (default)
• theme_bw()
• theme_classic()
Themes
g1 <- ggplot(hp2001Q1) +
  aes(y = Structure.Cost, x = Land.Value,
      color = Home.Value, shape = region) +
  geom_point() +
  scale_x_log10()
g1
[Figure: scatterplot of Structure.Cost against Land.Value on a log10 x-axis, colored by Home.Value, point shape by region]
Themes
g1 + theme_linedraw()
## Warning: Removed 1 rows containing missing values (geom_point).
[Figure: plot g1 with theme_linedraw()]
Themes
g1 + theme_light()
## Warning: Removed 1 rows containing missing values (geom_point).
[Figure: plot g1 with theme_light()]
Themes
g1 + theme_minimal()
## Warning: Removed 1 rows containing missing values (geom_point).
[Figure: plot g1 with theme_minimal()]
Overriding theme defaults
g1 + theme_minimal() +
theme(text = element_text(color = "turquoise"))
## Warning: Removed 1 rows containing missing values (geom_point).
[Figure: plot g1 with theme_minimal() and all text in turquoise]
Legends
ggplot(hp2001Q1) +
  aes(y = Structure.Cost, x = log(Land.Value), color = region) +
  geom_point() +
  theme(legend.position = "bottom")
[Figure: scatterplot of Structure.Cost against log(Land.Value), colored by region, with the legend below the plot]
Creating and saving new themes
theme_new <- theme_bw() +
  theme(plot.background = element_rect(size = 1, color = "blue", fill = "gray"),
        text = element_text(size = 12, color = "red"),
        axis.text.y = element_text(colour = "purple"),
        axis.text.x = element_text(colour = "green"),
        panel.background = element_rect(fill = "pink"))
Result
g1 + theme_new
[Figure: plot g1 with theme_new applied]
Advanced Graphics - 2
Example Data: Housing prices
housing <- read_csv("dataSets/landdata-states.csv")
hp2001Q1 <- filter(housing, Date == 2001.25)
p1 <- ggplot(hp2001Q1) + aes(x = log(Land.Value), y = Structure.Cost) + geom_point()
p1
[Figure: scatterplot of Structure.Cost against log(Land.Value)]
Faceting
ggplot(filter(housing, Date == 2000.25 | Date == 2008.25)) +
  aes(y = Structure.Cost, x = log(Land.Value),
      size = Home.Value, color = region) +
  geom_point() +
  facet_grid(Date ~ region)
[Figure: grid of scatterplots of Structure.Cost against log(Land.Value); rows: Date (2000.25, 2008.25), columns: region]
Syntax
ggplot(data=...) + aes(x=..., y=...,
color=...,size=...,group=...) + geom_...() + facet_...(...)
• Data: what is being visualized
• Aesthetic Mappings: mappings between variables in the data and components of the chart
• Geometric Objects: geometric objects that are used to display the data, such as points, lines, or shapes
• Statistical Transformations: applied to the data to summarize it
• Facets: describe how the data is partitioned into subsets and how these different subsets are plotted
Create Good and Effective Graphics
• Labels: + labs(title=..., subtitle=..., caption=..., x=..., y=..., color=..., etc.)
• Annotations: + geom_text(), + geom_text_repel()
• Coordinate: + coord_flip()
• Scales, Guides, Themes
• Interactivity
Scales: Controlling Aesthetic Mapping
gg <- ggplot(hp2001Q1) +
  aes(y = Structure.Cost, x = Land.Value, size = Home.Value, color = region) +
  geom_point()
gg
[Figure: scatterplot of Structure.Cost against Land.Value, point size by Home.Value, colored by region]
Scales
gg+scale_x_continuous(breaks=seq(0,250000,25000))
[Figure: plot gg with x-axis breaks every 25000]
Scales
gg+scale_x_continuous(breaks=seq(0,250000,25000), minor_breaks = NULL)
[Figure: plot gg with x-axis breaks every 25000 and no minor grid lines]
Scales
gg+scale_x_continuous(breaks=seq(0,250000,50000),
minor_breaks = seq(0,250000,10000))
[Figure: plot gg with x-axis breaks every 50000 and minor grid lines every 10000]
Scales
gg+scale_x_continuous(limits=c(25000,300000))
## Warning: Removed 30 rows containing missing values (geom_point).
[Figure: plot gg with the x-axis limited to 25000-300000; points outside the limits are removed]
Scales
gg+scale_x_continuous(expand=c(0.25,0.25))
[Figure: plot gg with extra padding on both sides of the x-axis]
Scales
gg+scale_x_continuous(breaks=seq(0,250000,50000), minor_breaks = NULL,trans="log")
[Figure: plot gg with a log-transformed x-axis and breaks at 50000, 100000, 150000, 200000 and 250000]
Guides, Themes & Legends
Change legend order
gg + guides(color = guide_legend(order = 2), size = guide_legend(order = 1))
[Figure: plot gg with the Home.Value legend shown above the region legend]
Legends
gg + theme(legend.position="bottom")
[Figure: plot gg with both legends at the bottom]
Legends
gg + theme(legend.position="bottom")+
guides(color=guide_legend(nrow=2), size=guide_legend(nrow=2))
[Figure: plot gg with both legends at the bottom, each arranged on two rows]
Drawing maps with maps package and ggplot2
Available Maps
Name     Description
county   American counties
france   France
italy    Italy
nz       New Zealand
state    United States with all states
usa      United States
world    World map
world2   World map centered on the Pacific
Worldmap
world <- map_data("world")
world_map <- ggplot(world) + aes(x = long, y = lat, group = group) +
  geom_polygon() +
  scale_y_continuous(breaks = (-3:3) * 20) +
  scale_x_continuous(breaks = (-9:9) * 20)
world_map + coord_equal()
[Figure: world map; x-axis: long, y-axis: lat]
Change of Map projection
library(mapproj)
world_map+coord_map(projection = "orthographic")
[Figure: world map in orthographic projection]
Change of Map projection
world_map+coord_map(projection = "orthographic", orientation=c(40,50,0))
[Figure: world map in orthographic projection with orientation c(40, 50, 0)]
France Map
france <- map_data("france")
france_map <- ggplot(france) + aes(x = long, y = lat, group = group)
france_map + geom_polygon()
[Figure: map of France; x-axis: long, y-axis: lat]
France Map
france_map+geom_polygon(aes(fill=region))+
scale_fill_discrete(guide="none")
[Figure: map of France with each region filled in a different color, no legend]
One part
centre <- france[france$region %in% c("Cher", "Eure-et-Loir",
  "Indre", "Indre-et-Loire", "Loir-et-Cher", "Loiret"), ]
centre_map <- ggplot(centre) + aes(x = long, y = lat, group = group) +
  geom_polygon(aes(fill = region))
centre_map
[Figure: map of the six selected regions (Cher, Eure-et-Loir, Indre, Indre-et-Loire, Loir-et-Cher, Loiret), filled by region]
Choropleth Maps
infos_centre <- data.frame(
  region = c("Cher", "Eure-et-Loir", "Indre",
             "Indre-et-Loire", "Loir-et-Cher", "Loiret"),
  densite = c(42, 74, 33, 99, 52, 100))
centre <- merge(centre, infos_centre, by = "region", all = TRUE)
centre_map <- ggplot(centre) + aes(x = long, y = lat, group = group) +
  geom_polygon(aes(fill = densite)) +
  scale_fill_gradient(low = "green", high = "blue")
Choropleth Maps
centre_map +theme_minimal()
[Figure: choropleth map of the six regions, filled by densite on a green-to-blue gradient]
Draw multiple plots within one figure
library(gridExtra)
p2 <- p1 + labs(x = NULL, y = NULL, title = "Title")
grid.arrange(p2, gg, nrow = 1)
[Figure: plots p2 and gg side by side]
Draw multiple plots within one figure
grid.arrange(p2,gg,nrow=2)
[Figure: plots p2 and gg stacked on two rows]
Draw multiple plots within one figure
library(ggpubr)
ggarrange(gg,p2,align="h")
[Figure: plots gg and p2 side by side, horizontally aligned]
• Link tutorial ggarrange
• Link ggpubr
Packages using ggplot2
• factoextra: factorial analysis, unsupervised classification
• ggRandomForests
• official extensions: plotROC, ggpmisc, gganimate, ggiraph
Animation
• R package: gganimate
• need gifski package (just install)
• create a gif by default
to_animate <- ggplot(housing) +
  aes(y = Structure.Cost, x = Land.Value, color = region) +
  geom_point() +
  scale_x_log10()
Animation
# needs the gifski package
library(gganimate)
to_animate + transition_time(Year)
[Figure: animated scatterplot of Structure.Cost against Land.Value (log10 x-axis), colored by region, changing with Year]
Interactive Plot
• ggiraph
• htmlwidgets
• rAmCharts
• plotly
• dygraphs
• highcharter