GRAPHICS WITH GGPLOT2
L3 - R3
Julie Scholler - B246
November 2019
Graphics with ggplot2
Why?
• elegant, versatile
• mature and complete graphics system
• very flexible
• default behaviour carefully chosen
• theme system for polishing plot appearance
How?
grammar of graphics (Wilkinson, 2005)
The Grammar Of Graphics
The basic idea for building a plot
• specify blocks/layers
• combine them
• get any kind of graphic
Blocks/layers
• data
• aesthetic mapping
• geometric object
• statistical transformations
• scales
• coordinate system
• position adjustments
• faceting
Syntax
ggplot(data=...) + aes(x=..., y=...) + geom_...()
• Data: what is being visualized
• Aesthetic Mappings: mappings between variables in the data and components of the chart
• Geometric Objects: geometric objects that are used to display the data, such as points, lines, or shapes
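The three blocks of this syntax can be added one at a time, as in the minimal sketch below. The data frame is a hypothetical stand-in for the slides' `data` (which is not shown), and the object names `p_data`, `p_aes` and `p_full` are illustrative only.

```r
library(ggplot2)

# Hypothetical stand-in for the slides' `data`, which is not shown
set.seed(42)
data <- data.frame(note_totale = round(runif(60, min = 20, max = 100)))

# Block 1: the data alone (nothing to draw yet)
p_data <- ggplot(data = data)

# Block 2: map a variable to the x position
p_aes <- p_data + aes(x = note_totale)

# Block 3: a geometric object, which finally draws something
p_full <- p_aes + geom_histogram(bins = 15)

# Only the last step adds a layer to the plot
print(length(p_data$layers))
print(length(p_full$layers))
```

Storing each step in its own object makes it easy to print the intermediate plots and see what each block contributes.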
First try
ggplot(data)
Aesthetic Mapping
In ggplot: aesthetic = “something you can see”
Examples
• position (on the x and y axes)
• color (“outside” color)
• fill (“inside” color)
• shape (of points)
• linetype
• size
Aesthetic mappings are set with the aes() function.
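A point that often causes confusion is the difference between mapping an aesthetic to a variable inside aes() and setting it to a fixed value inside the geom. The sketch below illustrates both, assuming a hypothetical data frame with the columns `note_totale` and `annee` used later in these slides.

```r
library(ggplot2)

# Hypothetical stand-in for the slides' `data`
set.seed(1)
data <- data.frame(
  note_totale = round(runif(90, min = 20, max = 100)),
  annee = rep(c("L1", "L2", "L3"), each = 30)
)

# Mapped: the fill colour depends on a variable, so a legend is drawn
p_mapped <- ggplot(data) +
  aes(x = note_totale, fill = annee) +
  geom_histogram(bins = 15)

# Set: the fill colour is a constant, given outside aes(), so no legend
p_fixed <- ggplot(data) +
  aes(x = note_totale) +
  geom_histogram(bins = 15, fill = "aquamarine3")

# The mapping is stored in the plot object; the constant is not part of it
print(names(p_mapped$mapping))
print(names(p_fixed$mapping))
```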
Second try
ggplot(data) + aes(x = note_totale)
[Figure: empty plot panel; x-axis: note_totale]
Geometric Objects (geom)
Examples
• points: geom_point
• lines: geom_line
• bar: geom_bar
• histogram: geom_histogram
• boxplot: geom_boxplot
List of available geometric objects: see the ggplot2 reference list, or run
help.search("geom_", package = "ggplot2")
Histogram
ggplot(data) + aes(x = note_totale) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
[Figure: histogram with default bins; x-axis: note_totale, y-axis: count]
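The `stat_bin()` message suggests choosing the bins explicitly. A minimal sketch of the two usual ways to do so follows, using either `bins` (number of bins) or `binwidth` (width in data units). The data frame is a hypothetical stand-in for the slides' `data`, and the width of 5 points is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical stand-in for the slides' `data`
set.seed(2)
data <- data.frame(note_totale = round(runif(80, min = 20, max = 100)))

# Option 1: fix the number of bins
p_bins <- ggplot(data) + aes(x = note_totale) + geom_histogram(bins = 15)

# Option 2: fix the width of each bin, here 5 points of the score
p_width <- ggplot(data) + aes(x = note_totale) + geom_histogram(binwidth = 5)

# layer_data() returns the computed bins, useful to check the result
bins_df <- layer_data(p_width)
print(unique(round(bins_df$xmax - bins_df$xmin, 8)))
```

Fixing `binwidth` is usually easier to explain to a reader ("one bar per 5 points"), while `bins` adapts to the range of the data.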
Create Good and Effective Graphics
• Labels
+ labs(title=..., subtitle=..., caption=..., x=..., y=..., color=..., etc.)
• Annotations
+ geom_text()
+ geom_text_repel() (from the ggrepel package)
• Coordinate
+ coord_flip()
• Scales, Guides, Themes
• Interactivity
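Labels, annotations and a coordinate change can be combined on one plot, as in the sketch below. It assumes a hypothetical data frame with an `annee` column, and the title and axis texts are illustrative. It uses geom_text() from ggplot2 rather than geom_text_repel() to avoid an extra dependency.

```r
library(ggplot2)

# Hypothetical stand-in for the slides' `data`
data <- data.frame(annee = rep(c("L1", "L2", "L3"), times = c(40, 25, 15)))

p <- ggplot(data) +
  aes(x = annee) +
  geom_bar(fill = "aquamarine3") +
  # Annotation: print each bar's count next to the bar
  geom_text(aes(label = after_stat(count)), stat = "count", hjust = -0.3) +
  # Coordinate: swap the axes to get horizontal bars
  coord_flip() +
  # Labels: title and axis names (illustrative texts)
  labs(title = "Students per year", x = "Year", y = "Count") +
  theme_minimal()

print(p)
```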
ggplot(data = data) + aes(x = note_totale) +
  geom_histogram(bins = 15, fill = "aquamarine3", col = "white") +
  labs(title = "Distribution des notes au QCM", x = "Note", y = "Effectif") +
  theme_minimal()
[Figure: histogram titled "Distribution des notes au QCM"; x-axis: Note, y-axis: Effectif]
Themes
The ggplot2 theme system handles non-data plot elements such as
• Axis labels
• Plot background
• Facet label background
• Legend appearance
Built-in themes include:
• theme_gray() (default)
• theme_bw()
• theme_classic()
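A built-in theme can be adjusted further with theme(). The sketch below starts from theme_bw() and changes two non-data elements. The data frame is a hypothetical stand-in for the slides' `data`, and the chosen tweaks are only examples.

```r
library(ggplot2)

# Hypothetical stand-in for the slides' `data`
set.seed(3)
data <- data.frame(
  note_totale = round(runif(90, min = 20, max = 100)),
  annee = rep(c("L1", "L2", "L3"), each = 30)
)

p <- ggplot(data) +
  aes(x = note_totale, fill = annee) +
  geom_histogram(bins = 15, col = "white") +
  labs(title = "Distribution des notes au QCM") +
  # Complete built-in theme first ...
  theme_bw() +
  # ... then individual adjustments, which would otherwise be overwritten
  theme(
    legend.position = "bottom",
    plot.title = element_text(face = "bold")
  )

print(p)
```

The order matters: a complete theme such as theme_bw() replaces all theme settings, so specific theme() calls should come after it.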
Multivariate
ggplot(data = data) + aes(x = note_totale, fill = annee) +
  geom_histogram(bins = 15, col = "white", alpha = 0.6) +
  labs(title = "Distribution des notes au QCM", x = "Note", y = "Effectif") +
  theme_minimal()
[Figure: stacked histogram titled "Distribution des notes au QCM"; x-axis: Note, y-axis: Effectif; fill legend annee: L1, L2, L3]
Faceting
• Creates separate graphs for subsets of data
• Two solutions
1. facet_wrap(): subsets as the levels of a single grouping variable
2. facet_grid(): subsets as the crossing of two grouping variables
• Facilitates comparison among plots
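The two faceting functions can be sketched as follows. The data frame is a hypothetical stand-in with the columns `note_totale`, `annee` and `sexe` that appear in these slides, and the `scales = "free_y"` option is an illustrative extra, not something the slides require.

```r
library(ggplot2)

# Hypothetical stand-in for the slides' `data`
set.seed(4)
data <- data.frame(
  note_totale = round(runif(120, min = 20, max = 100)),
  annee = rep(c("L1", "L2", "L3"), times = 40),
  sexe = rep(c("Un homme", "Une femme"), each = 60)
)

base <- ggplot(data) + aes(x = note_totale) + geom_histogram(bins = 15)

# One grouping variable: panels wrap into rows and columns as needed;
# scales = "free_y" lets each panel use its own y-axis range
p_wrap <- base + facet_wrap(~annee, scales = "free_y")

# Two grouping variables: one defines the rows, the other the columns
p_grid <- base + facet_grid(rows = vars(annee), cols = vars(sexe))

print(p_wrap)
print(p_grid)
```

Free scales make small groups easier to read, but shared scales (the default) are safer when the panels are meant to be compared directly.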
Syntax
ggplot(data=...) + aes(x=..., y=...,
fill=...,color=...,group=...) + geom_...() + facet_...(...) +
labs(...) + theme_minimal()
• Data: what is being visualized
• Aesthetic Mappings: mappings between variables in the data and components of the chart
• Geometric Objects: geometric objects that are used to display the data, such as points, lines, or shapes
• Statistical Transformations: applied to the data to summarize it
• Facets: describe how the data is partitioned into subsets and how these different subsets are plotted
Base histogram
gg <- ggplot(data = data) +
  aes(x = note_totale, fill = annee) +
  geom_histogram(bins = 15, alpha = 0.6, col = "white") +
  labs(title = "Distribution des notes au QCM", x = "Note", y = "Effectif") +
  theme_minimal()
facet_wrap()
gg + facet_wrap(~annee)
[Figure: "Distribution des notes au QCM", three side-by-side panels L1, L2, L3; x-axis: Note, y-axis: Effectif; fill legend annee: L1, L2, L3]
Legend position
gg + facet_wrap(~annee) +
theme(legend.position="bottom")
[Figure: "Distribution des notes au QCM", three side-by-side panels L1, L2, L3; x-axis: Note, y-axis: Effectif; legend annee (L1, L2, L3) placed below the panels]
Other use of facet_wrap()
gg + facet_wrap(~annee, ncol=2)
[Figure: "Distribution des notes au QCM", panels arranged in two columns (L1 and L2 on the first row, L3 on the second); x-axis: Note, y-axis: Effectif; fill legend annee: L1, L2, L3]
Use of facet_grid()
gg + facet_grid(annee~sexe)
[Figure: "Distribution des notes au QCM", grid of panels with rows L1, L2, L3 (annee) and columns "Un homme", "Une femme" (sexe); x-axis: Note, y-axis: Effectif; fill legend annee: L1, L2, L3]
Density chart
ggplot(data = data) + aes(x=note_totale) +
geom_density(fill="aquamarine3", color="white", alpha = 0.6) +
labs(title = "Distribution des notes au QCM", x = "Note", y = "") + theme_minimal()
[Figure: density curve titled "Distribution des notes au QCM"; x-axis: Note]
Density charts
ggplot(data = data) +
aes(x=note_totale, fill=annee, color=annee) + geom_density(alpha = 0.6) +
labs(title = "Distribution des notes au QCM", x = "Note", y = "") + theme_minimal()
[Figure: overlaid density curves titled "Distribution des notes au QCM"; x-axis: Note; legend annee: L1, L2, L3]
With ridgelines
library(ggridges)
ggplot(data = data) +
  aes(x = note_totale, fill = annee, col = annee, y = annee) +
  geom_density_ridges(alpha = 0.6, scale = 3) +
  labs(title = "Distribution des notes au QCM", x = "Note", y = "") +
  theme_minimal()
[Figure: ridgeline density plot titled "Distribution des notes au QCM", one ridge per year (L1, L2, L3); x-axis: Note; legend annee: L1, L2, L3]
Bar charts
ggplot(data) + aes(x=annee) + geom_bar(fill="aquamarine3") + theme_minimal()
[Figure: bar chart; x-axis: annee (L1, L2, L3), y-axis: count]
Bar charts
ggplot(data) + aes(x=annee) +
geom_bar(fill="aquamarine3", width = 0.5) + theme_minimal()
[Figure: bar chart with narrower bars; x-axis: annee (L1, L2, L3), y-axis: count]
Bar charts
ggplot(data) + aes(x=annee, fill=bac) + geom_bar(width = 0.5) + theme_minimal()
[Figure: stacked bar chart; x-axis: annee (L1, L2, L3), y-axis: count; fill legend bac: Bac ES, Bac S, Bac L, Bac STMG, Bac professionnel]
Bar charts
ggplot(data) + aes(x=annee,fill=bac) +
geom_bar(width = 0.5,position="fill") + theme_minimal()
[Figure: bar chart with each bar filled to proportions from 0 to 1; x-axis: annee (L1, L2, L3), y-axis labelled count; fill legend bac: Bac ES, Bac S, Bac L, Bac STMG, Bac professionnel]
Bar charts
ggplot(data) + aes(x=annee,fill=bac) +
geom_bar(width = 0.5, position="dodge") + theme_minimal()
[Figure: side-by-side (dodged) bar chart; x-axis: annee (L1, L2, L3), y-axis: count; fill legend bac: Bac ES, Bac S, Bac L, Bac STMG, Bac professionnel]
Position adjustment
Inside a geom, via the position argument:
• identity
• stack
• fill
• dodge: side by side
• jitter: useful for points (geom_jitter())
• nudge: shift points
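The position adjustments listed above can be compared on the same data, as in the sketch below. It assumes a hypothetical data frame with the columns `annee`, `bac` and `note_totale`; the jitter width of 0.2 is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical stand-in for the slides' `data`
set.seed(5)
data <- data.frame(
  annee = sample(c("L1", "L2", "L3"), size = 100, replace = TRUE),
  bac = sample(c("Bac ES", "Bac S", "Bac L"), size = 100, replace = TRUE),
  note_totale = round(runif(100, min = 20, max = 100))
)

bars <- ggplot(data) + aes(x = annee, fill = bac)

p_stack <- bars + geom_bar(position = "stack")  # default: counts piled up
p_fill  <- bars + geom_bar(position = "fill")   # each bar rescaled to 1
p_dodge <- bars + geom_bar(position = "dodge")  # groups side by side

# Jitter spreads overlapping points; height = 0 keeps the scores exact
p_jitter <- ggplot(data) +
  aes(x = annee, y = note_totale) +
  geom_jitter(width = 0.2, height = 0)

print(p_dodge)
print(p_jitter)
```

Stacking shows totals, filling shows proportions and dodging shows group counts, so the choice depends on which comparison the reader should make.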
Draw multiple plots within one figure
density <- ggplot(data = data) +
  aes(x = note_totale, fill = annee, col = annee) +
  geom_density(alpha = 0.6) +
  labs(title = "Notes au QCM",
       subtitle = "Les L2 sont très moyens.", x = "Note", y = "") +
  theme_minimal()
barplot <- ggplot(data) + aes(x = annee, fill = bac) +
  geom_bar(width = 0.5) +
  # The x-axis shows the year of study, so it is labelled "Année" rather than "Note"
  labs(title = "Séries de baccalauréat par année de Licence",
       subtitle = "Les filières ES et S sont très majoritaires.",
       x = "Année", y = "") +
  theme_minimal()
Draw multiple plots within one figure
library(ggpubr)
ggarrange(density,barplot,align="h")
[Figure: two plots side by side. Left: density curves "Notes au QCM", subtitle "Les L2 sont très moyens."; x-axis: Note; legend annee: L1, L2, L3. Right: stacked bar chart "Séries de baccalauréat par année de Licence", subtitle "Les filières ES et S sont très majoritaires."; bars for L1, L2, L3; fill legend bac: Bac ES, Bac S, Bac L, Bac STMG, Bac professionnel]