
Data visualisation with R (session 4)


Solution of the exercises 2021-09-28

Exercise 4.1

• Insert into a Markdown document the correlation matrix of the numeric variables of the iris data.

library("kableExtra")

# Correlation matrix of the four numeric columns, rounded to 2 decimals
vs_dt <- as.data.frame(round(cor(iris[, 1:4]), 2))
# Colour and size each cell according to the absolute correlation
vs_dt[1:4] <- lapply(vs_dt[1:4], function(x) {
  cell_spec(x, bold = TRUE,
            color = spec_color(abs(x), scale_from = c(-1, 1)),
            font_size = spec_font_size(abs(x), scale_from = c(0, 1)))
})

kbl(vs_dt, escape = FALSE, align = "c") %>%
  kable_classic("striped", full_width = FALSE)

• Insert the table of results of a regression analysis of the iris data.

mod_full <- lm(Sepal.Length ~ ., data = iris)
stargazer::stargazer(mod_full, type = "html", title = "Regression results", header = FALSE)

Regression results

                       Dependent variable:
                       Sepal.Length
Sepal.Width            0.496*** (0.086)
Petal.Length           0.829***
Speciesversicolor      -0.724*** (0.240)
Speciesvirginica       -1.023*** (0.334)
Constant               2.171*** (0.280)
Observations           150
R2                     0.867
Adjusted R2            0.863
Residual Std. Error    0.307 (df = 144)
F Statistic            188.251*** (df = 5; 144)
Note:                  *p<0.1; **p<0.05; ***p<0.01
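As a cross-check on the regression table, the same estimates can be read directly from the fitted model with base R. This is only a sketch: it refits the model from the exercise, and the names coefs and r2 are introduced here for illustration.

```r
# Refit the model from the exercise (iris ships with base R).
mod_full <- lm(Sepal.Length ~ ., data = iris)

# Coefficient matrix: estimate, standard error, t value and p-value per term.
coefs <- summary(mod_full)$coefficients  # 'coefs' is an illustrative name
print(round(coefs, 3))

# R-squared, as reported at the bottom of the stargazer table.
r2 <- summary(mod_full)$r.squared        # 'r2' is an illustrative name
print(round(r2, 3))
```

The rows of coefs correspond to the terms of the model, so the matrix can also be passed to a table function such as kbl() when stargazer is not available.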

Exercise 4.2

• Find the ggplot2 code that reproduces this figure:

op <- par(oma = c(1, 1, 0, 1), las = 1)
boxplot(Sepal.Length ~ Species, data = iris)
# Jittered observations on top of the boxes
points(as.numeric(iris$Species) + rnorm(150, 0, 0.1), iris$Sepal.Length)
# Group means in red
points(c(1, 2, 3), tapply(iris$Sepal.Length, iris$Species, mean),
       col = "red", pch = 16, cex = 2)
par(op)

[Figure: boxplots of Sepal.Length by Species (setosa, versicolor, virginica), with jittered observations and group means in red]

library(ggplot2)

ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  geom_jitter(position = position_jitter(0.2)) +
  stat_summary(fun = mean, geom = "point", shape = 20, size = 14,
               color = "red", fill = "red")

[Figure: ggplot2 version of the boxplots of Sepal.Length by Species (setosa, versicolor, virginica)]

• Find the base R code that reproduces this figure:

data("diamonds")
ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  ggtitle("My scatter plot")

[Figure: scatter plot of price against carat, titled "My scatter plot"]

par(las = 1, cex.axis = 0.8, cex.lab = 0.8)
plot(price ~ carat, data = diamonds, pch = 16, cex = 0.7,
     xlab = "carat", ylab = "price", main = "Scatter plot")
# Dotted grid lines to mimic the ggplot2 background
abline(h = seq(0, 20000, by = 5000), v = seq(0, 4, by = 0.5),
       col = "lightgray", lty = "dotted")

[Figure: base R scatter plot of price against carat, titled "Scatter plot"]

Exercise 4.3

• On the lung data used previously, make a mosaic plot of the status and sex variables.

library("survival")
data(lung)
## Warning in data(lung): data set 'lung' not found
tab <- xtabs(~ status + sex, lung)
vcd::mosaic(tab, shade = TRUE, legend = TRUE)

[Figure: mosaic plot of status by sex, shaded by Pearson residuals (legend from −2.0 to 2.4), p-value = 0.00023715]
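The p-value shown in the mosaic legend comes from a test of independence between the two variables. A minimal sketch with base R follows, assuming the shading relies on a Pearson chi-squared test without continuity correction (an assumption about the defaults of vcd, not something stated in the exercise). The name res is introduced here for illustration.

```r
library("survival")  # provides the lung data

# Same contingency table as in the exercise.
tab <- xtabs(~ status + sex, data = survival::lung)

# Pearson chi-squared test of independence; correct = FALSE disables
# Yates' continuity correction (assumed to match the mosaic legend).
res <- chisq.test(tab, correct = FALSE)  # 'res' is an illustrative name
print(res$p.value)
```

If the assumption holds, the printed value should be close to the p-value reported with the mosaic plot.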

• On the lung data, make a ridge plot of the variable age with respect to status.

library(ggridges)
ggplot(lung) +
  aes(x = age, y = factor(status), fill = factor(status)) +
  geom_density_ridges() +
  theme_ridges() +
  labs(title = "Age by death/live") +  # labs() needs a named argument
  theme(legend.position = "none")
## Picking joint bandwidth of 3.17

[Figure: ridge plot of age (x axis) by factor(status) (y axis, levels 1 and 2)]

• Make a correlation plot of the variables ph.karno, pat.karno, meal.cal and wt.loss in the lung data.

library(ggcorrplot)

r <- cor(lung[, 7:10], use = "complete.obs")

ggcorrplot(r, hc.order = TRUE, type = "lower", lab = TRUE)

[Figure: lower-triangle correlation plot of wt.loss, meal.cal, ph.karno and pat.karno, with a Corr colour scale from −1.0 to 1.0; cell labels −0.11, −0.14, −0.19, 0.05, 0.17 and 0.53]
