Tutorial: Linear Model in Low Dimension - Solutions, Etienne Birmelé, 3 November 2019


Academic year: 2022


Exercise 1

data(iris)
str(iris)

## 'data.frame': 150 obs. of 5 variables:

## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...

## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...

## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...

## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...

## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...


res <- glm(Petal.Length ~ ., family = "gaussian", data = iris)
summary(res)

#### Call:

## glm(formula = Petal.Length ~ ., family = "gaussian", data = iris)

#### Deviance Residuals:

## Min 1Q Median 3Q Max

## -0.78396 -0.15708 0.00193 0.14730 0.65418

#### Coefficients:

## Estimate Std. Error t value Pr(>|t|)

## (Intercept) -1.11099 0.26987 -4.117 6.45e-05 ***

## Sepal.Length 0.60801 0.05024 12.101 < 2e-16 ***

## Sepal.Width -0.18052 0.08036 -2.246 0.0262 *

## Petal.Width 0.60222 0.12144 4.959 1.97e-06 ***

## Speciesversicolor 1.46337 0.17345 8.437 3.14e-14 ***

## Speciesvirginica 1.97422 0.24480 8.065 2.60e-13 ***

## ---

## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### (Dispersion parameter for gaussian family taken to be 0.06902558)

#### Null deviance: 464.3254 on 149 degrees of freedom

## Residual deviance: 9.9397 on 144 degrees of freedom

## AIC: 32.567

#### Number of Fisher Scoring iterations: 2


All variables are retained as significant by the model. However, the main variable appears to be sepal length (largest t value), followed by the iris species.

Note that the variable Species appears twice. It is a discrete variable with three levels, so it has to be modelled with two coefficients in the vector β: one level (here setosa) is chosen as the baseline, and Species is replaced by two indicator variables (versicolor and virginica). The coefficients attached to these indicators then give the difference in mean Petal.Length when moving from setosa to the corresponding level, the other variables being held fixed.
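As a minimal sketch of this encoding, model.matrix shows the columns R builds for Species under its default treatment contrasts. The object name X is introduced here for illustration only.

```r
data(iris)

# Design matrix for a model containing only Species.
# With the default treatment contrasts, the first level is the baseline.
X <- model.matrix(~ Species, data = iris)

levels(iris$Species)[1]   # baseline level
colnames(X)               # intercept plus two indicator columns
colSums(X)                # number of rows in which each column equals 1
```

The column names returned are (Intercept), Speciesversicolor and Speciesvirginica, which are exactly the coefficient names in the summary above.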


iris$isversicolor <- iris$Species=='versicolor'

res <- glm(isversicolor ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, family = "binomial", data = iris)
summary(res)

#### Call:

## glm(formula = isversicolor ~ Sepal.Length + Sepal.Width + Petal.Length +

## Petal.Width, family = "binomial", data = iris)

#### Deviance Residuals:

## Min 1Q Median 3Q Max

## -2.1280 -0.7668 -0.3818 0.7866 2.1202

#### Coefficients:

## Estimate Std. Error z value Pr(>|z|)

## (Intercept) 7.3785 2.4993 2.952 0.003155 **

## Sepal.Length -0.2454 0.6496 -0.378 0.705634

## Sepal.Width -2.7966 0.7835 -3.569 0.000358 ***

## Petal.Length 1.3136 0.6838 1.921 0.054713 .

## Petal.Width -2.7783 1.1731 -2.368 0.017868 *

## ---

## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### (Dispersion parameter for binomial family taken to be 1)

#### Null deviance: 190.95 on 149 degrees of freedom

## Residual deviance: 145.07 on 145 degrees of freedom

## AIC: 155.07

#### Number of Fisher Scoring iterations: 5

The best variable for predicting whether a flower is a versicolor is sepal width: it has the smallest p-value, and its coefficient is negative.

Exercise 2

res <- glm(Ozone ~ ., family = "gaussian", data = airquality)
summary(res)

#### Call:

## glm(formula = Ozone ~ ., family = "gaussian", data = airquality)

#### Deviance Residuals:

## Min 1Q Median 3Q Max

## -37.014 -12.284 -3.302 8.454 95.348

#### Coefficients:

## Estimate Std. Error t value Pr(>|t|)

## (Intercept) -64.11632 23.48249 -2.730 0.00742 **

## Solar.R 0.05027 0.02342 2.147 0.03411 *

## Wind -3.31844 0.64451 -5.149 1.23e-06 ***

## Temp 1.89579 0.27389 6.922 3.66e-10 ***

## Month -3.03996 1.51346 -2.009 0.04714 *

## Day 0.27388 0.22967 1.192 0.23576

## ---

## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### (Dispersion parameter for gaussian family taken to be 435.0755)

#### Null deviance: 121802 on 110 degrees of freedom

## Residual deviance: 45683 on 105 degrees of freedom

## (42 observations deleted due to missingness)

## AIC: 997.22

#### Number of Fisher Scoring iterations: 2

The two most significant coefficients are a positive one for temperature and a negative one for wind, which matches our prior expectations.

The Month coefficient in the previous model is weakly significant, but negative. We draw the boxplot of Ozone by month to see how it evolves; the pattern is clearly non-linear. The negative sign is nevertheless surprising.
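The plotting call itself does not appear in the source. A plausible reconstruction, under that assumption, is sketched here; the vector med is an addition made only to show the non-monotone pattern numerically.

```r
data(airquality)

# Distribution of Ozone for each month (5 = May, ..., 9 = September).
boxplot(Ozone ~ Month, data = airquality, xlab = "Month", ylab = "Ozone")

# Monthly medians, ignoring missing values: they rise until mid-summer
# and fall again, which a single linear Month term cannot capture.
med <- tapply(airquality$Ozone, airquality$Month, median, na.rm = TRUE)
med
```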



[Figure: boxplot of Ozone by Month (months 5 to 9, Ozone axis from 0 to 150)]

airqualitybis <- airquality[airquality$Month<8,]

res <- glm(Ozone ~ ., family = "gaussian", data = airqualitybis)
summary(res)

#### Call:

## glm(formula = Ozone ~ ., family = "gaussian", data = airqualitybis)

#### Deviance Residuals:

## Min 1Q Median 3Q Max

## -32.408 -16.375 -1.268 11.440 65.525

#### Coefficients:

## Estimate Std. Error t value Pr(>|t|)

## (Intercept) -60.87857 28.37084 -2.146 0.03649 *

## Solar.R 0.03958 0.03125 1.266 0.21091

## Wind -2.77841 0.83194 -3.340 0.00154 **

## Temp 1.81846 0.52299 3.477 0.00102 **

## Month -3.11778 5.12936 -0.608 0.54590

## Day 0.17306 0.31867 0.543 0.58936

## ---

## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### (Dispersion parameter for gaussian family taken to be 426.3735)

#### Null deviance: 56264 on 58 degrees of freedom

## Residual deviance: 22598 on 53 degrees of freedom

## (33 observations deleted due to missingness)

## AIC: 532.37

#### Number of Fisher Scoring iterations: 2

When all variables are taken into account, the conclusions are the same as on the full data set: Wind and Temp remain the most significant variables and the Month coefficient is still negative.

res <- glm(Ozone ~ Month, family = "gaussian", data = airqualitybis)
summary(res)

#### Call:

## glm(formula = Ozone ~ Month, family = "gaussian", data = airqualitybis)

#### Deviance Residuals:

## Min 1Q Median 3Q Max

## -50.357 -15.857 -3.857 12.143 93.143

#### Coefficients:

## Estimate Std. Error t value Pr(>|t|)

## (Intercept) -66.893 22.222 -3.010 0.00384 **

## Month 17.750 3.661 4.849 9.41e-06 ***

## ---

## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### (Dispersion parameter for gaussian family taken to be 696.8018)

#### Null deviance: 57495 on 60 degrees of freedom

## Residual deviance: 41111 on 59 degrees of freedom

## (31 observations deleted due to missingness)

## AIC: 576.41

#### Number of Fisher Scoring iterations: 2

If the model is restricted to Month alone, we obtain a highly significant coefficient, but this time a positive one.

This change of sign, depending on whether wind and temperature are included, suggests that Month is probably very strongly correlated with one or the other of these variables, which makes the coefficients difficult to interpret.

This hypothesis is confirmed, especially for temperature, by a correlation test.
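The calls that produced the two test outputs are not shown in the source. Judging from the "data:" lines of those outputs, they were presumably of the following form. airqualitybis is rebuilt here so that the sketch is self-contained, and the names ct_wind and ct_temp are illustrative.

```r
data(airquality)
airqualitybis <- airquality[airquality$Month < 8, ]

# Pearson correlation tests between Month and the two main predictors.
ct_wind <- cor.test(airqualitybis$Wind, airqualitybis$Month)
ct_temp <- cor.test(airqualitybis$Temp, airqualitybis$Month)

ct_wind
ct_temp
```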


#### Pearson's product-moment correlation

#### data: airqualitybis$Wind and airqualitybis$Month

## t = -3.0713, df = 90, p-value = 0.002818

## alternative hypothesis: true correlation is not equal to 0

## 95 percent confidence interval:

## -0.4823942 -0.1101398

## sample estimates:

## cor

## -0.308009


#### Pearson's product-moment correlation

#### data: airqualitybis$Temp and airqualitybis$Month

## t = 11.397, df = 90, p-value < 2.2e-16


## alternative hypothesis: true correlation is not equal to 0

## 95 percent confidence interval:

## 0.6691003 0.8410128

## sample estimates:

## cor

## 0.7685878

The negative sign of Month in the full model can probably be explained by the fact that it only acts as a correction to the slope given by Temp.

Exercise 3

library(MASS)  # provides mvrnorm

# Simulate 2*varsize Gaussian covariates in two independent blocks with
# within-block correlation covar; Y depends only on X1 and X(varsize+1).
datageneration <- function(covar, varsize, samplesize){
  Sigma1 <- matrix(covar, varsize, varsize)
  diag(Sigma1) <- 1
  Sigma <- rbind(cbind(Sigma1, matrix(0, varsize, varsize)),
                 cbind(matrix(0, varsize, varsize), Sigma1))
  explicativedata <- mvrnorm(n = samplesize, mu = rep(0, dim(Sigma)[1]), Sigma)
  data <- cbind(explicativedata,
                explicativedata[,1] + explicativedata[,1+varsize] + rnorm(dim(explicativedata)[1], 0, .5))
  colnames(data) <- paste('X', 1:dim(data)[2], sep = "")
  colnames(data)[1+2*varsize] <- 'Y'
  return(data)
}


n >> p, ρ=.9

datasimul <- data.frame(datageneration(.9, 5, 60))
res <- glm(Y ~ ., family = "gaussian", data = datasimul)
summary(res)

#### Call:

## glm(formula = Y ~ ., family = "gaussian", data = datasimul)

#### Deviance Residuals:

## Min 1Q Median 3Q Max

## -0.94425 -0.31212 0.03237 0.34696 0.86810

#### Coefficients:

## Estimate Std. Error t value Pr(>|t|)

## (Intercept) -0.08573 0.06344 -1.351 0.1828

## X1 0.76765 0.14621 5.250 3.26e-06 ***

## X2 0.25974 0.18702 1.389 0.1712

## X3 -0.39806 0.18167 -2.191 0.0332 *

## X4 0.11742 0.16500 0.712 0.4801

## X5 0.22131 0.19905 1.112 0.2716

## X6 0.80283 0.17662 4.546 3.60e-05 ***

## X7 0.04956 0.18834 0.263 0.7936

## X8 0.09025 0.16942 0.533 0.5966

## X9 0.01049 0.16218 0.065 0.9487


## X10 -0.01229 0.16224 -0.076 0.9399

## ---

## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### (Dispersion parameter for gaussian family taken to be 0.2166777)

#### Null deviance: 140.167 on 59 degrees of freedom

## Residual deviance: 10.617 on 49 degrees of freedom

## AIC: 90.361

#### Number of Fisher Scoring iterations: 2

The correct variables, X1 and X6, are recovered as the highly significant ones (X3 is also flagged at the 5% level in this draw).

n >> p, ρ=.1

datasimul <- data.frame(datageneration(.1, 5, 60))
res <- glm(Y ~ ., family = "gaussian", data = datasimul)
summary(res)

#### Call:

## glm(formula = Y ~ ., family = "gaussian", data = datasimul)

#### Deviance Residuals:

## Min 1Q Median 3Q Max

## -1.58604 -0.41408 -0.01142 0.29805 1.50105

#### Coefficients:

## Estimate Std. Error t value Pr(>|t|)

## (Intercept) 0.08160 0.08105 1.007 0.319

## X1 0.95782 0.07972 12.014 3.24e-16 ***

## X2 -0.08108 0.08362 -0.970 0.337

## X3 -0.16017 0.09731 -1.646 0.106

## X4 0.11857 0.08752 1.355 0.182

## X5 -0.09164 0.08732 -1.049 0.299

## X6 1.05154 0.07662 13.724 < 2e-16 ***

## X7 0.06634 0.08197 0.809 0.422

## X8 -0.11561 0.07858 -1.471 0.148

## X9 0.04281 0.08741 0.490 0.626

## X10 0.08651 0.08420 1.027 0.309

## ---

## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### (Dispersion parameter for gaussian family taken to be 0.3560447)

#### Null deviance: 160.172 on 59 degrees of freedom

## Residual deviance: 17.446 on 49 degrees of freedom

## AIC: 120.16

#### Number of Fisher Scoring iterations: 2

n >2p, ρ=.9


datasimul <- data.frame(datageneration(.9, 5, 15))
res <- glm(Y ~ ., family = "gaussian", data = datasimul)
summary(res)

#### Call:

## glm(formula = Y ~ ., family = "gaussian", data = datasimul)

#### Deviance Residuals:

## 1 2 3 4 5 6 7

## -0.37740 -0.09641 -0.12441 0.15308 0.21249 0.06568 0.02938

## 8 9 10 11 12 13 14

## 0.14589 0.03028 -0.08282 -0.08694 0.18784 0.27241 -0.11127

## 15

## -0.21780

#### Coefficients:

## Estimate Std. Error t value Pr(>|t|)

## (Intercept) -0.43240 0.13814 -3.130 0.0352 *

## X1 2.20120 0.76053 2.894 0.0444 *

## X2 -1.58785 0.63307 -2.508 0.0662 .

## X3 -0.69090 0.55497 -1.245 0.2811

## X4 1.75717 0.57598 3.051 0.0380 *

## X5 -1.12627 0.31151 -3.616 0.0224 *

## X6 1.23843 0.27884 4.441 0.0113 *

## X7 -0.07511 0.31701 -0.237 0.8243

## X8 0.95272 0.36423 2.616 0.0591 .

## X9 0.17572 0.38180 0.460 0.6692

## X10 -1.07969 0.45854 -2.355 0.0781 .

## ---

## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### (Dispersion parameter for gaussian family taken to be 0.1117228)

#### Null deviance: 50.67978 on 14 degrees of freedom

## Residual deviance: 0.44689 on 4 degrees of freedom

## AIC: 13.866

#### Number of Fisher Scoring iterations: 2

The correct variables are sometimes recovered (one has to try several times), but not systematically, and for ρ=.9 one of the wrong variables is sometimes selected because of the strong collinearity.
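Since a single draw is not conclusive, one way to "try several times" is to repeat the simulation and record how often each variable is declared significant. The sketch below rests on several assumptions that are not in the source: the helper selection_frequency, the 5% threshold, the number of repetitions and the seed. It uses lm rather than glm with a Gaussian family, which gives the same fit and the same t-tests.

```r
library(MASS)  # mvrnorm

# Same generator as in the exercise, repeated so that the block is self-contained.
datageneration <- function(covar, varsize, samplesize) {
  Sigma1 <- matrix(covar, varsize, varsize)
  diag(Sigma1) <- 1
  zero <- matrix(0, varsize, varsize)
  Sigma <- rbind(cbind(Sigma1, zero), cbind(zero, Sigma1))
  X <- mvrnorm(n = samplesize, mu = rep(0, 2 * varsize), Sigma = Sigma)
  data <- cbind(X, X[, 1] + X[, 1 + varsize] + rnorm(samplesize, 0, .5))
  colnames(data) <- c(paste0("X", 1:(2 * varsize)), "Y")
  data
}

# Hypothetical helper: proportion of simulations in which each covariate
# has a p-value below alpha in the full model.
# Assumes samplesize is larger than the number of coefficients.
selection_frequency <- function(covar, varsize, samplesize,
                                nrep = 200, alpha = 0.05) {
  selected <- replicate(nrep, {
    d <- data.frame(datageneration(covar, varsize, samplesize))
    pval <- summary(lm(Y ~ ., data = d))$coefficients[-1, 4]
    pval < alpha
  })
  rowMeans(selected)
}

set.seed(1)  # arbitrary seed, for reproducibility only
selection_frequency(.9, 5, 15)
selection_frequency(.1, 5, 15)
```

Comparing the two vectors gives a rough idea of how much the strong within-block correlation degrades the detection of X1 and X6 and inflates the selection of their neighbours.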

n >2p, ρ=.1

datasimul <- data.frame(datageneration(.1, 5, 12))
res <- glm(Y ~ ., family = "gaussian", data = datasimul)
summary(res)

#### Call:

## glm(formula = Y ~ ., family = "gaussian", data = datasimul)



## Deviance Residuals:

## 1 2 3 4 5 6

## 0.070403 -0.032067 -0.168542 -0.088037 0.218725 -0.081525

## 7 8 9 10 11 12

## 0.135581 -0.129094 0.107177 -0.053054 0.027337 -0.006904

#### Coefficients:

## Estimate Std. Error t value Pr(>|t|)

## (Intercept) 0.2525 0.2793 0.904 0.5321

## X1 1.0497 0.1558 6.736 0.0938 .

## X2 -0.4786 0.4669 -1.025 0.4922

## X3 0.4359 0.2329 1.872 0.3124

## X4 0.6403 0.6218 1.030 0.4907

## X5 0.2227 0.2211 1.007 0.4977

## X6 1.0430 0.2260 4.615 0.1358

## X7 -0.2052 0.7076 -0.290 0.8203

## X8 -0.2277 0.1395 -1.632 0.3500

## X9 0.1462 0.4279 0.342 0.7904

## X10 -0.1889 0.7148 -0.264 0.8356

## ---

## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### (Dispersion parameter for gaussian family taken to be 0.1467726)

#### Null deviance: 23.56510 on 11 degrees of freedom

## Residual deviance: 0.14677 on 1 degrees of freedom

## AIC: 5.2092

#### Number of Fisher Scoring iterations: 2

n < 2p, ρ=.1

datasimul <- data.frame(datageneration(.1, 5, 7))
res <- glm(Y ~ ., family = "gaussian", data = datasimul)
summary(res)

#### Call:

## glm(formula = Y ~ ., family = "gaussian", data = datasimul)

#### Deviance Residuals:

## [1] 0 0 0 0 0 0 0

#### Coefficients: (4 not defined because of singularities)

## Estimate Std. Error t value Pr(>|t|)

## (Intercept) 0.004367 NA NA NA

## X1 1.207175 NA NA NA

## X2 0.133926 NA NA NA

## X3 -0.154513 NA NA NA

## X4 -0.269238 NA NA NA

## X5 -0.283005 NA NA NA

## X6 1.003550 NA NA NA

## X7 NA NA NA NA

## X8 NA NA NA NA

## X9 NA NA NA NA

## X10 NA NA NA NA

#### (Dispersion parameter for gaussian family taken to be NaN)

#### Null deviance: 8.8645e+00 on 6 degrees of freedom

## Residual deviance: 3.6053e-31 on 0 degrees of freedom

## AIC: -468.44

#### Number of Fisher Scoring iterations: 1

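The number of observations (7) is smaller than the number of parameters (11), so the coefficients are not identifiable. R drops four of them (NA), the remaining ones fit the data exactly (zero residuals), and no standard errors can be computed.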
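A minimal sketch of why four coefficients are reported as NA: with 7 observations, the design matrix (intercept plus 10 covariates) has rank at most 7, so only 7 coefficients can be estimated. Independent Gaussian covariates are used here as a simplifying assumption instead of datageneration, and the names X, y, d and fit are illustrative.

```r
set.seed(1)  # arbitrary seed

n <- 7
X <- matrix(rnorm(n * 10), n, 10, dimnames = list(NULL, paste0("X", 1:10)))
y <- X[, 1] + X[, 6] + rnorm(n, 0, .5)
d <- data.frame(X, Y = y)

fit <- lm(Y ~ ., data = d)

qr(model.matrix(fit))$rank   # rank of the 7 x 11 design matrix
sum(is.na(coef(fit)))        # coefficients that cannot be estimated
max(abs(residuals(fit)))     # the 7 points are fitted (numerically) exactly
```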

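Exercise 4

If $x = x_0 + \delta_k$, where only the $k$-th component of $\delta_k$ is non-zero and equals $\delta$, then $\log(OR(x, x_0)) = \beta_k \delta$.

We can therefore interpret $e^{\beta_k}$ as the odds ratio associated with an increase of 1 in $X_k$.

If $\beta_k = 2.3$, then $e^{2.3} \approx 9.97$. The odds are therefore multiplied by about 10 when $X_k$ increases by 1.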
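For completeness, the identity for the log odds ratio follows directly from the logistic model, assuming the usual parametrisation $\operatorname{logit} p(x) = \beta_0 + \beta^\top x$. The notation $\beta_0$ for the intercept is an assumption; the source only shows $\beta_k$.

```latex
\begin{aligned}
\log \mathrm{OR}(x, x_0)
  &= \log\frac{p(x)}{1-p(x)} - \log\frac{p(x_0)}{1-p(x_0)} \\
  &= (\beta_0 + \beta^\top x) - (\beta_0 + \beta^\top x_0) \\
  &= \beta^\top (x - x_0) = \beta_k \delta .
\end{aligned}
```

The last equality uses the fact that $x - x_0$ has a single non-zero component, the $k$-th one, equal to $\delta$.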


For a rare disease, $1-p$ can be approximated by 1, so the OR reduces to $p(x)/p(x_0)$. The interpretation is then simpler, since it is directly the ratio of the two probabilities.
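A small numerical illustration of this approximation, using the odds ratio $e^{2.3}$ from the example above. The helper apply_odds_ratio and the two baseline probabilities (0.001 and 0.3) are assumptions chosen for the illustration.

```r
# Hypothetical helper: probability obtained from a baseline probability p0
# after multiplying the odds by a given odds ratio.
apply_odds_ratio <- function(p0, or) {
  odds <- or * p0 / (1 - p0)
  odds / (1 + odds)
}

or <- exp(2.3)  # odds ratio from the example, about 10

# Rare outcome: the ratio of probabilities is close to the odds ratio.
rr_rare <- apply_odds_ratio(0.001, or) / 0.001

# Common outcome: the ratio of probabilities is much smaller than the odds ratio.
rr_common <- apply_odds_ratio(0.3, or) / 0.3

c(odds_ratio = or, rr_rare = rr_rare, rr_common = rr_common)
```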


library('mlbench')
data("BreastCancer")

# Keep only the complete rows
BC <- BreastCancer[rowSums(is.na(BreastCancer))==0,]

# Necessary step so that the variables are treated as numeric
BC[,2:10] <- sapply(BC[,2:10], as.numeric)

# Random split: about one third of the rows go to the test set
test <- rbinom(dim(BC)[1], 1, 1/3)

BCtest <- BC[test==1,]

BClearn <- BC[test==0,]

res <- glm(Class ~ . - Id, family = binomial(link = "logit"), data = BClearn)
summary(res)

#### Call:

## glm(formula = Class ~ . - Id, family = binomial(link = "logit"),

## data = BClearn)


#### Deviance Residuals:

## Min 1Q Median 3Q Max

## -3.4570 -0.1161 -0.0612 0.0235 2.4125

#### Coefficients:

## Estimate Std. Error z value Pr(>|z|)

## (Intercept) -10.263537 1.494087 -6.869 6.45e-12 ***

## Cl.thickness 0.527400 0.177801 2.966 0.003015 **

## Cell.size -0.008991 0.301377 -0.030 0.976200

## Cell.shape 0.287621 0.328226 0.876 0.380872

## Marg.adhesion 0.244177 0.171364 1.425 0.154185

## Epith.c.size 0.146785 0.207177 0.709 0.478633

## Bare.nuclei 0.460176 0.130269 3.533 0.000412 ***

## Bl.cromatin 0.487308 0.211643 2.303 0.021307 *

## Normal.nucleoli 0.121612 0.148921 0.817 0.414147

## Mitoses 0.744702 0.380280 1.958 0.050195 .

## ---

## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### (Dispersion parameter for binomial family taken to be 1)

#### Null deviance: 583.664 on 448 degrees of freedom

## Residual deviance: 67.228 on 439 degrees of freedom

## AIC: 87.228

#### Number of Fisher Scoring iterations: 8


step starts from the full model and removes variables for as long as doing so lowers the AIC (at each step it removes the variable whose deletion decreases the AIC the most).
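To see what a single round of this procedure looks like without needing the mlbench data, the sketch below applies drop1 and then step to the versicolor model of Exercise 1. The object names full, d and reduced are illustrative.

```r
data(iris)
iris$isversicolor <- iris$Species == "versicolor"

full <- glm(isversicolor ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
            family = binomial, data = iris)

# One round of backward elimination: AIC of the current model ("<none>")
# and of each model obtained by deleting a single term.
d <- drop1(full)
d

# step repeats this until no deletion lowers the AIC (trace = 0 hides the log).
reduced <- step(full, trace = 0)
formula(reduced)
```

Each table printed by step in the log that follows has the same layout as d: one row per candidate deletion, sorted by AIC.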

stepres <- step(res)

## Start: AIC=87.23

## Class ~ (Id + Cl.thickness + Cell.size + Cell.shape + Marg.adhesion +

## Epith.c.size + Bare.nuclei + Bl.cromatin + Normal.nucleoli +

## Mitoses) - Id

#### Df Deviance AIC

## - Cell.size 1 67.229 85.229

## - Epith.c.size 1 67.720 85.720

## - Normal.nucleoli 1 67.924 85.924

## - Cell.shape 1 67.974 85.974

## <none> 67.228 87.228

## - Marg.adhesion 1 69.364 87.364

## - Mitoses 1 71.287 89.287

## - Bl.cromatin 1 72.983 90.983

## - Cl.thickness 1 77.796 95.796

## - Bare.nuclei 1 83.550 101.550

#### Step: AIC=85.23

## Class ~ Cl.thickness + Cell.shape + Marg.adhesion + Epith.c.size +


## Bare.nuclei + Bl.cromatin + Normal.nucleoli + Mitoses

#### Df Deviance AIC

## - Epith.c.size 1 67.727 83.727

## - Normal.nucleoli 1 67.928 83.928

## - Cell.shape 1 68.871 84.871

## <none> 67.229 85.229

## - Marg.adhesion 1 69.369 85.369

## - Mitoses 1 71.369 87.369

## - Bl.cromatin 1 73.285 89.285

## - Cl.thickness 1 77.798 93.798

## - Bare.nuclei 1 83.644 99.644

#### Step: AIC=83.73

## Class ~ Cl.thickness + Cell.shape + Marg.adhesion + Bare.nuclei +

## Bl.cromatin + Normal.nucleoli + Mitoses

#### Df Deviance AIC

## - Normal.nucleoli 1 68.390 82.390

## <none> 67.727 83.727

## - Cell.shape 1 70.283 84.283

## - Marg.adhesion 1 70.529 84.529

## - Mitoses 1 72.380 86.380

## - Bl.cromatin 1 74.307 88.307

## - Cl.thickness 1 78.797 92.797

## - Bare.nuclei 1 84.942 98.942

#### Step: AIC=82.39

## Class ~ Cl.thickness + Cell.shape + Marg.adhesion + Bare.nuclei +

## Bl.cromatin + Mitoses

#### Df Deviance AIC

## <none> 68.390 82.390

## - Marg.adhesion 1 71.275 83.275

## - Mitoses 1 73.815 85.815

## - Cell.shape 1 74.560 86.560

## - Bl.cromatin 1 76.648 88.648

## - Cl.thickness 1 79.406 91.406

## - Bare.nuclei 1 85.824 97.824

summary(stepres)

#### Call:

## glm(formula = Class ~ Cl.thickness + Cell.shape + Marg.adhesion +

## Bare.nuclei + Bl.cromatin + Mitoses, family = binomial(link = "logit"),

## data = BClearn)

#### Deviance Residuals:

## Min 1Q Median 3Q Max

## -3.3673 -0.1136 -0.0664 0.0229 2.3149

#### Coefficients:

## Estimate Std. Error z value Pr(>|z|)

## (Intercept) -10.1313 1.4224 -7.123 1.06e-12 ***


## Cl.thickness 0.5207 0.1724 3.020 0.002530 **

## Cell.shape 0.4171 0.1810 2.304 0.021231 *

## Marg.adhesion 0.2755 0.1667 1.653 0.098290 .

## Bare.nuclei 0.4746 0.1318 3.600 0.000319 ***

## Bl.cromatin 0.5387 0.1987 2.712 0.006696 **

## Mitoses 0.7464 0.3614 2.065 0.038925 *

## ---

## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#### (Dispersion parameter for binomial family taken to be 1)

#### Null deviance: 583.66 on 448 degrees of freedom

## Residual deviance: 68.39 on 442 degrees of freedom

## AIC: 82.39

#### Number of Fisher Scoring iterations: 8

We retain mainly Cl.thickness, Bare.nuclei and Bl.cromatin, each with a positive coefficient, that is, an odds ratio greater than 1.

An increase in any of them is therefore a predictor of tumour malignancy.
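As a rough sketch, the corresponding odds ratios can be computed from the estimates and standard errors printed in the summary(stepres) output. The values below are copied from that output; the Wald interval and the name or_table are additions for illustration.

```r
# Estimates and standard errors copied from the summary(stepres) output.
est <- c(Cl.thickness = 0.5207, Bare.nuclei = 0.4746, Bl.cromatin = 0.5387)
se  <- c(Cl.thickness = 0.1724, Bare.nuclei = 0.1318, Bl.cromatin = 0.1987)

# Odds ratio for a one-unit increase, with an approximate 95% Wald interval.
or_table <- cbind(OR    = exp(est),
                  lower = exp(est - qnorm(0.975) * se),
                  upper = exp(est + qnorm(0.975) * se))
round(or_table, 2)
```

With a fitted model at hand, exp(coef(stepres)) gives the same odds ratios directly.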


prediction_test <- predict(stepres, newdata = BCtest, type = "response")
truth_test <- (BCtest$Class == 'malignant') * 1

Let us compare the prediction and the truth for the first 15 patients of the test set.

rbind(prediction_test[1:15], truth_test[1:15])

## 3 7 9 20 25 26

## [1,] 0.01030331 0.1406407 0.02527909 0.02996571 0.002280766 0.6248245

## [2,] 0.00000000 0.0000000 0.00000000 0.00000000 0.000000000 1.0000000

## 29 32 36 39 42 52

## [1,] 0.002240258 0.003833165 0.002240258 0.9918846 0.9399941 0.286267

## [2,] 0.000000000 0.000000000 0.000000000 1.0000000 1.0000000 1.000000

## 53 64 67

## [1,] 0.9844797 0.1478677 0.01078505

## [2,] 1.0000000 1.0000000 0.00000000

boxplot(prediction_test ~ BCtest$Class)

[Figure: boxplots of prediction_test by class (benign, malignant), vertical axis from 0.0 to 1.0]
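One possible way to summarise the comparison is a confusion table. The sketch below uses only the 15 values displayed above (rounded to four decimals); the 0.5 threshold and the names pred, truth, tab and accuracy are assumptions made for this illustration.

```r
# Predicted probabilities and true classes copied (rounded) from the output above.
pred  <- c(0.0103, 0.1406, 0.0253, 0.0300, 0.0023, 0.6248, 0.0022, 0.0038,
           0.0022, 0.9919, 0.9400, 0.2863, 0.9845, 0.1479, 0.0108)
truth <- c(0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0)

# Classify as malignant when the predicted probability exceeds 0.5
# (the threshold is an assumption made for this illustration).
tab <- table(predicted = pred > 0.5, truth = truth)
tab

accuracy <- sum(diag(tab)) / sum(tab)
accuracy
```

On these 15 patients, 13 are classified correctly; the two errors are malignant cases whose predicted probability falls below the threshold. The same table on the whole test set would be obtained with table(prediction_test > 0.5, truth_test).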




