Linear regression

(1)

http://christophe.genolini.free.fr Licence Stat-info CM6b : 1


Problem

• We have the math and French grades of a group of students (the table is given with the worked example).

• A student has 10 in math; we would like to estimate their grade in French.

(3)


Graphical solution

• If we know the "average line", we can "read off" the likely grade.

• Here, 10 in math gives 11.2 in French.

(4)

Arithmetic solution

• Equation of a line: y = ax + b.

• We are looking for a and b.

• Several choices of a and b (several lines) are possible.

(5)


Arithmetic solution

• We consider the deviations between the line and the actual points: we want THE line that minimizes the sum of the squares of these deviations.
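As a rough illustration of this criterion, the sketch below computes the sum of squared deviations for two candidate lines. The points, the candidate coefficients and the helper name `sum_squared_deviations` are made up for illustration; they are not values from the course.

```python
def sum_squared_deviations(points, a, b):
    """Sum of (y - (a*x + b))**2 over all (x, y) points."""
    return sum((y - (a * x + b)) ** 2 for x, y in points)

# Hypothetical points, roughly aligned along y = x + 1
points = [(1, 2.1), (2, 2.9), (3, 4.2), (4, 4.8)]

# Two candidate lines: the one with the smaller total fits the points better
for a, b in [(1.0, 1.0), (0.5, 2.0)]:
    print(a, b, round(sum_squared_deviations(points, a, b), 3))
```

The least-squares line is the one choice of a and b for which this total cannot be made any smaller.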

(6)

Calculation (optional)

• The deviation between a point $(x_i, y_i)$ and the line is $y_i - \hat{y}_i$, where $\hat{y}_i = a x_i + b$ is the value on the line, that is, $y_i - a x_i - b$.

• The squared deviation is therefore $(y_i - a x_i - b)^2$.

• We look for a and b such that the sum of the squared deviations is minimal, that is, such that

$$g(a, b) = \sum_i (y_i - a x_i - b)^2$$

is minimal.

• To do this, we differentiate g and find its minimum, which gives the values of a and b.
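For readers who want the intermediate step, here is a sketch of that minimization. It assumes n points, writes $\bar{X}$ and $\bar{Y}$ for the means, and takes $\mathrm{Cov}(X, Y)$ and $\sigma_X^2$ with the same divisor n.

```latex
\begin{aligned}
\frac{\partial g}{\partial b} = -2\sum_{i=1}^{n} (y_i - a x_i - b) = 0
  \quad&\Longrightarrow\quad b = \bar{Y} - a\,\bar{X}, \\
\frac{\partial g}{\partial a} = -2\sum_{i=1}^{n} x_i\,(y_i - a x_i - b) = 0
  \quad&\Longrightarrow\quad \sum_{i=1}^{n} x_i y_i - a\sum_{i=1}^{n} x_i^2 - b\,n\,\bar{X} = 0 .
\end{aligned}

% Substituting b = \bar{Y} - a\bar{X} into the second equation:
\sum_{i=1}^{n} x_i y_i - n\,\bar{X}\,\bar{Y}
  = a\left(\sum_{i=1}^{n} x_i^2 - n\,\bar{X}^2\right)
\quad\Longrightarrow\quad
a = \frac{\mathrm{Cov}(X, Y)}{\sigma_X^2} .
```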

(7)


Equation of the line

• y = ax + b, with

$$a = \frac{\mathrm{Cov}(X, Y)}{\sigma_X^2}, \qquad b = \bar{Y} - a\,\bar{X}$$
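These two formulas translate directly into code. The following is a minimal sketch, not a reference implementation: the function name `fit_line` and the test points are assumptions, and covariance and variance are both computed with divisor n.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least-squares line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance computed with the same divisor n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b

# Hypothetical points lying exactly on y = 2x + 1
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 2.0 1.0
```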

(8)

Example

• We compute the covariance: $\mathrm{Cov}(X, Y) = 12.27$

• We obtain a: $a = \mathrm{Cov}(X, Y) / \sigma_X^2 = 12.27 / 11.11 = 1.10$

• We obtain b: $b = \bar{Y} - a\,\bar{X} = 10.17 - 1.10 \times 9.63 = -0.46$ (computed with the unrounded value of a)

• The line is: Y = 1.10 X − 0.46

Math  French
11    11
11    13
7     8
8     10
15    18
16    17
9     6
10    10
4     5
5     6
8     7
8     8
9     9
9     13
12    14
13    12
5     4
6     5
10    12
10    8
14    16
15    17
10    9
6     6

(9)


Estimation

• If someone has 15 in math, we can expect them to get Y = 1.10 × 15 − 0.46 = 16.04 in French.
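The estimate is just the fitted line evaluated at the math grade. A minimal sketch, using the rounded coefficients 1.10 and −0.46 from the example (the function name `estimate_french` is an assumption):

```python
def estimate_french(math_grade, a=1.10, b=-0.46):
    """Estimated French grade from the line Y = a*X + b."""
    return a * math_grade + b

# Evaluate the line for a student with 10 and for one with 15 in math
for grade in (10, 15):
    print(grade, round(estimate_french(grade), 2))
```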

(10)

Nonlinear regressions

(11)


(12)

Examining the data

X     Y
0.00  4.87
0.05  3.94
0.10  3.32
0.15  2.82
0.20  2.67
0.25  2.29
0.30  1.91
0.35  1.93
0.40  1.86
0.45  1.39
0.50  1.50
0.55  1.09
0.60  1.40
0.65  1.28
0.70  1.35
0.75  1.16
0.80  0.89
0.85  1.03
0.90  1.04
0.95  1.07
1.00  0.78
1.20  0.72
1.40  0.73
1.60  0.51
1.80  0.47
2.00  0.38
2.50  0.56
3.00  0.32
3.50  0.36
4.00  0.30

[Scatter plot of Y against X]

• r is small: there is no linear relationship between X and Y

⇒ No need to compute a and b

(13)
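To put a number on this, the correlation coefficient can be computed from the 30 tabulated (X, Y) points. The sketch below is an illustration (the helper name `pearson_r` is an assumption). On these points it gives r of roughly −0.7, which is noticeably weaker in absolute value than the correlation found later between X and Z = 1/Y.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [0.00, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45,
      0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95,
      1.00, 1.20, 1.40, 1.60, 1.80, 2.00, 2.50, 3.00, 3.50, 4.00]
ys = [4.87, 3.94, 3.32, 2.82, 2.67, 2.29, 1.91, 1.93, 1.86, 1.39,
      1.50, 1.09, 1.40, 1.28, 1.35, 1.16, 0.89, 1.03, 1.04, 1.07,
      0.78, 0.72, 0.73, 0.51, 0.47, 0.38, 0.56, 0.32, 0.36, 0.30]

r_xy = pearson_r(xs, ys)
print(round(r_xy, 2))
```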


Modification

X     Y     Z = 1/Y
0.00  4.87  0.21
0.05  3.94  0.25
0.10  3.32  0.30
0.15  2.82  0.35
0.20  2.67  0.37
0.25  2.29  0.44
0.30  1.91  0.52
0.35  1.93  0.52
0.40  1.86  0.54
0.45  1.39  0.72
0.50  1.50  0.67
0.55  1.09  0.92
0.60  1.40  0.71
0.65  1.28  0.78
0.70  1.35  0.74
0.75  1.16  0.86
0.80  0.89  1.12
0.85  1.03  0.97
0.90  1.04  0.96
0.95  1.07  0.94
1.00  0.78  1.28
1.20  0.72  1.39
1.40  0.73  1.37
1.60  0.51  1.96
1.80  0.47  2.11
2.00  0.38  2.64
2.50  0.56  1.78
3.00  0.32  3.12
3.50  0.36  2.81
4.00  0.30  3.35
4.50  0.30  3.38
5.00  0.24  4.17
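The new column is simply the reciprocal of Y. A minimal sketch on the first five rows of the table (variable names are illustrative). The table seems to have been computed from unrounded values of Y, so further down the last digit of Z can differ from 1/Y taken on the rounded Y shown here.

```python
# First five Y values from the table
ys = [4.87, 3.94, 3.32, 2.82, 2.67]

# Z = 1/Y, rounded to two decimals as in the table
zs = [round(1 / y, 2) for y in ys]
print(zs)  # [0.21, 0.25, 0.3, 0.35, 0.37]
```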

(14)

Modification

• From here on we work with the pairs (X, Z), where Z = 1/Y (the X and Z columns of the previous table).

(15)


Relationship between X and Z

[Scatter plot of Z against X]

(16)

Relationship between X and Z

• r = 0.97, so there is a very strong linear relationship

• a = 0.4

• b = 0.7

• So Z = 0.4 X + 0.7

• Since Z = 1/Y, this gives Y = 1 / (0.4 X + 0.7)
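The whole procedure (transform, fit a line, transform back) can be sketched as follows on the 30 tabulated (X, Z) points. This is an illustrative check, not the course's own computation, and the helper name `predict_y` is an assumption. On these points, least squares gives a correlation close to the 0.97 quoted above, but a slope near 0.8 and an intercept near 0.3 rather than 0.4 and 0.7. The quoted coefficients may come from a different computation, so the printed values should be read as a check on the tabulated data only.

```python
from math import sqrt

xs = [0.00, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45,
      0.50, 0.55, 0.60, 0.65, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95,
      1.00, 1.20, 1.40, 1.60, 1.80, 2.00, 2.50, 3.00, 3.50, 4.00]
zs = [0.21, 0.25, 0.30, 0.35, 0.37, 0.44, 0.52, 0.52, 0.54, 0.72,
      0.67, 0.92, 0.71, 0.78, 0.74, 0.86, 1.12, 0.97, 0.96, 0.94,
      1.28, 1.39, 1.37, 1.96, 2.11, 2.64, 1.78, 3.12, 2.81, 3.35]

n = len(xs)
mean_x = sum(xs) / n
mean_z = sum(zs) / n
sxz = sum((x - mean_x) * (z - mean_z) for x, z in zip(xs, zs))
sxx = sum((x - mean_x) ** 2 for x in xs)
szz = sum((z - mean_z) ** 2 for z in zs)

r = sxz / sqrt(sxx * szz)   # correlation between X and Z
a = sxz / sxx               # slope of Z = a*X + b
b = mean_z - a * mean_x     # intercept

def predict_y(x):
    """Back-transform: Z = a*x + b, then Y = 1/Z."""
    return 1 / (a * x + b)

print(round(r, 2), round(a, 2), round(b, 2))
print(round(predict_y(1.0), 2))
```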

(17)
