• Aucun résultat trouvé

Deviance of Poisson regression

N/A
N/A
Protected

Academic year: 2021

Partager "Deviance of Poisson regression"

Copied!
5
0
0

Texte intégral

(1)

HAL Id: hal-02335439

https://hal.archives-ouvertes.fr/hal-02335439

Submitted on 28 Oct 2019

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of

sci-entific research documents, whether they are

pub-lished or not. The documents may come from

L’archive ouverte pluridisciplinaire HAL, est

destinée au dépôt et à la diffusion de documents

scientifiques de niveau recherche, publiés ou non,

émanant des établissements d’enseignement et de

Deviance of Poisson regression

Enwu Liu

To cite this version:

(2)

Deviance of Poisson regression

Enwu Liu, Mary MacKillop Institute for Health Reaserch, ACU

October 2019

In Poisson regression there are two Deviances.

The Null Deviance1 shows how well the response variable is predicted by a

model that includes only the intercept (grand mean).

And the Residual Deviance2 is -2 times the difference between the log

likeli-hood evaluated at the maximum likelilikeli-hood estimate (MLE) and the log-likelilikeli-hood for a ”saturated model” (a theoretical model with a separate parameter for each observation and thus a perfect fit).

Now let us write down those likelihood functions.

Suppose Y has a Poisson distribution whose mean depends on vector x, for simplicity, we will suppose x only has one predictor variable. We write

E(Y |x) = λ(x)

For Poisson regression we can choose a log or an identity link function, we choose a log link here.

Log(λ(x)) = β0+ β1x

β0is the intercept.

The Likelihood function with the parameter β0 and β1 is

L(β0, β1; yi) = n Y i=1 e−λ(xi)[λ(x i)]yi yi! = n Y i=1 ee−(β0+β1xi)e(β0+β1xi)yi yi!

The log likelihood function is:

l(β0, β1; yi) = − n X i=1 e(β0+β1xi)+ n X i=1 yi(β0+ β1xi) − n X i=1 log(yi!) (1)

When we calculate null deviance we will plug in β0 into (1). β0 will be

calculated by a intercept only regression, β1will be set to zero. We write

l(β0; yi) = − n X i=1 eβ0+ n X i=1 yiβ0− n X i−1 log(yi!) (2)

(3)

Next we need to calculate the log likelihood for ”saturated model” (a theoreti-cal model with a separate parameter for each observation), therefore, we have µ1, µ2, ..., µn parameters here.

(Note, in (1), we only have two parameters, i.e. as long as the subject have same value for the predictor variables we think they are the same).

The log likelihood function for ”saturated model” is

l(µ) = n X i=1 yilogµi− n X i=1 µi− n X i=1 log(yi!)

Then it can be written as: l(µ) =XyiI(yi>0)logµi− X µiI(yi>0)− X log(yi!)I(yi>0)− X µiI(yi=0) (3)

(Note, yi ≥ 0, when yi = 0, yilogµi = 0 and log(yi!) = 0, this will be useful

later, not now)

∂ ∂µi

l(µ) = yi µi

− 1

set to zero, we get

ˆ µi= yi

Now put ˆµi into (3), since when yi= 0 we can see that ˆµi will be zero (this is

the reason why we need (3).

Now for the likelihood function (3)of the ”saturated model” we can only care yi> 0, we write l(ˆµ) =Xyilogyi− X yi− X log(yi!) (4)

From (4) you can see why we need (3) since logyi will be undefied when

yi= 0

Now the let us calcualte the deviances. The Residual Deviance =

−2[(1) − (4)] = −2 ∗ [l(β0, β1; yi) − l(ˆµ)] (5)

The Null Deviance=

−2[(2) − (4)] = −2 ∗ [l(β0; yi) − l(ˆµ)] (6)

Ok, next let us calculate the two Deviances by R then by ”hand”.

x<- c(2,15,19,14,16,15,9,17,10,23,14,14,9,5,17,16,13,6,16,19,24,9,12,7,9,7,15,21,20,20) y<-c(0,6,4,1,5,2,2,10,3,10,2,6,5,2,2,7,6,2,5,5,6,2,5,1,3,3,3,4,6,9)

p_data<-data.frame(y,x)

p_glm<-glm(y~x, family=poisson, data=p_data) summary(p_glm)

(4)

Figure 1: Poisson regression results

We can see β0 = 0.30787, β1 = 0.07636, Null Deviance=48.31, Residual

Deviance=27.84.

Here is the intercept only model

p_glm2<-glm(y~1,family=poisson, data=p_data) summary(p_glm2)

(5)

We can see β0= 1.44299

Now let us calculate these two Deviance by ”hand” l_saturated<-c()

l_regression<-c() l_intercept<-c()

for(i in 1:30){

l_regression[i]<--exp( 0.30787 +0.07636 *x[i])+y[i]*(0.30787+0.07636 *x[i])-log(factorial(y[i]))}

l_reg<-sum(l_regression) l_reg

# -60.25116 ###log likelihood for regression model

for(i in 1:30){

l_saturated[i]<-y[i]*try(log(y[i]),T)-y[i]-log(factorial(y[i])) } #there is one y_i=0 need to take care

l_sat<-sum(l_saturated,na.rm=T) l_sat

#-46.33012 ###log likelihood for saturated model

for(i in 1:30){

l_intercept[i]<-exp(1.44299)+y2[i]*(1.44299)-log(factorial(y[i])) }

l_inter<-sum(l_intercept) l_inter

#-70.48501 ##log likelihood for intercept model only

-2*(l_reg-l_sat)

#27.84209 ##Residual Deviance

-2*(l_inter-l_sat)

##48.30979 ##Null Deviance

Here we can see use these formulas and calculate by hand we can get exactly the same numbers as calculated by GLM function of R.

1. http://www.theanalysisfactor.com/r-glm-model-fit/ 2. https://onlinecourses.science.psu.edu/stat501/node/377

Figure

Figure 1: Poisson regression results

Références

Documents relatifs

To test whether the vesicular pool of Atat1 promotes the acetyl- ation of -tubulin in MTs, we isolated subcellular fractions from newborn mouse cortices and then assessed

Néanmoins, la dualité des acides (Lewis et Bronsted) est un système dispendieux, dont le recyclage est une opération complexe et par conséquent difficilement applicable à

Cette mutation familiale du gène MME est une substitution d’une base guanine par une base adenine sur le chromosome 3q25.2, ce qui induit un remplacement d’un acide aminé cystéine

En ouvrant cette page avec Netscape composer, vous verrez que le cadre prévu pour accueillir le panoramique a une taille déterminée, choisie par les concepteurs des hyperpaysages

Chaque séance durera deux heures, mais dans la seconde, seule la première heure sera consacrée à l'expérimentation décrite ici ; durant la seconde, les élèves travailleront sur

A time-varying respiratory elastance model is developed with a negative elastic component (E demand ), to describe the driving pressure generated during a patient initiated

The aim of this study was to assess, in three experimental fields representative of the various topoclimatological zones of Luxembourg, the impact of timing of fungicide

Attention to a relation ontology [...] refocuses security discourses to better reflect and appreciate three forms of interconnection that are not sufficiently attended to