Generalized Linear Mixed Effects Models with Application to Fishery Data

by

© Jeffrey John Dowden

A practicum submitted to the School of Graduate Studies in partial fulfillment of the requirements for the Degree of

Master of Applied Statistics

Department of Mathematics and Statistics, Memorial University of Newfoundland

October, 2007

St. John's, Newfoundland, Canada

Abstract

The generalized linear model (GLIM) represents a versatile class of models suitable for several types of dependent variables. GLIMs are popular models and are often an appropriate choice for modelling fisheries data. However, fishery data and corresponding models tend to be complex, because of the complexity of the populations the data are sampled from. In this practicum we use generalized linear mixed effects models (GLMMs), which are GLIMs in which some parameters are random effects, to model two different fisheries data sets. The first involves a time series of biological samples used to determine fish maturity, and the second involves paired-trawl catch data to determine if there is a difference in catch rates between two fishing vessels.

In this research we find that GLMMs improve estimates of maturities in a selected fish stock and can be used to model differences in catch rates between fishing vessels effectively. This research also suggests that prediction and forecast accuracies are improved by using GLMMs. We also provide some simulation results and find that, overall, GLMMs appear to perform better than GLIMs in terms of bias, coverage errors, and power tests.


Acknowledgements

It is difficult to overstate my gratitude to my supervisor, Dr. Noel Cadigan. His insight, dedication, and guidance have made it possible for me to complete this practicum. He has been very generous with his ideas and gave me a great opportunity to work on this interesting application of generalized linear mixed effects models. I would have been completely lost without him.

I would like to thank my co-supervisor, Dr. Gary Sneddon. His comments, suggestions, encouragement and ability to explain things clearly have helped me immensely throughout my academic programs.

I acknowledge the financial support provided by the School of Graduate Studies, Department of Mathematics and Statistics, Dr. Noel Cadigan, and Dr. Gary Sneddon in the form of Graduate Assistantships and Teaching Assistantships. Further, I wish to thank Fisheries and Oceans Canada for giving me the opportunity to gain relevant work experience throughout my program.

Last and most importantly, I am very grateful to my parents, John and Marilyn Dowden, and my girlfriend Catherine Harty for their continuous love, support, and guidance throughout all my endeavors. To them I dedicate this practicum.


Contents

Abstract

Acknowledgements

List of Tables

List of Figures

1 Introduction
  1.1 Motivation
  1.2 The Generalized Linear Mixed Effects Model
  1.3 Estimation Methods for Generalized Linear Mixed Effects Models
  1.4 Statistical Software Packages
    1.4.1 The GENMOD Procedure
    1.4.2 The NLMIXED Procedure
    1.4.3 The GLIMMIX Procedure
  1.5 Scope of the Practicum

2 Application of GLMM: Fish Stock Maturities Data
  2.1 Introduction
  2.2 Materials and Methods
    2.2.1 Data
    2.2.2 Fixed Effects Model
    2.2.3 Mixed Effects Model
    2.2.4 Autocorrelation Diagnostics
    2.2.5 Prediction and Forecast Accuracy
    2.2.6 Model Checking
  2.3 Results
    2.3.1 Fixed Effects (FE) Model
    2.3.2 Autocorrelation Diagnostics
    2.3.3 Mixed Effects (ME) Autoregressive (AR) Model
    2.3.4 Mixed Effects Model with Random Year Effects (YE)
  2.4 Discussion

3 Application of GLMM: Fishery Survey Calibration Data
  3.1 Introduction
  3.2 Methods
    3.2.1 Paired-trawl fishing protocols
  3.3 Statistical Models
    3.3.1 Fixed effects model
    3.3.2 Mixed effects model
  3.4 Results
    3.4.1 Fixed effects model (FE1)
    3.4.2 Mixed effects model (ME1)
    3.4.3 Outliers I
    3.4.4 Fixed effects model (FE2)
    3.4.5 Mixed effects model (ME2)
    3.4.6 Outliers II
  3.5 Discussion

4 Simulation Study
  4.1 Introduction
  4.2 Simulation Set-up
  4.3 Results
    4.3.1 Analysis with Normal Distributed Random Effects
    4.3.2 Analysis When Random Effects Follow a Difference of Two Log-Gamma Random Variables
  4.4 Discussion

Bibliography

A Derivation of a Log-Gamma Distribution

B Table of Acronyms

List of Tables

2.1 Behavior of the ACF and PACF plots for AR(p) and MA(q) models. These are common characteristics found in both the ACF and PACF plots and are used to identify autocorrelation structure.

2.2 Summary statistics (over cohorts) of fixed effects model estimates and mixed autoregressive (AR) effects model predictions of intercepts, slopes, $A_{50}$'s and MR's for 3Ps female cod.

2.3 Pearson's total $\chi^2$ statistics for 3Ps female cod ages 4-8. Models described in Table 2.2.

2.4 Mixed effects covariance parameter estimates (Est) with standard errors (S.E.), for 3Ps female cod maturities. Models described in Table 2.2.

2.5 Large ($\geq \pm 5$) $\chi^2$ residuals (Res) from the fixed effects model (FE) and the autocorrelated model with year effects (AR YE) for 3Ps cod. $p$ is the estimated (or predicted) proportion mature.

3.1 Differences in vessel characteristics of the Canadian Coast Guard vessels Alfred Needler (AN) and Teleost (TEL).

3.2 Catch summaries. $N_c$ and $N_t$ are the AN and TEL measured catches. $n$ is the total number of observations (lengths and tows) where $N_t + N_c > 0$.

3.3 FEP1 model results. SE - standard error. L,U - profile likelihood confidence intervals. pv - $\chi^2$ p-value. Significant estimates in bold.

3.4 FE1 model results. SE - standard error. L,U - profile likelihood confidence intervals. pv - $\chi^2$ p-value. Significant estimates in bold.

3.5 MEP1 model results. SE - standard error. L,U - profile likelihood confidence intervals. pv is the t-statistic p-value. Significant estimates in bold.

3.6 ME1 model results. SE - standard error. L,U - profile likelihood confidence intervals. pv is the t-statistic p-value. Significant estimates in bold.

3.7 FEP1 model results when two potential outliers (i.e. trawl pairs) were removed. SE - standard error. L,U - profile likelihood confidence intervals. pv - $\chi^2$ p-value. Significant estimates in bold.

3.8 MEP1 model results when two potential outliers (i.e. trawl pairs) were removed. SE - standard error. L,U - profile likelihood confidence intervals. pv is the t-statistic p-value. Significant estimates in bold.

3.9 FE2 model results. SE - standard error. L,U - profile likelihood confidence intervals. pv - $\chi^2$ p-value. Significant estimates in bold.

3.10 ME2 model results. SE - standard error. L,U - profile likelihood confidence intervals. pv is the t-statistic p-value. Significant estimates in bold.

4.1 Values of the scale parameter ($\phi$) and the corresponding values for the mean ($E[\delta]$) and variance ($\mathrm{Var}[\delta]$) of $\delta$.

4.2 Catch summaries for the control vessel (WT) pooled over lengths. $n_c$ is the total number of tows per species, $\bar{n}_c$ is the mean catch per tow and $\mathrm{Var}(n_c)$ is the between-tow variance of catch for each species. $N_c$ is the total number of catches over all tows.

4.3 Catch summaries for both control and test vessels (WT + AN) pooled over lengths. $n_{c+t}$ is the total number of tows per species, $\bar{n}_{c+t}$ is the mean catch per tow and $\mathrm{Var}(n_{c+t})$ is the between-tow variance of catch for each species. $N_{c+t}$ is the total number of catches over all tows.

List of Figures

2.1 Northwest Atlantic Fisheries Organization (NAFO) fisheries management divisions.

2.2 Estimates for 3Ps cod. Panel 1: intercepts. Panel 2: slopes. Panel 3: $A_{50}$. Panel 4: MR. AR NOD is the autoregressive (AR) mixed-effects model with no overdispersion (NOD), and AR OD has overdispersion. FE is the fixed-effects model.

2.3 3Ps cod proportions mature at ages 4-8 vs. year. Ages 5-8 are listed at the left-hand side. Top panel: Fixed-effects (FE) model. Middle panel: autoregressive (AR) mixed-effects model with no overdispersion (NOD). Bottom panel: AR model with overdispersion (OD).

2.4 Residuals from the fixed effects (FE) model for 3Ps cod; + values are positive and x values are negative. Size is proportional to the absolute residual. Top panel: Chi-square ($\chi^2$) residuals. Bottom panel: Cross-validation chi-square ($\chi^2_{-1}$) residuals.

2.5 Retrospective analysis for 3Ps cod, ages 4-8 (listed in left margin). The retrospective $\rho$ metric is shown in the top left-hand corner of each panel. Column 1: Fixed-effects (FE) model. Column 2: autoregressive (AR) mixed-effects model with no overdispersion (NOD). Column 3: AR model with overdispersion (OD). Column 4: AR model with year-effects as nuisance parameters (YE-). Column 5: AR model with year-effects as predictive parameters (YE+). The maturity at age 5 predicted from the FE model using data up to 2004 is shown as a square, and the value estimated in 2005 is shown as a triangle.

2.6 Chi-square ($\chi^2$) residuals by age and cohort for 3Ps cod. Fixed effects (FE) residuals are plotted as circles (o), and residuals from the autocorrelated model with year effects (AR YE) are plotted as triangles (△). Solid symbols are truncated ($\geq \pm 4$), and are presented in Table 2.5.

2.7 3Ps cod proportions mature at age estimated from the fixed effects model (FE; solid lines) and the autocorrelated model with year effects (AR YE; dashed lines). Observations are plotted as circles (o).

2.8 Average annual deviance residual (solid lines) from the fixed effects (FE) model for 3Ps cod. Vertical lines demark 95% confidence intervals (CI's). Residuals are plotted as circles (o). Arrows denote years for which the CI's do not cover zero. Numbers of residuals are listed at the top.

2.9 Absolute values of deviance residuals (o's) for 3Ps cod from the fixed effects (FE) model vs. $n$ (panel 1) and $\mu = n \times p$ (panel 2). The solid line is the fit from a loess smoother, and the dotted lines represent 95% confidence limits for the smoother. The dashed line is a reference line at 1 and represents the approximate expected value of the absolute residuals.

2.10 3Ps cod autocorrelation functions (ACF) and partial autocorrelation functions (PACF) of the intercepts and slopes from the variance component (VC) model.

2.11 3Ps cod autocorrelation functions (ACF) and partial autocorrelation functions (PACF) of the intercepts and slopes from the fixed-effects (FE) model.

2.12 Estimates for 3Ps cod. Panel 1: intercepts. Panel 2: slopes. Panel 3: Age at 50% maturity, $A_{50}$. Panel 4: Maturity range, MR. VC is the mixed-effects variance component model and FE is the fixed-effects model.

2.13 Residuals from the autoregressive mixed-effects model with no overdispersion (AR NOD) for 3Ps cod; + values are positive and x values are negative. Size is proportional to the absolute residual. Top panel: Chi-square ($\chi^2$) residuals. Bottom panel: Cross-validation chi-square ($\chi^2_{-1}$) residuals.

2.14 Residuals from the autoregressive mixed-effects model with overdispersion (AR OD) for 3Ps cod; + values are positive and x values are negative. Size is proportional to the absolute residual. Top panel: Chi-square ($\chi^2$) residuals. Bottom panel: Cross-validation chi-square ($\chi^2_{-1}$) residuals.

2.15 Estimates for 3Ps cod. Panel 1: intercepts. Panel 2: slopes. Panel 3: $A_{50}$. Panel 4: MR. AR NOD is the autoregressive (AR) mixed effects model with no overdispersion (NOD), and AR YE has year effects. FE is the fixed effects model.

2.16 Predictions of random effects for 3Ps cod, with 95% confidence intervals (vertical lines). Triangles and solid lines are for the autoregressive mixed effects model with overdispersion (AR OD). Circles and dashed lines are for the AR model with year effects (AR YE).

2.17 3Ps cod proportions mature at ages 4-8 vs. year. Ages 5-8 are listed at the left-hand side. Top panel: Autoregressive (AR) mixed effects model with year effects as nuisance parameters (YE-). Second panel: AR with year effects as predictive parameters (YE+). Third panel: autoregressive (AR) mixed-effects model with no overdispersion (NOD). Bottom panel: Independent fixed effects (FE) model.

2.18 Residuals from the autoregressive mixed-effects model with year effects (AR YE) for 3Ps cod; + values are positive and x values are negative. Size is proportional to the absolute residual. Top panel: Chi-square ($\chi^2$) residuals. Bottom panel: Cross-validation chi-square ($\chi^2_{-1}$) residuals.

2.19 Square root of absolute values of the $\chi^2$ residuals from the autoregressive mixed-effects model with year effects (AR YE) and the fixed effects (FE) model for 3Ps cod. The 1:1 line is shown (solid) and the dotted line delineates FE residuals greater than $\sqrt{2}$. The number of points above and below the 1:1 line are shown, and beneath these values are the corresponding number of points whose FE residuals are greater than $\sqrt{2}$.

2.20 Absolute values of the $\chi^2$ residuals from the autoregressive mixed-effects model with year effects (AR YE) and the fixed effects (FE) model for 3Ps cod. The 1:1 line is shown (solid) and the dotted line delineates FE residuals greater than 2. The number of points above and below the 1:1 line are shown, and beneath these values are the corresponding number of points whose FE residuals are greater than 2.

2.21 Log of absolute standardized deviance residuals (o's) for 3Ps cod from the fixed effects (FE) model vs. log(n). The x's are average absolute log deviance residuals in each bin. The •'s are the average log absolute deviance residuals in each bin using the parametric bootstrap procedure. Vertical lines represent the 95% CI limits for each averaged log absolute deviance residual in each bin using the parametric bootstrap. The solid line is the fit from a loess smoother, and the dashed lines represent the 95% confidence limits for the smoother. The dotted line is a reference line at log(1) = 0, and represents the approximate expected value of the absolute standardized residuals.

3.1 Northwest Atlantic Fisheries Organization (NAFO) northern Gulf fisheries management Divisions 4RS and Subdivision 3Pn.

3.2 Top panel: Hypothetical length distributions sampled by each trawl, $\lambda_{lc}$: the fish density encountered by the Alfred Needler and $\lambda_{lt}$: the fish density encountered by the Teleost. Bottom panel: log density ratio, $\delta_l = \log(\lambda_{lc}/\lambda_{lt})$.

3.3 Location of the comparative fishing for NAFO Divisions 4RS and Subdivision 3Pn in 2004 (o) and 2005 (•).

3.4 Left column: Total scaled catches from each haul, AN vs. TEL. The total per swept area for all sets are listed at the top. The dotted line has a slope of one. The dashed line has a slope equal to the relative efficiency ($\rho$) for the FEP1 model and the solid line denotes the mean relative efficiency for the MEP1 model. Solid black circles represent potential outliers. Right column: The predicted random effects histograms. The rows indicate each species with codes given in the right margin: AP - American plaice; AC - Atlantic cod; GH - Greenland halibut; WF - Witch flounder.

3.5 Estimates of $\beta_0$ from the FEP1 and MEP1 models, and models with two potential outliers removed (FNO and MNO). Species codes: AP - American plaice; AC - Atlantic cod; GH - Greenland halibut; WF - Witch flounder.

3.6 Left column: Estimate of relative efficiency ($\rho_l$). Solid line represents the ME2 model estimate, dashed line represents FE2 model estimate. Right column: Observed (o's) and estimated (lines) proportions of AN scaled catches. Rows are for each species, with codes indicated in the right margin: AP - American plaice; AC - Atlantic cod; GH - Greenland halibut; WF - Witch flounder.

3.7 FE2 model results for American plaice. Total scaled catches per swept area for both vessels and AN catches per swept area adjusted by relative efficiency ($\rho_l$) are given at the top. Top panel: Total length frequencies for TEL (dashed-dotted line), AN (dashed line) and AN adjusted by relative efficiency (solid line), over all sets. Middle panel: Standardized (by standard deviation) total chi-square residuals for each set. Bottom panel: A local linear smoother versus length (solid line) of the standardized chi-square residuals. The dashed lines are 95% confidence intervals for the average residuals.

3.8 FE2 model results for Atlantic cod. Total scaled catches per swept area for both vessels and AN catches per swept area adjusted by relative efficiency ($\rho_l$) are given at the top. Top panel: Total length frequencies for TEL (dashed-dotted line), AN (dashed line) and AN adjusted by relative efficiency (solid line), over all sets. Middle panel: Standardized (by standard deviation) total chi-square residuals for each set. Bottom panel: A local linear smoother versus length (solid line) of the standardized chi-square residuals. The dashed lines are 95% confidence intervals for the average residuals.

3.9 FE2 model results for Greenland halibut. Total scaled catches per swept area for both vessels and AN catches per swept area adjusted by relative efficiency ($\rho_l$) are given at the top. Top panel: Total length frequencies for TEL (dashed-dotted line), AN (dashed line) and AN adjusted by relative efficiency (solid line), over all sets. Middle panel: Standardized (by standard deviation) total chi-square residuals for each set. Bottom panel: A local linear smoother versus length (solid line) of the standardized chi-square residuals. The dashed lines are 95% confidence intervals for the average residuals.

3.10 FE2 model results for Witch flounder. Total scaled catches per swept area for both vessels and AN catches per swept area adjusted by relative efficiency ($\rho_l$) are given at the top. Top panel: Total length frequencies for TEL (dashed-dotted line), AN (dashed line) and AN adjusted by relative efficiency (solid line), over all sets. Middle panel: Standardized (by standard deviation) total chi-square residuals for each set. Bottom panel: A local linear smoother versus length (solid line) of the standardized chi-square residuals. The dashed lines are 95% confidence intervals for the average residuals.

3.11 ME2 model results for American plaice. Total scaled catches per swept area for both vessels and AN catches per swept area adjusted by relative efficiency ($\rho_l$) are given at the top. Top left panel: Total length frequencies for TEL (dashed-dotted line), AN (dashed line) and AN adjusted by relative efficiency (solid line), over all sets. Top right panel: Predicted random effects, $\delta_{il}$, vs length for each set. Bottom left panel: Standardized (by standard deviation) total chi-square residuals for each set. Bottom right panel: A local linear smoother versus length (solid line) of the standardized chi-square residuals. The dashed lines are 95% confidence intervals for the average residuals.

3.12 ME2 model results for Atlantic cod. Total scaled catches per swept area for both vessels and AN catches per swept area adjusted by relative efficiency ($\rho_l$) are given at the top. Top left panel: Total length frequencies for TEL (dashed-dotted line), AN (dashed line) and AN adjusted by relative efficiency (solid line), over all sets. Top right panel: Predicted random effects, $\delta_{il}$, vs length for each set. Bottom left panel: Standardized (by standard deviation) total chi-square residuals for each set. Bottom right panel: A local linear smoother versus length (solid line) of the standardized chi-square residuals. The dashed lines are 95% confidence intervals for the average residuals.

3.13 ME2 model results for Greenland halibut. Total scaled catches per swept area for both vessels and AN catches per swept area adjusted by relative efficiency ($\rho_l$) are given at the top. Top left panel: Total length frequencies for TEL (dashed-dotted line), AN (dashed line) and AN adjusted by relative efficiency (solid line), over all sets. Top right panel: Predicted random effects, $\delta_{il}$, vs length for each set. Bottom left panel: Standardized (by standard deviation) total chi-square residuals for each set. Bottom right panel: A local linear smoother versus length (solid line) of the standardized chi-square residuals. The dashed lines are 95% confidence intervals for the average residuals.

3.14 ME2 model results for Witch flounder. Total scaled catches per swept area for both vessels and AN catches per swept area adjusted by relative efficiency ($\rho_l$) are given at the top. Top left panel: Total length frequencies for TEL (dashed-dotted line), AN (dashed line) and AN adjusted by relative efficiency (solid line), over all sets. Top right panel: Predicted random effects, $\delta_{il}$, vs length for each set. Bottom left panel: Standardized (by standard deviation) total chi-square residuals for each set. Bottom right panel: A local linear smoother versus length (solid line) of the standardized chi-square residuals. The dashed lines are 95% confidence intervals for the average residuals.

3.15 Estimates of $\beta_0$ and $\beta_1$ from the FE2 (F) and ME2 (M) models with two potential outliers removed (FNO and MNO). Species codes: American plaice - AM; Atlantic cod - AC; Greenland halibut - GH; WF - Witch flounder.

4.1 Right: histograms of the random effects, $\delta$, where $\delta = \log[(\mu\phi)^{-1}\lambda_1] - \log[(\mu\phi)^{-1}\lambda_2]$. Both $\lambda_1$ and $\lambda_2$ are gamma random variables with mean $\mu$ and variance $\phi\mu^2$. Left: probability plots for the random effects. Rows indicate random effects for corresponding $\phi$ values which are given in the upper left corner of the probability plot.

4.2 Bias of $\beta_0$ for FEP1 (solid line), MEP1c (dotted line), and MEP1m (dash-dotted line) models. Random effects are normally distributed with 0 mean and variances $\sigma^2 = 0.0, 0.1, 0.5, 0.9$, respectively. The dashed line represents the horizontal line at 0. Rows are for species, with codes indicated at the right hand-side: AC - Atlantic cod; DR - deepwater redfish; GH - Greenland halibut.

4.3 95% coverage errors of the confidence intervals from the parameter estimates for FEP1, MEP1c, and MEP1m models when random effects do not exist. The solid line represents lower coverage errors, the dotted line represents upper coverage errors, the dash-dotted line represents total coverage errors (lower + upper), and the horizontal dotted lines represent critical values $\alpha = 0.05$ and $\alpha/2 = 0.025$. Rows are for species, with codes indicated at the right hand-side: AC - Atlantic cod; DR - deepwater redfish; GH - Greenland halibut.

4.4 95% coverage errors of the confidence intervals from the parameter estimates for FEP1, MEP1c, and MEP1m models. Random effects are normally distributed with 0 mean and variance $\sigma^2 = 0.1$. The solid line represents lower coverage errors, the dotted line represents upper coverage errors, the dash-dotted line represents total coverage errors (lower + upper), and the horizontal dotted lines represent critical values $\alpha = 0.05$ and $\alpha/2 = 0.025$. Rows are for species, with codes indicated at the right hand-side: AC - Atlantic cod; DR - deepwater redfish; GH - Greenland halibut.

4.5 95% coverage errors of the confidence intervals from the parameter estimates for FEP1, MEP1c, and MEP1m models. Random effects are normally distributed with 0 mean and variance $\sigma^2 = 0.5$. The solid line represents lower coverage errors, the dotted line represents upper coverage errors, the dash-dotted line represents total coverage errors (lower + upper), and the horizontal dotted lines represent critical values $\alpha = 0.05$ and $\alpha/2 = 0.025$. Rows are for species, with codes indicated at the right hand-side: AC - Atlantic cod; DR - deepwater redfish; GH - Greenland halibut.

4.6 95% coverage errors of the confidence intervals from the parameter estimates for FEP1, MEP1c, and MEP1m models. Random effects are normally distributed with 0 mean and variance $\sigma^2 = 0.9$. The solid line represents lower coverage errors, the dotted line represents upper coverage errors, the dash-dotted line represents total coverage errors (lower + upper), and the horizontal dotted lines represent critical values $\alpha = 0.05$ and $\alpha/2 = 0.025$. Rows are for species, with codes indicated at the right hand-side: AC - Atlantic cod; DR - deepwater redfish; GH - Greenland halibut.

4.7 The 95% confidence widths of the parameter estimates for FEP1 (solid line), MEP1c (dotted line), and MEP1m models (dash-dotted line). Random effects are normally distributed with 0 mean and variances $\sigma^2 = 0.0, 0.1, 0.5, 0.9$, respectively. Rows are for species, with codes indicated at the right hand-side: AC - Atlantic cod; DR - deepwater redfish; GH - Greenland halibut.

4.8 Power curves for FEP1 (solid line), MEP1c (dashed line), and MEP1m (dash-dotted) with normally distributed random effects. Rows are for species, with codes indicated at the right hand-side: AC - Atlantic cod; DR - deepwater redfish; GH - Greenland halibut.

4.9 Bias of $\beta_0$ for FEP1 (solid line), MEP1c (dotted line) and MEP1m (dash-dotted line) models. Random effects are a difference of two log-gamma distributed random variables with 0 mean and variance $\sigma^2 = 0.1, 0.5, 0.9$, respectively. The dashed line represents the horizontal line at 0. Rows are for species, with codes indicated at the right hand-side: AC - Atlantic cod; DR - deepwater redfish; GH - Greenland halibut.

4.10 95% coverage errors of the confidence intervals from the parameter estimates for FEP1, MEP1c, and MEP1m models. Random effects are a difference of two log-gamma distributed random variables with mean $E(\delta) = 0$ and variance $\mathrm{Var}(\delta) = 0.1$. The solid line represents lower coverage errors, the dotted line represents upper coverage errors, the dash-dotted line represents total coverage errors (lower + upper), and the horizontal dotted lines represent critical values $\alpha = 0.05$ and $\alpha/2 = 0.025$. Rows are for species, with codes indicated at the right hand-side: AC - Atlantic cod; DR - deepwater redfish; GH - Greenland halibut.

4.11 95% coverage errors of the confidence intervals from the parameter estimates for FEP1, MEP1c, and MEP1m models. Random effects are a difference of two log-gamma distributed random variables with mean $E(\delta) = 0$ and variance $\mathrm{Var}(\delta) = 0.5$. The solid line represents lower coverage errors, the dotted line represents upper coverage errors, the dash-dotted line represents total coverage errors (lower + upper), and the horizontal dotted lines represent critical values $\alpha = 0.05$ and $\alpha/2 = 0.025$. Rows are for species, with codes indicated at the right hand-side: AC - Atlantic cod; DR - deepwater redfish; GH - Greenland halibut.

4.12 95% coverage errors of the confidence intervals from the parameter estimates for FEP1, MEP1c, and MEP1m models. Random effects are a difference of two log-gamma distributed random variables with mean $E(\delta) = 0$ and variance $\mathrm{Var}(\delta) = 0.9$. The solid line represents lower coverage errors, the dotted line represents upper coverage errors, the dash-dotted line represents total coverage errors (lower + upper), and the horizontal dotted lines represent critical values $\alpha = 0.05$ and $\alpha/2 = 0.025$. Rows are for species, with codes indicated at the right hand-side: AC - Atlantic cod; DR - deepwater redfish; GH - Greenland halibut.

4.13 95% confidence widths of the parameter estimates for FEP1 (solid line), MEP1c (dotted line), and MEP1m models (dash-dotted line). Random effects are a difference of two log-gamma distributed random variables with 0 mean and variances $\sigma^2 = 0.1, 0.5, 0.9$, respectively. Rows are for species, with codes indicated at the right hand-side: AC - Atlantic cod; DR - deepwater redfish; GH - Greenland halibut.

4.14 Power curves for FEP1 (solid line), MEP1c (dashed line), and MEP1m (dash-dotted). Random effects are a difference of two log-gamma distributed random variables. Rows are for species, with codes indicated at the right hand-side: AC - Atlantic cod; DR - deepwater redfish; GH - Greenland halibut.

Chapter 1 Introduction

1.1 Motivation

The gen erali z d linear mod el (GLIM ; McCullagh a nd Neider , 1989 ) represents ave r- satil e class of models suita ble for severa l types of dependent varia bles such as ont inu- ous, di chotomous, and count (see Neider a nd Wedd erburn, 1972). GLIM s arc p op ula r mod els th at have been used in many research a reas such as bi ological science , health ciencies, engineerin g and econome trics. StatS ci.org (accessed May 16 , 2007) present a selected bibliogr aphy of technical work r elated to t his su bject.

The GLIM is often an appropriate choice for modelling fisheries data. For example, Jiao and Chen (2004) fitted a GLIM for a production model and sequential population analysis (SPA) to assess a stock of Atlantic cod. They illustrated the problem associated with normality assumptions and concluded that the GLIM should be used to identify appropriate error structures in modelling fish population dynamics. Another example of the application of GLIMs to fisheries data is Ye et al. (2001).

In this report we use similar GLIMs, or extensions that are described shortly, to model two very different fisheries data sets. The first involves a time series of biological samples used to determine fish maturity. This important information is used in fish stock assessments. The second data set involves paired-trawl catch data to determine if there is a difference in catch rates between two fishing vessels. This information is important when interpreting fishery survey results from different vessels, and fishery surveys are a fundamental component of most stock assessments. Although these data sets are different in nature, it turns out that similar statistical models can be used to estimate important parameters from these data.

However, fishery data and corresponding models tend to be complex, because of the complexity of the populations the data are sampled from. Realistic models usually have a much larger number of parameters than can be reliably estimated. Fortunately many of these parameters can be realistically viewed as random variables that can be described and also predicted by a much smaller number of variance parameters. This makes the complex fishery models more tractable to estimate. Hence, in our two fishery data sets, we find an advantage in using Generalized Linear Mixed Models (GLMMs), which are GLIMs in which some parameters are actually random effects. The main purpose of our report is to show how to use GLMMs to model two complex fishery data sets. We also provide some simulation results to assess the reliability of the GLMM estimates.

1.2 The Generalized Linear Mixed Effects Model

In this section we first describe the GLIM, followed by the GLMM. A GLIM consists of the following components:

First, the response variable vector $Y = (Y_1, \ldots, Y_n)'$ is denoted as an $n \times 1$ random vector whose distribution is from the exponential family (see Dobson, 2002). In this case the variance of the response depends on the mean ($\mu = E[Y]$) through a variance function $V$:

$$\mathrm{Var}[Y] = \Phi\, w^{-1}\, V(\mu), \tag{1.1}$$

where $\Phi$ is a diagonal dispersion matrix which is either known or must be estimated, $w$ is a diagonal matrix of known weights for each observation, and $V(\mu)$ is a matrix of the variance function.

Secondly, a monotonic differentiable link function $g(\cdot)$ is specified which describes how the expected value $\mu$ of the response vector $Y$ is related to a linear predictor $\eta$:

$$g(\mu) = \eta. \tag{1.2}$$

The linear predictor incorporates information about the covariates into the model,

$$\eta = X\beta, \tag{1.3}$$

where $X$ is an $n \times p$ matrix of covariates of $\mathrm{rank}(X) = p$ such that $X'X$ is nonsingular and $\beta$ is a $p \times 1$ vector of unknown parameters which we also refer to as fixed effects.

Common GLIMs include linear regression, logistic regression and Poisson regression with the corresponding identity, logit, and log link functions respectively.
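As a minimal illustration of the three link functions named above, the following Python sketch evaluates the mean $\mu = g^{-1}(x'\beta)$ for a single observation. The names `LINKS` and `mean_from_predictor` are hypothetical and are not part of the SAS procedures used in this practicum.

```python
import math

# Each entry maps a link name to the pair (g, g^{-1}).
# These names are illustrative only.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "log": (lambda mu: math.log(mu), lambda eta: math.exp(eta)),
}


def mean_from_predictor(link_name, x, beta):
    """Return mu = g^{-1}(x'beta) for one observation with covariates x."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))  # linear predictor
    _, inverse_link = LINKS[link_name]
    return inverse_link(eta)


# Example: a logistic regression with an intercept and one covariate.
mu = mean_from_predictor("logit", [1.0, 2.0], [-1.0, 0.5])
```

The same linear predictor yields a mean on the real line, on (0, 1), or on the positive half-line depending on the link, which is why each link is paired with the normal, binomial, and Poisson distributions respectively.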

Fixed effects GLIMs are usually based on the assumption that all observations are independent of each other and are not appropriate for analysis of correlated data, in particular clustered and/or longitudinal data (e.g. Zeger et al., 1988).

A generalized linear mixed effects model (GLMM) is more appropriate for the analysis of correlated data. A GLMM is a natural extension to the GLIM whereby a random effect is added to the linear predictor to account for the correlation of the data. Many references on the method are available (e.g. Breslow and Clayton, 1993; Lee and Nelder, 1996; Sutradhar, 2003). GLMMs are well suited for biological and medical data which normally display heterogeneous responses to treatment. GLMMs are used extensively for data that are not normally distributed. For example, Gilmour et al. (1985) analyzed binomial data using GLMMs, and Agresti et al. (2000) described a variety of social science applications of GLMMs when responses were categorical. Another advantage of the GLMM is the ability to combine data by introducing multilevel random effects (see Goldstein, 1995). Xiao et al. (2004) cite numerous applications of GLMMs in fisheries sciences.

Suppose that $Y$ is an $n \times 1$ random vector for the observed data and $\delta$ is an $r \times 1$ vector of random effects. The GLMM is based on the assumption that

$$\mu = E[Y \mid \delta] = g^{-1}(X\beta + Z\delta), \tag{1.4}$$

where $g^{-1}(\cdot)$ is the inverse of the monotonic link function, $X$ and $\beta$ are defined as in (1.3) and the matrix $Z$ is an $n \times r$ matrix for the random effects. The random effects are usually assumed to be normally distributed with mean 0 and unknown variance-covariance matrix $G$.

The GLMM contains a linear mixed model inside the inverse link function; this is referred to as the linear predictor,

$$\eta = X\beta + Z\delta. \tag{1.5}$$

The variance of the observations, conditioned on the random effects, is

$$\mathrm{Var}[Y \mid \delta] = A_\mu^{1/2} R A_\mu^{1/2}. \tag{1.6}$$

Here $A_\mu$ is a diagonal matrix containing evaluations at $\mu$ of a linear variance function for the GLMM and $R$ is a variance-covariance matrix of unknowns (Wolfinger and O'Connell, 1993).
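To make the structure of the linear predictor with a random effect concrete, the sketch below simulates binary data from a GLMM with a logit link and a single normally distributed random intercept per cluster. This is an illustrative special case, not one of the fishery models fitted later; the function name, the uniform covariate, and all parameter values are assumptions chosen only for demonstration.

```python
import math
import random


def simulate_glmm(n_clusters, n_per_cluster, beta0, beta1, sd_delta, seed=1):
    """Simulate (cluster, x, y, mu) rows from a random-intercept logistic GLMM.

    Assumed model: eta = beta0 + beta1 * x + delta_i with delta_i ~ N(0, sd_delta^2),
    mu = g^{-1}(eta) for the logit link, and y | delta_i ~ Bernoulli(mu).
    """
    rng = random.Random(seed)
    rows = []
    for i in range(n_clusters):
        delta_i = rng.gauss(0.0, sd_delta)        # one random effect per cluster
        for _ in range(n_per_cluster):
            x = rng.uniform(-1.0, 1.0)            # hypothetical covariate
            eta = beta0 + beta1 * x + delta_i     # linear predictor
            mu = 1.0 / (1.0 + math.exp(-eta))     # conditional mean E[Y | delta]
            y = 1 if rng.random() < mu else 0
            rows.append((i, x, y, mu))
    return rows


rows = simulate_glmm(n_clusters=5, n_per_cluster=4,
                     beta0=-1.0, beta1=2.0, sd_delta=0.5)
```

Observations within a cluster share the same `delta_i`, which is what induces the within-cluster correlation that a fixed effects GLIM ignores.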

1.3 Estimation Methods for Generalized Linear Mixed Effects Models

The primary interest for GLMMs is in the estimation of fixed effects; however, Liang and Zeger (1986) and Zeger et al. (1988) discuss the interpretation of their estimates in terms of subject-specific (SS) and population-averaged (PA) models. A SS approach focuses on the estimation of the fixed effects parameters $\beta$, the random effects $\delta$, and the variance of the random effects. The PA approach is primarily interested in the estimation of $\beta$ and the marginal variance of $Y$, which is related to the variance of the random effects. The random effects themselves are treated as nuisance parameters. An example of SS modelling is the best linear unbiased prediction (BLUP; Robinson, 1991), and an example of the PA approach (applied to count data) is given by Thall and Vail (1990).

Fitting a linear mixed model using a likelihood approach consists of specifying a distribution for the random effects and then estimating the unknown parameters using maximum likelihood (ML) or restricted maximum likelihood (REML). The REML approach produces unbiased estimates of variance parameters in some problems (e.g. Harville, 1977; McGilchrist, 1994). These methods are usually referred to as marginal approaches and typically involve numerical integrations over the random effects. The ML approach is also a PA approach because the random effects are not estimated.
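The difference between ML and REML variance estimates is easiest to see in the simplest linear model, an independent normal sample with a single mean parameter. In that special case ML divides the residual sum of squares by $n$, while REML divides by $n - 1$ and is unbiased. The sketch below assumes only this special case; the function name is illustrative.

```python
def variance_ml_and_reml(y):
    """Return (ML, REML) variance estimates for an i.i.d. normal sample y.

    With one fixed effect (the mean), ML uses divisor n and REML uses n - 1.
    """
    n = len(y)
    mean = sum(y) / n
    rss = sum((v - mean) ** 2 for v in y)   # residual sum of squares
    return rss / n, rss / (n - 1)


ml, reml = variance_ml_and_reml([1.0, 2.0, 3.0, 4.0])
```

The ML estimate is always the smaller of the two, and the gap shrinks as the sample size grows relative to the number of fixed effects.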

Suppose that $Y_i$ is a vector of observed data for each of $k$ subjects, $i = 1, \ldots, k$. $Y_i$ is assumed to be independent across $i$, but within-subject covariance is likely to exist because each of the elements of $Y_i$ is measured on the same subject. Assume that a random effects vector $\delta_i$ exists that is also independent across $i$. Assuming an appropriate model linking $Y_i$ and $\delta_i$ exists (i.e. a GLIM) and this model involves covariates $X_i$ that are related to the mean of $Y_i$ conditional on $\delta_i$, the joint probability density function is

$$p(Y_i \mid X_i, \beta, r, \delta_i)\, q(\delta_i \mid g), \tag{1.7}$$

where $p(\cdot)$ is the conditional probability density function of $Y_i$, $q(\cdot)$ is the probability density function of $\delta_i$, $X_i$ is a matrix of observed explanatory variables, $\beta$ is a vector of unknown parameters, $r$ is a vector of unknown unique elements of $R$ (the variance-covariance matrix of the observations), and $g$ is a vector of unknown unique elements of $G$ (the variance-covariance matrix of the random effects). Let $\vartheta = (\beta, r, g)'$. Likelihood inferences about $\vartheta$ are based on the marginal likelihood function

$$M(\vartheta) = \prod_{i=1}^{k} \int \cdots \int p(Y_i \mid X_i, \beta, r, \delta_i)\, q(\delta_i \mid g)\, d\delta_i. \tag{1.8}$$

In particular, the function

$$f(\vartheta) = -\log M(\vartheta) \tag{1.9}$$

is minimized over $\vartheta$ numerically in order to estimate $\vartheta$, and the inverse Hessian (second-order derivative) matrix provides an approximate variance-covariance matrix for the estimates of $\vartheta$. The function $f(\vartheta)$ is referred to as the negative log-likelihood function or the objective function for optimization.
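The negative log marginal likelihood can be sketched numerically for a simple special case: a logistic model with one normally distributed random intercept per subject, so that each integral is one-dimensional. The sketch below uses a plain midpoint rule rather than the quadrature methods used by mixed model software, and all function names and the integration range of plus or minus eight standard deviations are assumptions made for illustration.

```python
import math


def normal_pdf(d, sd):
    """Density q(delta) of a N(0, sd^2) random effect."""
    return math.exp(-0.5 * (d / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def conditional_lik(ys, xs, beta0, beta1, delta):
    """Conditional likelihood p(y_i | delta) for one subject's binary responses."""
    lik = 1.0
    for y, x in zip(ys, xs):
        mu = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x + delta)))
        lik *= mu if y == 1 else 1.0 - mu
    return lik


def neg_log_marginal(subjects, beta0, beta1, sd, n_grid=400):
    """Negative log marginal likelihood, integrating delta out by the midpoint rule."""
    lo, hi = -8.0 * sd, 8.0 * sd
    h = (hi - lo) / n_grid
    total = 0.0
    for ys, xs in subjects:
        integral = 0.0
        for j in range(n_grid):
            d = lo + (j + 0.5) * h            # midpoint of the j-th sub-interval
            integral += conditional_lik(ys, xs, beta0, beta1, d) * normal_pdf(d, sd) * h
        total -= math.log(integral)
    return total


# Two hypothetical subjects, each with three binary observations.
subjects = [([1, 0, 1], [0.2, -0.4, 0.9]), ([0, 0, 1], [-0.7, 0.1, 0.5])]
value = neg_log_marginal(subjects, beta0=0.0, beta1=1.0, sd=1.0)
```

Passing this function to a numerical optimizer over the two regression parameters and the standard deviation would give marginal ML estimates for this toy model. With several random effects per subject the grid grows exponentially in the dimension, which motivates the limitations discussed next.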

There are limitations to the marginal method. Likelihood equations tend to be complex and difficult to derive. Rarely will closed form expressions exist for the marginal likelihood. In some instances the data may contain a large number of random effects, which leads to a high dimensional integral for the marginal likelihood equation. Numerical integration techniques have been used, such as Gaussian quadrature (e.g. Anderson and Aitkin, 1985; Davidian and Gallant, 1993) and Gibbs sampling (Zeger and Karim, 1991). High dimensional integrals can be very computationally intensive to solve numerically, and in some cases are not feasible (Stiratelli et al., 1984).

Another approach for estimating $\beta$, $\delta$, $G$ and $R$ is the pseudo-likelihood (PL) and restricted pseudo-likelihood (REPL) procedure (Wolfinger and O'Connell, 1993). Implementation of the PL and REPL procedure first involves linearizing the data using a first-order Taylor series approximation expanding about initial estimates of the fixed regression parameters and random effects. Then normal linear model theory is used to estimate variance parameters. The variance parameters are then used to estimate fixed regression parameters and predict random effects which, in turn, are used to linearize the data again to estimate new variance parameters. This process is repeated until a specific tolerance level is obtained (i.e. convergence). This procedure is described in more detail later in this chapter.
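The linearization step alone can be sketched for a logit link as follows. The sketch assumes the usual first-order Taylor form of the pseudo-response, namely the current linear predictor plus the residual divided by the derivative of the inverse link; the function name is hypothetical, and the subsequent linear mixed model fit that estimates the variance parameters is deliberately omitted.

```python
import math


def pseudo_response_logit(y, eta_hat):
    """Linearize responses y about the current linear predictor eta_hat (logit link)."""
    pseudo = []
    for yi, eta in zip(y, eta_hat):
        mu = 1.0 / (1.0 + math.exp(-eta))    # inverse logit at the current estimate
        slope = mu * (1.0 - mu)              # derivative of the inverse link, d mu / d eta
        pseudo.append(eta + (yi - mu) / slope)
    return pseudo


# Starting from eta_hat = 0 for every observation:
p = pseudo_response_logit([1, 0, 1], [0.0, 0.0, 0.0])
```

In the full procedure the pseudo-responses would be passed to a linear mixed model fit, the resulting estimates would update `eta_hat`, and the two steps would alternate until the estimates stabilize.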

Several other estimation techniques for GLMMs have appeared in the literature. Breslow and Clayton (1993) presented two estimation procedures referred to as penalized quasi-likelihood (PQL) and marginal quasi-likelihood (MQL); these two methods correspond to the SS and PA models of Zeger et al. (1988), respectively. As well, the implementation of PQL and MQL can be achieved using PL (Wolfinger and O'Connell, 1993).

Waclawiw and Liang (1992) predicted random effects of a GLMM by iteratively solving a set of Stein-type estimating equations. This SS approach is similar to the PL in its iterative nature, although they replace the mixed model and ML/REML equations with optimal estimating equations for fixed effects, random effects, and variance parameters.

Sutradhar and Rao (2001) considered an exact MQL approach; however, this PA approach was only developed for small values of the variance of the random effects. Sutradhar (2004) since improved on the exact MQL approach by proposing an exact quasi-likelihood or generalized quasi-likelihood (GQL) method whereby the covariance matrix needed to construct the estimating equation has been computed for small or large values of the variance of the random effects.

Lee and Nelder (1996) used a hierarchical likelihood (HL) approach to estimate fixed parameters and random effects. In this SS approach, the random effects were treated as fixed effects and then were used to obtain estimates of the variance components. This approach was similar to the PQL method proposed by Breslow and Clayton (1993).

The PL/REPL approach appears to be an appropriate choice and is useful for modelling GLMMs since it provides a unified framework for both SS and PA inference and includes PQL and MQL as special cases. As well, PL/REPL algorithms can be implemented using mixed model software packages (see Section 1.4; SAS/STAT® PROC GLIMMIX).

We provide more details about the PL/REPL approaches below, summarized from Wolfinger and O'Connell (1993).

Let $\hat\beta$ and $\hat\delta$ be known estimates of $\beta$ and $\delta$ and recall that

$$\mu = E[Y \mid \delta] = g^{-1}(\eta), \tag{1.10}$$

which is a vector consisting of evaluations of $g^{-1}$ at each component of $\eta$. Now let

$$g^{-1}(\eta) \approx g^{-1}(\hat\eta) + \hat\Delta X(\beta - \hat\beta) + \hat\Delta Z(\delta - \hat\delta), \tag{1.11}$$

where $\hat\eta = X\hat\beta + Z\hat\delta$ and

$$\hat\Delta = \left(\frac{\partial g^{-1}(\eta)}{\partial \eta}\right)_{\hat\beta,\, \hat\delta} \tag{1.12}$$

is a diagonal matrix with elements consisting of the first derivative of $g^{-1}$. Note that (1.11) is a first-order Taylor series approximation of $\mu$ expanding about $\hat\beta$ and $\hat\delta$. Rearranging the terms yields the expression

$$\hat\Delta^{-1}\{\mu - g^{-1}(\hat\eta)\} + X\hat\beta + Z\hat\delta \approx X\beta + Z\delta. \tag{1.13}$$

The left-hand side is the expected value, conditional on $\delta$ and $\hat\beta$, of

$$P \equiv \hat\Delta^{-1}\{Y - g^{-1}(\hat\eta)\} + X\hat\beta + Z\hat\delta \tag{1.14}$$

and

$$\mathrm{Var}[P \mid \delta] = \hat\Delta^{-1} A_\mu^{1/2} R A_\mu^{1/2} \hat\Delta^{-1}. \tag{1.15}$$

Thus we can consider the model

$$P = X\beta + Z\delta + \epsilon \tag{1.16}$$

as a linear mixed model with pseudo-response (i.e. linear mixed pseudo-model) $P$, fixed effects $\beta$, random effects $\delta$, and $\mathrm{Var}[\epsilon] = \mathrm{Var}[P \mid \delta]$.

Now define

$$V(\theta) = ZGZ' + \hat\Delta^{-1} A_\mu^{1/2} R A_\mu^{1/2} \hat\Delta^{-1} \tag{1.17}$$

as the marginal variance in the linear mixed pseudo-model, where $\theta$ is a $q \times 1$ parameter vector containing all unknown parameters in $R$ and $G$ and $Z'$ is the transposed matrix for the random effects. Based on this linearized model, an objective function can be defined, assuming the distribution of $P$ is normal. The maximum log pseudo-likelihood for $P$ is

$$\ell(\theta, p) = -\frac{1}{2}\log|V(\theta)| - \frac{1}{2} r'V(\theta)^{-1}r - \frac{n}{2}\log(2\pi) \tag{1.18}$$

and the restricted maximum log pseudo-likelihood is

$$\ell_R(\theta, p) = -\frac{1}{2}\log|V(\theta)| - \frac{1}{2} r'V(\theta)^{-1}r - \frac{1}{2}\log|X'V(\theta)^{-1}X| - \frac{n-k}{2}\log(2\pi), \tag{1.19}$$

where $r = p - X(X'V(\theta)^{-1}X)^{-1}X'V(\theta)^{-1}p$, $p$ is a realization of the random vector $P$, $n$ denotes the number of observations and $k$ is the rank of $X$. Numerical methods (i.e. Newton-Raphson or quasi-Newton) are generally required to maximize $\ell$ and $\ell_R$ over the parameters $\theta$. After obtaining estimates for $\theta$, estimates for $\beta$ and $\delta$ are computed as

$$\hat\beta = (X'V(\hat\theta)^{-1}X)^{-1}X'V(\hat\theta)^{-1}p, \tag{1.20}$$

$$\hat\delta = \hat{G}Z'V(\hat\theta)^{-1}\hat{r}. \tag{1.21}$$

With $\beta$ and $\delta$ set to the estimates, the linearization is re-computed, and (1.18) and (1.19) are maximized to obtain updated estimates of $R$ and $G$. This is iterated until convergence. This involves two levels of iteration: one for the linearization and one for the estimation of the variance parameters in the linearized model.

In some cases, the conditional distribution may contain a scale parameter ($\phi \neq 1$). The variance function becomes

$$V(\theta^*) = ZG^*Z' + \hat\Delta^{-1} A_\mu^{1/2} R^* A_\mu^{1/2} \hat\Delta^{-1}, \tag{1.22}$$

where $\theta^*$ is the covariance parameter vector with $q - 1$ elements. The matrices $R^*$ and $G^*$ are re-parameterized versions of $R$ and $G$ in terms of $\phi$. The maximum log pseudo-likelihood for the linear mixed pseudo-model (1.16) is

$$\ell(\theta^*, p) = -\frac{1}{2}\log|V(\theta^*)| - \frac{n}{2}\log\{r'V(\theta^*)^{-1}r\} - \frac{n}{2}\left(1 + \log\{2\pi/n\}\right) \tag{1.23}$$

and the restricted maximum log pseudo-likelihood is

$$\ell_R(\theta^*, p) = -\frac{1}{2}\log|V(\theta^*)| - \frac{n-k}{2}\log\{r'V(\theta^*)^{-1}r\} - \frac{1}{2}\log|X'V(\theta^*)^{-1}X| - \frac{n-k}{2}\left(1 + \log\{2\pi/(n-k)\}\right). \tag{1.24}$$

The solutions for $\hat\beta$, $\hat\delta$ and $\hat\phi$ are

$$\hat\beta = (X'V(\hat\theta^*)^{-1}X)^{-1}X'V(\hat\theta^*)^{-1}p, \tag{1.25}$$

$$\hat\delta = \hat{G}^*Z'V(\hat\theta^*)^{-1}\hat{r}, \tag{1.26}$$

$$\hat\phi = \hat{r}'V(\hat\theta^*)^{-1}\hat{r}/n^*, \tag{1.27}$$

where $n^*$ equals $n$ for PL and $n - k$ for REPL.

1.4 Statistical Software Packages

A key feature of the PL/REPL method is its ability to be implemented using standard

statistical software packages. In this practicum, we use three software procedures developed by the SAS Institute for estimating parameters in GLIMs and GLMMs:

PROC GENMOD (generalized linear models), PROC NLMIXED (non-linear mixed

effects models), and PROC GLIMMIX (generalized linear mixed effects models). Each

method will be compared in terms of the accuracy of parameter estimates and model

(i.e. response) predictions.

1.4.1 The GENMOD Procedure

The GENMOD procedure fits a GLIM to the data by maximum likelihood estimation over the vector of unknown coefficients ($\beta$). In general, there is no closed form solution for the maximum likelihood estimates. GENMOD estimates the parameters of the model using an iterative fitting process. The dispersion parameter ($\phi$) is also estimated by either maximum likelihood, by the residual deviance, or by Pearson's chi-square divided by the degrees of freedom. Covariances, standard errors, and p-values for the parameter estimates are computed based on the asymptotic normality of maximum likelihood estimators.
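An iterative fit of this kind can be sketched in a few lines for the special case of a binomial logistic regression with an intercept and one covariate. The sketch below uses iteratively reweighted least squares (IRLS), a standard way of computing GLIM maximum likelihood estimates; it is not a description of the internal GENMOD algorithm, and the function names, starting values, and data are assumptions.

```python
import math


def fit_logistic_irls(x, m, n, tol=1e-10, max_iter=50):
    """Fit logit(p_i) = b0 + b1 * x_i to m_i successes in n_i trials by IRLS."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        # Accumulate the weighted normal equations for the working response z.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, mi, ni in zip(x, m, n):
            eta = b0 + b1 * xi
            p = 1.0 / (1.0 + math.exp(-eta))
            w = ni * p * (1.0 - p)               # working weight
            z = eta + (mi - ni * p) / w          # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        det = s_w * s_wxx - s_wx ** 2
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        change = abs(new_b0 - b0) + abs(new_b1 - b1)
        b0, b1 = new_b0, new_b1
        if change < tol:
            break
    return b0, b1


def pearson_dispersion(x, m, n, b0, b1):
    """Pearson chi-square divided by the residual degrees of freedom."""
    chi2 = 0.0
    for xi, mi, ni in zip(x, m, n):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        chi2 += (mi - ni * p) ** 2 / (ni * p * (1.0 - p))
    return chi2 / (len(x) - 2)


# Hypothetical data: number of successes out of 50 trials at four covariate values.
x = [1.0, 2.0, 3.0, 4.0]
n = [50, 50, 50, 50]
m = [6, 17, 31, 44]
b0, b1 = fit_logistic_irls(x, m, n)
phi = pearson_dispersion(x, m, n, b0, b1)
```

A dispersion estimate well above one would suggest more variability than the binomial model allows, which is one motivation for adding random effects.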

A number of link functions and probability distributions are available for the GENMOD procedure. The link functions include the identity, logit, probit, log, and complementary log-log. The distributions include normal, binomial, Poisson, gamma, inverse Gaussian, negative binomial, and multinomial.

The GENMOD procedure has the ability to fit correlated response data by the generalized estimating equation (GEE) method (Liang and Zeger, 1986), although we do not utilize this feature of the software in our analyses.

1.4.2 The NLMIXED Procedure

PROC NLMIXED allows you to specify, conditional on the random effects, a distribution for the response variable that has either a standard form (normal, binomial, Poisson) or a general distribution defined by the user. PROC NLMIXED fits nonlinear mixed models by maximizing an approximation to the likelihood integrated over the random effects. Such marginal methods are commonly used with mixed models. Different integral approximations are available. These include Gaussian quadrature (Pinheiro and Bates, 1995) and first-order Taylor series approximation (Beal and Sheiner, 1988). Successful convergence of the optimization procedure results in parameter estimates along with their standard errors based on the Hessian matrix of the likelihood function.

The NLMIXED procedure only implements maximum likelihood. This is because the analog to the restricted maximum likelihood method in PROC NLMIXED software would involve a high dimensional integral over all of the fixed-effects parameters, and this integral is typically not available in closed form.

1.4.3 The GLIMMIX Procedure

The GLIMMIX procedure fits GLMMs based on linearizations (see Section 1.3). A Taylor series expansion is used to approximate the GLMM as a linear mixed model. The advantage of the linearization is that only the variance parameters have to be estimated numerically because closed form expressions exist for the regression parameter estimates. The linearization method is doubly iterative. The approximate linear mixed model is fit, which is itself an iterative process, then the new parameter estimates are used to update the linearization, which results in a new linear mixed model. The process stops when parameter estimates between successive linear mixed model fits change within a specified tolerance. The default estimation method in PROC GLIMMIX software for models containing random effects is restricted pseudo-likelihood (REPL). Maximum likelihood estimates of variance parameters tend to be biased for small sample sizes. The REPL may provide less biased estimation of random effect variance parameters.

An advantage of linearization-based methods is that they can use a relatively simple form of the linearized model that typically can be fit based on only the mean and variance in the linearized form. Models for which the marginal distribution is difficult, or impossible, to compute can be fit with linearization. This approach is well suited for models with correlated errors and a large number of random effects.

A disadvantage of this approach is the absence of a true objective function and potentially biased estimates of covariance parameters, especially for binary data. In a GLMM it is not always possible to derive the exact log-likelihood of the data; therefore likelihood based tests and statistics are often hard to derive. PROC GLIMMIX produces Wald-type test statistics (e.g. Buse, 1982) and confidence intervals.

PROC GLIMMIX software provides marginal and conditional residuals. Conditional residuals are based on predictors of the random effects and estimates of the fixed effects regression parameters. The predictors of the random effects are the estimated best linear unbiased predictors (BLUPs) in the approximated linear model.

1.5 Scope of the Practicum

The following is an outline for the remainder of the practicum. In Chapter 2 we apply the GLMM to maturity data for a selected fish stock off the southern coast of Newfoundland. In Chapter 3 we apply the GLMM to fishery survey calibration data from two research vessels fishing in the Northern Gulf of St. Lawrence. In Chapter 4 a simulation study is presented on the properties of estimators based on fishery calibration data. This practicum contains various acronyms (i.e. GLIM, GLMM); a table of acronyms along with their corresponding descriptions is given in Appendix B.


Chapter 2 Application of GLMM: Fish Stock Maturities Data

2.1 Introduction

Generalized linear mixed models have become an increasingly important method for fisheries research in recent years (Xiao et al., 2004). These models are suitable for analyzing complex count data. Models for such data often have a large number of parameters to estimate, many of which can usefully be considered as random variables to improve estimation (or prediction). In this chapter we examine the application of mixed models to improve estimation and inference for fish stock maturation rates. This is an important problem in fisheries sciences. Maturation rates are fundamental to understanding the dynamics and productivity of fish stocks. Good estimates of maturation rates are required for successful management of commercially exploited fish stocks (Olsen et al., 2005).

The mature component of a fish stock is usually referred to as the spawning stock biomass (SSB). It can be defined as the product of biomass-at-age and proportion mature-at-age (maturities), summed over all ages in the stock. Good estimates of maturities are required for good estimates of SSB. Poor estimates of maturities can have deleterious impacts on estimation of the potential yield of a fishery, population growth and the health of stocks (Welch and Foucher, 1988). Maturities can change over time because of many factors (e.g. Beacham, 1983; Pitt, 1975; Templeman et al., 1978; Shelton and Armstrong, 1983; Morgan and Colbourne, 1999). Annual estimates are required; however, the most appropriate way to produce such estimates is by cohort (see Morgan, 2000). A cohort is a group of animals (or more generally individuals) with the same birth year.
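The definition of SSB given above amounts to a single weighted sum, as the short sketch below shows. The function name and the numbers are hypothetical and are not taken from the stock analyzed in this chapter.

```python
def spawning_stock_biomass(biomass_at_age, maturity_at_age):
    """Sum over ages of biomass-at-age times proportion mature-at-age."""
    if len(biomass_at_age) != len(maturity_at_age):
        raise ValueError("biomass and maturity vectors must have the same length")
    return sum(b * m for b, m in zip(biomass_at_age, maturity_at_age))


# Hypothetical biomass (tonnes) and maturities for three age classes.
ssb = spawning_stock_biomass([100.0, 200.0, 300.0], [0.0, 0.5, 1.0])
```

Because each maturity multiplies the biomass of its age class directly, an error in the maturity of an abundant age class carries through to SSB in proportion to that biomass.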

Biological sampling programs for fish stocks provide counts of the number mature. Each year such sampling produces data on the number of fish examined, their age, and the number found to be mature. The probability of maturing is an increasing function of age within a cohort. A common model used to estimate maturities is logistic regression, where age is the covariate. This is a fixed effects generalized linear model (GLIM) with a logit link function (McCullagh and Nelder, 1989). This model is fitted to each cohort when maturity data for sufficient ages have been collected.
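For a single cohort, the logistic maturity model can be sketched as below. The intercept and slope values are hypothetical, and the quantity computed by `age_at_50_percent_maturity` is assumed here to correspond to the age at which half of the cohort is mature, which for a logit link is the negative intercept divided by the slope.

```python
import math


def proportion_mature(age, intercept, slope):
    """Logistic maturity ogive: P(mature | age) under a logit link."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * age)))


def age_at_50_percent_maturity(intercept, slope):
    """Age at which the fitted proportion mature equals 0.5."""
    return -intercept / slope


# Hypothetical cohort parameters.
intercept, slope = -6.0, 1.2
a50 = age_at_50_percent_maturity(intercept, slope)
ogive = [proportion_mature(age, intercept, slope) for age in range(1, 11)]
```

A positive slope gives the increasing relationship between age and maturity described above, and the ogive can be multiplied by biomass-at-age to obtain SSB for the cohort's contribution.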

However, there are problems with this approach. Data are updated annually for unfinished (e.g. recent) cohorts and this can result in substantial changes from year to year in the estimated maturities for that cohort. For example, the maturity at age 5 in 2003 estimated using cohort data up to 2004 can be quite different than the 2003 estimate using data up to 2003.

Often the annual trends in cohort maturities are fairly smooth (see Needle et al., 2003). The purpose of this chapter is to investigate if a generalized linear mixed effects model (GLMM) can be used to improve estimates of maturities, particularly for unfinished cohorts, by utilizing the autocorrelation structure in cohort maturities. We examine an important commercial species in the Northwest Atlantic, the Atlantic cod (Gadus morhua). Also, fisheries managers often consider changes in SSB in stock projections for different future management scenarios and they require that maturities be forecasted in the next several years (or more) to compute SSBs. We also investigate if mixed models can improve forecasted maturities.


Acknowledgements

I would like to thank my co-supervisor, Dr. Gary Sneddon. His comments, suggestions, encouragement and ability to explain things clearly have helped me immensely throughout my academic programs.

Last and most importantly, I am very much grateful to my parents, John and Marilyn Dowden, and my girlfriend Catherine Harty for their continuous love, support, and guidance throughout all my endeavors. To them I dedicate this practicum.

Contents

Abstract
Acknowledgements
List of Tables
List of Figures

1 Introduction
1.1 Motivation
1.2 The Generalized Linear Mixed Effects Model
1.3 Estimation Methods for Generalized Linear Mixed Effects Models
1.4 Statistical Software Packages
1.4.1 The GENMOD Procedure
1.4.2 The NLMIXED Procedure
1.4.3 The GLIMMIX Procedure
1.5 Scope of the Practicum

2 Application of GLMM: Fish Stock Maturities Data
2.1 Introduction
2.2 Materials and Methods
2.2.1 Data
2.2.2 Fixed Effects Model
2.2.3 Mixed Effects Model
2.2.4 Autocorrelation Diagnostics
2.2.5 Prediction and Forecast Accuracy
2.2.6 Model Checking
2.3 Results
2.3.1 Fixed Effects (FE) Model
2.3.2 Autocorrelation Diagnostics
2.3.3 Mixed Effects (ME) Autoregressive (AR) Model
2.3.4 Mixed Effects Model with Random Year Effects (YE)
2.4 Discussion

3 Application of GLMM: Fishery Survey Calibration Data
3.1 Introduction
3.2 Methods
3.2.1 Paired-trawl fishing protocols
3.3 Statistical Models
3.3.1 Fixed effects model
3.3.2 Mixed effects model
3.4 Results
3.4.1 Fixed effects model (FE1)
3.4.2 Mixed effects model (ME1)
3.4.3 Outliers I
3.4.4 Fixed effects model (FE2)
3.4.5 Mixed effects model (ME2)
3.4.6 Outliers II
3.5 Discussion

4 Simulation Study
4.1 Introduction
4.2 Simulation Set-up
4.3 Results
4.3.1 Analysis with Normal Distributed Random Effects
4.3.2 Analysis When Random Effects Follow a Difference of Two Log-Gamma Random Variables
4.4 Discussion

Bibliography

A Derivation of a Log-Gamma Distribution

B Table of Acronyms

List of Tables

2.1 Behavior of the ACF and PACF plots for AR(p) and MA(q) models. These are common characteristics found in both the ACF and PACF plots and are used to identify autocorrelation structure.

2.2 Summary statistics (over cohorts) of fixed effects model estimates and mixed autoregressive (AR) effects model predictions of intercepts, slopes, A50's and MR's for 3Ps female cod.

2.3 Pearson's total χ² statistics for 3Ps female cod ages 4- . Models described in Table 2.2.

2.5 Large (≥ ±5) χ² residuals (Res) from the fixed effects model (FE) and the autocorrelated model with year effects (AR YE) for 3Ps cod. p is the estimated (or predicted) proportion mature.

3.3 FEP1 model results. SE - standard error. L,U - profile likelihood confidence intervals. pv - χ² p-value. Significant estimates in bold.

3.4 FE1 model results. SE - standard error. L,U - profile likelihood confidence intervals. pv - χ² p-value. Significant estimates in bold.

3.5 MEP1 model results. SE - standard error. L,U - profile likelihood

removed. SE - standard error. L,U - profile likelihood confidence intervals. pv - χ² p-value. Significant estimates in bold.

3.8 MEP1 model results when two potential outliers (i.e. trawl pairs) were

dence intervals. pv - χ² p-value. Significant estimates in bold.

3.10 ME2 model results. SE - standard error. L,U - profile likelihood con-

A50. Panel 4: MR. AR NOD is the autoregressive (AR) mixed-effects

positive and x values are negative. Size is proportional to the absolute residual. Top panel: Chi-square (χ²) residuals. Bottom panel: Cross-validation chi-square residuals.

square, and the value estimated in 2005 is shown as a triangle.

2.6 Chi-square (χ²) residuals by age and cohort for 3Ps cod. Fixed effects

Chi-square (χ²) residuals. Bottom panel: Cross-validation chi-square

Chi-square (χ²) residuals. Bottom panel: Cross-validation chi-square

Size is proportional to the absolute residual. Top panel: Chi-square (χ²) residuals. Bottom panel: Cross-validation chi-square residuals.