
Multiple regression analysis of fire deaths from burns


Building Research Note, 1967-03-01


https://doi.org/10.4224/40000650


G. Williams-Leir

Division of Building Research, National Research Council, Ottawa, Canada, March 1967

SUMMARY

A multiple regression procedure for investigating relations between associated variables is described. Besides statistical criteria, prior knowledge and the simplicity and usefulness of the result are taken into account. The procedure is illustrated by an analysis of survival of burn victims in Ontario.

Multiple regression is a statistical technique used to seek relations between several variables in the presence of errors. It is of value when the observations are subject to random variations, provided there are enough of them, and when the subject matter is not amenable to more direct approach from experiment or theory.

Various texts (1 to 3) explain multiple linear regression in the detail needed and show how to determine the constants c in:

$$y = c_0 + c_1 x_1 + c_2 x_2 + \cdots + c_k x_k$$

where y is the dependent variable and the independent variables are x. Starting from n observations of $x_1, x_2, \ldots, x_k, y$, the cross products $x_i x_j$ and $x_i y$ must each be summed and a matrix built up from their sums. Inversion of the matrix gives the coefficients c and also conveniently permits calculation of a significance index, "Student's" t, for each coefficient.
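The computation described above can be sketched in a few lines of Python. This is an illustration only, not the program used in the study: the function name `fit_by_cross_products`, the use of NumPy and the synthetic data are all assumptions. It forms the matrix of summed cross products, inverts it, and derives the coefficients and a t value for each.

```python
import numpy as np

def fit_by_cross_products(X, y):
    """Least-squares fit via the summed cross-product matrix.

    X is an (n, k) array of independent variables, y an (n,) array.
    A column of ones is prepended so that c[0] is the constant term.
    Returns (c, t, residual_mean_square).
    """
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])
    S = A.T @ A                      # sums of cross products x_i * x_j
    g = A.T @ y                      # sums of cross products x_i * y
    S_inv = np.linalg.inv(S)         # inversion gives the coefficients ...
    c = S_inv @ g
    resid = y - A @ c
    rms = float(resid @ resid) / (n - A.shape[1])
    se = np.sqrt(rms * np.diag(S_inv))   # ... and their standard errors
    t = c / se
    return c, t, rms

# Synthetic data (hypothetical): the third variable has no real effect.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=200)
c, t, rms = fit_by_cross_products(X, y)
```

Explicit inversion is used here only because it mirrors the text; a library least-squares routine would normally be preferred for numerical stability.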

Many programs have been described for carrying out this or similar procedures, e.g. by IBM (4). When this program is used and if, as is probable, one or more of the coefficients is shown to be not significant, manual intervention is necessary before a new run can be made with a reduced number of variables; in this run the program will start again from the beginning.

[...] with the least significant coefficient to be picked out for rejection by the program. The cross-product matrix need not be reconstructed; one column and one row are dropped, and new estimates of the coefficients c and of t are made as before. Another way of dropping or adding variables, devised by Cochran (5), is described by Kendall (6).

This process can be repeated until only the most significant variable survives. During the process, one or more regression equations will probably be found that satisfy these conditions: (a) every coefficient is significant; (b) each set of coefficients provides the best accommodation between the relevant equation and the data. According to strict regression theory, the "best" equation is that for which the residual mean square is the least.
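A minimal sketch of this discard loop follows, under stated assumptions: the name `elimination_history`, the threshold |t| >= 2.0 used to approximate the 5 per cent level, and the synthetic data are illustrative and not taken from the original program. The cross-product matrix is built once, and each discard removes one row and one column from it.

```python
import numpy as np

def elimination_history(X, y, names):
    """Drop the least significant term repeatedly until one variable remains.

    Returns a list of (kept_names, c, t, rms) tuples, one per regression;
    c[0] and t[0] belong to the constant term, which is never dropped.
    """
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])
    S_full, g_full, yy = A.T @ A, A.T @ y, float(y @ y)
    keep = list(range(1, A.shape[1]))        # column indices still in play
    history = []
    while keep:
        idx = [0] + keep
        S_inv = np.linalg.inv(S_full[np.ix_(idx, idx)])
        c = S_inv @ g_full[idx]
        # Residual sum of squares from the stored sums alone: y'y - c'g
        rms = (yy - float(c @ g_full[idx])) / (n - len(idx))
        t = c / np.sqrt(rms * np.diag(S_inv))
        history.append(([names[j - 1] for j in keep], c, t, rms))
        weakest = int(np.argmin(np.abs(t[1:])))
        keep.pop(weakest)                    # one row and one column dropped
    return history

# Synthetic data (hypothetical): only x1 and x3 have a real effect.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=300)
history = elimination_history(X, y, ["x1", "x2", "x3", "x4"])

# Regressions in which every coefficient passes the assumed threshold.
all_significant = [h for h in history if np.all(np.abs(h[2][1:]) >= 2.0)]
```

Keeping the whole history, rather than stopping at the first acceptable equation, leaves the final choice among the candidate regressions to the analyst.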

The procedure described, however, will not necessarily find the best sub-sets of the variables introduced. Each variable is discarded on the grounds of its weak contribution to the equation then under consideration. Each discard may raise or lower the significance of the other variables. After one or more subsequent discards, a sub-set may be reached within which some of the earlier discards, if restored, could perhaps contribute significantly. This defect may be partly remedied by a more elaborate scheme, namely "stepwise" multiple regression. The programming of this has been described by Efroymson (7).

This program at each step either discards variables below a preset significance level, or, when all variables in the regression attain this level, reviews all variables outside the regression to see whether any qualify for admission. Eventually an equilibrium sub-set of variables is attained. Garside, however, points out (8) that there may be more than one such equilibrium sub-set at a given significance level, and concludes that the only way to find the best sub-set is to try all possible sub-sets; he proposes an efficient modification of the Efroymson method to achieve this.

If there were only a short list of primary variables, fifteen, for example, and these were to be considered only linearly, this method would be feasible; there would be $2^k - 1$ equations to test k variables, although it is unlikely that the best sub-set would provide a much better fit than the result of the Efroymson procedure.
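For a short list of variables the exhaustive search is easy to sketch. The fragment below is a brute-force illustration, not Garside's efficient scheme: it refits every non-empty sub-set from scratch and keeps the one with the least residual mean square. The function name `best_subset` and the synthetic data are assumptions.

```python
from itertools import combinations
import numpy as np

def best_subset(X, y):
    """Try every non-empty sub-set of columns of X.

    Returns (best_columns, best_rms, n_tried), where best_rms is the
    least residual mean square found and n_tried equals 2**k - 1.
    """
    n, k = X.shape
    best_cols, best_rms, tried = None, np.inf, 0
    for size in range(1, k + 1):
        for cols in combinations(range(k), size):
            A = np.column_stack([np.ones(n), X[:, list(cols)]])
            c, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ c
            rms = float(resid @ resid) / (n - A.shape[1])
            tried += 1
            if rms < best_rms:
                best_cols, best_rms = cols, rms
    return best_cols, best_rms, tried

# Synthetic data (hypothetical): only columns 0 and 2 have a real effect.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(size=300)
cols, rms, tried = best_subset(X, y)
```

The count of fits doubles with each added variable, which is why the approach is only practical for short lists.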

If, however, non-linear forms or "independent" variables that are functions of more than one of the primary variables (necessary if interactions are to be investigated) are under consideration, a much wider field opens up. For instance, if linear, [...] of variables to 3k; if first order interactions are to be included, the total becomes $\tfrac{3k}{2}(3k - 1)$, so that for only four primary variables the number of equations to be tested is of the order [...] and the effort would require a prohibitive length of time on any machine.

[Table: k | Variables | Sub-sets; table body not recoverable]
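The growth in the number of sub-sets can be checked with a short calculation. The sketch below takes the expressions as they appear in the recovered text, 3k terms and (3k/2)(3k - 1) terms with first order interactions, which may themselves be affected by the damage to this passage; the function name `subsets_to_test` is hypothetical.

```python
def subsets_to_test(n_terms):
    """Number of non-empty sub-sets of n_terms candidate terms."""
    return 2 ** n_terms - 1

# Fifteen primary variables, considered only linearly.
linear_only = subsets_to_test(15)

# Four primary variables, using the expressions as printed in the text.
k = 4
terms_with_functions = 3 * k
terms_with_interactions = (3 * k) * (3 * k - 1) // 2
interaction_subsets = subsets_to_test(terms_with_interactions)

print(linear_only, terms_with_functions, terms_with_interactions)
print(f"{interaction_subsets:.1e}")
```

Python integers are exact at any size, so no overflow handling is needed for the larger count.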

These circumstances render it necessary to use whatever prior knowledge may be available to narrow the list of functions and interactions. To this extent, the analysis must be subjective and great effort to reduce the residual mean square is not justified. Healy (9) comments: "when a particular selection of, say, eight x's is needed for a strict minimum of (residual mean square), there will usually be a selection of three or four -- maybe several such -- for which the (minimum) is only slightly increased ...." He goes on to suggest that "more useful still will be an approach in which the selections ... are at least to some extent under the user's control."

The approach found effective in this study was to choose functions of the primary variables that satisfy common sense requirements, and to select rationally the interactions to be investigated. The resulting equations, with all coefficients significant, were next examined for their consistency with common sense. It may be possible to repair any deficiencies in this respect by choosing a short list from the discarded variables, always preferring simple to complicated terms, by adding them to the survivors, and rerunning the program. Terms of intermediate importance, discarded early in the first run, may be retained in the second.

CHOOSING FUNCTIONS OF THE PRIMARY VARIABLES

To clarify the allusions to "prior knowledge" and "common sense" that have been made above, the problem will be reviewed [...] regression gives the constants c in the equation

$$y = c_0 + c_1 x_1 + c_2 x_2 + \cdots + c_k x_k$$

A simple extension of the procedure will determine the constants c in the non-linear form:

$$f_0(y) = c_0 + c_1 f_1(x_1) + c_2 f_2(x_2) + \cdots + c_k f_k(x_k)$$

The arbitrary functions f evidently demand some care and restraint in their selection. Extrapolation of a regression equation is always speculative; but even within the range of the observations its accuracy as a description of the covariation will depend upon how successfully functions have been found to match the tendencies of the data.

If the effect of the independent variables seems to be additive, then $f_0(y) = y$. If it appears to be multiplicative, then $f_0(y) = \log y$. In this case, the most obvious alternatives for the functions of the x's are either $f(x) = x$, appropriate where equal increments of x appear to alter y by a constant ratio, or $f_1(x_1) = \log x_1$, leading to the power law:

$$y \propto x_1^{c_1}$$

These two forms may be mixed as required.
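As an illustration of mixing the two forms, the sketch below fits log y against log x1 (power law) and x2 itself (constant ratio per unit increment). The data are synthetic and the true values 5.0, 1.5 and -0.8 are arbitrary assumptions chosen so that the recovered coefficients can be compared with them; natural logarithms are assumed.

```python
import numpy as np

rng = np.random.default_rng(2)
x1 = rng.uniform(1.0, 10.0, size=100)    # enters as a power law
x2 = rng.uniform(0.0, 2.0, size=100)     # enters with a constant ratio per unit
noise = np.exp(rng.normal(scale=0.05, size=100))   # multiplicative error
y = 5.0 * x1 ** 1.5 * np.exp(-0.8 * x2) * noise

# f0(y) = log y, f1(x1) = log x1, f2(x2) = x2
A = np.column_stack([np.ones_like(x1), np.log(x1), x2])
c, *_ = np.linalg.lstsq(A, np.log(y), rcond=None)

exponent = c[1]               # estimate of the power-law exponent
ratio_per_unit = np.exp(c[2]) # factor by which y changes per unit of x2
scale = np.exp(c[0])          # estimate of the multiplying constant
```

Because the regression is linear in the transformed variables, the ordinary fitting procedure applies unchanged.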

A polynomial relation may be attempted by using $f_1(x) = x$, $f_2(x) = x^2$, and so on. If x occurs at no more than n levels, the series should not go higher than $x^{n-1}$.
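The limit on the degree of the polynomial can be seen numerically: with x observed at only n distinct levels, powers of x beyond n - 1 add no independent column to the regression. The sketch below uses three hypothetical levels and a helper named `design`, both assumptions for illustration.

```python
import numpy as np

# Twelve observations of x, but only three distinct levels.
x = np.array([1.0, 2.0, 3.0] * 4)

def design(x, degree):
    """Columns 1, x, x**2, ..., x**degree."""
    return np.column_stack([x ** p for p in range(degree + 1)])

rank_quadratic = np.linalg.matrix_rank(design(x, 2))   # three columns
rank_cubic = np.linalg.matrix_rank(design(x, 3))       # four columns
```

The cubic design has four columns but no higher rank than the quadratic one, so its cross-product matrix could not be inverted.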

Where prior knowledge restricts the relation, it may be taken into account. If, for instance, a power-law curve must pass through $x_1 = a$, $y = b$, regardless of what values the other variables take, then terms

$$f_0(y) = \log (y - b), \qquad f_1(x_1) = \log (x_1 - a)$$

will meet the case, provided none of the observations lies too close to this locus.

It is important to avoid introducing terms that are equivalent in effect to terms already used; for instance, when the list contains $\log x_1$ and $\log x_2$, it would be superfluous to introduce $\log (x_1 / x_2)$.
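The redundancy can be checked directly, since log(x1/x2) = log x1 - log x2 is an exact linear combination of terms already present. The sketch below, on synthetic data, shows that adding the ratio term does not raise the rank of the matrix of terms; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.uniform(1.0, 10.0, size=50)
x2 = rng.uniform(1.0, 10.0, size=50)

# Terms already in the list: constant, log x1, log x2.
A = np.column_stack([np.ones(50), np.log(x1), np.log(x2)])
# The same list with the superfluous ratio term appended.
A_plus = np.column_stack([A, np.log(x1 / x2)])

rank_before = np.linalg.matrix_rank(A)
rank_after = np.linalg.matrix_rank(A_plus)
```

An unchanged rank means the enlarged cross-product matrix is singular, so the regression cannot assign separate coefficients to the three logarithmic terms.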

It can be shown, subject to the assumptions of regression analysis, that an apparent relation with one of the functions, and thus with the primary variable, is unlikely to be due to chance; but so long as the relation is inexact it can never be said that no function could represent the relation better. If there is reason to search for a function of a given primary variable that is as significant as possible, it is valid to add two or more functions with the coefficients given them by previous runs of the program.

These methods cannot prove that a variable does not have a significant effect, since all possible functions of that variable cannot be tried. One can only report that a function of the variable that had a significant coefficient was not found, and list those tried.

The functions to be used need not be known in advance when the data cards are prepared. The cards must supply the primary independent variables and the dependent one, but (in systems where the program is compiled afresh for each run) the program loop that reads the data cards can also prepare any functions or interaction terms that may be needed or choose a different dependent variable; this part of the program can readily be changed from run to run.

SURVIVAL AFTER BURNS

This discussion will be illustrated by an analysis of the period of survival of people who have been fatally burned, usually by clothing fires. Every fatal burning accident in Ontario from 1954 to 1963 inclusive for which all the data were available was included in the analysis. These were 112 male victims and 87 female victims; information was missing in a roughly equal number of cases.

[...] and sex of the victim, the area of the burn and the date of the accident. The dependent variable is the time interval from the accident until the victim's death. The notation used will be as follows:

T = period of survival, days
Y = year of accident (1950 = 0)
W = elapsed fraction of calendar year (e.g. 1 April = 0.25)
S = sex of victim (M = 0, F = 1)
R = area of burn, fraction of body surface (i.e. per cent/100)
G = age of victim, years/100

Two successive equal increases in burn severity would clearly not produce equal reductions in period of survival, so that $f_0(T) = \log T$. Survival for less than a day was always treated as half a day.

The effect of the accident date may be examined in two ways: one can look for a seasonal effect by introducing:

$$f_1(W) = \sin (2\pi W + P)$$

where W is the elapsed fraction of a calendar year and P is a phase constant, determined by trial; $\pi/6$ is a likely value. One can also look for secular trends in the effectiveness of medical care, together with any other ambient factor influencing survival. Since this effect, if present at all, should be a small and gradual one, T may be treated as exponentially changing with Y; and there seems little point in looking for any interaction of other variables with this gradual effect.
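Determining the phase constant by trial might look like the following sketch, which fits the seasonal term for each of a set of trial phases and keeps the one giving the least residual sum of squares. The function names, the grid of trial phases and the synthetic data are assumptions. Trial phases are confined to the range from 0 up to, but not including, pi, because P and P + pi differ only in the sign of the fitted coefficient.

```python
import numpy as np

def seasonal_term(W, P):
    """sin(2*pi*W + P), with W the elapsed fraction of the calendar year."""
    return np.sin(2.0 * np.pi * W + P)

def best_phase_by_trial(W, y, phases):
    """Fit y = c0 + c1 * sin(2*pi*W + P) for each trial P.

    Returns the trial P that gives the least residual sum of squares.
    """
    best_P, best_rss = None, np.inf
    for P in phases:
        A = np.column_stack([np.ones_like(W), seasonal_term(W, P)])
        c, *_ = np.linalg.lstsq(A, y, rcond=None)
        rss = float(np.sum((y - A @ c) ** 2))
        if rss < best_rss:
            best_P, best_rss = P, rss
    return best_P

# Synthetic data (hypothetical) with a true phase of pi/6.
rng = np.random.default_rng(4)
W = rng.uniform(0.0, 1.0, size=400)
y = 0.5 * np.sin(2.0 * np.pi * W + np.pi / 6) + rng.normal(scale=0.1, size=400)

phases = np.arange(6) * np.pi / 6      # 0, pi/6, ..., 5*pi/6
P_hat = best_phase_by_trial(W, y, phases)
```

An alternative that avoids the trial search is to enter sine and cosine terms together and let the regression fix the phase, at the cost of one more coefficient.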

In general, women survive longer than men; therefore, either separate analyses must be conducted for males and females, or S can be treated as a variable, in which case its interactions with all the other variables must (at least in principle) be examined. Only one term is needed to represent S in the equation because this variable can take only two values.

When the effect of burn area R is examined, prior knowledge gives two boundary conditions. Burns of zero area do not shorten life, and for the present purpose R zero must [...] scale; victims of 100 per cent burns do not always die instantly, though their chance of surviving one day is small. This situation may be represented sufficiently well by requiring T = 0 at some hypothetical value $R_0$ of R; perhaps $R_0$ might be 110 per cent. It is simple to introduce terms to represent these conditions:

$$B = \log R, \qquad D = \log (R_0 - R)$$

A possible, but inferior, alternative would be to combine these effects arbitrarily in a single term such as: [...] This would considerably reduce the number of interaction terms, and would allow the regression less freedom to accommodate to the data.

There are no such absolute effects of age. Old people evidently survive a shorter time after burns than do young ones, but the gradient of the effect between children and young adults is open to question. It seems necessary to allow for a polynomial relation between G and log T. What degree this should have only the data can tell, but it seems unlikely that the accuracy is sufficient to determine more than two constants. Terms $c_1 G + c_2 G^2$ should be introduced into the equation, and all possible interactions looked for between G or $G^2$, B or D, and S.
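The preparation of these terms from one victim's record can be sketched as follows. Several points are assumptions rather than facts from the text: the function name `prepare_terms`, the use of natural logarithms, the value R0 = 1.10, the scaling of age as years/100, the form D = log(R0 - R) (inferred from the boundary condition T = 0 at R = R0), and the dictionary keys, which only approximate the term names used in the study.

```python
import math

R0 = 1.10   # assumed hypothetical burn area at which T = 0 (110 per cent)

def prepare_terms(T_days, S, R, age_years):
    """Return (log T, dict of candidate terms) for one record.

    S is 0 for male and 1 for female; R is the burnt fraction of body
    surface and must lie strictly between 0 and R0.
    """
    # Survival for less than a day is treated as half a day.
    T = 0.5 if T_days < 1.0 else T_days
    G = age_years / 100.0        # assumed scaling of age
    B = math.log(R)
    D = math.log(R0 - R)         # assumed form of the second area term
    terms = {"G": G, "G2": G * G, "B": B, "D": D, "S": float(S)}
    # Interactions between (G or G^2), (B or D) and S.
    for area_name, area in (("B", B), ("D", D)):
        terms["S" + area_name] = S * area
        for age_name, age in (("G", G), ("G2", G * G)):
            terms[area_name + age_name] = area * age
            terms[area_name + "S" + age_name] = area * S * age
    terms["SG"] = S * G
    terms["SG2"] = S * G * G
    return math.log(T), terms

# Example record (hypothetical): female, aged 40, 20 per cent burn, died within hours.
log_T, terms = prepare_terms(0.3, 1, 0.20, 40)
```

Every term containing S vanishes for male victims, so a single equation serves both sexes while still allowing the coefficients to differ between them.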

The reasoning presented so far justifies experimentation with a regression equation of the form:

$$\log T = C_0 + C_1 f(Y + W) + C_2 \sin (2\pi W + P) + \cdots + \text{interaction terms.}$$

Trials produced coefficients for Y and $Y^2$ below the 5 per cent significance level, but they generally bore a fairly constant ratio to one another; the combination term $Y(1 - 0.075Y)$ was then tried and came close to the 5 per cent level. The indications are that, for the first half of the ten-year period of the survey, survival periods were fairly steady, but that they rose appreciably in the second half.
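Evaluating the combination term over the survey years shows how it can describe a steady first half followed by a change in the second. The sketch below tabulates Y(1 - 0.075Y) for 1954 to 1963, with Y counted from 1950; the function name `trend_term` is hypothetical, and the sign of the fitted coefficient is not given in the recovered text.

```python
def trend_term(Y):
    """Combination term Y * (1 - 0.075 * Y), with Y = year - 1950."""
    return Y * (1.0 - 0.075 * Y)

values = {1950 + Y: round(trend_term(Y), 3) for Y in range(4, 14)}

first_half = [values[year] for year in range(1954, 1959)]
second_half = [values[year] for year in range(1959, 1964)]
spread_first = max(first_half) - min(first_half)
spread_second = max(second_half) - min(second_half)

for year, value in values.items():
    print(year, value)
```

The term stays within a narrow band for 1954 to 1958 and then falls steadily, so a negative coefficient would correspond to steady survival periods followed by a rise.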

No significant seasonal effect has yet been detected.

The program discarded non-significant terms in the following order: G, BG, DG, $BSG^2$, $DSG^2$, $SG^2$, SGD, SD, $Y(1 - 0.075Y)$, BSG, SG, B. The last six regressions are given in Table I. The earlier regressions were rejected from prior knowledge, since they included positive coefficients for the variable B; this would imply that males below a certain age survived longer after, for example, a 20 per cent burn than after one of zero area.*

* Perhaps this situation could be rationally justified, but such equations are misleading as a summary of the data.

There is some freedom of choice between the regressions listed in Table I. Only the last three have every coefficient significant at the 5 per cent level. The first two retain the variable $BG^2$, for which the coefficient is negative. Thus they alone indicate, correctly, that all victims should survive a zero burn indefinitely. For female victims, only, the variable SB gives a similar indication.

To illustrate these results, regression equations 2 and 4 from Table I have been numerically evaluated for a range of convenient values of the variables (see Tables II and III).

It is disappointing that even equation (1) accounts for less than 42 per cent of the variance in the data. An equation with 10 terms including f(Y) would account for 44 per cent, but even 18 terms do not attain 45 per cent. Estimates of individual survival periods based on these equations may be said to have roughly a one in three chance of being out by a factor of three or more, up or down.
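One way to read the closing statement is sketched below, under an assumption the text does not state: that the residuals of log T are roughly normally distributed. On that assumption, about one observation in three falls more than one standard deviation from the fitted value, so an error factor of three corresponds to a residual standard deviation of about log 3.

```python
import math

# Probability that a normal deviate lies more than one standard deviation
# from its mean, in either direction.
p_outside_one_sd = 1.0 - math.erf(1.0 / math.sqrt(2.0))

# Residual standard deviation of log T implied by "a factor of three",
# expressed in common and in natural logarithms.
sigma_log10 = math.log10(3.0)
sigma_ln = math.log(3.0)

print(round(p_outside_one_sd, 3), round(sigma_log10, 3), round(sigma_ln, 3))
```

The probability comes out close to one in three, which is consistent with the wording of the text but does not prove that this is how the figure was obtained.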

REFERENCES

1. Snedecor, G. W. Statistical methods. Iowa State College Press, Ames, 5th ed. 1956.
2. Crow, E. L., F. A. Davis and M. W. Maxfield. Statistics manual. Dover, New York, 1960.
3. Ostle, B. Statistics in research. Iowa State College Press, Ames, 1954.
4. International Business Machines Corporation. System/360 [...]
5. Cochran, W. G. Supplement, Journal of the Royal Statistical Society, Vol. 5, 1938, p. 171.
6. Kendall, M. G. Advanced theory of statistics. Griffin, London, 2nd ed. Vol. 2, p. 167.
7. Efroymson, M. A. Multiple regression analysis, edited by A. Ralston and H. S. Wilf, Wiley, New York, 1960, p. 191.
8. Garside, M. J. The best sub-set multiple regression analysis. Journal of Applied Statistics, Vol. 14, 1965, p. 196.
9. Healy, M. J. R. Programming multiple regression. Computer [...]


TABLE II. ESTIMATED SURVIVAL PERIOD FOR MALES (DAYS), by age (years) and area (per cent of body surface) [table body not recoverable]

TABLE III. ESTIMATED SURVIVAL PERIOD FOR FEMALES (DAYS), by age (years) and area (per cent of body surface) [table body not recoverable]
