NRC Publications Archive
Building Research Note, 1967-03-01
Publisher's version: https://doi.org/10.4224/40000650
NRC Publications Archive Record: https://nrc-publications.canada.ca/eng/view/object/?id=f160385b-a9ec-4f20-bbd5-e57e60d578dd
This publication could be one of several versions: author's original, accepted manuscript or the publisher's version. For the publisher's version, please access the DOI link above.
Questions? Contact the NRC Publications Archive team at [email protected]. If you wish to email the authors directly, please see the first page of the publication for their contact information.
Access and use of this website and the material on it are subject to the Terms and Conditions set forth at https://nrc-publications.canada.ca/eng/copyright. READ THESE TERMS AND CONDITIONS CAREFULLY BEFORE USING THIS WEBSITE.
Multiple regression analysis of fire deaths from burns
G. Williams-Leir
Division of Building Research, National Research Council, Ottawa, Canada
March 1967
SUMMARY
A multiple regression procedure for investigating relations between associated variables is described. Besides statistical criteria, prior knowledge and the simplicity and usefulness of the result are taken into account. The procedure is illustrated by an analysis of survival of burn victims in Ontario.
Multiple regression is a statistical technique used to seek relations between several variables in the presence of errors. It is of value when the observations are subject to random variations, provided there are enough of them, and when the subject matter is not amenable to more direct approach from experiment or theory.
Various texts (1 to 3) explain multiple linear regression in the detail needed and show how to determine the constants $c$ in:
$$y = c_0 + c_1 x_1 + c_2 x_2 + \dots + c_k x_k$$
where $y$ is the dependent variable and the independent variables are $x$. Starting from $n$ observations of $x_1, x_2, \dots, x_k, y$, the cross products must each be summed and a matrix built up from their sums. Inversion of the matrix gives the coefficients $c$ and also conveniently permits calculation of a significance index, "Student's" $t$, for each coefficient.
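As an illustration only, the computation just described might be sketched in Python as follows. The function names `invert` and `fit`, the list-of-lists data layout and the singularity tolerance are assumptions made for this sketch; they are not taken from any of the programs cited in this note.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with row pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:      # assumed tolerance
            raise ValueError("cross-product matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fit(rows, y):
    """Least-squares fit of y = c0 + c1*x1 + ... + ck*xk.

    rows: one list of x values per observation; y: the observed responses.
    Returns (coefficients, t_values, residual_mean_square).
    """
    X = [[1.0] + list(r) for r in rows]       # leading 1 carries the constant c0
    n, p = len(X), len(X[0])
    # Summed cross products of the independent variables, and with y.
    xtx = [[sum(obs[a] * obs[b] for obs in X) for b in range(p)] for a in range(p)]
    xty = [sum(obs[a] * yi for obs, yi in zip(X, y)) for a in range(p)]
    inv = invert(xtx)
    coef = [sum(inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    resid = [yi - sum(c * x for c, x in zip(coef, obs)) for obs, yi in zip(X, y)]
    rms = sum(e * e for e in resid) / (n - p)  # residual mean square
    # "Student's" t: coefficient divided by its estimated standard error.
    t = [coef[a] / math.sqrt(rms * inv[a][a]) for a in range(p)]
    return coef, t, rms
```

The sketch needs more observations than coefficients and a residual mean square greater than zero; an exact fit would make the $t$ values undefined.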
Many programs have been described for carrying out this or similar procedures, e.g. by IBM (4). When this program is used and if, as is probable, one or more of the coefficients is shown to be not significant, manual intervention is necessary before a new run can be made with a reduced number of variables; in this run the program will start again from the beginning.
[...] with the least significant coefficient to be picked out for rejection by the program. The cross-product matrix need not be reconstructed; one column and one row are dropped, and new estimates of the coefficients $c$ and of $t$ are made as before. Another way of dropping or adding variables, devised by Cochran (5), is described by Kendall (6).
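A minimal sketch of this automatic rejection, written in Python, is given here for illustration. It assumes that the sums of cross products have already been formed once; each pass only selects rows and columns of that matrix. The names `cross_products` and `backward_eliminate` and the threshold `t_min = 2.0` are hypothetical choices for the sketch, not values from the programs cited.

```python
import math

def invert(m):
    """Gauss-Jordan inverse of a small square matrix with row pivoting."""
    n = len(m)
    a = [list(r) + [1.0 if i == j else 0.0 for j in range(n)] for i, r in enumerate(m)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        scale = a[c][c]
        a[c] = [v / scale for v in a[c]]
        for r in range(n):
            if r != c:
                f = a[r][c]
                a[r] = [x - f * z for x, z in zip(a[r], a[c])]
    return [r[n:] for r in a]

def cross_products(X, y):
    """Sum the cross products once; X rows must start with 1.0 for the constant."""
    p = len(X[0])
    xtx = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(p)]
    yty = sum(yi * yi for yi in y)
    return xtx, xty, yty

def backward_eliminate(xtx, xty, yty, n, t_min=2.0):
    """Drop the least significant term until every remaining |t| >= t_min.

    Column 0 (the constant) is always kept. Returns (kept_columns, coef, t).
    """
    keep = list(range(len(xty)))
    while True:
        # Dropping a variable means dropping one row and one column; no re-summing.
        sub = [[xtx[a][b] for b in keep] for a in keep]
        rhs = [xty[a] for a in keep]
        inv = invert(sub)
        m = len(keep)
        coef = [sum(inv[i][j] * rhs[j] for j in range(m)) for i in range(m)]
        rss = yty - sum(c * r for c, r in zip(coef, rhs))   # residual sum of squares
        rms = rss / (n - m)
        t = [coef[i] / math.sqrt(rms * inv[i][i]) for i in range(m)]
        candidates = [(abs(t[i]), keep[i]) for i in range(1, m)]
        if not candidates or min(candidates)[0] >= t_min:
            return keep, coef, t
        keep.remove(min(candidates)[1])
```

Forming the residual sum of squares from the cross products avoids a second pass through the data, at the cost of some loss of precision when the fit is very close.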
This process can be repeated until only the most significant variable survives. During the process, one or more regression equations will probably be found that satisfy these conditions: (a) every coefficient is significant; (b) each set of coefficients provides the best accommodation between the relevant equation and the data. According to strict regression theory, the "best" equation is that for which the residual mean square is the least.
The procedure described, however, will not necessarily find the best sub-sets of the variables introduced. Each variable is discarded on the grounds of its weak contribution to the equation then under consideration. Each discard may raise or lower the significance of the other variables. After one or more subsequent discards, a sub-set may be reached within which some of the earlier discards, if restored, could perhaps contribute significantly.
This defect may be partly remedied by a more elaborate scheme, namely "stepwise" multiple regression. The programming of this has been described by Efroymson (7). This program at each step either discards variables below a preset significance level, or, when all variables in the regression attain this level, reviews all variables outside the regression to see whether any qualify for admission. Eventually an equilibrium sub-set of variables is attained. Garside, however, points out (8) that there may be more than one such equilibrium sub-set at a given significance level, and concludes that the only way to find the best sub-set is to try all possible sub-sets; he proposes an efficient modification of the Efroymson method to achieve this.
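The idea of trying every sub-set can be sketched in a few lines of Python. This is a plain exhaustive search, not Garside's efficient modification, and the callable `residual_mean_square` is a hypothetical stand-in for whatever routine fits the regression on a given sub-set.

```python
from itertools import combinations

def best_subset(variables, residual_mean_square):
    """Try every non-empty sub-set of the variables.

    residual_mean_square: callable taking a tuple of variable names and
    returning the residual mean square of the regression on that sub-set.
    Returns (best_rms, best_subset, number_of_subsets_tried).
    """
    best = None
    tried = 0
    for size in range(1, len(variables) + 1):
        for subset in combinations(variables, size):
            tried += 1
            rms = residual_mean_square(subset)
            # "Best" here means least residual mean square.
            if best is None or rms < best[0]:
                best = (rms, subset)
    return best[0], best[1], tried
```

For $k$ variables the loop visits $2^k - 1$ sub-sets, which is what limits this approach to short lists.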
If there were only a short list of primary variables, fifteen, for example, and these were to be considered only linearly, this method would be feasible; there would be $2^k - 1$ equations to test $k$ variables, although it is unlikely that the best sub-set would provide a much better fit than the result of the Efroymson procedure.
If, however, non-linear forms or "independent" variables that are functions of more than one of the primary variables (necessary if interactions are to be investigated) are under consideration, a much wider field opens up. For instance, if linear, [...] of variables to $3k$; if first order interactions are to be included, the total becomes $\frac{3k}{2}(3k - 1)$, so that for only four primary variables the number of equations to be tested is of the order [...] and the effort would require a prohibitive length of time on any machine.
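The growth in the number of equations can be checked with a short Python sketch. It rests on one stated assumption: the expression $\frac{3k}{2}(3k - 1)$ is taken at face value as the number of candidate terms, and every non-empty sub-set of those terms counts as one equation. The function names are hypothetical.

```python
def subset_count(m):
    """Number of non-empty sub-sets that can be formed from m candidate terms."""
    return 2 ** m - 1

def pair_count(m):
    """Number of distinct pairs among m terms, m(m - 1)/2."""
    return m * (m - 1) // 2

def tabulate(ks, forms_per_variable=3):
    """Rows of (k, candidate terms, sub-sets) under the assumption stated above."""
    rows = []
    for k in ks:
        terms = pair_count(forms_per_variable * k)   # (3k/2)(3k - 1) when forms = 3
        rows.append((k, terms, subset_count(terms)))
    return rows

print(subset_count(15))          # linear terms only, fifteen variables: 32767
for row in tabulate([1, 2, 3, 4]):
    print(row)
```

On this reading, four primary variables give 66 candidate terms and roughly $7 \times 10^{19}$ sub-sets, against 32767 for fifteen variables taken linearly.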
[Table: k; Variables; Sub-sets]
These circumstances render it necessary to use whatever prior knowledge may be available to narrow the list of functions and interactions. To this extent, the analysis must be subjective and great effort to reduce the residual mean square is not justified.
Healy (9) comments: "when a particular selection of, say, eight x's is needed for a strict minimum of (residual mean square), there will usually be a selection of three or four -- maybe several such -- for which the (minimum) is only slightly increased. . . ." He goes on to suggest that "more useful still will be an approach in which the selections . . . are at least to some extent under the user's control."
The approach found effective in this study was to choose functions of the primary variables that satisfy common sense requirements, and to select rationally the interactions to be investigated. The resulting equations, with all coefficients significant, were next examined for their consistency with common sense. It may be possible to repair any deficiencies in this respect by choosing a short list from the discarded variables, always preferring simple to complicated terms, by adding them to the survivors, and rerunning the program. Terms of intermediate importance, discarded early in the first run, may be retained in the second.
CHOOSING FUNCTIONS OF THE PRIMARY VARIABLES
To clarify the allusions to "prior knowledge" and "common sense" that have been made above, the problem will be reviewed. Multiple linear regression gives the constants $c$ in the equation
$$y = c_0 + c_1 x_1 + c_2 x_2 + \dots + c_k x_k$$
A simple extension of the procedure will determine the constants $c$ in the non-linear form:
$$f_0(y) = c_0 + c_1 f_1(x_1) + c_2 f_2(x_2) + \dots + c_k f_k(x_k)$$
The arbitrary functions $f$ evidently demand some care and restraint in their selection.
Extrapolation of a regression equation is always speculative; but even within the range of the observations its accuracy as a description of the covariation will depend upon how successfully functions have been found to match the tendencies of the data.
If the effect of the independent variables seems to be additive then $f_0(y) = y$. If it appears to be multiplicative, then $f_0(y) = \log y$. In this case, the most obvious alternatives for the functions of the $x$'s are either $f(x) = x$, appropriate where equal increments of $x$ appear to alter $y$ by a constant ratio, or $f(x_1) = \log x_1$, leading to the power law:
$$y \propto x_1^{c_1}$$
These two forms may be mixed as required.
A polynomial relation may be attempted by using $f_1(x) = x$, $f_2(x) = x^2$, and so on. If $x$ occurs at no more than $n$ levels, the series should not go higher than $x^{n-1}$.
Where prior knowledge restricts the relation, it may be taken into account. If, for instance, a power-law curve must pass through $x_1 = a$, $y = b$, regardless of what values the other variables take, then terms $f_0(y) = \log (y - b)$, $f_1(x_1) = \log (x_1 - a)$ will meet the case, provided none of the observations lies too close to this locus.
It is important to avoid introducing terms that are equivalent in effect to terms already used; for instance, when the list contains $\log x_1$ and $\log x_2$, it would be superfluous to introduce $\log (x_1 / x_2)$.
It can be shown, subject to the assumptions of regression analysis, that an apparent relation with one of the functions, and thus with the primary variable, is unlikely to be due to chance; but so long as the relation is inexact it can never be said that no function could represent the relation better.
If there is reason to search for a function of a given primary variable that is as significant as possible, it is valid to add two or more functions with the coefficients given them by previous runs of the program.
These methods cannot prove that a variable does not have a significant effect, since all possible functions of that variable cannot be tried. One can only report that a function of the variable that had a significant coefficient was not found, and list those tried.
The functions to be used need not be known in advance when the data cards are prepared. The cards must supply the primary independent variables and the dependent one, but (in systems where the program is compiled afresh for each run) the program loop that reads the data cards can also prepare any functions or interaction terms that may be needed or choose a different dependent variable; this part of the program can readily be changed from run to run.
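In a present-day language the same arrangement might look like the following Python sketch, in which the table of terms is the only part that changes from run to run. The record layout, the names `TERMS` and `prepare`, and the particular terms shown are all hypothetical.

```python
import math

# Hypothetical table of terms for one run: name -> function of one raw record.
# Editing this table is the counterpart of changing the read loop between runs.
TERMS = {
    "x1": lambda r: r["x1"],
    "log_x2": lambda r: math.log(r["x2"]),
    "x1_log_x2": lambda r: r["x1"] * math.log(r["x2"]),   # an interaction term
}

def prepare(records, terms, dependent):
    """Build the regression inputs from raw records.

    records: list of dicts holding the primary variables as punched.
    terms: mapping of term name to a function of one record.
    dependent: function of one record giving the dependent variable.
    Returns (rows, responses), one row of prepared terms per record.
    """
    rows = [[f(r) for f in terms.values()] for r in records]
    responses = [dependent(r) for r in records]
    return rows, responses

# Usage: the same raw data, with log y chosen as the dependent variable.
records = [{"x1": 2.0, "x2": 1.0, "y": 10.0}, {"x1": 3.0, "x2": 2.0, "y": 40.0}]
rows, responses = prepare(records, TERMS, lambda r: math.log(r["y"]))
```

Keeping the raw records unchanged and deriving every term at read time means a different dependent variable or a new interaction costs one line in the table.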
SURVIVAL AFTER BURNS
This discussion will be illustrated by an analysis of the period of survival of people who have been fatally burned, usually by clothing fires. Every fatal burning accident in Ontario from 1954 to 1963 inclusive for which all the data were available was included in the analysis. These were 112 male victims and 87 female victims; information was missing in a roughly equal number of cases.
The independent variables are the age and sex of the victim, the area of the burn and the date of the accident. The dependent variable is the time interval from the accident until the victim's death.
The notation used will be as follows:
T = period of survival, days
Y = year of accident (1950 = 0)
W = elapsed fraction of calendar year (e.g. 1 April = 0.25)
S = sex of victim (M = 0, F = 1)
R = area of burn, fraction of body surface (i.e. per cent/100)
G = age of victim, years/100
Two successive equal increases in burn severity would clearly not produce equal reductions in period of survival, so that $f_0(T) = \log T$. Survival for less than a day was always treated as half a day.
The effect of the accident date may be examined in two ways: one can look for a seasonal effect by introducing:
$$f_1(W) = \sin (2\pi W + P)$$
where $W$ is the elapsed fraction of a calendar year and $P$ is a phase constant, determined by trial; $\pi/6$ is a likely value. One can also look for secular trends in the effectiveness of medical care, together with any other ambient factor influencing survival. Since this effect, if present at all, should be a small and gradual one, $T$ may be treated as exponentially changing with $Y$; and there seems little point in looking for any interaction of other variables with this gradual effect.
In general, women survive longer than men; therefore, either separate analyses must be conducted for males and females, or $S$ can be treated as a variable, in which case its interactions with all the other variables must (at least in principle) be examined. Only one term is needed to represent $S$ in the equation because this variable can take only two values.
When the effect of burn area $R$ is examined, prior knowledge gives two boundary conditions. Burns of zero area do not shorten life, and for the present purpose $R$ zero must [...] scale; victims of 100 per cent burns do not always die instantly, though their chance of surviving one day is small. This situation may be represented sufficiently well by requiring $T = 0$ at some hypothetical value $R_a$ of $R$; perhaps $R_a$ might be 110 per cent.
It is simple to introduce terms to represent these conditions:
$$B = \log R$$
[...]
A possible, but inferior, alternative would be to combine these effects arbitrarily in a single term such as:
[...]
This would considerably reduce the number of interaction terms, and would allow the regression less freedom to accommodate to the data.
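The transformed terms discussed in this section can be sketched in Python as follows. Several points are assumptions made only for the sketch: natural logarithms are used, $R_a$ is set to 1.10 and $P$ to $\pi/6$ as the text suggests they might be, and the upper boundary term is written as $D = \log (R_a - R)$, a form chosen here simply because it diverges as $R$ approaches $R_a$. The function name `burn_terms` is hypothetical.

```python
import math

R_A = 1.10          # assumed upper limit R_a (110 per cent of body surface)
P = math.pi / 6     # assumed trial value of the phase constant

def burn_terms(T, W, R):
    """Prepare transformed terms for one victim.

    T: survival in days; W: elapsed fraction of the calendar year;
    R: burn area as a fraction of body surface, with 0 < R < R_A.
    """
    days = T if T >= 1 else 0.5            # under a day is treated as half a day
    return {
        "log_T": math.log(days),
        "seasonal": math.sin(2 * math.pi * W + P),
        "B": math.log(R),                  # tends to minus infinity as R -> 0
        "D": math.log(R_A - R),            # ASSUMED form; diverges as R -> R_a
    }

# Usage: a victim with a 50 per cent burn on 1 April who died within a day.
terms = burn_terms(T=0.2, W=0.25, R=0.5)
```

With a negative coefficient on `B` the fitted survival grows without limit as the burn area goes to zero, and with a positive coefficient on `D` it falls to zero at $R_a$, which is the behaviour the two boundary conditions call for.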
There are no such absolute effects of age. Old people evidently survive a shorter time after burns than do young ones, but the gradient of the effect between children and young adults is open to question. It seems necessary to allow for a polynomial relation between $G$ and $\log T$. What degree this should have only the data can tell, but it seems unlikely that the accuracy is sufficient to determine more than two constants. Terms $c_1 G + c_2 G^2$ should be introduced into the equation, and all possible interactions looked for between $G$ or $G^2$, $B$ or $D$, and $S$.
The reasoning presented so far justifies experimentation with a regression equation of the form:
$$\log T = C_0 + C_1 f(Y) + C_2 \sin (2\pi W + P) + \dots + \text{interaction terms.}$$
Trials produced coefficients for $Y$ and $Y^2$ below the 5 per cent significance level, but they generally bore a fairly constant ratio to one another; the combination term $Y(1 - 0.075Y)$ was then tried and came close to the 5 per cent level. The indications are that, for the first half of the ten-year period of the survey, survival periods were fairly steady, but that they rose appreciably in the second half.
No significant seasonal effect has yet been detected.
The program discarded non-significant terms in the following order: $G$, $BG$, $DG$, $BSG^2$, $DSG^2$, $SG^2$, $SGD$, $SD$, $Y(1 - 0.075Y)$, $BSG$, $SG$, $B$. The last six regressions are given in Table I. The earlier regressions were rejected from prior knowledge, since they included positive coefficients for the variable $B$; this would imply that males below a certain age survived longer after, for example, a 20 per cent burn than after one of zero area. Perhaps this situation could be rationally justified, but such equations are misleading as a summary of the data.
There is some freedom of choice between the regressions listed in Table I. Only the last three have every coefficient significant at the 5 per cent level. The first two retain the variable $BG^2$, for which the coefficient is negative. Thus they alone indicate, correctly, that all victims should survive a zero burn indefinitely. For female victims only, the variable $SB$ gives a similar indication.
To illustrate these results, regression equations 2 and 4 from Table I have been numerically evaluated for a range of convenient values of the variables (see Tables II and III).
It is disappointing that even equation (1) accounts for less than 42 per cent of the variance in the data. An equation with 10 terms including $f(Y)$ would account for 44 per cent, but even 18 terms do not attain 45 per cent. Estimates of individual survival periods based on these equations may be said to have roughly a one in three chance of being out by a factor of three or more, up or down.
REFERENCES
1. Snedecor, G. W. Statistical methods. Iowa State College Press, Ames, 5th ed. 1956.
2. Crow, E. L., F. A. Davis and M. W. Maxfield. Statistics manual. Dover, New York, 1960.
3. Ostle, B. Statistics in research. Iowa State College Press, Ames, 1954.
4. International Business Machines Corporation. System/360
5. Cochran, W. G. Supplement, Journal of the Royal Statistical Society, Vol. 5, 1938, p. 171.
6. Kendall, M. G. Advanced theory of statistics. Griffin, London, 2nd ed. Vol. 2, p. 167.
7. Efroymson, M. A. Multiple regression analysis, edited by A. Ralston and H. S. Wilf, Wiley, New York, 1960, p. 191.
8. Garside, M. J. The best sub-set in multiple regression analysis. Journal of Applied Statistics, Vol. 14, 1965, p. 196.
9. Healy, M. J. R. Programming multiple regression. Computer
TABLE II. ESTIMATED SURVIVAL PERIOD FOR MALES (DAYS)
[Table: Area (per cent of body surface)]
TABLE III. ESTIMATED SURVIVAL PERIOD FOR FEMALES (DAYS)
[Table: Age (years); Area (per cent of body surface)]