R OBERT H AINING
Spatial modelling and the statistical analysis of spatial data in human geography
Mathématiques et sciences humaines, tome 99 (1987), p. 5-25
<http://www.numdam.org/item?id=MSH_1987__99__5_0>
© Centre d’analyse et de mathématiques sociales de l’EHESS, 1987, tous droits réservés.
L’accès aux archives de la revue « Mathématiques et sciences humaines » (http://
msh.revues.org/) implique l’accord avec les conditions générales d’utilisation (http://www.
numdam.org/conditions). Toute utilisation commerciale ou impression systématique est consti- tutive d’une infraction pénale. Toute copie ou impression de ce fichier doit conte- nir la présente mention de copyright.
Article numérisé dans le cadre du programme Numérisation de documents anciens mathématiques
http://www.numdam.org/
SPATIAL MODELLING AND THE STATISTICAL ANALYSIS OF SPATIAL DATA IN HUMAN GEOGRAPHY
Robert HAINING*
1. INTRODUCTION
Geographical
dataanalysis involves
theanalysis
of one or morevariables (i)
referenced
by
location(j)
and ortime (t).
Ageographical
observation(zi9illt) is sometimes, therefore, represented
as an element of a data1J
"cube"
the three axes ofwhich
reference thevariable set,
thelocation
set and the time set(Haggett (1981)). Any particular analysis
mayeither
be on a sub-block of the cube(for example multi-variate space-time
dataanalysis)
or, to use a
geological analogy,
a"thin section"
of the cubeparallel
to oneof the faces
(for example multi-variate time series analysis, univariate space-time
dataanalysis
ormulti-variate spatial analysis),
or a"transect"
of the cube
again parallel
to one of the faces(for example uni-variate time
seriesanalysis
or uni-variatespatial analysis).
Geographical
dataanalysis is
much concerned with theanalysis
of datain space and time and since events often possess
spatial
andtemporal
continuity,
orpersistence,
theapplication
ofstatistical
methods in humangeography
has torecognise
thatgeographical
data areunlikely
tosatisfy
theclassical assumption
ofindependence which under-pins
much standard statis-tical theory.
Itis
theproblems
created whenthis assumption
nolonger
holdswhich
are the focus ofthis
paper.Spatial
dataanalysis
and time series dataanalysis
have several fea-*
Department
ofGeography,
TheUniversity,
Sheffield.Paper prepared
for the InternationalCongress
on"Mathematics
for theSocial Sciences", Marseille,
France. June 22nd -27th,
1987.tures in common.
Space, like time, imposes
anordering
on the data set whichmust be
recognised
andretained
in anyanalysis.
It is notsufficient simply
to record the level of
unemployment
in aregion
over time -10% for six of themonths, 10,5%
for three of the months and so on. Thetemporal
order of valuescan tell us much about whether
unemployment
isshowing signs
ofincreasing
ordecreasing
and when taken with other information mayhelp
to indicatewhy
thechanges
aretaking place. Similarly
thearrangement properties
of values on a map mayprovide important
information on how the mappattern
hasarisen.
Spatial
data liketemporal
data also showdifferent scales of variation
whichneed to be
separated
out if anyexplanation is
to be made of totalvariation.
Finally, spatial
datalike temporal
data mayconsist
ofdifferent eomponents of variation
and adistinction is
often drawn betweendeterministic
or func-tional elements of a
series
and stochastic elementssince
these may be asso- ciated withdifferent types
ofgenerating
processes.But there are also differences between
spatial
andtemporal
data ana-lysis.
Time hasdirection -
thepast
may have influenced thepresent,
but thepresent
cannot have influenced thepast (except
in the sense that eventsmay be influenced
by expectations
of thefuture). Space
has no such natural direction and an event at onepoint
can diffuse in all directions toinfluence events elsewhere on the map. There are
special exceptions
tothis (the directionality implied by
thehierarchical ordering
of towns in a cen-tral
place system)
butthey
are not socommonly
encountered. Time is one-dimensional
whereas space is two-dimensional(again
with theexception
ofstudies of linear features such as lines of
communication)
so that thedepen- dency
structures inspatial
data arelikely
to be morecomplex
with thepossi- bility
of differentdependency patterns
indifferent
directions.Temporal
data are
usually
collected in terms of aregular partitioning (each month,
each year,
...)
whereasspatial
data areusually
recorded in terms of anirregular
arealpartitioning
which may confound and mask the contributions of different scales ofvariation. Finally,
whereas time appears as ahomo-
geneousmedium within
which events occur, space appears as aheterogeneous
medium and inter
point
andinter
areadistance relationships
may be measured in manydifferent
ways.This paper will focus on uni-variate and multi-variate
spatial
dataanalysis.
There has been aconsiderable growth
in interest in this areaamongst
statisticians and others in the last few years reflected in the recentpublication
of booksby Ripley (1981),
Cliff and Ord(1981)
andUpton
and
Fingleton (1985)
onspatial statistical
methods. Itis
myimpression
how-ever that these methods are still much less well known than those of
time series analysis. They
are also much lesscommonly used, perhaps
in part be-cause of the lack of
appropriate computer
software. The area ofspatial
sta-tistics
shares commonground
with some of the methods ofgeostatistics
ofwhich
the booksby
Matheron(1971)
and Journel &Huijbregts (1978)
areparti- cularly important. Whilst
the methods ofgeostatistics
havepenetrated
areasof
physical geography
and remotesensing they
are notwidely
used in humangeography.
Inthis
paper therefore I shall focus on the first set of methods for theanalysis
ofspatial data,
theproblems
that are encountered and pro-vide specific examples
of each of the main areas. In some of theexamples
thedata is of a
space-time
form.However,
it is oftenappropriate
toanalyse
such data as a series of
spatial "slices". This
is because thetime period
between successive observations is
sufficiently long
thatobservations
can be treated asindependent in time.
In the
following
sections four classes ofproblems will
be considered :(1)
how to makecomparisons
between two sets ofregional
data on asingle
variable
(such
asincome, mortality rates, etc.)
recorded either for twodifferent
areas or for the same area at twodifferent points in time; (2)
how to
establish
theexistence
of astatistical relationship
between dataon two or more
variables
collectedin
the same area at the sametime; (3)
how to estimate
missing
values in aspatial
record(often
encountered incensus
data); (4)
thespecification
andestimation
of"spatial process"
models.
In each of these
sections
anunderlying
issue is theproblem
ofstatis- tical inference
for data whereindependence
of observations does not hold and where thedependency
structure in the data has to be estimated and builtinto
the estimation and inference
procedures.
In thediscussion
that follows thereare certain
implicit assumptions
and we nowspecify
what these are.Variation
inspatial
data is assumed toarise
from threecomponents :
adeterministic
structuredelement,
astochastic structured
element and a local random elementor
noise.
Thefirst
two elements are oftenrepresented
in terms of thefirst
two moment
properties
of someprobability distribution,
thedeterministic
elementbeing equated
with the mean(not necessarily constant)
of thatprobability distribution.
Inaddition
animplicit
and untestableassumption
is that the variation in
geographical
data arises from one or more of thesethree elements and
represents
asingle
realisation of someprobability
model.This
implies
that eachobservation
is aspecific
value of a randomvariable
associated
with thegiven location.
Matheron(1971) suggested
the term"regionalized variable"
for the random variable itself.This "conceptualisa-
tion"
ofreality
in terms ofprobability
models has manyproblems
butis justified according
to Matheron(1971)
if it allows us"to
solveeffectively practical problems
which wouldotherwise
beunsolvable" (p.6).
We take theview that
this
is the case in theproblems
to be tackled inthis
paper. In the next section we describe twoapproaches
to the measurement ofspatial
variation
as an essentialpreliminary
toconsidering
the morespecific
pro- blemsidentified
above.2. REPRESENTATIONS OF SPATIAL DEPENDENCY
Spatial
statisticalanalysis
ingeography
deals with finiteregions (D) .
There are two
approaches
to themodelling
ofspatial
processes onfinite
spaces :(1)
consider the process as therestriction
to D of astationary
process defined on a
larger region; (2)
consider the process as defined onD
with
border orboundary
values set(for example)
to the mean of the pro-cess. In the second case
(unlike
thefirst)
the process is notstationary.
So,
suppose V ={v..}
denotes thematrix
of autocovariances. In the case- ij
of processes defined
according
to(1) ,
and alsoisotropic
where 02 is a scalar constant and
di, J
ij is the distance betweenregions
iand j .
In the case of processes definedaccording
to(2)
where
C(i,j) signifies
that the autocovariancedepends
on thelocations
ofi
and j .
If the first
approach is adopted (usually
when thestudy
area containsa
large
number of observations or is a subset of alarger
area as oftenhappens with remotely
senseddata)
then itis
usual to describe thedepen- dency properties
of a set of data in terms of the correlations orcovariances
of theprobability distribution.
If the secondapproach
isadopted (usually
when there are few observations and the
study
area is aclearly delimited
areal
unit
such as an island sub-divided into counties or acity
sub-divided into wards or censustracks)
then it will not bepossible
toestimate
co-variances
so thatdependency properties
areusually described
in terms of therelationships
between the randomvariables.
We consider each of these twodescriptive approaches
now.Suppose
theunderlying probability
model is second orderstationary (for definitions
see forexample
Journel andHuijbregts (1978)).
IfZ(x)
denotes the random
variable
atlocation x = (x,,,x2)
and E denotes mathe-matical expectation
thenwhere m
is
a constant. Furthermore for eachpair
ofvariables IZ(x),Z(x+h)}
the
spatial
covariance exists anddepends only
on theseparation distance h ,
that isOf course the mean may not be constant in which case m
is replaced by
afunction m(x)
which in humangeography
isusually
some order of trend surface model(see Chorley
andHaggett (1965)).
When h = o thenC(o)
isthe
variance
ofZ(x) .
C(h)
isusually
estimated for variousdistances ( h > 0 )
ordistance bands
by :
where z denotes an
observation
on Z and the sumis
taken over all then(h) pairs
that areseparated by
distance h .(Ripley (1981) p.80)
notesthat the estimator is unbiased for small distances but that the
sampling
variance
depends
onC(h)
and thatneighbouring
values of thecorrelogram
are
"substatially correlated"). Since,
ingeneral
the mean is neither aconstant nor known it must be
estimated
from the data(usually
apolynomial equation
where the order must also bedetermined)
and aniterative procedure
is
usually
recommended in which the mean is estimated then the set of co-variances
which are in turn used toprovide
animproved
estimate of the meanand so on until convergence. Further
complications
arise ifdirectional
co-variances
arerequired (Ripley 1981). Finally,
the covariancefunction
is oftenstandardized
to aspatial
correlation functionby dividing through by C(0) .
Usually spatial
correlation or covariancefunctions
are used toprovide
formal
descriptions
of thepatterns
of association between observations sepa- ratedby varying
distances. Sibert(1975) computed directional spatial
corre-lations
to describespatial variations
in assessed land values in AnnArbor, Michigan
as apreliminary step
todeveloping
apredictive
model of land values.In other
instances
thecorrelation function
has been used totry
toidentify
characteristics
of theunderlying generating
process as in astudy by Cliff
et al
(1975)
on thediffusion
of measlesepidemics
in S.W.England.
Peaksand
troughs
in anaveraged spatial
correlationfunction
wereinterpreted
assuggesting
a"central place plus
wavediffusion" pattern
ofspread. Haining (1981) using
ruralpopulation density
data for the American Mid Westestimated correlations
and fittedspecific
models to these functions in order to test acentre-satellite
model ofpopulation dispersal.
Spectral analysis,
the fourier transform of thespatial
covariance func-tion,
has also been used todescribe spatial variation.
Tobler(1969) analysed population density along
U.S.highway
40using spectral
methods in order totest whether central
place competitiveness
wasimportant
instructuring
thespatial organization
ofpopulation. By
andlarge
howeverspectral
methodshave not been
widely
used in humangeography mainly
because of the need forlarge sample
sizes and theirregular spatial distribution
of much two dimen-sional geographical data, although
if these two conditions are not aproblem
then this form of
analysis
has moresatisfactory asymptotic sampling theory
than that based on covariances
(Ripley 1981,
p.80).
We now turn to the second
approach,
thatis
therepresentation
of de-pendency
in terms of variaterelationships.
Inthis approach
attention focusesdirectly
on the randomvariables
themselves. AsRipley (1981)
shows there are twodistinct approaches
to theformulation
of variate interaction schemes - thejoint (or simultaneous) approach
and the conditionalapproach.
In theformer the
dependency
structure is modelled as a set ofsimultaneous equations
in
which Z(x.)
isexpressed
as alinear
function of some subset of the other-i
(usually neighbouring)
randomvariables.
Forexample in
the case of a rectan-gular
latticesystem
where thesites
are referencedby (i,j)
on the twoorthogonal
axes :where p is a
parameter
thatprovides
a measure of thestrength
of variateinteraction
ande(i,j) is
some random ornoise
process. Inpractice
modelsof this
type
arespecified by first imposing
agraph
structure on thesystem
of sites. Thisspecifies
which randomvariables
interact with one another. In the case ofirregular
sitedistributions
thisis usually expressed by
a con-tiguity
orweighting matrix
W ={Wk,t}
whereHere k and t denote two sites and non zero values
in W
may bebinary (to
denoteinteraction / no interaction)
orweighted
in order toreflect,
forexample,
how closetogether
the two sites are. Thechoice
ofweighting
schemeis
clearly
animportant
one and thereis
muchdiscussion
in theliterature
on how tospecify
W . In mostapplications
theweighting
matrix W isspeci-
fied
in
advance to reflectrelational properties
between thesites.
This isparticularly important
in the case of the non lattice datatypical
of geo-graphical
data sets. Sometimes alternativeweighting
schemes are tried tosee how
sensitive
results are to the choice of W .Recommendations
totry
to
estimate W
as if it were a set ofparameters
havegenerally
not beenfollowed up in
part
because of thecomputational difficulty
of such a pro- cedure.Discussion
of these issues can be found inBesag (1975)
and Cliffand Ord
(1981)
andUpton
andFingleton (1985).
There are subtle and
important differences
between the conditional andjoint approaches (see Besag 1974)
and for the mostpart
it issimultaneous
schemes of the sort describedby (2.1)
that rave been most often used in humangeography.
Itis important
torealise
that all variate schemesgenerate
spe- cific theoreticalspatial
correlations so that the set ofempirical
corre-lations are sometimes used for model
specification
that is forchoosing
avariable interaction
scheme.However,
for anyfinite lattice,
models such as(2.1)
will not bestationary
if sites on theedge
of the lattice haveonly
2
(corner)
and 3(edge) neighbours. This
caneasily
be shownby evaluating
[(! - P!i) T (! - P!i)] -1 which is
theautocovariance matrix
for(2.1) if
thee(i,j)
processis N(0,1) .
Therefore these models tie in with the secondapproach
todefining spatial
processes on finitedomains although
for suffi-ciently large lattices
these models are often treated asstationary
(Whittle 1954).
Variate
spatial
interaction models have been used todescribe spatial
patterns
in humangeography (Sibert 1975).
Some models in humangeography
are
expressed
in terms of univariatespatial relationships
such as theBechmann-McPherson model of central
place population
sizes and are thereforeappropriately
estimatedusing
theestimation theory
for these models.Haining
(1980) provides
anexample
in this area. These schemeswill figure
moreprominently
in section 5 below.So far in
describing
the use of correlation functions andvariate
inter- action models we have assumed thatspatial dependency
is present in anygiven
set of data. To assume that such
dependency
may bepresent
is asensible
pre- caution inspatial analysis
but as will be evident in latersections,
beforeadopting
the sorts ofprocedures
and remedial action this condition necessi-tates,
it is also sensible to test at somestage
whetherspatial correlation is present in
the data(statistically significant).
In fact this was one ofthe
first problems
to be tackled and two booksby
Cliff and Ord(1973, 1981)
describe the
principal procedures
that areavailable.
The mostcommonly
usedprocedure -
the Moran test - isclosely
related to anestimator
forfirst
order
spatial correlation. Upton
andFingleton (1985
Ch.3)
have reviewedthe range of test statistics and have
attempted
to evaluatetheir effective-
ness
(Fig. 3.15).
Tests are also availablein
the context of thevariate
interaction
modelapproach.
Forexample
a test ofhypothesis
on p in(2.1)
is
equivalent
to a test ofspatial
correlationsince
if p is not found to bestatistically significant this
reduces the Z process to a random orwhite noise process. In
general
however these sorts of tests are more diffi- cult toapply
than thesimpler correlation -
based tests. There areimportant
distributional
differences between testsapplied
to raw data and thoseapplied
toregression
residuals and these have beendiscussed
atlength
inCliff and Ord
(1981).
This concludes the survey of methods for
describing spatial dependency.
Although
measures such as thecorrelation
series can be used toyield
subs-tantive
insights
the next twosections
are concerned with theapplication
ofstatistical methods where
interest
focuses on other data attributes rather than on the correlationproperties
of the data per se. Theproblem
is todevelop statistically
validprocedures given
theexistence
ofspatial depen- dency.
We return to thequestion
ofmodelling spatial dependency
in the finalsection.
3. UNI-VARIATE DATA ANALYSIS WITH SPATIALLY CORRELATED DATA
In this section we
consider
theproblem
ofmaking comparisons
between twosets of
regional
data withrespect
to asingle variable.
Aspecific problem might
be toidentify
whether there is asignificant
difference in the popu-lation mean value of a
given variable
between the two sets of data. We assumeinitially
a constant mean butconsider
indetail
trend surfaceanalysis
wherethe mean
is
notspatially
constant.Throughout
we assumeC(h) >
0 for allh which
is
the mostcommonly
encountered situation in humangeography.
If n observations
(z ( 1 ) , ... , z (n) )
are drawn from a set ofindepend-
ent and
identically distributed
random variables with mean p and variancefor
sufficiently large
n , isdistributed
N(~,02 z / n) .
This fundamental resultsunderpins
standardstatistical
infer-ence on means and
comparisons
between means.On the other hand if the random variables are
spatially
correlated thefollowing problems arise :
(i)
The truevariance
of zis greater
thanc2
z/n
even in thosesituations
whereg2 z
is known apriori.
(ii)
Inthose situations
where U2 must beestimated
from the data thez
classical
estimatoris biased downwards.
A fuller
discussion
of these and otherproblems
aregiven
inHaining (1987 (c)
and(d))
however the mainproblem
is thatby underestimating
the truesampling
variance of zstatistically significant
differences may be infer- red which are not valid at the chosen level ofsignificance.
One
approach
to thisproblem
is to select a model both for the popu-lation
mean(u)
and thespatial
correlationproperties
of the data. Weillustrate
the methodby
reference to aspecific problem
where the mean wasnot constant but was,
instead, described by
a trend surface.Haining (1978)
undertook astudy
ofspatial
cropyield variation
froman area of the American
High
Plainsusing
data for over 40counties
from1879 to 1969. Data were
available
every 5 or 10 years. The area chosen wasmarginal
in thatprecipitation
declinedsignificantly
from east to west andtemperature
increased from north to south. There were also soilgradients
which would be
expected
to lead to lower wheat and cornyields
in the west.The
analysis sought
to show whether thesegradients
were reflectedin
cropyield gradients
and whether theintroduction
of newfarming
methods(such
asdry farming
in theearly 1900’s)
andimproved
strains(especially
after1945)
and other humanimpacts
would reflect in agradual weakening
of geo-graphical
cropyield
trends overtime.
The model
specified
was :where
Z(i,j)
denotes the cropyield (for
wheat orcorn)
incounty (i,j)
and
X1
is the North-South axis andX2
is the East-Westaxis.
The termsS 0
31~1 ’
, andQ2
are the trend surface coefficients. Thee(i,j)
expres- sion is a model forspatial correlation with u(i,j)
a random orwhite
noise
term. The termN(i,j)
refers to the set ofneighbours
of the(i,j)
county,
and f denotes aweighting
ofe(k) .
Inthis
case the inclusionof the
spatial
correlationexpression
wasjustified
onempirical
and theo-retical
grounds.
The method ofestimation
is that describedby
Ord(1975)
which is a modified
version
of the Cochrane-Orcuttprocedure
used in timeseries
analysis.
The correlation element of the model was not
always significant
butthe
results, particularly
for the cornyield
data showedstrong
decreases inyield
both from east to west and north to south in theearly
years(1879- 1924)
thereafter trends were nolonger significant except
for certain iso- latedyears(notably
the severedrought
year of1934).
The central
problem
inapplying
standard statisticaltheory
to datawhich is not
independent
is that the amount of"information" (in
the statis- ticalsense)
available forparameter
estimation is less that thesample
size.Spatial dependence
in dataimplies
that some of the information carriedby
anobservation is
(partially
atleast) duplicated
in other observations(in particular
theneighbouring observations). Thus,
the effectivesample size
(call
thisn’ )
is oftensubstantially
less than the actualsample
size(n) by
an amount thatdepends
on the level ofspatial dependence present
in the data. It is thisproblem,
the need to accomodate thisdependency properly
and allow for it in
constructing sampling
variances which lies at the heart of the inferentialproblem
described in thissection.
Further discussion of these issuestogether
with other references isgiven
inHaining (1978a).
There are certain
problem
areas however where the issue of "informa- tionloss"
can be turned toadvantage.
If each observation carries informa- tion about other observations then if a census record forexample
is incom-plete
then the known data may be used to construct estimates of themissing
data.
(The
sameprincipal
underliesapproaches
toimage
reconstruction withremotely
sensed data(see Besag 1986)
and mapinterpolation
ingeostatistics
(Matheron 1971)).
Bennett etal(1987)
havesuggested
a maximumlikehood
pro- cedure forinterpolating missing
records in census data which is based onfinding
a model for the observed data on thegiven variable (usually
someorder of trend surface model
plus
correlated error model as in(3.1))
andthen
using
this toprovide estimates
of themissing
values. Theprocedure
isiterative
in thatparameter
estimates are re-estimated on the basis ofesti-
mates of the
missing
values at theprevious
round. There arebenefits
toestimating missing
valuesusing only
the variableitself
asopposed
to othervariables with which the
specified variable
is correlatedsince
the latterapproach
maycompromise
other sorts of(multi-variate) statistical analyses
the researcher may
wish
to carry out on the census data. In some cases it isan estimate of the
missing
value(together
with some measure of thelikely error)
that isrequired.
In other cases the aim is toanalyse relationships
between variables and
discarding
census tracts where one or more variables havemissing
values would lead to aconsiderable
waste of data. In the second casemissing
value estimates areonly required
in as much asthey
facilitate the
primary
researchobjective.
Theprocedure
that has been deve-loped
isappropriate
to bothtypes
of researchproblems.
4. MULTI VARIATE DATA ANALYSIS WITH SPATIALLY CORRELATED DATA
Correlation and
regression
are the mostcommonly utilised
statisticaltechniques for identifying relationships
between variables in human geogra-phy.
Bothtechniques
must bemodified
in theirapplication
if the geogra-phical
data arespatially
correlated.The Pearson
product
moment correlationcoefficient,
r , is a measureof
association
for twogaussian
variables( Y , Z )
whereIn the case of uncorrelated
( r = 0 ) gaussian
variables where ob-servations are
independent
has
Student’s
t distribution with n-2degrees
of freedom. Bivand(1980)
showed that if Y and Z are
spatially
correlated the effects on the sam-pling distribution
could be severe withserious underestimation
of the realtype
I errors.Clifford
and Richardson(1985)
havesuggested adjusting
the
statistic (4.1) by replacing
n with n’(the
effectivesample size)
where n’
(which
isgenerally less
than n if the data arecorrelated) is dependent
on thespatial correlation
in the two sets of data. If nearestneighbour
correlationvaries
from 0.2 to 0.8 in bothvariables, failure
to make the
suggested adjustment
results intype
I errors for a 5% testranging
from 8.2% to 52% . When theadjustment is
madetype
I errors arerestricted
to the interval 3.2% to 7% .(See
also Richardson and Hemon(1981, 1982)).
The correlation coefficient measures the
relationship
between two varia-bles without
taking explicit
account of the actualpositions
of the observa- tions(and
the Clifford and Richardsonadjustment
isdesigned only
toadjust
critical
values for r for theinformation
loss thatspatial dependency
intro-duces). Tjostheim (1978)
on the other handdeveloped
an index for ranked data that measures thedegree
ofspatial
association between twovariables.
The index computes the distance between eachpair
ofidentically
ranked observa- tions on the two variables and thusspecifically
takes account of thephysical position
of thereported
data. InTjostheims original
paper his moment anddis-
tribution
theory
for the statistic assumed that neither of the twovariables
was autocorrelated
which, given
theargument underlying
this paper,suggests
that the statistic asoriginally developed
is rather limited inapplicability.
There
have, however,
been studies of the behaviour of thestatistic
in thecase of autocorrelated data
(Glick 1982).
Note that this statistic does pro- vide additional information to thatprovided by
a correlation coefficient and Hubert andGolledge (1982) provide
a niceillustration.
In
regression modelling
classical Gauss-Markovtheory requires
that theerrors
(e(i))
are such thatIf the errors are
spatially
correlated(as evidenced,
forexample by applying
one of the standard tests for
residual
autocorrelation - see Cliff & Ord(1981))
thenassumption (4.3)
does not hold and theprincipal
consequencesare that
goodness
of fit(r2)
measures are inflated(as
shownabove)
and the true standard errors of theslope coefficients
are alsoinflated
so that there is the risk that variables will be retained in the final model whichare not in fact
statistically significant
at the chosen level.Hepple (1976) gives
anexample
of thisproblem arising.
Hisstudy
rela-ted to the effects of sales taxes and
transport charges
on new cars on the ave-rage second hand value of cars in the 49 states of the U.S.A. The
independent
variable
was new carprice differentials (between states) attribuable
to sales tax andtransport
cost differences. Theparameter
estimate for the in-dependent
variable was 0.686 with a t value of 3.52 which washighly signi-
ficant.
However autocorrelation wasidentified
in theresiduals
and a model for the errors asspecified by (3.1)
wasintroduced
into theregression
equa-tion.
The estimate of theparameter
p was 0.816(which is significant)
andthe t value on the
slope ocefficient
of theindependent variable
fell to0.099 which is no
longer significant. Hepple (1979)
hassince given
aBayesian analysis
of the sameregression problem.
What is evident from thisproblem
and also stressedby
Miron(1984)
is that the detection of autocorre-lation
is oftenindicative
of amisspecified
model. Statistical models may be used to allow for thismisspecification
but sometimes at theprice (as
in thecase of the
Hepple example)
ofhaving
noexplanation
at all of the observedvariation.
In manyinstances
it may therefore bepreferable
tovisually
inspect
thepattern
of residuals in order totry
toidentify
newvariables
that
might
beresponsible
for the observedresidual
correlation as in thecase of a
study by Haining (1981)
of urbanpopulation
variation in S.W.Wisconsin in which variations in non-service sector
employment
weresuggested
as
possible
causes of theresidual
correlationpattern.
Sometimes residuals of onesign
may clusterstrongly
in one area of the mapagain indicating possible variables
to include in anaugmented regression
model as in astudy
of income variation
by Haining (1987b).
Recently
Mardia and Marshall(1985)
haveplaced
the statisticalanalysis
of these sorts of
spatial problems
on a morerigorous
foundationderiving asymptotic
results for thesampling distribution
ofmaximum
likelihood esti-mators in the case of
gaussian variables. They
consider numericalalgorithms
and discuss
procedures
forspecifying
andestimating
theparameters
of models for the correlated error structure. An earlier paperby Ord (1975)
is alsoof
importance
in thiscontext,
howeverrelatively
littleis still
known about the smallsample properties
of these estimators and inferentialprocedures.
5. SPATIAL STATISTICS AND SPATIAL PROCESSES
The term
"spatial process"
is anambiguous
one but is used here to refer to any process in which thespatial distribution
ofobjects (such
as towns,farms,
shops, etc.)
has an effect on thebehaviour
ofvariables (such
as incomelevels,
innovation adoption, price
leveletc)
defined on thoseobjects.
So space it- self is not a process butspatial relationships
are one element in thespeci-
fication
of the process. Thesignificance
ofthis
may become clearer withexamples.
In this section we concentrate on two
types
of processes in which spa- tialconsiderations
enter into modelspecification -
these arespatial
compe- tition processes andspatial
transfer or interaction processes. We examine acompetition
processfirst.
Consider two
retailers
located on the same stretch of road in acity selling (more
orless)
anidentical product.
Theirpotential
customers areregularly passing
up and down the street andthey
maybe,
from time to timeat
least, noting
theprices charged
for thegoods
at either or both of the two stores. The decision ofwhere
topurchase by
customers may, all otherthings being equal,
be influencedby
theprice charged
at the two outlets.If the retailers are
sensitive
to this state ofaffairs,
a situation may welldevelop
in whichprices charged
at one retail outlet areresponsive
toprices charged
at the other retail outlet(and
viceversa).
We have described asimple spatial competition
process and if theproblem
isgeneralised
to alarger
number of retailers aprice "surface"
maydevelop
wihch inpart
re- flectscompetition
between retailsites.
The above
arguments underlay
twostatistical
models forspatial price
variation in
petrol retailing developed by Haining (1984).
We consideronly
one of them here
(developed
from a verysimple, neo-classical equilibrium model)
and for further discussion of theproblem
seeHaining (1986).
Letp
be an n x 1 vector and denote theprices charged
at n retail outlets-t
in a
city
or area.Further,
letD
andSt
denote n dimensional demand-t -
and
supply
vectors at time t . Asimple
model for suchspatial price
compe-tition might
bespecified
as follows :where c and e are n dimensional vectors of constants and A and B
are n xn ordered
matrices
whose rows and columnscorrespond
to the label-ling
of the nsites.
The demand vector is stochastic in that u is arandom vector with mean 0 and