Spatial Statistical Analysis and Geographic Information Systems

3.5 Implementation Issues

In the implementation of a framework for spatial analysis within a GIS, many issues can be addressed by means of familiar techniques. These techniques do not necessarily fit neatly within our classification of spatial analysis into the four modules of data selection, data manipulation, exploratory analysis, and confirmatory analysis (Fig. 3.1), since many methods are important in more than one module. In order to make our discussion less abstract, we next review a number of ways in which specific techniques would be incorporated into our framework. It is important to keep in mind that this will only give a general flavor of what we envisage as a general-purpose spatial analysis system, since a detailed inventory of techniques is beyond the scope of this paper. Also, much remains to be addressed, and many tricky methodological problems have not yet found a satisfactory solution.

Most of the decisions made about the selection, manipulation, and analysis of spatial data can be thought of as strategies designed to avoid, specify, or account for the effects of spatial dependence. The data available in a GIS are rarely referenced in spatial units that are appropriate for final analysis. For example, pixel data, which are highly spatially dependent, must be aggregated for land use studies. In the data selection process (the first analysis module in Fig. 3.1), the nature of the data dependence should be evaluated before a representative sample can be designated.

In a well-known study, Openshaw and Taylor (1979) summarize the results of extensive experimentation in which scale changes radically altered the correlative and autocorrelative relationships among variables. Arbia (1989) claims that it is the spatial autocorrelation, or the dependence of nearby spatial units on one another, that is responsible for changes in summary measures as scale is changed. If units are summed into larger units, the mean increases, the covariance increases, and the correlation decreases in absolute value in proportion to the change in the size of the units. In all but a few circumstances, however, the variance increases in relation to the changed size of units and to the correlation between specified neighboring units.
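The role of neighbor correlation in the variance of aggregated units can be illustrated with a small worked case. Assume, purely for illustration, that $m$ adjacent units have values $X_1, \dots, X_m$ with a common mean $\mu$, a common variance $\sigma^2$, and pairwise correlations $\rho_{ij}$. These symbols and the equal-moment assumption are introduced here as a simplification and are not taken from Arbia (1989). For the aggregate $S = \sum_i X_i$:

```latex
\begin{align}
  \mathrm{E}[S]   &= m\,\mu, \\
  \mathrm{Var}(S) &= \sum_{i=1}^{m} \mathrm{Var}(X_i)
                     + 2 \sum_{i<j} \mathrm{Cov}(X_i, X_j)
                   = \sigma^2 \Bigl( m + 2 \sum_{i<j} \rho_{ij} \Bigr).
\end{align}
% Special case of two merged neighbors (m = 2):
%   Var(S) = 2 \sigma^2 (1 + \rho_{12})
```

Under these assumptions the variance of the aggregate depends both on the number of units merged and on the correlation among them: with positive spatial autocorrelation it grows faster than $m\,\sigma^2$, the value it would take if the units were uncorrelated.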

Immediately it becomes clear that statistical tests will be affected by the chosen scale (see also Haining, 1991). This being the case, the selection of an appropriate sample is a crucial decision to which a great deal of attention must be given. This is particularly important, since it often is not clear whether the so-called modifiable areal unit problem is indeed an artifact of a particular data set, as is typically assumed, or instead should be attributed to the use of an improper model and/or technique, as argued by Tobler (1989).

It is difficult to predict how the moments of a spatial sample will change with changing scale in all but the simplest circumstances, that is, when the specification of the relationship between spatial units is simple and therefore not particularly interesting. In addition, when spatial units are of unequal size, weighting schemes to “equalize” them must be arbitrary and, as a consequence, one must settle for a range of test results rather than a specific value. It is clear that any multi-purpose GIS must be capable of assisting the data selection process by containing flexible clustering and aggregation algorithms.

The manipulation of spatial data (the second spatial analysis module in Fig. 3.1) may result in the creation or smoothing of a surface or the partition of data units into polygons. These types of operations rest to a large extent on the evaluation of the degree of spatial dependence present in the data. The creation of a surface by interpolation is based on the nature of trends or regularities in the data. Filtering a complex surface into a smooth one is essentially an exercise in specifying a structure for spatial dependence. In order to carry out these operations, a GIS might contain a number of measuring devices that evaluate dependence. Various cross-product statistics (Hubert et al., 1981) such as Moran’s I, Geary’s c, the variogram, and Getis and Ord’s G are all helpful in this regard (Cliff and Ord, 1981; Haining, 1990a; Cressie, 1985; Getis and Ord, 1992). In addition, smoothing techniques can be based on spectra (Rayner, 1971), trend surfaces, spatial adaptive filtering (Foster and Gorr, 1986), and smooth pycnophylactic interpolation (Tobler, 1979b), to name only a few commonly used methods.
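Two of the dependence measures named above, Moran's I and Geary's c, can be sketched in a few lines. The following is a minimal illustration using the standard textbook formulas, not an implementation drawn from any particular GIS. The function names, the binary contiguity weights, and the four-unit line layout are all assumptions made for the example.

```python
def chain_weights(n):
    """Binary contiguity weights for n units on a line (hypothetical layout)."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


def morans_i(values, weights):
    """Moran's I: (n / S0) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(d * d for d in dev)


def gearys_c(values, weights):
    """Geary's c: ((n - 1) / (2 S0)) * sum_ij w_ij (x_i - x_j)^2 / sum_i (x_i - m)^2."""
    n = len(values)
    mean = sum(values) / n
    s0 = sum(sum(row) for row in weights)
    sq_diff = sum(weights[i][j] * (values[i] - values[j]) ** 2
                  for i in range(n) for j in range(n))
    return ((n - 1) / (2 * s0)) * sq_diff / sum((v - mean) ** 2 for v in values)


w = chain_weights(4)
trend = [1.0, 2.0, 3.0, 4.0]        # neighbors are similar
alternating = [1.0, 4.0, 1.0, 4.0]  # neighbors are dissimilar
print(morans_i(trend, w), gearys_c(trend, w))
print(morans_i(alternating, w), gearys_c(alternating, w))
```

For the smoothly increasing series, I is positive and c is below 1, the usual signature of positive spatial autocorrelation. For the alternating series, I is negative and c exceeds 1.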

For the creation of partitions, meaningful criteria should be based on the dependence structure of the spatial data under investigation. The techniques mentioned in the last paragraph can be used for this purpose, as can clustering algorithms. Similarly, Thiessen polygons and associated tessellation techniques are often-used partitioning devices (Boots, 1985).
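The idea behind a Thiessen partition can be illustrated without constructing polygon boundaries: each location belongs to the cell of its nearest generator point. The sketch below assigns a set of points to cells in this way. The function names and coordinates are hypothetical, and a production GIS would normally compute the polygon geometry itself rather than this point-by-point assignment.

```python
import math


def nearest_generator(point, generators):
    """Index of the generator closest to `point` (ties go to the lowest index)."""
    return min(range(len(generators)),
               key=lambda k: math.dist(point, generators[k]))


def thiessen_partition(points, generators):
    """Group points by the Thiessen cell (nearest generator) they fall in."""
    cells = {k: [] for k in range(len(generators))}
    for p in points:
        cells[nearest_generator(p, generators)].append(p)
    return cells


generators = [(0.0, 0.0), (10.0, 0.0)]          # hypothetical generator sites
points = [(1.0, 1.0), (9.0, 2.0), (4.0, 0.0)]   # hypothetical observations
print(thiessen_partition(points, generators))
```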

Perhaps of greatest importance for the preparation and manipulation of spatial data for further analysis is the need to fill a surface with estimates of variable values when data are missing. For example, a GIS may contain data at points when the analytical interest is in areas. This problem of missing spatial data has received considerable attention (for a review, see Griffith et al., 1989) and many techniques have been implemented in operational GIS, e.g., based on kriging (Cressie, 1986; Davis, 1986; Oliver and Webster, 1990).
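As a rough illustration of filling a surface from point data, the sketch below uses inverse distance weighting. This is deliberately not kriging: kriging derives its weights from a fitted variogram model, whereas the weights here depend on distance alone. The simpler method is swapped in only to show the general shape of a weighted-average estimator. The function name, the `power` parameter, and the sample data are assumptions made for the example.

```python
import math


def idw_estimate(target, samples, power=2.0):
    """Inverse-distance-weighted estimate at `target`.

    `samples` is a list of ((x, y), value) pairs. Weights are 1 / d**power,
    so closer observations count more. An exact hit returns the observed value.
    """
    numerator = 0.0
    denominator = 0.0
    for location, value in samples:
        d = math.dist(target, location)
        if d == 0.0:
            return value
        weight = 1.0 / d ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


samples = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)]  # hypothetical observations
print(idw_estimate((1.0, 0.0), samples))  # midway between the two samples
print(idw_estimate((0.5, 0.0), samples))  # closer to the first sample
```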

As pointed out before, the precise allocation of techniques to the exploratory spatial data analysis (ESDA) and confirmatory spatial data analysis (CSDA) modules (the third and fourth spatial analysis modules in Fig. 3.1) is not always clear, although there are some major differentiating characteristics between the viewpoints taken in each (see Anselin, 1988; Haining, 1990a). Suffice it to say here that ESDA is that phase of analysis in which spatial patterns and structures are revealed, hypotheses proposed, and models suggested. In contrast, CSDA includes the entire roster of techniques and methodologies for hypothesis testing, the determination of confidence intervals, estimation, simulation, prediction, and the assessment of model fit.

In ESDA one searches for structure and association, while in CSDA one evaluates the evidence. As Haining (1990a) points out, one alternates in the application of the two aspects of spatial data analysis, similar in spirit to the idea behind EDA advanced by Tukey (1977).

The various elements of ESDA include those which aid in the identification and description of patterns and variables, elicit the characteristics of variables and patterns, and help determine the extent of data dependence and heterogeneity. In addition, ESDA should also allow for simple modeling, especially so that residuals can be evaluated and a “best” subset of explanatory variables can be selected.

A wide array of techniques are available for ESDA. These include the standard tools of EDA and statistical graphics, such as box plots, star plots, Chernoff faces, etc., as well as many of the measures mentioned above. In addition, pattern recognition devices such as those discussed in the artificial intelligence and spatial statistics literatures are highly relevant here, e.g., as outlined in the work of Ahuja and Schachter (1983), Pielou (1977), Ripley (1981), and Boots and Getis (1987).

However, the “spatial” aspects of ESDA have to date not been fully developed. In this respect, approaches that blend the analytics of the traditional techniques with the computing power and interactive graphics of some of the recent developments could show great promise.

In addition to the predominantly non-parametric approach taken in traditional EDA, one often also needs to know moments, errors, and other parametric characteristics of samples and surfaces at different scales. For example, the parameters of simple linear regression, trend surfaces, periodicities, semi-variograms and correlograms are often useful. Directional statistics and spatial ANOVA are tools that could be included in any exploratory analytical module. In addition, categorical variables are often mapped by GIS users. Thus, logit analyses of overlapping variables would prove useful in the exploratory stage of analysis.
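Of the parametric summaries listed above, the semi-variogram is perhaps the easiest to sketch. The example below computes an empirical semi-variogram using the classical estimator, in which half the mean squared difference between pairs of observations is grouped by separation distance. The function name, the binning rule, and the data are assumptions chosen for illustration. Fitting a model curve to the binned values, which kriging requires, is not shown.

```python
import math
from collections import defaultdict


def empirical_semivariogram(coords, values, bin_width):
    """Classical estimator: gamma(h) = sum (z_i - z_j)^2 / (2 * N(h)).

    Pairs are grouped into lag bins; bin b covers separation distances
    in [b * bin_width, (b + 1) * bin_width). Returns {bin index: gamma}.
    """
    sq_diff_sum = defaultdict(float)
    pair_count = defaultdict(int)
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            lag_bin = int(math.dist(coords[i], coords[j]) // bin_width)
            sq_diff_sum[lag_bin] += (values[i] - values[j]) ** 2
            pair_count[lag_bin] += 1
    return {b: sq_diff_sum[b] / (2 * pair_count[b]) for b in sorted(pair_count)}


coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]  # hypothetical transect
values = [1.0, 2.0, 4.0, 7.0]
print(empirical_semivariogram(coords, values, bin_width=1.0))
```

In this toy transect the semi-variance rises with lag, which is the pattern expected when nearby observations are more alike than distant ones.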

It is here that the distinction between ESDA and CSDA becomes difficult. Indeed, the standard tools of CSDA consist of estimation algorithms for a wide range of specifications, both linear and nonlinear. The spatial aspects of such analysis are often identified with the field of spatial econometrics, i.e., “the collection of techniques that deal with the peculiarities caused by space in the statistical analysis of regional science models” (Anselin, 1988, p. 7). In essence this boils down to four broad categories of methods: (1) diagnostics for the presence of spatial dependence and spatial heterogeneity in regression analysis (this includes ANOVA and trend surface models as special cases); (2) methods to estimate and obtain inference (e.g., based on maximum likelihood, instrumental variables or bootstrap estimators) for various types of regression models for cross-sectional and space–time data that explicitly take into account spatial effects (e.g., spatial process models); (3) methods to estimate and obtain inference that are robust to the presence of spatial effects; (4) spatial measures of model validity. Although much methodological progress has been made in these areas, a number of very tricky issues remain to be resolved, such as the issue of spatial dependence in models with limited dependent variables (e.g., logit, probit and Poisson regression models), the discrimination between spatial dependence and spatial heterogeneity, nonstationarity in models for space–time data, edge effects, etc. (Anselin, 1990). To some extent then, the implementation of CSDA in a spatial analysis system is constrained by the state of the art, which to date is still insufficient to answer the range of questions faced by the users of a GIS.

Source: Advances in Spatial Science (pp. 57-60)