
8. DATA ANALYSIS AND INTERPRETATION

8.5. Clustering and classification

Clustering and classification are useful interpretation tools for analysing large multivariate datasets. These methods are used for the automatic identification of samples (pixels) that have a similar radiometric signature.

8.5.1. Clustering (unsupervised classification)

Clustering, or unsupervised classification, is a spectral pattern recognition technique used for finding naturally occurring groupings (i.e. classes or clusters) in a multiband dataset. The class allocation rules are based on some measure of distance in the data space spanned by the multispectral bands. Because classes are identified on the basis of gamma ray signature, their origin is still open to interpretation. The resulting map pattern must therefore be interpreted with respect to some reference source, such as a geological map. The advantage of unsupervised classification is that it has the potential to identify all clusters. It provides an unbiased map generalization with sharp boundaries between clusters.

The K-means algorithm (Niblack, 1986) is an iterative clustering method. The user specifies the number of classes, and initial class centres (means) are determined. Each sample is then allocated to the closest class centre (using Euclidean distance). The class means are recalculated, and the process is repeated for a specified number of iterations or until a convergence criterion is met. The radioelement grids should be standardized to zero mean and equal variance before the clustering algorithm is applied. This prevents the eTh grid, which has the largest concentration range, from dominating the distance measure.
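The steps described above can be illustrated with a minimal sketch. It is not the implementation used in any of the cited studies. The function names (standardize, kmeans), the random choice of initial centres, the tolerance and the synthetic K, eU and eTh values are all assumptions made for illustration.

```python
import numpy as np

def standardize(X):
    """Scale each column (grid channel) to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def kmeans(X, k, n_iter=100, tol=1e-6, seed=0):
    """Plain K-means with Euclidean distance; returns (labels, centres)."""
    rng = np.random.default_rng(seed)
    # Assumption: initial centres are k distinct samples drawn at random.
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every centre.
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recalculate class means; keep the old centre if a class is empty.
        new_centres = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        shift = np.abs(new_centres - centres).max()
        centres = new_centres
        if shift < tol:  # convergence criterion: centres stopped moving
            break
    # Final allocation so that labels match the returned centres.
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), centres

# Synthetic K (%), eU (ppm), eTh (ppm) samples: illustrative, not survey data.
rng = np.random.default_rng(42)
group_a = rng.normal([1.0, 2.0, 8.0], [0.1, 0.2, 0.5], size=(200, 3))
group_b = rng.normal([3.0, 4.0, 20.0], [0.1, 0.2, 0.5], size=(200, 3))
X = np.vstack([group_a, group_b])

Z = standardize(X)               # zero mean, equal variance per channel
labels, centres = kmeans(Z, k=2)
```

Without the standardization step, the eTh column in this example would contribute most of the squared distance simply because its values span the widest numerical range.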

Harris (1989) applied minimum “distance-to-mean” clustering to gamma ray data acquired over Nova Scotia, Canada. The most informative clusters were derived from 6-dimensional (K, Th, U, U/Th, Th/K, U/K) and 4-dimensional (K, Th, U, U/Th) data sets. Spatially continuous clusters with distinct boundaries showed good correlation with the mapped geology. These mainly corresponded with mapped granitoid plutons. Some of the clusters were related to anomalous Th content within some of the plutons. However, in some cases the clustering algorithm was not sensitive enough to identify geologically significant, yet subtle, colour variations in the ternary radioelement map (Harris, 1989). This suggests that clustering should be used to complement, rather than replace, visual interpretation.

Graham and Bonham-Carter (1993) applied the K-means algorithm to airborne gamma ray spectrometric data from the Marathon area, Ontario, Canada. They analyzed the correlation between clusters and mapped geology using GIS overlay techniques. They found a moderate to strong spatial association between bedrock units and radioelement clusters. Areas where the cluster and geological maps differed were identified for further field verification.
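A raster overlay of this kind can be approximated by cross-tabulating the cluster map against a rasterized geological map. The sketch below is an assumption about how such a comparison might be set up, not the procedure of Graham and Bonham-Carter (1993). The function name cross_tabulate and the two small integer grids are hypothetical.

```python
import numpy as np

def cross_tabulate(cluster_map, geology_map, n_clusters, n_units):
    """Count pixels for every (cluster, geological unit) pair."""
    table = np.zeros((n_clusters, n_units), dtype=int)
    np.add.at(table, (cluster_map.ravel(), geology_map.ravel()), 1)
    return table

# Hypothetical co-registered grids: integer cluster and bedrock unit codes.
cluster_map = np.array([[0, 0, 1, 1],
                        [0, 0, 1, 1],
                        [2, 2, 1, 1]])
geology_map = np.array([[0, 0, 1, 1],
                        [0, 1, 1, 1],
                        [2, 2, 2, 1]])

table = cross_tabulate(cluster_map, geology_map, n_clusters=3, n_units=3)

# Fraction of each cluster's pixels that falls within each bedrock unit.
row_fraction = table / table.sum(axis=1, keepdims=True)

# Flag pixels whose mapped unit differs from the unit that dominates
# their cluster; such pixels are candidates for field verification.
dominant_unit = table.argmax(axis=1)
mismatch = dominant_unit[cluster_map] != geology_map
```

Rows of the table dominated by a single unit indicate a strong spatial association between that cluster and the unit, while the mismatch mask outlines areas where the two maps disagree.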

Figure 8.2 shows an example of the application of cluster analysis to the mapping of compositional variations within a granite batholith in northeastern Alberta, Canada. The cluster analysis yielded four classes with distinct radioelement signatures. The clusters were interpreted as Th-rich and Th-poor intrusive phases, which were both subdivided into relatively U-poor and U-rich classes. A comparison with the major element geochemistry indicated that the trend in U in the Th-rich cluster follows a trend in SiO2. The increase of U with SiO2 suggests a magmatic differentiation trend, the most evolved stages of which represent hydrothermal U-enrichment of the country rock envelope (Schetselaar, 2002).

[Figure 8.2: cluster map and cluster mean plot. Legend classes: -K -eTh -eU; -eTh; +K +eTh; +K ++eTh ++eU.]

FIG. 8.2. Cluster analysis on the basis of K, U/Th, U/K grids, Arch Lake granite, northeastern Alberta, Canada. The cluster mean plot shows the deviations from the mean of each cluster for each of the six variables (Data from Geological Survey of Canada, Open File 2807, Charbonneau et al., 1994).

8.5.2. Supervised classification

Supervised classification methods are based on classes that have been pre-defined as meaningful with respect to ground observations. The decision rules for class allocation are based on sample sites considered representative for these classes. This sampling procedure, known as the training stage, distinguishes supervised classification from clustering routines. The decision rules are used to allocate samples to particular classes in the subsequent classification stage.

Supervised classification has been widely applied in the analysis of multispectral image data acquired by earth orbiting satellites for land cover mapping. The technique has also been applied to the classification of lithological units using gamma ray spectrometric data in combination with Landsat TM data and other airborne geophysical data, such as magnetic and electromagnetic data (Lanne, 1986; Eberle, 1993; An et al., 1995; Schetselaar et al., 2000).

The probability that a pixel belongs to a class depends on the ratios between the probability density functions of that particular class and the other classes. The available classification algorithms differ in the way the probability density functions for each class are modelled and estimated from the training data (McLachlan, 1992). The classification algorithms can be broadly categorised into two groups: (1) parametric classifiers, which model the class probability density functions with the estimated parameters of a multivariate normal distribution, and (2) non-parametric classifiers, which estimate the class probability density functions directly from the data. A recommended practice is to experiment with a number of algorithms. Their performance is best evaluated over an area where the geology is well known.
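As an illustration of the parametric case, the sketch below fits a multivariate normal distribution to each class in the training data and allocates every sample to the class with the highest density. It assumes equal prior probabilities for all classes. The function names and the synthetic three-channel data are hypothetical, and the code is not taken from any of the cited studies.

```python
import numpy as np

def fit_gaussian_classes(X_train, y_train):
    """Training stage: estimate a mean vector and covariance per class."""
    params = {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        params[c] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False))
    return params

def log_density(X, mean, cov):
    """Log of the multivariate normal density at each row of X."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance of each sample to the class mean.
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def classify(X, params):
    """Classification stage: pick the class with the highest density.

    Assumption: equal priors, so comparing densities is equivalent to
    comparing posterior probabilities.
    """
    classes = sorted(params)
    scores = np.column_stack([log_density(X, *params[c]) for c in classes])
    return np.array(classes)[scores.argmax(axis=1)]

# Synthetic training and test samples for two classes (illustrative only).
rng = np.random.default_rng(0)
class_means = {0: [1.0, 2.0, 8.0], 1: [3.0, 4.0, 20.0]}

def sample(n):
    X = np.vstack([rng.normal(m, 0.3, size=(n, 3))
                   for m in class_means.values()])
    y = np.repeat(list(class_means), n)
    return X, y

X_train, y_train = sample(100)
X_test, y_test = sample(50)

params = fit_gaussian_classes(X_train, y_train)
predicted = classify(X_test, params)
accuracy = (predicted == y_test).mean()
```

A non-parametric classifier would replace log_density with an estimate built directly from the training samples, for example a kernel or nearest-neighbour density estimate, while leaving the allocation rule unchanged.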

8.5.3. Case study

Schetselaar et al. (2000) used supervised classification of airborne magnetic and gamma ray spectrometric data to map lithological units in the Canadian Shield of northeastern Alberta, Canada. The geology is composed of a Paleoproterozoic granite gneiss basement complex with enclaves of metasedimentary rocks intruded by numerous Paleoproterozoic granitoid plutons. The plutonic suites have distinct gamma ray responses as a result of their distinct accessory mineral content (Charbonneau et al., 1997). Internal lithological variations in the basement complex can be seen in the magnetic data, but are poorly resolved in the gamma ray data.

A flow chart of the supervised classification procedure is shown in Figure 8.3. The analysis first focused on a training area consisting of 2795 field stations (step 1 in Figure 8.3). The resolving power of different combinations of images was then evaluated (step 2 in Figure 8.3). This analysis showed that a 5-channel data set, consisting of the three radioelement channels plus the total and residual magnetic field, was best suited to outlining the ten lithological units. Multi box-whisker plots showing the distributions of the grid channels per lithological unit (Figure 8.4) give insight into the discriminating power of the grid channels for the different lithological units.

FIG. 8.4. Multi box-whisker plots of the K, Th, U, total magnetic field and residual magnetic field data for each lithological unit.

Maximum likelihood, predictive and artificial neural network classifiers were used to predict the ten lithological units in the training area (step 3). The resulting classification patterns are shown in Figure 8.5. The classification yielded 65-70% correlation with the mapped geological units in the training area. The classification was then extended to the total study area (step 4), and compared with the regional geological map (Figure 8.6). The discrepancies between the classified patterns and the regional geological map were used to identify targets for refinement of the geological map or potential exploration targets (step 5). The targets included internal zonations in granitoid plutons, uranium anomalies, and magnetic intrusive units in the near subsurface. As with the cluster analysis discussed earlier, this study suggests that supervised classification is a useful method to assist geologists in fieldwork planning and in identifying targets for map refinement and exploration.
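The text does not state how the 65-70% figure was computed. One common way to quantify the match between a classification and a reference map is a confusion matrix with its overall and per-class agreement, sketched below. The function name and the ten hypothetical field stations with three units are assumptions for illustration only.

```python
import numpy as np

def confusion_matrix(mapped, predicted, n_classes):
    """Rows: mapped (reference) unit; columns: predicted unit."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (mapped, predicted), 1)
    return cm

# Hypothetical unit codes at ten field stations (not data from the study).
mapped    = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])
predicted = np.array([0, 0, 1, 1, 1, 1, 2, 2, 2, 0])

cm = confusion_matrix(mapped, predicted, n_classes=3)

# Overall agreement: share of stations where prediction equals mapped unit.
overall = np.trace(cm) / cm.sum()

# Per-class agreement: share of each mapped unit that was correctly predicted.
per_class = np.diag(cm) / cm.sum(axis=1)
```

The off-diagonal cells show which units are confused with one another, which helps to separate systematic discrepancies worth following up in the field from scattered misclassification.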

FIG. 8.5. Classification results from the training area, Canadian Shield of northeastern Alberta.

From the document: Guidelines for Radioelement Mapping Using Gamma Ray Spectrometry Data | IAEA (pages 112-116)