
Data description and modelling technique

4.2 Habitat Suitability Models

4.2.1 Dataset characteristics

4.2.1.2 Biotic data

Overall, 13 biotic groups are considered in the original data set: Amphibians, Birds, Butterflies, Diatoms, Fish, Macro-algae, Macrofauna, Mammals, Macrophytes, Nematodes, Phytoplankton, Reptiles and Zooplankton. Building on Chapter 2, in-depth attention will be given to the description of the macrophyte data.

Macrophyte occurrence was recorded with a variety of techniques, including the Tansley scale, the Braun-Blanquet method and a basic indication of presence (STOWA, 2001), of which an overview is provided in Table A.3. After removal of misclassified algae, undefined species, hybrid species and ambiguous names (e.g. only a family name), a total of 1148 macrophytes remained. For these, responses indicating macrophyte presence (e.g. abundance and cover percentage) were replaced by a ‘presence’ statement; otherwise ‘absence’ was assumed in order to supplement the presence-only data (Elith et al., 2006). Hence, a presence-absence data set was obtained, with the caveat that an assigned ‘absence’ does not necessarily reflect a true absence (see Chapter 3) (Anderson and Raza, 2010). Despite the high number of macrophytes remaining in the data set, only a minority (Nbio = 23) occurred at more than 10 % of all sites (Ninst = 77 200), with the lowest prevalence being 0.001 % (see Figure 4.2).

Figure 4.2: Information within the macrophyte data set. In total, 1148 macrophytes were observed to be present in at least one location between 1968 and 2012, with the lowest prevalence being 0.001 %. Only 23 macrophytes were recorded as present in more than 10 % of all instances (i.e. black surface above the dashed grey line) and thereby represent a clear minority.
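The conversion to presence-absence data and the resulting prevalence can be illustrated with a minimal sketch. The record layout `(site, taxon, response)`, the helper names and the rule that empty or zero responses do not indicate presence are assumptions made for illustration; they are not taken from the original data processing.

```python
from collections import defaultdict


def indicates_presence(response):
    """Assumed rule: None, empty strings and zero do not indicate presence;
    any other score (Tansley code, Braun-Blanquet class, cover %) does."""
    if response is None:
        return False
    if isinstance(response, (int, float)):
        return response > 0
    return response.strip() not in ("", "0")


def presence_absence(records, sites, taxa):
    """Build {site: {taxon: 0 or 1}}; anything not recorded is an assumed absence."""
    table = {site: {taxon: 0 for taxon in taxa} for site in sites}
    for site, taxon, response in records:
        if site in table and taxon in table[site] and indicates_presence(response):
            table[site][taxon] = 1
    return table


def prevalence(table):
    """Percentage of sites at which each taxon is recorded as present."""
    counts = defaultdict(int)
    for row in table.values():
        for taxon, present in row.items():
            counts[taxon] += present
    return {taxon: 100.0 * n / len(table) for taxon, n in counts.items()}


# Hypothetical records mixing a Tansley code, a cover percentage and a plain statement.
records = [
    ("s1", "Elodea nuttallii", "a"),
    ("s2", "Elodea nuttallii", 15.0),
    ("s3", "Elodea nuttallii", 0),
    ("s1", "Nuphar lutea", "present"),
]
sites = ["s1", "s2", "s3", "s4"]
taxa = ["Elodea nuttallii", "Nuphar lutea"]

table = presence_absence(records, sites, taxa)
prev = prevalence(table)
print(prev)
```

In this sketch, site `s4` has no records at all and therefore receives an assumed absence for every taxon, which is exactly the kind of ‘absence’ that need not be a true absence.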

4.2.1.3 Common data

Each instance within the physicochemical and biological data sets was characterised by a unique spatiotemporal identifier (STOWA, 2001). Both data sets were reduced to the instances for which both physicochemical and biological information was available under a common identifier, to avoid a spatial or temporal mismatch between the prevailing abiotic conditions and the observed macrophyte community. Consequently, the data were reduced substantially: only 4344 instances remained, and the temporal range of both the physicochemical and biotic data sets narrowed to the period between 1978 and 2011 (see Figure A.3).

At the physicochemical level, a minor reduction occurred from 199 variables to 174 variables, while at macrophyte level the original 1148 species were reduced to 576 species. During this extensive data selection, the abiotic conditions were assumed to represent the general conditions occurring at that specific location, i.e. that no extreme event had occurred recently.
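The restriction to common identifiers, followed by the removal of variables and species that no longer carry information, can be sketched as follows. The `(site, date)` identifier format, the dictionary layout and the function names are hypothetical and serve only to illustrate the principle.

```python
def restrict_to_common(abiotic, biotic):
    """Keep only instances whose identifier occurs in both data sets."""
    common = abiotic.keys() & biotic.keys()
    return ({k: abiotic[k] for k in common},
            {k: biotic[k] for k in common})


def drop_empty_columns(rows, is_empty):
    """Remove columns that hold no information in any remaining instance."""
    kept = {col for row in rows.values()
            for col, value in row.items() if not is_empty(value)}
    return {key: {col: value for col, value in row.items() if col in kept}
            for key, row in rows.items()}


# Hypothetical identifiers: (site code, sampling date).
abiotic = {
    ("A", "1990-05-01"): {"pH": 7.2, "NO3": None},
    ("B", "1990-05-01"): {"pH": None, "NO3": 1.1},
    ("C", "1991-06-01"): {"pH": 6.9, "NO3": None},
}
biotic = {
    ("A", "1990-05-01"): {"Nuphar lutea": 1, "Elodea nuttallii": 0},
    ("C", "1991-06-01"): {"Nuphar lutea": 0, "Elodea nuttallii": 0},
    ("D", "1992-07-01"): {"Nuphar lutea": 0, "Elodea nuttallii": 1},
}

abiotic_common, biotic_common = restrict_to_common(abiotic, biotic)
# Variables without any measurement and species without any presence are dropped.
abiotic_common = drop_empty_columns(abiotic_common, lambda v: v is None)
biotic_common = drop_empty_columns(biotic_common, lambda v: v == 0)
print(sorted(abiotic_common))
```

Here the variable `NO3` and the species `Elodea nuttallii` disappear because their only informative records belonged to instances without a matching identifier, which mirrors how the number of variables and species can shrink during the selection.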

The resulting combination of physicochemical and macrophyte data was distributed geographically throughout the Netherlands (see Figure 4.3, excluding 80 sites that lacked correct georeferencing), although spatial coverage is not completely uniform. Moreover, a variety of water bodies was sampled, although for the largest group of instances (Ninst = 1729) the water body type was not characterised. Additional classes included streams (Ninst = 928), brooks (Ninst = 926) and canals (Ninst = 206).

Figure 4.3: Geographical distribution of sampling sites with physicochemical and macrophyte information. Data were collected between 1978 and 2011 throughout the Netherlands and are stored in the Limnodata Neerlandica. A total of 4344 instances were available in the data, but only 4264 were spatially referenced.

CHAPTER 4


The common data was additionally characterised by a temporal distribution, showing differences between and within years. More specifically, sampling sites were visited and assessed between 1978 and 2011. During the first years (1978 – 1987), data was collected throughout the year, while sampling became more restricted in recent years. For instance, samples were initially also collected during the colder months (November – February), while this occurred only sparsely after 1996. This is clearly illustrated by Figure 4.4 and indicates that additional boundary conditions might be necessary for temporal analysis. For instance, to avoid bias when assessing the trends in physicochemical conditions, it can be recommended to only include data from the warmer months (e.g. April – September). Yet, when inferring the realised niche of a macrophyte species, it is considered appropriate to include all instances, as winter months might represent unsuitable conditions and help in delineating the abiotic habitat that supports the survival and establishment of the considered species.

Figure 4.4: Temporal distribution of instances within the combined physicochemical and macrophyte observations. Instances were collected between 1978 and 2011 throughout the Netherlands and throughout the year. Over time, fewer instances were collected during the winter period (November to February). The dots indicate for which month instances were collected, while the darkness of the tile indicates the number of instances (see also Figure A.3).
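The temporal boundary condition suggested for trend analysis, together with the year-by-month counts that underlie a tile plot of this kind, can be sketched as follows. The instance layout with a `date` field and the function names are assumptions; the month limits follow the April to September example given in the text.

```python
from collections import Counter
from datetime import date


def warm_season(instances, first_month=4, last_month=9):
    """Keep instances sampled between first_month and last_month (inclusive)."""
    return [inst for inst in instances
            if first_month <= inst["date"].month <= last_month]


def counts_per_year_month(instances):
    """Number of instances per (year, month) tile."""
    return Counter((inst["date"].year, inst["date"].month) for inst in instances)


# Hypothetical instances.
instances = [
    {"id": "a", "date": date(1980, 1, 15)},
    {"id": "b", "date": date(1980, 6, 3)},
    {"id": "c", "date": date(1999, 9, 30)},
    {"id": "d", "date": date(2005, 11, 2)},
]

trend_subset = warm_season(instances)      # for trend analysis
tiles = counts_per_year_month(instances)   # all instances, as used for niche inference
print([inst["id"] for inst in trend_subset])
```

The full set of instances is kept alongside the seasonal subset, since the text argues that winter observations remain informative when delineating the realised niche.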

The combined data was characterised by a high degree of missing data points (i.e. 93.7 %), which were all part of the physicochemical data and were distributed differentially over the recorded variables and included instances. For example, only a few variables (Nvar = 6) contained information for more than 50 % of all instances, with the lowest degree of variable-wise data availability being 0.02 % (Figure 4.5A). At instance-level, only a few locations (Ninst = 63) were described by more than 20 % of all recorded variables, with the lowest degree of completeness being 0.57 % (Figure 4.5B).
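The three quantities reported here (overall percentage of missing data points, variable-wise availability and instance-wise completeness) can be computed as in the following sketch. The data layout, in which `None` or an absent key marks a missing value, and all names are assumptions for illustration.

```python
def missing_summary(rows, variables):
    """Return overall missing %, availability per variable and completeness per instance.

    rows: {instance_id: {variable: value or None}}; a missing key counts as missing.
    """
    n_inst, n_var = len(rows), len(variables)
    filled_per_var = {v: 0 for v in variables}
    filled_per_inst = {}
    for key, row in rows.items():
        filled = [v for v in variables if row.get(v) is not None]
        filled_per_inst[key] = len(filled)
        for v in filled:
            filled_per_var[v] += 1
    total_filled = sum(filled_per_inst.values())
    overall_missing = 100.0 * (1 - total_filled / (n_inst * n_var))
    var_availability = {v: 100.0 * n / n_inst for v, n in filled_per_var.items()}
    inst_completeness = {k: 100.0 * n / n_var for k, n in filled_per_inst.items()}
    return overall_missing, var_availability, inst_completeness


# Hypothetical physicochemical measurements.
variables = ["pH", "NO3", "Cl"]
rows = {
    "i1": {"pH": 7.1, "NO3": None, "Cl": None},
    "i2": {"pH": 6.8},
    "i3": {"pH": None, "NO3": 0.4, "Cl": None},
    "i4": {"pH": 7.5, "NO3": None, "Cl": None},
}

overall, var_avail, inst_compl = missing_summary(rows, variables)
print(round(overall, 1), var_avail)
```

Sorting `var_avail` and `inst_compl` by value gives the kind of ranked availability curves that panels A and B of an availability figure would display.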

Macrophyte information showed an increase in prevalence for a few species when compared with Figure 4.2, with prevalence ranging between 0.02 % and 40.0 % (Figure 4.5C), though only a minority was recorded at more than 10 % of all locations (Nbio = 20).

Figure 4.5: Characteristics of the available information within the combined data. A: Variable-wise information availability; B: Instance-wise information availability within the physicochemical data and C: Macrophyte prevalence depicted as availability.

The number of missing data points can be reduced by removing incomplete variables and instances from the data set. However, while such removal lowers the percentage of missing data, it also reduces the number of variables in the data (Appendix A.4, Figure A.4). This analysis shows that missing data can be reduced, but that their presence in the final data set is hard to avoid. For instance, reducing the missing fraction from 93.7 % to 50 % required the removal of about 154 variables, leaving only 20 variables in the remaining data set. It is therefore considered appropriate to perform data imputation in order to avoid an overly simplified representation of the assessed environmental domain. This necessity arises because variables have been recorded relatively randomly, as illustrated by a more detailed visualisation of the missing data in Appendix A.4, Figure A.5.
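The trade-off between the percentage of missing data and the number of retained variables can be explored with a simple greedy procedure that repeatedly drops the least complete variable. This is only one possible strategy, shown as a sketch; the procedure actually used for Appendix A.4 is not specified here, and the function name and data layout are assumptions.

```python
def prune_variables(rows, variables, target_missing_pct):
    """Drop the least complete variables until the missing % is at or below the target.

    rows: list of {variable: value or None}; a missing key counts as missing.
    Assumes at least one instance. Returns (kept variables, resulting missing %).
    """
    def availability(var):
        return sum(row.get(var) is not None for row in rows)

    ordered = sorted(variables, key=availability, reverse=True)  # most complete first
    missing_pct = 0.0
    while ordered:
        filled = sum(availability(var) for var in ordered)
        missing_pct = 100.0 * (1 - filled / (len(rows) * len(ordered)))
        if missing_pct <= target_missing_pct:
            break
        ordered.pop()  # remove the currently least complete variable
    return ordered, missing_pct


# Hypothetical data: four instances, four variables of decreasing completeness.
rows = [
    {"pH": 7.0, "NO3": 0.5, "Cl": 30.0},
    {"pH": 6.5, "NO3": 0.7},
    {"pH": 7.2},
    {},
]
variables = ["pH", "NO3", "Cl", "PO4"]

kept, missing = prune_variables(rows, variables, target_missing_pct=40.0)
print(kept, missing)
```

In this toy example the missing fraction starts at 62.5 % and only falls below the 40 % target after half of the variables are removed, which illustrates why pruning alone quickly narrows the environmental domain and why imputation is considered instead.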
