
Hydrol. Earth Syst. Sci., 11, 1249–1266, 2007
www.hydrol-earth-syst-sci.net/11/1249/2007/
© Author(s) 2007. This work is licensed under a Creative Commons License.

Hydrology and Earth System Sciences

Uncertainty, sensitivity analysis and the role of data based mechanistic modeling in hydrology

M. Ratto1, P. C. Young2,3, R. Romanowicz2, F. Pappenberger2,4, A. Saltelli1, and A. Pagano1

1 Joint Research Centre, European Commission, Ispra, Italy
2 Centre for Research on Environmental Systems and Statistics, Lancaster University, Lancaster, UK
3 Centre for Resource and Environmental Studies, Australian National University, Canberra, Australia
4 European Centre for Medium-Range Weather Forecasts, Reading, UK

Received: 7 April 2006 – Published in Hydrol. Earth Syst. Sci. Discuss.: 25 September 2006
Revised: 14 March 2007 – Accepted: 29 March 2007 – Published: 3 May 2007

Abstract. In this paper, we discuss a joint approach to calibration and uncertainty estimation for hydrologic systems that combines a top-down, data-based mechanistic (DBM) modelling methodology and a bottom-up, reductionist modelling methodology. The combined approach is applied to the modelling of the River Hodder catchment in North-West England. The top-down DBM model provides a well identified, statistically sound yet physically meaningful description of the rainfall-flow data, revealing important characteristics of the catchment-scale response, such as the nature of the effective rainfall nonlinearity and the partitioning of the effective rainfall into different flow pathways. These characteristics are defined inductively from the data without prior assumptions about the model structure, other than that it lies within the generic class of nonlinear differential-delay equations. The bottom-up modelling is developed using the TOPMODEL, whose structure is assumed a priori and is evaluated by global sensitivity analysis (GSA) in order to specify the most sensitive and important parameters. The subsequent exercises in calibration and validation, performed with Generalized Likelihood Uncertainty Estimation (GLUE), are carried out in the light of the GSA and DBM analyses. This allows for the pre-calibration of the priors used for GLUE, in order to eliminate dynamical features of the TOPMODEL that have little effect on the model output and would be rejected at the structure identification phase of the DBM modelling analysis. In this way, the elements of meaningful subjectivity in the GLUE approach, which allow the modeler to interact in the modelling process by constraining the model to have a specific form prior to calibration, are combined with other more objective, data-based benchmarks for the final uncertainty estimation. GSA plays a major role in building a bridge between the hypothetico-deductive (bottom-up) and inductive (top-down) approaches and helps to improve the calibration of mechanistic hydrological models, making their properties more transparent. It also helps to highlight possible mis-specification problems, if these are identified. The results of the exercise show that the two modelling methodologies have good synergy, combining well to produce a complete joint modelling approach that has the kinds of checks-and-balances required in practical data-based modelling of rainfall-flow systems. Such a combined approach also produces models that are suitable for different kinds of application. As such, the DBM model considered in the paper is developed specifically as a vehicle for flow and flood forecasting (although the generality of DBM modelling means that a simulation version of the model could be developed if required); while TOPMODEL, suitably calibrated (and perhaps modified) in the light of the DBM and GSA results, immediately provides a simulation model with a variety of potential applications, in areas such as catchment management and planning.

Correspondence to: M. Ratto (marco.ratto@jrc.it)

1 Introduction

Uncertainty estimation is a fundamental topic in hydrology and hydraulic modelling (see discussion in Pappenberger and Beven (2006) and Pappenberger et al. (2006a)). Mathematical models are always an approximation to reality and the evaluation of the uncertainties is one of the key research priorities in every modelling process. The most widely used approach to modelling is based on the description of physical and natural systems by deterministic mathematical equations based on well known scientific laws: the so-called reductionist (bottom-up) approach. Uncertainty estimation is dealt with via calibration and estimation procedures that insert the deterministic approach into a stochastic framework (see a recent review in Romanowicz and Macdonald (2005) and the references cited therein).


A widely used approach to calibration in hydrology, and environmental sciences in general, is the Generalised Likelihood Uncertainty Estimation (GLUE, Beven and Binley, 1992). This is a very flexible, Bayesian-like approach to calibration, which is based on the acceptance that different parameterisations, as well as different model structures, can be equally likely simulators of the observed systems (referred to variously as "ambiguity", "unidentifiability" or "model equifinality"). In GLUE, model realisations are weighted and ranked on a likelihood scale, via conditioning on observations, and the weights are used to formulate a cumulative distribution of predictions.

In the calibration (model identification and estimation) framework, the understanding, rather than just the evaluation, of the influence of different uncertainties on the modelling outcome becomes a fundamental question. This is especially the case if the modelling goal is to reduce the model uncertainties and resources for this task are limited. Global sensitivity analysis (GSA) can play an important role in this framework: it can help in better understanding the model structure, the main sources of model output uncertainty and the identification issues (Ratto et al., 2001). For instance, Pappenberger et al. (2006b, c, 2007) and Hall et al. (2005) have recently presented cases that achieve such an understanding for flood inundation models, using sensitivity analysis to support their analysis. Tang et al. (2006) also provide an interesting review of GSA methods within the context of hydrologic modelling.

The GLUE approach provides a methodology for coping with the problems of lack of identification and over-parameterisation inherent in the reductionist approach, while a detailed GSA can make these problems more transparent. Neither of the two approaches, however, can provide a full solution of these problems.

Another approach to coping with uncertainty in environmental processes is the "top-down" Data-Based Mechanistic (DBM) method of modelling, introduced by Young over many years (see Young, 1998, and the prior references therein). Here, the approach is inductive: the models are derived directly from the data based on a generic class of dynamic models (here, differential-delay equations or their discrete-time equivalents). In particular, the DBM approach is based on the statistical identification and estimation of a stochastic dynamic relationship between the input and output variables using advanced time series analysis tools, with possible non-linearities introduced by means of non-linear State-Dependent Parameter (SDP) transformation of the model variables. Unlike "black-box" modelling, however, DBM models are only accepted if the estimated model form can be interpreted in a physically meaningful manner. This DBM methodology can also be applied to the analysis and simplification of large deterministic models (e.g., Young et al., 1996).

In contrast to the GLUE technique, the DBM approach chooses, from amongst all possible model structures, only those that are clearly identifiable: i.e., those that have an inverse solution in which the model parameters are well defined in statistical terms. Moreover, this technique estimates the uncertainty associated with the model parameters and variables, normally based on Gaussian assumptions. If the Gaussian assumptions are strongly violated, however, the DBM analysis exploits Monte Carlo Simulation (MCS) to evaluate the uncertainty. For instance, it is common in DBM analysis to estimate the uncertainty in derived physically meaningful model parameters, such as residence times and flow partition parameters, using MCS analysis. Also, when the identified model is nonlinear, then the uncertainty in the state and output variables (e.g. internal, unobserved flow variables and their combination in the output flow) is normally evaluated using MCS analysis.
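As a minimal illustration of this kind of MCS analysis, the Python sketch below (with purely illustrative numbers, not estimates from the paper) propagates a Gaussian uncertainty on the pole a of a hypothetical first-order discrete-time transfer function into the derived residence time T = -dt/ln(a):

```python
import numpy as np

# Minimal sketch: propagate Gaussian uncertainty on the pole 'a' of a
# hypothetical first-order discrete-time transfer function (sampling
# interval dt, in hours) into the derived residence time T = -dt / ln(a).
rng = np.random.default_rng(0)
a_hat, a_se, dt = 0.99, 0.002, 1.0            # illustrative estimate and standard error

a = rng.normal(a_hat, a_se, size=10_000)
a = a[(a > 0.0) & (a < 1.0)]                  # keep stable, physically meaningful poles
T = -dt / np.log(a)                           # derived residence times [h]

print(f"median residence time: {np.median(T):.1f} h")
print(f"95% range: {np.percentile(T, 2.5):.1f} - {np.percentile(T, 97.5):.1f} h")
```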

In this paper, we will discuss the problems of calibration and uncertainty estimation for hydrologic systems from a unified point of view, combining the bottom-up, reductionist TOPMODEL with the top-down, DBM approach, which plays the role of an "independent" benchmark. Note that, in this respect, we do not pursue any competitive comparison, to judge or prove one type of model to be "better" than the other. Our choice of DBM modelling as the right candidate for benchmarking is based on its capability of providing a meaningful description of observations, in addition to its simplicity, ease of interpretation and rapidity of implementation. We will also highlight the role of GSA in building a bridge between the two approaches by assisting in the improved calibration of reductionist models and highlighting both their weaknesses and strengths. Initially, however, we will compare and contrast the two modelling approaches used in the paper.

2 The different modelling approaches

Data-based Mechanistic (DBM) modelling is a methodological approach to model synthesis from time-series data and it is not restricted to rainfall-flow modelling; nor, indeed, to the specific rainfall-flow model identified in the present paper. Its application to rainfall-flow processes goes back a long way (Young, 1974) but the more powerful technical tools for its application have been developed much more recently (see e.g. Young, 1998, 2003 and the prior references therein).

DBM models are obtained by a relatively objective inductive analysis of the data, with prior assumptions kept to the minimum but with the aim of producing a model that can be interpreted in a reasonable, physically meaningful manner. Essentially, DBM modelling involves four stages that exploit advanced methods of time series analysis: data-based identification of the model structure based on the assumed generic model form; estimation of the parameters that characterize this identified model structure; interpretation of the estimated model in physically meaningful terms; and validation of the estimated model on rainfall-flow data that is different from the calibration data used in the identification and estimation analysis.

In the case of the specific DBM model used in the present paper, the identification stage of the modeling is based on the non-parametric estimation of a nonlinear State-Dependent Parameter (SDP) transfer function model, where the SDP relationships that define the nonlinear dynamic behaviour of the model are estimated initially in a non-parametric, graphical form, with the SDPs plotted against the states on which they are identified to be dependent. In the case of rainfall-flow models, there is normally only one such SDP relationship: this is associated with the rainfall input and the state dependency is identified, paradoxically at first, in terms of the measured flow variable (see later). In physical terms, this nonlinear function defines the connection between the measured rainfall and the "effective rainfall", i.e. the rainfall that is effective in causing flow variations. The relationship between this effective rainfall and the flow is then normally identified as a linear, constant coefficient, dynamic process and is estimated in the form of a 2nd (or on rare occasions 3rd) order Stochastic Transfer Function (STF) model, whose impulse response represents the underlying unit hydrograph behaviour. This STF model is the discrete-time equivalent of a continuous-time differential-delay equation and it could be identified in this form, if required (e.g. Young, 2005).

In the estimation stage of the DBM modelling, the non-parametric SDP nonlinearity is parameterised in a parsimonious (parametrically efficient) manner. Normally, this parameterisation is in terms of a power law or an exponential function, although this depends upon the results of the non-parametric identification, as discussed in the later example. The approach used in this paper has been developed in the previously published DBM modeling of rainfall-flow processes. Here, the STF model and the power law parameters are estimated simultaneously using a special, nonlinear least squares optimisation procedure that exploits the Refined Instrumental Variable (RIV) transfer function estimation algorithm in the CAPTAIN Toolbox for Matlab (see http://www.es.lancs.ac.uk/cres/captain/). Depending on the application of the DBM model, this simple optimisation procedure can be extended to handle a more sophisticated model of the additive noise process that includes allowance for both autocorrelation and heteroscedasticity (changing variance). In the case of the heteroscedasticity, the variance is normally considered as an SDP function of the flow, with higher variance operative at higher flows.

The physical interpretation of this estimated DBM model is normally straightforward. As pointed out above, the input SDP function can be considered as an effective rainfall nonlinearity, in which the SDP is dependent on the flow. Although unusual at first sight, this state dependency makes physical sense because the flow is acting simply as a surrogate for the soil moisture content in the catchment (catchment storage: see the previously cited references). Thus, when the flow is low, implying low soil moisture, the rainfall is multiplied by a small gain and the effective rainfall is attenuated; whilst at high flow and soil moisture, this gain and the associated effective rainfall are large, so that the rainfall has a much larger effect on the flow.

The effective rainfall is the input to the STF part of the model, which characterises the dominant catchment dynamics and defines the underlying catchment scale hydrograph. Typically, this 2nd order STF model can be decomposed into a parallel form that reveals the major identified flow pathways in the effective rainfall-flow system. These normally consist of a "quick" flow component, with a short residence time (time constant), that seems to account for the surface and near surface processes; and a "slow" component with a long residence time, that often can be associated with the replenishment of the groundwater storage and the consequent displacement of groundwater into the river channel. Of course, it must be emphasised that these components are unobserved (not measured directly) and so their identification is done via statistical inference based on the specific model form: in this case, a generic nonlinear differential equation, the specific structure of which is identified from the data with the minimum of a priori assumptions.
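As an illustration of this parallel decomposition, the hedged Python sketch below splits a discrete-time [2 2 0] transfer function with purely illustrative coefficients into two first-order "stores"; the residence times follow from the poles as T_i = -dt/ln(p_i) and the partition from the normalised steady-state gains:

```python
import numpy as np
from scipy import signal

# Minimal sketch (illustrative coefficients): decompose a discrete-time
# [2 2 0] stochastic transfer function into two parallel first-order
# "stores" and derive their residence times and flow partition.
dt = 1.0  # sampling interval [h]

# Hypothetical estimated STF:  (b0 + b1 z^-1) / (1 + a1 z^-1 + a2 z^-2)
b = [0.1314, -0.1296]
a = [1.0, -1.8585, 0.8598]

# Partial-fraction expansion:  sum_i  r_i / (1 - p_i z^-1)
r, p, _ = signal.residuez(b, a)
r, p = np.real(r), np.real(p)               # real poles expected here

residence_times = -dt / np.log(p)           # time constants of each store [h]
gains = r / (1.0 - p)                       # steady-state gain of each store
partition = gains / gains.sum()             # fraction of flow through each pathway

for Ti, fi in zip(residence_times, partition):
    print(f"residence time {Ti:6.1f} h, flow partition {100*fi:4.1f}%")
```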

The situation is quite different in the case of TOPMODEL (Beven and Kirkby, 1979), which has a specific model structure that is assumed a priori and so constrains the estimation of the component flows. Consequently, the flow decompositions of the DBM model and TOPMODEL are quite different. Moreover, in both cases, the interpretation of the decomposed flow components is subjective, depending not only on the model form but also on how the modeller views the unobserved flow components within the hydrological context.

Any model of a real system is only acceptable in technical terms if it can be well validated. The final validation stage in DBM modelling is similar to that used for TOPMODEL and most other hydrological modelling exercises. The model estimated on the calibration data set is applied, without re-estimation, to a new set (or sets) of rainfall-flow data from the same catchment. If this validation is successful, in the sense that the simulated output of the model matches the measured flow to within the uncertainty levels quantified during the model estimation (calibration), then the model is deemed "conditionally valid" and can be used in whatever capacity it is intended.

It must be emphasized again that DBM modelling is a statistical approach to model building and any specific DBM model will depend upon the objectives of the modelling exercise. For instance, the DBM model for the River Hodder, as considered in this paper, is intended primarily for use in data assimilation and flow forecasting (see Young, 1998, 2003), rather than simulation (see later). However, it does define the dominant dynamic characteristics of the catchment, as inferred from the data alone, and so these can be compared with those inferred by the alternative TOPMODEL. The latter model is obtained from the rainfall-flow data by a process of hypothesis and deduction (i.e. the hypothetico-deductive approach of Karl Popper), with the model evaluation assisted by the application of GSA and with the uncertainty estimation provided by the DBM analysis used as a benchmark within the GLUE calibration. In order to help clarify this combined approach, the calibration and validation analyses are carried out using the same data sets.

In combining and comparing the models and their associated modelling methodologies (see e.g. Young, 2002), the inductive and relatively objective procedure behind the DBM modelling implies that the sensitivity analysis in DBM is an implicit part of the statistical identification and estimation analysis. Indeed, sensitivity functions computed within the estimation process are influential in identifying the model structure and order. In particular, the process of model structure identification ensures that the model is identifiable and minimal: i.e. it is the simplest model, within the generic model class, that is able to adequately explain the data. As such, the identified DBM model provides a parametrically efficient (parsimonious) description of the data in which all the parameters, by definition, are important in sensitivity terms, thus negating the need for GSA.

On the other hand, the TOPMODEL structure represents one particular hypothesis about the nature of the hydrological processes involved in the transfer of rainfall into river flow and the model synthesis follows the hypothetico-deductive approach. In this case, the pre-defined model structure is evaluated critically by overt GSA, which is used to specify the most sensitive and important parameters. The subsequent exercises in calibration and validation, as performed here by GLUE, are carried out in the light of this GSA and the DBM results.

It is important to note that the DBM model developed in the present paper and TOPMODEL are quite different representations of the rainfall-flow dynamic behaviour. Although unusual in some ways, the DBM model quite closely resembles conventional, conceptual hydrological models that use a "bucket" analogy. The idea of an "effective rainfall" input is common in such models: indeed, the catchment storage model used for effective rainfall generation in the DBM simulation model of Young (2003) can be related to the conventional Antecedent Precipitation Index (API) approach. Moreover, the impulse response of its STF component is directly interpretable in terms of the unit hydrograph; while the parallel decomposition of this STF reveals first order dynamic elements that can be interpreted in conventional "bucket" and mass conservation terms.

TOPMODEL has quite different nonlinear hydrograph dynamics, being formulated as a nonlinear differential equation in which the outflow is calculated as an exponential function of the changing catchment average soil moisture deficit (see later, Eq. 9). As a result, TOPMODEL has a nonlinear recession behaviour that is dependent on the soil moisture deficit and so will change as the deficit changes. In contrast to this, the recession part of the underlying hydrograph that characterises the linear STF component of the DBM model is of a conventional form (in the case of a second order STF, a combination of two decaying exponential curves), because the only significant nonlinearity identified from the rainfall-flow data by the DBM analysis is connected with the effective rainfall transformation at the input to the model. Of course, another of TOPMODEL's differences, in relation to the specific DBM model used in the present paper, lies in its ability to exploit Digital Terrain Map (DTM) data and to estimate the spatial distribution of soil moisture in the catchment.

Finally, it seems to us that the use of the elements of relative objectivity inherent to the DBM approach provides a useful contribution within the GLUE context. GLUE introduces elements of meaningful subjectivity, so allowing the modeller to interact in the modelling process by constraining the model to have a specific form prior to calibration. This is, of course, both a strength and a weakness, and it is achieved by relaxing some elements of full Bayesian estimation. One qualifying element of our paper is that we provide an objective benchmark for uncertainty prediction, which, in conjunction with GSA, allows us to pre-calibrate priors for the GLUE calibration of TOPMODEL in order to eliminate dynamical features that have little effect on the model output and would be rejected by the DBM approach.

3 Global sensitivity analysis and model calibration

Since some aspects of sensitivity analysis are not broadly known, this section will provide an outline of the main topics in GSA that are important within the context of the present paper, i.e. with particular attention to the links with calibration. All the GSA methods mentioned here will be subsequently applied to the calibration exercise of the TOPMODEL. Any mathematical or computational model can be formalised as a mapping Y = f(X_1, ..., X_k), where the X_i are the uncertain input factors. A factor is anything that can be subject to some degree of uncertainty in the model. All the X_i's are treated as random variables characterised by specified distributions, implying that the output Y is also a random variable, with a probability distribution whose characterisation is the object of uncertainty analysis. Sensitivity analysis is "The study of how the uncertainty in the output of a model (numerical or otherwise) can be apportioned to different sources of uncertainty in the model input" (Saltelli et al., 2000). This definition is general enough to cover a variety of strategies for sensitivity analysis, while committing the strategy to some sort of quantitative partition of the output uncertainty (however this uncertainty is defined) into factors-related components.
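The uncertainty analysis side of this mapping can be sketched very simply: sample the X_i from their assumed distributions, run the model, and characterise the resulting distribution of Y. The Python fragment below does this for a purely illustrative toy function (none of the names or distributions come from the paper):

```python
import numpy as np

# Minimal sketch of uncertainty analysis for a toy mapping Y = f(X1, ..., Xk):
# the input factors are sampled from their assumed distributions and the
# resulting distribution of Y is characterised empirically.
rng = np.random.default_rng(1)

def f(x1, x2, x3):                       # illustrative model, not from the paper
    return x1 * np.exp(x2) + 0.5 * x3**2

n = 10_000
x1 = rng.uniform(0.8, 1.2, n)            # assumed factor distributions
x2 = rng.normal(0.0, 0.3, n)
x3 = rng.uniform(-1.0, 1.0, n)

y = f(x1, x2, x3)
print(f"E[Y] = {y.mean():.3f}, V[Y] = {y.var():.3f}, "
      f"95% range = [{np.percentile(y, 2.5):.3f}, {np.percentile(y, 97.5):.3f}]")
```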

In the history of GSA, hydrologists and environmental scientists have contributed one of the strategies today considered as good practice, named by its proponents "regionalised sensitivity analysis", RSA (Young et al., 1978, 1996; Hornberger and Spear, 1981; Spear et al., 1994; Young, 1999). Beside RSA, we would also like to outline, in this section, other strategies that have received acceptance amongst practitioners, together with a discussion of the "settings" for sensitivity analysis. This is because, as Saltelli et al. (2004) have argued, the effectiveness of a sensitivity analysis is greater if the purpose of the analysis is specified unambiguously beforehand. Over time, practitioners have identified cogent questions for sensitivity analysis. These questions define the setting, which in turn allows for the selection of the strategy. These settings and the associated methods are described succinctly in the next sub-sections, pointing out their role in the context of calibration.

3.1 Variance based methods

Variance-based sensitivity indices are the most popular measures of importance used in GSA. The two key measures are the main effect

V_i = V[E(Y | X_i)]    (1)

and the total effect

V_Ti = E[V(Y | X_{-i})]    (2)

where X_{-i} indicates the array of all input factors except X_i, and V and E denote the variance and expectation operators. All measures are usually normalised by the unconditional variance of Y, to obtain the sensitivity indices, scaled between 0 and 1:

S_i = V_i / V(Y)
S_Ti = V_Ti / V(Y)    (3)

The S_i spectrum is sufficient to characterise the entire sensitivity pattern of Y only for additive models, for which

Σ_{i=1}^{k} S_i = 1    (4)

Equation (4) tells us that all the variance of the model Y can be explained in terms of first order effects. Models for which (4) does not hold are termed non-additive. For non-additive models Σ_{i=1}^{k} S_i ≤ 1 and these models are characterised by the existence of interaction effects, leading to the most general variance decomposition scheme (Sobol', 1990),

Σ_i S_i + Σ_i Σ_{j>i} S_ij + Σ_i Σ_{j>i} Σ_{l>j} S_ijl + ... + S_12...k = 1    (5)

The complete decomposition (5) comprises an array of 2^k − 1 sensitivity terms, giving rise to the so-called "curse of dimensionality", since the expression and its decomposition is neither cheap to compute, nor does it provide a succinct and easily readable portrait of the model characteristics. In this context, total indices provide the major part of the information needed to complement the main effects, with only k additional indices to be estimated. Total indices measure the overall effect of input factors, including both main effects and all possible interactions with any other input factor. Returning to the additivity of models, when a model is additive, we will have S_i = S_Ti for all X_i's; while, in general, S_i ≤ S_Ti and the difference between main and total effects is due to all the interaction terms involving X_i. The main effects and total effects are also strictly linked to two sensitivity settings that are extremely relevant in the calibration context: factors prioritisation and factors fixing.

– Factors prioritisation (FP)

Assume that, in principle, the uncertain input factors X_i can be "discovered", i.e. determined or measured, so as to find their true value. One legitimate question is then: "which factor should one try to determine first in order to have the largest expected reduction in the variance of Y?" This defines the "factors prioritisation" setting. Saltelli and Tarantola (2002) have shown that the main effect provides the answer to the FP setting, so that ranking input factors according to the S_i values allows the analyst to guide research efforts that reduce the uncertainty in Y, by investing resources on the factor having the largest S_i.

– Factors fixing (FF)

Another aim of GSA is to simplify models. If a model is used systematically in a Monte Carlo framework, so that input uncertainties are systematically propagated into the output, it might be useful to ascertain which input factors can be fixed, anywhere in their range of variation, without sensibly affecting a specific output of interest, Y. This may be useful for simplifying a model in a larger sense, because we may then be able to condense entire sections of our models if all factors entering a section are non-influential. Saltelli and Tarantola (2002) also showed that the total effect provides the answer to the FF setting, and the ranking of input factors according to the S_Ti values allows us to restrict the research efforts by "fixing" the factors having null S_Ti's. It is useful to note here that the condition S_i = 0 alone is not sufficient for fixing factor X_i: this factor might be involved in interactions with other factors, so that, although its first order term is zero, there might be non-zero higher order terms.
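For concreteness, a minimal Monte Carlo sketch of estimating S_i and S_Ti is given below (Python), using standard Saltelli/Jansen-type estimators on an illustrative non-additive toy function; these are generic estimators and not necessarily the ones used later in the paper:

```python
import numpy as np

# Minimal sketch of Monte Carlo estimators for the main effect S_i and the
# total effect ST_i (Saltelli/Jansen-type estimators), for an illustrative
# non-additive test function.
rng = np.random.default_rng(2)

def model(x):                               # toy model with an interaction term
    return x[:, 0] + x[:, 1] ** 2 + x[:, 0] * x[:, 2]

k, n = 3, 8192
A = rng.uniform(-1, 1, (n, k))              # two independent input samples
B = rng.uniform(-1, 1, (n, k))
yA, yB = model(A), model(B)
var_y = np.var(np.concatenate([yA, yB]))

for i in range(k):
    ABi = A.copy()
    ABi[:, i] = B[:, i]                     # resample only factor i
    yABi = model(ABi)
    Si = np.mean(yB * (yABi - yA)) / var_y          # main effect (FP setting)
    STi = 0.5 * np.mean((yA - yABi) ** 2) / var_y   # total effect (FF setting)
    print(f"X{i+1}: S_i = {Si:5.2f}, ST_i = {STi:5.2f}")
```

The total cost of this scheme is N(k+2) model runs, which is the figure quoted for total-effect estimation in Sect. 3.3.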

Both the FP and the FF settings are extremely important in the calibration context: FP matches the need of highlighting the most clearly identifiable input factors driving the uncertainty of the model predictions and possibly reducing them; while FF matches the need to identify irrelevant compartments of the model that, subsequently, can be simplified. For example, when applied to the likelihood weights in a GLUE procedure, as in Ratto et al. (2001), input factors can be classified as:

– factors with high main effect: the analyst has to concentrate on these to reduce the prediction uncertainty (FP);

– factors with small total effect: such factors have a negligible effect on the model performance and can be fixed at a nominal value (FF);

– factors with small main effect but high total effect: here, such a situation flags an influence mainly through interaction, implying lack of identification.

The latter situation, when it characterises the entire spectrum of input factors, would flag a particularly badly defined calibration problem, whereby the analyst is not allowed to identify any prioritisation to reduce prediction uncertainty; nor to introduce any fixing to simplify the model structure. This usually corresponds to a highly over-parameterised, unidentifiable model.

3.2 Monte Carlo filtering and Regionalised Sensitivity Analysis

We now present an altogether different setting for sensitivity analysis, strictly related to calibration. We call this "Factors Mapping" and it relates to situations where we are especially concerned with particular points or portions of the distribution of the output Y. For example, we are interested in Y being above or below a given threshold: e.g. Y could be a dose of contaminant and we are interested in how much (how often) a threshold level for this dose is being exceeded; or Y could be a set of constraints, based on the information available on observed systems. The latter situation is typical in calibration. In these settings, we will naturally tend to partition the realisations of Y into "good" and "bad". This leads very naturally to Monte Carlo Filtering (MCF), where one runs a Monte Carlo experiment producing realisations of the output(s) of interest corresponding to different sampled points in the input factors space. Having done this, one "filters" the realisations of Y.

Regionalised Sensitivity Analysis (RSA, see Young et al., 1978, 1996; Hornberger and Spear, 1981; Spear et al., 1994; Young, 1999, and the references cited therein) is an MCF procedure that aims to identify which factors are most important in leading to realisations of Y that are either in the "behaviour" or "non-behaviour" regions. Note that, in this context, one is not interested in the variance of Y but in which factors produce realisations of Y in the specified zone. In the simplest cases, RSA can answer this question by examining, for each factor, the subsets corresponding to "behaviour" and "non-behaviour" realisations. It is intuitive that, if the two subsets are dissimilar from one another (as well as dissimilar, one would expect, from the initial marginal distribution of that factor), then that factor is influential. Standard statistical tests such as the Smirnov test are usually applied for this purpose.

The GLUE technique can be seen as an extension of the RSA methodology where, instead of the binary classification behaviour/non-behaviour, model realisations are weighted and ranked on a likelihood scale via conditioning on observations.
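A hedged sketch of this filtering step is given below (Python): realisations of a toy model are split into "behaviour" and "non-behaviour" sets by an arbitrary threshold, and the two-sample Smirnov (Kolmogorov–Smirnov) test is applied to each factor. Model, threshold and sample sizes are illustrative only:

```python
import numpy as np
from scipy import stats

# Minimal sketch of Monte Carlo Filtering / RSA: the two-sample Smirnov (KS)
# test compares the behavioural and non-behavioural subsets of each factor.
rng = np.random.default_rng(3)

def model(x):                                  # illustrative model only
    return x[:, 0] ** 2 + 0.1 * x[:, 1] + 0.01 * x[:, 2]

n, k = 5000, 3
X = rng.uniform(0, 1, (n, k))
y = model(X)

behavioural = y < np.median(y)                 # "filter" the realisations of Y
for i in range(k):
    res = stats.ks_2samp(X[behavioural, i], X[~behavioural, i])
    print(f"X{i+1}: KS distance = {res.statistic:.3f}, p-value = {res.pvalue:.3g}")
```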

3.3 Other methods

In general, the variance-based measures constitute good practice for tackling these settings. The main problem is computational cost. Estimating the sensitivity coefficients takes many model realisations. Accelerating the computation of sensitivity indices of all orders, or even simply of the (S_i, S_Ti) couple, is the most intensely researched topic in GSA. Recently, various authors presented efficient techniques for estimating main effects and low order effects up to the 3rd order, using a single Monte Carlo sample of size N (Li et al., 2002, 2006; Oakley and O'Hagan, 2004; Ratto et al., 2004, 2006). This makes the estimation of main effects very efficient (N = 250 to 1000). However, the estimation of total effects still requires a larger number of runs: N_tot = N(k+2), where k is the number of input factors and N is as large as 500–1000. So, it would be useful to have methods capable of providing approximate sensitivity information at lower sample sizes. One such simple method, the Elementary Effect Test, is to average the absolute value of derivatives over the space of factors. Elementary effects are defined as

EE_i^j = [Y^j(Δ_i^j) − Y^j] / Δ_i^j    (6)

with

Y^j(Δ_i^j) = Y(X_1^j, X_2^j, ..., X_{i-1}^j, X_i^j + Δ_i^j, X_{i+1}^j, ..., X_k^j),    Y^j = Y(X_1^j, X_2^j, ..., X_k^j)

where, for each factor X_i and selected grid points j, a modulus incremental ratio is computed. Then, for each factor, the EE_i^j computed at different grid points are averaged and the factors are ranked on the basis of:

EET_i = (1/r) Σ_{j=1}^{r} |EE_i^j|    (7)

where r is the sample size for each factor, usually of the order of 10, for an overall cost of r(k+1) << N_tot. EET_i is a useful measure as it is efficient numerically and it is very good for factor fixing: indeed, it is a good proxy for S_Ti. Moreover, EET is rather resilient against type II errors, i.e. if a factor is seen as non-influential by EET, it is unlikely to be seen as influential by any other measure (see Saltelli et al., 2004, 2005; Campolongo et al., 2006).

The Elementary Effect Test is a development and refinement of the Morris (1991) method. It departs from the original Morris idea in two ways (Campolongo et al., 2006):

– the sampling strategy of Morris is improved as follows: once the number of trajectories r has been chosen, we generate a number r* >> r of trajectories and then select, without any additional model evaluation, the subset of r trajectories that provides the best exploration of the input space, i.e. to avoid the oversampling of some levels and the undersampling of others that often occurs in the standard Morris procedure;

– the standard Morris procedure computes the mean (µ) and standard deviation (σ) of the elementary effects, while we only compute the mean of the absolute values of the elementary effects (EET). For screening purposes, this is actually the only measure needed, since this alone is able to identify negligible input factors (EET ≈ 0) while, in the standard Morris approach, one has to look at the bi-dimensional plot in the (µ, σ) plane and selectively screen the parameters lying towards the origin. Of course, µ and σ can be used to make some guesses about non-linearities or interaction effects, but this goes beyond the scope of the screening applied in the current paper.
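A minimal sketch of the basic EET computation is given below (Python), using a simple radial one-at-a-time design around random base points rather than the optimised trajectory selection just described; the test function and step size are illustrative only:

```python
import numpy as np

# Minimal sketch of the Elementary Effect Test using a simple radial
# one-at-a-time design; EET_i is the mean of the absolute elementary
# effects of factor i, at a cost of r*(k+1) model runs.
rng = np.random.default_rng(4)

def model(x):                                   # illustrative model only
    return np.sin(x[0]) + 5.0 * x[1] ** 2 + 0.05 * x[2]

k, r, delta = 3, 10, 0.1
EET = np.zeros(k)
for _ in range(r):
    x0 = rng.uniform(0, 1, k)                   # random base point X^j
    y0 = model(x0)
    for i in range(k):
        x1 = x0.copy()
        x1[i] += delta                          # perturb only factor X_i
        EET[i] += abs((model(x1) - y0) / delta) # |elementary effect|
EET /= r

for i, e in enumerate(EET):
    print(f"X{i+1}: EET = {e:.3f}")
```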

4 A practical example

The data set used in this example contains hourly data for rainfall, evaporation and flow observations for the River Hodder at Hodder Place, North West England, 1991. The River Hodder has a catchment area of 261 km² and it forms part of the larger River Ribble catchment area of 456 km². The Hodder rises in the Bowland Fells and drains a thinly populated, entirely rural catchment and, together with the River Calder, it joins the River Ribble just downstream of a flow gauging station at Henthorn. There is a reservoir in the catchment which alters the water regime on a seasonal basis.

Data from the Hodder catchment have been analysed before using DBM modelling: first, with the objective of adaptive forecasting, using the flow measurement as a surrogate for catchment storage (Young, 2002); and second, as a simulation model (Young, 2003), where the catchment storage is modelled as a first order dynamic system with rainfall as the only input. In both cases the data set used is quite short: it consists of 720 and 480 hourly samples, respectively, with the rainfall series based on the Thiessen polygon average of 3 tipping bucket rain gauges. Based on these data, the latter model, in particular, achieves excellent validation performance, with the model output explaining 94% of the observed flow, only marginally less than the 95% achieved in calibration.

In the present example, we have chosen a data set that is somewhat longer, with 1200 hourly samples, but the rainfall measurements are from a single rain gauge. Probably as a result of this and other inconsistencies in the data due to a reservoir operation (see later discussion in Sect. 4.1), the rainfall-flow data used for calibration and validation exhibit some problems from a modelling standpoint and, as we shall see, the validation performance of both the DBM model and TOPMODEL is considerably worse than that achieved in the previous DBM modelling studies. However, this has the virtue of allowing us to see how the modelling methodologies function when confronted with data deficiency and how our proposed combined approach, involving GSA, can provide useful results, even when using relatively poor data such as these, which are similar to those often experienced in practical hydrological applications.

Only the rainfall-flow data are used in the DBM modelling; while the TOPMODEL calibration uses the DTM and evaporation data, in addition to the rainfall-flow measurements. The January–March 1991 rainfall-flow data are used for the identification and estimation (calibration) of both models; while the second part of the data (October–December) is used for the validation. The catchment is predominantly covered by grassland and, during summer time, the flow is affected by abstractions from the reservoir situated in the catchment. Therefore, for simplicity and to make the model comparisons more transparent, only the Autumn–Winter months are used for the modelling. Of course, both models would require further evaluation, based on a much more comprehensive data set, before being used in practice.

4.1 DBM rainfall-flow model for Hodder catchment

Following the DBM modelling procedure outlined in Sect. 2, the first model structure identification stage of the modelling confirms previous research and suggests that the only significant SDP nonlinearity is associated with the rainfall input. This "effective rainfall" nonlinearity is identified from the data in the following form:

u_et = f(y_t) × r_t

where u_et denotes effective rainfall, r_t denotes rainfall and f(y_t) denotes an estimated non-parametric (graphical) SDP function in which, as pointed out in Sect. 2, the flow y_t is acting as a surrogate for the soil moisture content in the catchment (catchment storage).

The non-parametric estimate of the state-dependent function f(y_t) is plotted as the full line in Fig. 1, with the standard error bounds shown dotted. The shape of the graphical SDP function suggests that it can be parameterised as either a power law or an exponential growth relationship. However, in accordance with previous research (see above references), the more parsimonious power law relationship f(y_t) = g y_t^β is selected for the subsequent "parametric estimation" stage of the DBM modelling, where g is a scale factor selected to make the SDP coefficient f(y_t) = 1.0 at maximum flow.

Fig. 1. This graph shows the estimated non-parametric SDP relationship (full line) between measured flow (acting as surrogate for catchment storage, horizontal axis) and the effective rainfall parameter (vertical axis). The 95% confidence bands are shown by the dotted lines and the dash-dot line is a parametric estimate of the SDP relationship obtained in the later estimation stage of the modelling using a power law approximation.

The best identified linear STF model between the effective rainfall u_et = g y_t^β × r_t and the flow is denoted by the usual triad as [2 2 0]; i.e. a TF model with a 2nd order denominator, a 2nd order numerator and no advective time delay. This results in the following nonlinear DBM model between the rainfall r_t and flow y_t:

u_et = 10.258 y_t^{0.411} × r_t
x̂_t = [ 0.0045 / (1 − 0.9903 z^{-1}) + 0.1269 / (1 − 0.8682 z^{-1}) ] u_et
y_t = x̂_t + ξ_t;    var(ξ_t) = 0.042 x̂_t^2    (8)

where ξ_t represents the estimated noise on the relationship which, as shown, is modelled as a (state-dependent) heteroscedastic process with variance proportional to the square of the estimated flow, x̂_t^2, and z^{-1} is the backward shift operator: i.e. z^{-s} y_t = y_{t-s}.
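As a hedged illustration of how a model of the form (8) can be simulated, the Python sketch below forms the effective rainfall from the observed rainfall and flow and passes it through the two parallel first-order discrete-time pathways. The default coefficient values are taken from Eq. (8) as reconstructed above, but the function itself is only an illustrative reimplementation, not the CAPTAIN/RIV estimation code:

```python
import numpy as np

# Minimal sketch of simulating the DBM model (8): the observed flow y_t acts
# only as a surrogate for catchment storage in the effective rainfall gain,
# and the effective rainfall is passed through the two parallel first-order
# discrete-time transfer functions (slow and quick pathways).
def dbm_simulate(rain, flow_obs, g=10.258, beta=0.411,
                 b=(0.0045, 0.1269), a=(0.9903, 0.8682)):
    u_e = g * flow_obs ** beta * rain           # effective rainfall input
    x_slow = np.zeros_like(u_e)
    x_quick = np.zeros_like(u_e)
    for t in range(1, len(u_e)):
        # x_t = a * x_{t-1} + b * u_t  for each first-order store
        x_slow[t] = a[0] * x_slow[t - 1] + b[0] * u_e[t]
        x_quick[t] = a[1] * x_quick[t - 1] + b[1] * u_e[t]
    return x_slow + x_quick, x_slow, x_quick    # modelled flow and components
```

The returned components correspond to the slow and quick pathways whose sum is the modelled flow x̂_t of Eq. (8).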

Concerning the expression u_et = f(y_t) × r_t, it is worth noting that, in its present, purely surrogate role, the flow series is not being used as an additional input to the model (i.e. in addition to the rainfall) in the normal sense: rather, at each time point, it simply affects the value of the state-dependent gain coefficient f(y_t) that multiplies the rainfall in its conversion to effective rainfall. Therefore, no dynamic system is involved, as there would be if one used the modelled flow rather than the measured flow in this equation. The resulting effective rainfall occurs at the same time point as the real rainfall and just the level of rainfall at this time is changed. In other words, no flow feedback occurs at all in the model.

Fig. 2. The estimated slow (upper panel) and quick (lower panel) flow components of the DBM model output, both in m/h and plotted against time in hours. The most likely interpretation of these component flows is that the slow component represents the "groundwater" (baseflow) effects and the quick component represents the "surface and near-surface" process effects.

However, this exploitation of the flow measurement as a surrogate measure of soil moisture content in the catchment does restrict the present model's use to defining the dominant dynamic characteristics of the catchment, as well as for data assimilation and flow forecasting applications. It cannot be used for simulation, although an alternative DBM simulation model has been developed by introducing a parsimonious model for the catchment storage s_t (e.g. Young, 2003). This replaces the flow y_t in the above definition of u_et and, if required, a similar model could involve temperature and/or evapo-transpiration data, when these are available (see e.g. the rainfall-flow models developed by Whitehead and Young, 1975; Jakeman et al., 1990).

The noise ξ_t identified in DBM modelling represents that part of the data not explained by the model. It can be quantified in various ways: e.g. simply by its variance; by its variance in relation to that of the model output; by a heteroscedastic process, where the variance is modulated in some state-dependent manner, as in the present paper; by a coefficient of determination; by a stochastic model, if such a model is applicable; by its probability distribution; or by its spectral properties. Of course, it can be due to various reasons: measurement inaccuracies, limitations in the model as a representation of the data, the effects of unmeasured inputs, etc. Sometimes, if it has rational spectral density, this noise can be modelled stochastically, e.g. by an Auto-Regressive (AR) or an Auto-Regressive, Moving Average (ARMA) process, but this is not always applicable when dealing with real data (see later). Consequently, the "instrumental variable" identification and estimation methods used for DBM modelling are robust in the face of noise that does not satisfy such assumptions (see e.g. Young, 1984).

In order to reveal the most likely mechanistic interpretation of the model, the [2 2 0] STF relationship in the model (8) is decomposed by objective continued fraction expansion (calculus of residues) into a parallel form that reveals the major identified flow pathways in the rainfall-flow system: namely, a "slow" component whose flow response is shown in the upper panel of Fig. 2, with a residence time (time constant) equal to 103 h; and a "quick" flow component, with a residence time of 7.1 h, whose flow response is shown in the lower panel of Fig. 2. However, it should be noted that, while the quick residence time is estimated well, with narrow uncertainty bounds (95 percentile range between 6.9 h and 7.2 h), the slow residence time is poorly estimated, with a skew distribution (95 percentile range between 81 h and 136 h). This is due to the residence time being relatively large in relation to the observation interval of 1200 h, so that the information in the data on this slow mode of behaviour is rather low.

Table 1. Synthetic table of main calibration and validation results.

                                    Calibration R_T²   Validation R_T²
DBM                                      0.928              0.79
DBM (CT)                                 0.917              0.838
TOPMODEL (base)                          0.905              0.747
TOPMODEL (pre-calib.), case (a)          0.883              0.766
TOPMODEL (pre-calib.), case (b)          0.880              0.762
TOPMODEL (pre-calib.), case (c)          0.53               < 0

The partition of flow between these quick and slow components, respectively, is identified from the decomposed model, with about 68% of the total flow appearing to arrive quickly from surface and near-surface processes; while about 32% appears as the slow-flow component, probably due, at least in part, to the displacement of groundwater into the river channel (base flow). The decomposition and its associated partitioning are a natural mathematical decomposition of the estimated model which has a nice interpretation both in dynamic systems and in hydrological terms (see e.g. Young, 2005). It is a function of two mathematical properties of the model: the eigenvalues, which define the residence times of the "stores" in the parallel pathways; and the steady state gains of these stores, that define how much flow passes through each store¹. However, it must be emphasised that the uncertainty associated with the partitioning in this particular example is high because of the uncertainty in the estimation of the slow residence time.

It is clear from this analysis that the partitioning of the DBM model, as identified in this manner, is inferred from the data by the DBM modelling methodology and it is not imposed on the model in any way by the modeller, as in other approaches to rainfall-flow modelling. For example, the interested reader might compare the HyMOD model of Moradkhani et al. (2005) of the Leaf River, where the model structure is assumed by the modellers, with the DBM model of Young (2006), which is based on the same data set but where a similar but subtly different structure is inferred from the data.

¹ Another feedback decomposition is possible (identifiable) but this is rejected because it has no clear physical interpretation.

Fig. 3. Validation of the DBM model on the October–November 1991 period, with 79% of the output variation explained; dots denote the observations, the simulation is marked by a solid black line and the shaded area denotes the 95% confidence bands (flow in m/h against time in h).

In Table 1, the main results in calibration and validation for the DBM model and TOPMODEL are synthesised. The DBM model (8) has a Coefficient of Determination R_T² = 0.928 (i.e. 92.8% of the measured flow variance is explained by the simulated model output), which is a substantial improvement on the linear model (R_T² = 0.84), obtained with the addition of only the single power law parameter. Clearly, the use of the observed flow in the definition of effective rainfall feeds some information about the output flow into the linear STF (see earlier discussion). However, similar performance is obtained when the catchment storage s_t is modelled from rainfall data, without any use of flow data (Young, 2003). This confirms that it is a sensible identification of the input nonlinearity and its associated effective rainfall that allows the model to describe the observed system well, not the use of flow data per se. Finally, the estimated noise ξ_t is identified as a heteroscedastic, high order process.

Note that we have not carried out any Monte Carlo Simulation (MCS) analysis in connection with the model (8), in order to emphasise that this is not essential in DBM modelling: the 95% confidence bands, as shown later in Fig. 3, for example, are a function of both the estimated parametric uncertainty and the residual heteroscedastic noise ξ_t. Without the MCS analysis, the DBM modelling is very computationally efficient, normally involving only a very small computational cost (a few seconds of computer time). This contrasts with the TOPMODEL analysis, considered in the next section, where the computationally much more demanding Monte Carlo Simulation analysis is an essential aspect of the GLUE calibration and uncertainty analysis. Consequently, it is easy to use the DBM modelling for the role suggested in this paper.


Table 2. Parameter distributions applied in the MC calibration of TOPMODEL. For each parameter, the top line shows the ranges for the base calibration exercise, while modified ranges for the pre-calibration cases (a, b, c) are specified, when applicable.

Parameters      Distribution   Min.      Max.      Mean      Std
input           uniform        0.8       1.2       1         0.116
  case (c)      uniform        0.8       1         0.9       0.0577
SK0             uniform        10 000    40 000    25 000    8600
m               uniform        0.001     0.03      0.0157    0.008
  cases (a, b)  uniform        0.005     0.03      0.0175    0.0072
  case (c)      uniform        0.0025    0.003     0.00275   1.44e-4
LRZ             uniform        1.e-4     0.04      0.02      0.012
  case (c)      uniform        0.015     0.04      0.0275    0.0072
KS              log-uniform    1.e-7     1.e-3     1.08e-4   2.05e-4
  case (c)      log-uniform    1.e-4     1.e-3     3.9e-4    2.49e-4
δ               uniform        0.001     0.2       0.1       0.0575
  case (a)      uniform        0.01      0.2       0.105     0.0548
  case (b)      fix            0.15      0.15      -         -
  case (c)      uniform        0.1       0.2       0.15      0.0289

Quite often, however, MCS analysis is used as an adjunct to DBM modelling, in order to evaluate various additional aspects of the model: e.g. the uncertainty associated with the "derived parameters", such as the residence times and partitioning percentages of the decomposed model (see e.g. Young, 1999), as well as the effects of stochastic input noise. Typically, the uncertainty associated with the DBM flow response is mainly accounted for by the model output error, which in DBM modelling is considered as output "measurement noise" (see earlier).

The model validation results are shown in Fig. 3 and Table 1: here only 79.0% of the measured flow variation is explained by the DBM model. This is a little better than TOPMODEL, which explains 74.7% of the output variation (see later Sect. 4.2). Interestingly, the performance of the DBM model is improved if the continuous-time (CT) version of the model (see e.g. Young, 2005) is identified and estimated using the continuous-time version (RIVC) of the RIV algorithm in the CAPTAIN Toolbox. Although the calibration R_T² = 0.917 is then marginally less than for the discrete-time DBM model, the validation R_T² = 0.838 is improved.

The main difference between the models is that the DBM model validates well on the peak flows, while TOPMODEL does not do so well in this regard, although it has somewhat better performance than the DBM model at lower flows. Both of these results are not particularly good, however, in comparison with the previous DBM rainfall-flow modelling results based on Hodder data (see previous discussion), where 94% of the output variation is explained by the model during validation. They are also worse than other typical DBM modelling results (see the cited references on DBM modelling), where 85–95% of the flow is normally explained by the model during validation.

As pointed out previously, this poor performance is almost certainly due to data deficiencies in the form of poor, single gauge rainfall observations and other inconsistencies in the flow measurements arising from the reservoir operations. Note that, in this example, the high level of uncertainty of the slow flow component (see earlier comments) has a deleterious effect on the validation performance of the DBM model because the validation error is particularly sensitive to errors in this slow flow component. Also, it is interesting to note that, if the DBM model calibration is performed on the validation data set, only 84% of the flow variation is explained by a model with the same structure. Compared with this, the above validation figures (79.0% and 74.7%) are not as bad as they look at first sight. However, this reinforces our comments about the deficiencies in the data, particularly, it would appear, the validation data set.

Despite these limitations in the data, the modelling results are satisfactory for the illustrative purpose of the present paper (i.e. the presentation of two different approaches to rainfall-flow modelling and how they may be combined), since both models are subject to exactly the same data deficiency and the results reveal interesting comparative aspects of the models.

4.2 TOPMODEL calibration for Hodder catchment

In the present context, the main aim of the TOPMODEL (Beven and Kirkby, 1979) is the estimation of the uncertainty associated with the flow predictions for the Hodder catchment, using the same data as those used in the previous section. The choice of TOPMODEL is justified by its simple structure and its mechanistic interpretation: it has a total of 9 parameters but only 4 of these are considered in the calibration and the associated sensitivity analysis described here, in addition to the uncertainties pertaining to the input and output observations. TOPMODEL bases its calculations of the spatial patterns of the hydrological response on the pattern of a topographic index for the catchment, as derived from a Digital Terrain Model (DTM). We have chosen the SIMULINK version of TOPMODEL, described in Romanowicz (1997), because of its clear, modular structure. This model has already been applied in similar Monte Carlo settings by Romanowicz and Macdonald (2005). The saturated zone model is assumed to be nonlinear, with the outflow Q_b(t) calculated as an exponential function of a catchment average soil moisture deficit S, i.e.,

dS(t)/dt = Q_b(t) − Q_v(t),    Q_b(t) = Q_0 exp(−S(t)/m)    (9)

where Q_0 = SK0 exp(−λ) is the flow when S(t) = 0; Q_v(t) denotes the recharge to the saturated zone; SK0 is a soil transmissivity parameter; m is a parameter controlling the rate of decline in transmissivity with increasing soil moisture deficit; and λ is the mean value of the topographic index distribution in the catchment (Beven and Kirkby, 1979). Other parameters control the maximum storage available in the root zone (LRZ) and the rate of recharge to the saturated zone (KS). The calibration exercise includes the analysis of the uncertainties associated with the input and output observations (i.e., rainfall and flow), the model structure and its parameters.
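A hedged numerical sketch of the saturated-zone store of Eq. (9) is given below (Python, simple explicit Euler integration); the parameter values, initial deficit and recharge series are purely illustrative and do not correspond to the calibrated Hodder set-up:

```python
import numpy as np

# Minimal sketch of the saturated-zone store of Eq. (9), integrated with an
# explicit Euler scheme; parameter values and the recharge series Q_v are
# purely illustrative, not a calibrated TOPMODEL configuration.
def saturated_zone(qv, dt=1.0, SK0=25_000.0, lam=6.9, m=0.015, S0=0.05):
    Q0 = SK0 * np.exp(-lam)                    # flow when S(t) = 0
    S = np.empty(len(qv) + 1)
    Qb = np.empty(len(qv))
    S[0] = S0                                  # initial soil moisture deficit
    for t, qv_t in enumerate(qv):
        Qb[t] = Q0 * np.exp(-S[t] / m)         # saturated-zone outflow
        S[t + 1] = S[t] + dt * (Qb[t] - qv_t)  # deficit grows with outflow,
    return Qb, S[:-1]                          # shrinks with recharge
```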

We assume here that the rainfall observations are affected by a biased measurement, accounted for by a multiplicative noise ("input" in Table 2) of ±20% that is applied to the rainfall data at each Monte Carlo Simulation (MCS) run. Flow observation uncertainty is included in the choice of the error model structure, which is assumed to follow a fuzzy trapezoidal membership function (for a similar application see Pappenberger and Beven, 2004). The breakpoints of the trapezoid are given by the array [A, B, C, D] = [1−4δ, 1−δ, 1+δ, 1+4δ], where δ denotes the characteristic width of the trapezoid. The breakpoints for each time location are determined by multiplication, e.g. for the A breakpoint at time t, A_t = y_t(1−4δ), implying a heteroscedastic error structure. The width δ of the trapezoidal membership function is assumed to be uncertain as well, in a range between 0.1% and 20%. This will allow the sensitivity of the calibration to the assumptions about the error magnitude to be evaluated.

Parameter uncertainty is taken care of in the choice of the ranges for the parameters SK0, m, LRZ and KS, as required for the MCS analysis. The influence of model structure uncertainty may be accounted for via a random sample from different model structures (e.g., TOPMODEL with and without dynamic contributing areas). In this paper, we restrict the analysis to parametric and observational uncertainty.

To some extent, and in contrast to the DBM modelling, the specification of the above uncertainty on the input parameters, as required for the MCS analysis, is subjective and will depend upon the experience and prior knowledge of the analyst. Consequently, these specifications may well be adjusted following the investigation of the initial MCS results, as discussed later.

In this example, the MCS analysis involves 1024 runs of TOPMODEL, sampling parameters from the ranges specified in Table 2 using quasi-random LPTAU sequences (Sobol' et al., 1992). The sample size requirements of GLUE applications are mainly linked to the convergence of the cumulative distribution functions (Pappenberger et al., 2005; Saltelli et al., 2004, p. 153–155) and thus to the "success rate", i.e. the percentage of samples that provides a good fit. Given the relatively small number of parameters in this case, the sample size was sufficient for the current application. Moreover, we also kept the computational cost to a minimum to show that careful pre-calibration based on sensitivity analysis results allows for a reduction of the computational cost of GLUE exercises, by improving the "success rate" (see Sect. 4.2.1 below).
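As an illustration of this sampling step, the sketch below generates a quasi-random design with SciPy's Sobol' generator as a stand-in for the LPTAU routine; the bounds are indicative values read off Table 2 and Fig. 6, and the log-uniform treatment of KS is an assumption made here for clarity.

```python
import numpy as np
from scipy.stats import qmc

# Indicative prior ranges (cf. Table 2 and Fig. 6); the exact bounds and the
# log-uniform treatment of KS are assumptions made for this sketch.
names = ["input", "m", "SK0", "LRZ", "KS_log10", "delta"]
lower = [0.8, 0.001, 1.0e4, 0.001, -7.0, 0.001]
upper = [1.2, 0.030, 4.0e4, 0.040, -3.0, 0.200]

sampler = qmc.Sobol(d=len(names), scramble=False)  # quasi-random stand-in for LPTAU
u = sampler.random(1024)                           # 1024 points in the unit hypercube
theta = qmc.scale(u, lower, upper)
theta[:, 4] = 10.0 ** theta[:, 4]                  # back-transform log10(KS) to KS

# Each row of theta is one parameter set for a TOPMODEL run
for row in theta[:3]:
    print(dict(zip(["input", "m", "SK0", "LRZ", "KS", "delta"], np.round(row, 6))))
```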

In the GLUE approach, the predictive probability of the flow takes the form:

P(x̂_t < y | Ω) = Σ_{i: x̂_t < y | ϑ_i, Ω} f(ϑ_i | Ω)    (10)

where x̂_t are the simulated outputs and f(·) denotes the likelihood weight of the parameter sets Θ = {ϑ_1, ..., ϑ_n}, in which n is the number of MC samples, conditioned on the available observations Ω = {y_1, ..., y_T}.
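A minimal sketch of how Eq. (10) can be evaluated from the Monte Carlo sample is given below; the function names and the quantile routine used to derive, for example, 95% bands are illustrative assumptions, not the code used in the paper.

```python
import numpy as np

def glue_predictive_cdf(x_sim_t, weights, y_grid):
    """Weighted empirical CDF of the simulated flow at one time step, Eq. (10).

    x_sim_t : simulated flows at time t, one value per retained parameter set
    weights : GLUE likelihood weights f(theta_i | Omega)
    y_grid  : flow values y at which P(x_t < y | Omega) is evaluated
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                                  # normalise the weights
    x = np.asarray(x_sim_t, dtype=float)
    return np.array([w[x < y].sum() for y in y_grid])

def glue_quantile(x_sim_t, weights, p):
    """Quantile of the predictive distribution (e.g. p = 0.025, 0.975 for 95% bands)."""
    order = np.argsort(x_sim_t)
    cdf = np.cumsum(np.asarray(weights, dtype=float)[order])
    cdf /= cdf[-1]
    return np.asarray(x_sim_t)[order][np.searchsorted(cdf, p)]
```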

The next step is to investigate the weights. The trapezoidal membership function, at each discrete time t and for each element ϑ_i, i = 1, ..., n of the MC sample, takes the values:

f_t(ϑ_i | y_t) = 1                               if B_t < x̂_t < C_t
f_t(ϑ_i | y_t) = (x̂_t − A_t)/(B_t − A_t)         if A_t < x̂_t < B_t
f_t(ϑ_i | y_t) = (x̂_t − D_t)/(C_t − D_t)         if C_t < x̂_t < D_t
f_t(ϑ_i | y_t) = 0                               if x̂_t < A_t or x̂_t > D_t    (11)

where A_t = y_t(1−4δ_i), B_t = y_t(1−δ_i), C_t = y_t(1+δ_i) and D_t = y_t(1+4δ_i). The likelihood weight for each parameter set is, therefore:

f(ϑ_i | Ω) ∝ Σ_{t=1}^{T} f_t(ϑ_i | y_t)    (12)
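The following sketch implements the weighting scheme of Eqs. (11)–(12) for one simulated series. The rule that a simulation leaving the outer bounds y_t(1±4δ) at any time receives zero total weight follows the description in the text below; the vectorised NumPy formulation and the function name are our own assumptions about the implementation.

```python
import numpy as np

def glue_weight(sim, obs, delta):
    """Fuzzy trapezoidal likelihood weight of one simulation, Eqs. (11)-(12).

    sim   : simulated flow series x_hat_t for one parameter set
    obs   : observed flow series y_t
    delta : characteristic relative half-width of the trapezoid
    """
    a, b = obs * (1 - 4 * delta), obs * (1 - delta)
    c, d = obs * (1 + delta), obs * (1 + 4 * delta)
    f = np.zeros_like(sim, dtype=float)
    rising  = (sim > a) & (sim < b)
    plateau = (sim >= b) & (sim <= c)
    falling = (sim > c) & (sim < d)
    f[rising]  = (sim[rising] - a[rising]) / (b[rising] - a[rising])
    f[plateau] = 1.0
    f[falling] = (sim[falling] - d[falling]) / (c[falling] - d[falling])
    if np.any(f == 0.0):   # simulation leaves y_t(1 +/- 4*delta) at some time: reject
        return 0.0
    return f.sum()         # un-normalised weight, Eq. (12)
```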

In the present case study, the confidence bounds resulting from the GLUE methodology will depend on the width δ of the trapezoidal membership function. This function gives zero weight to simulations falling outside the range y_t(1±4δ) at any time (i.e. outside a relative error bound of 400δ%) and the maximum weight to simulations falling within the range y_t(1±δ) at any time (i.e. within a relative error bound of 100δ%).


Fig. 4. Validation of the TOPMODEL: 74.7% of output variation explained; dots denote the observations, simulations (posterior mean) are marked by a solid black line, the shaded area denotes the 95% confidence bands.

The values of δ used in the MCS (Table 2) range from a very strict rejection criterion (simulations are rejected if the relative error at any time is larger than ±0.4%) to a very mild one (simulations are given the maximum weight when the relative error at any time is smaller than ±20%, and are rejected only when they fall outside the ±80% bound). This initially wide range will undergo a revision in the second part of this example, based on the sensitivity analysis results and taking the DBM uncertainty estimation as a benchmark for adjusting (pre-calibrating) the priors in a less subjective way than this initial set-up. However, as discussed later, δ is not the only parameter strongly driving the uncertainty bounds, with m playing an important role as well.

The calibration flow predictions of TOPMODEL explain 90.5% of the observed flow variations (see Table 1), not significantly different from that obtained using the DBM approach (92.8%). Note that TOPMODEL utilises evaporation measurements in addition to the rainfall and flow measurements used in the DBM model. The model residuals are heteroscedastic and highly coloured, even more so than the DBM model residuals. On the validation data (Fig. 4), the model output explains 74.7% of the flow variations, a little worse than the DBM model (79.0%; see Fig. 3). On the other hand, looking at the results in more detail, two major differences appear:

1. The 95% uncertainty bound of the calibrated TOPMODEL at the peak flows is about twice the bound of the DBM model. The latter uncertainty bounds seem more reasonable, given the nature of the observations around the peak flows.

Fig. 5. Uncertainty estimation of the discharge from the saturated subsurface store (upper panel) and of the "surface runoff" (lower panel) of the TOPMODEL; simulations (posterior mean) are marked by a solid black line, the shaded area denotes the 95% confidence bands.

2. The partition of the output flow of the TOPMODEL into two components, deriving from rainfall on the dynamic saturated contributing areas (denoted here as "surface runoff") and from the discharge from a nonlinear saturated subsurface store, differs from the DBM's partitioning, as discussed previously, and is shown in Fig. 5. The saturated contributing area accounts for only 5% of the predicted discharge, with "quick" dynamical features merely adjusting the flow peaks, while the remainder is contributed by the nonlinear subsurface store.

4.2.1 Sensitivity analysis: is it possible to reduce the volatility of predictions?

As discussed in Sect. 3, sensitivity analysis can play a key role in model calibration. We have seen that the uncertainty from the calibrated TOPMODEL is much larger than for the DBM model (notice the difference in the vertical axis scales of the figures caused by this). Taking the DBM model as a relatively objective benchmark for uncertainty estimation, we would like to analyse under which conditions the volatility of predictions can reasonably be reduced. This is a typical example of the Factors Prioritisation setting, where one can identify strategies for the reduction of the uncertainties of the TOPMODEL by acting on the parameters having the highest main effects. To do so, we perform the sensitivity analysis for the likelihood weights (Ratto et al., 2001).
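For illustration, a crude binned estimator of the main effect index S_i = Var[E(y|x_i)]/Var(y) is sketched below as a simple stand-in for the State Dependent Regression estimator of Ratto et al. used for the published indices; the function name, the number of bins and the quantile-based binning are arbitrary choices made here.

```python
import numpy as np

def main_effect(x, y, n_bins=32):
    """Crude binned estimate of S_i = Var[E(y|x_i)] / Var(y).

    x : sampled values of one input factor (length n)
    y : corresponding scalar output (here, the GLUE likelihood weights)
    """
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.digitize(x, edges[1:-1])                   # bin index 0..n_bins-1
    filled = [k for k in range(n_bins) if np.any(idx == k)]
    cond_mean = np.array([y[idx == k].mean() for k in filled])
    prob = np.array([np.mean(idx == k) for k in filled])
    grand_mean = np.sum(prob * cond_mean)
    var_conditional = np.sum(prob * (cond_mean - grand_mean) ** 2)
    return var_conditional / np.var(y)
```

Applied column by column to the Monte Carlo sample, with the likelihood weights as the output, such an estimator reproduces the kind of ranking reported in Table 3, although the numerical values given there come from the more refined SDR estimator.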

In Table 3, first column, we show the main effect indices, computed on the same 1024 runs used for calibration and applying the State Dependent Regression method by Ratto et al. (2004, 2006); in Table 3, second column, we show the Elementary Effect Test (EET), computed using only an additional 168 model runs. The main effects tell us that the uncertainty in the width of the trapezoidal membership function dominates, followed by the parameter m, which is the only physical parameter having a non-zero effect. Input rainfall noise also plays a significant role. Looking at the EET indices, which are cheap proxies of the total effects as discussed in Sect. 3, we can see that, among the other structural parameters, only KS plays a significant role through interactions, while SK0 and LRZ are not influential as far as the flow predictions are concerned (Factors Fixing setting). A detailed sensitivity analysis of each flow component suggests that KS, controlling the recharge from the root zone to the unsaturated zone, influences the model mostly in the periods of low flow (details of the SA are not shown here). In the periods of high flow, on the other hand, the model output displays a large sensitivity to small values of m: for m<0.005, flow peaks related to "surface runoff" can reach extremely large values with respect to the average flows predicted. This suggests a way of restricting the support of m, in order to reduce the prediction uncertainty.

Fig. 6. Scatter plots of the sample of likelihood weights versus the uncertain input factors.

In addition to the above, let us consider the scatter plots of the likelihood weights versus the input factors (Fig. 6). Looking at the plot for δ, we can see that the lower bound of the range seems to be rejected by the data: in fact, for values smaller than about 0.01, the width of the trapezoid is so narrow that most of the model runs falling in this region are rejected as being unlikely.

Based on these considerations, we conduct a second calibration step, shrinking the support of δ to the range [0.01, 0.2] and the support of m to [0.005, 0.03] (this is an example of the application of the Monte Carlo filtering technique).

Table 3. Sensitivity analysis of the TOPMODEL: the first two columns show the main effect sensitivity indices and the elementary effect tests for the initial calibration step. The last two columns show the main effects for the pre-calibrated cases (a) and (b) discussed in Sect. 4.2.1, aimed at reducing the volatility of predictions.

Parameter   Si       EETi     Si, case (a)   Si, case (b)
input       0.0842   0.3946   0.123          0.304
m           0.1823   0.5526   0.214          0.49
SK0         0.0004   0.0236   0.0002         0.0004
LRZ         0.0003   0.0204   0.0002         0.0021
KS          0.0140   0.1664   0.03           0.103
δ           0.5760   1.0000   0.53           –

Such modifications are quite in order since, as pointed out previously, the specification of the original input uncertainties is rather subjective. Moreover, this revision of δ (measurement error) from an initially wide interval is one possible way to conduct a calibration of the measurement error (denoted as case (a) in Table 1 and Table 2); another is to pre-calibrate δ based on the DBM confidence bounds, which provide a statistically reliable estimate of the measurement error when no direct information on the magnitude of such an error is available. In the latter case, one can consider the DBM heteroscedastic noise variance estimate in (8), which implies a noise standard deviation of 0.205 times the output flow, i.e. a 95% bound of ±0.41 x̂_t.

With the fuzzy trapezoidal membership function used here, this roughly corresponds to fixing δ ≈ 0.15.
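A possible sketch of this pre-calibration step is shown below; the array layout (columns ordered as input, m, SK0, LRZ, KS, δ), the column indices and the function name are assumptions. In case (a) the shrunken priors are emulated here by Monte Carlo filtering of the original sample rather than by the fresh sampling run actually performed in the paper.

```python
import numpy as np

def filter_behavioural(theta, weights, m_col=1, d_col=5,
                       m_range=(0.005, 0.03), d_range=(0.01, 0.2)):
    """Monte Carlo filtering sketch for case (a): keep only the runs whose m and
    delta fall inside the revised supports, then renormalise the weights."""
    keep = ((theta[:, m_col] >= m_range[0]) & (theta[:, m_col] <= m_range[1]) &
            (theta[:, d_col] >= d_range[0]) & (theta[:, d_col] <= d_range[1]))
    w = weights[keep]
    return theta[keep], w / w.sum()

# Case (b): delta is no longer sampled but fixed from the DBM noise estimate,
# e.g. delta = 0.15, so that only the physical parameters remain uncertain.
```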


Fig. 7. Revised validation of the TOPMODEL, to reduce the volatility of predictions, case (a): 76.6% of output variation explained; dots denote the observations, simulations (posterior mean) are marked by a solid black line, the shaded area denotes the 95% confidence bands. Surface flow partition: 3.5% of the overall flow.

The results of the associated calibration and validation exercises are denoted as case (b) in Table 1 and Table 2.

Figure 7 shows the updated validation exercise for case (a): the portion of flow variance now explained by the newly calibrated model is 88.3% (Table 1), only slightly less than in the initial setting. The uncertainty bound is now comparable to the one obtained in the DBM analysis, i.e. we have managed to obtain less volatile model predictions. How significantly does this affect the fitting performance? In the present step, 10.4% of the data fall outside the 95% bound, whereas this was 4.7% for the previous case. This is exclusively due to a larger number of time periods where the model uncertainty bound is above the data, so the results are on the "safe" side. The partitioning is also hardly modified: "surface runoff" is now 4.3% of the overall flow. Cutting the high tails from the uncertainty distribution of surface runoff flows did not alter the partitioning too much, meaning that the high tails in the upper panel of Fig. 5 are not supported by the data. The validation step confirms the results (Table 1): 76.6% of the output variation is now explained by the model predictions, i.e. a slightly larger amount than before. Although this is not a statistically significant improvement, there is clearly no degradation in the results, and this is an indication of the validity of the approach proposed in this paper. The uncertainty bounds in this case are also similar to the DBM ones, with the drawback of a larger portion of data below the bounds in periods of small flow: 14.3% versus the 6.4% of the initial case.

The calibration/validation results for case (b) are almost identical to case (a) (see Table 1). This shows that using the DBM results for the GLUE calibration can allow for the calibration of the measurement error in a reasonable manner, thus reducing the number of uncertain parameters.

In both of these new calibration steps, the effect of the different uncertainty sources is different. For case (a), δ remains the most important parameter, but its sensitivity decreases, while the sensitivity of m increases. Moreover, KS also acquires a small main effect (see Table 3, third column). For case (b), on the other hand, we can see that fixing δ based on the DBM estimation allows other sensitivity patterns to be revealed more clearly and in a more balanced way, and balanced main effects are usually a sign of better identification. In particular, among the structural parameters, both m and KS have a significant effect on model performance. Lack of identification is still present for SK0 and LRZ, however, which can take virtually any value in their prescribed prior ranges without significantly affecting the model performance in predicting outflow (over-parameterisation or equifinality). This is an example of what is called, in sensitivity analysis, "fit for purpose". We actually know that both of these parameters should have an effect on flow, as they show threshold-like behaviour (Romanowicz and Beven, 2006). However, this behaviour has no significant impact on the fit to the observations for the current case study, implying the irrelevance of SK0 and LRZ for the purposes of outflow predictions. Of course, this is not a general result: it does not mean that there are not other "behavioural" classifications or other data sets that may highlight the role of SK0 and LRZ.

4.2.2 Sensitivity analysis: is other partitioning possible?

As pointed out previously, the interpretation of the partitioning and decomposed flows of rainfall-flow models is open to subjective judgement, and it is expected that the partitioning of the two models will be different because it is constrained in TOPMODEL by the assumptions built into its formulation, namely by the surface contributing areas. Consequently, TOPMODEL's decomposition may not be considered any less physically acceptable than the one suggested by the DBM model. However, the relatively objective nature of the DBM analysis and its association of the partitioning with the estimated steady state gains and residence times of the DBM model is persuasive and, we believe, provides a very reasonable decomposition (although this is likely to depend on the view of the hydrologist, and not all readers will agree). For instance, if we consider the observed flow response, there seems to be an appreciable base-flow component, and the DBM model estimate of the "slow-flow" component seems consistent with what one might expect in this regard. On the other hand, TOPMODEL's partitioning, based on predicted contributing areas, gives quite a different form of partitioning, with part of the fast response provided by the nonlinear subsurface store. Consequently, purely as an illustrative exercise, it might be instructive to use sensitivity analysis to investigate whether other partitions are possible within the uncertainty specifications of the TOPMODEL. This helps to further demonstrate how GSA can assist in model evaluation

Table 1. Synthetic table of main calibration and validation results.

                          Calibration R_T^2   Validation R_T^2
DBM                       0.928               0.79
DBM (CT)                  0.917               0.838
TOPMODEL (base)           0.905               0.747
TOPMODEL (pre-calib.)
  case (a)                0.883               0.766
  case (b)                0.880               0.762
  case (c)                0.53                < 0

Table 2. Parameter distributions applied in MC calibration of TOPMODEL.
