
eartH2Observe – Global Earth Observation for integrated water resource assessment

Report on uncertainty characterization of the WP5 WRR tier 2 products

Deliverable No: D4.4 – Report on uncertainty characterization of the WP5 WRR tier 2 products
Ref.: WP4 – Task 4.4

Date: November 2017


Deliverable Title: D4.4 – Report on uncertainty characterization of the WP5 WRR tier 2 products

Filename: E2O_D44_v2.1

Contributors: Simon Munier (Meteo-France), Marie Minvielle (Meteo-France), Bertrand Decharme (CNRS), Jean-Christophe Calvet (Meteo-France), Eleanor Blyth (CEH), Ted Veldkamp (VUA), Thymios Nikolopolous (ITC)

Reviewer: Emmanouil Anagnostou (ITC)

Signed-off: Frederiek Sperna Weiland (Deltares)

Date: 27/10/2017

Prepared under contract from the European Commission Grant Agreement No. 603608

Directorate-General for Research & Innovation (DG Research), Collaborative project, FP7-ENV-2013-two-stage

Start of the project: 01/01/2014

Duration: 48 months

Project coordinator: Stichting Deltares, NL

Dissemination level:
X  PU  Public
   PP  Restricted to other programme participants (including the Commission Services)
   RE  Restricted to a group specified by the consortium (including the Commission Services)
   CO  Confidential, only for members of the consortium (including the Commission Services)


Introduction

Water resources estimates depend on two things: the accuracy of the meteorological forcing data and the ability of the model to translate that forcing into the appropriate hydrological quantities. The eartH2Observe project has been concerned with quantifying the improvements that can be made to global and local water resource estimates when satellite products are used, either to improve the model or to improve the meteorological (mainly precipitation) forcing data.

However, there is an additional dimension to the problem. The accuracy of the meteorological data improves the further back in time you look, where more data, such as rain gauges, can be used. If a near-real-time estimate is needed, then the accuracy will depend on products that can be produced in near-real time, that is, weather-forecast model output and/or satellite products that are not rain-gauge corrected.

In this report we quantify the impact of using different satellite-based rainfall products on the water-resource metrics that have been developed in eartH2Observe.

Objective:

To quantify the error in water resource assessments from a multi-model ensemble driven by different precipitation forcing datasets, including four satellite rainfall products and a rain-gauge-corrected atmospheric reanalysis precipitation product developed in eartH2Observe.

Methods:

We have four global satellite products that have been combined with the non-rainfall (i.e. temperature, wind speed, radiation, etc.) meteorological forcing from MSWEP (Beck et al., 2016). The products are CMORPH, GSMAP, TRMM and PERSIANN. Their description is given in the Work Package 3 reports (D3.8).

The models used to create the Tier 2 water resources analysis were then run with these four new driving datasets and the outputs compared to the model run using MSWEP. The errors associated with the satellite products are considered to be typical of the errors or uncertainty we have in rainfall observation.

The assessment of uncertainty spans several aspects of the water resources and is covered in 4 chapters (figures and tables are numbered within each chapter):

1. Overview of the uncertainty in the seasonal and extreme water cycle
2. Characterise the uncertainty of the total water storage and snow depth
3. Characterise the uncertainty in the identification of global drought
4. Characterise the uncertainty when applied to a regional case study: the upper Blue Nile basin

Some of the studies include a subset of the models when key metrics, such as river routing, are unavailable in the dataset. It is likely that this can be expanded to use all the models with some adjustments to the model diagnostics.


Chapter 1. Overview of the uncertainty in the seasonal and extreme water cycle

The uncertainty in the precipitation forcing itself is illustrated first, expressed as the spread between the satellite products in the frequency of precipitation above the baseline 90% threshold.

Figure 1.1. Uncertainty in the precipitation inputs to the E2O ensemble models, expressed as the standard deviation of the annual frequency of precipitation being in the highest 10% of the baseline (global) distribution of values.

It is interesting to note the increased error in the mountainous regions of the Andes as well as the high rainfall areas of the African Tropical Rainforest.

Seven models have been run with the satellite products, and their error in reproducing the mean water budget is shown in Figure 1.2. This has also been reported in Work Package 5 (D5.4) and in Deliverable 4.3.

Figure 1.2. Evaluation of 7 models run with the 4 satellite rainfall products as well as the baseline (MSWEP) product.


It can be seen here that for evaporation and runoff there is more uncertainty, or variation between the rainfall products, than for soil moisture and snow. It is also clear that the MSWEP product is the highest performer among the products. However, the rainfall products are not very different, and especially for soil moisture (useful, for example, in agricultural applications) the results are good.

It is interesting to note how the models translate extreme precipitation. In the following analysis, the standard deviation of the percentage of time that the model runoff or evapotranspiration is above the 90% threshold set by the baseline run is shown. Each model interprets the meteorological forcing data slightly differently, especially with respect to extreme rainfall, which has not been used for any tuning. Figure 1.3 shows the results for runoff and Figure 1.4 for evapotranspiration.

Figure 1.3. Standard deviation of the monthly runoff (i.e. surface runoff + subsurface runoff (if it exists in the model) + drainage) being in the highest 10% of the baseline distribution of values. The models are in the following order: top left: JULES, top right: HTESSEL, middle left: ORCHIDEE, middle right: SURFEX, bottom left: WaterGAP, bottom right: average.


Figure 1.4. Standard deviation of the monthly evapotranspiration being in the highest 10% of the baseline distribution of values. The models are in the following order: top left: JULES, top right: HTESSEL, middle left: ORCHIDEE, middle right: SURFEX, bottom left: WaterGAP, bottom right: average.

A comparison of these plots suggests that the evapotranspiration is less sensitive to high-rainfall extremes than runoff.


Chapter 2. Characterize the uncertainty of the total water storage and snow depth

This chapter presents the comparison of model outputs from simulations conducted within the WP5 WRR tier 2. A document focused on the analysis of model simulations from WRR tier 1 is available online at https://www.earth-syst-sci-data.net/9/389/2017/essd-9-389-2017-supplement.pdf. Compared to WRR tier 1, WRR tier 2 is based on improved meteorological forcing data and some model improvements. In general, the comparison with observations (Total Water Storage and snow) shows that the WRR tier 2 analysis data perform better than WRR tier 1, especially over the Amazon basin or at high latitudes (e.g. snow). The main methodological difference in this new analysis is that some models added new variables that are now used to compute the Total Water Storage. Namely, the WaterGAP model from UnivK now simulates the water content in the soil column (TotMoist), and this model is now included in the comparison. On the other hand, model outputs from the LISFLOOD model (JRC) were not available when the comparison was performed, and this model has been excluded from this document.

2.1 Introduction

The aim of WP5 was to produce a multi-model ensemble-based global water resources reanalysis (WRR tier 1) in which state-of-the-art land surface models and global hydrological models are forced by the most accurate global meteorological forcing provided by atmospheric reanalysis. In a second step, a new set of improved meteorological forcings was used to produce a new multi-model global water resources reanalysis (WRR tier 2). Part of WP4 is the validation of the eartH2Observe water resources reanalyses, and one of the ways is through the evaluation of terrestrial water storage and snow depth. In this report we present, for these two variables, the validation of the WRR tier 1 and WRR tier 2 reanalyses (1979-2012) and compare the performances of the different WRR models. The report contains a description of the variables to be validated, a description of the observed datasets used for the validation, a presentation of the models which have provided the necessary variables and those finally retained, the results of the terrestrial water storage validation, and the results of the snow depth validation. For a complete description of the atmospheric forcing used in the project, the different institutes and models, please refer to deliverable D5.1 of the project. Also, please refer to deliverable D5.2 for a complete description of the improvements related to WRR tier 2 (meteorological forcing and model developments).

2.2 Observed datasets

2.2.1 Terrestrial water storage

Terrestrial water storage (TWS) consists of snow and ice, surface water, soil moisture and permafrost, groundwater and vegetation water content. The GRACE satellite mission provides time-variable gravity field solutions which allow a direct evaluation of TWS variations. GRACE data can be used to estimate TWS from the basin (Crowley et al. 2006; Seo et al. 2006) to the continental scale (Schmidt et al. 2003). The most recent release (RL05) of three GRACE gravity model products was used for the analysis, each one generated by a different institution: the Center for Space Research (CSR, at the University of Texas), the Jet Propulsion Laboratory (JPL) and the GeoForschungsZentrum (GFZ). Deriving month-to-month gravity field variations from GRACE observations requires a complex methodology and many parameter choices. As recommended, we used all three data centres' products and averaged them. For more details concerning GRACE data, please refer to http://grace.jpl.nasa.gov/data/. For this study, GRACE data from April 2002 to December 2014 were used, since 2014 marks the end of the WRR tier 2 reanalysis. During this period some months are not available, with the result that 146 months are considered.
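The averaging of the three data centres' solutions amounts to a simple grid-point mean; a minimal sketch in Python, with hypothetical array names and grid size:

```python
import numpy as np

# Hypothetical monthly TWS anomaly grids (cm) from the three processing centres,
# assumed to be on a common lat/lon grid for one month.
tws_csr = np.random.randn(180, 360)
tws_jpl = np.random.randn(180, 360)
tws_gfz = np.random.randn(180, 360)

# Ensemble mean of the three solutions, used here as the GRACE reference field.
tws_grace = np.nanmean(np.stack([tws_csr, tws_jpl, tws_gfz]), axis=0)
```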

2.2.2 Snow Depth

To perform the evaluation of snow depth in the WRR reanalyses, four different sources of daily data exist, covering different regions of the world. First, 600 stations over Russia from RIHMI-WDC (All-Russia Research Institute of Hydrometeorological Information - World Data Centre, described in Bulygina et al. 2014), with more than 20 years of year-round data, the first records starting in 1874. Over the USA, 355 stations above 30°N have been retrieved from the National Climatic Data Center (NCDC, ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/), with more than 40 years of data and 350 observations per year, the first records starting in 1889. Data from NCDC over Germany, the Netherlands, Norway and Sweden are also available (ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/): 448 stations with more than 40 years of data and 350 observations per year, the first records starting in 1887. We also used 719 quality-controlled stations of daily Canadian snow depth measurements from Ross Brown (Environment Canada, Brown and Braaten 1998) and NCDC: more than 20 years with 300 observations per year on average, first records starting in 1881 and last in 2003, with an extension beyond 2003 from NCDC. This ensemble of 2120 stations constitutes our base of available observations. As explained in Section 2.4, not all of these stations are included in our analysis.

Fig. 2.1 Snow depth stations.


2.3 Models

2.3.1 Terrestrial water storage

In this report, to distinguish the models, the acronyms corresponding to the institutes (first column of Table 2.1) will be used to refer to the model simulations from these institutes.

For comparison to GRACE data, the monthly TWS variations simulated by the models are calculated as anomalies of the sum of total soil moisture (TotMoist), surface water storage (SurfStor, including lakes, reservoirs, rivers, etc.), groundwater reservoir (GroundMoist), snow water equivalent (SWE), snow water equivalent intercepted by vegetation (SWEVeg) and total canopy water storage (CanopInt). Ideally, all these variables are needed to estimate the variations of terrestrial water storage as:

ΔTWS = ΔTotMoist + ΔSurfStor + ΔGroundMoist + ΔSWE + ΔSWEVeg + ΔCanopInt

However, a significant disparity exists between the different models taking part in EartH2Observe, and the required variables are not always output (see Table 2.1).

Table 2.1 Variables required for the calculation of ΔTWS and their availability for each model.

Consequently, depending on the model, the calculated ΔTWS varies and only includes 5, 4 or 3 of these variables. The simulated ΔTWS from the different models, compared below (Section 2.4), are consequently sums of different sets of variables. This variation does not create an absolute barrier to performing a global analysis of the WRR reanalysis and obtaining an overview of the ability of the models to reproduce the TWS variations. However, because the calculated TWS values for the models are not exactly comparable, it is not possible to make a detailed comparison and rank the models. For instance, we decided not to retain the simulations from ETH, because only SWE was available. Also, output variables from the LISFLOOD model were not available when the comparison was performed, so this model is not included in the following.
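The computation of ΔTWS from whichever storage terms a model outputs can be sketched as follows (variable names follow Table 2.1; the dictionary content and units are hypothetical):

```python
import numpy as np

def delta_tws(components):
    """Sum the available storage terms (mm) and return anomalies w.r.t. the period mean."""
    total = sum(components.values())      # e.g. TotMoist + SurfStor + GroundMoist + SWE + ...
    return total - total.mean()           # ΔTWS as an anomaly of the summed storage

# Hypothetical monthly outputs for one grid cell; a model missing SurfStor or
# GroundMoist simply omits those keys, so its ΔTWS is the sum of fewer terms.
n_months = 146
model_outputs = {
    "TotMoist": 400 + 50 * np.random.randn(n_months),
    "SWE": np.abs(80 * np.random.randn(n_months)),
    "CanopInt": np.abs(np.random.randn(n_months)),
}
dtws = delta_tws(model_outputs)
```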

Finally, as mentioned above, the GRACE TWS estimates were first filtered in order to remove noise and errors in the gravity field measurements, which can modify the signal by reducing the seasonal amplitude of the final TWS. To be consistent with the GRACE data, the simulated TWS were filtered in the same way.


altitudes of the station and the grid point are compared, and if the difference exceeds 100 m, the station is eliminated. Some time criteria are also introduced, and stations with many missing values cannot be used. For example, if more than 5 consecutive days are missing, the station cannot be used to determine the annual maximum snow depth. Finally, 1424 stations are used to obtain the results given in Table 2.6.

Table 2.2 Snow depth and its availability for each model.

2.4 Terrestrial Water Storage – RESULTS

To begin the validation of the observed and simulated ΔTWS, we first present the analysis at the global scale. The global average in this study was computed using a latitudinal weighting. Figure 2.2 a) shows the monthly global average of simulated and observed ΔTWS from 2002 to 2014. An important seasonal cycle exists and is represented, for the globally averaged ΔTWS, in Figure 2.2 b). A maximum of the globally averaged ΔTWS is observed (black line) during boreal spring and a minimum in autumn. The seasonal cycle is globally well reproduced by the different models. The seasonal cycle of the multi-model mean (grey line) presents a comparable amplitude, but there is a noticeable time lag of one month in the multi-model mean. All models have a tendency to simulate an early seasonal cycle, where the size of the lag varies with the model. It is clear from the 2002 to 2014 time series (Fig 2.2) that the seasonal cycle is correctly reproduced, and that the principal differences between GRACE and the models come from the low-frequency variability (the trend). In the GRACE product, the globally averaged ΔTWS tends to decrease. Some models (e.g. METFR) are able to simulate this negative trend, others (e.g. UNIVU) show a contrary positive trend. However, maps of trends have been drawn, and the trends displayed in Fig 2.2 are not representative of a global signal, but are mainly due to strong trends over a few localized grid points.
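The latitudinal weighting used for the global averages can be implemented with cosine-of-latitude weights; a minimal sketch (grid definition and field are hypothetical):

```python
import numpy as np

def global_mean(field, lats):
    """Latitude-weighted global mean of a (lat, lon) field, ignoring missing values."""
    weights = np.cos(np.deg2rad(lats))[:, None] * np.ones_like(field)
    weights = np.where(np.isnan(field), 0.0, weights)
    return np.nansum(field * weights) / weights.sum()

lats = np.linspace(-89.75, 89.75, 360)    # hypothetical 0.5 degree grid
field = np.random.randn(360, 720)         # e.g. one month of ΔTWS (cm)
print(global_mean(field, lats))
```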

Figure 2.3 shows the spatial distribution of the climatological ΔTWS simulated by the models and estimated by GRACE (first line) from 2002 to 2014, for DJF, MAM, JJA and SON respectively (left to right). For each season, the spatial correlation between the model and GRACE is indicated at the bottom left. The global spatial pattern of ΔTWS is well represented by the models, particularly in MAM and SON, as already shown in previous studies (Vergnes et al. 2012). Despite good model performance in terms of anomalies, some models significantly underestimate ΔTWS (NERC, CSIRO or CNRS). But it is important to remember that some models only have 3 of the 6 variables necessary to calculate a complete ΔTWS. It is concluded here, however, that the spatial and seasonal structures are generally acceptable.

Fig. 2.2 a) Global average of monthly ΔTWS since 2002, for the GRACE product (black line) and the models (coloured lines). b) Annual cycle of ΔTWS for the 2002-2014 period. Same colour code as a), except the grey line for the multi-model mean.


Fig. 2.3 Climatological comparison of the total TWS (cm) between GRACE (top line) and the models for (left to right) DJF, MAM, JJA and SON. For each model and each season, the spatial correlation with the GRACE TWS product is indicated at the bottom left of each map.

The zonal averages of the ΔTWS are given in Figure 2.4. In the left column the GRACE product is plotted (black line) together with all the models (colours), and in the right column the multi-model mean (red) is plotted together with the multi-model spread (pink). The zonal average of the multi-model ΔTWS gives a good estimate of the ΔTWS and is quite similar to the GRACE product, especially in MAM and SON, as noted previously. Although the multi-model mean corresponds closely to the GRACE product, the largest spread between models appears where the ΔTWS is the largest, for instance in the tropical regions.


Fig. 2.4 Left column: zonal average of GRACE TWS (black line) and all models (coloured lines) for (top to bottom) DJF, MAM, JJA and SON. Right column: GRACE TWS product in black, multi-model average in red, and multi-model spread in pink.

As seen in the previous figures, the spatial structures of ΔTWS and the values of the anomalies are not uniform and depend on the geographical region. That is why we have chosen to continue the analysis by defining and using 11 geographical boxes. These boxes are represented in Figure 2.5 and correspond to Northern America/Canada (CAN), Western North America (AMO), Eastern North America (AME), Northern Europe (EUN), Southern Europe (EUS), Siberia (SIB), Sahel (SAH), Asia (ASI), Amazon (AMA), Eastern South America (AMS) and Central Africa (AFR). All results presented hereafter refer to these boxes.


Fig. 2.5 Location of the 11 geographical boxes used for the analysis of TWS.

Figure 2.6 compares the annual cycles of the simulated ΔTWS with the GRACE estimates over the 11 boxes defined in Figure 2.5. For some regions (CAN, AMO, AMA, AMS, SIB, SAH), as previously discussed for the global mean, the simulated ΔTWS seasonal cycle is shifted and ahead of the GRACE estimate.

Fig. 2.6 TWS seasonal cycle over the 11 regions previously defined. Black line: GRACE product. Coloured lines: simulated TWS.


Fig. 2.7 Monthly mean ΔTWS anomalies (seasonal cycle removed) over the 11 regions previously defined, from 2002 to 2014. Black line: GRACE product. Coloured lines: simulated ΔTWS.

The full time series of monthly mean ΔTWS anomalies, observed and simulated, averaged over the 11 boxes, are represented in Figure 2.7. The correlations between the simulated ΔTWS and the GRACE product corresponding to Figure 2.7 are grouped in the following table (Table 2.3).

Finally, to complete the analysis of the simulated ΔTWS, we have also calculated the RMSE and the ratio of variance of the monthly anomalies (seasonal cycle removed) between the simulated ΔTWS and the GRACE product, for each model and over each box. The results are grouped together in Tables 2.4 and 2.5.
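The three scores can be computed from the box-averaged monthly series after removing the mean annual cycle; a sketch under the assumption of complete monthly series (the real GRACE record has gaps):

```python
import numpy as np

def deseasonalize(x, months):
    """Remove the mean annual cycle from a monthly series (months: 1..12 per sample)."""
    clim = np.array([x[months == m].mean() for m in range(1, 13)])
    return x - clim[months - 1]

def scores(sim, obs, months):
    """Correlation, RMSE and variance ratio of the monthly anomalies (cf. Tables 2.3-2.5)."""
    sim_a, obs_a = deseasonalize(sim, months), deseasonalize(obs, months)
    corr = np.corrcoef(sim_a, obs_a)[0, 1]
    rmse = np.sqrt(np.mean((sim_a - obs_a) ** 2))
    var_ratio = sim_a.var() / obs_a.var()   # >1: overestimated variance; <1: underestimated
    return corr, rmse, var_ratio

# Hypothetical box-averaged monthly ΔTWS series (cm)
months = np.tile(np.arange(1, 13), 13)[:146]
obs = 10 * np.sin(2 * np.pi * (months - 3) / 12) + np.random.randn(146)
sim = 8 * np.sin(2 * np.pi * (months - 2) / 12) + np.random.randn(146)
print(scores(sim, obs, months))
```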


Table 2.3 Correlation of the monthly simulated ΔTWS (annual cycle removed) and the GRACE product for each model and each geographical box.

Table 2.4 Same as Table 2.3 for RMSE.

Table 2.5 Same as Table 2.3 for the ratio of variance. A value greater than 1 (lower than 1) corresponds to an overestimation (underestimation) of the variance relative to the GRACE one.


For the correlation and RMSE (Table 2.3 and Table 2.4), the results are highly dependent on the considered region. The models show similar behaviours, with relatively high correlations over AMO, AME, AMA, EUN and SIB, lower correlations over CAN, AMS and EUS, and poor correlations over SAH, AFR and ASI. Obviously, in addition to these general considerations, the models are not equal, and some models have consistently better correlations in all regions than others.

Considering the RMSE table, the results are similar. RMSE is particularly high for some specific regions in every model. This is particularly true for AMA and AME, but also for AMS, SAH, AFR and ASI.

For the correlation and RMSE, we can conclude that some models show better results than others, but similarities exist and some regions seem to be more difficult to simulate than others, whatever model is considered.

The last table groups the results for the ratio of variance (Table 2.5). Contrary to correlation and RMSE, the main differences here are due to the model and not to the region. Some models tend to underestimate the ΔTWS variance (e.g., NERC and UNIVK) while others tend to overestimate it (e.g., UNIVU). It is important to remember that not all the models account for all the variables involved in the computation of ΔTWS. Finally, all the models largely underestimate ΔTWS over SAH, AFR and AMA, except UNIVU.

2.5 Snow Depth – RESULTS

Various statistical indices were computed in order to evaluate the performance of the different models. More than 1400 stations of daily snow depth have been used to calculate the bias, correlation and RMSE for the following variables: annual mean snow depth, number of days per year with snow on the ground, duration of the longest period with continuous snow on the ground, and first and last day of the year (counted from 1st August) with continuous snow. A day with snow on the ground is defined as a day with a snow depth higher than 1 cm. The following four figures show maps of bias for these variables, followed by the annual cycle of observed and simulated snow depth. Finally, a table groups the bias, correlation and RMSE for the four models.
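A sketch of how such indices can be derived from one station-year of daily snow depth, starting on 1 August (a simplification: the first and last snow day stand in for the boundaries of the continuous snow period; the data are synthetic):

```python
import numpy as np

SNOW_DAY_CM = 1.0                               # a day with snow: depth > 1 cm

def snow_indices(depth_cm):
    """Annual indices from a daily snow-depth series (day 0 = 1 August)."""
    snow = depth_cm > SNOW_DAY_CM
    runs, current = [], 0
    for day in snow:                            # longest run of consecutive snow days
        current = current + 1 if day else 0
        runs.append(current)
    return {
        "mean_depth_cm": float(depth_cm.mean()),
        "snow_days": int(snow.sum()),
        "longest_period_days": max(runs) if runs else 0,
        "first_snow_day": int(np.argmax(snow)) if snow.any() else None,
        "last_snow_day": int(len(snow) - 1 - np.argmax(snow[::-1])) if snow.any() else None,
    }

# Hypothetical one-year series (cm): no snow in autumn, accumulation in winter, bare ground after melt
depth = np.clip(np.concatenate([np.zeros(90), np.random.randn(180).cumsum() + 5, np.zeros(95)]), 0, None)
print(snow_indices(depth))
```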


Fig. 2.8 Bias (cm) between the simulated average snow depth and the observations over the 1979-2012 period. Bottom: climatological observed values (cm).

Figure 2.8 shows the annual mean snow depth for the stations used to validate the models (bottom), and the maps of bias for the four available models. In terms of the annual average, all the models tend to underestimate the snow depth, except at the highest latitudes of Siberia, where METFR, ECMWF and NERC show positive biases.

Figure 2.9 shows the number of days with continuous snow on the ground in the observations (bottom map), defined as days with at least 1 cm of snow on the ground, and the bias of the models (first four maps). METFR, CNRS and ECMWF overestimate this duration except over some regions (the highest latitudes for METFR, the central part of Northern America for ECMWF). NERC shows more discrepancies, with large positive biases at high latitudes and negative biases at mid-latitudes.


Fig. 2.9 Bias (days) of the duration of continuous snow between models and observations over the 1979-2012 period. Bottom: climatological observed values (days).

Figure 2.10 represents the average observed first day (counted from 1st August) with continuous snow on the ground (bottom map). The bias (in days) of the simulated date of this first day is represented for each model in the four top maps. All the models tend to simulate an earlier onset of the continuous snow period than observed. The biases shown in Figures 2.9 and 2.10 are clearly related, showing that longer continuous snow periods correspond to continuous snow periods starting earlier.


Fig. 2.10 Bias (days since 1st August) of the first day of the continuous snow period between models and observations over the 1979-2012 period. Bottom: climatological observed values (days).

Figure 2.11 is identical to Figure 2.10 but for the last day (counted from 1st August) of the continuous snow period. The bias (in days) of the simulated date of this last day is represented for each model in the four top maps. This time, the NERC experiment presents a strong positive bias, corresponding to a later snow cover, especially over high latitudes. The other models present small positive or negative biases.


Fig. 2.11 Bias (days since 1st August) of the last day of the continuous snow period between models and observations over the 1979-2012 period. Bottom: climatological observed values (days).


METFR, but particularly CNRS, show an annual mean snow depth that is too shallow.

Fig. 2.12 Annual cycle for observed snow depth, and simulated snow depth over the grid points corresponding to the stations.

As shown by the previous figures, the mean state biases are evenly distributed throughout the world.

We have consequently chosen to summarize the model performances in the following table (Table 2.6), which groups the global bias, correlation and RMSE for the annual mean snow depth, the number of days with snow in a year, and the duration, first day and last day of the continuous snow period.


Table 2.6 Statistical scores for the different models compared to observed snow depth stations.

Another point to consider for the validation of the simulated snow depth, not discussed previously, is the ability of the models to capture the interannual variability. In the analysis of the seasonal cycle, we showed that the results were similar at the global and regional scales. For this analysis of the interannual variability, we have chosen to examine different regions. These regions, or boxes, represented in Figure 2.13, have been defined based on the available data (see the station locations in Figure 2.1). We have worked with three boxes with a large number of stations: a box over Canada (CAN) with 236 stations, one over Russia (RUS) with 256 stations, and one over Eastern North America (AME) with 187 stations. The analysis is also performed for the global mean (GLO), considering all stations.

Figure 2.13 Location of the 3 geographical boxes used for the analysis of snow depth.

Figure 2.14 represents the annual mean anomalies of observed and simulated snow depth over these different boxes. By representing anomalies, we avoid the mean state bias previously mentioned. It appears that the interannual variability is very well captured by the models. Correlations range between 0.68 (model NERC, box RUS) and 0.96 (METFR and ECMWF, box AME).


Figure 2.14 Annual mean (starting 1st August) snow depth anomalies over the different regions: Canada (CAN), Eastern North America (AME), Russia (RUS) and global average (GLO), from 1979 to 2011. The result for 1979 corresponds to the mean from August 1979 to July 1980. In black, the observed stations; in colours, the models.

Figure 2.15 Taylor diagram for snow depth annual mean. Models are identified by colours, and numbers represent geographical boxes.


A Taylor diagram of the annual snow depth is represented in Figure 2.15. Taylor diagrams are used to compare model results to observations. A perfect model would be represented by a point located on REF (ratio of standardized deviations and correlation both equal to 1). Our four models are represented by different colours, and for each model the numbers 1, 2, 3 and 4 distinguish the four geographical areas (respectively CAN, RUS, AME and GLO). Although the correlations are good, we can see important discrepancies in the snow depth variance simulated by the different models. All the models underestimate the variance of the annual mean snow depth, with a mean ratio of 0.62 over CAN, 0.68 over RUS and 0.63 over AME. At the global scale (box GLO), the ratio is closer to 1 (0.92 on average).
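The two quantities read off the Taylor diagram can be computed directly from the annual series; a minimal sketch with hypothetical data:

```python
import numpy as np

def taylor_stats(sim, obs):
    """Correlation and ratio of standard deviations, as plotted on a Taylor diagram
    (the full diagram also uses the centred RMS difference)."""
    corr = np.corrcoef(sim, obs)[0, 1]
    std_ratio = sim.std() / obs.std()    # <1: the model underestimates the variability
    return corr, std_ratio

# Hypothetical annual mean snow depth over one box (cm), 1979-2011
obs = 30 + 5 * np.random.randn(33)
sim = obs.mean() + 0.8 * (obs - obs.mean()) + 2 * np.random.randn(33)
print(taylor_stats(sim, obs))
```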


Chapter 3. Characterise the uncertainty in the identification of global drought

3.1 Introduction

Different types of precipitation datasets have been developed over the past years, from satellite-based high-resolution precipitation datasets to reanalysis datasets that are a blend between observations and modelled data. In this chapter we evaluate the use of five different precipitation datasets in three global hydrological and/or land surface models. More specifically, we assess their performance in the identification and characterization of hydrological droughts at the global scale.

3.2 Methods

3.2.1 Data and models

Five precipitation datasets, CMORPH, GSMAP, TRMM, TRMMT and MSWEP, have been used in this research as forcing for three global models (jrc, univk, ecmwf) in order to obtain monthly river discharge estimates over the period 2001-2013, at a 0.25 degree resolution and with global coverage.

3.2.2 Observational data: GRDC

The repository of the Global Runoff Data Centre (GRDC, 56068 Koblenz, Germany) was used as the source of observational data. The GRDC repository consists of >9,000 river gauge stations in 160 countries, with an average record length of 42 years. For our analysis, we only selected stations that met the following criteria:

1) A minimum of 5-year coverage during the period 1971–2010 with a completeness of observations of ≥99%. To ensure a consistent comparison of modelled and observed data, we sampled the GHMs accordingly;

2) A minimum upstream catchment area of 800 km2 to omit catchments whose hydrological processes cannot be adequately represented by GHMs operating at 0.25° (Hunger and Döll, 2008).

The aforementioned selection criteria resulted in 1394 stations for the analysis, of which 602 are outlet stations (Figure 3.1). To match the GRDC stations with the correct grid cell in each of the models, we evaluated which of the 9 grid cells directly surrounding the coordinates of the respective GRDC station showed the lowest bias in long-term mean modelled discharge compared to the observed discharge. To avoid artificial variations in results across the different precipitation datasets, which may result from the selection of different grid cells for a particular GRDC station when different precipitation datasets are used, we used the MSWEP-based grid-cell selection as a reference for all other precipitation datasets.
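A sketch of the grid-cell matching step (array layout, station indices and units are hypothetical):

```python
import numpy as np

def select_grid_cell(modelled_ltm, obs_ltm, row, col):
    """Among the 3x3 cells around (row, col), return the one whose long-term mean
    modelled discharge has the lowest absolute bias against the observed mean."""
    best, best_bias = None, np.inf
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            bias = abs(modelled_ltm[row + dr, col + dc] - obs_ltm)
            if bias < best_bias:
                best, best_bias = (row + dr, col + dc), bias
    return best

ltm_grid = np.random.rand(720, 1440) * 1000     # hypothetical 0.25 degree grid of LTM discharge (m3/s)
print(select_grid_cell(ltm_grid, obs_ltm=350.0, row=400, col=600))
```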


3.2.3 Performance analysis

3.2.3.1 General hydrological performance

To evaluate the general hydrological performance of the different precipitation datasets and global models we used the Kling-Gupta Efficiency (KGE) index and its individual components: (a) the linear correlation coefficient (rKGE); (b) the bias ratio (βKGE); and (c) the variability ratio (γKGE) (Gupta et al., 2009; Kling et al., 2012). The KGE is a widely applied indicator for the validation of hydrological performance in modelling studies at the global and regional scale and is considered to provide a good representation of the "closeness" of simulated discharges to observations (Guse et al., 2017; Huang et al., 2017; Kuentz et al., 2013; Nicolle et al., 2014; Revilla-Romero et al., 2015; Thiemig et al., 2013, 2015; Thirel et al., 2015; Wöhling et al., 2013). Usage of its three sub-components, moreover, enables identification of the reasons for sub-optimal model performance (Gupta et al., 2009; Kling et al., 2012; Thiemig et al., 2013).
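For reference, the modified KGE of Kling et al. (2012) combines the three components as KGE = 1 − sqrt((r − 1)² + (β − 1)² + (γ − 1)²); a minimal sketch with hypothetical discharge series:

```python
import numpy as np

def kge(sim, obs):
    """Modified Kling-Gupta Efficiency and its components (Kling et al., 2012):
    r = correlation, beta = mean(sim)/mean(obs), gamma = CV(sim)/CV(obs)."""
    r = np.corrcoef(sim, obs)[0, 1]
    beta = sim.mean() / obs.mean()
    gamma = (sim.std() / sim.mean()) / (obs.std() / obs.mean())
    return 1.0 - np.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2), r, beta, gamma

# Hypothetical monthly discharge series (m3/s) at one GRDC station, 2001-2013
obs = 50 + 100 * np.abs(np.random.randn(156))
sim = 0.9 * obs + 20 * np.random.randn(156)
print(kge(sim, obs))
```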

3.2.3.2 Hydrological drought characterization and identification

Hydrological droughts have been identified in both the model- and observational datasets by applying a Q90 variable threshold level method (Van Loon et al., 2015) over the monthly discharges at each GRDC location. In doing so periods of hydrological drought have been characterized by means of their frequency and average duration. Subsequently, we applied the Critical Success Index (CSI) to evaluate the ability of the different precipitation datasets and models to correctly identify historical hydrological drought events. The CSI represents the ratio between false alarms (FA), missed hits (M), and correct hits (H), and is an often-used indicator to assess the ability of GHMs to detect extreme hydrological events (Halperin et al., 2016; Hao & AghaKouchak, 2014; Hoch et al., 2017; Schaefer, 1990; Thiemig et al.,2015). As such, the CSI allows for an evaluation of the value of hydrological data and models in policy making, adaptation planning, and disaster risk reduction activities targeted towards the management of freshwater resources and hydrological extremes.
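A sketch of the drought identification and the CSI, assuming a month-specific Q90 threshold (the flow exceeded 90% of the time, i.e. the 10th percentile) derived from the observations and applied to both series, as in Section 3.3.3; the data are hypothetical:

```python
import numpy as np

def q90_thresholds(discharge, months):
    """Variable threshold: for each calendar month, the flow exceeded 90% of the time."""
    return np.array([np.percentile(discharge[months == m], 10) for m in range(1, 13)])

def in_drought(discharge, months, thresholds):
    """Flag months whose discharge falls below that calendar month's threshold."""
    return discharge < thresholds[months - 1]

def csi(sim_drought, obs_drought):
    """Critical Success Index: hits / (hits + misses + false alarms)."""
    hits = np.sum(sim_drought & obs_drought)
    misses = np.sum(~sim_drought & obs_drought)
    false_alarms = np.sum(sim_drought & ~obs_drought)
    return hits / (hits + misses + false_alarms)

# Hypothetical monthly discharge at one station, 2001-2013
months = np.tile(np.arange(1, 13), 13)
obs = 50 + 100 * np.abs(np.random.randn(156))
sim = 0.9 * obs + 30 * np.random.randn(156)
thr = q90_thresholds(obs, months)
print(csi(in_drought(sim, months, thr), in_drought(obs, months, thr)))
```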

3.2.4 Presentation of results

Results on general hydrological performance and on hydrological drought characterization and identification have been summarized at the global scale using globally weighted mean values, with land area as the weighting factor. Moreover, the results were also split per Giorgi region. Apart from the global and regional weighted means, we also evaluated, at the global and regional scale, for how many GRDC stations the model performance (in general and with regard to drought characterization and identification) improved or decreased when using one of the satellite-based precipitation datasets in comparison to the MSWEP dataset.

For a selection of 16 stations we present more in-depth results. For these stations we evaluated in more detail the variation in performance across the precipitation datasets and models by showing their hydrographs, exceedance probability curves, and matrices of missed hits and false alarms.

These results are accompanied by tables summarizing the per-station results, both with respect to their general hydrological performance and with regard to the identification and characterization of hydrological droughts. Station-specific results for all other stations can be provided upon request.


Figure 3.1: Division of Giorgi regions and location of GRDC outlet measurement stations (n = 602).

3.3 Results

3.3.1 General hydrological performance

Figures 3.2a-d show, for each of the precipitation forcing datasets and global models, the weighted mean global KGE performance as well as its sub-parameters: the KGE bias ratio, the KGE variability ratio, and the correlation coefficient. When looking at the KGE bias ratio we find that, using the globally weighted means, CMORPH and GSMAP show an underestimation of the long-term mean monthly discharge for all models. TRMM and TRMMT, on the other hand, show overestimations using ecmwf and underestimations using jrc. For MSWEP the results are closest to zero for ecmwf and jrc; univk shows, at the global scale, a slight underestimation for the MSWEP forcing, especially in comparison with TRMM and TRMMT. The globally weighted mean variability ratio results hint at a significant overestimation of the hydrological variability for all models and under all precipitation forcing datasets. Univk consistently shows the lowest variability ratio compared to the other models. At the same time, GSMAP shows a significantly higher variability ratio compared to the other precipitation forcing datasets. Using the global means, we find that the correlation coefficients are reasonably high, especially for ecmwf and univk. Results are significantly lower for jrc. No significant differences were found when comparing the different precipitation forcing datasets, although TRMMT shows substantially lower results for jrc.

Figures 3.2e-f show for each precipitation forcing dataset the number of stations with the highest KGE or KGE sub-parameter values. For the overall KGE as well as for the KGE bias ratio and the KGE correlation coefficient, univk and jrc have the highest number of stations with the best performance when using the MSWEP precipitation forcing dataset. For ecmwf the highest performer is often TRMM, although the differences between TRMM and WFDEI are small. The KGE variability ratio shows mixed results: variability is best represented at most GRDC stations by univk and ecmwf when using the GSMAP precipitation forcing dataset, whilst for jrc the MSWEP precipitation forcing dataset yields the highest results.


Figure 3.2: Weighted mean global KGE performance and number of stations per precipitation forcing dataset and global model with the highest KGE performance.

Figure 3.3 shows the weighted mean regional KGE performance. When comparing the different regions we find significant differences in KGE performance, for example very low performance in SSA and WNA across all models and precipitation datasets, and generally high performance in AMZ, ENA, ALA, NEU, SAF and EAS. Regional differences in performance across the different models highlight that univk is the overall best performer in CAM, WNA, ENA, ALA, NEU and SAF, while results are mixed in the other regions. Only for a few regions were significant differences in performance found when comparing the precipitation forcing datasets used in the different models. GSMAP clearly underperforms in all models in WAF and SAF, whilst the TRMMT dataset leads to negative outliers in AMZ, CAM, CNA, ENA, ALA, SAF and EAS, especially in combination with the jrc model.


Figure 3.3: Weighted mean regional KGE performance per precipitation forcing dataset and global model using the Giorgi regions.

The hydrographs and KGE performance values for the 16 focus catchments (Figure 3.4, Table 3.1) further highlight the differences in results across precipitation forcing datasets and global models. When looking at the hydrographs we find, in general, that the largest differences in results stem from the choice of model rather than from the choice of precipitation forcing dataset. Moreover, the results indicate that there is no universally best-performing model or precipitation dataset. Different combinations of model and precipitation forcing dataset lead to the highest KGE results for different stations. We find that ecmwf is the best performer for 6 stations, univk for 7 stations, and jrc for 3 stations. On the other hand, when looking at the forcing datasets, we find that CMORPH produces the highest results for 5 stations, GSMAP for 7 stations, TRMM and TRMMT for 5 stations, and MSWEP for 6 stations.


Figure 3.4: Hydrographs of the 16 focus catchments showing the LTM monthly discharge per station per precipitation forcing dataset and global model.


Table 3.1: KGE values for the focus catchments per precipitation forcing dataset and global model.

Congo: 0.23 0.28 0.58 | -0.50 -0.30 -0.70 | 0.09 -0.01 0.35 | 0.09 0.27 -0.03 | 0.25 0.34 0.44
Volga: 0.52 0.37 -0.89 | 0.60 0.53 -0.34 | 0.51 0.62 -0.20 | 0.51 0.62 -0.22 | 0.51 0.62 -0.29
Ob: 0.35 0.64 0.15 | 0.37 0.64 0.003 | 0.42 0.62 0.18 | 0.42 0.57 -0.06 | 0.42 0.62 0.44
Rhine: 0.59 0.04 -0.28 | 0.76 0.21 0.05 | 0.76 0.53 0.25 | 0.76 0.16 -0.20 | 0.82 0.45 0.74
Oranje: -0.43 -1.56 -1.82 | -0.42 0.36 -4.16 | -0.39 -0.94 NaN | -0.39 -1.45 -2.03 | -0.37 0.35 -0.59
Yenisei: 0.60 0.57 -0.55 | 0.61 0.58 -0.51 | 0.61 0.57 -0.50 | 0.61 0.53 -0.37 | 0.61 0.56 0.34
Olenek: 0.53 0.50 -0.25 | 0.53 0.50 -0.26 | 0.53 0.50 -0.26 | 0.53 0.50 -0.32 | 0.53 0.51 0.44
St. Lawrence: 0.13 -2.27 -1.90 | 0.20 -3.22 -2.94 | 0.05 -2.04 -0.58 | 0.05 -2.71 -3.55 | 0.22 -2.79 -0.39
Colorado: -1.90 -1.02 -2.51 | -2.05 -1.28 -3.21 | -1.94 -1.17 -4.05 | -1.94 -1.62 -5.46 | -1.94 -1.37 -3.43
Fraser: 0.65 0.67 -0.22 | 0.65 0.58 -0.44 | 0.77 0.47 0.17 | 0.77 0.48 -0.15 | 0.77 0.48 0.62
Assiniboine: -0.30 -0.35 -0.17 | -0.20 0.14 -0.23 | -0.21 -0.01 -0.21 | -0.21 -0.23 -0.32 | -0.23 0.08 -0.16
Labe: 0.63 0.12 -0.15 | 0.84 0.31 0.17 | 0.81 0.56 0.29 | 0.81 0.41 -0.15 | 0.82 0.51 0.59
Neva: 0.24 -0.09 -0.64 | 0.32 -0.24 -0.49 | 0.35 -0.21 -0.40 | 0.35 -0.21 -0.25 | 0.35 -0.19 0.60
Don: 0.14 -0.20 -0.38 | -0.25 -0.52 -0.04 | -0.31 0.12 0.13 | -0.31 -0.07 -0.11 | -0.10 -0.04 0.09

3.3.2 Hydrological drought characterization

Subsequently, we evaluated the ability of the different global models, in combination with their precipitation forcing datasets, to characterize hydrological droughts. Hydrological droughts have been identified here using a Q90 variable threshold level method. Figures 3.5, 3.6 and 3.7 show the weighted mean global and regional drought frequency and average duration per precipitation forcing dataset and global model. Moreover, the figures show the weighted mean global and regional drought frequency and average duration as estimated from the GRDC observations.

All models and all precipitation forcing datasets overestimate both the globally weighted mean frequency and the average duration of Q90 drought events, as compared to the observations. When looking at the differences per global model we find an inverse relation between frequency and average duration. Whilst jrc shows, on average, the highest weighted mean average duration across all precipitation forcing datasets, it gives the lowest frequency. The opposite holds for ecmwf and univk: while these models indicate a relatively lower average duration, they hint at a relatively higher globally weighted mean frequency of drought events. Globally weighted mean results on average duration and frequency are relatively similar across the different precipitation forcing datasets when looking at univk and ecmwf. Larger differences were found, however, for jrc. Overall, we find that the differences between estimated frequency and average duration are larger when comparing models than when comparing forcing datasets. At the global scale, only for the jrc model are significant variations found when looking at the frequency and average duration estimates for the different precipitation forcing datasets.


When looking at the regional results we find that estimates of the average duration and frequency of drought events vary significantly across the global models used. In most regions this cross-model variation is larger than the variation in estimates of frequency and average duration when comparing the different precipitation forcing datasets. Regarding these precipitation forcing datasets and their performance in the weighted mean regional drought frequency, we find a clear outlier for TRMM when looking at AUS under the jrc and ecmwf models. For GSMAP we find outliers in SAF and SEA. For the regions ALA, NEU and EAS, all model-precipitation forcing combinations significantly overestimate the weighted mean drought frequency compared to estimates based on historical observations. In contrast, we do not find any regions with a distinct underestimation of drought frequency.

Drought duration, on the other hand, is significantly overestimated in WNA, SSA and MED. Underestimation across all model-precipitation forcing dataset combinations was found for CAM and NAS. When comparing the results across precipitation forcing datasets, we find outliers for TRMMT in AUS (ecmwf, jrc), for GSMAP in WAF and SAF (all global models), for WFDEI in TIB (jrc), and for CMORPH in NEU and NAS (jrc).


Figure 3.6: Weighted mean regional drought frequency per precipitation forcing dataset and global model.

Figure 3.7: Weighted mean regional average duration of drought events per precipitation forcing dataset and global model.


combinations swaps when moving from very low exceedance probability (peak-flows) to very high exceedance probability (low-flows).

Figure 3.8: Exceedance probability curves of monthly discharge in the 16 focus catchments


3.3.3 Hydrological drought identification

The ability of the different global models and precipitation forcing datasets to correctly identify hydrological droughts was evaluated by means of the Critical Success Index (CSI). This index relates the number of missed hits and false alarms to the number of correct hits. Figure 3.9 shows the weighted mean global CSI performance as well as the number of stations per precipitation forcing dataset and global model with the highest CSI performance. What can be seen from these figures is that, in general, TRMMT results in the lowest globally weighted mean CSI performance across the different models, although for univk the differences between TRMM and TRMMT are small. On the other hand, GSMAP and MSWEP result in the highest globally weighted mean CSI performance for univk and jrc, and for ecmwf, respectively.

If we do not take into account the relative size of the catchment areas that the different stations represent, but instead evaluate the number of stations with the highest CSI per precipitation forcing dataset, we find that for both jrc and univk, MSWEP gives the highest number of stations; CSI values are highest at these stations when using MSWEP as the forcing dataset. For ecmwf we find that TRMM leads to the best CSI results at most stations globally.

Figure 3.9: Weighted mean global CSI performance and number of stations per precipitation forcing dataset and global hydrological model with the highest performance.

When looking at the regional variation in weighted mean CSI performance across the different models and forcing datasets, we find that for most regions the variation in results due to model spread is larger than the variation due to the choice of precipitation forcing data. Figure 3.10 shows that this is the case in, for example, AMZ, CAM, ALA, GRL, NEU, WAF, EAS, TIB and NAS, although for almost all regions at least one model shows significant spread across the precipitation forcing datasets. The spread across the different precipitation forcing datasets is relatively large for all models in AUS, ENA, MED and SAF.


Figure 3.10: Weighted mean regional CSI performance per precipitation forcing dataset and global model using the Giorgi regions.

Finally, we evaluated for the 16 focus stations the number of missed hits and false alarms when using the observational drought threshold in the different model-forcing dataset combinations (Figure 3.11). In doing so, we see for most stations a significant overestimation of drought occurrences, although considerable differences exist between the different models. For example, for the Assiniboine river both univk and jrc show a clear overestimation of the historical drought events, resulting in a relatively high number of false alarms but also correct hits. Ecmwf, on the other hand, indicates a relative underestimation of drought events for this particular station, resulting in a relatively minor number of false alarms but a number of missed hits in periods when droughts occurred in the observational dataset. For only a few stations do we see a clear distinction in results when comparing the different precipitation forcing datasets. This is the case, for example, when looking at the use of GSMAP in comparison to the other precipitation forcing datasets in the ecmwf or jrc models in the Congo basin; the use of CMORPH in combination with univk in the Labe, Neva, Rhine, Volga or St. Lawrence catchments; or the use of MSWEP in combination with jrc in the Congo, Don, Fraser, Labe, Mackenzie, Neva, Ob, Rhine or Volga catchments.


Figure 3.11: Missed hits and false alarms per precipitation forcing dataset and global model for the 16 focus catchments. Per catchment the panel shows the number of missed hits and false alarms with respect to the identification of a Q90 hydrological drought as estimated with the variable threshold level method based on the observational data.

Chapter 4. Characterise the uncertainty when applied to a regional case study: the upper Blue Nile basin

4.1 Introduction

The growing availability of satellite observations and other sources has provided a unique opportunity to advance the understanding of terrestrial hydrologic processes in regions where in situ information is sparse or nonexistent. Africa is a continent where this aspect is particularly important, because it is generally characterized by sparse hydrologic observations, while at the same time there is a need to manage water resources efficiently to enhance water and food security. Ethiopia's hydrology plays a significant international role: the country hosts the headwaters of the Blue Nile Basin and contributes about 86% of the total annual flow of the Nile (Sutcliffe, 1999). Managing water resources in a sustainable manner requires, at the very least, an adequate characterization of hydrological fluxes (precipitation, streamflow, evapotranspiration and sub-surface flow) and states (soil and vegetation moisture, groundwater) at monthly, seasonal and annual scales.

Comparing models and datasets is an effective way to identify strengths and weaknesses of LSMs and meteorological forcings in different regions of the globe and at different scales. The main objective of this study is to identify the most suitable combination of Land Surface Model (LSM) and precipitation forcing to represent the hydrological cycle components of the Upper Blue Nile. To achieve this, we present a comprehensive evaluation of the water cycle components for the Upper Blue Nile basin in Ethiopia, estimated from four state-of-the-art water resources reanalysis (WRR) products associated with different LSMs, meteorological forcings and precipitation datasets.

Specifically, the evaluation is carried out for a dataset produced through NASA's Land Data Assimilation System (LDAS) at the regional (FLDAS) scale and for the latest version (tier 2) of the global WRR product of the EU eartH2Observe project. Each product includes a multi-model ensemble output. The final ensemble output, considering all products, incorporates the differences in forcing, model space/time resolutions and assimilation procedures used in each WRR product. The results of this analysis highlight the current strengths and limitations of the available WRR datasets for analyzing the hydrological cycle and dynamics of the East Africa region and provide unprecedented information for both developers and end users in similar hydroclimatic regimes.

4.2. Methodology

4.2.1 Study Area

The Blue Nile River is the largest tributary of the main Nile River, and its upper part is located entirely in Ethiopia (Figure 4.1). The upper Blue Nile River basin contributes about 60% of the annual streamflow of the main Nile River (Conway, 2005). Elevations range from ~4000 m in the Ethiopian highlands to ~500 m at the Ethiopia-Sudan border. Precipitation in the upper Blue Nile basin is highly seasonal and subject to a tropical highland monsoon, with the main rainy season taking place from June to September, a short rainy season from March to May, and a dry season during the rest of the year (Taye et al., 2015; Mellander et al., 2013).

Figure 4.1: Study Area

4.2.2 Evaluation

To identify the most accurate representation of the hydrological cycle components over the Upper Blue Nile basin, we conducted a comprehensive intercomparison and evaluation of eighteen different reanalysis products: two datasets produced through NASA's Land Data Assimilation System (LDAS) at the regional (FLDAS) scale and sixteen datasets from the latest version (tier 2) of the global WRR product of the EU eartH2Observe project. Each dataset uses a different LSM/hydrological model, precipitation forcing and spatial resolution (Table 4.1). There are two groups of reanalysis data: one includes the products with reanalysis precipitation forcing, and the other those with satellite-derived precipitation (TRMM and CMORPH) forcing. More specifically, precipitation, evapotranspiration (ET) and streamflow were evaluated against daily precipitation from rain gauge measurements, remote-sensing-derived actual ET (Zhang et al. 2016) and in-situ streamflow observations, respectively. The evaluation has been conducted on a monthly basis over two different spatial scales, the Eldiem basin (177642.9 km2) and the Kessie basin (50418.08 km2) (Figure 4.1). Figure 4.2 shows a schematic representation of the temporal extent of the evaluation for every variable over the two basins. The performance of the WRR products is assessed using the relative error (RE), calculated as:

RE = (SIM − OBS) / OBS
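A minimal sketch of the metric (the values are hypothetical):

```python
import numpy as np

def relative_error(sim, obs):
    """Relative error RE = (SIM - OBS) / OBS, applied element-wise to monthly values."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    return (sim - obs) / obs

# Hypothetical basin-averaged monthly precipitation (mm)
print(relative_error(sim=[120.0, 80.0, 210.0], obs=[100.0, 90.0, 200.0]))
```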


Table 4.1: WRR products under consideration

Figure 4.2: Schematic representation of the temporal extent for the evaluation


4.2.3 Water Budget Analysis

An improved understanding of the water balance in the Blue Nile is of critical importance because of increasingly frequent hydroclimatic extremes under a changing climate. A water budget analysis has been performed over the Eldiem basin from 2002 to 2012 at the monthly scale (we consider the hydrological year October to September). The main objectives of this analysis were to understand how different LSMs represent the annual variations in the water budget components (precipitation, ET, streamflow and terrestrial water storage (TWS)) and the overall hydrologic regime of the basin. To achieve this, the terrestrial water storage estimates were calculated using the water balance equation adapted for watersheds:

TWS(t) = ∫ (P(t) − ET(t) − Q(t)) dt

and were evaluated against the NASA Gravity Recovery and Climate Experiment (GRACE) product. The GRACE data are observations of surface mass changes with a spatial sampling of 0.5 degrees in both latitude and longitude (approximately 56 km at the equator).
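The watershed form of the balance can be sketched as a cumulative sum of the monthly flux differences, compared as anomalies against GRACE (the fluxes below are hypothetical):

```python
import numpy as np

def tws_from_fluxes(p, et, q):
    """TWS(t) as the cumulative sum of (P - ET - Q), returned as an anomaly series."""
    storage = np.cumsum(np.asarray(p) - np.asarray(et) - np.asarray(q))
    return storage - storage.mean()

# Hypothetical basin-averaged monthly fluxes (cm), one hydrological year (Oct-Sep)
p  = np.array([3, 2, 1, 1, 1, 2, 5, 8, 20, 28, 25, 14], dtype=float)
et = np.array([6, 5, 4, 4, 4, 5, 6, 7, 7, 7, 6, 6], dtype=float)
q  = np.array([5, 3, 2, 1, 1, 1, 1, 2, 4, 10, 14, 9], dtype=float)
print(tws_from_fluxes(p, et, q))
```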

4.3. Results

The spatial and temporal distribution of the precipitation field plays an important role in the water budget. Figures 4.3a and 4.3d show the distribution of the relative errors of the basin-averaged monthly precipitation from the Multi-Source Weighted-Ensemble Precipitation (MSWEP), Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2), Tropical Rainfall Measuring Mission (TRMM) and Climate Prediction Center Morphing Technique (CMORPH) datasets for the two basins. All products obtained lower relative errors over the large basin, because of the complex terrain of the Kessie basin: in situ data to characterize precipitation are usually scarce in complex terrain, and remote sensing precipitation estimates are generally characterized by significant biases there. Moreover, as shown in the figures mentioned above, MSWEP obtained the closest agreement with the observations over the two basins; MSWEP merges gauge-, satellite- and reanalysis-based precipitation data. In contrast, MERRA-2 and CMORPH achieved the highest relative errors over Eldiem and Kessie, respectively.

Figures 4.3b and 4.3e show the relative errors of the basin-averaged monthly ET from the WRR products under consideration. The relative errors of ET are slightly higher than those of precipitation.

Some models represent ET better over the large basin and others perform better over the small basin, but all of them (except one) underestimate the magnitude of ET. The LSMs that have been used to obtain these data have different schemes to estimate ET, which seem to affect the accuracy of the WRR data more than the precipitation forcing does. For the datasets produced with the same LSM but different precipitation forcing, the accuracy of the precipitation does seem to affect the accuracy of the ET estimates. VIC and HTESSEL-CaMa achieved the lowest relative errors of ET (in terms of median values) over the Eldiem and Kessie basins, respectively.

Figures 4.3c and 4.3f show the distribution of the relative errors of the basin-averaged monthly streamflow from all the products. The relative errors of the streamflow are much higher than those of ET and precipitation, which implies that the schemes the LSMs use to estimate streamflow affect the performance of the products more than the precipitation forcing does. Here, in contrast with ET, there is no correlation between the accuracy of the precipitation forcing and the streamflow estimates for the datasets produced with the same LSM but different precipitation forcing. All datasets are in


Figure 4.3: Boxplots of relative errors of a, d) precipitation, b, e) ET and c, f) streamflow at the monthly scale for products with reanalysis (a, b and c) and satellite-derived (d, e and f) precipitation forcing.

Figure 4.4 demonstrates the relative skill with which each WRR product represents precipitation, ET and streamflow. For precipitation, as mentioned above, all WRR products are in close agreement with the observations, both in terms of correlation and magnitude. Notable in this figure is the low correlation of ET for some models (WaterGAP3, ORCHIDEE and JULES), mainly over the small basin. WaterGAP3 and VIC showed the poorest ET performance in terms of RMSE and SD for both basins. JULES and HTESSEL-CaMa exhibited the best performance statistics (RMSE and SD) for the small and the large basin, respectively. The products with satellite-derived precipitation forcing seem to be in closer agreement with the observations than the ones with reanalysis precipitation forcing. In contrast with ET, the correlation of the streamflow estimates from all the WRR products is higher than 0.75, except for ORCHIDEE. Although WaterGAP3 exhibited the poorest performance for ET, the opposite is the case for streamflow: WaterGAP3, with all three different precipitation forcing datasets, showed the best performance among the models for both basins. For the Eldiem basin, VIC also seems to be in close agreement with the observations.



Figure 4.4: Taylor diagrams of (a, d) precipitation, (b, e) ET and (c, f) streamflow at the monthly scale for products with reanalysis (a, b and c) and satellite-derived (d, e and f) precipitation forcing.
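The statistics summarised in a Taylor diagram (standard deviations, Pearson correlation and centred RMSE) can be reproduced with a short helper such as the sketch below. This is our own illustration and may not match the normalisation used to draw Figure 4.4.

```python
import numpy as np

def taylor_stats(sim, obs):
    """Standard deviations, Pearson correlation and centred RMSE of a
    simulated series against observations (aligned 1-D arrays)."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    corr = np.corrcoef(sim, obs)[0, 1]
    # Centred RMSE: RMSE computed after removing each series' own mean
    crmse = np.sqrt(np.mean(((sim - sim.mean()) - (obs - obs.mean())) ** 2))
    return {"sd_sim": sim.std(), "sd_obs": obs.std(),
            "corr": corr, "crmse": crmse}
```

These quantities satisfy crmse^2 = sd_sim^2 + sd_obs^2 - 2 sd_sim sd_obs corr, which is the geometric relation the Taylor diagram is built on.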

Figure 4.5 shows the anomalies of precipitation, ET, streamflow and TWS in centimeters. These anomalies were calculated by subtracting, from each value, the average over the twelve years under consideration.


For some products, the monthly values of ET and streamflow are slightly shifted in time. There is strong variability between the TWS estimates of the WRR products, but an overall agreement in sign. The TWS values derived from the WRR products do not agree well with the observations in terms of magnitude during the first years; however, from 2006 onwards most of the models are in close agreement with the GRACE data. VIC and WaterGAP3 achieved the best representation of the observed TWS anomalies. The TWS anomalies appear to be slightly shifted in time relative to the observations. This might be because the TWS estimates derived from the models are driven primarily by precipitation. Moreover, this time shift of TWS might be related to the fact that, during the winter season, the models underestimate ET and produce high values of streamflow.


Figure 4.5: Water budget analysis. Annual values of precipitation, ET and streamflow, and anomalies of TWS estimates from products with (a) reanalysis and (b) satellite-derived precipitation forcing.
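A minimal sketch of the anomaly computation described above, i.e. removing the mean of the twelve-year study period from each value (our own illustration, not the report's processing code):

```python
import numpy as np

def period_anomaly(values):
    """Anomalies (same unit as the input, e.g. cm) obtained by removing
    the mean over the study period from each value."""
    values = np.asarray(values, dtype=float)
    return values - values.mean()

# Example: four annual TWS-like values (cm)
print(period_anomaly([12.0, 8.0, 10.0, 6.0]))   # -> [ 3. -1.  1. -3.]
```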


A global analysis of the simulated TWS has also been carried out to highlight the main strengths and biases of the models. It shows that the spatial and seasonal climatologies are correctly captured (Fig. 2.2a and 2.3), although the simulated seasonal cycle tends to be ahead of the observed one (Fig. 2.2b and Fig. 2.6). Biases in the representation of the variability of the monthly anomalies depend more on the region than on the model: over the Sahel, Asia or Africa, for instance, monthly correlations with the GRACE product are low for all models. The same is true for the RMSE.

The validation of the snow depth was carried out with the four models that provide this variable as output: METFR, ECMWF, NERC and CNRS. A multi-model approach is difficult with so few models and has not been considered here. Biases are spatially evenly distributed, and the annual mean snow depth is quite well simulated by all models. The seasonal cycle is in phase with the observed one, but is longer for ECMWF and NERC and shorter for METFR and CNRS. The interannual variability is generally well captured by the models, although it is underestimated by CNRS.

The final study focused on the evaluation of the hydrological cycle components over the Upper Blue Nile basin from a WRR perspective. We evaluated eighteen different reanalysis products, corresponding to different LSMs and precipitation forcing datasets. We estimated the TWS derived from the fluxes (P, ET and streamflow) of the reanalysis products and compared four of the most important water budget variables with observations at two spatial scales. The intercomparison and evaluation of these WRR datasets improved the understanding of the basin-scale water budget variables in the Upper Blue Nile basin.
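Deriving TWS from the fluxes follows the basin water budget, dTWS/dt = P - ET - Q. The sketch below shows one common way to perform this reconstruction (cumulative sum of the monthly flux imbalance, expressed as an anomaly so that it can be compared with GRACE). It is a hedged illustration under these assumptions, not the project's own implementation.

```python
import numpy as np

def tws_anomaly_from_fluxes(p, et, q):
    """TWS anomaly reconstructed from basin-averaged fluxes.

    p, et, q: aligned 1-D arrays of monthly precipitation, ET and streamflow,
    all expressed as equivalent water depth per month (e.g. cm/month).
    """
    p, et, q = (np.asarray(x, dtype=float) for x in (p, et, q))
    storage_change = p - et - q          # water budget residual per month
    tws = np.cumsum(storage_change)      # integrate the changes in time
    return tws - tws.mean()              # anomaly about the period mean, as for GRACE
```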

Among the four precipitation datasets evaluated, MSWEP provided the best representation of precipitation over the two basins (in terms of relative errors). In terms of ET, HTESSEL-CaMa (forced with the three different precipitation datasets) achieved the best representation at the monthly scale for the two basins. WaterGAP3 exhibited the best representation of streamflow over both the large and the small basin with all three precipitation datasets it was forced with. No single WRR product is in close agreement with the observations of both ET and streamflow, which might be attributed to the different schemes that the LSMs use to estimate the water budget variables.

Water budget analysis showed that for some products, ET anomalies are slightly shifted in time.

Variations in the magnitude of the ET and streamflow values seem to be driven mainly by differences in the schemes used by the various LSMs rather than by variations in the precipitation forcing. For TWS, there is strong variability between the estimates derived from the different WRR products, but an overall agreement in sign between the products. This analysis showed that changes in TWS seem to be driven primarily by precipitation: high annual precipitation values most of the time resulted in larger positive changes of TWS.


