• Aucun résultat trouvé

Downscaling using Probabilistic Gaussian Copula Regression model.

N/A
N/A
Protected

Academic year: 2021

Partager "Downscaling using Probabilistic Gaussian Copula Regression model."

Copied!
2
0
0

Texte intégral

(1)

Poster template by ResearchPosters.co.za

Downscaling using Probabilistic Gaussian Copula Regression model

Mohamed Ali Ben Alaya*

1

, Fateh Chebana

1

& Taha Ouarda

2

1

INRS-ETE, University of Quebec,

2

Masdar Institute of science and technology

1. Introduction

Context:

Atmosphere–ocean general circulation models (AOGCMs) are useful to simulate large-scale climate evolutions. However, AOGCM data resolution is too coarse for regional and local climate studies. Downscaling techniques have been developed to refine AOGCM data and provide information at more relevant scales. Among a wide range of available approaches, regression-based methods are commonly used for downscaling AOGCM data.

Motivation:

When several variables are considered at multiple sites, regression models are employed to reproduce the observed climate characteristics at small scale, such as the temporal variability and the relationship between sites and variables.

 Limitations of traditional regression-based approaches:

The underestimation of the temporal variability and the poor representation of extreme events.

The assumption of normality of data.

The inconsistency between downscaled and observed relationships between sites and variables.

Objective

: Introducing a Probabilistic Gaussian Copula Regression (PGCR) model to address the limitations of traditional regression-based approaches in a downscaling perspective.

3. Methodology

6. Conclusion

Contact Information

A PGCR model is proposed for the downscaling of AOGCM predictors to multiple predictands at multi-sites simultaneously and to preserve relationships between sites and variables.

PGCR offers a simple way to reproduce relationships of multiple

variables as well as to introduce exogenous forcing covariates.

The modular structure of the proposed PGCR is mathematically rich, making it a valuable tool in hydrometeorology and climate research analyses where often non-normally distributed random variables, like precipitation, wind speed, cloud cover, humidity, are involved.

Mohamed Ali Ben Alaya, PhD student

490 rue de la couronne

G1K 9A9, Québec, Canada

Tel: 418 654 2430#4468

Email: mohammed_ali.ben_alaya@ete.inrs.ca

Probabilistic Regression

Gaussian Copula

The study area is located in Quebec (Canada), in latitudes between 45° N and 60°N and longitudes between 65°W and 75° W.

2. Data series and study area

(

)

[

]

= • • • … = •  1 l 1

( ') ( '), , ( ') : predictors values on a day t' from the validation period.

m: number of predictand variables.

, , : generated uniform random variables using the gaussian

copula. m x t x t x t v v v f

[

]

( ) | ( ) : conditional PDF of the k predictands on a day t'.

( ( )) : Conditional cumulative distribution function of the k predictands on a day t'. th t k k th tk k y t x t F y t

Example of PGCR results at cedars station during 1991.

Probabilistic Gaussian Copula Regression (PGCR)

The proposed PGCR model relies on a probabilistic regression framework to specify the marginal distribution for each downscaled variable at a given day through AOGCM predictors, and handles multivariate dependence between sites and variables using a Gaussian copula.

{

− −

}

Φ Φ 1 … Φ 1

1

A copula is a multivariate distribution whose marginals are uniformly distributed on the interval [0, C(w;C) = 1]. A Gaussian copula C ( ), , is define ( ); d as : where Φ q w w Cq 1 is the standard normal cumulative distribution function and Φ is

the cumulative distribution function for a multivariate normal vector w = (w ,…,w ) having zero mean and covariance matrix C.

q

q

- Predictands series Y: daily Tmax , Tmin and precipitation at 4 stations.

- Predictors X: 6 CGCM3 grid, for each grid point, 25 NCEP predictors are

provided (150 predictors). A PCA is performed and the first 40 components are retained as predictors.

- All data cover the period between Jan 1st 1961 and 31 Dec 31 2000.

4. Model Calibration & Testing

Calibration

• Data from 01-01-1961 to 31-12-1990.

• Conditional distribution choices for the probabilistic regression model:

- Tmax and Tmin: Normal distributions.

- Precipitation occurrences (Poc): Bernoulli distribution. - Precipitation amounts (Pam): Gamma distribution.

Validation

• Data from 01-01-1991 to 31-12-2000.

• Statistical criteria: RMSE, ME and the difference between observed and

modeled variances D.

• Scatter plot of cross-correlations.

• Comparaison with Multivariate Multiple Linear Regression (MMLR), and

Multivariate Multisite Statistical Downscaling Model (MMSDM).

5. Results

Quality assessment of the estimated series for GCQR, MMLR and MMSDM during the validation period (1991–2000) for the four weather stations. Criteria are ME, RMSE, and differences between observed and modeled variance D. For PGCR and MMSDM models criteria were calculated from the conditional mean.

Scatter plot of observed and modeled cross-predictand correlations. Correlation values of PGCR and MMSDM models are obtained using the mean of the correlation values calculated from 100 simulations.

Observed versus modeled differences of daily temperatures on wet days and dry days.

The distribution of each predictand at the observed sites is represented by an appropriate probability density function (PDF), and then a regression model with outputs are parameters of the PDF is employed.

Example: a conditional normally distributed response would have two outputs, one for the conditional mean and one for the conditional variance.

1( ') ( ') l x t x t  1 ' 1 ' ( ')| ( ') ( ')| ( ') t m mt f y t t f y t t             x x  1 ' [ , , )] 1 ( ' k t k v t k m F− =  1 ˆ ( ') ˆ ( ')m y t y tProbabilistic

regression GaussianCopula

1 m v v

PGCR

50 100 150 200 250 300 350 0 20 40 60 Day P rec ipi tat ion ( m m ) PGCR Observed 50 100 150 200 250 300 350 -40 -20 0 20 40 Day T ma x (° C ) 95% confidence interval Observed PGCR 50 100 150 200 250 300 350 -40 -20 0 20 40 Day T mi n (° C ) 0.88 0.9 0.92 0.94 0.96 0.98 0.88 0.9 0.92 0.94 0.96 0.98 Tmax-Tmin Observed correlation C or rel at ion -0.05 0 0.05 0.1 0.15 -0.05 0 0.05 0.1 0.15 Tmax-Pam Observed correlation C or rel at ion -0.05 0 0.05 -0.05 0 0.05 Tmax-Poc Observed correlation C or rel at ion 0 0.05 0.1 0.15 0 0.05 0.1 0.15 Tmin-Pam Observed correlation C or rel at ion -0.05 0 0.05 0.1 -0.05 0 0.05 0.1 Tmin-Poc Observed correlation C or rel at ion 0.1 0.2 0.3 0.4 0.1 0.2 0.3 0.4 Pam-Poc Observed correlation C or rel at ion Observed MMLR PGCR MMSDM

Results indicate the superiority of the proposed PGCR over both

the multivariate multiple linear regression model and the multivariate multisite statistical downscaling model in term of the three statistical criteria.

In terms of reproducing spatial and inter-variable properties, both

PGCR and MMSDM models provide interesting results.

Traditional Regression

Probabilistic Regression

Conditional

mean &variance

(heteroscedastic) Conditional mean, Canstant variance (homoscedastic) -2 0 2 4 6 -2 0 2 4 6 Tmax

Observed diffrences on wet and dry days

E s ti m a te d 0 2 4 6 0 2 4 6 Tmin

Observed diffrences on wet and dry days

E s ti m a te d Observed MMLR PGCR MMSDM

(2)

Poster template by ResearchPosters.co.za

Downscaling using Probabilistic Gaussian Copula Regression model

Mohamed Ali Ben Alaya*

1

, Fateh Chebana

1

& Taha Ouarda

2

1

INRS-ETE, University of Quebec,

2

Masdar Institute of science and technology

1. Introduction

Context:

Atmosphere–ocean general circulation models (AOGCMs) are useful to simulate large-scale climate evolutions. However, AOGCM data resolution is too coarse for regional and local climate studies. Downscaling techniques have been developed to refine AOGCM data and provide information at more relevant scales. Among a wide range of available approaches, regression-based methods are commonly used for downscaling AOGCM data.

Motivation:

When several variables are considered at multiple sites, regression models are employed to reproduce the observed climate characteristics at small scale, such as the temporal variability and the relationship between sites and variables.

 Limitations of traditional regression-based approaches:

The underestimation of the temporal variability and the poor representation of extreme events.

The assumption of normality of data.

The inconsistency between downscaled and observed relationships between sites and variables.

Objective

: Introducing a Probabilistic Gaussian Copula Regression (PGCR) model to address the limitations of traditional regression-based approaches in a downscaling perspective.

3. Methodology

6. Conclusion

Contact Information

A PGCR model is proposed for the downscaling of AOGCM predictors to multiple predictands at multi-sites simultaneously and to preserve relationships between sites and variables.

PGCR offers a simple way to reproduce relationships of multiple

variables as well as to introduce exogenous forcing covariates.

The modular structure of the proposed PGCR is mathematically rich, making it a valuable tool in hydrometeorology and climate research analyses where often non-normally distributed random variables, like precipitation, wind speed, cloud cover, humidity, are involved.

Mohamed Ali Ben Alaya, PhD student

490 rue de la couronne

G1K 9A9, Québec, Canada

Tel: 418 654 2430#4468

Email: mohammed_ali.ben_alaya@ete.inrs.ca

Probabilistic Regression

Gaussian Copula

The study area is located in Quebec (Canada), in latitudes between 45° N and 60°N and longitudes between 65°W and 75° W.

2. Data series and study area

(

)

[

]

= • • • … = •  1 l 1

( ') ( '), , ( ') : predictors values on a day t' from the validation period.

m: number of predictand variables.

, , : generated uniform random variables using the gaussian

copula. m x t x t x t v v v f

[

]

( ) | ( ) : conditional PDF of the k predictands on a day t'.

( ( )) : Conditional cumulative distribution function of the k predictands on a day t'. th t k k th tk k y t x t F y t

Example of PGCR results at cedars station during 1991.

Probabilistic Gaussian Copula Regression (PGCR)

The proposed PGCR model relies on a probabilistic regression framework to specify the marginal distribution for each downscaled variable at a given day through AOGCM predictors, and handles multivariate dependence between sites and variables using a Gaussian copula.

{

− −

}

Φ Φ 1 … Φ 1

1

A copula is a multivariate distribution whose marginals are uniformly distributed on the interval [0, C(w;C) = 1]. A Gaussian copula C ( ), , is define ( ); d as : where Φ q w w Cq 1 is the standard normal cumulative distribution function and Φ is

the cumulative distribution function for a multivariate normal vector w = (w ,…,w ) having zero mean and covariance matrix C.

q

q

- Predictands series Y: daily Tmax , Tmin and precipitation at 4 stations.

- Predictors X: 6 CGCM3 grid, for each grid point, 25 NCEP predictors are

provided (150 predictors). A PCA is performed and the first 40 components are retained as predictors.

- All data cover the period between Jan 1st 1961 and 31 Dec 31 2000.

4. Model Calibration & Testing

Calibration

• Data from 01-01-1961 to 31-12-1990.

• Conditional distribution choices for the probabilistic regression model:

- Tmax and Tmin: Normal distributions.

- Precipitation occurrences (Poc): Bernoulli distribution. - Precipitation amounts (Pam): Gamma distribution.

Validation

• Data from 01-01-1991 to 31-12-2000.

• Statistical criteria: RMSE, ME and the difference between observed and

modeled variances D.

• Scatter plot of cross-correlations.

• Comparaison with Multivariate Multiple Linear Regression (MMLR), and

Multivariate Multisite Statistical Downscaling Model (MMSDM).

5. Results

Quality assessment of the estimated series for GCQR, MMLR and MMSDM during the validation period (1991–2000) for the four weather stations. Criteria are ME, RMSE, and differences between observed and modeled variance D. For PGCR and MMSDM models criteria were calculated from the conditional mean.

Scatter plot of observed and modeled cross-predictand correlations. Correlation values of PGCR and MMSDM models are obtained using the mean of the correlation values calculated from 100 simulations.

Observed versus modeled differences of daily temperatures on wet days and dry days.

The distribution of each predictand at the observed sites is represented by an appropriate probability density function (PDF), and then a regression model with outputs are parameters of the PDF is employed.

Example: a conditional normally distributed response would have two outputs, one for the conditional mean and one for the conditional variance.

1( ') ( ') l x t x t  1 ' 1 ' ( ')| ( ') ( ')| ( ') t m mt f y t t f y t t             x x  1 ' [ , , )] 1 ( ' k t k v t k m F− =  1 ˆ ( ') ˆ ( ')m y t y tProbabilistic

regression GaussianCopula

1 m v v

PGCR

50 100 150 200 250 300 350 0 20 40 60 Day P rec ipi tat ion ( m m ) PGCR Observed 50 100 150 200 250 300 350 -40 -20 0 20 40 Day T ma x (° C ) 95% confidence interval Observed PGCR 50 100 150 200 250 300 350 -40 -20 0 20 40 Day T mi n (° C ) 0.88 0.9 0.92 0.94 0.96 0.98 0.88 0.9 0.92 0.94 0.96 0.98 Tmax-Tmin Observed correlation C or rel at ion -0.05 0 0.05 0.1 0.15 -0.05 0 0.05 0.1 0.15 Tmax-Pam Observed correlation C or rel at ion -0.05 0 0.05 -0.05 0 0.05 Tmax-Poc Observed correlation C or rel at ion 0 0.05 0.1 0.15 0 0.05 0.1 0.15 Tmin-Pam Observed correlation C or rel at ion -0.05 0 0.05 0.1 -0.05 0 0.05 0.1 Tmin-Poc Observed correlation C or rel at ion 0.1 0.2 0.3 0.4 0.1 0.2 0.3 0.4 Pam-Poc Observed correlation C or rel at ion Observed MMLR PGCR MMSDM

Results indicate the superiority of the proposed PGCR over both

the multivariate multiple linear regression model and the multivariate multisite statistical downscaling model in term of the three statistical criteria.

In terms of reproducing spatial and inter-variable properties, both

PGCR and MMSDM models provide interesting results.

Traditional Regression

Probabilistic Regression

Conditional

mean &variance

(heteroscedastic) Conditional mean, Canstant variance (homoscedastic)

December 2014

-2 0 2 4 6 -2 0 2 4 6 Tmax

Observed diffrences on wet and dry days

E s ti m a te d 0 2 4 6 0 2 4 6 Tmin

Observed diffrences on wet and dry days

E s ti m a te d Observed MMLR PGCR MMSDM

Références

Documents relatifs

Abstract: We wish to estimate conditional density using Gaussian Mixture Regression model with logistic weights and means depending on the covariate.. We aim at selecting the number

We prove that the default times (or any of their minima) in the dynamic Gaussian copula model of Crépey, Jeanblanc, and Wu (2013) are invariance times in the sense of Crépey and

Using both the Wiener chaos expansion on the systemic economic factor and a Gaussian ap- proximation on the associated truncated loss, we significantly reduce the computational

Once the models are trained and applied to the testing set, we analyze the quality of the probabilistic forecasts using several qualitative and quantitative tools: reliability

Using two datasets of 45 days, it is shown that online GPR models based on quasiperiodic kernels outperform both the persistence model and GPR models based on simple kernels,

This procedure is applied to the problem of parametric estimation in a continuous time conditionally Gaussian regression model and to that of estimating the mean vector of

In Section 3 it is shown that the problem of estimating parameter θ in (1) can be reduced to that of estimating the mean in a conditionally Gaussian distribution with a

Our approach is close to the general model selection method pro- posed in Fourdrinier and Pergamenshchikov (2007) for discrete time models with spherically symmetric errors which