Poster template by ResearchPosters.co.za
Downscaling using Probabilistic Gaussian Copula Regression model
Mohamed Ali Ben Alaya*
1
, Fateh Chebana
1
& Taha Ouarda
2
1
INRS-ETE, University of Quebec,
2
Masdar Institute of science and technology
1. Introduction
Context:
Atmosphere–ocean general circulation models (AOGCMs) are useful to simulate large-scale climate evolutions. However, AOGCM data resolution is too coarse for regional and local climate studies. Downscaling techniques have been developed to refine AOGCM data and provide information at more relevant scales. Among a wide range of available approaches, regression-based methods are commonly used for downscaling AOGCM data.Motivation:
When several variables are considered at multiple sites, regression models are employed to reproduce the observed climate characteristics at small scale, such as the temporal variability and the relationship between sites and variables. Limitations of traditional regression-based approaches:
The underestimation of the temporal variability and the poor representation of extreme events.
The assumption of normality of data.
The inconsistency between downscaled and observed relationships between sites and variables.Objective
: Introducing a Probabilistic Gaussian Copula Regression (PGCR) model to address the limitations of traditional regression-based approaches in a downscaling perspective.3. Methodology
6. Conclusion
Contact Information
A PGCR model is proposed for the downscaling of AOGCM predictors to multiple predictands at multi-sites simultaneously and to preserve relationships between sites and variables.
PGCR offers a simple way to reproduce relationships of multiplevariables as well as to introduce exogenous forcing covariates.
The modular structure of the proposed PGCR is mathematically rich, making it a valuable tool in hydrometeorology and climate research analyses where often non-normally distributed random variables, like precipitation, wind speed, cloud cover, humidity, are involved.Mohamed Ali Ben Alaya, PhD student
490 rue de la couronne
G1K 9A9, Québec, Canada
Tel: 418 654 2430#4468
Email: mohammed_ali.ben_alaya@ete.inrs.ca
Probabilistic Regression
Gaussian Copula
The study area is located in Quebec (Canada), in latitudes between 45° N and 60°N and longitudes between 65°W and 75° W.
2. Data series and study area
(
)
[
]
= • • • … = • 1 l 1( ') ( '), , ( ') : predictors values on a day t' from the validation period.
m: number of predictand variables.
, , : generated uniform random variables using the gaussian
copula. m x t x t x t v v v f
[
]
•( ) | ( ) : conditional PDF of the k predictands on a day t'.
( ( )) : Conditional cumulative distribution function of the k predictands on a day t'. th t k k th tk k y t x t F y t
Example of PGCR results at cedars station during 1991.
Probabilistic Gaussian Copula Regression (PGCR)
The proposed PGCR model relies on a probabilistic regression framework to specify the marginal distribution for each downscaled variable at a given day through AOGCM predictors, and handles multivariate dependence between sites and variables using a Gaussian copula.
{
− −}
Φ Φ 1 … Φ 1
1
A copula is a multivariate distribution whose marginals are uniformly distributed on the interval [0, C(w;C) = 1]. A Gaussian copula C ( ), , is define ( ); d as : where Φ q w w Cq 1 is the standard normal cumulative distribution function and Φ is
the cumulative distribution function for a multivariate normal vector w = (w ,…,w ) having zero mean and covariance matrix C.
q
q
- Predictands series Y: daily Tmax , Tmin and precipitation at 4 stations.
- Predictors X: 6 CGCM3 grid, for each grid point, 25 NCEP predictors are
provided (150 predictors). A PCA is performed and the first 40 components are retained as predictors.
- All data cover the period between Jan 1st 1961 and 31 Dec 31 2000.
4. Model Calibration & Testing
Calibration
• Data from 01-01-1961 to 31-12-1990.
• Conditional distribution choices for the probabilistic regression model:
- Tmax and Tmin: Normal distributions.
- Precipitation occurrences (Poc): Bernoulli distribution. - Precipitation amounts (Pam): Gamma distribution.
Validation
• Data from 01-01-1991 to 31-12-2000.
• Statistical criteria: RMSE, ME and the difference between observed and
modeled variances D.
• Scatter plot of cross-correlations.
• Comparaison with Multivariate Multiple Linear Regression (MMLR), and
Multivariate Multisite Statistical Downscaling Model (MMSDM).
5. Results
Quality assessment of the estimated series for GCQR, MMLR and MMSDM during the validation period (1991–2000) for the four weather stations. Criteria are ME, RMSE, and differences between observed and modeled variance D. For PGCR and MMSDM models criteria were calculated from the conditional mean.
Scatter plot of observed and modeled cross-predictand correlations. Correlation values of PGCR and MMSDM models are obtained using the mean of the correlation values calculated from 100 simulations.
Observed versus modeled differences of daily temperatures on wet days and dry days.
The distribution of each predictand at the observed sites is represented by an appropriate probability density function (PDF), and then a regression model with outputs are parameters of the PDF is employed.
Example: a conditional normally distributed response would have two outputs, one for the conditional mean and one for the conditional variance.
1( ') ( ') l x t x t 1 ' 1 ' ( ')| ( ') ( ')| ( ') t m mt f y t t f y t t x x 1 ' [ , , )] 1 ( ' k t k v t k m F− = 1 ˆ ( ') ˆ ( ')m y t y t Probabilistic
regression GaussianCopula
1 m v v
PGCR
50 100 150 200 250 300 350 0 20 40 60 Day P rec ipi tat ion ( m m ) PGCR Observed 50 100 150 200 250 300 350 -40 -20 0 20 40 Day T ma x (° C ) 95% confidence interval Observed PGCR 50 100 150 200 250 300 350 -40 -20 0 20 40 Day T mi n (° C ) 0.88 0.9 0.92 0.94 0.96 0.98 0.88 0.9 0.92 0.94 0.96 0.98 Tmax-Tmin Observed correlation C or rel at ion -0.05 0 0.05 0.1 0.15 -0.05 0 0.05 0.1 0.15 Tmax-Pam Observed correlation C or rel at ion -0.05 0 0.05 -0.05 0 0.05 Tmax-Poc Observed correlation C or rel at ion 0 0.05 0.1 0.15 0 0.05 0.1 0.15 Tmin-Pam Observed correlation C or rel at ion -0.05 0 0.05 0.1 -0.05 0 0.05 0.1 Tmin-Poc Observed correlation C or rel at ion 0.1 0.2 0.3 0.4 0.1 0.2 0.3 0.4 Pam-Poc Observed correlation C or rel at ion Observed MMLR PGCR MMSDM
Results indicate the superiority of the proposed PGCR over boththe multivariate multiple linear regression model and the multivariate multisite statistical downscaling model in term of the three statistical criteria.
In terms of reproducing spatial and inter-variable properties, bothPGCR and MMSDM models provide interesting results.
Traditional Regression
Probabilistic Regression
Conditional
mean &variance
(heteroscedastic) Conditional mean, Canstant variance (homoscedastic) -2 0 2 4 6 -2 0 2 4 6 Tmax
Observed diffrences on wet and dry days
E s ti m a te d 0 2 4 6 0 2 4 6 Tmin
Observed diffrences on wet and dry days
E s ti m a te d Observed MMLR PGCR MMSDM
Poster template by ResearchPosters.co.za
Downscaling using Probabilistic Gaussian Copula Regression model
Mohamed Ali Ben Alaya*
1
, Fateh Chebana
1
& Taha Ouarda
2
1
INRS-ETE, University of Quebec,
2
Masdar Institute of science and technology
1. Introduction
Context:
Atmosphere–ocean general circulation models (AOGCMs) are useful to simulate large-scale climate evolutions. However, AOGCM data resolution is too coarse for regional and local climate studies. Downscaling techniques have been developed to refine AOGCM data and provide information at more relevant scales. Among a wide range of available approaches, regression-based methods are commonly used for downscaling AOGCM data.Motivation:
When several variables are considered at multiple sites, regression models are employed to reproduce the observed climate characteristics at small scale, such as the temporal variability and the relationship between sites and variables. Limitations of traditional regression-based approaches:
The underestimation of the temporal variability and the poor representation of extreme events.
The assumption of normality of data.
The inconsistency between downscaled and observed relationships between sites and variables.Objective
: Introducing a Probabilistic Gaussian Copula Regression (PGCR) model to address the limitations of traditional regression-based approaches in a downscaling perspective.3. Methodology
6. Conclusion
Contact Information
A PGCR model is proposed for the downscaling of AOGCM predictors to multiple predictands at multi-sites simultaneously and to preserve relationships between sites and variables.
PGCR offers a simple way to reproduce relationships of multiplevariables as well as to introduce exogenous forcing covariates.
The modular structure of the proposed PGCR is mathematically rich, making it a valuable tool in hydrometeorology and climate research analyses where often non-normally distributed random variables, like precipitation, wind speed, cloud cover, humidity, are involved.Mohamed Ali Ben Alaya, PhD student
490 rue de la couronne
G1K 9A9, Québec, Canada
Tel: 418 654 2430#4468
Email: mohammed_ali.ben_alaya@ete.inrs.ca
Probabilistic Regression
Gaussian Copula
The study area is located in Quebec (Canada), in latitudes between 45° N and 60°N and longitudes between 65°W and 75° W.
2. Data series and study area
(
)
[
]
= • • • … = • 1 l 1( ') ( '), , ( ') : predictors values on a day t' from the validation period.
m: number of predictand variables.
, , : generated uniform random variables using the gaussian
copula. m x t x t x t v v v f
[
]
•( ) | ( ) : conditional PDF of the k predictands on a day t'.
( ( )) : Conditional cumulative distribution function of the k predictands on a day t'. th t k k th tk k y t x t F y t
Example of PGCR results at cedars station during 1991.
Probabilistic Gaussian Copula Regression (PGCR)
The proposed PGCR model relies on a probabilistic regression framework to specify the marginal distribution for each downscaled variable at a given day through AOGCM predictors, and handles multivariate dependence between sites and variables using a Gaussian copula.
{
− −}
Φ Φ 1 … Φ 1
1
A copula is a multivariate distribution whose marginals are uniformly distributed on the interval [0, C(w;C) = 1]. A Gaussian copula C ( ), , is define ( ); d as : where Φ q w w Cq 1 is the standard normal cumulative distribution function and Φ is
the cumulative distribution function for a multivariate normal vector w = (w ,…,w ) having zero mean and covariance matrix C.
q
q
- Predictands series Y: daily Tmax , Tmin and precipitation at 4 stations.
- Predictors X: 6 CGCM3 grid, for each grid point, 25 NCEP predictors are
provided (150 predictors). A PCA is performed and the first 40 components are retained as predictors.
- All data cover the period between Jan 1st 1961 and 31 Dec 31 2000.
4. Model Calibration & Testing
Calibration
• Data from 01-01-1961 to 31-12-1990.
• Conditional distribution choices for the probabilistic regression model:
- Tmax and Tmin: Normal distributions.
- Precipitation occurrences (Poc): Bernoulli distribution. - Precipitation amounts (Pam): Gamma distribution.
Validation
• Data from 01-01-1991 to 31-12-2000.
• Statistical criteria: RMSE, ME and the difference between observed and
modeled variances D.
• Scatter plot of cross-correlations.
• Comparaison with Multivariate Multiple Linear Regression (MMLR), and
Multivariate Multisite Statistical Downscaling Model (MMSDM).
5. Results
Quality assessment of the estimated series for GCQR, MMLR and MMSDM during the validation period (1991–2000) for the four weather stations. Criteria are ME, RMSE, and differences between observed and modeled variance D. For PGCR and MMSDM models criteria were calculated from the conditional mean.
Scatter plot of observed and modeled cross-predictand correlations. Correlation values of PGCR and MMSDM models are obtained using the mean of the correlation values calculated from 100 simulations.
Observed versus modeled differences of daily temperatures on wet days and dry days.
The distribution of each predictand at the observed sites is represented by an appropriate probability density function (PDF), and then a regression model with outputs are parameters of the PDF is employed.
Example: a conditional normally distributed response would have two outputs, one for the conditional mean and one for the conditional variance.
1( ') ( ') l x t x t 1 ' 1 ' ( ')| ( ') ( ')| ( ') t m mt f y t t f y t t x x 1 ' [ , , )] 1 ( ' k t k v t k m F− = 1 ˆ ( ') ˆ ( ')m y t y t Probabilistic
regression GaussianCopula
1 m v v
PGCR
50 100 150 200 250 300 350 0 20 40 60 Day P rec ipi tat ion ( m m ) PGCR Observed 50 100 150 200 250 300 350 -40 -20 0 20 40 Day T ma x (° C ) 95% confidence interval Observed PGCR 50 100 150 200 250 300 350 -40 -20 0 20 40 Day T mi n (° C ) 0.88 0.9 0.92 0.94 0.96 0.98 0.88 0.9 0.92 0.94 0.96 0.98 Tmax-Tmin Observed correlation C or rel at ion -0.05 0 0.05 0.1 0.15 -0.05 0 0.05 0.1 0.15 Tmax-Pam Observed correlation C or rel at ion -0.05 0 0.05 -0.05 0 0.05 Tmax-Poc Observed correlation C or rel at ion 0 0.05 0.1 0.15 0 0.05 0.1 0.15 Tmin-Pam Observed correlation C or rel at ion -0.05 0 0.05 0.1 -0.05 0 0.05 0.1 Tmin-Poc Observed correlation C or rel at ion 0.1 0.2 0.3 0.4 0.1 0.2 0.3 0.4 Pam-Poc Observed correlation C or rel at ion Observed MMLR PGCR MMSDM
Results indicate the superiority of the proposed PGCR over boththe multivariate multiple linear regression model and the multivariate multisite statistical downscaling model in term of the three statistical criteria.
In terms of reproducing spatial and inter-variable properties, bothPGCR and MMSDM models provide interesting results.
Traditional Regression
Probabilistic Regression
Conditional
mean &variance
(heteroscedastic) Conditional mean, Canstant variance (homoscedastic)
December 2014
-2 0 2 4 6 -2 0 2 4 6 TmaxObserved diffrences on wet and dry days
E s ti m a te d 0 2 4 6 0 2 4 6 Tmin
Observed diffrences on wet and dry days
E s ti m a te d Observed MMLR PGCR MMSDM