
2.4 Exploratory analysis of the local loads

2.4.2 Existence of a common structure

Although the presentation in Section 2.4.1 emphasizes the heterogeneity among the substations, there are still strong similarities across substations in the joint distribution of the load and the input variables. They are particularly visible in the empirical conditional expectations of the loads presented in Section 2.4.1. We call these resemblances the (unidentified) common structure.

Under the condition that this common structure is sufficiently pervasive, it might be relevant to mutualize the information at different substations by coupling the individual models, which is the ambition of this work. In this section, we provide additional evidence that supports this idea.

Correlation of the substations Due to the yearly and weekly cycles, we expect the loads of the different substations to be highly correlated. This is indeed confirmed in Figures 2.20 and F.15.

We also present in Figures 2.21 and F.16 the correlations between the substations after subtracting an estimate of the load, computed with a kernel regression for each day of the year and each hour of the week.
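For concreteness, here is a minimal Python sketch of these two computations on synthetic data; the Gaussian kernel, its bandwidth and the exact conditioning on the hour of the week are illustrative assumptions, not necessarily the estimator used for the figures.

```python
# Minimal sketch of the detrending behind Figure 2.21: a Nadaraya-Watson
# estimate per hour of the week, smoothed over the day of the year with a
# Gaussian kernel, is subtracted before computing the correlations.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2013-01-01", periods=24 * 365, freq="h")
loads = pd.DataFrame(rng.standard_normal((len(idx), 5)), index=idx)  # stand-in data

raw_corr = loads.corr()                          # K x K matrix behind Figure 2.20

def kernel_trend(loads: pd.DataFrame, bandwidth: float = 10.0) -> pd.DataFrame:
    day = loads.index.dayofyear.to_numpy()
    hour_of_week = (loads.index.dayofweek * 24 + loads.index.hour).to_numpy()
    values = loads.to_numpy()
    trend = np.empty_like(values)
    for h in np.unique(hour_of_week):
        mask = hour_of_week == h
        d = day[mask]
        delta = np.abs(d[:, None] - d[None, :])
        delta = np.minimum(delta, 365 - delta)   # circular distance in days
        w = np.exp(-0.5 * (delta / bandwidth) ** 2)
        trend[mask] = w @ values[mask] / w.sum(axis=1, keepdims=True)
    return pd.DataFrame(trend, index=loads.index, columns=loads.columns)

detrended_corr = (loads - kernel_trend(loads)).corr()  # Figure 2.21
```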

[Figure: histogram of the correlations between the substations; x-axis from −1.00 to 1.00, y-axis: #pairs]

FIGURE 2.20: Correlations between the substations

Histogram of the correlations between the substations over the 5-year-long dataset. Note that we should investigate whether the negative correlations might be due to load reports or have another interpretation.

[Figure: histogram of the correlations between the detrended substations; x-axis from −1.00 to 1.00, y-axis: #pairs]

FIGURE 2.21: Correlations between the detrended substations

Histogram of the correlations between the substations after an estimate of the load for each day of the year and hour of the week is subtracted. The estimate is obtained with a kernel regression on the observations.

In addition, the matrix of load observations can be relatively well approximated by a low-rank matrix, as illustrated in Figure 2.22. We observe that 90% of the variance in the columns of the load matrix can be explained by 10 principal components.

It is therefore legitimate to believe that the complexity necessary to explain the loads at all substations grows less than proportionally with their number.

[Figure: relative approximation error $\|\tilde{L} - \tilde{L}^{(r)}\|_F^2 / \|\tilde{L}\|_F^2$ as a function of the rank $r$, on a logarithmic axis from $10^0$ to $10^3$; y-axis from 0.00 to 0.20]

FIGURE 2.22: Singular values of the load matrix

The matrix $\tilde{L}$ is obtained by centering the rows of the original load matrix $L \in \mathbb{R}^{n,K}$ containing $n = 43824$ observations of the loads over the 5 years in the dataset at the $K = 1751$ substations. For $r \in \mathbb{N}$, the matrix $\tilde{L}^{(r)}$ is the closest rank-$r$ approximation of $\tilde{L}$ in terms of the Frobenius norm. Note that if $P \operatorname{diag}(s_1, \ldots, s_{\min(n,K)}) Q^T$ is a singular value decomposition of $\tilde{L}$, with $P \in \mathbb{R}^{n,\min(n,K)}$, $s_1 \geq \ldots \geq s_{\min(n,K)} \geq 0$ and $Q \in \mathbb{R}^{K,\min(n,K)}$, then the best rank-$r$ approximation of $\tilde{L}$ is the matrix $P \operatorname{diag}(s_1, \ldots, s_r, 0, \ldots, 0) Q^T$, and the plot is simply the graph of $r \mapsto \sum_{t=r+1}^{\min(n,K)} s_t^2 \,/\, \sum_{t=1}^{\min(n,K)} s_t^2$. The value of this function for $r = 0$ equals 1, but the plot begins at $r = 1$ because of the logarithmic scale.
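For reference, a minimal Python sketch of this curve; the random matrix below is only a stand-in for the real load matrix.

```python
# Minimal sketch of the curve in Figure 2.22: relative squared Frobenius
# error of the best rank-r approximation of the row-centered load matrix.
import numpy as np

rng = np.random.default_rng(0)
L = rng.standard_normal((1000, 200))             # stand-in for the (n, K) load matrix

L_tilde = L - L.mean(axis=1, keepdims=True)      # center the rows, as in the caption
s = np.linalg.svd(L_tilde, compute_uv=False)     # singular values, in decreasing order
energy = s ** 2
tail = energy[::-1].cumsum()[::-1] / energy.sum()  # tail[r] = error of the rank-r approximation
rel_error = tail[1:]                             # ranks r = 1, ..., min(n, K) - 1
r_90 = 1 + int(np.argmax(rel_error <= 0.1))      # smallest rank explaining 90% of the variance
```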

Clustering the substations While the strong correlations and the low-rank approximation presented previously support the claim that the substations have a common underlying structure, they do not give an idea of its organization.

Because the loads at different substations have varying amplitudes, using a standard K-means clustering algorithm [MacQueen et al., 1967] to group the resembling substations together is probably not appropriate: there is a risk that the substations would be clustered according to their amplitudes. Instead, it seems more relevant to group the substations according to their behavior over time and for different values of the inputs. We propose a possible partition of the substations into 5 clusters in Figure 2.23a, obtained with a subspace clustering algorithm [Elhamifar and Vidal, 2013].
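To give the flavor of such an algorithm, here is a minimal sketch of sparse subspace clustering in the spirit of [Elhamifar and Vidal, 2013]; the synthetic data, the Lasso penalty and the hyperparameters are illustrative assumptions, and the exact variant used for Figure 2.23 may differ.

```python
# Minimal sketch of sparse subspace clustering: each substation (column of
# L) is written as a sparse combination of the others, and the coefficient
# magnitudes define the affinity matrix used by spectral clustering.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
L = rng.standard_normal((1000, 60))              # stand-in for the (n, K) load matrix

K = L.shape[1]
C = np.zeros((K, K))
for k in range(K):
    others = np.delete(np.arange(K), k)          # exclude the substation itself
    lasso = Lasso(alpha=0.1, fit_intercept=False, max_iter=5000)
    lasso.fit(L[:, others], L[:, k])             # sparse self-expression
    C[others, k] = lasso.coef_

affinity = np.abs(C) + np.abs(C).T               # symmetric, nonnegative affinity
labels = SpectralClustering(n_clusters=5, affinity="precomputed",
                            random_state=0).fit_predict(affinity)
```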

Figure 2.23a is visually satisfying. Indeed, without being aware of the geographical coordinates of the substations, the subspace clustering algorithm found groups that are relatively well organized spatially. However, it is not clear that this clustering is really informative, since the substations may have been grouped together only because they are exposed to the same local weather conditions, while we would like to measure more precisely how similarly the loads behave when the distributions of the calendar and meteorological variables are the same. A subspace clustering of the weather stations into 5 groups is presented in Figure 2.23b, and it does not dispel this doubt.

The interpretations drawn from these visualizations should be taken with caution and measuring the similarity between two substations remains an open problem.

(a) Subspace clustering into 5 groups of the 1751 substations.

(b) Subspace clustering into 5 groups of the 32 weather stations.

FIGURE 2.23: Clusters of substations and weather stations

Forecast from a few leaders Alternatively, we tried to assess how much information is shared between the load curves by regressing each individual load on all the others. If every substation were a linear combination of the others, we could obtain a perfect estimation of the load at one substation from all the others. A simple extension of this idea leads to wonder whether we could in fact predict all the substations with only a few of them, which we call the leaders. Such a task can be performed by learning all the regression models simultaneously, adding a group-Lasso regularization term and solving the following optimization problem:

$$\min_{C \in \mathbb{R}^{K,K}} \; \frac{1}{2n} \|L - LC\|_F^2 + \lambda \|C\|_{1,2}, \qquad (2.2)$$

where $\lambda \geq 0$, $L \in \mathbb{R}^{n,K}$ is the load matrix with $n$ observations at the $K$ substations, and

$$\|C\|_{1,2} = \sum_{\ell=1}^{K} \|c_\ell\|_2, \qquad (2.3)$$

with $c_\ell$ the $\ell$-th row of the matrix $C$. The group-Lasso regularization [Bakin et al., 1999; Obozinski et al., 2010; Yuan and Lin, 2006] is known to encourage some of the groups to have zero norm. In our case, it thus drives some of the rows of $C$ to zero, which eliminates them from the set of predictors. Note that these forecasts are not feasible in practice, since we would need the loads of the leaders at the very instants to forecast.

Because the group-Lasso regularization induces a bias, and because of the nice properties of the Elastic-Net penalty as a variable selection procedure [Zou and Hastie, 2005], we instead proceeded sequentially as follows. First, we solve the selection problem with the Elastic-Net penalty:

$$\min_{C \in \mathbb{R}^{K,K}} \; \frac{1}{2n} \|L - LC\|_F^2 + \lambda \left( \|C\|_{1,2} + \frac{1}{2} \|C\|_F^2 \right). \qquad (2.4)$$

Let $r$ be the number of non-zero rows of the estimated solution of Problem (2.4) and $\check{L}$ the matrix obtained by concatenating the selected columns of $L$. The columns of $\check{L}$ are considered as a new set of predictors and we then solve:

$$\min_{D \in \mathbb{R}^{r,K}} \; \frac{1}{2n} \|L - \check{L} D\|_F^2 + \frac{\mu}{2} \|D\|_F^2, \qquad (2.5)$$

where $\mu \geq 0$ is the hyperparameter of the second step.
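For illustration, a minimal sketch of this two-step procedure with scikit-learn; the (alpha, l1_ratio) parametrization, the random stand-in data, and the fact that each substation may appear among its own predictors are simplifications with respect to the procedure above.

```python
# Minimal sketch of the two-step procedure (2.4)-(2.5). MultiTaskElasticNet
# minimizes (1/2n)||Y - XW||_F^2 + alpha*l1_ratio*||W||_21
# + 0.5*alpha*(1 - l1_ratio)*||W||_F^2, where ||W||_21 groups the
# coefficients of each predictor across all targets, so whole substations
# are dropped at once; a ridge regression plays the role of the second step.
import numpy as np
from sklearn.linear_model import MultiTaskElasticNet, Ridge

rng = np.random.default_rng(0)
L = rng.standard_normal((500, 50))               # stand-in for the (n, K) load matrix

# Step 1: leader selection with a grouped Elastic-Net penalty, cf. (2.4).
selector = MultiTaskElasticNet(alpha=0.1, l1_ratio=0.5, fit_intercept=False)
selector.fit(L, L)                               # coef_ has shape (K targets, K predictors)
leaders = np.flatnonzero(np.linalg.norm(selector.coef_, axis=0) > 0)

# Step 2: refit on the selected leaders only, cf. (2.5).
L_check = L[:, leaders]
refit = Ridge(alpha=1e-3, fit_intercept=False)
refit.fit(L_check, L)
mean_r2 = refit.score(L_check, L)                # average r^2 over the substations
print(len(leaders), mean_r2)
```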

The results of this 2-step procedure are presented in Figure 2.24. With $\lambda = 1$ in the first step, 14 substations are selected and using them as a set of predictors leads to an average $r^2$ over the substations of 0.8. With $\lambda = 0.1$, 100 substations are selected and they lead to an average $r^2$ of 0.9. This confirms that the loads at the different substations have a common structure.

Outliers Although we provided evidence that there is an underlying structure common to a significant number of the substations, we must keep in mind that there are outliers: some substations have their own particular behavior and it would probably not be a good idea to mutualize parts of the learning process with them.

We allowed ourselves to discard a significant number of substations, 338 out of 2089, from the database with the correction procedure described in Section B.2.

Still, there may remain outliers that cannot be well forecast with the selected leaders, as illustrated in Figure 2.25, and we do not yet have a clear criterion to decide which ones to discard.

As a conclusion of this section, there is heterogeneity among the substations, but it is reasonable to believe that the data is somehow structured. The whole point of this work is to try to leverage this unknown structure to improve the quality of the forecasts.

[Figure: (top) average coefficient of determination $M_{r^2}$ as a function of $\lambda$ (logarithmic axis from $10^{-1}$ to $10^{1}$), one curve per value of $\mu \in \{1.0, 0.1, 0.01, 0.001, 0.0001\}$; (bottom) number of selected predictors (#predictors) as a function of $\lambda$]

FIGURE 2.24: Forecasts with a few leaders

(top) Average coefficient of determination $M_{r^2}$ over all the substations of the individual coefficients of determination $(r_k^2)_{k \in [\![1,K]\!]}$ for different values of $\lambda$, each curve corresponding to a different value of the hyperparameter $\mu$ in the second step.

(bottom) Number of predictors selected in the first step for different values of $\lambda$.

In this procedure, we have a 3-year-long training set, from 2013 to 2015, and the performances of the model are computed on the year 2016.

[Figure: two histograms of the coefficients of determination $r^2$, the left over the full range, the right zoomed in on $r^2 \in [0, 0.4]$; y-axis: #substations]

FIGURE 2.25: Presence of outliers

(left) Histogram of the coefficients of determination $(r_k^2)_{k \in [\![1,K]\!]}$. (right) Same histogram, zoomed in on the outliers.

The hyperparameters for the 2-step procedure described by Equations (2.4) and (2.5) are $\lambda = 1$, so that 14 leaders are selected, and $\mu = 0.001$. Other than that, the learning process is the same as for Figure 2.24.
