3.6.5 Selecting the inputs for the local models

For the national short-term forecasting model, it is natural that the only past loads the model can access are past national loads, and we did not question this choice. Similarly, for the weather information, we followed the choices made in the literature for the national model and used the linear combination of the weather stations given in Table F.1. The situation is different for other aggregation levels. We decided that every substation should have access, in addition to the calendar variables, to its own past loads and to the weather information at the 2 closest weather stations. We discuss these decisions in this section.

Choice of the past loads. At the substation level, it is legitimate to wonder whether giving each substation access to the past loads of other substations might improve the forecasting performance. The set of possibilities is combinatorial, so we had to limit our experiments. We tried giving each substation access to the past national load or to the past loads of a fixed number of other substations, for example the 2 geographically closest. Empirically, we could not obtain any improvement and concluded that every substation should only have access to its own past loads.

Number of weather stations. In the local load forecasting model of Pierrot and Goude [2011], a meteorologist assigned one weather station to each substation. Since we did not have access to this assignment, we explored two possibilities in search of an intermediate setting where each substation can access several weather stations but not necessarily all of them, as in the multi-step variable selection procedure proposed by Thouvenot [2015].

First, considering that the most informative weather station for a given substation is not necessarily the closest one, we have tried an automatic selection procedure with a single optimization step: we have given each substation access to all the weather stations but used a group-Lasso penalty so that only a subset is effectively selected. We have also tried a non-convex version of the group-Lasso to reduce the bias [Fan and Li, 2001; Zou, 2006, and references therein].
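To illustrate how a group-Lasso penalty can discard whole weather stations, the sketch below applies the standard proximal operator of the penalty (block soft-thresholding), assuming one coefficient group per weather station. The function names, the unweighted form of the penalty and the toy coefficients are assumptions made for illustration; the text does not specify the optimization algorithm that was actually used.

```python
import math

def group_soft_threshold(coefs, threshold):
    """Shrink a group of coefficients towards zero by `threshold` in Euclidean norm.

    The whole group is set to zero when its norm does not exceed the threshold,
    which is what removes a weather station from the model.
    """
    norm = math.sqrt(sum(c * c for c in coefs))
    if norm <= threshold:
        return [0.0] * len(coefs)
    scale = 1.0 - threshold / norm
    return [scale * c for c in coefs]

def prox_group_lasso(groups, lam, step):
    """Proximal operator of lam * sum_g ||w_g||_2 for a gradient step of size `step`.

    `groups` maps a (hypothetical) weather station name to its coefficients.
    """
    return {name: group_soft_threshold(coefs, lam * step)
            for name, coefs in groups.items()}

# Toy coefficients, invented for the example (not taken from the experiments).
coefficients = {"station_A": [3.0, 4.0], "station_B": [0.1, -0.2]}
shrunk = prox_group_lasso(coefficients, lam=1.0, step=0.5)
selected = [name for name, coefs in shrunk.items() if any(c != 0.0 for c in coefs)]
```

In this toy case the group of `station_B` has a norm below the threshold and is zeroed out entirely, while the coefficients of `station_A` are only shrunk. This shrinkage of the surviving coefficients is the bias that the non-convex variants aim to reduce.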

Secondly, we have given each substation access to a fixed number of the closest weather stations. This provided the best results. According to Figure 3.28, it is best for each substation to have access to the 2 closest weather stations.
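The second option can be sketched as follows, assuming that "closest" refers to the great-circle distance between the coordinates of a substation and those of the weather stations. The function names and the coordinates are hypothetical and only serve the example.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def closest_stations(substation, stations, k=2):
    """Return the names of the k weather stations closest to a substation."""
    lat, lon = substation
    ranked = sorted(stations, key=lambda name: haversine_km(lat, lon, *stations[name]))
    return ranked[:k]

# Hypothetical (latitude, longitude) pairs, for illustration only.
weather_stations = {
    "Paris": (48.85, 2.35),
    "Lyon": (45.76, 4.84),
    "Lille": (50.63, 3.06),
    "Marseille": (43.30, 5.37),
}
substation_coords = (49.90, 2.30)
nearest = closest_stations(substation_coords, weather_stations, k=2)
```

The parameter `k` plays the role of the number of weather stations compared in Figure 3.28.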

Although we conclude here that, on average, the local models need access to the 2 closest weather stations to obtain the best performance, this might be because the loads of some substations are in fact mostly driven by the second closest weather station rather than by a combination of the two. For some substations it could even be the third closest, but giving access to a third station makes other substations overfit. For these reasons, we believe that this question deserves a longer study.

Ablation study. As for the national model, we present an ablation study of the inputs for the local models in Table 3.11.

[Figure 3.28 plot: RMNMSE (y-axis, 7.2 to 7.6) as a function of the regularization factor for the weather-related inputs (x-axis, logarithmic, 10⁻³ to 10³), with one curve each for 1, 2 and 3 weather stations.]

FIGURE 3.28: Selecting the number of weather stations

RMNMSE of the local models for different regularization hyperparameters and numbers of weather stations injected in the inputs. For each number of stations, we compute the value of the RMNMSE on a test set for different values of the weather-related regularization coefficients, where the values are obtained from the best configuration by multiplying them by the same factor. This factor corresponds to the x-axis.

With only one weather station, the models do not overfit the training data but it seems that adding more weather stations makes them more expressive and able to better fit both the training and the test data, with the adequate regularization. The best results are obtained with 2 weather stations.
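The sweep described in this caption can be sketched as follows: starting from the best configuration, only the weather-related regularization coefficients are multiplied by a common factor taken on a logarithmic grid. The function name, the coefficient names and their values are hypothetical, and the model fitting and evaluation steps are left out.

```python
def scaled_configs(best_config, weather_keys, exponents=range(-3, 4)):
    """Build one configuration per factor 10**exponent.

    Only the coefficients listed in `weather_keys` are rescaled; the other
    regularization coefficients keep their value from the best configuration.
    """
    configs = {}
    for exponent in exponents:
        factor = 10.0 ** exponent
        configs[exponent] = {
            name: value * factor if name in weather_keys else value
            for name, value in best_config.items()
        }
    return configs

# Hypothetical regularization coefficients, for illustration only.
best = {"temperature": 0.5, "cloud_cover": 2.0, "hour_of_week": 0.1}
sweep = scaled_configs(best, weather_keys={"temperature", "cloud_cover"})
```

Each entry of `sweep` would then be used to fit the local models and compute the RMNMSE on the test set, once for each number of weather stations.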

Removed input                            RMNMSE_train   RMNMSE_test
∅                                        5.89           7.22
constant                                 5.89           7.23
timestamp                                5.95           7.25
Christmas period                         5.95           7.25
cloud cover and daylight                 5.94           7.26
hour of the week and day of the year     6.08           7.28
past loads and hour of the week          5.98           7.29
minimum temperatures                     6.04           7.31
day of the year                          6.09           7.34
temperatures and day of the year         6.11           7.34
past holidays and hour of the week       6.10           7.35
maximum temperatures                     6.16           7.35
coming holidays and hour of the week     6.23           7.43
delayed temperatures                     6.13           7.44
temperature                              6.16           7.49
hour of the week                         6.41           7.52
holidays and hour of the week            6.50           7.62
delayed loads                            6.66           8.44

TABLE 3.11: Ablation study of the local models

Presentation of the ablation study with the inputs at the level of the substations. Starting from the best local models, we removed the inputs in the left column one by one (with replacement) and evaluated the corresponding performances on the training and the test sets (middle and right columns).

These inputs have been sorted from the least important to the most important, where importance is defined by the damage the removal does to the RMNMSE on the test set. Compared with the ablation study of the national model in Table 3.8, removing the univariate effects of the temperature, the hour of the week or the delayed loads affects the local models much more.

Conversely, removing the bivariate component related to the hour of the week and the day of the year has, on average, a minor effect on the local models, while it is particularly useful for the national model.
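The ablation procedure behind Table 3.11 can be sketched as a simple loop, assuming a callable `evaluate` that fits the local models on a given list of inputs and returns the pair (training score, test score). The function names and the toy scoring function are hypothetical; its numbers are invented for the example and are not results.

```python
def ablation_study(inputs, evaluate):
    """Remove each input in turn from the full set and score the reduced models.

    Each input is put back before the next one is removed ("with replacement").
    Rows are sorted by test score, from the least to the most important input.
    """
    rows = [("(none)", evaluate(list(inputs)))]  # reference: nothing removed
    for removed in inputs:
        kept = [name for name in inputs if name != removed]
        rows.append((removed, evaluate(kept)))
    return sorted(rows, key=lambda row: row[1][1])

# Toy stand-in for model fitting: each missing input adds a fixed, invented penalty.
toy_penalties = {"temperature": 0.3, "delayed loads": 1.2, "timestamp": 0.05}

def toy_evaluate(kept):
    missing = sum(p for name, p in toy_penalties.items() if name not in kept)
    return (6.0 + missing, 7.0 + missing)

table = ablation_study(list(toy_penalties), toy_evaluate)
ranking = [name for name, _ in table]
```

Here `ranking` lists the inputs in the same order as the left column of the table, starting with the reference row where nothing is removed.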
