Conclusion of the manuscript - Benjamin Dubois pour obtenir le grade de

Description of the load forecasting problems. In Chapter 2, we introduced the load forecasting problems and carried out a preliminary exploration of the database. This gave us the opportunity to describe the three common cycles of the electricity demand, namely the daily, weekly and yearly cycles, along with the conditional expectations of the load with respect to the calendar information and the meteorological conditions. We also described the different settings that we consider and insisted on the variability, as well as the irrelevant values encountered in the load measurements at the level of the substations. Chapter 2 also allowed us to present the detection and correction procedures that we have used to clean the database.

A standard bivariate linear model. In Chapter 3, we described a transformation of the inputs that subsequently feeds a linear model tuned by minimizing a classical measure of the squared errors. The same model is proposed for the load at both the national and the local levels, with adapted regularization hyperparameters. Unlike the state-of-the-art GAMs, which include different submodels for the different hours of the day, we introduced a single model used at all times of the day and of the year, leading to a reasonable computational time with performance comparable to state-of-the-art results. It is consequently simpler to analyze. Observing the results of this model allowed us in particular to underline the importance of the interactions between some of the inputs.
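As a purely illustrative sketch of this kind of model (a linear model on transformed inputs, fit by minimizing a regularized sum of squared errors), the following Python snippet fits a ridge regression on features that include an interaction term. The feature construction (a one-hot encoding of the hour multiplied by the temperature), the synthetic data, and the names `build_features` and `fit_ridge` are assumptions made for the example; they are not the transformation or the estimator of Chapter 3.

```python
import numpy as np

def build_features(hour, temp):
    """Hypothetical design matrix: one-hot hour, temperature, and their interaction."""
    hour_onehot = np.eye(24)[hour]               # shape (n, 24)
    interaction = hour_onehot * temp[:, None]    # shape (n, 24), hour x temperature
    return np.hstack([hour_onehot, temp[:, None], interaction])

def fit_ridge(X, y, alpha):
    """Minimize ||y - X w||_2^2 + alpha * ||w||_2^2 in closed form."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

rng = np.random.default_rng(0)
n = 2000
hour = rng.integers(0, 24, size=n)
temp = rng.normal(10.0, 5.0, size=n)
# Synthetic load whose temperature sensitivity depends on the hour (assumption).
slope = np.where(hour < 12, -1.0, -2.0)
load = 50.0 + slope * temp + rng.normal(0.0, 0.1, size=n)

X = build_features(hour, temp)
w = fit_ridge(X, load, alpha=1e-3)
rmse = np.sqrt(np.mean((X @ w - load) ** 2))
```

In this toy setting, the hour-dependent slope can only be captured through the interaction columns, which is one simple way to see why interactions between inputs can matter.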

Modeling difficulties. Meanwhile, we illustrated in Chapter 3 the main difficulties that we encountered. These difficulties are accentuated at the level of the substations because of the higher variability of the load curves. For instance, we emphasized non-stationarity and pointed to several works that mitigate the resulting problems. We also identified Mondays in particular as a difficult period to forecast. We consider that this is because they follow Sundays and because we use the delayed loads in the model in a rather basic manner. Although we considered simple patches, like the interaction between the past loads and the hour of the week, we believe that a dedicated model would be worth studying. In particular, we have not so far considered the possibility of having different hyperparameters for the different days of the week.

Implementation. We tried to propose a single algorithmic framework for both the national and the local load forecasting problems, and we have not insisted in the manuscript on the problems related to the implementation of the models. Nevertheless, the dimensions of the optimization problems associated with the different levels of aggregation are significantly different. It was ambitious to use a single tool for the different aggregation levels because the most efficient algorithmic tools depend on the size of the problems. Thus, while we encompassed the different problems in a single algorithm, we believe that some decisions are suboptimal and that it might be relevant to consider a specific algorithm for each problem separately.

Similarity structure. At the end of Chapter 3 and at the beginning of Chapter 4, we illustrated the similarities between the independent models learned for each substation in order to motivate the multi-task framework discussed in Chapter 4. We consider that the proposed illustrations are not entirely satisfying so far: how to quantitatively measure the similarities between different models with distinct inputs remains an open question. In particular, we could not convincingly decide which substations should be coupled and which ones should be isolated, if relevant. Still, the presented figures illustrate the presence of a common structure in the learned coefficients for the different substations, that is, the task structure, and in the residuals of the models, which corresponds to the output structure.

Task structure. The clustering models and in particular the low-rank models, both leveraging the task structure, allowed us to conclude that the number of parameters in the independent models of Chapter 3 is unnecessarily large. The results with the different variants of these models also indicate that even if some parameters are shared by the models, sufficient flexibility is necessary to obtain results comparable with the state-of-the-art models. We believe that it is worthwhile to pursue research on flexible multi-task models, mixing shared and individual components.

Sparse Reduced Rank Regression. The interest in the low-rank constraint motivated the analysis in Chapter 5 of Sparse Reduced Rank Regression, which is a non-convex and non-differentiable optimization problem with respect to a thin matrix U. In particular, we proved the convergence to critical points, under reasonable assumptions, of a subgradient-type algorithm, which corresponds essentially to a proximal gradient descent. We also proved local linear convergence for a certain range of regularization coefficients, leveraging a Polyak-Łojasiewicz inequality satisfied by the objective in a neighborhood of the global minima.
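To make the notion of a proximal gradient step concrete, here is a minimal Python sketch on a deliberately simpler problem: a convex multi-output regression with a row-wise group-lasso penalty, 0.5 ‖Y − XU‖_F² + λ Σ_i ‖u_i‖_2. This is not the Sparse Reduced Rank Regression objective (which is non-convex) and not the algorithm analyzed in Chapter 5; the objective, the step size, the synthetic data and the function names `prox_row_l2` and `proximal_gradient` are assumptions chosen for illustration only.

```python
import numpy as np

def prox_row_l2(U, tau):
    """Block soft-thresholding: prox of tau * sum_i ||u_i||_2 over the rows u_i of U."""
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return scale * U

def proximal_gradient(X, Y, lam, n_iter=500):
    """Minimize 0.5 * ||Y - X U||_F^2 + lam * sum_i ||u_i||_2 (convex toy problem)."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2    # 1 / Lipschitz constant of the gradient
    U = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = X.T @ (X @ U - Y)              # gradient of the smooth part
        U = prox_row_l2(U - step * grad, step * lam)
    return U

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
U_true = np.zeros((10, 3))
U_true[:4] = rng.normal(size=(4, 3))          # only the first 4 rows are active
Y = X @ U_true + 0.01 * rng.normal(size=(200, 3))

U_hat = proximal_gradient(X, Y, lam=5.0)
row_norms = np.linalg.norm(U_hat, axis=1)     # rows of inactive inputs are shrunk to zero
```

Each iteration takes a gradient step on the smooth squared-error term and then applies the proximal operator of the non-smooth penalty, which shrinks whole rows of U toward zero.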

Output structure. In the last section of Chapter 4, we considered the output structure and the possibility of coupling the models at different scales. While we have not found a procedure to screen information sharing, we tried to identify the relevant level at which to couple the models. We obtained positive results by ensuring that the forecasts are consistent at the district level, thereby improving the accuracy of the local models in some districts.

Selectively screen the sharing of information. In conclusion, the models and the results in Section 4.6 support the interest of a multi-task approach. They provide a guarantee for the TSO of having reasonable forecasts at both the local and aggregated levels. Although we spent significant time trying to couple the 1751 models of all the substations, the empirical results also indicate that coupling at a smaller scale is not only less demanding computationally but also seems more relevant.

Finally, we consider that the search for the most relevant levels at which to couple the local models and the development of a procedure to screen information sharing are the next priorities. The analysis of the estimated clusters in Section 4.4 and of the low-rank matrices in Section 4.5 is a potential way to illustrate the underlying structure and screen information sharing.

Notations

Sets of numbers

• The set of natural numbers (non-negative integers) is denoted N, from which we define N^∗ := N \ {0} and N \ {0, 1}.

• The set of integers is denoted Z, from which we define Z^∗ := Z \ {0}.

• The set of real numbers is denoted R, the non-zero real numbers R^∗ and the non-negative real numbers R⁺.

• The interval between two numbers x≤y is denoted [x, y].

• The set of integers between p ∈ Z and q ∈ Z with p ≤ q is denoted [[p, q]].

• The set of equivalence classes of real numbers modulo 1 is the torus, denoted R/Z.

Variables

• Scalar observations and coefficients are written with a normal font e.g. b, x, y.

• Random scalar variables are written with a sans-serif font e.g. x,y.

• Vectors of observations and coefficients are written in bold e.g. b, x, y.

• Random vectors are written with a sans-serif font in bold e.g. x,y.

• The i-th element of a vector b is denoted b_i, unless explicitly stated otherwise.

• Matrices of observations and coefficients are capitalized and bold e.g. B, X, Y.

• The i-th row of a matrix B is denoted b_i, unless explicitly stated otherwise.

• The j-th column of a matrix B is denoted b^(j).

• The element in the i-th row and j-th column of a matrix B is denoted b^j_i.

• Tensors of observations and coefficients are capitalized e.g. X.

• Given two vectors a ∈ R^p and b ∈ R^q, the element in the i-th row and j-th column of the matrix a ⊗ b ∈ R^p,q is a_i b_j.

• Given s ∈ R^p, diag(s_1, ..., s_p) ∈ R^p,p is a diagonal matrix with elements s_1, ..., s_p on the diagonal.

• Given a vector ℓ ∈ R^n, the average of its elements is denoted ℓ̄.

• Given K ∈ N^∗, the constant vector denoted 1_K contains only 1s.
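For readers who prefer code, the vector and matrix notations above can be mirrored with NumPy as in the sketch below. The numerical values are arbitrary examples, and NumPy indexes from 0 whereas the notation above is assumed to index from 1.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])       # a in R^3
b = np.array([10.0, 20.0])          # b in R^2

outer = np.outer(a, b)              # a ⊗ b in R^3,2, entry (i, j) is a_i * b_j
D = np.diag(a)                      # diag(a_1, a_2, a_3) in R^3,3
a_bar = a.mean()                    # average of the elements of a
ones_K = np.ones(4)                 # constant vector 1_K with K = 4

B = np.arange(6.0).reshape(2, 3)    # a matrix B in R^2,3
row_1 = B[0]                        # first row of B (b_1 in the notation)
col_2 = B[:, 1]                     # second column of B (b^(2) in the notation)
```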

Power sets

• The set of real-valued vectors of size p is denoted R^p.

• The set of real-valued matrices of size (p, q) is denoted R^p,q.

• The set of real-valued tensors of size (p, q, r) is denoted R^p,q,r.

• Given two sets A and B, the set of functions from A to B is denoted B^A, thus the set of real-valued functions defined on R is denoted R^R.

• A sequence of p ∈ N^∗ real-valued functions defined on R is denoted (R^R)^p.

• An array of p × q real-valued functions defined on R is denoted (R^R)^p,q.

• An array with size (a, b) of elements included in {0, 1} is denoted {0, 1}^a,b.

Attributes

• The rank of a matrix M is denoted rank(M).

• The transpose of a matrix M is denoted M^T.

• The cardinality of a set S is denoted |S|.

• The positive part of a number x is denoted (x)_+ := max(x, 0).

• The factorial of a non-negative integer n ∈ N is denoted n!.

• The binomial coefficient indexed by k ≤ n is denoted \binom{n}{k}.

Norms

• The 2-norm of a vector b is denoted ‖b‖_2.

• The Frobenius norm of a matrix M is denoted ‖M‖_F.

• The Frobenius scalar product between vectors or matrices is denoted ⟨·, ·⟩.

• The trace-norm of a matrix M is denoted ‖M‖_∗; it is the sum of its singular values.
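As a small numerical check of these definitions, the following NumPy sketch computes each quantity on arbitrary example values. The variable names are chosen for the illustration and do not come from the manuscript.

```python
import numpy as np

b = np.array([3.0, 4.0])
M = np.array([[2.0, 0.0], [0.0, -1.0]])
N = np.eye(2)

norm_2 = np.linalg.norm(b)                             # 2-norm of b
norm_F = np.linalg.norm(M, "fro")                      # Frobenius norm of M
inner = np.sum(M * N)                                  # Frobenius scalar product, equal to trace(M^T N)
singular_values = np.linalg.svd(M, compute_uv=False)   # singular values of M
trace_norm = singular_values.sum()                     # trace-norm: sum of the singular values
```

The value of `trace_norm` agrees with NumPy's built-in nuclear norm, `np.linalg.norm(M, "nuc")`.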
