
Independent models

3.1 Feature engineering

3.1.1 Univariate splines

Basis functions. An ideal family of basis functions for a nonlinear transformation of the inputs contains elements that simultaneously:

• are regular,

• have a simple analytical form,

• are orthogonal for the scalar product related to the distribution of the inputs,

• have a localized support.

Different families have been proposed for modeling, but none satisfies all of these conditions. We compromise and use the framework provided by the cardinal B-splines [De Boor et al., 1978; Eilers and Marx, 1996]. They satisfy 3 of the 4 criteria: because their supports are not disjoint, the orthogonality condition is not satisfied.

Splines and their variants are particularly suitable, with well-studied approximation properties [Schumaker, 2007; Wahba, 1990]. Like the vast majority of the basis functions encountered in the literature, they are defined by a set of knots. With splines, the modeled function is allowed to have discontinuous derivatives at these knots.

Different approaches have been proposed to select the knots. They can be fixed and uniformly distributed, or computed from the quantiles of the data. Alternatively, Friedman et al. [1991] proposed a forward selection algorithm for Multivariate Adaptive Regression Splines (MARS). Adaptive models have also been considered with other iterative procedures [Zhou and Shen, 2001], with the Trend Filtering models (TF) [Tibshirani et al., 2014] and with the Locally Adaptive Regression Splines (LARS) [Mammen et al., 1997], which consider combinations of the elements of the truncated power basis and lead to piecewise polynomials with pieces as large as possible. Because the supports of the truncated power basis functions are not compact, Bakin et al. [1999] later adapted the MARS model to B-splines in the BMARS framework, supposedly leading to better conditioned design matrices.
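The two simplest knot-selection strategies mentioned above, uniform knots and quantile-based knots, can be sketched as follows. This is a minimal illustration, assuming NumPy; the function names `uniform_knots` and `quantile_knots` and the sample data are hypothetical and not taken from the source.

```python
import numpy as np

def uniform_knots(x, n_knots):
    """Equidistant knots between the smallest and largest observed value."""
    x = np.asarray(x, dtype=float)
    return np.linspace(x.min(), x.max(), n_knots)

def quantile_knots(x, n_knots):
    """Knots placed at evenly spaced quantile levels of the observed data."""
    x = np.asarray(x, dtype=float)
    return np.quantile(x, np.linspace(0.0, 1.0, n_knots))

# Hypothetical skewed sample: one large value stretches the range.
x = [0.0, 1.0, 2.0, 3.0, 100.0]
print(uniform_knots(x, 3))   # follows the range of the data
print(quantile_knots(x, 3))  # follows the density of the data
```

With a skewed sample, the uniform knots spread over the full range while the quantile knots concentrate where the observations are.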

Cardinal B-splines. B-splines are piecewise polynomials whose derivatives may be discontinuous only at a finite set of knots. They are additionally called cardinal if the knots are equidistant. Although their supports are not disjoint, this family is particularly suitable for approximation since any spline function can be written as a linear combination of B-splines.

More precisely, we build basis functions with the cardinal B-spline with degree 1 and support $[0, 2]$,

$B^1 : \xi \mapsto (\xi)_+ - 2(\xi - 1)_+ + (\xi - 2)_+$, (3.1)

where $(\zeta)_+ := \max(\zeta, 0)$. There are more regular B-splines with higher degrees and larger supports, which we illustrate in Figure 3.1. For instance, the cubic splines [Stone and Koo, 1985] emphasize the regularity of the basis functions but have larger supports.
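Equation (3.1) translates directly into code. The sketch below, assuming NumPy, evaluates the degree-1 cardinal B-spline; the function name `b1` is chosen here for illustration.

```python
import numpy as np

def b1(xi):
    """Cardinal B-spline of degree 1 with support [0, 2], equation (3.1)."""
    xi = np.asarray(xi, dtype=float)
    pos = lambda z: np.maximum(z, 0.0)  # positive part (z)_+ = max(z, 0)
    return pos(xi) - 2.0 * pos(xi - 1.0) + pos(xi - 2.0)

# The function is a "hat": zero outside [0, 2], with a peak of 1 at xi = 1.
print(b1([-0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0]))
```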

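The cardinal B-spline with degree $\delta$ and support $[0, \delta + 1]$ is given by:

$B^\delta : \xi \mapsto \frac{1}{\delta!} \sum_{k=0}^{\delta+1} (-1)^k \binom{\delta+1}{k} (\xi - k)_+^\delta$. (3.2)

[...] splines with a higher degree has the notable advantage of producing continuous and sparser representations.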
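For a general degree, the usual closed form of the cardinal B-spline is a signed sum of truncated powers. The sketch below assumes that this standard formula is the one intended here; the function name `cardinal_bspline` is hypothetical, and the code is restricted to degrees of at least 1.

```python
import numpy as np
from math import comb, factorial

def cardinal_bspline(xi, degree):
    """Cardinal B-spline of the given degree (>= 1) with support [0, degree + 1].

    Uses the standard truncated-power formula (an assumption, see lead-in):
    (1 / degree!) * sum_k (-1)^k * binom(degree + 1, k) * (xi - k)_+^degree.
    """
    xi = np.asarray(xi, dtype=float)
    total = np.zeros_like(xi)
    for k in range(degree + 2):
        truncated = np.maximum(xi - k, 0.0) ** degree
        total += (-1) ** k * comb(degree + 1, k) * truncated
    return total / factorial(degree)

# Degree 1 recovers the hat function of equation (3.1).
print(cardinal_bspline([0.5, 1.0, 1.5], 1))
# Degree 3 gives the smoother cubic B-spline on [0, 4].
print(cardinal_bspline([1.0, 2.0, 3.0], 3))
```

Increasing the degree widens the support from 2 to `degree + 1` knot intervals, which is the trade-off between regularity and localization discussed in the text.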

Family of basis splines. We generate a family of basis functions by composing $B^1$ with affine functions. We follow the ideas of a multi-resolution approximation [Forster, 2011], with possibly non-dyadic cuts. This last point is relevant for hours of the day or hours of the week since $24 = 2^3 \times 3$ and $168 = 7 \times 24$ are not powers of 2.

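Consider a sequence of cuts $(c_r)_{r \in \mathbb{N}} \in (\mathbb{N} \setminus \{0, 1\})^{\mathbb{N}}$ and a level of detail $\ell \in \mathbb{N} \setminus \{0, 1\}$. We define the granularity $C := \prod_{r=1}^{\ell} c_r$. It is inversely proportional to the support width of the splines that we build. Given a translation parameter $\tau \in \mathbb{Z}$, we define the perspective function $B_{\tau,C}$ with support $[\frac{\tau}{C}, \frac{\tau+2}{C}]$ as:

$B_{\tau,C} : \xi \mapsto \frac{1}{C} B^1(C\xi - \tau)$. (3.3)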

As illustrated in Figure 3.2b, the support of $B_{\tau,C}$ is centered at $\frac{\tau+1}{C}$.
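The granularity and the transformed spline of equation (3.3) can be sketched as follows, assuming NumPy. The function names and the particular sequence of cuts are hypothetical choices made for this illustration.

```python
import numpy as np
from math import prod

def b1(xi):
    """Cardinal B-spline of degree 1 with support [0, 2], equation (3.1)."""
    xi = np.asarray(xi, dtype=float)
    pos = lambda z: np.maximum(z, 0.0)
    return pos(xi) - 2.0 * pos(xi - 1.0) + pos(xi - 2.0)

def granularity(cuts, level):
    """C = c_1 * ... * c_level for a sequence of cuts, each at least 2."""
    return prod(cuts[:level])

def b_tau_c(xi, tau, C):
    """Transformed spline B_{tau,C}(xi) = B^1(C * xi - tau) / C, equation (3.3)."""
    return b1(C * np.asarray(xi, dtype=float) - tau) / C

cuts = [3, 2, 2, 2]          # hypothetical non-dyadic cuts: 3 * 2 * 2 * 2 = 24
C = granularity(cuts, 4)
tau = 5
left, center, right = tau / C, (tau + 1) / C, (tau + 2) / C
print(C, b_tau_c([left, center, right], tau, C))
```

The spline vanishes at both ends of its support and reaches its maximum, 1/C, at the center (tau + 1)/C.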

FIGURE 3.2: The transformed cardinal B-spline $B^1$. (a) The cardinal B-spline $B^1$.

Restriction to $[0, 1]$. To describe a general procedure, we consider that the inputs have already been affinely transformed. The cyclic inputs with original values in $[0, c]$, where $c$ is the maximum value (e.g. $c = 167 = 7 \times 24 - 1$ for the hour of the week), now lie in $[0, 1 - \frac{1}{c+1}]$, and the other inputs have been transformed so that the minimum is 0 and the maximum is 1. This is detailed in Appendix C.
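A minimal sketch of these two rescalings is given below, assuming NumPy. The exact procedure is the one of Appendix C, which is not reproduced here; the functions `rescale_cyclic` and `rescale_acyclic` and the temperature bounds are assumptions made for illustration, chosen to be consistent with the target intervals stated above.

```python
import numpy as np

def rescale_cyclic(values, c):
    """Map cyclic values in [0, c] to [0, 1 - 1/(c + 1)] by dividing by c + 1."""
    return np.asarray(values, dtype=float) / (c + 1)

def rescale_acyclic(values, train_min, train_max):
    """Min-max scaling with bounds computed on the training data."""
    values = np.asarray(values, dtype=float)
    return (values - train_min) / (train_max - train_min)

# Hour of the week: c = 167, so hour 167 maps to 167/168 = 1 - 1/168.
print(rescale_cyclic([0, 84, 167], c=167))

# Hypothetical temperatures with training range [-5, 35]:
# an unseen value of 40 maps outside [0, 1].
print(rescale_acyclic([-5.0, 35.0, 40.0], train_min=-5.0, train_max=35.0))
```

The last value illustrates why unseen acyclic inputs may fall outside $[0, 1]$ even though the training data does not.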

Thus, we only select those elements whose support has a non-trivial intersection with the interval $[0, 1]$:

$\mathcal{S}^C := \{B_{\tau,C},\ \tau \in [\![-1, C-1]\!]\}$. (3.4)

The family $\mathcal{S}^C$ spans the set of piecewise linear continuous functions that are zero outside $[-\frac{1}{C}, 1 + \frac{1}{C}]$ and whose derivative may be discontinuous at the knots $\frac{m}{C}$, $m \in [\![-1, C+1]\!]$. We adapt it for the acyclic and cyclic inputs classified in Table 3.1: to anticipate extrapolation in the first case and to satisfy the additional constraint in the second case.
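The family of equation (3.4) can be evaluated as a design matrix with one column per translation. The sketch below assumes NumPy; the function name `family_s` and the choice C = 4 are hypothetical.

```python
import numpy as np

def b1(xi):
    """Cardinal B-spline of degree 1 with support [0, 2], equation (3.1)."""
    pos = lambda z: np.maximum(z, 0.0)
    return pos(xi) - 2.0 * pos(xi - 1.0) + pos(xi - 2.0)

def family_s(xi, C):
    """Matrix whose columns are B_{tau,C}(xi) for tau = -1, ..., C - 1."""
    xi = np.asarray(xi, dtype=float)
    taus = np.arange(-1, C)
    return b1(C * xi[:, None] - taus[None, :]) / C

C = 4
xi = np.linspace(0.0, 1.0, 11)
design = family_s(xi, C)
print(design.shape)          # one row per point, C + 1 columns
print(design.sum(axis=1))    # constant on [0, 1]
```

On $[0, 1]$ the columns sum to the constant 1/C, since neighboring hats overlap on exactly one knot interval.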

Acyclic features. Generally, estimators near the boundaries of the observed domain tend to be erratic, which led Friedman et al. [2001, Section 5.2.1] to consider the natural cubic splines. These are piecewise third-order polynomials with the additional condition, from which the adjective natural derives, that the second-order derivative is zero at the two edges of the domain, so that the extrapolation outside the observed domain is linear.

In our case, although the training data is affinely transformed to lie in the interval $[0, 1]$, the same transformation applied to new unseen data might produce values outside $[0, 1]$. Following the ideas of the natural splines, we choose, instead of $\mathcal{S}^C$, a family of functions whose span is the set of piecewise linear functions with possible discontinuities [...] outside of $[\frac{1}{C}, 1 - \frac{1}{C}]$. Let $\mathcal{A}^C$ denote this family of $C + 1$ continuous transformations for acyclic inputs, shown in Figure 3.3:

$\mathcal{A}^C = \{\varphi_0 : \xi \mapsto \max(0, 1$ [...]

FIGURE 3.3: Family of univariate acyclic splines. Modification of the acyclic basis functions for extrapolation purposes.

Finally, we denote by $\varphi$ the concatenation of the linearly independent elements $(\varphi_0, \dots, \varphi_C)$ of $\mathcal{A}^C$. Since $\sum_{j=0}^{C} \varphi_j$ equals the constant function $\xi \mapsto \frac{1}{C}$, the union of such families for different inputs will not be linearly independent. This is the case, for instance, if we consider an additive model with at least 2 temperatures as inputs, each one being associated with a different vector of features.
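To make the dependence explicit, consider two inputs $\xi$ and $\zeta$ with acyclic families $(\varphi_j)_{0 \le j \le C}$ and $(\psi_k)_{0 \le k \le C'}$. The second family and its granularity $C'$ are notation introduced only for this illustration. Each family sums to a constant, so a non-trivial linear combination of the joint family vanishes:

```latex
\sum_{j=0}^{C} \varphi_j(\xi) = \frac{1}{C}
\quad\text{and}\quad
\sum_{k=0}^{C'} \psi_k(\zeta) = \frac{1}{C'}
\quad\Longrightarrow\quad
C \sum_{j=0}^{C} \varphi_j(\xi) \;-\; C' \sum_{k=0}^{C'} \psi_k(\zeta) = 1 - 1 = 0
\quad\text{for all } (\xi, \zeta).
```

The coefficients $(C, \dots, C, -C', \dots, -C')$ are not all zero, hence the joint family is linearly dependent.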

Cyclic features. For a cyclic input, there is an additional constraint, but extrapolation is no longer a concern. Among the functions of $\mathcal{S}^C$, only $B_{-1,C}$ and $B_{C-1,C}$ do not have a trivial cyclic extension. However, we see in Figure 3.4 that they are naturally replaced by merging them.

FIGURE 3.4: Family of univariate cyclic splines. Modification of the basis to satisfy the cyclic constraint. The pair $(B_{-1,C}, B_{C-1,C})$ in $\mathcal{S}^C$ is substituted with $\varphi_0$ in $\mathcal{C}^C$.

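Therefore, we define the family of cyclic basis functions:

$\mathcal{C}^C = \{\varphi_0 : \xi \in \mathbb{R}/\mathbb{Z} \mapsto \max[B_{-1,C}(\xi), B_{C-1,C}(\xi)]\} \cup \{\varphi_{\tau+1} : \xi \in \mathbb{R}/\mathbb{Z} \mapsto B_{\tau,C}(\xi),\ \tau \in [\![0, C-2]\!]\}$.

Because of the additional constraint, the number of elements in $\mathcal{C}^C$ is only $C$.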
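The cyclic family can be sketched as follows, assuming NumPy and assuming that an element of $\mathbb{R}/\mathbb{Z}$ is represented by its value modulo 1 in $[0, 1)$. The function names and the choice C = 4 are hypothetical.

```python
import numpy as np

def b1(xi):
    """Cardinal B-spline of degree 1 with support [0, 2], equation (3.1)."""
    pos = lambda z: np.maximum(z, 0.0)
    return pos(xi) - 2.0 * pos(xi - 1.0) + pos(xi - 2.0)

def b_tau_c(xi, tau, C):
    """Transformed spline B_{tau,C}(xi) = B^1(C * xi - tau) / C, equation (3.3)."""
    return b1(C * xi - tau) / C

def family_cyclic(xi, C):
    """Matrix with columns (phi_0, ..., phi_{C-1}) of the cyclic family, C >= 2."""
    xi = np.mod(np.asarray(xi, dtype=float), 1.0)  # representative in [0, 1)
    # phi_0 merges the two boundary splines of the non-cyclic family.
    phi0 = np.maximum(b_tau_c(xi, -1, C), b_tau_c(xi, C - 1, C))
    # phi_{tau+1} = B_{tau,C} for tau = 0, ..., C - 2.
    others = [b_tau_c(xi, tau, C) for tau in range(C - 1)]
    return np.column_stack([phi0] + others)

C = 4
xi = np.linspace(0.0, 0.9, 10)
features = family_cyclic(xi, C)
print(features.shape)                                        # C columns
print(np.allclose(features, family_cyclic(xi + 1.0, C)))     # 1-periodic
```

Merging the two boundary splines removes one column compared with the non-cyclic family, which matches the count of C elements stated above.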

We denote by $\varphi$ the multivariate feature obtained by concatenating the elements $(\varphi_0, \dots, \varphi_{C-1})$ of $\mathcal{C}^C$. Note that for an input with discrete values in $\{0, \frac{1}{c+1}, \dots, 1 - \frac{1}{c+1}\}$ where $c \in \mathbb{N}$, we can build indicators from the spline representations above if we choose a sufficient granularity $C = c + 1$. This will be of particular interest when considering as input the hour of the week $h \in [\![0, 167]\!]$ with $c = 167$: having an indicator for each hour of the week means that the set of basis functions spans all functions of these discrete values.
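The indicator property can be checked numerically for the hour of the week. The sketch below assumes NumPy and represents a cyclic input by its value modulo 1; the function name `cyclic_features` is hypothetical.

```python
import numpy as np

def b1(xi):
    """Cardinal B-spline of degree 1 with support [0, 2], equation (3.1)."""
    pos = lambda z: np.maximum(z, 0.0)
    return pos(xi) - 2.0 * pos(xi - 1.0) + pos(xi - 2.0)

def cyclic_features(xi, C):
    """Evaluate (phi_0, ..., phi_{C-1}) of the cyclic family at the points xi."""
    xi = np.mod(np.asarray(xi, dtype=float), 1.0)
    taus = np.arange(-1, C)                        # tau = -1, ..., C - 1
    S = b1(C * xi[:, None] - taus[None, :]) / C    # columns B_{tau,C}
    phi0 = np.maximum(S[:, 0], S[:, -1])           # merge B_{-1,C} and B_{C-1,C}
    return np.column_stack([phi0, S[:, 1:-1]])     # then B_{0,C}, ..., B_{C-2,C}

c = 167
C = c + 1
hours = np.arange(C)                   # h = 0, ..., 167
features = cyclic_features(hours / C, C)

# Up to the factor 1/C, row h has a single non-zero entry in column h.
is_indicator = np.allclose(C * features, np.eye(C))
print(features.shape, is_indicator)
```

Each rescaled hour falls exactly on a knot, where only one hat function is non-zero, so the spline features reduce to scaled indicators.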

3.1.2 Interactions

Bivariate features. To allow interactions in the model between the different inputs, we build bivariate features with tensor products of univariate features [Bakin et al., 1999; Binev et al., 2007]. Consider two inputs $\xi, \zeta \in [0, 1]$, for instance the past load and the hour of the week, and the associated vectors of features $\varphi \in (\mathbb{R}^{\mathbb{R}})^p$ and $\psi \in (\mathbb{R}^{\mathbb{R}})^q$ built in Section 3.1.1, where $p, q \in \mathbb{N}^*$. We define the interaction features with the tensor product:

$\varphi \otimes \psi \in (\mathbb{R}^{\mathbb{R}})^{p,q}$. (3.5)

Given two inputs $\xi, \zeta \in \mathbb{R}$, it is convenient to see the covariates associated with this interaction as a matrix:

$\Phi(\xi, \zeta) := \varphi(\xi)\psi(\zeta)^T \in \mathbb{R}^{p,q}$. (3.6)

Thus, any linear combination of these covariates with a coefficient matrix $M \in \mathbb{R}^{p,q}$ can be written:

$\langle \Phi(\xi, \zeta), M \rangle$. (3.7)
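Equations (3.6) and (3.7) can be illustrated numerically. The sketch below assumes NumPy and assumes that the bracket in (3.7) denotes the Frobenius inner product of two matrices; the feature vectors and the coefficient matrix are random placeholders, not values from the source.

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 3, 4

# Placeholders standing in for phi(xi) in R^p and psi(zeta) in R^q.
phi_xi = rng.random(p)
psi_zeta = rng.random(q)
M = rng.random((p, q))                 # hypothetical coefficient matrix

# Equation (3.6): the interaction covariates as a p x q matrix.
Phi = np.outer(phi_xi, psi_zeta)

# Equation (3.7): linear combination as a Frobenius inner product.
prediction = np.sum(Phi * M)

# The same quantity written as a bilinear form phi(xi)^T M psi(zeta).
bilinear = phi_xi @ M @ psi_zeta
print(Phi.shape, np.isclose(prediction, bilinear))
```

Writing the combination as a bilinear form avoids forming the p x q matrix explicitly, which can matter when p and q are large.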

In the document: Benjamin Dubois, to obtain the degree of (pages 72-77)