

Faculté des Sciences, Département de Mathématique

Inference for stationary functional time series:

dimension reduction and regression

Łukasz KIDZIŃSKI

Thesis presented for the degree of Doctor of Sciences, orientation Statistics

Supervisor: Siegfried Hörmann

Jury: Maarten Jansen, Davy Paindaveine, Thomas Verdebout, Laurent Delsol, Piotr Kokoszka

September 2014


“Simplicity is the final achievement. After one has played a vast quantity of notes and more notes, it is simplicity that emerges as the crowning reward of art.”

Fryderyk Chopin, quoted in If Not God, Then What? by Joshua Fost.


Acknowledgements

First and foremost, my heartfelt thanks to Professor Siegfried Hörmann who, throughout three years, taught me how to be a great scientist and a better person. I would like to thank him for all these long hours spent in front of the blackboard, for the passage from hard theoretical problems to neat and valuable solutions, for his incredible precision and attention to detail which finally drove me to be more careful, for his constantly positive attitude, charisma and expertise which will always be a unique example for me, and for the trust he gave me by letting me follow my own paths. It is a great honour and privilege to be his first PhD student.

My gratitude is also extended to Piotr Kokoszka, for the support he showed and keeps showing me from the moment we met, for his hospitality, and for the priceless opportunity to work together at Colorado State University.

My thanks go to David Brillinger as well, who found time to share his experience with me at UC Berkeley, regardless of the many obstacles.

My sincere thanks also go out to Cheng Soon Ong for accommodating me in the challenging environment of ETH Zürich.

My thanks also go to my thesis committee for their guidance and our yearly recaps: to Davy Paindaveine for his sharp remarks and exceptional humor, to Maarten Jansen for giving a great example of scientific commitment, to Pierre Patie for valuable remarks at the beginning of my work, and to Thomas Bruss for sharing his experience through countless stories and digressions during lunches and coffee breaks. Likewise, my thanks to the other members of my jury, Thomas Verdebout and Laurent Delsol, for accepting my request and for their time.

Next, I would like to acknowledge the Communauté française de Belgique, for the grant within the Actions de Recherche Concertées (2010–2015), and the Belgian Science Policy Office, for the grant within the Interuniversity Attraction Poles (2012–2017). Thanks for the indispensable means which allowed me to spend three years on my project.

Furthermore, I am aware that a scientific journey starts much earlier than in a doctoral school. I would not be who I am without all the support from teachers, starting from my childhood up till now; I know that this thesis is not just mine, but their success too. In particular, I would like to thank my primary school teacher Krzysztof Łukasiewicz and my high school teacher Jerzy Konarski, who taught me to enjoy mathematics.

Many fellow students and faculty members also supported me at the Université Libre de Bruxelles.

A great thank you to my office colleague Remi for all the necessary breaks for random mathematical problems, for refreshing algorithmic competitions, for chess games and simple discussions about the essence of the universe. Thanks to my second office colleague Rabih, and my neighbours Sarah, Carine, Stavros, Dominik, Germin and Christophe, fellow students Robson and Isabel, and many others, for teaching me French and for maintaining my sanity through chats, dinners, jogging and more.

Thanks to the whole Gauss'oh Fast team for the taste of victory, and to the BSSM co-organisers, Julien, Julie, Patrick, Yves, Thomas, Nicolas and others, for quite the same reason.

I am also honoured by the support from outside the university. Thanks to Daniel, Bella, Felipe, Astrid, Senna, Thiago, Wolney, Anna, Omid, Maryam, and Sarah for enriching discussions about science, politics, economics and any sort of regular gossip during Friday dinners. Thanks to Jan, Dominika and Michał for being there whenever I needed help. Thanks to my fantastic Polish friends: Sebastian for his persistence, Karol for finding time for me no matter what, Natalia who makes me remember I can achieve everything, and Kinga for her exceptional life attitude. Thanks to Leo for his constant positive thinking.

Thanks to my family, to my mother and sister who taught me the value of time, who always believed in me and who will always protect me, to my father who was always motivating me to reach for more.

Last, but certainly not least, I must acknowledge with tremendous and deep gratitude my lovely Magda, for her limitless smiles, trust and support for all my ideas and decisions no matter how crazy they seem. Together we are a team and for such a team every challenge is feasible.


Contents

Acknowledgements
Table of contents
Introduction
1 Functional data analysis
    1.1 Motivation
    1.2 Brief overview of functional data research
    1.3 Hilbert spaces
    1.4 Notation
    1.5 Representation and fit
    1.6 Dimension reduction
2 Functional Time Series
    2.1 Stationarity
    2.2 Model approach
    2.3 Lp-m-approximability
    2.4 Mixing conditions
    2.5 Cumulant condition
    2.6 Discussion
3 Linear models
    3.1 Linear regression
    3.2 Filtering
    3.3 Frequency domain methods
4 Objectives and structure of the thesis

I A Note on Estimation in Hilbertian Linear Models
1 Introduction
2 Estimation of Ψ
    2.1 Notation
    2.2 Setup
    2.3 The estimator
    2.4 Consistency results
    2.5 Applications to functional time series
3 Simulation study
4 Conclusion
5 Proofs
    5.1 Proof of Theorem 1
    5.2 Proof of Theorem 2
6 Acknowledgement

II Estimation in functional lagged regression
1 Introduction
2 Model specification
3 Estimation of the impulse response operators
4 Consistency of the estimators
5 Assessment of the performance in finite samples
    5.1 Data generating processes and numerical implementation of the estimators
    5.2 Simulation settings and results
6 Proofs
    6.1 Auxiliary lemmas
    6.2 Proofs of Lemma 1 and Theorem 1
Appendix
    1.1 Relation to ordinary functional regression
    1.2 Description of the FPE approach
    1.3 Proofs of Lemma 6 and Proposition 1

III Dynamic Functional Principal Components
1 Introduction
2 Illustration of the method
3 Methodology for L2 curves
    3.1 Notation and setup
    3.2 The spectral density operator
    3.3 Dynamic FPCs
    3.4 Estimation and asymptotics
4 Practical implementation
5 A real-life illustration
6 Simulation study
7 Conclusion

Appendices
A General methodology and proofs
    A.1 Fourier series in Hilbert spaces
    A.2 The spectral density operator
    A.3 Functional filters
    A.4 Proofs for Section 3
B Large sample properties
C Technical results and background
    C.1 Linear operators
    C.2 Random sequences in Hilbert spaces
    C.3 Proofs for Appendix A

General Bibliography


Introduction


The continuous advances in data collection and storage techniques allow us to observe and record real-life processes in great detail. Examples include financial transaction data, fMRI images, satellite photos, the distribution of the Earth's pollution over time, etc. Due to the high dimensionality of such data, classical statistical tools become inadequate and inefficient. New methods are needed, and one of the most prominent techniques in this context is functional data analysis (FDA).

The main objective of this work is to analyze temporal dependence in FDA. Such dependence occurs, for example, if the data consist of a continuous time process which has been cut into segments, days for instance. We are then in the context of so-called functional time series.

Many classical time series problems arise in this new setup, such as modeling and prediction. In this work we will be concerned mainly with regression and dimension reduction, comparing time-domain with frequency-domain methods.

In this chapter, we further discuss the motivational examples and introduce the articles on which this thesis is based.

1 Functional data analysis

1.1 Motivation

The main concern of statistics is to obtain essential information from a sample of observations $X_1, X_2, \ldots, X_N$. We are given a finite sample of size $N \in \mathbb{N}$, where the $X_i$ can be scalars, vectors or more complex objects, like genotypes, fMRI scans or images.

Functional data analysis deals with observations which can be naturally expressed as functions.

Figures 1, 2 and 3 present several cases from different areas of science which fit into the framework of functional data analysis.

When we deal with a physical process, it is often natural to assume that it behaves in a continuous manner and that the observations do not oscillate significantly between the measurements. Although, in the Digital Era, we rarely record analog processes continuously, we often have enough datapoints that interpolation does not cause a significant measurement error. Models incorporating this additional structure can lead to more precise and meaningful findings. In this context, FDA can be seen as a tool which embeds the continuity feature into the model.

On the other hand, beyond providing a good approximation of a continuous process, FDA can also prove useful in noisy, discontinuous cases. There, FDA serves as a tool for denoising and smoothing the data, and is beneficial whenever the underlying process is the main concern.

From a pragmatic perspective, functional data can be seen simply as infinite-dimensional vectors, with extended notions of mean and variance, and thus we may be tempted to employ classical multivariate techniques. However, there are many practical and theoretical problems that need to be addressed. For example, in the context of linear models, the inversion of the (infinite dimensional) covariance operator is not straightforward and needs to be treated carefully, from both the theoretical and the practical perspective. This issue, together with our novel approach to the classical functional regression problem, is the topic of Chapter 1.

Figure 1: Berkeley Growth Data: heights of 20 girls from ages 0 through 18 (left); the growth process is easier to visualize in terms of acceleration (right). Tuddenham and Snyder [49], Ramsay and Silverman [43].

Figure 2: Lower lip movement (top), acceleration (middle) and EMG of a facial muscle (bottom) of a speaker pronouncing the syllable “bob”, for 32 replications. Malfait, Ramsay and Froda [32].

Figure 3: Projections of DNA minicircles on the planes given by the principal axes of inertia (three panels on the left: TATA curves; right: CAP curves). Mean curves are plotted in white. Panaretos, Kraus and Maddocks [36].

The FDA approach is also useful for a parsimonious representation of the data, taking advantage of their smoothness. Instead of looking at a function as a dense vector of values, we can often represent it as a linear combination of a handful of (well chosen) basis functions.

Finally, there are also advantages of the FDA approach which stem from the structure of the data. For example, one of the drawbacks of the acclaimed multivariate principal component analysis (PCA) is its scale dependence. It makes no sense to rescale a function componentwise (with different scaling factors at different arguments), and hence for the functional counterpart of PCA, functional principal component analysis (FPCA), the lack of scale-invariance is not an issue. A detailed introduction to functional principal components is given in Section 1.6. In Chapter 3 we describe an extension of the technique which benefits from the time-dependent framework.

1.2 Brief overview of functional data research

One of the most influential works in the field of FDA is the seminal book by Ramsay and Silverman [43]. Together with the accompanying R and Matlab libraries, which significantly facilitate both research and practice in the area, it is a main reference in the field. Many important results were adapted from the multivariate case, often taking advantage of the unique features of functional objects, whereas others, like the analysis of derivatives, were derived uniquely in this setting.

As a running example, Ramsay and Silverman [43] consider growth curves of 10 girls measured at a set of 31 ages. They argue that statistics based on derivatives can be more informative than the classical analysis of the curves themselves, performed earlier by Tuddenham and Snyder [49].

Practical applications of functional data analysis are spread across many areas of science and engineering. Panaretos et al. [36] use closed curves $[0,1] \to \mathbb{R}^3$ to analyze the behavior of DNA minicircles, providing testing methodology for the comparison of two classes of curves. Aston and Kirch [2] analyze stationarity and change point detection for functional time series, with applications to fMRI data. Hadjipantelis et al. [18] analyze Mandarin using functional principal components. Functional time series also emerge naturally in financial applications: Kokoszka and Reimherr [29] analyze the predictability of the shape of intraday price curves.

From the theoretical perspective, Berkes et al. [5] extensively studied the problem of change points within a set of functional observations, whereas Horváth et al. [26] recently investigated testing for stationarity. Many multivariate techniques were extended to the infinite dimensional setup, like functional dynamic factor models [20] or functional depth [31].

These works are only a fraction of the ongoing research; for a broader survey of applications and theory we refer to the books [43], [16], [25] and [6].

1.3 Hilbert spaces

For most of the results presented in this work we only require that the functional space is a separable Hilbert space, i.e. a complete inner product space with a countable basis. This allows us to state more general results, so that the space of square integrable functions $L^2([a,b])$, $a < b$, is a special case.

Although most of our examples concern real-valued functions defined on a finite interval, one should keep in mind other possible applications, including, for example, multivariate functions, images and audio files, as described in Section 1.2.

1.4 Notation

Let $H_1, H_2$ be two (not necessarily distinct) separable Hilbert spaces. We denote by $\mathcal{L}(H_i, H_j)$, $i, j \in \{1,2\}$, the space of bounded linear operators from $H_i$ to $H_j$. Further, we write $\langle \cdot,\cdot \rangle_H$ for the inner product on a Hilbert space $H$ and $\|x\|_H = \sqrt{\langle x, x\rangle_H}$ for the corresponding norm.

For $\Phi \in \mathcal{L}(H_i, H_j)$ we denote by $\|\Phi\|_{\mathcal{L}(H_i,H_j)} = \sup_{\|x\|_{H_i} \le 1} \|\Phi(x)\|_{H_j}$ the operator norm, and by $\|\Phi\|_{\mathcal{S}(H_i,H_j)} = \big(\sum_{k \ge 1} \|\Phi(e_k)\|_{H_j}^2\big)^{1/2}$, where $e_1, e_2, \ldots \in H_i$ is any orthonormal basis (ONB) of $H_i$, the Hilbert–Schmidt norm of $\Phi$. It is well known that this norm is independent of the choice of the basis. Furthermore, with the inner product $\langle \Phi, \Theta \rangle_{\mathcal{S}(H_1,H_2)} = \sum_{k \ge 1} \langle \Phi(e_k), \Theta(e_k) \rangle_{H_2}$, the space $\mathcal{S}(H_1, H_2)$ is again a separable Hilbert space. To simplify notation we use $\mathcal{L}_{ij}$ instead of $\mathcal{L}(H_i, H_j)$ and, in the same spirit, $\mathcal{S}_{ij}$, $\|\cdot\|_{\mathcal{L}_{ij}}$, $\|\cdot\|_{\mathcal{S}_{ij}}$ and $\langle \cdot,\cdot \rangle_{\mathcal{S}_{ij}}$.

All random variables appearing in this work are assumed to be defined on a common probability space $(\Omega, \mathcal{A}, P)$. A random element $X$ with values in $H$ is said to be in $L^p_H$ if $\nu_{p,H}(X) := (E\|X\|_H^p)^{1/p} < \infty$; more conveniently, we shall say that $X$ has $p$ moments. If $X$ possesses a first moment, then $X$ possesses a mean $\mu$, determined as the unique element for which $E\langle X, x\rangle_H = \langle \mu, x\rangle_H$ for all $x \in H$. For $x \in H_i$ and $y \in H_j$, let $x \otimes y \colon H_i \to H_j$ be the operator defined by $x \otimes y\,(v) = \langle x, v\rangle\, y$. If $X \in L^2_H$, then it possesses a covariance operator $C$, given by $C = E[(X - \mu) \otimes (X - \mu)]$. It is easily seen that $C$ is a Hilbert–Schmidt operator. Assume $X, Y \in L^2_H$. Following Bosq [6], we say that $X$ and $Y$ are orthogonal ($X \perp Y$) if $E\, X \otimes Y = 0$. A sequence of orthogonal elements in $H$ with a constant mean and a constant covariance operator is called $H$-white noise.

1.5 Representation and fit

Since we are dealing with infinite dimensional objects, we need to represent and approximate them in a convenient way. This is important from the practical as well as the theoretical perspective: due to limited computer memory we always work with approximations, and for computational reasons we want these approximations to be low dimensional.

One possibility for representing a curve is to select a sufficiently fine grid and to process the vector of values of the function on the intervals induced by the gridpoints. This approach, often used in practice, does not benefit from the continuity of functions.

In this work we follow the ideas popularized by Ramsay and Silverman [43], based on basis function expansions, most prominently the Karhunen–Loève and Fourier expansions. Let $\{e_i\}_{i \ge 1}$ be an orthonormal basis of a separable Hilbert space $H$. Then any element $x \in H$ can be uniquely represented as
$$x = \sum_{i=1}^{\infty} \langle x, e_i \rangle e_i.$$
Note that, by Parseval's formula,
$$\|x\|^2 = \sum_{i=1}^{\infty} |\langle x, e_i \rangle|^2.$$
Since this sum is finite, for any $\varepsilon > 0$ there exists $d$ such that
$$\sum_{i > d} |\langle x, e_i \rangle|^2 < \varepsilon.$$
We can therefore approximate the function with arbitrary precision $\varepsilon > 0$ using only the first $d$ basis elements. This approach is consistent with intuition: if we use, for example, Fourier basis functions, then the high frequency components are expected to be negligible.
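To make the truncation step concrete, here is a minimal numerical sketch, assuming curves sampled on a regular grid of $[0,1]$ and an orthonormal Fourier basis; the grid size, tolerance and example curve are illustrative choices, not taken from the thesis.

```python
import numpy as np

# Minimal sketch (illustrative assumptions): approximate a curve sampled on
# a fine grid by the first d elements of a Fourier basis, with d chosen so
# that the tail energy (Parseval) falls below a tolerance eps.

def fourier_basis(n_grid, n_basis):
    """Orthonormal Fourier basis on [0, 1), evaluated on a regular grid."""
    t = np.linspace(0.0, 1.0, n_grid, endpoint=False)
    B = np.empty((n_basis, n_grid))
    B[0] = 1.0
    for j in range(1, n_basis):
        k = (j + 1) // 2
        B[j] = np.sqrt(2.0) * (np.sin if j % 2 else np.cos)(2 * np.pi * k * t)
    return t, B

n_grid, n_basis, eps = 1000, 51, 1e-4
t, B = fourier_basis(n_grid, n_basis)
x = np.exp(-t) * np.sin(4 * np.pi * t)         # example curve

coef = B @ x / n_grid                          # <x, e_i> via Riemann sums
energy = np.cumsum(coef ** 2)

# Smallest d whose tail energy (within the computed basis) is below eps.
d = int(np.searchsorted(energy, energy[-1] - eps)) + 1
x_d = coef[:d] @ B[:d]                         # truncated expansion
print(d, np.mean((x - x_d) ** 2))              # d and the squared L2 error
```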

Although the fitting and representation of functional data is an important and intensively studied topic in its own right, in this work we assume that the observations are fully observed, i.e. we are given the actual curves. For more information on fitting we refer to [43].

1.6 Dimension reduction

From a theoretical perspective, a curve observation $X$ is an intrinsically infinite-dimensional object. Besides the choice of an appropriate basis, there is thus also the need for dimension reduction.

Arguably, functional principal component analysis (FPCA) is the key technique for this problem. Like its multivariate counterpart, FPCA is based on the analysis of the covariance operator and is concerned with finding the directions which contribute most to the variability of the observations.

Let $X$ be a functional random variable taking values in some Hilbert space $H$ and let $C = E\,X \otimes X$ be its covariance operator. (Without loss of generality we assume, here and in many places below, that $EX = 0$.) For $C$ to exist, we assume that $E\|X\|^2 < \infty$. One can show that $C$ is a symmetric, positive definite Hilbert–Schmidt operator, and hence, by the spectral theorem, it can be decomposed as
$$C = \sum_{i=1}^{\infty} \lambda_i\, e_i \otimes e_i, \qquad (1)$$
where $\lambda_1 \ge \lambda_2 \ge \cdots \ge 0$ are the eigenvalues of $C$ and $\{e_i\}_{i \in \mathbb{N}}$ are the corresponding eigenfunctions, forming an orthonormal basis of the underlying Hilbert space $H$.

If we pick the first $d$ basis elements $\{e_i\}_{i=1}^d$ and project the observation $X$ onto the space spanned by them, we obtain the optimal $d$-dimensional approximation in terms of the mean squared error, i.e.
$$E\Big\|X - \sum_{i=1}^{d} \langle X, e_i \rangle e_i\Big\|^2 \le E\Big\|X - \sum_{i=1}^{d} \langle X, e_i' \rangle e_i'\Big\|^2$$
for any other orthonormal collection $\{e_i'\}_{1 \le i \le d}$. The directions $e_i$ are called the principal components of $X$ and the coefficients $\langle X, e_i \rangle$ are called PC scores. A simple computation shows that PC scores are uncorrelated, which is another key feature.

Figure 4: Horizontal component of the magnetic field measured in one-minute resolution at the Honolulu magnetic observatory from 1/1/2001 00:00 UT to 1/7/2001 24:00 UT; 1440 measurements per day.

We remark again that a main advantage of FPCA over the multivariate version is that scale-invariance is not relevant. Consequently, it is much easier to interpret functional PCs and linear combinations thereof. For a detailed theory of multivariate principal components we refer to [28], and to [45] for the functional setup.
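For intuition, the following is a minimal sketch of empirical FPCA for fully observed curves on a common grid: estimate the covariance operator, diagonalize it as in (1), and compute scores. The toy data and all sizes are illustrative assumptions, not part of the thesis.

```python
import numpy as np

# Minimal sketch (illustrative assumptions): empirical FPCA on a grid.
rng = np.random.default_rng(0)
N, n_grid, d = 200, 100, 3
t = np.linspace(0, 1, n_grid)
w = 1.0 / n_grid                           # quadrature weight

# Toy curves: random combinations of a few smooth shapes plus noise.
shapes = np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t), t - 0.5])
X = rng.normal(size=(N, 3)) @ shapes + 0.1 * rng.normal(size=(N, n_grid))

Xc = X - X.mean(axis=0)                    # centre at the sample mean
C = Xc.T @ Xc / N                          # kernel of the covariance operator

# Spectral decomposition (1); eigh returns ascending order, so reverse it.
lam, V = np.linalg.eigh(C * w)
lam, V = lam[::-1], V[:, ::-1]
E = V / np.sqrt(w)                         # eigenfunctions with int e_i^2 = 1

scores = Xc @ E[:, :d] * w                 # PC scores <X_n, e_i>
X_hat = scores @ E[:, :d].T                # optimal rank-d reconstruction
print(lam[:d] / lam.sum())                 # fraction of variance explained
```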

FPCA has gained popularity in both the iid and the time-dependent setup. However, in Chapter 3 we argue that this technique is no longer optimal for time series and may lead to misconceptions when used carelessly. We then propose an extension of FPCA which benefits from the temporal dependence structure.

2 Functional Time Series

In many practical situations functions are naturally ordered in time, for example when we deal with daily observations of the stock market or with sequences of tumor scans. We are then in the context of a so-called functional time series (FTS).

As a motivating example consider Figure 4. Here the assumption of independence may be too strong: values at the beginning of each day are highly correlated with those at the end of the preceding day. Moreover, we see that big jumps are often followed by significant drops.

These and similar features may indicate significant temporal dependence, not just within a subject but also between different subjects (e.g. days). In this section we discuss possible frameworks which allow us to quantify, test and exploit this additional information.


2.1 Stationarity

Many physical processes are known to have a time-invariant distribution. This motivates the frequentist approach to time series, where we assume that the structure does not change in time and draw inference from estimated covariances. A functional test for stationarity was recently introduced by Horváth et al. [26]. Non-stationary time series are also extensively studied, but they are beyond the scope of this work.

Let $\{X_t\}$ be a series of random functions. We say that $\{X_t\}$ is stationary in the strong sense if for any $h \in \mathbb{Z}$, $k \in \mathbb{N}$ and any sequence of indices $t_1, t_2, \ldots, t_k$, the vectors $(X_{t_1}, \ldots, X_{t_k})$ and $(X_{h+t_1}, \ldots, X_{h+t_k})$ are identically distributed.

We also define weak stationarity by looking only at the second order structure of the series. We say that $\{X_t\}$ is weakly stationary if $E\|X_t\|^2 < \infty$ and

1. $EX_t = EX_0$ for each $t \in \mathbb{Z}$, and
2. $E\, X_t \otimes X_s = E\, X_{t-s} \otimes X_0$ for each $t, s \in \mathbb{Z}$.
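Under weak stationarity the lag-$u$ autocovariance operator $C_u^X = E\,X_u \otimes X_0$ depends on the lag only, so it can be estimated by averaging over time. A minimal sketch on a grid follows; the toy data-generating process is an assumption made purely for illustration.

```python
import numpy as np

# Minimal sketch (illustrative assumptions): empirical lag-u autocovariance
# kernels of a weakly stationary functional series observed on a grid.
rng = np.random.default_rng(1)
N, n_grid = 300, 50

# Toy functional AR(1): X_t = 0.5 X_{t-1} + noise (a crude discretisation).
X = np.zeros((N, n_grid))
for t in range(1, N):
    X[t] = 0.5 * X[t - 1] + rng.normal(size=n_grid)

Xc = X - X.mean(axis=0)

def autocov(Xc, u):
    """Empirical kernel of C_u, averaging X_{t+u}(s) X_t(r) over t."""
    n = len(Xc) - u
    return Xc[u:u + n].T @ Xc[:n] / n

C0, C1 = autocov(Xc, 0), autocov(Xc, 1)
print(np.linalg.norm(C1) / np.linalg.norm(C0))  # strength of lag-1 dependence
```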

2.2 Model approach

Arguably one of the most popular models of temporal dependence is the functional autoregressive model (FAR(p)), studied in great detail by Bosq [6]. In this model we assume that the state at time $t$ is a linear function of the $p$ previous states plus an independent white noise innovation. The main concern is the estimation of the $p$ linear operators involved. Once the AR structure is identified, we can profit from the explicit probabilistic structure and dynamics of the time series. We describe this model in detail in Chapter 1.

Many time series, however, cannot be approximated by an FAR(p) process, and the need for more complex models arises. ARMA or GARCH-type models (cf. [21]) could serve as alternatives, but the required theoretical foundations beyond the relatively simple autoregressions are still sparse.

Furthermore, for many time series it is not clear which model they follow, yet time series procedures may still apply. It is then preferable to impose only a certain dependence structure rather than require a particular model. In the next sections we introduce three popular notions of dependence and justify the choice of the framework employed throughout this work.

2.3 Lp-m-approximability

In this framework, weak dependence is defined by a "small" $L^p$ distance between the process and an approximation of it based on only the last $m$ innovations. This idea is made precise in the following two definitions.

Definition 1. Suppose $(X_n)_{n \ge 1}$ is a random process with values in $H$, and let $\mathcal{F}_n^- = \sigma(\ldots, X_{n-2}, X_{n-1}, X_n)$ and $\mathcal{F}_n^+ = \sigma(X_n, X_{n+1}, X_{n+2}, \ldots)$ be the $\sigma$-algebras generated by the terms up to time $n$ and from time $n$ on, respectively. The process $(X_n)$ is said to be $m$-dependent if $\mathcal{F}_n^-$ and $\mathcal{F}_{n+m}^+$ are independent.


In practice, processes usually do not have the property from Definition 1; however, they can often be approximated by such series. This motivates the following approach to weak dependence.

Definition 2 (Hörmann and Kokoszka [23]). A random sequence $\{X_n\}_{n \ge 1}$ with values in $H$ is called $L^p$-$m$-approximable if it can be represented as
$$X_n = f(\delta_n, \delta_{n-1}, \delta_{n-2}, \ldots),$$
where the $\delta_i$ are iid elements taking values in a measurable space $S$ and $f$ is a measurable function $f \colon S^{\infty} \to H$. Moreover, if $\delta_i'$ are independent copies of the $\delta_i$ defined on the same probability space, then for
$$X_n^{(m)} = f(\delta_n, \delta_{n-1}, \ldots, \delta_{n-m+1}, \delta_{n-m}', \delta_{n-m-1}', \ldots) \qquad (2)$$
we have
$$\sum_{m=1}^{\infty} \big(E\|X_m - X_m^{(m)}\|^p\big)^{1/p} < \infty.$$

Note that the independent copies in (2) are used to simplify proofs; the representation can be stated more intuitively as
$$X_n^{(m)} = f(\delta_n, \delta_{n-1}, \ldots, \delta_{n-m+1}, 0, 0, \ldots),$$
leading to analogous results. Let us also stress that the representation (2) is rather general and incorporates most time series models encountered in practice. Furthermore, checking the validity of the dependence condition reduces to computing $p$-th order moments, which is typically much simpler than establishing the classical mixing conditions explained next.

2.4 Mixing conditions

There exist numerous variants of mixing. We introduce the strong mixing (or $\alpha$-mixing) condition, which is one of the most prominent ones. In the functional context it has been used, e.g., by Aston and Kirch [1]. For an extensive introduction to mixing we refer to Bradley [8].

In this approach we quantify and bound the dependence between the $\sigma$-fields generated by the variables $X_0, X_{-1}, \ldots$ and $X_m, X_{m+1}, \ldots$ for a given $m \in \mathbb{N}$.

Definition 3. A strictly stationary process $\{X_j : j \in \mathbb{Z}\}$ is called strong mixing with mixing rate $r_m$ if
$$\sup_{A, B} |P(A \cap B) - P(A)P(B)| = O(r_m), \qquad r_m \to 0,$$
where the supremum is taken over all $A \in \sigma(\ldots, X_{-1}, X_0)$ and $B \in \sigma(X_m, X_{m+1}, \ldots)$.


2.5 Cumulant condition

Another approach to quantifying weak dependence is based on so-called cumulants, which express the higher order cross-moment structure. In the finite dimensional case it was popularized by Brillinger [9]; in the context of functional time series it was recently introduced by Panaretos and Tavakoli [37]. The $k$-th order cumulant kernel is given by
$$\mathrm{cum}\big(X_{t_1}(\tau_1), \ldots, X_{t_k}(\tau_k)\big) = \sum_{\nu = (\nu_1, \ldots, \nu_p)} (-1)^{p-1} (p-1)! \prod_{l=1}^{p} E\Big[\prod_{j \in \nu_l} X_{t_j}(\tau_j)\Big],$$
where the sum extends over all unordered partitions $\nu$ of $\{1, \ldots, k\}$. If we assume that $E\|X_0\|_2^l < \infty$ for $l \ge 1$, then the cumulant kernels are well defined in $L^2$. For a given cumulant kernel of order $2k$ one can define a $2k$-th order cumulant operator $R_{t_1, \ldots, t_{2k-1}} \colon L^2([0,1]^k, \mathbb{R}) \to L^2([0,1]^k, \mathbb{R})$ by
$$R_{t_1, \ldots, t_{2k-1}} h\,(\tau_1, \ldots, \tau_k) = \int_{[0,1]^k} \mathrm{cum}\big(X_{t_1}(\tau_1), \ldots, X_{t_{2k-1}}(\tau_{2k-1}), X_0(\tau_{2k})\big)\, h(\tau_{k+1}, \ldots, \tau_{2k})\, d\tau_{k+1} \cdots d\tau_{2k}.$$
We say that a time series satisfies the cumulant condition if, for all $k \ge 2$,

1. $E\|X_0\|_2^k < \infty$ and $\sum_{t_1, \ldots, t_{k-1} = -\infty}^{\infty} \|\mathrm{cum}(X_{t_1}, \ldots, X_{t_{k-1}}, X_0)\|_2 < \infty$;
2. $\|R_t\|_1 < \infty$, where $\|\cdot\|_1$ is the nuclear norm (Schatten 1-norm).

2.6 Discussion

Many results for stationary processes were obtained under strong mixing conditions. However, these are hard to verify in practice and exclude some important statistical models, such as AR(1) time series with discrete innovations.

Although many concepts of dependence have been developed in recent years, there is no dominant framework. Therefore, researchers try to state results in a general setting, using only basic results from the chosen dependence framework wherever possible. This approach allows future scientists to use different dependence concepts, as long as these basic results hold. An example can be found in the work of Aston and Kirch [1], who obtain their results under both mixing conditions and $L^p$-$m$-approximability.

In this work we take a similar approach, restricting ourselves to convergence results for eigenvectors and eigenvalues of covariance operators. We choose the $L^p$-$m$-approximability dependence structure, since all the results we require are established in Hörmann and Kokoszka [23].

3 Linear models

As we have already pointed out in previous sections, many multivariate techniques have a natural analogue in the functional data setup. In the sequel we are concerned with linear models. They constitute the fundamental framework in many areas of statistics and are thus naturally of great interest in functional data analysis.

We start by introducing classical linear regression, allowing the variables to be dependent in time. Next, we discuss time series models in the functional setup. Finally, we briefly discuss the advantages of frequency domain methods, which are used in Chapters 2 and 3 to exploit the temporal dependence structure.

3.1 Linear regression

One of the most popular frameworks in classical statistics is linear regression, where we try to quantify the linear dependence between two (possibly multivariate) variables $X$ and $Y$. The problem of finding a relation of this type can also be addressed in FDA.

Assume the model
$$Y_t = \beta(X_t) + \varepsilon_t, \qquad t \ge 1, \qquad (3)$$
where $\beta$ is a linear Hilbert–Schmidt operator from $H_1$ to $H_2$ and $(\varepsilon_t)$ is a strong white noise sequence, independent of $(X_t)$.

As we are concerned with functional time series, we will assume that $X_t$ and $Y_t$ are stationary and weakly dependent. The classical case of iid $X_t$ is of great scientific interest, and the interested reader is referred to [43] and [52].

Although functional linear regression shares many properties with its multivariate counterpart, there are again several important differences. In particular, the linear operator $\beta \colon H_1 \to H_2$ has an infinite dimensional domain, which considerably complicates the estimation. If we approach the problem in the classical way, multiplying both sides of (3) by $X_t$ and taking expectations, we get
$$C_{XY} = \beta C_X, \qquad (4)$$
where $C_{XY}$ is the cross-covariance operator of $X$ and $Y$, and $C_X$ is the covariance operator of $X$. The natural way to obtain $\beta$ is to apply the inverse of $C_X$ to both sides of equation (4), which yields
$$\beta = C_{XY} (C_X)^{-1}.$$
The main problem is that the operator $(C_X)^{-1}$ is not bounded. Indeed, the domain of $C_X^{-1}$ is only a subset $\mathcal{D}$, say, of $H_1$. To see this, note that formally $C_X^{-1}(x) = \sum_{k \ge 1} \lambda_k^{-1} \langle e_k, x\rangle e_k$, where the $\lambda_k$ and $e_k$ are the eigenvalues (tending to zero) and eigenfunctions of $C_X$. Hence $\mathcal{D} = \{x \in H_1 \colon \sum_{k \ge 1} \langle x, e_k\rangle^2 \lambda_k^{-2} < \infty\}$. The problem can be approached by regularization: for example, one may replace $C_X^{-1}$ by a finite dimensional approximation of the form $\sum_{k \le K} \lambda_k^{-1}\, e_k \otimes e_k$, where $K$ is a tuning parameter. This is still quite delicate when applied to the sample version: for large values of $K$, if we underestimate one of the small eigenvalues, its reciprocal explodes and leads to very unstable estimators; on the other hand, for small $K$ we may get a very poor approximation of $\beta$.
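Applied to sample covariances, the truncated inverse above yields the classical FPCA-based estimator of $\beta$. Here is a minimal sketch, assuming fully observed curves on a common grid and a hand-picked $K$; the data-driven choice of $K$ is the topic of Chapter 1, and the toy kernel and sizes below are illustrative.

```python
import numpy as np

# Minimal sketch (illustrative assumptions): regularised estimation of beta
# in (3) by inverting the sample covariance of X on its first K
# eigendirections only.
rng = np.random.default_rng(3)
N, n_grid, K = 500, 60, 5
w = 1.0 / n_grid                               # quadrature weight on [0, 1]
s = np.linspace(0, 1, n_grid)

# Toy data: rough regressor curves, a known kernel b(s, r), additive noise.
X = np.cumsum(rng.normal(size=(N, n_grid)), axis=1) * np.sqrt(w)
b = np.outer(np.sin(np.pi * s), np.cos(np.pi * s))
Y = X @ b.T * w + 0.05 * rng.normal(size=(N, n_grid))

Xc, Yc = X - X.mean(0), Y - Y.mean(0)
CX = Xc.T @ Xc / N                             # kernel of C_X
CXY = Yc.T @ Xc / N                            # kernel of C_XY, cf. (4)

lam, V = np.linalg.eigh(CX * w)                # operator eigenpairs of C_X
lam, V = lam[::-1], V[:, ::-1]
E = V / np.sqrt(w)                             # eigenfunctions, int e_k^2 = 1

# Truncated inverse sum_{k<=K} lam_k^{-1} e_k (x) e_k and plug-in estimator.
G = (E[:, :K] / lam[:K]) @ E[:, :K].T          # kernel of the truncated inverse
b_hat = CXY @ G * w                            # kernel of beta_hat

print(np.sqrt(np.sum((b_hat - b) ** 2)) * w)   # L2 error of the kernel
```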

This difficulty was addressed by Bosq [6], who gives an extensive survey of the problem. However, the proposed results are based on strong assumptions on the rate of decay of the eigenvalues, which are impossible to check in practice. In Chapter 1 we present an alternative, data-driven approach.

Finally, note that exactly the same technique can be used for lagged linear regression. Consider
$$Y_t = \sum_{k=0}^{m} \beta_k(X_{t-k}) + \varepsilon_t, \qquad (5)$$
where $m \in \mathbb{N}$. Let us introduce $Z_t = (X_t, X_{t-1}, \ldots, X_{t-m}) \in H_1^{m+1}$. Then the model can be written as
$$Y_t = B Z_t + \varepsilon_t, \qquad (6)$$
where $B \colon H_1^{m+1} \to H_2$ is a linear operator such that $B Z_t = \sum_{k=0}^{m} \beta_k(Z_t^{(k)})$.

Now, for estimating $B$ in (6), we can apply the same estimation procedures as for (3). This method of estimation in lagged regression models is efficient only for small dimensions and small $m$, as opposed to the technique discussed in Chapter 2, which gives estimates at any lag.
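As a usage note, for grid-sampled curves the stacked regressor $Z_t$ is just a concatenation of lagged rows, after which the estimator sketched in Section 3.1 can be reused on the product space. The helper below (the name stack_lags is ours, purely illustrative) shows the bookkeeping.

```python
import numpy as np

# Minimal sketch (illustrative helper): build Z_t = (X_t, ..., X_{t-m}) for
# the lagged model (5)-(6).  X is any (N, n_grid) array of curves.
def stack_lags(X, m):
    """Rows Z[i] = concat(X[t], X[t-1], ..., X[t-m]) for t = m, ..., N-1."""
    N = len(X)
    Z = np.hstack([X[m - k:N - k] for k in range(m + 1)])
    return Z, np.arange(m, N)          # stacked regressors and their times

X = np.random.default_rng(4).normal(size=(100, 30))
Z, idx = stack_lags(X, m=2)
print(Z.shape)                         # (98, 90): m + 1 concatenated curves
```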

3.2 Filtering

The linear models that we consider in Chapters 2 and 3 are based on the concept of linear filtering, popular in multivariate time series analysis as well as in signal processing. For the theory and a survey of applications in this context we refer to the classical book of Oppenheim and Schafer [35].

Definition 4. We say that $A = \{A_k\}_{k \in \mathbb{Z}}$ is a linear filter if each $A_k \colon H_1 \to H_2$ is a linear operator and $\sum_{k \in \mathbb{Z}} \|A_k\|^2 < \infty$.

Now we can extend model (3) so that it includes the possibility of so-called lagged dependence, i.e.
$$Y_t = \sum_{k \in \mathbb{Z}} A_k X_{t-k} + \varepsilon_t. \qquad (7)$$

In Chapter 2 we are concerned with the estimation of the operators $A_k$ and with testing their significance. Next, in Chapter 3, we consider low-dimensional filters, i.e. filters with $\dim(\mathrm{Im}(A_k)) = d$, and look for the filter which incurs the smallest information loss in terms of the mean squared error. In both cases the results are based on Fourier analysis and on the seminal work of Brillinger [9].

3.3 Frequency domain methods

In time series analysis we often deal with periodic data. This motivates Fourier-based methods, which allow us to discover seasonal patterns.

Suppose we are given daily data from a univariate signal-plus-noise time series, where the signal comes from a sinusoidal curve with weekly periodicity. The periodogram, a key tool in the frequency domain analysis of time series, allows us to detect such a seasonal pattern: in this example it will show a spike at the frequency corresponding to the weekly period.

The Fourier transform has two important properties which simplify the analysis of the process (7). First, multiplication in the frequency domain is equivalent to convolution in the time domain. Second, the Fourier transform is a bijection, so results in the frequency domain are equivalent to those in the time domain.

To illustrate the use of these features, let us multiply equation (7) by $X_s$ for some $s \in \mathbb{Z}$ and take expectations. By linearity we have
$$E\, Y_t \otimes X_s = \sum_{k \in \mathbb{Z}} A_k\, E\, X_{t-k} \otimes X_s,$$
and by stationarity
$$E\, Y_u \otimes X_0 = \sum_{k \in \mathbb{Z}} A_k\, E\, X_{u-k} \otimes X_0,$$
where $u = t - s$. Noting that on the left we have $C_u^{YX}$ and on the right the convolution of $\{A_k\}$ with $\{C_u^{X}\}$, taking the Fourier transform of both sides yields the cross-spectral operator between $\{Y_t\}$ and $\{X_t\}$:
$$F_\theta^{YX} = A(\theta)\, F_\theta^{X}, \qquad (8)$$
where $A(\theta) = \sum_{k \in \mathbb{Z}} A_k e^{-ik\theta}$ is the frequency response function of the series $\{A_k\}_{k \in \mathbb{Z}}$ and $F_\theta^{X} = \frac{1}{2\pi} \sum_{k \in \mathbb{Z}} C_k^{X} e^{-ik\theta}$ is the spectral density operator of $\{X_t\}$.

Relation (8) is fundamental for this work. In Chapter 2 we use it for the estimation of $A$, from which, by taking the inverse Fourier transform, we obtain estimates of the operators in (7). In Chapter 3 we argue that $A(\theta)$ built from the principal components of $F_\theta^{X}$ minimizes the information loss among all linear filters.
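To make relation (8) concrete, here is an illustrative end-to-end sketch on discretized curves: Bartlett-weighted estimates of $F_\theta^X$ and $F_\theta^{YX}$, a regularized solve for $A(\theta)$ on a truncated eigenspace, and an inverse Fourier transform back to the filter coefficients $A_k$ of (7). The window $q$, truncation $K$ and toy data-generating process are assumptions for the demo, not necessarily the exact procedure of Chapter 2.

```python
import numpy as np

# Minimal sketch (illustrative assumptions): estimate A_k in (7) through the
# frequency domain relation (8).
rng = np.random.default_rng(5)
N, dim, q, K, kmax = 2000, 10, 20, 8, 3

# Toy filtered series: Y_t = A0 X_t + A1 X_{t-1} + noise, with X_t iid.
X = rng.normal(size=(N, dim))
A0, A1 = np.diag(np.linspace(1.0, 0.2, dim)), 0.5 * np.eye(dim)
Xlag = np.vstack([np.zeros(dim), X[:-1]])
Y = X @ A0.T + Xlag @ A1.T + 0.1 * rng.normal(size=(N, dim))

Xc, Yc = X - X.mean(0), Y - Y.mean(0)

def cov(A, B, u):
    """Empirical lag-u cross-covariance E[A_u (x) B_0] (matrix kernel)."""
    n = N - abs(u)
    return A[u:u + n].T @ B[:n] / n if u >= 0 else A[:n].T @ B[-u:] / n

def F(pair, theta):
    """Bartlett-weighted spectral operator (1/2pi) sum_u w_u C_u e^{-iu theta}."""
    return sum((1 - abs(u) / (q + 1)) * pair(u) * np.exp(-1j * u * theta)
               for u in range(-q, q + 1)) / (2 * np.pi)

T = 2 * kmax + 1
thetas = 2 * np.pi * np.arange(T) / T
A_hat = np.zeros((T, dim, dim), dtype=complex)
for i, th in enumerate(thetas):
    FX = F(lambda u: cov(Xc, Xc, u), th)       # spectral density operator of X
    FYX = F(lambda u: cov(Yc, Xc, u), th)      # cross-spectral operator
    lam, E = np.linalg.eigh(FX)                # FX is Hermitian
    lam, E = lam[::-1], E[:, ::-1]
    # A(theta) = F^{YX} (F^X)^{-1}, inverted on K eigendirections only.
    A_hat[i] = FYX @ (E[:, :K] / lam[:K]) @ E[:, :K].conj().T

# Inverse Fourier transform: A_k = (1/2pi) int A(theta) e^{ik theta} d theta.
for k, A_true in [(0, A0), (1, A1)]:
    Ak = (A_hat * np.exp(1j * k * thetas)[:, None, None]).mean(axis=0).real
    print(k, np.linalg.norm(Ak - A_true) / np.linalg.norm(A_true))
```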

4 Objectives and structure of the thesis

This work is organized in three chapters, each of which is a reprint of a paper published or submitted for publication.

The first chapter proposes a data-driven technique for estimating the dimension in functional AR(1) models. Although the regression problem was studied in great detail by Bosq [6], the existing techniques were built on very strong assumptions that are impossible to check in practice. In our work we not only provide an alternative, data-driven technique, but also prove its consistency without any assumptions on the convergence rates of the spectrum. We support our technique by an extensive simulation study, which reveals performance close to optimal. This chapter has been published in the Scandinavian Journal of Statistics [22].

In the second chapter we discuss the estimation of functional lagged regression models. As discussed in Section 3.1, the method described in Chapter 1 can be successfully adapted to the lagged regression problem. However, in practice the dimension of the problem can outgrow the number of observations, which may lead to misleading results. We investigate a frequency domain method which gives consistent estimators at arbitrarily chosen lags. Moreover, we provide testing methodology addressing the significance of given lagged regression operators.

The third chapter extends functional principal components to the time-dependent setup. We concentrate on the diagonality of the covariance matrix, one of the most important features of principal component analysis. In the time-dependent setup, lagged covariances of the principal components need not be diagonal, which hampers a component-by-component analysis. We relax the setup from orthogonal projection to convolution and, using the frequency domain approach, find a time invariant linear mapping which gives a multivariate series with components uncorrelated at all leads and lags. Moreover, the resulting vector sequences explain more variance than classical PCA with the same number of components. This chapter has been published in the Journal of the Royal Statistical Society: Series B.


Chapter I

A Note on Estimation in Hilbertian Linear Models


A note on estimation in Hilbertian linear models

Siegfried Hörmann, Łukasz Kidziński

Département de Mathématique, Université libre de Bruxelles (ULB), Belgium

Abstract

We study estimation and prediction in linear models where the response and the regressor variable both take values in some Hilbert space. Our main objective is to obtain consistency of a principal-components-based estimator for the regression operator under minimal assumptions. In particular, we avoid some inconvenient technical restrictions that have been used throughout the literature. We develop our theory in a time dependent setup which comprises as an important special case the autoregressive Hilbertian model.

Keywords: adaptive estimation, consistency, dependence, functional regression, Hilbert spaces, infinite-dimensional data, prediction.

1 Introduction

In this paper we are concerned with a regression problem of the form
$$Y_k = \Psi(X_k) + \varepsilon_k, \qquad k \ge 1, \qquad \text{(I.1)}$$
where $\Psi$ is a bounded linear operator mapping from the space $H_1$ to $H_2$. This model is fairly general, and many special cases have been intensively studied in the literature. Our main objective is the study of this model when the regressor space $H_1$ is infinite dimensional. Then model (I.1) can be seen as a general formulation of a functional linear model, which is an integral part of the functional data literature. Its various forms are introduced in Chapters 12–17 of Ramsay and Silverman [25].

A few recent references are Cuevas et al. [11], Malfait and Ramsay [23], Cardot et al. [6], Chiou et al. [8], Müller and Stadtmüller [24], Yao et al. [28], Cai and Hall [3], Li and Hsing [22], Hall and Horowitz [15], Reiss and Ogden [26], Febrero-Bande et al. [13], Crambes et al. [10], Yuan and Cai [29], Ferraty et al. [14], Crambes and Mas [9].

From an inferential point of view, a natural problem is the estimation of the 'regression operator' $\Psi$. Once an estimator $\hat\Psi$ is obtained, we can use it in an obvious way for prediction of the responses $Y$. Both the estimation and the prediction problem are addressed in this paper. In the existing literature these problems have been discussed from several angles. For example, there is the distinction between the 'functional regressors and responses' model (e.g., Cuevas et al. [11]) and the perhaps more widely studied 'functional regressor and scalar response' model (e.g., Cardot et al. [5]). Other papers deal with the effect when random functions are not fully observed but are obtained from sparse, irregular data measured with error (e.g., Yao et al. [28]). More recently, the focus has been on establishing rates of consistency (e.g., Cai and Hall [3], Cardot and Johannes [7]). The two most popular methods

Manuscript has been accepted for publication in the Scandinavian Journal of Statistics.
