Contributions to time series analysis


Thesis

Reference

Contributions to time series analysis

XU, Haotian


XU, Haotian. Contributions to time series analysis. Thèse de doctorat : Univ. Genève, 2021, no. GSEM 100

URN : urn:nbn:ch:unige-1529203

DOI : 10.13097/archive-ouverte/unige:152920

Available at:

http://archive-ouverte.unige.ch/unige:152920


Contributions to time series analysis

by

Haotian Xu

A thesis submitted to the

Geneva School of Economics and Management, University of Geneva, Switzerland,

in fulfillment of the requirements for the degree of PhD in Statistics

Members of the thesis committee:

Prof. Olivier Scaillet, Chair, University of Geneva

Prof. Maria-Pia Victoria-Feser, Co-advisor, University of Geneva

Prof. Stéphane Guerrier, Co-advisor, University of Geneva

Prof. Yuan Ke, University of Georgia

Prof. Runze Li, Pennsylvania State University

Thesis No. 100 June 2021


Acknowledgements

Looking back on my Ph.D. studies at the University of Geneva, I am most proud that I realized what I am good at and what I want to do. Without the support of many people, my character and experience alone would not have made this possible.

First of all, I would like to thank my co-advisor Prof. Stéphane Guerrier, without whom I could not have begun my academic career. Stéphane introduced me to research with great patience and has helped me both academically and personally throughout the last seven years. I also thank him for offering me opportunities to visit other universities. It has been my great pleasure and fortune to work with him. Second, I would like to thank my co-advisor Prof. Maria-Pia Victoria-Feser. She always tells me that she did not spend a lot of time supervising me. However, I appreciate the freedom she allowed me in my research and the qualities of an excellent researcher that she showed me through her work style and personality. Third, I am grateful to Prof. Yuan Ke for giving me the opportunity to work with him on high-dimensional time series, for introducing me to various frontier research topics, and for offering constant guidance and valuable advice. In addition, I thank Prof. Runze Li and Prof. Olivier Scaillet for agreeing to be members of my thesis committee, and for their helpful comments and questions, which improved this thesis.

Pursuing my Ph.D. at the Research Center for Statistics has been an unforgettable experience. I am greatly indebted to Samuel, Roberto, and Mucyo for their valuable help during my Ph.D. studies and for their care during my life in Geneva. Special thanks to Yuming for proofreading part of my thesis, and to Younes for his great help with my teaching. In addition, I would like to thank my friends and colleagues: Benjamin, Guillaume, Gaetan, Lionel, Cesare, Alban, Julien M., Julien B., Nicola, Edoardo, Mattia, Marc-Olivier, Walid, Rami, Daniel for many memorable moments during sports, lunches, and discussions.

Finally, I thank my father, Shuguan Xu, and my mother, Xinrong Wang, for their unconditional support.


Abstract

This thesis consists of two parts. In the first part, we focus on parametric modeling, estimation and statistical inference of multivariate time series. We put forward a new multivariate modeling and estimation approach that makes it possible to characterize and estimate complex latent dependence structures in a computationally efficient manner compared with existing approaches. This approach is applied, in particular, to inertial sensor calibration. The statistical properties of this approach make it possible, in particular, to test dependence between sensors, to integrate their dependence within the navigation filter and to construct an optimal virtual sensor that can be used to improve navigation accuracy.

In the second part, we focus on autocovariance estimation of high-dimensional time series, where the sample autocovariance matrix performs poorly. We propose and study two groups of robust estimation methods for autocovariance matrices of a high-dimensional heavy-tailed stationary time series. These proposed methods are shown to achieve optimal error bounds with respect to the matrix max-norm and the matrix spectral norm, respectively. Our non-asymptotic results are based on concentration inequalities under dependence and clarify the impact of high-dimensionality, heavy-tailedness and temporal dependence on various estimation procedures. We also provide several applications of our results in time series analysis, including long-run covariance matrix estimation, autocorrelation matrix estimation and the $\ell_\infty$-type Gaussian approximation for high-dimensional vectors. The finite-sample performance of our estimators is also supported by Monte Carlo simulations.


Summary

This thesis consists of two parts. In the first part, we focus on the parametric modeling, estimation and statistical inference of multivariate time series. We propose a new multivariate modeling and estimation approach that makes it possible to characterize and estimate complex latent dependence structures in a computationally efficient manner compared with existing approaches. This approach is applied, in particular, to the calibration of inertial sensors. The statistical properties of this approach make it possible, in particular, to test the dependence between sensors, to integrate their dependence into navigation filters and to construct an optimal virtual sensor that can be used to improve navigation accuracy.

In the second part, we focus on autocovariance estimation for high-dimensional time series, where the sample autocovariance matrix performs poorly. We propose and study two groups of robust estimation methods for the autocovariance matrices of a heavy-tailed, high-dimensional stationary time series. We show that the proposed methods achieve optimal error bounds with respect to the matrix max-norm and the matrix spectral norm, respectively. Our non-asymptotic results are based on concentration inequalities under dependence and clarify the impact of high dimensionality, heavy tails and temporal dependence on various estimation procedures. We also present several applications of our results in time series analysis, including long-run covariance matrix estimation, autocorrelation matrix estimation and the $\ell_\infty$-type Gaussian approximation for a high-dimensional vector. The finite-sample performance of our estimators is also supported by Monte Carlo simulations.


Chapter 1 Introduction

1.1 Motivations

Advances in science and technology continuously provide us with efficient ways to collect and store massive data. The availability of massive data, with large sample sizes and high dimensionality, offers unprecedented opportunities, especially from a statistical perspective, to reveal critical patterns of as-yet-undiscovered mechanisms in many fields of research, but it also brings new challenges in analyzing such data with statistical guarantees. In this thesis, we focus on analyzing stationary multivariate time series, and attempt to address the challenges that arise when the classical parametric assumptions are unrealistic and/or in the presence of heavy-tailedness and high-dimensionality.

Time series (temporally dependent processes) are commonly encountered in modern data collection. However, the majority of the classical tools in probability and statistics were developed under the assumption that the data are independent and identically distributed (i.i.d.). Applying these classical tools to dependent processes without caution can therefore lead to problematic and misleading scientific results [see e.g. RT96; LF14]. For this reason, we relax the i.i.d. assumption to the more general stationarity assumption (though some of the tools we develop can be extended to nonstationary settings).

Modeling multivariate time series is challenging due to the temporal dependence as well as the cross-sectional dependence among dimensions. A variety of existing models that account for these dependence structures are generally flexible. However, they frequently suffer from identifiability issues, possibly due to overparametrization, or from computational challenges due to model complexity and massive sample sizes. In Chapter 2, we propose a new modeling framework that extends the composition of simple latent variables used for univariate processes [see e.g. Gue+13; Har90; KD00] to multivariate settings. We refer to these models as multivariate latent models, which are particularly relevant for the task of Inertial Sensor Calibration (ISC). We also propose a new method that is able to estimate the multivariate latent models in a computationally efficient and numerically stable manner. This new method is an extension of the Generalized Method of Wavelet Moments (GMWM), which was initially proposed for estimating complex latent models in univariate settings [see Gue+13]. More precisely, we allow the GMWM to make use of a quantity called the Wavelet Cross-CoVariance (WCCV), which is the cross-covariance of the wavelet coefficients resulting from a wavelet decomposition of the multivariate process to be analyzed [see Per95; WGP00]. Taking advantage of the multi-scale properties of the wavelet decomposition, this method overcomes the identifiability issue while ensuring computational efficiency.
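To make the notion of a wavelet cross-covariance concrete, the following minimal sketch computes an empirical cross-covariance of Haar wavelet coefficients for two equally long series, scale by scale. It is only an illustration under stated assumptions: the Haar filter, the non-decimated (MODWT-style) coefficients with boundary values dropped, and the function names `haar_modwt_coefficients` and `wavelet_cross_covariance` are choices made here, not the estimator studied in Chapter 2.

```python
from typing import List, Sequence


def haar_modwt_coefficients(x: Sequence[float], level: int) -> List[float]:
    """Non-decimated Haar wavelet coefficients at a given level (assumed convention).

    Level j uses a window of 2**j observations; each coefficient is
    (sum of the most recent half-window - sum of the earlier half-window) / 2**j.
    Coefficients affected by the boundary are simply dropped.
    """
    width = 2 ** level
    half = width // 2
    if level < 1 or len(x) < width:
        raise ValueError("series too short for the requested level")
    # Prefix sums give every window sum in O(1).
    prefix = [0.0]
    for value in x:
        prefix.append(prefix[-1] + value)
    coeffs = []
    for t in range(width - 1, len(x)):
        recent = prefix[t + 1] - prefix[t + 1 - half]
        earlier = prefix[t + 1 - half] - prefix[t + 1 - width]
        coeffs.append((recent - earlier) / width)
    return coeffs


def wavelet_cross_covariance(
    x: Sequence[float], y: Sequence[float], max_level: int
) -> List[float]:
    """Empirical cross-covariance of the Haar coefficients of x and y, per level.

    The Haar filter sums to zero, so the coefficients of a stationary series
    have mean zero and the average product is used directly.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    result = []
    for level in range(1, max_level + 1):
        wx = haar_modwt_coefficients(x, level)
        wy = haar_modwt_coefficients(y, level)
        result.append(sum(a * b for a, b in zip(wx, wy)) / len(wx))
    return result


if __name__ == "__main__":
    signal = [1.0, -1.0] * 8
    print(wavelet_cross_covariance(signal, signal, max_level=2))
```

When both arguments are the same series, the output reduces to an empirical Haar wavelet variance at each level, which is the quantity the univariate GMWM is built on.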

Besides the parametric modeling of multivariate time series with fixed dimensionality, we are also motivated by the challenges brought by high-dimensionality. Sharp (exponential-type) concentration properties are particularly important in high-dimensional settings, where the dimensionality d is allowed to grow with the sample size n and may be much larger than n. Classical theories developed in these settings often rely on sub-Gaussian assumptions, and thus exclude many important applications where heavy-tailed data are frequently observed. At the same time, autocovariance matrices of stationary time series, which represent the linear structure of both cross-sectional and temporal dependence, play a fundamental role in time series analysis. Moreover, in these settings, it is well known that the sample covariance matrix is not consistent (with respect to the matrix spectral norm). Even with assumptions on the covariance structure and/or sparsity, many popular regularized methods based on the sample covariance matrix rely on i.i.d. and sub-Gaussian assumptions in order to achieve minimax optimality. In Chapter 3, we propose and study four tail-robust autocovariance matrix estimators for high-dimensional time series. Our nonasymptotic results show that these estimators achieve optimal finite-sample performance without requiring i.i.d. and sub-Gaussian assumptions, thus providing statistical guarantees for these estimators in time-dependent settings.

1.2 Mathematical background

1.2.1 Multivariate stationary time series

Throughout this thesis, we consider an $\mathbb{R}^d$-valued stationary process $\{X_i\}_{i\in\mathbb{Z}}$ of the form

$$X_i = G(\ldots, \varepsilon_{i-1}, \varepsilon_i), \tag{1.1}$$

where $\{\varepsilon_i\}_{i\in\mathbb{Z}}$ is a sequence of i.i.d. random elements and $G(\cdot)$ is a measurable function.

Time series of the above form are general enough to cover a large class of causal stationary processes, such as linear processes and nonlinear processes including the GARCH models and the iterated random functions [see Wu05; Wu11]. Besides its generality, this representation provides a theoretical framework to develop statistical tools for dependent data, some of which will be discussed in the rest of this chapter. In the seminal paper [Wu05], Wu proposed the functional dependence measure for stationary processes:

Definition 1.1 (Definition 1 in [Wu05]). Let $\{X_i\}_{i\in\mathbb{Z}}$ be an $\mathbb{R}$-valued process of the form (1.1) (with $d = 1$), such that $\|X_i\|_q < \infty$ for some $q > 0$. For $i \ge 0$, we define the functional dependence measure as

$$\delta_{i,q} := \|X_i - X_i'\|_q, \tag{1.2}$$

where $X_i' := G(\ldots, \varepsilon_{-1}, \varepsilon_0', \varepsilon_1, \ldots, \varepsilon_i)$ is a coupled version of $X_i$ with $\varepsilon_0$ replaced by an independent copy $\varepsilon_0'$. For $q > 0$, we denote $\|X_i\|_q := (\mathbb{E}|X_i|^q)^{1/q}$.
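As a worked illustration of Definition 1.1 (this example is not part of the original text), consider a causal AR(1) process $X_i = aX_{i-1} + \varepsilon_i$ with $|a| < 1$ and $\|\varepsilon_0\|_q < \infty$. The AR(1) model and the symbol $a$ are assumptions made only for this illustration. Replacing $\varepsilon_0$ by an independent copy $\varepsilon_0'$ changes a single term of the moving-average representation, so the functional dependence measure is available in closed form:

```latex
\begin{align*}
X_i &= \sum_{j=0}^{\infty} a^{j}\,\varepsilon_{i-j},
&
X_i' &= \sum_{j \ge 0,\; j \ne i} a^{j}\,\varepsilon_{i-j} + a^{i}\,\varepsilon_0', \\
X_i - X_i' &= a^{i}\,(\varepsilon_0 - \varepsilon_0'),
&
\delta_{i,q} &= |a|^{i}\,\|\varepsilon_0 - \varepsilon_0'\|_q .
\end{align*}
```

In this example the measure decays geometrically in $i$, at the rate of the autoregressive coefficient.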

Remark 1.2. The functional dependence measure (1.2) quantifies the degree of dependence (in terms of moments) for stationary processes. Other dependence measures are also commonly used in the literature. In particular, the strong mixing coefficients [see e.g. Ded+07; Bra07] measure the dependence through certain distances between the joint distribution function and the product of the marginal ones. For example, the total variation distance naturally induces the α-mixing coefficient [Ros56]. Compared with the functional dependence measure, the conditions associated with strong mixing coefficients are usually hard to verify in practice, and often contain smoothness assumptions which exclude certain important processes from the discussion. For instance, the Bernoulli shift process [And84] is not strong mixing but can be studied by the functional dependence measure. In general, the functional dependence measure and strong mixing coefficients are not nested. The former relies on the causal representation (1.1), while the latter approach has a more nonparametric spirit.

The following two assumptions impose decay rates (polynomial and exponential decay, respectively) on the functional dependence measure $\delta_{k,q}$ [see WW16; ZW17].

Assumption 1.3 (Dependence-Adjusted Norm (DAN)). There exists $\nu \ge 0$ such that

$$\|X_\cdot\|_{q,\nu} := \sup_{m \ge 0}\,(m+1)^{\nu} \sum_{k=m}^{\infty} \delta_{k,q} < \infty.$$

Assumption 1.4 (Geometric Moment Contraction (GMC)). There exists $\rho \in (0,1)$ such that

$$\|X_\cdot\|_{q} := \sup_{m \ge 0}\,\rho^{-m} \sum_{k=m}^{\infty} \delta_{k,q} < \infty.$$
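The two assumptions can be checked by hand for simple decay profiles. The following short derivation is an illustration added here, with the constants $C > 0$, $\beta > 1$ and $a \in (0,1)$ introduced only for this purpose. It shows that a polynomially decaying dependence measure satisfies Assumption 1.3 for every $\nu$ up to $\beta - 1$, and that a geometrically decaying one satisfies Assumption 1.4:

```latex
% Polynomial decay: \delta_{k,q} \le C (k+1)^{-\beta} with \beta > 1.
\begin{align*}
\sum_{k=m}^{\infty} \delta_{k,q}
  &\le C\Big[(m+1)^{-\beta} + \int_{m+1}^{\infty} x^{-\beta}\,dx\Big]
   \le \frac{C\beta}{\beta-1}\,(m+1)^{1-\beta},
\end{align*}
% hence \sup_{m \ge 0} (m+1)^{\nu} \sum_{k \ge m} \delta_{k,q} < \infty
% for every 0 \le \nu \le \beta - 1.

% Geometric decay: \delta_{k,q} \le C a^{k} with 0 < a < 1.
\begin{align*}
\sum_{k=m}^{\infty} \delta_{k,q} \le \frac{C\,a^{m}}{1-a},
\end{align*}
% hence \sup_{m \ge 0} \rho^{-m} \sum_{k \ge m} \delta_{k,q} \le C/(1-a)
% for the choice \rho = a.
```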

Under this framework, one can easily construct a martingale approximation [see e.g. Wu07] or an m-dependence approximation [see e.g. LXW13] by coupling techniques [see e.g. BHS09]. One can also quantify the approximation error, and then draw on well-established theory for martingale difference sequences or i.i.d. sequences. We briefly review some of those tools in the rest of this subsection.

1.2.2 Moment and concentration inequalities

Moment and concentration inequalities hold for any sample size $n$ and are therefore useful for characterizing the nonasymptotic properties of a statistic. In this part, we consider moment and concentration inequalities for a univariate stationary process $\{X_i\}_{i\in\mathbb{Z}}$ of the form (1.1) with mean zero. To start, we define the partial sum process

$$S_n := \sum_{i=1}^{n} X_i.$$

In order to derive the concentration property of $\{X_i\}_{i\in\mathbb{Z}}$, a moment inequality for $S_n$, which provides an upper bound on some polynomial or exponential moment, is usually necessary. One way to obtain a (polynomial) moment inequality was introduced in [Wu05], based on the following Burkholder’s inequality [see Rio09].

Lemma 1.5 (Burkholder’s inequality). Let $q > 1$ and $q' := \min(q, 2)$. Let $M_n := \sum_{i=1}^{n} D_i$, where $\{D_i\}_{i=1}^{n}$ are martingale differences such that $\|D_i\|_q < \infty$. Then

$$\|M_n\|_q^{q'} \le K_q^{q'} \sum_{i=1}^{n} \|D_i\|_q^{q'}, \quad \text{where } K_q := \max\big((q-1)^{-1}, \sqrt{q-1}\big).$$


Burkholder’s inequality applies to the sum of a martingale difference sequence, which the summands of $S_n$ are not. However, we can construct martingale difference sequences by rewriting each summand as

$$X_i = \sum_{k=0}^{\infty} P_{i-k} X_i, \tag{1.3}$$

where $P_k\,\cdot := \mathbb{E}(\cdot \mid \mathcal{F}_k) - \mathbb{E}(\cdot \mid \mathcal{F}_{k-1})$ is the projection operator with $\mathcal{F}_k := (\ldots, \varepsilon_{k-1}, \varepsilon_k)$.

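By construction, for each fixed $k$, $\{P_{i-k}X_i\}_{i\in\mathbb{Z}}$ is a martingale difference sequence. For $q \ge 2$, applying Burkholder’s inequality, we have

$$\Big\|\sum_{i=1}^{n} P_{i-k}X_i\Big\|_q \le (q-1)^{1/2} n^{1/2} \|P_0 X_k\|_q \le (q-1)^{1/2} n^{1/2}\,\delta_{k,q},$$

and

$$\|S_n\|_q = \Big\|\sum_{i=1}^{n}\sum_{k=0}^{\infty} P_{i-k}X_i\Big\|_q \le \sum_{k=0}^{\infty}\Big\|\sum_{i=1}^{n} P_{i-k}X_i\Big\|_q \le (q-1)^{1/2} n^{1/2} \sum_{k=0}^{\infty} \delta_{k,q}. \tag{1.4}$$

By the moment inequality (1.4) with $q = 2$, we can bound the long-run variance

$$\sigma_\infty^2 := \lim_{n\to\infty} \operatorname{var}\big(n^{-1/2}S_n\big) \le \Big(\sum_{k=0}^{\infty} \delta_{k,2}\Big)^{2}. \tag{1.5}$$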
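The moment inequality (1.4) with $q = 2$ can be checked exactly on a simple example. The sketch below is an illustration under stated assumptions, not part of the thesis: it takes a stationary AR(1) process $X_i = aX_{i-1} + \varepsilon_i$ with zero-mean, unit-variance innovations, for which $\delta_{k,2} = |a|^k\sqrt{2}$, computes the standard deviation of $S_n$ from the known autocovariances, and compares it with $n^{1/2}\sum_k \delta_{k,2}$. The function names are hypothetical.

```python
import math


def ar1_partial_sum_sd(a: float, n: int) -> float:
    """Exact standard deviation of S_n = X_1 + ... + X_n for a stationary AR(1)
    process X_i = a * X_{i-1} + eps_i with zero-mean, unit-variance innovations."""
    if not -1.0 < a < 1.0 or n < 1:
        raise ValueError("need |a| < 1 and n >= 1")
    gamma0 = 1.0 / (1.0 - a * a)  # var(X_i)
    # var(S_n) = n * gamma(0) + 2 * sum_{h=1}^{n-1} (n - h) * gamma(h),
    # with autocovariance gamma(h) = gamma(0) * a**h.
    var_sn = n * gamma0 + 2.0 * sum((n - h) * gamma0 * a ** h for h in range(1, n))
    return math.sqrt(var_sn)


def moment_bound_q2(a: float, n: int) -> float:
    """Right-hand side of (1.4) for q = 2: n^{1/2} * sum_k delta_{k,2}.

    For the AR(1) example, delta_{k,2} = |a|**k * ||eps_0 - eps_0'||_2 = |a|**k * sqrt(2),
    so the series sums to sqrt(2) / (1 - |a|).
    """
    return math.sqrt(n) * math.sqrt(2.0) / (1.0 - abs(a))


if __name__ == "__main__":
    for a in (-0.5, 0.0, 0.5, 0.9):
        for n in (1, 10, 100):
            print(a, n, ar1_partial_sum_sd(a, n), moment_bound_q2(a, n))
```

In every case the exact value sits below the bound, and the gap reflects that (1.4) uses only the dependence measure rather than the exact autocovariances.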

If the quantity of interest is $S_n^* := \max_{1\le i\le n}|S_i|$, Theorem 1 in [Wu07] provides the following maximal inequality, which is based on Doob’s inequality in addition to the same martingale decomposition (1.3) and Burkholder’s inequality:

$$\|S_n^*\|_q \le \frac{q B_q}{q-1}\, n^{1/2} \sum_{k=0}^{\infty} \delta_{k,q}, \tag{1.6}$$

where $B_q = 18 q^{3/2}(q-1)^{-1/2}$ if $q > 2$ and $B_q = 1$ if $q = 2$. Note that the upper bounds in the moment inequality (1.4) and the maximal inequality (1.6) are equivalent up to a constant.

For the same purpose, [LXW13] generalizes the classical Rosenthal-type inequality to stationary settings. Their result is based on a different martingale decomposition:

$$X_i = \sum_{j=n+1}^{\infty} \big(X_{i|j} - X_{i|j-1}\big) + \sum_{j=1}^{n} \big(X_{i|j} - X_{i|j-1}\big) + X_{i|0}, \tag{1.7}$$

where $X_{i|j} := \mathbb{E}(X_i \mid \varepsilon_{i-j}, \ldots, \varepsilon_i)$ for $j \ge 0$. By construction, $\{X_{n-i|j} - X_{n-i|j-1}\}_{i=0}^{n-1}$ are martingale differences with respect to the σ-algebra $\sigma(\varepsilon_{n-i-j}, \varepsilon_{n-i-j+1}, \ldots)$, and the $X_{i|0}$ are i.i.d. Thus, Burkholder’s and Doob’s inequalities lead to the following lemma:

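Lemma 1.6 (Lemma 1 in [LXW13]). Assume $\|X_i\|_q < \infty$ for some $q > 2$. Then

$$\|S_n^*\|_q \le n^{1/2}\Big[\frac{87q}{\log q}\sum_{k=1}^{n}\delta_{k,2} + 3(q-1)^{1/2}\sum_{k=n+1}^{\infty}\delta_{k,q} + \frac{29q}{\log q}\|X_1\|_2\Big] + n^{1/q}\Big[\frac{87q(q-1)^{1/2}}{\log q}\sum_{k=1}^{n}k^{1/2-1/q}\delta_{k,q} + \frac{29q}{\log q}\|X_1\|_q\Big]. \tag{1.8}$$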

Lemma 1.6 considers the $q$-th moment of $S_n^*$, but it is also applicable to $\|S_n\|_q$. Compared to the inequality (1.6), the Rosenthal-type inequality (1.8) has a different spirit, since it uses the variance of $S_n$ as the benchmark and relates it to higher moments. Moreover, by further requiring a decay rate for the functional dependence measure $\delta_{k,q}$, such as Assumption 1.3 or 1.4, one can ensure that the series appearing in the above inequalities are summable.

Aside from the moment inequalities, the martingale decomposition (1.7) has also been used by [LXW13] and [WW16] to derive Nagaev-type concentration inequalities for stationary processes with only finite $q$-th moments and without requiring boundedness.

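Theorem 1.7 (Theorem 2 in [WW16]). Define

$$\bar\omega_n := 2^{\nu} \times \begin{cases} 1, & \text{if } \nu > 1/2 - 1/q, \\ (\log n)^{1+2q}, & \text{if } \nu = 1/2 - 1/q, \\ n^{q/2 - 1 - \nu q}, & \text{if } \nu < 1/2 - 1/q. \end{cases}$$

Then, for all $x > 0$, under Assumption 1.3, we have

$$\mathbb{P}(|S_n| \ge x) \le C_1 \frac{\bar\omega_n\, n\, \|X_\cdot\|_{q,\nu}^{q}}{x^{q}} + C_2 \exp\Big(-\frac{C_3 x^2}{n \|X_\cdot\|_{2,\nu}^{2}}\Big),$$

where $C_1, C_2, C_3 > 0$ are constants depending only on $\nu$ and $q$.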
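The factor $\bar\omega_n$ in Theorem 1.7 switches between three regimes depending on how the decay exponent $\nu$ compares with $1/2 - 1/q$. The small helper below, whose name `omega_bar` and input checks are choices made here for illustration, evaluates this factor and makes the regime change easy to explore numerically. The constants $C_1, C_2, C_3$ are not specified in the statement and are deliberately left out.

```python
import math


def omega_bar(n: int, q: float, nu: float) -> float:
    """Evaluate the factor 2**nu * {1, (log n)**(1 + 2q), n**(q/2 - 1 - nu*q)}
    according to whether nu is above, at, or below the threshold 1/2 - 1/q."""
    if n < 2 or q <= 0 or nu < 0:
        raise ValueError("need n >= 2, q > 0 and nu >= 0")
    threshold = 0.5 - 1.0 / q
    if math.isclose(nu, threshold, abs_tol=1e-12):
        regime = math.log(n) ** (1.0 + 2.0 * q)  # boundary case
    elif nu > threshold:
        regime = 1.0  # fast decay: no extra factor in n
    else:
        regime = float(n) ** (q / 2.0 - 1.0 - nu * q)  # slow decay: polynomial inflation
    return (2.0 ** nu) * regime


if __name__ == "__main__":
    for nu in (0.0, 0.25, 1.0):
        print(nu, omega_bar(n=100, q=4.0, nu=nu))
```

With $q = 4$ the threshold is $1/4$, so the three values of `nu` in the usage example fall in the slow-decay, boundary and fast-decay regimes, respectively.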

Moreover, a Nagaev-type inequality for S_n^∗ can be found in Theorem 2 of [LXW13].

Without assuming finite exponential moments of $S_n$, a Nagaev-type inequality, which provides a mixture of polynomial and exponential tail bounds, is probably optimal. However, if in addition one assumes that $|X_i| \le M$ for all $i \in \mathbb{Z}$ and exponential decay of the functional dependence measure, [Zha18] obtains the following Bernstein-type inequality, which gives an exponential tail bound for $S_n$.

Theorem 1.8 (Theorem 2.1 in [Zha18]). Assume there exists a constant $M > 0$ such that $|X_i| \le M$ for all $i$, and that Assumption 1.4 holds for some $q \ge 2$ and $\rho \in (0,1)$. Also assume $n \ge 4 \vee (\log(\rho^{-1})/2)$. For any $t > 0$ such that $t < (C_2 M)^{-1}(\log n)^{-2}$, we have

$$\log \mathbb{E}\exp(tS_n) \le \frac{C_1 t^2 \big(n\|X_\cdot\|_2^2 + M^2\big)}{1 - C_2 t M (\log n)^2}, \tag{1.9}$$

which further implies the Bernstein-type inequality. That is, for any $x > 0$, we have

$$\mathbb{P}(S_n \ge x) \le \exp\Big(-\frac{x^2}{4C_1\big(n\|X_\cdot\|_2^2 + M^2\big) + 2C_2 M (\log n)^2 x}\Big),$$

where $C_1, C_2 > 0$ are constants depending only on $\rho$.

The key step in proving the above Bernstein-type inequality is to obtain (1.9), the bound for the (log) Laplace transform of $S_n$. This is not an easy task compared to the independent setting, where the exponential moment of $S_n$ factorizes as a product. [MPR08] originally provided an idea based on constructing threefold recurrent Cantor sets for a stationary process. By removing the middle sets, which are of relatively small size and thus negligible (in order), at each recurrence, this construction provides enough room to replace the remaining sets with mutually independent sets, which maintain the original dependence structure within each set. The coupling technique can be applied to construct such sets, and the impact of these removals and replacements can be quantified by the functional dependence measure. This idea has been used in [Zha18; BMY16; HL19] for similar purposes, and we also use it to prove a Bernstein-type inequality for sums of stationary matrices in Chapter 3. In addition, for unbounded stationary processes, exponential tail bounds can be found in [WW16; Kuc+18] based on the martingale decomposition (1.7), but stronger moment and/or dependence conditions (i.e. the dependence-adjusted sub-Gaussian norm in [WW16] or Assumption DEP in [Kuc+18]) are needed.

The martingale decomposition (1.7) is effective, in the sense that it allows the application of many powerful probability tools for martingale differences, such as Burkholder’s inequality, Doob’s maximal inequality and Freedman’s martingale inequality [Fre75]. However, this approach may fail in multivariate settings due to known limitations or a lack of theory regarding vector-valued martingales [see e.g. LL09]. To circumvent these potential problems, the m-dependence approximation together with the blocking technique is also widely used. Recall that $X_{i|m} := \mathbb{E}(X_i \mid \varepsilon_{i-m}, \ldots, \varepsilon_i)$ for $m \ge 0$; then $\{X_{i|m}\}_{i=1}^{n}$ is the m-dependence approximation of the original process $\{X_i\}_{i=1}^{n}$. The process can be divided into consecutive blocks of size $m$, so that the $X_{i|m}$ taken from interlacing blocks (every other block) are mutually independent. This gives access to well-established tools for independent data. Moreover, the approximation errors can be quantified by the functional dependence measure [see e.g. LL09; ZW17; ZC18].
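The independence claim behind the blocking technique can be verified by simple index bookkeeping: $X_{i|m}$ depends only on the innovations with indices $i-m, \ldots, i$, so two blocks are independent as soon as the sets of innovation indices they use are disjoint. The sketch below, with hypothetical function names and time indices running from 1 to $n$, checks this for every other block of size $m$.

```python
from typing import List, Set


def innovation_indices(i: int, m: int) -> Set[int]:
    """Indices of the innovations eps_{i-m}, ..., eps_i that X_{i|m} depends on."""
    return set(range(i - m, i + 1))


def consecutive_blocks(n: int, m: int) -> List[List[int]]:
    """Split the time indices 1..n into consecutive blocks of size m
    (the last block may be shorter)."""
    return [list(range(start, min(start + m, n + 1))) for start in range(1, n + 1, m)]


def block_innovations(block: List[int], m: int) -> Set[int]:
    """Union of the innovation indices used by all X_{i|m} with i in the block."""
    used: Set[int] = set()
    for i in block:
        used |= innovation_indices(i, m)
    return used


def interlacing_blocks_are_disjoint(n: int, m: int) -> bool:
    """Check that the odd-numbered blocks, and separately the even-numbered blocks,
    use pairwise disjoint sets of innovations (hence are mutually independent)."""
    blocks = consecutive_blocks(n, m)
    for parity in (0, 1):
        seen: Set[int] = set()
        for block in blocks[parity::2]:
            used = block_innovations(block, m)
            if seen & used:
                return False
            seen |= used
    return True


if __name__ == "__main__":
    print(consecutive_blocks(7, 3))
    print(interlacing_blocks_are_disjoint(20, 3))
```

Adjacent blocks, by contrast, share innovations, which is why the argument is applied to every other block rather than to all blocks at once.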

1.2.3 Asymptotic tools

Asymptotic theory plays an important role in statistical inference. Specifically, the Central Limit Theorem (CLT) allows one to approximate the distribution of a general estimator. The weak invariance principle (functional CLT), which concerns the weak convergence of the partial sum process as the sample size $n$ grows, is fundamental for simultaneous inference and sequential analysis. As an example, we give the following result from [Wu11].

Theorem 1.9 (Theorem 3 in [Wu11]). Consider a univariate stationary process $\{X_i\}_{i\in\mathbb{Z}}$ with zero mean and assume $\sum_{i=0}^{\infty} \delta_{i,2} < \infty$. Then, we have the weak invariance principle

$$\{n^{-1/2} S_{\lfloor nt \rfloor},\, 0 \le t \le 1\} \Rightarrow \{\sigma_\infty B(t),\, 0 \le t \le 1\},$$

where $\sigma_\infty^2$ is the long-run variance defined in (1.5), $\{B(t), 0 \le t \le 1\}$ is a Brownian motion and $\Rightarrow$ denotes weak convergence.

When t = 1, the above weak invariance principle reduces to a CLT. Moreover, the strong invariance principle, on the basis of the weak invariance principle, gives additionally the convergence rate [see KW20, where an (unimprovable) optimal rate is provided for multivariate stationary processes with fixed dimensionality]. Aside from measuring the approximation error directly, the Gaussian approximation error can also be characterized in terms of the Kolmogorov–Smirnov distance by the Berry–Esseen theorem. [Jir16] proves a Berry–Esseen theorem for univariate stationary processes with the optimal convergence rate.

Theorem 1.9 considers the weak convergence of partial sum processes. However, some statistics and machine learning applications, such as goodness-of-fit tests and empirical risk minimization, require the weak convergence of empirical processes so that statistical inference can be performed. Consider random variables $\{X_i\}_{i\in\mathbb{Z}}$ defined on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Let $\mathcal{A}$ be a set of $\mathbb{R}$-valued measurable functions, such that $S_n : (\Omega, \mathcal{F}, \mathbb{P}) \to \ell^\infty(\mathcal{A})$. For a function $g \in \mathcal{A}$, the empirical process is defined as $S_n(g) := \sum_{i=1}^{n} g(X_i)$. The goal is to show that $S_n(g)$ for $g \in \mathcal{A}$ converges weakly to a Gaussian process $W(g)$ for $g \in \mathcal{A}$. Theorem 1.5.4 in [VDVW96] indicates that the weak convergence can be characterized by finite-dimensional convergence in distribution together with the asymptotic tightness (i.e. the asymptotic equicontinuity) of the empirical process $S_n$:

• $(S_n(g_1), \ldots, S_n(g_k))$ is asymptotically normal for all choices of $g_1, \ldots, g_k \in \mathcal{A}$ with $k$ finite.

• $S_n$ is asymptotically equicontinuous under a certain semimetric.

Since the finite-dimensional convergence in distribution can be proved using the CLT, the key step is to show the asymptotic equicontinuity under dependence, which is provided in [Hag14; VS13].

1.2.4 High-dimensional inference

In the previous part, we briefly reviewed the asymptotic tools for univariate and multivariate stationary processes whose dimensionality is fixed. However, in high-dimensional settings, these tools cannot be applied. After the breakthrough achieved in [CCK13], where a Gaussian approximation was obtained for the maximum of a high-dimensional sample mean vector in i.i.d. settings, it has become increasingly popular to conduct statistical inference based on maximum-type statistics. [ZW17; ZC18] extend this result to stationary high-dimensional time series. However, these results require a sub-Gaussian moment condition in order to allow ultra-high dimensionality, i.e. a dimensionality that can grow exponentially with the sample size. In the i.i.d. case, the restrictive sub-Gaussian condition has been relaxed by [LW17]. The authors use the truncated mean estimator and show that ultra-high dimensionality is achievable under a very mild $(2+\theta)$-th moment condition for some $\theta > 0$. Inspired by the proofs in the aforementioned papers and using the big-and-small block technique as well as the m-dependence approximation, we obtain similar results for stationary time series in Chapter 3, which enable statistical inference based on our element-wise truncated estimator.
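For readers unfamiliar with truncation, the following minimal sketch shows a generic truncated mean: each observation is clipped to the interval $[-\tau, \tau]$ before averaging, which limits the influence of heavy-tailed outliers. It is an illustration only. The function names are hypothetical, the truncation level `tau` is left as a user-supplied input, and the code is not claimed to reproduce the exact estimators of [LW17] or of Chapter 3.

```python
from typing import Sequence


def truncate(x: float, tau: float) -> float:
    """Truncation operator: sign(x) * min(|x|, tau)."""
    return max(-tau, min(tau, x))


def truncated_mean(sample: Sequence[float], tau: float) -> float:
    """Average of the observations after clipping each one to [-tau, tau]."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    if len(sample) == 0:
        raise ValueError("sample must be non-empty")
    return sum(truncate(x, tau) for x in sample) / len(sample)


if __name__ == "__main__":
    data = [1.0, 2.0, 100.0]
    print(sum(data) / len(data), truncated_mean(data, tau=3.0))
```

Truncation introduces a bias that shrinks as `tau` grows, while the variability contributed by extreme values grows with it, so the level has to balance the two.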

1.3 Organization of this thesis

The main results of this thesis are contained in the following two self-contained chapters. In Chapter 2, we focus on statistical modeling and estimation of a multivariate time series. This work corresponds to [Xu+19], published in IEEE Transactions on Signal Processing and supervised by Prof. Stéphane Guerrier. In this work, we propose a new multivariate latent variable model which accounts for both the cross-sectional and temporal dependence of multivariate time series. In order to estimate this model with computational efficiency and statistical guarantees, we extend the GMWM approach, which was initially proposed for estimating latent models in univariate settings, to multivariate settings and study its statistical properties. Our approach can be applied to analyze general multivariate time series data, and we focus on inertial sensor calibration, which is an important application in the field of signal processing.

Chapter 3 corresponds to the manuscript [XKG21] supervised by Prof. Yuan Ke and Prof. Stéphane Guerrier. The main results of this chapter are the nonasymptotic theories of several tail-robust autocovariance matrix estimators. Our theories characterize the adverse impacts of the dimensionality, the heavy-tailedness and the temporal dependence on autocovariance matrix estimation. We also clarify the beneficial effects of the corresponding robustification tools in tackling the heavy-tailedness. We consider the problem in high-dimensional settings, where the dimensionality of the time series grows with the sample size and may be much larger than the sample size.

In order to be consistent across this thesis, the notation has been adapted and may differ from that of the published paper or the manuscript.

1.4 Other research projects

Over the course of this thesis, I have been fortunate to have had opportunities to work on other research projects, which are not included in this thesis. These research projects are listed as follows:

1. Xu, H., Guerrier, S., Molinari, R., & Zhang, Y., “A study of the Allan variance for constant-mean nonstationary processes”, IEEE Signal Processing Letters, 24(8), 1257-1260, 2017. https://doi.org/10.1109/LSP.2017.2722222

2. Branca, M., Orso, S., Molinari, R., Xu, H., Guerrier, S., Zhang, Y., & Mili, N., “Is Non-Metastatic Cutaneous Melanoma Predictable through Genomic Biomarkers?”, Melanoma Research, 28(1), 21-29, 2018. https://doi.org/10.1097/CMR.0000000000000412

3. Zhang, Y., Xu, H., Radi, A., Molinari, R., Guerrier, S., Karemera, M., & El-Sheimy, N., “An optimal virtual inertial sensor framework using wavelet cross covariance”, In 2018 IEEE/ION Position, Location and Navigation Symposium (PLANS) (pp. 1342-1350). https://doi.org/10.1109/PLANS.2018.8373525

4. Guerrier, S., Jurado, J., Khaghani, M., Bakalli, G., Karemera, M., Molinari, R., Orso, S., Raquet, J., Kabban, C.M.S., Skaloud, J., Xu, H., & Zhang, Y., “Wavelet-based moment-matching techniques for inertial sensor calibration”, IEEE Transactions on Instrumentation and Measurement, 69(10), 7542-7551, 2020. https://doi.org/10.1109/TIM.2020.2984820

5. Guerrier, S., Molinari, R., Victoria-Feser, M. P., & Xu, H., “Robust two-step wavelet-based inference for time series models”, Journal of the American Statistical Association (2021). https://doi.org/10.1080/01621459.2021.1895176

6. Xiao, D., Xu, H., Ke, Y., & Ahn, J., “Multiple change points detection problems for high-dimensional time series”, Working paper.

The first paper generalizes the Allan variance, which is a quantity related to the wavelet variance frequently used in sensor calibration, to second-order nonstationary processes. The second paper is unrelated to this thesis. It applies a heuristic variable selection algorithm proposed in [Gue+16] to a gene selection problem. The next three papers are closely related to Chapter 2. The third paper delivers a general and flexible framework to construct a virtual sensor, which considerably improves navigation accuracy. The fourth paper surveys a class of estimators based on moment-matching techniques and compares them both theoretically and in the application of inertial sensor calibration. In parallel to the multivariate extension of the GMWM in Chapter 2, the fifth paper robustifies the GMWM and investigates its statistical properties. The last working paper is related to Chapter 3; it puts forward an approach to detect and localize multiple change points in the autocovariance structure of a high-dimensional time series.


References for Chapter 1

[And84] Donald Andrews. “Non-strong mixing autoregressive processes”. In:Journal of Applied Probability 21.4 (1984), pp. 930–934.

[BHS09] Istv´an Berkes, Siegfried H¨ormann, and Johannes Schauer. “Asymptotic results for the empirical process of stationary sequences”. In: Stochastic pro- cesses and their applications 119.4 (2009), pp. 1298–1324.

[BMY16] Marwa Banna, Florence Merlevède, and Pierre Youssef. “Bernstein-type inequality for a class of dependent random matrices”. In: Random Matrices: Theory and Applications 5.02 (2016), p. 1650006.

[Bra07] Richard Bradley. Introduction to strong mixing conditions. Kendrick press, 2007.

[CCK13] Victor Chernozhukov, Denis Chetverikov, and Kengo Kato. “Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors”. In: The Annals of Statistics 41.6 (2013), pp. 2786–2819.

[Ded+07] Jérôme Dedecker et al. “Weak dependence”. In: Weak dependence: With examples and applications. Springer, 2007, pp. 9–20.

[Fre75] David Freedman. “On tail probabilities for martingales”. In: the Annals of Probability (1975), pp. 100–118.

[Gue+13] Stéphane Guerrier et al. “Wavelet-variance-based estimation for composite stochastic processes”. In: Journal of the American Statistical Association 108.503 (2013), pp. 1021–1030.

[Gue+16] Stéphane Guerrier et al. “A predictive based regression algorithm for gene network selection”. In: Frontiers in genetics 7 (2016), p. 97.

[Hag14] Andreas Hagemann. “Stochastic equicontinuity in nonlinear time series models”. In: The Econometrics Journal 17.1 (2014), pp. 188–196.

[Har90] Andrew Harvey. Forecasting, structural time series models and the Kalman filter. Cambridge university press, 1990.

[HL19] Fang Han and Yicheng Li. “Moment bounds for large autocovariance matrices under dependence”. In: Journal of Theoretical Probability (2019), pp. 1–48.

[Jir16] Moritz Jirak. “Berry–Esseen theorems under weak dependence”. In: The Annals of Probability 44.3 (2016), pp. 2024–2063.

[KD00] Siem Koopman and James Durbin. “Fast filtering and smoothing for multivariate state space models”. In: Journal of Time Series Analysis 21.3 (2000), pp. 281–296.

[Kuc+18] Arun Kumar Kuchibhotla et al. “A model free perspective for linear regression: Uniform-in-model bounds for post selection inference”. In: arXiv preprint arXiv:1802.05801 (2018).

[KW20] Sayar Karmakar and Wei Biao Wu. “Optimal gaussian approximation for multiple time series”. In: arXiv preprint arXiv:2001.10164 (2020).

[LF14] Dongyu Lin and D Foster. The Power of a Few Large Blocks: A credible assumption with incredible efficiency. Tech. rep. Working paper, 2014.

[LL09] Weidong Liu and Zhengyan Lin. “Strong approximation for a class of stationary processes”. In: Stochastic Processes and their Applications 119.1 (2009), pp. 249–280.

[LW17] Zhipeng Lou and Wei Biao Wu. “Simultaneous Inference for High Dimensional Mean Vectors”. In: arXiv preprint arXiv:1704.04806 (2017).

[LXW13] Weidong Liu, Han Xiao, and Wei Biao Wu. “Probability and moment inequalities under dependence”. In: Statistica Sinica (2013), pp. 1257–1272.

[MPR08] Florence Merlevède, Magda Peligrad, and Emmanuel Rio. “Bernstein inequality and moderate deviations under strong mixing conditions”. In: High dimensional probability V: the 5th International Conference (HDP V). Institute of Mathematical Statistics, Beachwood, OH. 2008, pp. 273–292.

[Per95] Donald Percival. “On estimation of the wavelet variance”. In: Biometrika 82.3 (1995), pp. 619–631.

[Rio09] Emmanuel Rio. “Moment inequalities for sums of dependent random variables under projective conditions”. In: Journal of Theoretical Probability 22.1 (2009), pp. 146–163.

[Ros56] Murray Rosenblatt. “A central limit theorem and a strong mixing condition”. In: Proceedings of the National Academy of Sciences of the United States of America 42.1 (1956), p. 43.

[RT96] Joseph Romano and Lori Thombs. “Inference for autocorrelations under weak assumptions”. In: Journal of the American Statistical Association 91.434 (1996), pp. 590–600.

[VDVW96] Aad W Van Der Vaart and Jon A Wellner. “Weak convergence”. In: Weak convergence and empirical processes. Springer, 1996, pp. 16–28.

[VS13] Stanislav Volgushev and Xiaofeng Shao. “A general approach to the joint asymptotic analysis of statistics from sub-samples”. In: arXiv preprint arXiv:1305.5618 (2013).

[WGP00] Brandon Whitcher, Peter Guttorp, and Donald Percival. “Wavelet analysis of covariance with application to atmospheric time series”. In: Journal of Geophysical Research 105.D11 (2000), pp. 941–962.

[Wu05] Wei Biao Wu. “Nonlinear system theory: Another look at dependence”. In: Proceedings of the National Academy of Sciences 102.40 (2005), pp. 14150–14154.

[Wu07] Wei Biao Wu. “Strong invariance principles for dependent random variables”. In: The Annals of Probability 35.6 (2007), pp. 2294–2320.

[Wu11] Wei Biao Wu. “Asymptotic theory for stationary processes”. In: Statistics and its Interface 4.2 (2011), pp. 207–226.

[WW16] Wei Biao Wu and Ying Nian Wu. “Performance bounds for parameter estimates of high-dimensional linear models with correlated errors”. In: Electronic Journal of Statistics 10.1 (2016), pp. 352–379.

[XKG21] Haotian Xu, Yuan Ke, and Stéphane Guerrier. “Nonasymptotic theories for tail-robust autocovariance matrix estimation methods”. In: (2021).


[Xu+19] Haotian Xu et al. “Multivariate signal modeling with applications to inertial sensor calibration”. In: IEEE Transactions on Signal Processing 67.19 (2019), pp. 5143–5152.

[ZC18] Xianyang Zhang and Guang Cheng. “Gaussian approximation for high dimensional vector under physical dependence”. In: Bernoulli 24.4A (2018), pp. 2640–2675.

[Zha18] Danna Zhang. “Robust Estimation of the Mean and Covariance Matrix for High Dimensional Time Series”. In: Statistica Sinica (2018).

[ZW17] Danna Zhang and Wei Biao Wu. “Gaussian approximation for high dimensional time series”. In: The Annals of Statistics 45.5 (2017), pp. 1895–1919.


Chapter 2 Multivariate Signal Modeling with Applications to Signal Processing

2.1 Introduction

The modeling of multivariate time series is an important as well as challenging task in many applied domains of data analysis. Indeed, it is extremely common to measure phenomena over time which not only manifest autocorrelation with their past but also show a form of dependence between them. This setting can be observed, among others, in a variety of longitudinal studies [see e.g. RL00; Cho16], in economic and financial research [see e.g. CLM97; Sim80; KD00; PY08] as well as in different fields of engineering [see e.g. Don06; VZ17]. In the latter case, increased attention has been given to the modeling of multivariate signals for the task of inertial sensor calibration, where the error signals of the individual accelerometers and gyroscopes that compose an Inertial Measurement Unit (IMU) are dependent on each other [see VZ17]. To date, these error signals have been modeled independently from each other and this task alone has already been challenging to deal with. In fact, the individual error signals of these instruments are characterized by deterministic and stochastic components, where the former can be dealt with through physical models (which are employed for calibration and compensation prior to use) while the latter are often described through complex stochastic models made up of the sum of different underlying processes. The estimation of these stochastic models has always been complicated, and many methods used for this purpose have suffered from statistical limitations and/or are computationally or numerically unstable. Indeed, in most latent variable modeling scenarios, the Maximum-Likelihood Estimator (MLE) cannot be directly applied since the latent stochastic components are unobservable and their marginal distribution is usually unknown. A feasible likelihood-based method is the Expectation-Maximization (EM) algorithm [see DLR77], whose implementation is nevertheless limited by the complexity and, more importantly, the possible non-convexity of its associated optimization problems, which severely limits the use of its statistical properties in practice [see e.g. AHK12].

To avoid the problems of the likelihood-based methods, the linear regression based on the Allan Variance (AVLR) is widely used for estimation and prediction, especially in the field of inertial sensor calibration [see e.g. ESHN08]. Although the AVLR is a computationally feasible method for estimating many complex models, its statistical properties are not optimal, as discussed in [GMS16] where the inconsistency of this method was shown for the majority of the latent models used for the purpose of sensor calibration.


Given the complexity of these models and the difficulties in estimating them, the signals have thus far been dealt with individually without taking into account the forms of dependence between them. As mentioned earlier, this approach is non-optimal since a multivariate approach to signal analysis would be more appropriate for different reasons.

Firstly, it is reasonable to assume that there exists a dependence between individual sensors along the three axes of an IMU and, when using an array of sensors, it is even more reasonable to assume that there is dependence between error signals of sensors placed in the same direction since they are measuring the same phenomenon [see BP02]. If the multivariate dependence is not considered, several problems can arise, including parameter misestimation and flawed testing procedures, leading to incorrect modeling and degraded navigation performance. Considering this, the literature in the area of multivariate time series analysis has collected a variety of models to describe these settings and has proposed methods to estimate them. One of the first models to be proposed was the Vector Auto-Regression (VAR) model which, in some sense, generalizes the Auto-Regression (AR) model in order to allow lagged observations from some signals to influence current observations of others [see e.g. Ham94]. This class of models has been extended to the more general class of Vector Auto-Regression Moving Average (VARMA) models, which also allow for correlation between innovation processes in each time series [see e.g. Ham94]. Although these models are quite flexible in considering different forms of autocorrelation and correlation between time series, their estimation can sometimes be numerically challenging and, furthermore, they may not be able to capture more complicated forms of multivariate dependence [see e.g. Lüt06; MS07]. Moreover, in practice, VARMA models are only used directly when the number of signals composing the multivariate time series is smaller than three since these models are often overparametrized, thereby leading to identifiability issues [see e.g. CGY18]. In this case, a more appropriate approach could be to consider only a model (or some models) describing the dependence between all or some subsets of the signals composing the multivariate time series. Indeed, the individual time series can often be characterized by a latent model structure in which different underlying processes are superposed and, within this setting, there can also be a dependence of these individual latent processes with those present in other signals [see e.g. Gue+13; Har90; KD00]. We refer to these models as multivariate latent models, which are particularly relevant for the task of inertial sensor calibration.

Considering the above setting, this chapter proposes a new method that is able to estimate the mentioned multivariate latent models in a computationally efficient and numerically stable manner. This new method is an extension of the Generalized Method of Wavelet Moments (GMWM), which was initially proposed as a statistically consistent approach to estimating complex latent models in univariate settings, overcoming the theoretical and computational limitations of the existing methods mentioned earlier [see Gue+13]. To do so, the GMWM takes advantage of a quantity called the Wavelet Variance (WV), which is the variance of the wavelet coefficients resulting from a wavelet decomposition of a signal [see Per95]. However, the WV does not take into account the variability in the signal that is explained by the dependence on other signals. For this reason, in this chapter we will make use of the Wavelet Cross-Covariance (WCCV), which was proposed in [WGP00] and whose asymptotic properties we develop further in this chapter by reducing and weakening the relative conditions. Based on the latter properties, we extend the GMWM to include this quantity and deliver the Multivariate GMWM (MGMWM), whose statistical properties are studied in order to perform the required inference procedures. In order to achieve the latter properties, we will also discuss the identifiability of a wide class of latent multivariate time series models, which is essential to obtain consistency of the MGMWM, thereby shedding light on what kind of latent multivariate models can be estimated and reducing the conditions necessary to deliver adequate inference procedures on them. Based on these properties, the proposed framework allows one to model dependence between and within sensors and provides the tools to test the presence of multivariate dependence between stochastic error signals, thereby facilitating sensor calibration and contributing to navigation accuracy.
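To make the two wavelet quantities mentioned above more concrete, the following is a minimal sketch of how an empirical WV and WCCV could be computed. It assumes a Haar filter in its maximal-overlap (undecimated) form and discards boundary coefficients; these choices, the function names `haar_modwt` and `wccv`, and the simulated Gaussian data are illustrative assumptions and are not taken from this chapter, whose formal definitions are given in Section 2.3.

```python
import numpy as np

def haar_modwt(x, level):
    """Non-boundary Haar maximal-overlap wavelet coefficients at a given level.

    Assumption: Haar filter of length 2**level with weights +/- 1/2**level.
    The sign convention does not affect variances or cross-covariances.
    """
    length = 2 ** level
    half = length // 2
    h = np.concatenate([np.full(half, 1.0 / length),
                        np.full(half, -1.0 / length)])
    # "valid" keeps only coefficients that do not touch the series boundary
    return np.convolve(x, h, mode="valid")

def wccv(x, y, level):
    """Empirical cross-product of the wavelet coefficients of x and y.

    With y = x this reduces to an empirical wavelet variance.
    """
    return float(np.mean(haar_modwt(x, level) * haar_modwt(y, level)))

# Two hypothetical signals sharing a common white-noise component
rng = np.random.default_rng(42)
n = 100_000
common = rng.standard_normal(n)
x = common + rng.standard_normal(n)
y = common + rng.standard_normal(n)

levels = (1, 2, 3)
wv_x = [wccv(x, x, j) for j in levels]     # wavelet variance of x
wccv_xy = [wccv(x, y, j) for j in levels]  # cross-covariance of x and y
print(wv_x, wccv_xy)
```

In this toy setting both signals are white noise, so the WV of `x` (variance 2) should be close to 2/2^j at level j, while the WCCV only picks up the shared component (covariance 1) and should be close to 1/2^j. The difference between the two illustrates why the WV alone cannot capture dependence between signals.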

This chapter is organized as follows. In Section 2.2 we introduce the class of multivariate latent processes that we are going to propose and study, justifying their relevance for the task of inertial sensor calibration. In Section 2.3 we briefly introduce the wavelet decomposition to then define and study the WCCV, thereby delivering the asymptotic properties of the WCCV vector. The latter properties are essential, among others, in order to obtain the asymptotic properties of the MGMWM, which is introduced in Section 2.4 where the identifiability of a large class of multivariate latent models is studied. To highlight the good statistical properties of the new estimator as well as its usefulness, Section 2.5 presents a simulation study comparing the finite sample performance of the proposed estimator with a recently proposed alternative, while Section 2.6 presents an application in the field of inertial sensor calibration where this new approach can deliver an important contribution. Finally, Section 2.7 concludes.

2.2 Multivariate latent processes

Let us define an $\mathbb{R}^I$-valued multivariate process as $\{X_t\}_{t \in \mathbb{Z}}$, where $I \in \mathbb{N}$. As mentioned in the introduction, one of the most common approaches to modeling these multivariate processes is through the use of VARMA models, which have the following structure:

$$X_t = \sum_{p=1}^{P} A_p X_{t-p} + \sum_{q=1}^{Q} U_q \epsilon_{t-q} + \epsilon_t,$$

where $P$ and $Q$ are the maximum lags of dependence for the AR and MA processes respectively among the individual time series composing the multivariate process, $A_p$ and $U_q$ are the coefficient matrices for the $p$-th and $q$-th lag and

$$\epsilon_t \overset{\text{i.i.d.}}{\sim} F(0, \Sigma), \qquad (2.1)$$

are independently distributed innovation vectors with multivariate distribution $F$ with mean zero and covariance matrix $\Sigma$.
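As an illustration of the VARMA structure above, the following minimal sketch simulates such a process. It assumes that $F$ is Gaussian (the text leaves $F$ generic), uses a burn-in period to reduce the effect of the zero initial values, and relies on hypothetical coefficient matrices chosen only so that the process is stable; the function name `simulate_varma` is likewise illustrative.

```python
import numpy as np

def simulate_varma(A, U, Sigma, n, burn=200, seed=0):
    """Simulate X_t = sum_p A_p X_{t-p} + sum_q U_q eps_{t-q} + eps_t.

    A and U are lists of (I x I) coefficient matrices; the innovations are
    drawn from a Gaussian N(0, Sigma), which is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    dim = Sigma.shape[0]
    total = n + burn
    eps = rng.multivariate_normal(np.zeros(dim), Sigma, size=total)
    X = np.zeros((total, dim))
    for t in range(total):
        x = eps[t].copy()
        for p, A_p in enumerate(A, start=1):   # autoregressive part
            if t - p >= 0:
                x += A_p @ X[t - p]
        for q, U_q in enumerate(U, start=1):   # moving-average part
            if t - q >= 0:
                x += U_q @ eps[t - q]
        X[t] = x
    return X[burn:]  # drop the burn-in observations

# Hypothetical bivariate VARMA(1, 1) with correlated innovations
A = [np.array([[0.5, 0.1], [0.0, 0.3]])]
U = [np.array([[0.2, 0.0], [0.1, 0.2]])]
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
X = simulate_varma(A, U, Sigma, n=500)
print(X.shape)
```

The off-diagonal entries of `A[0]`, `U[0]` and `Sigma` are what create dependence between the two signals; setting them to zero yields two independent univariate ARMA(1, 1) processes.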

These multivariate time series models are very useful to describe a wide variety of phenomena and, at an individual level, can often be represented as a latent process made by the sum of underlying first-order autoregressive and white noise models [see GM76].

However, these models hide the latent structure which can often be extremely useful for interpretation and, more importantly, are limited to a class of latent models in which many processes that are relevant to many domains cannot be included. For this reason, in this chapter we consider a different class of multivariate latent models (which can possibly include reparametrizations of the aforementioned VARMA models) in order for practitioners to use these in a variety of settings where they are of great importance (e.g. inertial sensor calibration). To do so, let us define a multivariate time series composed of $K$ latent processes as

$$X_t = \sum_{k=1}^{K} S_k X_{k,t}, \qquad (2.2)$$


where $X_t = (X_t^{(i)})_{i=1,\dots,I}$ is an $I$-dimensional vector and $X_{k,t} = (X_{k,t}^{(j)})_{j=1,\dots,D_k}$ is a $D_k$-dimensional vector, with $k$ being the index for the type of univariate/multivariate model underlying some or all of the latent processes, and $S_k$ is an $I \times D_k$ matrix with either 1 or 0 as its elements that we define as

$$S_{k,(i,j)} = \begin{cases} 1 & \text{if } X_{k,t}^{(j)} \text{ appears in } X_t^{(i)}, \\ 0 & \text{otherwise.} \end{cases}$$
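The role of the selection matrices $S_k$ in (2.2) can be sketched numerically as follows. The dimensions, the matrices `S1` and `S2`, and the function name `compose` are hypothetical and serve only as an illustration: a first latent process enters all three observed signals, while a second, two-dimensional one enters only the first and third signals.

```python
import numpy as np

def compose(latent, selectors):
    """Return X_t = sum_k S_k X_{k,t} for all t.

    latent[k] has shape (n, D_k) and selectors[k] has shape (I, D_k).
    """
    n = latent[0].shape[0]
    n_signals = selectors[0].shape[0]
    X = np.zeros((n, n_signals))
    for X_k, S_k in zip(latent, selectors):
        X += X_k @ S_k.T  # row t of the product is S_k X_{k,t}
    return X

# I = 3 observed signals
S1 = np.eye(3)               # process 1 (D_1 = 3) appears in every signal
S2 = np.array([[1, 0],       # component 1 of process 2 appears in signal 1
               [0, 0],       # signal 2 does not contain process 2
               [0, 1]])      # component 2 of process 2 appears in signal 3

rng = np.random.default_rng(0)
X1 = rng.standard_normal((4, 3))   # placeholder latent values
X2 = rng.standard_normal((4, 2))
X = compose([X1, X2], [S1, S2])
print(X)
```

Each row of $S_k$ contains at most one entry equal to 1 in this example, so every observed signal receives at most one component of each latent process.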

Therefore, we have a set of $K$ multivariate time series models (which can characterize one, some or all of the univariate processes) that can be summed to compose a multivariate latent model. Given this, let us introduce the individual multivariate time series models, also commonly used for the task of inertial sensor calibration [see e.g. TWW04], that will be considered in this chapter to build a general multivariate latent model:

(T1) White Noise (WN), which corresponds to the independently distributed innovation vectors $\epsilon_t$ defined in (2.1). We denote the process characterized by this model as $X_{1,t}$, hence $X_{1,t} \overset{\text{i.i.d.}}{\sim} F(0, \Sigma)$.

(T2) Random Walk (RW), which we denote as $X_{2,t}$ and is defined as

$$X_{2,t} = X_{2,t-1} + \iota_t,$$

where $\iota_t \overset{\text{i.i.d.}}{\sim} F(0, \Lambda)$.

(T3) Quantization Noise (QN) (or rounding error [see e.g. PP02]), which we denote as $X_{3,t}$. For this process we do not consider a multivariate model and therefore each univariate process is characterized by the parameter $Q_i^2 \in \{x \in \mathbb{R} \mid x > 0\}$.

(T4) Drift (DR), which we denote as $X_{4,t}$ and, for each univariate process, is defined as

$$X_{4,t}^{(i)} = \omega_i t,$$

where $\omega_i \in \{x \in \mathbb{R} \mid x > 0\}$.

(T5) First-Order Auto-Regressive Noise (AR1), which we denote as $X_{k,t}$, $\forall\, k = 5, \dots, K$, where $K \in \mathbb{N}$, and defined as

$$X_{k,t} = \Phi_k X_{k,t-1} + \varepsilon_t,$$

where $\Phi_k$ is a diagonal matrix with diagonal elements $\phi_k^{(i)}$, $0 < |\phi_k^{(i)}| < 1$, for $i = 1, \dots, D_k$ and $\varepsilon_t \overset{\text{i.i.d.}}{\sim} F(0, Z_k)$.
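The processes (T1), (T2), (T4) and (T5) can be simulated directly from their definitions, as in the minimal sketch below. It assumes Gaussian innovations (the text leaves $F$ generic), a random walk and an AR1 process started from zero, and hypothetical parameter values. The quantization noise (T3) is omitted because its generating mechanism is not specified in the definition above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, D = 1000, 2  # sample size and common dimension D_k (hypothetical)

def innovations(cov, size):
    """Draw i.i.d. zero-mean Gaussian vectors (Gaussianity is an assumption)."""
    return rng.multivariate_normal(np.zeros(cov.shape[0]), cov, size=size)

# Hypothetical positive definite covariance matrices Sigma, Lambda and Z_5
Sigma = np.array([[1.0, 0.4], [0.4, 1.0]])
Lambda = np.array([[0.01, 0.005], [0.005, 0.01]])
Z5 = np.array([[0.5, 0.2], [0.2, 0.5]])

# (T1) white noise
wn = innovations(Sigma, n)

# (T2) random walk: cumulative sum of its innovations (started at zero)
iota = innovations(Lambda, n)
rw = np.cumsum(iota, axis=0)

# (T4) drift: deterministic linear trend with slope omega_i for signal i
omega = np.array([0.01, 0.02])
dr = np.arange(1, n + 1)[:, None] * omega

# (T5) AR1 with diagonal Phi_k and correlated innovations
Phi = np.diag([0.9, 0.5])
eps = innovations(Z5, n)
ar1 = np.zeros((n, D))
ar1[0] = eps[0]
for t in range(1, n):
    ar1[t] = Phi @ ar1[t - 1] + eps[t]

# Composite signal with identity selection matrices for every process
X = wn + rw + dr + ar1
print(X.shape)
```

Because $\Phi_k$ is diagonal, the only source of dependence between the two AR1 components is the off-diagonal entry of `Z5`; the same holds for `Sigma` and `Lambda` in the white noise and random walk.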

The covariance matrices $\Sigma$, $\Lambda$ and $Z_k$ defined above are positive definite $D_k \times D_k$ matrices with $k = 1, 2, 5, \dots, K$ respectively, with their respective elements being $(\sigma^{(i,i')})_{1 \le i \le i' \le D_1}$, $(\lambda^{(i,i')})_{1 \le i \le i' \le D_2}$ and $(z_k^{(i,i')})_{1 \le i \le i' \le D_k}$. As can be seen from the above definitions, processes (T3) and (T4) do not have a multivariate structure (i.e. there is no dependence between univariate signals based on these models). This is a reasonable assumption in practice since process (T3) can be interpreted as a form of rounding error which is intuitively independent from the rounding error made on another signal, while the (T4) process is a non-stochastic process whose behavior in time is deterministic in nature (hence the eventual dependence would be dealt with through deterministic models). Another aspect to
