ξ_i^{ora.} ≈ σ_C² / (σ_C² + σ²) · λ_i ,  i ∈ [[r + 1, N]] ,  (1.4.13)

where we see that the optimal solution to the problem of estimating C amounts to cleaning the eigenvalues of M by the signal-over-noise ratio. This is a well-known result in Bayesian statistics, and it is interesting to note once again a link between that theory and the RIE estimator. A more detailed analysis of these results, which come from the article [40], is given in Section 13.
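To make the cleaning recipe of (1.4.13) concrete, here is a minimal sketch (our own illustration, not code from [40]; the function name and the scalar-noise setting are assumptions):

import numpy as np

def bayes_linear_shrinkage(eigvals, sigma_C2, sigma2):
    """Shrink bulk eigenvalues by the signal-over-noise ratio of (1.4.13):
    xi_i = sigma_C^2 / (sigma_C^2 + sigma^2) * lambda_i."""
    snr = sigma_C2 / (sigma_C2 + sigma2)   # signal-over-noise ratio
    return snr * np.asarray(eigvals)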

1.4.3. Extension to free probability models. The last section of this general introduction concerns the extension of some of the results mentioned above to the framework of general free addition and free multiplication models. We have motivated – especially for the multiplicative model – the practical interest of studying this type of model. The vast majority of the following results come from the article [40]. Throughout this section, we assume that in the limit N → ∞ the limit

lim_{η→0} g_M(λ − iη) = h_M(λ) + iπ ρ_M(λ)  (1.4.14)

is well defined for any matrix M characterized by the models (1.2.4) or (1.2.5). Moreover, we assume throughout this part that there is no isolated eigenvalue (r = 0). We shall come back to this assumption at the end of the section.

Let us first consider the free addition model (1.2.5). Recall that in this specific case, the external noise B is rotationally invariant and independent of C, but has an arbitrary eigenvalue distribution. The first result that we were able to generalize is the asymptotic behavior of the resolvent of M, which takes the standard free-addition subordination form

G_M(z) ≃ G_C(z − R_B(g_M(z))) .  (1.4.15)

It is important to point out that if B is a GOE matrix, then we recover the expected result (see Chapter 13 for more details). Once again, we see that the resolvent of M tends to a deterministic limit, which simplifies the computations.

Starting from this result, we can compute the expected overlaps between a perturbed and an unperturbed eigenstate:

N E[⟨u_i, v_j⟩²] = β_a(λ) / [(λ − µ − α_a(λ))² + π² β_a(λ)² ρ_M(λ)²] ,  i, j ∈ [[1, N]] ,  (1.4.16)

where λ = λ_i and µ = µ_j denote the eigenvalues associated with u_i and v_j, and where we have defined the functions:

α_a(λ) := Re[R_B(h_M(λ) + iπ ρ_M(λ))] ,
β_a(λ) := Im[R_B(h_M(λ) + iπ ρ_M(λ))] / (π ρ_M(λ)) .  (1.4.17)
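As a quick sanity check (our own specialization, not spelled out at this point in the text): if B is a GOE matrix of variance σ², its R-transform is R_B(z) = σ² z, so (1.4.17) reduces to

α_a(λ) = σ² h_M(λ) ,  β_a(λ) = σ² ,

and the cleaning function F_a introduced below becomes F_a(λ) = λ − 2σ² h_M(λ), the known result for Gaussian additive noise.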

This allows us to deduce a (formal) formula for the asymptotic value of the oracle estimator (1.4.1):

ξ_i^{ora.} ∼ F_a(λ_i) ,  F_a(λ) = λ − α_a(λ) − β_a(λ) h_M(λ) .  (1.4.18)

We can see that the result remains "observable", in the sense that the knowledge of C does not seem to be a prerequisite for using this formula. On the other hand, it presupposes that we know at least the spectrum of the matrix B in the large-matrix limit. It is not surprising to find the R-transform in these results, since it is precisely the object that characterizes the addition of non-commutative operators.
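A minimal numerical sketch of (1.4.18) in the same GOE case as above (the function name and the discrete approximation of h_M are our own assumptions, not the thesis' code):

import numpy as np

def rie_clean_additive_goe(eigvals, sigma2):
    """Evaluate F_a(lambda_i) = lambda_i - 2*sigma2*h_M(lambda_i) for
    M = C + B with B a GOE matrix of variance sigma2 (R_B(z) = sigma2*z).
    h_M is approximated by the discrete Hilbert transform of the spectrum."""
    eigvals = np.asarray(eigvals, dtype=float)
    cleaned = np.empty_like(eigvals)
    for i, lam in enumerate(eigvals):
        others = np.delete(eigvals, i)
        h = np.mean(1.0 / (lam - others))   # h_M(lam) ~ (1/N) sum_{j != i} 1/(lam - lam_j)
        cleaned[i] = lam - 2.0 * sigma2 * h
    return cleaned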

The same analysis can be carried out for the free multiplication model (1.2.4). Set M := C^{1/2} Ω B Ω† C^{1/2}, where B is a symmetric, rotationally invariant random matrix of size N × N, and Ω is an N × N rotation matrix distributed according to the Haar measure. For this model, the relation between the resolvents is given by

G_M(z) = Z(z) G_C(Z(z)) ,  Z(z) := z S_B(z g_M(z) − 1) ,  (1.4.19)

which is indeed a generalization of the relation (1.4.2) [40]. Analogously to the additive case, it is not surprising to encounter the S-transform in this framework, given that it characterizes the product of non-commutative operators. Using this result, we can deduce the mean overlap:

N E[⟨u_i, v_j⟩²] = µ β_m(λ) / [(λ − µ α_m(λ))² + π² µ² β_m(λ)² ρ_M(λ)²] ,  i, j ∈ [[1, N]] ,  (1.4.20)

where the functions α_m and β_m are defined as follows:

α_m(λ) := lim_{z→λ−i0⁺} Re[1 / S_B(z g_M(z) − 1)] ,
β_m(λ) := lim_{z→λ−i0⁺} Im[1 / S_B(z g_M(z) − 1)] · 1/(π ρ_M(λ)) .  (1.4.21)

As for the additive model, the oracle estimator (1.4.1) converges towards a function that does not explicitly require the knowledge of the matrix C one seeks to estimate. Indeed, suppose that the S-transform of the matrix B is analytic; then the oracle estimator admits an explicit limiting formula, equation (1.4.22), expressed through the limit

lim_{z→λ−i0⁺} S_B(z g_M(z) − 1) := γ_B(λ) + iπ ρ_M(λ) ω_B(λ) .  (1.4.23)

The results for the multiplicative model are explained in Chapters 3 and 7.
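As a consistency check (our own specialization): if B is a white Wishart matrix of aspect ratio q, its S-transform is S_B(t) = 1/(1 + qt), so the subordination function of (1.4.19) becomes

Z(z) = z / (1 + q(z g_M(z) − 1)) = z / (1 − q + q z g_M(z)) ,

and one recovers the familiar Marčenko-Pastur relation between the resolvents of the sample and population covariance matrices.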

In conclusion, we are able to study the asymptotic behavior of the oracle estimator in a fairly general random matrix framework. However, the results involve the R- and S-transforms, whose structures are not easy to analyze. It would therefore be interesting to see whether concrete examples can be found, such as those mentioned in Section 1.2, for which explicit results can be obtained as in the two previous sections. Moreover, the extension of these results in the presence of isolated eigenvalues is a very important open problem, both in theory and in practice.

Advances in large covariance matrices estimation

Introduction

2.1 Motivations

This part, which is the bulk of the thesis, is dedicated to the estimation of large sample covariance matrices. Indeed, in the present era of "Big Data", new statistical methods are needed to decipher the large dimensional data sets that are now routinely generated in almost all fields – physics, image analysis, genomics, epidemiology, engineering, economics and finance, to quote only a few. It is very natural to try to identify common causes (or factors) that explain the joint dynamics of N quantities. These quantities might be daily returns of the different stocks of the S&P 500, temperature variations in different locations around the planet, velocities of individual grains in a packed granular medium, or different biological indicators (blood pressure, cholesterol, etc.) within a population. The simplest mathematical object that quantifies the similarities between these observables is an N × N correlation matrix C. Its eigenvalues and eigenvectors can then be used to characterize the most important common dynamical "modes", i.e. linear combinations of the original variables with the largest variance. This is the well-known "Principal Component Analysis" (PCA) method. More formally, let us denote by y ∈ R^N the set of demeaned and standardized variables which are thought to display some degree of interdependence. Then, one possible way to quantify the underlying interaction network between these variables is through the standard Pearson correlations:

C_{ij} = E[y_i y_j] ,  i, j ∈ [[1, N]] .  (2.1.1)

We will refer to the matrix C as the population correlation matrix throughout the following. The major concern in practice is that the expectation value in (2.1.1) is rarely computable, precisely because the underlying distribution of the vector y is unknown and is what one is struggling to determine. Empirically, one tries to infer the matrix C by collecting a large number T of realizations of these N variables, which define the input sample data matrix Y = (y_1, y_2, . . . , y_T) ∈ R^{N×T}. Then, in the case of a sufficiently large number of realizations T, one tempting solution to estimate C is to compute the sample correlation matrix estimator E, defined as:

E_{ij} := (1/T) Σ_{t=1}^{T} Y_{it} Y_{jt} ≡ (1/T) (Y Y†)_{ij} ,  (2.1.2)

where Y_{it} is the realization of the i-th observable (i = 1, . . . , N) at "time" t (t = 1, . . . , T), which will be assumed in the following to be demeaned and standardized (see previous footnote).
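In code, the estimator (2.1.2) is essentially one line; below is a minimal sketch (the function name is ours, and the demeaning/standardization step mirrors the convention just stated):

import numpy as np

def sample_correlation(Y):
    """Sample correlation estimator E = Y Y^T / T of (2.1.2),
    with each of the N rows of Y demeaned and standardized first."""
    N, T = Y.shape
    Y = Y - Y.mean(axis=1, keepdims=True)   # demean each variable
    Y = Y / Y.std(axis=1, keepdims=True)    # standardize each variable
    return (Y @ Y.T) / T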

Indeed, in the case where N ≪ T, it is well known from classical multivariate statistics that E converges (almost surely) to C [179]. However, when N is large, the simultaneous estimation of all N(N − 1)/2 elements of C – or in fact only of its N eigenvalues – becomes problematic when the total number T of observations is not very large compared to N itself. In the example of stock returns, T is the total number of trading days in the sampled data, while in the biological example, T would be the size of the population sample, etc. Hence, in the modern framework of high-dimensional statistics, the empirical correlation matrix E (i.e. computed on a given realization) must be carefully distinguished from the "true" correlation matrix C of the underlying statistical process (which might not even be well defined). In fact, the whole point of the present part is to characterize the difference between E and C, and to discuss how well (or how badly) one may reconstruct C from the knowledge of E in the case where N and T become very large with their ratio q = N/T not vanishingly small; this is often called the large dimension limit (LDL), or the "Kolmogorov regime".

There are numerous situations where the estimation of the high-dimensional covariance matrix is crucial. Let us give some well-known examples:

(i) Generalized least squares (GLS): Suppose we try to explain the vector y using a linear model

y = Xβ + ε ,  (2.1.3)

where X is an N × k design matrix (k > 1), β denotes the regression coefficients on these k factors, and ε denotes the residual. Typically, one seeks the β that best explains the data, and this is exactly the purpose of GLS. Assume that E[ε|X] = 0 and V[ε|X] = C, the covariance matrix of the residuals. Then GLS estimates β as (see [7] for a more detailed discussion):

β̂ = (X† C⁻¹ X)⁻¹ X† C⁻¹ y .  (2.1.4)

We shall investigate this estimator in Section 8.
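A minimal sketch of (2.1.4) (the function name is ours; we solve linear systems instead of forming C⁻¹ explicitly):

import numpy as np

def gls(X, y, C):
    """GLS estimator (2.1.4): beta = (X^T C^{-1} X)^{-1} X^T C^{-1} y."""
    CinvX = np.linalg.solve(C, X)            # C^{-1} X, without inverting C
    Cinvy = np.linalg.solve(C, y)            # C^{-1} y
    return np.linalg.solve(X.T @ CinvX, X.T @ Cinvy)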

(ii) Generalized method of moments (GMM): Suppose one wants to calibrate the parameters Θ of a model on some data set. The idea is to compute the empirical average of a set of k functions (generalized moments) of the data, which should all be zero for the correct values of the parameters, Θ = Θ0. The distance to zero is measured using the covariance of these functions, and a precise measurement of this k × k covariance matrix increases the efficiency of the GMM – see [88]. Note that GLS is a special form of GMM.

(iii) Classification (LDA) [79]: Suppose that we want to classify the variables y between two Gaussian populations with different means µ_1 and µ_2, priors π_1 and π_2, but the same covariance matrix C. The LDA rule classifies y to class 2 if

y† C⁻¹ (µ_2 − µ_1) > (1/2) (µ_2 + µ_1)† C⁻¹ (µ_2 − µ_1) − log(π_2/π_1) .  (2.1.5)
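A sketch of the rule (2.1.5) (names are ours; we assume the standard form with (µ_2 − µ_1) on both sides, as reconstructed above):

import numpy as np

def lda_classify(y, mu1, mu2, C, pi1, pi2):
    """LDA rule (2.1.5): assign y to class 2 if the linear score passes
    the threshold set by the class means and priors."""
    w = np.linalg.solve(C, mu2 - mu1)        # C^{-1} (mu2 - mu1)
    threshold = 0.5 * (mu2 + mu1) @ w - np.log(pi2 / pi1)
    return 2 if y @ w > threshold else 1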

(iv) Large portfolio optimization [125]: Suppose we want to invest in a set of financial assets y in such a way that the overall risk of the portfolio is minimized, for a given performance target ν. According to Markowitz's theory, the optimal investment strategy is a vector of weights w := (w_1, . . . , w_N)† that can be obtained through a quadratic optimization program where we minimize the variance of the strategy ⟨w, Cw⟩ subject to a constraint on the expectation value ⟨w, g⟩ ≥ ν, with g a vector of predictors and ν fixed. (Other constraints can also be implemented.) The optimal strategy reads

w = ν C⁻¹ g / (g† C⁻¹ g) .  (2.1.6)
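In code, (2.1.6) reads as follows (a sketch; the function name is ours):

import numpy as np

def markowitz_weights(C, g, nu):
    """Optimal Markowitz weights (2.1.6): w = nu * C^{-1} g / (g^T C^{-1} g)."""
    Cinvg = np.linalg.solve(C, g)            # C^{-1} g, without inverting C
    return nu * Cinvg / (g @ Cinvg)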

As we shall see in Chapter 8, a common measure of the estimation "risk" in high-dimensional problems like (i) and (iv) above is given by Tr E⁻¹ / Tr C⁻¹, which turns out to be very close to unity when T is large enough for a fixed N, i.e. when q = N/T → 0. However, when the number of observables N is also large, such that the ratio q is not very small, we will find below that Tr E⁻¹ = Tr C⁻¹/(1 − q) for a wide class of processes. In other words, the out-of-sample risk Tr E⁻¹ can exceed the true optimal risk Tr C⁻¹ by far when q > 0, and even diverge when q → 1. Note that a similar scenario, when the Value-at-Risk is minimized in-sample, was elicited in [49], and in [53] for the Expected Shortfall. Typical numbers in the case of stocks are N = 500 and T = 2500, corresponding to 10 years of daily data – already quite a long strand compared to the lifetime of stocks or the expected structural evolution time of markets – but that corresponds to q = 0.2. For macroeconomic indicators – say inflation – 20 years of monthly data produce a meager T = 240, whereas the number of sectors of activity for which inflation is recorded is around N = 30, such that q = 0.125. Clearly, effects induced by a nonzero value of q are expected to be highly relevant in many applications.
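The 1/(1 − q) inflation of the out-of-sample risk is easy to observe numerically; here is a quick Monte-Carlo check in the simplest setting C = I_N (our own illustration, so that Tr C⁻¹ = N):

import numpy as np

rng = np.random.default_rng(0)
N, T = 250, 1000                             # aspect ratio q = N/T = 0.25
q = N / T
Y = rng.standard_normal((N, T))              # i.i.d. data, i.e. C = identity
E = Y @ Y.T / T                              # sample covariance estimator
ratio = np.trace(np.linalg.inv(E)) / N       # Tr E^{-1} / Tr C^{-1}
print(ratio, 1.0 / (1.0 - q))                # both close to 1.333...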

2.1.1. Historical survey. The rapid growth of RMT (Random Matrix Theory) in the last two
