HAL Id: hal-02518028

https://hal.archives-ouvertes.fr/hal-02518028

Submitted on 24 Mar 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Deep neural network-based CHARME models with infinite memory

José Gómez-García, Jalal Fadili, Christophe Chesneau

To cite this version:

José Gómez-García, Jalal Fadili, Christophe Chesneau. Deep neural network-based CHARME models with infinite memory. Data Science Summer School (DS3), Jun 2019, Paris-Saclay, France. ⟨hal-02518028⟩


Deep neural network-based CHARME models with infinite memory

José G. Gómez-García (1), Jalal Fadili (2) and Christophe Chesneau (1)
(1) Lab. of Mathematics Nicolas Oresme (LMNO), Université de Caen Normandie
(2) École Nationale Supérieure d'Ingénieurs de Caen (ENSICAEN)

Abstract

We consider a model called CHARME (Conditional Heteroscedastic Autoregressive Mixture of Experts), a class of generalized mixtures of nonlinear nonparametric AR-ARCH time series. Under certain Lipschitz-type conditions on the autoregressive and volatility functions, we prove that this model is $\tau$-weakly dependent in the sense of Dedecker & Prieur (2004) [1], and therefore ergodic and stationary. This result forms the theoretical basis for deriving an asymptotic theory of the underlying nonparametric estimation. As an application, for the case of a single expert, we use the universal approximation property of neural networks to develop an estimation theory for the autoregressive function by deep neural networks, where the consistency of the estimators of the network weights and biases is guaranteed.

The model

Let $(E, \|\cdot\|)$ be a Banach space. The conditional heteroscedastic $p$-autoregressive mixture of experts (CHARME($p$)) model, with values in $E$, is defined by

$$X_t = \sum_{k=1}^{K} \xi_t^{(k)} \left[ f_k(X_{t-1}, \dots, X_{t-p}) + g_k(X_{t-1}, \dots, X_{t-p}) \, \epsilon_t \right], \quad t \in \mathbb{Z}, \qquad (1)$$

where

- $f_k : (E^p, \mathcal{E}^p) \longrightarrow (E, \mathcal{E})$ and $g_k : (E^p, \mathcal{E}^p) \longrightarrow (\mathbb{R}, \mathcal{B}(\mathbb{R}))$, with $k \in [K] := \{1, 2, \dots, K\}$, are arbitrary unknown functions,
- $(\epsilon_t)_t$ are $E$-valued independent identically distributed (i.i.d.) zero-mean innovations, and
- $\xi_t^{(k)} = \mathbb{1}_{\{Q_t = k\}}$, where $(Q_t)_t$ is an i.i.d. sequence with values in the finite set of states $[K]$, which is independent of the innovations $(\epsilon_t)_t$.

In particular, if p = ∞, we call this model CHARME with infinite memory (CHARME(∞)).
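For intuition, here is a minimal simulation sketch of model (1) in Python for the scalar case $E = \mathbb{R}$; the experts $f_k$, $g_k$, the mixing probabilities $\pi_k$ and the choices $p = 2$, $K = 2$ are illustrative, not taken from the poster.

```python
import numpy as np

# Minimal simulation sketch of a scalar CHARME(p) process, model (1).
# The experts f_k, g_k and the mixing probabilities pi_k below are
# illustrative choices.
rng = np.random.default_rng(0)
p, K, T = 2, 2, 1000
pi = np.array([0.6, 0.4])                  # pi_k = P(Q_t = k)

f = [lambda x: 0.3 * x[0] - 0.2 * x[1],    # autoregressive experts f_k
     lambda x: 0.5 * np.tanh(x[0])]
g = [lambda x: 0.1 + 0.2 * abs(x[0]),      # volatility experts g_k >= 0
     lambda x: 0.3]

X = np.zeros(T + p)
for t in range(p, T + p):
    past = X[t - p:t][::-1]                # (X_{t-1}, ..., X_{t-p})
    k = rng.choice(K, p=pi)                # hidden regime Q_t
    eps = rng.standard_normal()            # iid zero-mean innovation
    X[t] = f[k](past) + g[k](past) * eps   # model (1)
```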

Weak dependence

Let $(E, \|\cdot\|)$ be a Banach space and let $h : E \longrightarrow \mathbb{R}$. We define
$$\|h\|_{\infty} = \sup_{x \in E} |h(x)| \quad \text{and} \quad \mathrm{Lip}(h) = \sup_{x \neq y} \frac{|h(x) - h(y)|}{\|x - y\|}.$$
Moreover, we denote $\Lambda_1(E) := \{h : E \longrightarrow \mathbb{R} : \mathrm{Lip}(h) \le 1\}$.

The appropriate notion of weak dependence for the CHARME model was introduced in [1]. It is based on the concept of the coefficient τ defined below.

Def. Let $(\Omega, \mathcal{A}, P)$ be a probability space, $\mathcal{M}$ a sub-$\sigma$-algebra of $\mathcal{A}$ and $X$ a random variable with values in $E$ such that $\|X\|_1 < \infty$. The coefficient $\tau$ is defined as
$$\tau(\mathcal{M}, X) = \left\| \sup \left\{ \left| \int h(x) \, P_{X|\mathcal{M}}(dx) - \int h(x) \, P_X(dx) \right| : h \in \Lambda_1(E) \right\} \right\|_1.$$

Using the definition of this $\tau$ coefficient with the $\sigma$-algebra $\mathcal{M}_p = \sigma(X_t, \, t \le p)$ and the norm $\|x - y\| = \|x_1 - y_1\| + \cdots + \|x_k - y_k\|$ on $E^k$, we can assess the dependence between the past of the sequence $(X_t)_{t \in \mathbb{Z}}$ and its future $k$-tuples through the coefficients
$$\tau_k(r) = \max_{1 \le l \le k} \frac{1}{l} \sup \left\{ \tau\big(\mathcal{M}_p, (X_{j_1}, \dots, X_{j_l})\big) : p + r \le j_1 < \cdots < j_l \right\}.$$
Finally, denoting $\tau(r) := \tau_{\infty}(r) = \sup_{k > 0} \tau_k(r)$, the time series $(X_t)_{t \in \mathbb{Z}}$ is called $\tau$-weakly dependent if its coefficients $\tau(r)$ tend to 0 as $r$ tends to infinity.
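A quick sanity check of this definition: for an i.i.d. sequence, the conditional law of any future tuple given the past $\mathcal{M}_p$ coincides with its unconditional law, so
$$P_{(X_{j_1}, \dots, X_{j_l}) \,|\, \mathcal{M}_p} = P_{(X_{j_1}, \dots, X_{j_l})} \ \text{a.s.}, \qquad \text{hence} \quad \tau(r) = 0 \ \text{for all } r \ge 1.$$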

Deep neural networks (DNN)

Def. Let $d, L \in \mathbb{N}$. A deep neural network (architecture) $\theta$ with input dimension $d$ and $L$ layers is a sequence of matrix-vector tuples
$$\theta = \left( (A^{(1)}, b^{(1)}), (A^{(2)}, b^{(2)}), \dots, (A^{(L)}, b^{(L)}) \right),$$
where $A^{(l)}$ is an $N_l \times N_{l-1}$ matrix and $b^{(l)} \in \mathbb{R}^{N_l}$, with $N_0 = d$ and $N_1, \dots, N_L \in \mathbb{N}$ the numbers of neurons of the layers.

If $\theta$ is a deep neural network architecture as above and if $\varphi : \mathbb{R} \longrightarrow \mathbb{R}$ is an arbitrary function, then we define the deep neural network (DNN) associated to $\theta$ with activation function $\varphi$ as the map $f_{\theta,\varphi} : \mathbb{R}^d \longrightarrow \mathbb{R}^{N_L}$ such that $f_{\theta,\varphi}(x) = x_L$, where $x_L$ results from the following scheme:
$$x_0 := x, \qquad x_l := \varphi\big(A^{(l)} x_{l-1} + b^{(l)}\big) \ \text{ for } l = 1, \dots, L-1, \qquad x_L := A^{(L)} x_{L-1} + b^{(L)},$$
where $\varphi$ acts componentwise, i.e., for $y = (y_1, \dots, y_N) \in \mathbb{R}^N$, $\varphi(y) = (\varphi(y_1), \dots, \varphi(y_N))$.
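This scheme translates directly into code. A minimal NumPy sketch (the architecture in the example is an illustrative choice):

```python
import numpy as np

# f_{theta,phi}: theta is a list of (A, b) tuples; phi acts
# componentwise on every layer except the last one.
def dnn(theta, phi, x):
    for A, b in theta[:-1]:
        x = phi(A @ x + b)   # x_l = phi(A^(l) x_{l-1} + b^(l))
    A, b = theta[-1]
    return A @ x + b         # x_L = A^(L) x_{L-1} + b^(L), no activation

# Example with d = N_0 = 3, N_1 = 4, N_2 = 1 and phi = tanh.
rng = np.random.default_rng(1)
theta = [(rng.standard_normal((4, 3)), rng.standard_normal(4)),
         (rng.standard_normal((1, 4)), rng.standard_normal(1))]
print(dnn(theta, np.tanh, np.ones(3)))
```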

Theorem (Stationarity of CHARME models)

Let $E_{\infty} := \{(x_k)_{k > 0} \in E^{\mathbb{N}} : x_k = 0 \text{ for } k > N, \text{ for some } N \in \mathbb{N}\}$, endowed with its product $\sigma$-algebra $\mathcal{E}^{\mathbb{N}}$.

Consider the CHARME($\infty$) model and denote $\pi_k = P(Q_0 = k)$, $k = 1, \dots, K$. Assume that there exist non-negative real sequences $(a_i^{(k)})_{i \ge 1}$ and $(b_i^{(k)})_{i \ge 1}$, for $k = 1, \dots, K$, such that for any $x, y \in E_{\infty}$,
$$\|f_k(x) - f_k(y)\| \le \sum_{i=1}^{\infty} a_i^{(k)} \|x_i - y_i\|, \qquad |g_k(x) - g_k(y)| \le \sum_{i=1}^{\infty} b_i^{(k)} \|x_i - y_i\|, \quad k = 1, \dots, K. \qquad (2)$$
Denote $a(m) = 2^{m-1} \sum_{k=1}^{K} \pi_k \left( A_k^m + B_k^m \|\epsilon_0\|_m^m \right)$, where $A_k = \sum_{i=1}^{\infty} a_i^{(k)}$ and $B_k = \sum_{i=1}^{\infty} b_i^{(k)}$. Then:

1. If $a(1) < 1$, there exists a $\tau$-weakly dependent strictly stationary solution $(X_t)_{t \in \mathbb{Z}}$ of (1) (with $p = \infty$) which belongs to $L^1$, and such that
$$\tau(r) \le \frac{2\mu_1}{1 - a} \inf_{1 \le s \le r} \left( a^{r/s} + \frac{1}{1 - a} \sum_{i=s+1}^{\infty} a_i \right) \xrightarrow[r \to \infty]{} 0, \qquad (3)$$
where $a := a(1)$, $\mu_1 = \sum_{k=1}^{K} \pi_k \left( \|f_k(0)\| + |g_k(0)| \, \|\epsilon_0\|_1 \right)$ and $a_i = \sum_{k=1}^{K} \pi_k \left( a_i^{(k)} + b_i^{(k)} \|\epsilon_0\|_1 \right)$.

2. If moreover $a(m) < 1$ for some $m \ge 1$, the stationary solution belongs to $L^m$.
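For instance, specializing the bound to a single linear expert recovers geometric decay. Consider the real-valued AR(1) model $X_t = \alpha X_{t-1} + \epsilon_t$ with $|\alpha| < 1$ and $\|\epsilon_0\|_1 < \infty$, i.e. $K = 1$, $f_1(x) = \alpha x_1$ and $g_1 \equiv 1$. Then $a_1 = |\alpha|$ and $a_i = 0$ for $i \ge 2$, so $a(1) = |\alpha| < 1$, $\mu_1 = \|\epsilon_0\|_1$, and taking $s = 1$ in (3) gives
$$\tau(r) \le \frac{2 \|\epsilon_0\|_1}{1 - |\alpha|} \, |\alpha|^r.$$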

Application-Example

Suppose that $(X_t)_t$ is a time series such that
$$X_t = f_{\theta,\varphi}(X_{t-1}, \dots, X_{t-p}) + \epsilon_t, \qquad (4)$$
where $f_{\theta,\varphi} : \mathbb{R}^p \longrightarrow \mathbb{R}$ is a DNN with parameter
$$\theta = \left( (A^{(1)}, b^{(1)}), (A^{(2)}, b^{(2)}), \dots, (A^{(L)}, b^{(L)}) \right) \in \prod_{l=1}^{L} \mathcal{M}_{N_l \times N_{l-1}}(\mathbb{R}) \times \mathcal{M}_{N_l \times 1}(\mathbb{R})$$
and Lipschitz activation function $\varphi$. Then, if $\|\epsilon_0\|_1 < \infty$ and
$$\tilde{a} = (\mathrm{Lip}(\varphi))^{L-1} \sum_{(j_0, \dots, j_L) \in \prod_{i=0}^{L} [N_i]} \; \prod_{l=1}^{L} \left| a_{j_l j_{l-1}}^{(l)} \right| < 1, \qquad (5)$$
the time series $(X_t)_t$ is stationary and belongs to $L^1$.
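Condition (5) can be checked numerically without enumerating index paths, since the path-sum factorizes into a product of entrywise absolute matrices, $\sum_{(j_0, \dots, j_L)} \prod_{l=1}^{L} |a^{(l)}_{j_l j_{l-1}}| = \mathbf{1}^\top |A^{(L)}| \cdots |A^{(1)}| \mathbf{1}$. A minimal sketch (the function name and the example weights are illustrative):

```python
import numpy as np

# Checks the stationarity condition (5): the sum over index paths
# (j_0, ..., j_L) of products of absolute weights equals
# 1^T |A^(L)| ... |A^(1)| 1. lip_phi = Lip(phi), e.g. 1 for tanh/ReLU.
def a_tilde(weights, lip_phi=1.0):
    v = np.ones(weights[0].shape[1])   # length N_0 = p vector of ones
    for A in weights:                  # apply |A^(1)|, ..., |A^(L)|
        v = np.abs(A) @ v
    return lip_phi ** (len(weights) - 1) * v.sum()

# Example: p = 2 inputs, one hidden layer of width 3, scalar output.
rng = np.random.default_rng(2)
weights = [0.2 * rng.standard_normal((3, 2)), 0.2 * rng.standard_normal((1, 3))]
print(a_tilde(weights))  # (5) holds, hence stationarity, if this is < 1
```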

Some conclusions

- Under the assumptions of the Theorem, the stationary solution of the CHARME($\infty$) model can be represented as a causal Bernoulli shift (see [2]). Moreover, this solution is the unique causal Bernoulli shift solution of the model; therefore, the solution is automatically an ergodic process.
- Under the assumptions of the Theorem and additional classical conditions, the least squares estimator for the DNN-based CHARME($p$) model (1) is consistent (see the sketch below).
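The least squares estimator referred to above minimizes $\sum_t \big( X_t - f_{\theta,\varphi}(X_{t-1}, \dots, X_{t-p}) \big)^2$ over the network parameter $\theta$. As a rough illustration only, a sketch with one hidden layer trained by plain gradient descent (the width, learning rate and optimizer are arbitrary choices, not the method of the poster):

```python
import numpy as np

# Least squares fit of the DNN autoregression in model (4), sketched
# with one hidden tanh layer; hyperparameters are illustrative.
def fit_ls(X, p, width=8, lr=1e-2, steps=5000, seed=0):
    rng = np.random.default_rng(seed)
    # lagged design: row t is (X_{t-1}, ..., X_{t-p}); target is X_t
    Z = np.column_stack([X[p - i - 1:len(X) - i - 1] for i in range(p)])
    y = X[p:]
    n = len(y)
    A1 = 0.5 * rng.standard_normal((width, p)); b1 = np.zeros(width)
    A2 = 0.5 * rng.standard_normal(width);      b2 = 0.0
    for _ in range(steps):
        H = np.tanh(Z @ A1.T + b1)            # hidden layer, n x width
        r = H @ A2 + b2 - y                   # residuals of the LS criterion
        gA2 = H.T @ r / n; gb2 = r.mean()     # gradient, output layer
        dH = np.outer(r, A2) * (1 - H ** 2)   # backprop through tanh
        gA1 = dH.T @ Z / n; gb1 = dH.mean(axis=0)
        A1 -= lr * gA1; b1 -= lr * gb1; A2 -= lr * gA2; b2 -= lr * gb2
    return A1, b1, A2, b2
```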

Future work

- Provide a central limit theorem for the least squares estimator of the DNN-based CHARME($p$) model (1) under flexible conditions.

References

[1] J. Dedecker and C. Prieur. Coupling for $\tau$-dependent sequences and applications. Journal of Theoretical Probability, 17(4):861–885, October 2004.

[2] P. Doukhan and O. Wintenberger. Weakly dependent chains with infinite memory. Stochastic Processes and their Applications, 118:1997–2013, 2008.

[3] J.-P. Stockis, J. Franke, and J. Tadjuidje Kamgaing. On geometric ergodicity of CHARME models. Journal of Time Series Analysis, 31(3):141–152, 2010.

[4] D. W. K. Andrews. Non-strong mixing autoregressive processes. Journal of Applied Probability, 21:930–934, 1984.

Acknowledgments

This work was developed with the support of the Normandy Region.

Contact Information

- Web: https://gomezgarcia.users.lmno.cnrs.fr/
- Email: jose3g@gmail.com; jose-gregorio.gomez-garcia@unicaen.fr

Data Science Summer School (DS3), Paris-Saclay, June 24-28, 2019
