HAL Id: hal-02518028
https://hal.archives-ouvertes.fr/hal-02518028
Submitted on 24 Mar 2020
To cite this version:
José Gómez-García, Jalal Fadili, Christophe Chesneau. Deep neural network-based CHARME models with infinite memory. Data Science Summer School (DS3), Jun 2019, Paris-Saclay, France. ⟨hal-02518028⟩
Deep neural network-based CHARME models with infinite memory
José G. Gómez-García (1), Jalal Fadili (2) and Christophe Chesneau (1)
(1) Lab. of Mathematics Nicolas Oresme (LMNO), Université de Caen Normandie
(2) École Nationale Supérieure d'Ingénieurs de Caen (ENSICAEN)
Abstract
We consider the CHARME (Conditional Heteroscedastic Autoregressive Mixture of Experts) model, a class of generalized mixtures of nonlinear nonparametric AR-ARCH time series. Under certain Lipschitz-type conditions on the autoregressive and volatility functions, we prove that this model is τ-weakly dependent in the sense of Dedecker & Prieur (2004) [1], and therefore ergodic and stationary. This result forms the theoretical basis for deriving an asymptotic theory of the underlying nonparametric estimation. As an application, for the case of a single expert, we use the universal approximation property of neural networks to develop an estimation theory for the autoregressive function by deep neural networks, where the consistency of the estimators of the network weights and biases is guaranteed.
The model
Let $(E, \|\cdot\|)$ be a Banach space. The conditional heteroscedastic $p$-autoregressive mixture of experts (CHARME($p$)) model, with values in $E$, is defined by
\[
X_t = \sum_{k=1}^{K} \xi_t^{(k)} \left[ f_k(X_{t-1}, \ldots, X_{t-p}) + g_k(X_{t-1}, \ldots, X_{t-p})\, \varepsilon_t \right], \quad t \in \mathbb{Z}, \tag{1}
\]
where
- $f_k \colon (E^p, \mathcal{E}^{\otimes p}) \longrightarrow (E, \mathcal{E})$ and $g_k \colon (E^p, \mathcal{E}^{\otimes p}) \longrightarrow (\mathbb{R}, \mathcal{B}(\mathbb{R}))$, with $k \in [K] := \{1, 2, \ldots, K\}$, are arbitrary unknown functions,
- $(\varepsilon_t)_t$ are $E$-valued independent identically distributed (iid) zero-mean innovations, and
- $\xi_t^{(k)} = \mathbb{1}_{\{Q_t = k\}}$, where $(Q_t)_t$ is an iid sequence with values in the finite set of states $[K]$, which is independent of the innovations $(\varepsilon_t)_t$.
In particular, if $p = \infty$, we call this model CHARME with infinite memory (CHARME($\infty$)).
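To fix ideas, here is a minimal simulation sketch of a scalar CHARME($p$) path following (1). The experts, mixing probabilities, and Gaussian innovations below are hypothetical illustration choices, not prescribed by the model.

```python
import numpy as np

def simulate_charme(T, p, experts, probs, rng=None):
    """Simulate a scalar CHARME(p) path of length T, following (1).

    experts: list of (f_k, g_k) pairs; each maps the length-p vector
             (X_{t-1}, ..., X_{t-p}) to a float.
    probs:   mixing probabilities pi_k = P(Q_t = k).
    """
    rng = np.random.default_rng(rng)
    X = np.zeros(T + p)                        # zero initial past, for illustration
    for t in range(p, T + p):
        past = X[t - p:t][::-1]                # (X_{t-1}, ..., X_{t-p})
        k = rng.choice(len(experts), p=probs)  # hidden iid regime Q_t
        f_k, g_k = experts[k]
        eps = rng.standard_normal()            # iid zero-mean innovation
        X[t] = f_k(past) + g_k(past) * eps     # equation (1)
    return X[p:]

# Two hypothetical experts: a contractive linear AR part and a mild ARCH part.
experts = [(lambda x: 0.5 * x[0],          lambda x: 1.0),
           (lambda x: 0.3 * np.tanh(x[0]), lambda x: 0.5 + 0.2 * abs(x[0]))]
path = simulate_charme(T=1000, p=1, experts=experts, probs=[0.7, 0.3])
```

With these hypothetical coefficients, $a(1) = 0.7 \cdot 0.5 + 0.3\,(0.3 + 0.2\,\|\varepsilon_0\|_1) \approx 0.49 < 1$, so the stationarity theorem below applies.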
Weak dependence
Let $(E, \|\cdot\|)$ be a Banach space and let $h \colon E \longrightarrow \mathbb{R}$. We define $\|h\|_\infty = \sup_{x \in E} |h(x)|$ and
\[
\mathrm{Lip}(h) = \sup_{x \neq y} \frac{|h(x) - h(y)|}{\|x - y\|}.
\]
Moreover, we denote $\Lambda_1(E) := \{ h \colon E \longrightarrow \mathbb{R} : \mathrm{Lip}(h) \leq 1 \}$.
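For instance, by the reverse triangle inequality, $|\,\|x\| - \|y\|\,| \leq \|x - y\|$ for all $x, y \in E$, so the norm $h(x) = \|x\|$ satisfies $\mathrm{Lip}(h) \leq 1$ and belongs to $\Lambda_1(E)$.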
The appropriate notion of weak dependence for the CHARME model was introduced in [1]. It is based on the concept of the coefficient τ defined below.
Def. Let $(\Omega, \mathcal{A}, \mathbb{P})$ be a probability space, $\mathcal{M}$ a sub-$\sigma$-algebra of $\mathcal{A}$ and $X$ a random variable with values in $E$ such that $\|X\|_1 < \infty$. The coefficient $\tau$ is defined as
\[
\tau(\mathcal{M}, X) = \left\| \sup\left\{ \left| \int h(x)\, \mathbb{P}_{X \mid \mathcal{M}}(dx) - \int h(x)\, \mathbb{P}_X(dx) \right| : h \in \Lambda_1(E) \right\} \right\|_1 .
\]
Using the definition of this $\tau$ coefficient with the $\sigma$-algebra $\mathcal{M}_p = \sigma(X_t,\, t \leq p)$ and the norm $\|x - y\| = \|x_1 - y_1\| + \cdots + \|x_k - y_k\|$ on $E^k$, we can assess the dependence between the past of the sequence $(X_t)_{t \in \mathbb{Z}}$ and its future $k$-tuples through the coefficients
\[
\tau_k(r) = \max_{1 \leq l \leq k} \frac{1}{l} \sup\left\{ \tau\big(\mathcal{M}_p, (X_{j_1}, \ldots, X_{j_l})\big) : p + r \leq j_1 < \cdots < j_l \right\}.
\]
Finally, denoting $\tau(r) := \tau_\infty(r) = \sup_{k > 0} \tau_k(r)$, the time series $(X_t)_{t \in \mathbb{Z}}$ is called $\tau$-weakly dependent if its coefficients $\tau(r)$ tend to $0$ as $r$ tends to infinity.
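As a sanity check, if $(X_t)_{t \in \mathbb{Z}}$ is itself iid, then every future tuple $(X_{j_1}, \ldots, X_{j_l})$ with $j_1 \geq p + r$ is independent of $\mathcal{M}_p$, so $\mathbb{P}_{X \mid \mathcal{M}_p} = \mathbb{P}_X$ almost surely and $\tau(\mathcal{M}_p, \cdot) = 0$; hence $\tau(r) = 0$ for every $r \geq 1$, and independence appears as the degenerate case of $\tau$-weak dependence.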
Deep neural networks (DNN)
Def. Let $d, L \in \mathbb{N}$. A deep neural network (architecture) $\theta$ with input dimension $d$ and $L$ layers is a sequence of matrix-vector tuples
\[
\theta = \left( (A^{(1)}, b^{(1)}), (A^{(2)}, b^{(2)}), \ldots, (A^{(L)}, b^{(L)}) \right),
\]
where $A^{(l)}$ is an $N_l \times N_{l-1}$ matrix and $b^{(l)} \in \mathbb{R}^{N_l}$, with $N_0 = d$ and $N_1, \ldots, N_L \in \mathbb{N}$ the numbers of neurons of the layers.
If $\theta$ is a deep neural network architecture as above and if $\varphi \colon \mathbb{R} \longrightarrow \mathbb{R}$ is an arbitrary function, then we define the deep neural network (DNN) associated to $\theta$ with activation function $\varphi$ as the map $f_{\theta,\varphi} \colon \mathbb{R}^d \longrightarrow \mathbb{R}^{N_L}$ such that $f_{\theta,\varphi}(x) = x_L$, where $x_L$ results from the following scheme:
\[
\begin{aligned}
x_0 &:= x, \\
x_l &:= \varphi\big(A^{(l)} x_{l-1} + b^{(l)}\big), \quad \text{for } l = 1, \ldots, L - 1, \\
x_L &:= A^{(L)} x_{L-1} + b^{(L)},
\end{aligned}
\]
where $\varphi$ acts componentwise, i.e., for $y = (y_1, \ldots, y_N) \in \mathbb{R}^N$, $\varphi(y) = (\varphi(y_1), \ldots, \varphi(y_N))$.
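The scheme above translates line by line into code. The following sketch (NumPy, with a hypothetical one-hidden-layer architecture) evaluates $f_{\theta,\varphi}$: the activation $\varphi$ is applied after every layer except the last.

```python
import numpy as np

def dnn_forward(theta, phi, x):
    """Evaluate f_{theta,phi}(x) following the scheme above.

    theta: list of (A, b) pairs, A of shape (N_l, N_{l-1}), b of shape (N_l,).
    phi:   activation function, applied componentwise.
    """
    for A, b in theta[:-1]:   # x_l = phi(A^(l) x_{l-1} + b^(l)), l = 1, ..., L-1
        x = phi(A @ x + b)
    A, b = theta[-1]          # last layer is affine only: x_L = A^(L) x_{L-1} + b^(L)
    return A @ x + b

# Hypothetical architecture: d = N_0 = 3, one hidden layer with N_1 = 5,
# scalar output N_2 = 1, and phi = tanh (which is 1-Lipschitz).
rng = np.random.default_rng(0)
theta = [(rng.standard_normal((5, 3)), rng.standard_normal(5)),
         (rng.standard_normal((1, 5)), rng.standard_normal(1))]
y = dnn_forward(theta, np.tanh, rng.standard_normal(3))
```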
Theorem (Stationarity of CHARME models)
Let $E^{\infty} := \{ (x_k)_{k > 0} \in E^{\mathbb{N}} : x_k = 0 \ \text{for} \ k > N, \ \text{for some} \ N \in \mathbb{N}^* \}$, endowed with its product $\sigma$-algebra $\mathcal{E}^{\otimes \mathbb{N}}$.
Consider the CHARME($\infty$) model and denote $\pi_k = \mathbb{P}(Q_0 = k)$, with $k = 1, \ldots, K$. Assume that there exist non-negative real sequences $(a_i^{(k)})_{i \geq 1}$ and $(b_i^{(k)})_{i \geq 1}$, for $k = 1, 2, \ldots, K$, such that for any $x, y \in E^{\infty}$,
\[
\|f_k(x) - f_k(y)\| \leq \sum_{i=1}^{\infty} a_i^{(k)} \|x_i - y_i\|, \qquad |g_k(x) - g_k(y)| \leq \sum_{i=1}^{\infty} b_i^{(k)} \|x_i - y_i\|, \quad k = 1, \ldots, K. \tag{2}
\]
Denote $a(m) = 2^{m-1} \sum_{k=1}^{K} \pi_k \left( A_k^m + B_k^m \|\varepsilon_0\|_m^m \right)$, where $A_k = \sum_{i=1}^{\infty} a_i^{(k)}$ and $B_k = \sum_{i=1}^{\infty} b_i^{(k)}$. Then,
1. if $a(1) < 1$, there exists a $\tau$-weakly dependent strictly stationary solution $(X_t)_{t \in \mathbb{Z}}$ of (1), with $p = \infty$, which belongs to $\mathbb{L}^1$ and such that, writing $a := a(1)$,
\[
\tau(r) \leq \frac{2\mu_1}{1 - a} \inf_{1 \leq s \leq r} \left( a^{r/s} + \frac{1}{1 - a} \sum_{i=s+1}^{\infty} a_i \right) \xrightarrow[r \to \infty]{} 0, \tag{3}
\]
where $\mu_1 = \sum_{k=1}^{K} \pi_k \left( \|f_k(0)\| + |g_k(0)| \, \|\varepsilon_0\|_1 \right)$ and $a_i = \sum_{k=1}^{K} \pi_k \left( a_i^{(k)} + b_i^{(k)} \|\varepsilon_0\|_1 \right)$;
2. if moreover $a(m) < 1$ for some $m \geq 1$, the stationary solution belongs to $\mathbb{L}^m$.
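For instance, with a single expert ($K = 1$), linear autoregression $f_1(x) = \alpha x_1$ and constant volatility $g_1 \equiv \sigma$, we get $A_1 = |\alpha|$ and $B_1 = 0$, so $a(1) = |\alpha|$: the condition $a(1) < 1$ recovers the classical stationarity condition $|\alpha| < 1$ of the linear AR(1) model.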
Application-Example
Suppose that $(X_t)_t$ is a time series such that
\[
X_t = f_{\theta,\varphi}(X_{t-1}, \ldots, X_{t-p}) + \varepsilon_t, \tag{4}
\]
where $f_{\theta,\varphi} \colon \mathbb{R}^p \longrightarrow \mathbb{R}$ is a DNN with parameter
\[
\theta = \left( (A^{(1)}, b^{(1)}), (A^{(2)}, b^{(2)}), \ldots, (A^{(L)}, b^{(L)}) \right) \in \prod_{l=1}^{L} \left( \mathcal{M}_{N_l \times N_{l-1}}(\mathbb{R}) \times \mathcal{M}_{N_l \times 1}(\mathbb{R}) \right)
\]
and Lipschitz activation function $\varphi$. Then, if $\|\varepsilon_0\|_1 < \infty$ and
\[
\tilde{a} = \big(\mathrm{Lip}(\varphi)\big)^{L-1} \sum_{(j_0, \ldots, j_L) \in \prod_{i=0}^{L} [N_i]} \ \prod_{l=1}^{L} \big| a^{(l)}_{j_l j_{l-1}} \big| < 1,
\]
then the theorem above, applied with a single expert ($K = 1$ and $g_1 \equiv 1$), guarantees that (4) admits a $\tau$-weakly dependent strictly stationary solution.