Additional File 3 of ”Goodness-of-ﬁt tests and nonparamet- ric adaptive estimation for spike train analysis”: adaptive properties of the Lasso estimate for Hawkes processes Patricia Reynaud-Bouret

(1)

Additional File 3 of ”Goodness-of-fit tests and nonparamet- ric adaptive estimation for spike train analysis”: adaptive properties of the Lasso estimate for Hawkes processes

Patricia Reynaud-Bouret

^∗1

and Vincent Rivoirard

²

and Franck Grammont

¹

and Christine Tuleau- Malot

¹

1Univ. Nice Sophia Antipolis, CNRS, LJAD, UMR 7351, 06100 Nice, France

2CEREMADE UMR CNRS 7534, Universit´e Paris Dauphine, Place du Mar´echal De Lattre De Tassigny, 75775 PARIS Cedex 16, France

Email: Patricia Reynaud-Bouret^∗- [email protected]; Vincent Rivoirard - [email protected]; Franck Grammont - [email protected]; Christine Tuleau-Malot - [email protected];

∗Corresponding author

Theorem 1 of [1] does not explicit particular basis on which the interaction functions are expansed. It is stated in its general form as follows. Note that [1] is a proceedings and therefore no proof has been given of the result in [1]. Ifni.i.d. trials are recorded, each trialicorresponds to the observation ofNi= (N_i⁽¹⁾, ..., N_i^(M)), the multivariate Hawkes process whose intensity is given by the predictable transformation denoted ψi. Furthermore, to each triali, we can associate an intensity λ_i and a contrastγ⁽ⁱ⁾. The global least-squares contrast over the ntrials can also be seen as

γn(f) =

n

X

i=1

γ⁽ⁱ⁾(f). (1)

We use the following notation: for any predictable processes H = (H_i⁽¹⁾, ..., H_i^(M⁾)i=1,...n, K = (K_i⁽¹⁾, ..., K_i^(M))i=1,...,n, set

H•N =

n

X

i=1 M

X

m=1

Z T2

T₁

H_i,t^(m)dN_i^(m)(t), (2)

H⋄K=

n

X

i=1 M

X

m=1

Z T₂ T₁

H_i,t^(m)K_i,t^(m)dt, (3) andH^⋄2=H⋄H.

In general, we use a dictionary Φ of known functions ofHand we only consider linear combinations of functions of Φ for estimatingf^∗:

f_a=X

ϕ∈Φ

a_ϕϕ, for a∈R^Φ. (4)

1

(2)

Then, by linearity ofψ, one can rewrite (1) as

γn(fa) =−2a^′bn+a^′Gna, (5) where for anyϕand ˜ϕin Φ,

(bn)ϕ=ψ(ϕ)•N and (Gn)ϕ,ϕ˜=ψ(ϕ)⋄ψ( ˜ϕ).

Given a vector of positive weightsd, the Lasso estimate of f^∗ is ˜f_n :=fa˜n where ˜a_n is a minimizer of the followingℓ1-penalized least-square contrast:

˜

a_n ∈arg min

a∈R^Φ{−2a^′b_n+a^′G_na+ 2d^′|a|}. (6) Then Theorem 1 of [1] is stated as follows:

Theorem 1. We introduce the following two events:

ΩV,B ={∀ϕ∈Φ, sup

t∈[T₁,T₂],m,i

|ψ^(m)_i,t (ϕ)| ≤B_ϕ and(ψ(ϕ))²•N≤V_ϕ}, for positive deterministic constantsB_ϕ andV_ϕ and

Ωc=n

∀a∈R^Φ, a^′G_na≥c a^′ao

, (7)

for a positive constantc. Letxandεbe strictly positive constants and for all ϕ∈Φ, dϕ=

q

2(1 +ε) ˆVϕ^µx+Bϕx

3 , (8)

with

Vˆ_ϕ^µ= µ

µ−φ(µ) (ψ(ϕ))²•N+ B²_ϕx µ−φ(µ)

for a real number µsuch thatµ > φ(µ), whereφ(µ) = exp(µ)−µ−1. Then, with probability larger than

1−4X

ϕ∈Φ



 log

1 + ^µV_B2^ϕ ϕx

log(1 +ε) + 1



e^−x−P((ΩV,B∪Ωc)^c), the following inequality holds

[ψ( ˜f_n)−λ]^⋄2≤C inf

a∈R^Φ







[ψ(fa)−λ]^⋄2+1 c

X

ϕ∈S(a)

d²_ϕ





 ,

whereC is an absolute positive constant and whereS(a)is the support ofa, i.e. its coordinates with non-zero coefficients.

2

(3)

Proof. We use the notation of [2] and transposition of this notation. First by scaling the data, it is always possible to assume thatA= 1. We have at handn×M point processesNm⁽ⁱ⁾. In the more general case, we need to model eachλ^(m,i), intensity of N_m⁽ⁱ⁾, by a

ψ_f^(m,i)

n (t) =µ^(m,i)+X

ℓ,j

Z t−

−∞

g^(m,i)_ℓ,j (t−u)dN_j^(ℓ)(u),

wherefn belongs toHn which replaces the spaceH:

Hn= (R×L2((0,1])^nM)^nM=n fn=

(µ^(m,i),(g_ℓ,j^(m,i))ℓ=1,...,M, j=1,...,n)m=1,...,M, i=1,...n

:

g_ℓ,j^(m,i)with support in (0,1] andkfnk²=X

m,i

(µ^(m,i))²+X

m,i

X

ℓ,j

Z 1 0

g^(m,i)_ℓ,j (t)²dt <∞





 .

For everyf_n=

(µ^(m,i),(g^(m,i)_ℓ,j )ℓ=1,...,M, j=1,...,n)m=1,...,M, i=1,...n

in Hn, we denote for eachm andi, f_n^(m,i)= (µ^(m,i),(g^(m,i)_ℓ,j )ℓ=1,...,M, j=1,...,n.

In the same way for everyf =

(µ^(m),(g_ℓ,j^(m))ℓ=1,...,M, j=1,...,n

m=1,...,M inH,we denote for eachmandi, f^(m)= (µ^(m),(g_ℓ,j^(m))ℓ=1,...,M, j=1,...,n.

Now our dictionary Φ of Hcan be transformed into a dictionary Φn ofHn by stating that for anyϕin Φ we associate aϕ_n inHn such that for alli, m,ϕ^(m,i)n =ϕ^(m). Therefore it is easy to see that the vectorb of [2] associated to f_n is actually our vectorb_n and that the matrixGof [2] is our matrixG_n. The V and B are in the same way translated and the present result is a pure application of Theorem 2 of [2]

References

1. Reynaud-Bouret P, Rivoirard V, Tuleau-Malot C: Inference of functional connectivity in Neurosciences via Hawkes processes. In 1st IEEE Global Conference on Signal and Information Processing 2013, Austin Texas.

2. Hansen N, Reynaud-Bouret P, Rivoirard V: Lasso and probabilistic inequalities for multivariate point processes.to appear in Bernoulli 2013. [Http://arxiv.org/abs/1208.0570].

3