
HAL Id: hal-03154295

https://hal.archives-ouvertes.fr/hal-03154295

Preprint submitted on 28 Feb 2021


Efficient robust sequential analysis for autoregressive big data models

Ouerdia Arkoun, Jean-Yves Brua, Serguei Pergamenshchikov

To cite this version:

Ouerdia Arkoun, Jean-Yves Brua, Serguei Pergamenshchikov. Efficient robust sequential analysis for autoregressive big data models. 2021. hal-03154295


Efficient robust sequential analysis for autoregressive big data models

Ouerdia Arkoun, Jean-Yves Brua and Serguei Pergamenshchikov

Abstract

In this paper we study high dimensional models based on dependent observations defined through autoregressive processes. For such models we study the efficient robust estimation problem in the adaptive setting, i.e. in the case when the nonparametric regularity is unknown. To this end we use the sequential model selection procedures proposed in [4]. First, through the van Trees inequality, we obtain the sharp lower bound for the robust risks in explicit form, i.e. the famous Pinsker constant (see [28], for example). Then, through sharp non-asymptotic oracle inequalities for the robust risks, we show that the upper bound for the robust risk of the proposed sequential model selection procedure coincides with the obtained Pinsker constant, i.e. this procedure is efficient in the minimax sense.

MSC: primary 62G08, secondary 62G05

Keywords: Nonparametric estimation; Nonparametric autoregression; Non-asymptotic estimation; Robust risk; Model selection; Sharp oracle inequalities.

This research was supported by RSF, project no 20-61-47043 (National Research Tomsk State University, Russia).

Sup'Biotech, Laboratoire BIRL, 66 Rue Guy Moquet, 94800 Villejuif, France and Laboratoire de Mathématiques Raphaël Salem, Normandie Université, UMR 6085 CNRS-Université de Rouen, France, e-mail: ouerdia.arkoun@gmail.com

Laboratoire de Mathématiques Raphaël Salem, Normandie Université, UMR 6085 CNRS-Université de Rouen, France, e-mail: jean-yves.brua@univ-rouen.fr

Laboratoire de Mathématiques Raphaël Salem, UMR 6085 CNRS-Université de Rouen Normandie, France and International Laboratory of Statistics of Stochastic Processes and Quantitative Finance, National Research Tomsk State University, e-mail: Serge.Pergamenshchikov@univ-rouen.fr


1 Introduction

1.1 Problem and motivations

In this paper we consider the following autoregressive model

    y_j = Σ_{i=1}^{q} θ_i ψ_i(x_j) y_{j-1} + ξ_j,   x_j = a + (b − a) j/n,   (1.1)

where the functions (ψ_i)_{1≤i≤q} are known linearly independent functions, a < b are fixed known constants, 1 ≤ j ≤ n, the initial value y_0 is a constant and the noise (ξ_j)_{j≥1} is an i.i.d. sequence of unobservable random variables with E ξ_1 = 0 and E ξ_1² = 1. In the sequel we denote by p the distribution density of the random variable ξ_1.

The problem is to estimate the unknown parameters (θ_j)_{1≤j≤q} in the high dimensional setting, i.e. when the number of parameters q > n. Usually, in these cases, for statistical models with independent observations one uses one of two methods: the Lasso algorithm (see, for example, [27]) or the Dantzig selector method proposed in [8]. For dependent observations such models were studied, for example, in [11] and [12]. But in all these papers the number of parameters q is known and, therefore, these methods cannot be used to estimate, for example, the number of parameters q itself. In this paper, similarly to [18, 19], we study this problem in the nonparametric setting, i.e. we consider the observations (1.1) defined as

    y_j = S(x_j) y_{j-1} + ξ_j,   (1.2)

where S(·) ∈ L_2[a, b] is an unknown function. The nonparametric setting allows us to consider the models (1.1) with unknown q or with q = +∞. Note that the case when the number of parameters q is unknown is one of the challenging problems in signal and image processing theory (see, for example, [7, 5]).
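To make the data-generating mechanism concrete, here is a minimal simulation sketch of the model (1.2) in Python. The standard Gaussian noise and the particular coefficient function S are our own illustrative choices (the Gaussian density is one admissible noise density, as noted in Section 2), not part of the paper.

```python
import math
import random

def simulate_ar(S, n, a=0.0, b=1.0, y0=0.0, seed=42):
    """Simulate y_j = S(x_j) * y_{j-1} + xi_j at the design points
    x_j = a + (b - a) * j / n of model (1.2), with N(0,1) noise."""
    rng = random.Random(seed)
    y = [y0]
    for j in range(1, n + 1):
        x = a + (b - a) * j / n
        y.append(S(x) * y[-1] + rng.gauss(0.0, 1.0))
    return y  # y[0], ..., y[n]

# Example: a smooth coefficient with |S| <= 1 - eps, cf. the stability set (2.4)
S = lambda x: 0.5 * math.sin(2 * math.pi * x)
y = simulate_ar(S, n=1000)
```

Because |S| is bounded away from 1, the simulated path stays stochastically stable, which is exactly the role of the stability set introduced in Section 2.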

Therefore, the problem is now to estimate the function S(·) in the model (1.2) on the basis of the observations, under the condition that the noise density p is unknown and belongs to some class P of noise distributions. A number of papers consider such models, for example [9], [10] and [6]. In all these papers the authors propose asymptotic (as n → ∞) methods for various identification studies without considering optimal estimation issues. Minimax estimation problems for the model (1.2) were first treated in [2] and [25] in the nonadaptive case, i.e. for known regularity of the function S. Then, in [1] and [3], it was proposed to use the sequential analysis method for the adaptive pointwise estimation problem in the case when the Hölder regularity is unknown.


In this paper we consider the adaptive estimation problem for the quadratic risk defined as

    R_p(Ŝ_n, S) = E_{p,S} ‖Ŝ_n − S‖²,   ‖S‖² = ∫_a^b S²(x) dx,   (1.3)

where Ŝ_n is an estimator of S based on the observations (y_j)_{1≤j≤n} and E_{p,S} is the expectation with respect to the distribution law P_{p,S} of the process (y_j)_{1≤j≤n} given the distribution density p and the function S. Moreover, taking into account that the distribution p is unknown, we use the robust nonparametric estimation method developed in [13]. To this end we set the robust risk as

    R*(Ŝ_n, S) = sup_{p∈P} R_p(Ŝ_n, S),   (1.4)

where P is a family of distributions defined in Section 2.

1.2 Main tools

To estimate the function S in the model (1.2) we make use of the model selection procedures proposed in [4]. These procedures are based on the family of optimal pointwise truncated sequential estimators from [3], for which a sharp oracle inequality is shown using the model selection method developed in [14]. In this paper, using this inequality, we show that the model selection procedure is efficient in the adaptive setting for the robust quadratic risk (1.4). To this end, first of all, we have to study the sharp lower bound for these risks, i.e. we have to provide the best potential estimation accuracy for the model (1.2), which is called the Pinsker constant. For this we use the approach proposed in [15, 16], which is based on the van Trees inequality. It turns out that for the model (1.2) the Pinsker constant has the same form as for the signal filtration problem in the "signal + white noise" model studied in [28], but with a new coefficient which equals the optimal variance given by the Hájek-Le Cam inequality for the parametric version of the model (1.2). This is a new result in the efficient nonparametric estimation theory for statistical models with dependent observations. Then, using the oracle inequality from [4] and the weighted least squares estimation method, we show that, for the model selection procedure with the Pinsker weight coefficients, the upper bound asymptotically coincides with the obtained Pinsker constant without using the regularity properties of the unknown function, i.e. the procedure is efficient in the adaptive setting with respect to the robust risk (1.4).


1.3 Plan of the paper

The paper is organized as follows. In Section 2 we give all the conditions and construct the sequential pointwise estimation procedures which allow us to pass from the autoregression model to the corresponding regression model. We construct in Section 3 the model selection procedure based on the sequential estimators from Section 2. Then we announce the main results in Section 4. In Section 5 we show the van Trees inequality for the model (1.2). We obtain the lower bound for the robust risk in Section 6 and we get, in Section 7, the upper bound for the robust risk of the constructed sequential estimator. In Appendix A we give all the auxiliary and technical tools.

2 Sequential procedure

As in [3] we assume that in the model (1.2) the i.i.d. random variables (ξ_j)_{j≥1} have a density p (with respect to the Lebesgue measure) from the functional class P defined as

    P := { p ≥ 0 : ∫_{−∞}^{+∞} p(x) dx = 1, ∫_{−∞}^{+∞} x p(x) dx = 0, ∫_{−∞}^{+∞} x² p(x) dx = 1
           and sup_{l≥1} (1/(ς^l (2l − 1)!!)) ∫_{−∞}^{+∞} |x|^{2l} p(x) dx ≤ 1 },   (2.1)

where ς ≥ 1 is some fixed parameter which may be a function of the number of observations n, i.e. ς = ς(n), such that for any b > 0

    lim_{n→∞} ς(n)/n^b = 0.   (2.2)

Note that the (0,1)-Gaussian density belongs to P; in the sequel we denote this density by p_0. It is clear that for any b > 0

    m_b = sup_{p∈P} E_p |ξ_1|^b < ∞,   (2.3)

where E_p is the expectation with respect to the density p from P. To obtain a stable (uniformly with respect to the function S) model (1.2), we assume that for some fixed 0 < ε < 1 and L > 0 the unknown function S belongs to the ε-stability set introduced in [3] as

    Θ_{ε,L} = { S ∈ C^1([a, b], R) : ‖S‖_∞ ≤ 1 − ε and ‖Ṡ‖_∞ ≤ L },   (2.4)


where C^1([a, b], R) is the Banach space of continuously differentiable [a, b] → R functions and ‖S‖_∞ = sup_{a≤x≤b} |S(x)|.
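As a quick numerical sanity check (not part of the paper), one can verify the moment condition in (2.1) for the Gaussian density p_0: the absolute moments of a standard normal satisfy E|ξ_1|^{2l} = (2l − 1)!!, so the supremum in (2.1) equals 1 with ς = 1. A sketch in Python with our own helper names, using a simple trapezoidal integration:

```python
import math

def double_factorial_odd(l):
    """(2l - 1)!! = 1 * 3 * ... * (2l - 1)."""
    r = 1
    for t in range(1, 2 * l, 2):
        r *= t
    return r

def gauss_abs_moment(l, grid=200000, span=12.0):
    """Trapezoidal approximation of E|X|^{2l} for X ~ N(0, 1)."""
    h = 2 * span / grid
    total = 0.0
    for i in range(grid + 1):
        x = -span + i * h
        w = 0.5 if i in (0, grid) else 1.0
        total += w * abs(x) ** (2 * l) * math.exp(-x * x / 2)
    return total * h / math.sqrt(2 * math.pi)

ratio = gauss_abs_moment(3) / double_factorial_odd(3)  # close to 1 for the Gaussian
```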

We will use as a basic procedure the pointwise procedure from [3] at the points (z_l)_{1≤l≤d} defined as

    z_l = a + (l/(d + 1)) (b − a),   (2.5)

where d is an integer-valued function of n, i.e. d = d_n, such that

    lim_{n→∞} d_n/√n = 1.   (2.6)

So we propose to use the first ι_l observations for the auxiliary estimation of S(z_l). We set

    Ŝ_l = (1/A_{ι_l}) Σ_{j=1}^{ι_l} Q_{l,j} y_{j−1} y_j,   A_{ι_l} = Σ_{j=1}^{ι_l} Q_{l,j} y_{j−1}²,   (2.7)

where Q_{l,j} = Q(u_{l,j}) and the kernel Q(·) is the indicator function of the interval [−1, 1], i.e. Q(u) = 1_{[−1,1]}(u). The points (u_{l,j}) are defined as

    u_{l,j} = (x_j − z_l)/h,   (2.8)

where the bandwidth h will be chosen later such that a < z_1 − h < z_d + h < b, i.e. h < (b − a)/d.

Note that to estimate S(z_l) on the basis of the kernel estimate with the kernel Q we use only the observations (y_j)_{k_{1,l} ≤ j ≤ k_{2,l}} from the h-neighborhood of the point z_l, i.e.

    k_{1,l} = ⌊n z̃_l − n h̃⌋ and k_{2,l} = [n z̃_l + n h̃],   (2.9)

where ⌊x⌋ is the smallest integer greater than x ∈ R, [x] is the integer part of x, z̃_l = (z_l − a)/(b − a) and h̃ = h/(b − a).

Moreover, we assume here that

    n > h̃^{−1}.   (2.10)

Now we choose ι_l in (2.7) as

    ι_l = k_{1,l} + q and q = q_n = [(n h̃)^{μ_0}]   (2.11)


for some 0 < μ_0 < 1. In the sequel, for any 0 ≤ k < m ≤ n, we set

    A_{k,m} = Σ_{j=k+1}^{m} Q_{l,j} y_{j−1}² and A_m = A_{0,m}.   (2.12)

Next, similarly to [1], we use a kernel sequential procedure based on the observations (y_j)_{ι_l ≤ j ≤ n}. To transform the kernel estimator into a linear function of the observations, we replace the number of observations n by the following stopping time

    τ_l = inf{ι_l + 1 ≤ k ≤ k_{2,l} : A_{ι_l,k} ≥ H_l},   (2.13)

where inf{∅} = k_{2,l} and the positive threshold H_l will be chosen as a positive random variable measurable with respect to the σ-field σ{y_1, ..., y_{ι_l}}.

Now we define the sequential estimator as

    S_l = (1/H_l) ( Σ_{j=ι_l+1}^{τ_l−1} Q_{l,j} y_{j−1} y_j + κ_l Q(u_{l,τ_l}) y_{τ_l−1} y_{τ_l} ) 1_{Γ_l},   (2.14)

where Γ_l = {A_{ι_l,k_{2,l}−1} ≥ H_l} and the correcting coefficient 0 < κ_l ≤ 1 on this set is defined by

    A_{ι_l,τ_l−1} + κ_l² Q(u_{l,τ_l}) y_{τ_l−1}² = H_l.   (2.15)

We set here A_{ι_l,τ_l−1} = 0 if τ_l = ι_l + 1. Note that, to obtain an efficient kernel estimate of S(z_l), we need to use all of the k_{2,l} − ι_l observations. Similarly to [21], one can show that

    τ_l ≈ (1 − S²(z_l)) H_l as H_l → ∞.   (2.16)

Therefore, it is natural to choose H_l as (k_{2,l} − ι_l)/(1 − S²(z_l)).

Since the value S(z_l) is unknown, we define the threshold H_l as

    H_l = ((1 − ε̃)/(1 − S̃_l²)) (k_{2,l} − ι_l) and ε̃ = 1/(2 + ln n),   (2.17)

where S̃_l is the projection of the estimator Ŝ_l onto ]−1 + ε̃, 1 − ε̃[, i.e.

    S̃_l = min(max(Ŝ_l, −1 + ε̃), 1 − ε̃).   (2.18)

To obtain uncorrelated stochastic terms in the kernel estimators for S(z_l), we choose the bandwidth h as

    h = (b − a)/(2d).   (2.19)
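The stopping rule (2.13) with the correction (2.15) can be sketched as follows. This toy Python implementation is our own illustration (not the authors' code): it uses the indicator kernel, a constant true coefficient, and a fixed threshold H in place of the data-driven choice (2.17).

```python
import random

def sequential_estimate(y, idx, H):
    """Sequential kernel estimator (2.14) at one design point.
    y: observations; idx: indices j with Q_{l,j} = 1 (the h-neighborhood);
    H: threshold. Returns (estimate, tau, kappa); estimate is None when the
    accumulated energy never reaches H (the event Gamma_l fails)."""
    A, num = 0.0, 0.0
    for j in idx:
        inc = y[j - 1] ** 2
        if A + inc >= H:                      # stopping time tau_l, eq. (2.13)
            kappa = ((H - A) / inc) ** 0.5    # correction coefficient, eq. (2.15)
            num += kappa * y[j - 1] * y[j]
            return num / H, j, kappa
        A += inc
        num += y[j - 1] * y[j]
    return None, idx[-1], None

# toy run on an AR(1) path with constant coefficient 0.3
rng = random.Random(1)
y = [0.0]
for _ in range(4000):
    y.append(0.3 * y[-1] + rng.gauss(0.0, 1.0))
est, tau, kappa = sequential_estimate(y, list(range(1, 4001)), H=2000.0)
```

By construction the weighted sum of y_{j−1}² equals H at the stopping time, which is what makes the conditional variance of the stochastic term equal to 1/H_l, cf. (2.30) below.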

As to the estimator Ŝ_l, we can show the following property.

Proposition 2.1. The convergence rate in probability of the estimator (2.18) is more rapid than any power function, i.e. for any b > 0

    lim_{n→∞} n^b max_{1≤l≤d} sup_{S∈Θ_{ε,L}} sup_{p∈P} P_{p,S}( |S̃_l − S(z_l)| > ε_0 ) = 0,   (2.20)

where ε_0 = ε_0(n) → 0 as n → ∞ is such that lim_{n→∞} n^δ̌ ε_0 = ∞ for any δ̌ > 0.

Now we set

    Y_l = S_l 1_Γ and Γ = ∩_{l=1}^{d} Γ_l.   (2.21)

Using the convergence (2.20), we study the probability properties of the set Γ in the following proposition.

Proposition 2.2. For any b > 0 the probability of the set Γ satisfies the following asymptotic equality

    lim_{n→∞} n^b sup_{S∈Θ_{ε,L}} P_{p,S}(Γ^c) = 0.   (2.22)

In view of this proposition we can neglect the set Γ^c. So, using the estimators (2.21) on the set Γ, we obtain the discrete time regression model

    Y_l = S(z_l) + ζ_l and ζ_l = ξ_l + ϖ_l,   (2.23)

in which

    ξ_l = ( Σ_{j=ι_l+1}^{τ_l−1} Q_{l,j} y_{j−1} ξ_j + κ_l Q(u_{l,τ_l}) y_{τ_l−1} ξ_{τ_l} ) / H_l   (2.24)

and ϖ_l = ϖ_{1,l} + ϖ_{2,l}, where

    ϖ_{1,l} = ( Σ_{j=ι_l+1}^{τ_l−1} Q_{l,j} y_{j−1}² Δ_{l,j} + κ_l² Q(u_{l,τ_l}) y_{τ_l−1}² Δ_{l,τ_l} ) / H_l,   Δ_{l,j} = S(x_j) − S(z_l),

and

    ϖ_{2,l} = (κ_l − κ_l²) Q(u_{l,τ_l}) y_{τ_l−1}² S(x_{τ_l}) / H_l.

Note that in the model (2.23) the random variables (ξ_l)_{1≤l≤d} are defined only on the set Γ. For technical reasons we need definitions of these variables on the set Γ^c as well. To this end, for any j ≥ 1 we set

    Q̌_{l,j} = Q_{l,j} y_{j−1} 1_{{j<k_{2,l}}} + √(H_l) Q_{l,j} 1_{{j=k_{2,l}}}   (2.25)

and Ǎ_{ι_l,m} = Σ_{j=ι_l+1}^{m} Q̌_{l,j}². Note that for any j ≥ 1 and l ≠ m

    Q̌_{l,j} Q̌_{m,j} = 0,   (2.26)


and Ǎ_{ι_l,k_{2,l}} ≥ H_l. So we can now modify the stopping time (2.13) as

    τ̌_l = inf{k ≥ ι_l + 1 : Ǎ_{ι_l,k} ≥ H_l}.   (2.27)

Obviously, τ̌_l ≤ k_{2,l} and τ̌_l = τ_l on the set Γ for any 1 ≤ l ≤ d. Now, similarly to (2.15), we define the correction coefficient κ̌_l through

    Ǎ_{ι_l,τ̌_l−1} + κ̌_l² Q̌_{l,τ̌_l}² = H_l.   (2.28)

It is clear that 0 < κ̌_l ≤ 1 and κ̌_l = κ_l on the set Γ for 1 ≤ l ≤ d. Using this coefficient we set

    η_l = ( Σ_{j=ι_l+1}^{τ̌_l−1} Q̌_{l,j} ξ_j + κ̌_l Q̌_{l,τ̌_l} ξ_{τ̌_l} ) / H_l.   (2.29)

Note that on the set Γ, for any 1 ≤ l ≤ d, the random variables η_l = ξ_l. Moreover (see Lemma A.2 in [4]), for any 1 ≤ l ≤ d and p ∈ P,

    E_{p,S}(η_l | G_l) = 0, E_{p,S}(η_l² | G_l) = σ_l² and E_{p,S}(η_l⁴ | G_l) ≤ m̌ σ_l⁴,   (2.30)

where σ_l = H_l^{−1/2}, G_l = σ{η_1, ..., η_{l−1}, σ_l} and m̌ = 4(144)⁴ m_4. Note that

    σ_{0,*} ≤ min_{1≤l≤d} σ_l² ≤ max_{1≤l≤d} σ_l² ≤ σ_{1,*},   (2.31)

where

    σ_{0,*} = (1 − (1 − ε̃)²)/(2(1 − ε̃) n h) and σ_{1,*} = 1/((1 − ε̃)(2nh − q − 3)).

Now, taking into account that |ϖ_{1,l}| ≤ Lh for any S ∈ Θ_{ε,L}, we obtain that

    sup_{S∈Θ_{ε,L}} E_{p,S} 1_Γ ϖ_l² ≤ L² h² + υ̌_n/(nh)²,   (2.32)

where υ̌_n = sup_{p∈P} sup_{S∈Θ_{ε,L}} E_{p,S} max_{1≤j≤n} y_j⁴. As shown in [4] (Theorem 3.3), for any b > 0

    lim_{n→∞} n^{−b} υ̌_n = 0.   (2.33)

Remark 2.1. It should be noted that the property (2.33) means that the second term of the upper bound (2.32) behaves, up to a factor growing slower than any power of n, as (nh)^{−2} when n → ∞.

Remark 2.2. Note that to estimate the function S in (1.2) we use the approach developed in [16] for diffusion processes. To this end we use the efficient sequential kernel procedures developed in [1, 2, 3]. It should be emphasized that to obtain an efficient estimator, i.e. an estimator with the minimal asymptotic risk, one needs to take the indicator kernel only, as in (2.14).


Remark 2.3. It should also be noted that the sequential estimator (2.14) has the same form as in [3], except for the last term, in which the correction coefficient is replaced by the square root of the coefficient used in [20]. We modify this procedure in order to be able to calculate the variance of the stochastic term (2.24).

3 Model selection

Now to estimate the function S we use the sequential model selection procedure from [4] for the regression (2.23). To this end, first we choose the trigonometric basis (φ_j)_{j≥1} in L_2[a, b], i.e.

    φ_1 = 1/√(b − a), φ_j(x) = √(2/(b − a)) Tr_j(2π[j/2] l_0(x)), j ≥ 2,   (3.1)

where Tr_j(x) = cos(x) for even j, Tr_j(x) = sin(x) for odd j, and l_0(x) = (x − a)/(b − a). Moreover, we choose an odd number d of regression points (2.5), for example d = 2[√n/2] + 1. Then the functions (φ_j)_{1≤j≤d} are orthonormal for the empirical inner product, i.e.

    (φ_i, φ_j)_d = ((b − a)/d) Σ_{l=1}^{d} φ_i(z_l) φ_j(z_l) = 1_{{i=j}}.   (3.2)

It is clear that the function S can be represented as

    S(z_l) = Σ_{j=1}^{d} θ_{j,d} φ_j(z_l) and θ_{j,d} = (S, φ_j)_d.   (3.3)
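The basis (3.1) and the empirical inner product (3.2) can be checked numerically. The sketch below (an illustration with our own helper names) verifies that the Gram matrix of the first few basis functions on the grid (2.5) is the identity up to a small discretization term:

```python
import math

def phi(j, x, a=0.0, b=1.0):
    """Trigonometric basis (3.1): phi_1 constant, then cos/sin pairs."""
    if j == 1:
        return 1.0 / math.sqrt(b - a)
    arg = 2.0 * math.pi * (j // 2) * (x - a) / (b - a)
    trig = math.cos(arg) if j % 2 == 0 else math.sin(arg)
    return math.sqrt(2.0 / (b - a)) * trig

a, b, d = 0.0, 1.0, 101                                   # odd number of points
z = [a + l * (b - a) / (d + 1) for l in range(1, d + 1)]  # points (2.5)

def inner(i, j):
    """Empirical inner product (3.2) on the grid z."""
    return (b - a) / d * sum(phi(i, t, a, b) * phi(j, t, a, b) for t in z)
```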

We define the estimators of the coefficients (θ_{j,d})_{1≤j≤d} as

    θ̂_{j,d} = ((b − a)/d) Σ_{l=1}^{d} Y_l φ_j(z_l).   (3.4)

From (2.23) we immediately obtain the following regression on the set Γ

    θ̂_{j,d} = θ_{j,d} + ζ_{j,d} with ζ_{j,d} = √((b − a)/d) η_{j,d} + ϖ_{j,d},   (3.5)

where

    η_{j,d} = √((b − a)/d) Σ_{l=1}^{d} η_l φ_j(z_l) and ϖ_{j,d} = ((b − a)/d) Σ_{l=1}^{d} ϖ_l φ_j(z_l).


Through the Bouniakovskii-Cauchy-Schwarz inequality we get

    |ϖ_{j,d}| ≤ ‖ϖ‖_d ‖φ_j‖_d = ‖ϖ‖_d := ϖ*_n/n.   (3.6)

Moreover, from (2.32) and (2.33) it follows that for any b > 0

    lim_{n→∞} (1/n^b) sup_{p∈P} sup_{S∈Θ_{ε,L}} E_{p,S} ϖ*_n 1_Γ = 0.   (3.7)

To construct the model selection procedure we use weighted least squares estimators which, at the points (2.5), are defined as

    Ŝ_λ(z_l) = Σ_{j=1}^{d} λ(j) θ̂_{j,d} φ_j(z_l) 1_Γ, 1 ≤ l ≤ d,   (3.8)

where the weight vector λ = (λ(1), ..., λ(d))′ belongs to some finite set Λ ⊂ [0, 1]^d and the prime denotes transposition. For any a ≤ t ≤ b we set

    Ŝ_λ(t) = Σ_{l=1}^{d} Ŝ_λ(z_l) 1_{{z_{l−1}<t≤z_l}}.   (3.9)

Denote by ν the cardinality of the set Λ, on which we impose the following condition.

A_1) Assume that the number ν of weight vectors is a function of n, i.e. ν = ν_n, such that for any b > 0

    lim_{n→∞} ν_n/n^b = 0.   (3.10)

To choose a weight vector λ ∈ Λ in (3.8) we will use the following risk:

    Err_d(λ) = ‖Ŝ_λ − S‖_d² = ((b − a)/d) Σ_{l=1}^{d} (Ŝ_λ(z_l) − S(z_l))².   (3.11)

Using (3.3) and (3.8) it can be represented as

    Err_d(λ) = Σ_{j=1}^{d} λ²(j) θ̂_{j,d}² − 2 Σ_{j=1}^{d} λ(j) θ̂_{j,d} θ_{j,d} + Σ_{j=1}^{d} θ_{j,d}².   (3.12)

Since the coefficients θ_{j,d} are unknown, we cannot minimize this risk directly to obtain an optimal weight vector. To modify it we set

    θ̃_{j,d} = θ̂_{j,d}² − ((b − a)/d) s_{j,d} with s_{j,d} = ((b − a)/d) Σ_{l=1}^{d} σ_l² φ_j²(z_l).   (3.13)

Note here that, in view of (2.31)-(3.2), the last term can be estimated as

    σ_{0,*} ≤ s_{j,d} ≤ σ_{1,*}.   (3.14)

Now we modify the risk (3.12) as

    J_d(λ) = Σ_{j=1}^{d} λ²(j) θ̂_{j,d}² − 2 Σ_{j=1}^{d} λ(j) θ̃_{j,d} + δ P_d(λ),   (3.15)

where the coefficient 0 < δ < 1 will be chosen later and the penalty term is

    P_d(λ) = ((b − a)/d) Σ_{j=1}^{d} λ²(j) s_{j,d}.   (3.16)

Now, using (3.15), we define the sequential model selection procedure as

    λ̂ = argmin_{λ∈Λ} J_d(λ) and Ŝ* = Ŝ_λ̂.   (3.17)

To study the efficiency property we specify the weight coefficients (λ(j))_{1≤j≤d}

as proposed, for example, in [15]. First, for some 0 < ε < 1, we introduce a two dimensional grid to adapt to the unknown parameters (regularity and size) of the Sobolev ball, i.e. we set

    A = {1, ..., k*} × {ε, ..., mε},   (3.18)

where m = [1/ε²]. We assume that both parameters k* ≥ 1 and ε are functions of n, i.e. k* = k*(n) and ε = ε(n), such that

    lim_{n→∞} k*(n) = +∞, lim_{n→∞} k*(n)/ln n = 0,
    lim_{n→∞} ε(n) = 0 and lim_{n→∞} n^b ε(n) = +∞   (3.19)

for any b > 0. One can take, for example, for n ≥ 2,

    ε(n) = 1/ln n and k*(n) = k_0 + [√(ln n)],   (3.20)


where k_0 ≥ 0 is some fixed integer. For each α = (β, t) ∈ A we introduce the weight sequence λ_α = (λ_α(j))_{1≤j≤d} with the elements

    λ_α(j) = 1_{{1≤j<j*}} + (1 − (j/ω_α)^β) 1_{{j*≤j≤ω_α}},   (3.21)

where j* = 1 + [ln n] and

    ω_α = ( ((β + 1)(2β + 1)/(π^{2β} β)) t n )^{1/(2β+1)}.

Now we define the set Λ as

    Λ = {λ_α, α ∈ A}.   (3.22)
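For illustration, the weights (3.21) can be generated as follows; this is a sketch with our own helper names, and the constant inside ω_α is our reading of the Pinsker-type normalization used in [15, 23, 24]:

```python
import math

def pinsker_weights(n, beta, t, d):
    """Weight sequence (3.21) for alpha = (beta, t)."""
    j_star = 1 + int(math.log(n))
    omega = ((beta + 1) * (2 * beta + 1) / (math.pi ** (2 * beta) * beta)
             * t * n) ** (1.0 / (2 * beta + 1))
    lam = []
    for j in range(1, d + 1):
        if j < j_star:
            lam.append(1.0)                       # keep low frequencies untouched
        elif j <= omega:
            lam.append(1.0 - (j / omega) ** beta)  # Pinsker-type taper
        else:
            lam.append(0.0)                       # truncate high frequencies
    return lam

lam = pinsker_weights(n=1_000_000, beta=2, t=100.0, d=40)
```

The resulting vector is nonincreasing in j and lies in [0, 1]^d, as required for the set Λ.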

Note that these weight coefficients were used in [23, 24] for continuous time regression models to show asymptotic efficiency. It should be noted that in this case the cardinality of the set Λ is ν = k*m. It is clear that the properties (3.19) imply the condition (3.10). In [4] we showed the following result.
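The minimization in (3.15)-(3.17) is a finite search over Λ. A compact Python sketch (our own illustration; σ_l² enters through the quantities s_{j,d} of (3.13)):

```python
def model_selection(theta_hat, s, weights, delta, width):
    """Choose the weight vector minimizing the penalized criterion (3.15).
    theta_hat: estimated coefficients (3.4); s: variance proxies s_{j,d};
    weights: candidate weight vectors; width = (b - a) / d."""
    d = len(theta_hat)
    theta_tilde = [theta_hat[j] ** 2 - width * s[j] for j in range(d)]  # (3.13)
    def J(lam):
        main = sum(lam[j] ** 2 * theta_hat[j] ** 2 - 2.0 * lam[j] * theta_tilde[j]
                   for j in range(d))
        penalty = width * sum(lam[j] ** 2 * s[j] for j in range(d))     # (3.16)
        return main + delta * penalty
    return min(weights, key=J)                                          # (3.17)

# toy example: one strong coefficient, two null ones
best = model_selection([1.0, 0.0, 0.0], [0.01] * 3,
                       [(1, 1, 1), (1, 0, 0)], delta=0.05, width=1.0 / 3)
```

In the toy example, keeping the two null coefficients only adds noise terms to the criterion, so the sparser weight vector wins.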

Theorem 3.1. Assume that the conditions (2.2) and (3.10) hold. Then for any n ≥ 3, any S ∈ Θ_{ε,L} and any 0 < δ ≤ 1/12, the procedure (3.17) with the coefficients (3.22) satisfies the following oracle inequality:

    R*(Ŝ*, S) ≤ ((1 + 4δ)(1 + δ)²/(1 − 6δ)) min_{λ∈Λ} R*(Ŝ_λ, S) + B_n/(δn),   (3.23)

where the term B_n is such that for any b > 0

    lim_{n→∞} B_n/n^b = 0.

Remark 3.1. In this paper we will use the inequality (3.23) to study the efficiency properties of the model selection procedure (3.17) with the weight coefficients (3.22) in the adaptive setting, i.e. in the case when the regularity of the function S in (1.2) is unknown.

4 Main results

First, to study the minimax properties of the estimation problem for the model (1.2), we need to introduce some functional class. To this end, for any fixed r > 0 and k ≥ 2 we set

    W_{k,r} = { f ∈ Θ_{ε,L} : Σ_{j=1}^{+∞} a_j θ_j² ≤ r },   (4.1)

where a_j = Σ_{l=0}^{k} (2π[j/2])^{2l}, (θ_j)_{j≥1} are the trigonometric Fourier coefficients, i.e.

    θ_j = ∫_a^b f(x) φ_j(x) dx,

and (φ_j)_{j≥1} is the trigonometric basis defined in (3.1). It is clear that we can represent this functional class as the Sobolev ball in Θ_{ε,L}, i.e.

    W_{k,r} = { f ∈ Θ_{ε,L} : Σ_{j=0}^{k} ‖f^{(j)}‖² ≤ r }.   (4.2)

Now, for this set, we define the normalizing coefficients

    l*(r) = ((1 + 2k)r)^{1/(2k+1)} (k/(π(k + 1)))^{2k/(2k+1)}   (4.3)

and

    ς* = ς*(S) = ∫_a^b (1 − S²(u)) du.   (4.4)
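For concreteness, the constants (4.3)-(4.4) are easy to evaluate numerically; the midpoint Riemann-sum integrator below is our own helper, not part of the paper:

```python
import math

def pinsker_constant(k, r):
    """l*(r) from (4.3)."""
    return (((1 + 2 * k) * r) ** (1.0 / (2 * k + 1))
            * (k / (math.pi * (k + 1))) ** (2.0 * k / (2 * k + 1)))

def noise_intensity(S, a=0.0, b=1.0, m=10000):
    """ς*(S) from (4.4) via a midpoint Riemann sum."""
    h = (b - a) / m
    return sum((1.0 - S(a + (i + 0.5) * h) ** 2) * h for i in range(m))
```

For S ≡ 0 on [0, 1], ς* = b − a = 1, recovering the standard "signal + white noise" normalization mentioned in Remark 4.1.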

It is well known that for nonparametric adaptive estimation problems in regression models with functions S ∈ W_{k,r} the minimax convergence rate is n^{−2k/(2k+1)} (see, for example, [15, 22] and the references therein). Our goal in this paper is to show the same property for the nonparametric autoregressive model (1.2). To this end, first we obtain the lower bound for the normalized risk (1.4) over the class Ξ_n of all estimators, i.e. all measurable functions with respect to the observations σ{y_1, ..., y_n}.

Theorem 4.1. For the model (1.2) with a noise distribution from the class P defined in (2.1),

    lim inf_{n→∞} inf_{Ŝ_n∈Ξ_n} n^{2k/(2k+1)} sup_{S∈W_{k,r}} υ(S) R*(Ŝ_n, S) ≥ l*(r),   (4.5)

where υ(S) = (ς*)^{−2k/(2k+1)}.


Now, to study the procedure (3.17), we have to add a condition on the penalty coefficient δ which provides a sufficiently small penalty term in (3.15).

A_2) Assume that the parameter δ is a function of n, i.e. δ = δ_n, such that

    lim_{n→∞} δ_n = 0 and lim_{n→∞} n^b δ_n = +∞   (4.6)

for any b > 0.

Theorem 4.2. Assume that the conditions A_1)-A_2) hold. Then the model selection procedure Ŝ* defined in (3.17) with a penalty coefficient satisfying A_2) admits the following asymptotic upper bound:

    lim sup_{n→∞} n^{2k/(2k+1)} sup_{S∈W_{k,r}} υ(S) R*(Ŝ*, S) ≤ l*(r).   (4.7)

Corollary 4.3. Assume that the conditions A_1)-A_2) hold. Then the model selection procedure Ŝ* defined in (3.17) with a penalty coefficient satisfying A_2) is efficient, i.e.

    lim_{n→∞} [ inf_{Ŝ_n∈Ξ_n} sup_{S∈W_{k,r}} υ(S) R*(Ŝ_n, S) ] / [ sup_{S∈W_{k,r}} υ(S) R*(Ŝ*, S) ] = 1.   (4.8)

Moreover,

    lim_{n→∞} n^{2k/(2k+1)} sup_{S∈W_{k,r}} υ(S) R*(Ŝ*, S) = l*(r).   (4.9)

Remark 4.1. Note that the limit equalities (4.8) and (4.9) imply that the function l*(r)/υ(S) is the minimal value of the normalized asymptotic quadratic robust risk, i.e. the Pinsker constant in this case. We recall that the coefficient l*(r) is the well-known Pinsker constant for the "signal + standard white noise" model obtained in [28]. Therefore, the Pinsker constant for the model (1.2) is represented by the Pinsker constant for the "signal + white noise" model in which the noise intensity is given by the function (4.4).

5 The van Trees inequality

In this section we consider the nonparametric autoregressive model (1.2) with (0,1) Gaussian i.i.d. random variables (ξ_j)_{1≤j≤n} and a parametric linear function S, i.e.

    S_θ(x) = Σ_{l=1}^{d} θ_l ψ_l(x), θ = (θ_1, ..., θ_d)′ ∈ R^d.   (5.1)


We assume that the functions (ψ_i)_{1≤i≤d} are orthogonal with respect to the scalar product (3.2).

Let now P_θ^n be the distribution in R^n of the observations y = (y_1, ..., y_n) in the model (1.2) with the function (5.1), and let ν_ξ^n be the distribution in R^n of the Gaussian vector (ξ_1, ..., ξ_n). In this case the Radon-Nikodym density is given by

    f_n(y, θ) = dP_θ^n/dν_ξ^n = exp{ Σ_{l=1}^{n} S_θ(x_l) y_{l−1} y_l − (1/2) Σ_{l=1}^{n} S_θ²(x_l) y_{l−1}² }.   (5.2)

Let u be a prior distribution density on R^d for the parameter θ of the following form:

    u(θ) = Π_{j=1}^{d} u_j(θ_j),

where u_j is some continuously differentiable probability density on R with support ]−N_j, N_j[, i.e. u_j(z) > 0 for any −N_j < z < N_j and u_j(z) = 0 for all |z| ≥ N_j, such that the Fisher information is finite, i.e.

    I_j = ∫_{−N_j}^{N_j} (u̇_j²(z)/u_j(z)) dz < ∞.   (5.3)

Now we set

    Y = ]−N_1, N_1[ × ... × ]−N_d, N_d[ ⊆ R^d.   (5.4)

Let g(θ) be a continuously differentiable Y → R function such that, for each 1 ≤ j ≤ d,

    lim_{|θ_j|→N_j} g(θ) u_j(θ_j) = 0 and ∫_{R^d} |g′_j(θ)| u(θ) dθ < ∞,   (5.5)

where g′_j(θ) = ∂g(θ)/∂θ_j.

For any B(R^n) × B(R^d)-measurable integrable function H = H(y, θ) we denote

    Ẽ H = ∫_Y ∫_{R^n} H(y, θ) dP_θ^n u(θ) dθ = ∫_Y ∫_{R^n} H(y, θ) f_n(y, θ) u(θ) dν_ξ^n dθ.   (5.6)

Let F_n^y be the σ-field generated by the observations (1.2), i.e. F_n^y = σ{y_1, ..., y_n}. Now we study the Gaussian model (1.2) with the function (5.1).


Lemma 5.1. For any F_n^y-measurable square integrable R^n → R function ĝ_n and for any 1 ≤ j ≤ d, the mean square accuracy of the estimation of g(·) with respect to the distribution (5.6) can be bounded from below as

    Ẽ(ĝ_n − g(θ))² ≥ ḡ_j² / (Ẽ Ψ_{n,j} + I_j),   (5.7)

where Ψ_{n,j} = Σ_{l=1}^{n} ψ_j²(x_l) y_{l−1}² and ḡ_j = ∫_Y g′_j(θ) u(θ) dθ.

Proof. First, for any θ ∈ Y we set

    Ũ_j = Ũ_j(y, θ) = (1/(f_n(y, θ) u(θ))) ∂(f_n(y, θ) u(θ))/∂θ_j.

Taking into account the condition (5.5) and integrating by parts, we get

    Ẽ((ĝ_n − g(θ)) Ũ_j) = ∫_{R^n×Y} (ĝ_n(y) − g(θ)) (∂/∂θ_j)(f_n(y, θ) u(θ)) dθ dν_ξ^n
                        = ∫_{R^n×Y_j} ( ∫_{−N_j}^{N_j} g′_j(θ) f_n(y, θ) u(θ) dθ_j ) ( Π_{i≠j} dθ_i ) dν_ξ^n = ḡ_j,

where Y_j = Π_{i≠j} ]−N_i, N_i[. Now by the Bouniakovskii-Cauchy-Schwarz inequality we obtain the following lower bound for the quadratic risk:

    Ẽ(ĝ_n − g(θ))² ≥ ḡ_j² / Ẽ Ũ_j².

To study the denominator on the right-hand side of this inequality, note that in view of the representation (5.2)

    (1/f_n(y, θ)) ∂f_n(y, θ)/∂θ_j = Σ_{l=1}^{n} ψ_j(x_l) y_{l−1} (y_l − S_θ(x_l) y_{l−1}).

Therefore, for each θ ∈ R^d,

    E_θ^n [ (1/f_n(y, θ)) ∂f_n(y, θ)/∂θ_j ] = 0

and

    E_θ^n [ ( (1/f_n(y, θ)) ∂f_n(y, θ)/∂θ_j )² ] = E_θ^n Σ_{l=1}^{n} ψ_j²(x_l) y_{l−1}² = E_θ^n Ψ_{n,j}.

Using the equality

    Ũ_j = (1/f_n(y, θ)) ∂f_n(y, θ)/∂θ_j + (1/u(θ)) ∂u(θ)/∂θ_j,

we get

    Ẽ Ũ_j² = Ẽ Ψ_{n,j} + I_j,

where the Fisher information I_j is defined in (5.3). Hence Lemma 5.1.

Remark 5.1. It should be noted that in the definition of the prior distribution the bound N_j may be equal to infinity, either for some 1 ≤ j ≤ d or for all 1 ≤ j ≤ d.
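To illustrate the inequality (5.7) in the simplest case d = 1 with ψ_1 ≡ 1 (so that S_θ(x) = θ and ḡ_1 = 1 for g(θ) = θ), the following Monte Carlo sketch compares the Bayes mean-square error of the least squares estimator with the bound 1/(Ẽ Ψ_{n,1} + I_1). The cosine-squared prior (whose Fisher information is π²/N²) and all helper names are our own illustrative choices:

```python
import math
import random

def sample_prior(rng, N):
    """Rejection sampling from u(z) = cos^2(pi z / (2 N)) / N on (-N, N);
    this prior has finite Fisher information I = pi^2 / N^2."""
    while True:
        z = rng.uniform(-N, N)
        if rng.random() < math.cos(math.pi * z / (2 * N)) ** 2:
            return z

def van_trees_demo(n=200, N=0.5, reps=500, seed=3):
    rng = random.Random(seed)
    fisher_prior = math.pi ** 2 / N ** 2
    total_psi, total_sq_err = 0.0, 0.0
    for _ in range(reps):
        theta = sample_prior(rng, N)
        y, num, den = 0.0, 0.0, 0.0
        for _ in range(n):
            y_prev = y
            y = theta * y_prev + rng.gauss(0.0, 1.0)
            num += y_prev * y
            den += y_prev ** 2            # this is Psi_{n,1} for psi_1 = 1
        total_psi += den
        total_sq_err += (num / den - theta) ** 2   # least squares error
    bound = 1.0 / (total_psi / reps + fisher_prior)  # van Trees-type bound (5.7)
    return bound, total_sq_err / reps

bound, mse = van_trees_demo()
```

With these settings the empirical Bayes risk of the least squares estimator stays above the bound, as Lemma 5.1 predicts.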

6 Lower bound

First, taking into account that the (0,1) Gaussian density p_0 belongs to the class (2.1), we obtain that

    R*(Ŝ_n, S) ≥ R_{p_0}(Ŝ_n, S).   (6.1)

Now, using the definition (4.3), we set

    d = d_n = [ ( ((k + 1)/k) (n/(b − a)) )^{1/(2k+1)} l*(r_ρ) ],   (6.2)

ρ

= ρr, 0 < ρ < 1 and

l

(r

ρ

) = ((1 + 2k)r

ρ

)

1/(2k+1)

k π(k + 1)

2k/(2k+1)

= ρ

1/(2k+1)

l

(r) . For any vector κ = (κ

j

)

1≤j≤d

∈ R

d

, we set

S

κ

(x) =

dn

X

j=1

κ

j

φ

j

(x) , (6.3)

(19)

where (φ

j

)

1≤j≤d

n

is the trigonometric basis defined in (3.1). To define the bayesian risk we choose a prior distribution on R

d

as

κ = (κ

j

)

1≤j≤d

n

and κ

j

= s

j

η

j

, (6.4) where η

j

are i.i.d. random variables with the continuously differentiable density ρ

n

(·) defined in Lemma A.1 with N = ln n and n > e

2

,

s

j

= s

(b − a)s

j

n and s

j

= d

n

j

k

− 1 .

In the sequel we denote by μ_κ the distribution of the vector (6.4) in R^{d_n}. It is clear that almost surely the function (6.3) can be bounded as

    max_{a≤x≤b} ( |S_κ(x)| + |Ṡ_κ(x)| ) ≤ c* (ln n/√n) Σ_{j=1}^{d_n} j (d_n/j)^{k/2} := δ_n,   (6.5)

where

    c* = √(2/(b − a)) (b − a + π)/(b − a).

One can check directly that δ_n → 0 as n → ∞ for k ≥ 2. Therefore, for sufficiently large n, the function (6.3) belongs to the class (2.4).

Now, for any function f, we denote by h(f) its projection in L_2[a, b] onto the ball W_r = {f ∈ L_2[a, b] : ‖f‖ ≤ r}, i.e.

    h(f) = (r/max(r, ‖f‖)) f.

Since W_{k,r} ⊂ W_r, we obtain that for any function S ∈ W_{k,r}

    ‖Ŝ_n − S‖² ≥ ‖ĥ_n − S‖² with ĥ_n = h(Ŝ_n).

Therefore, for sufficiently large n,

    sup_{S∈W_{k,r}} υ(S) R_{p_0}(Ŝ_n, S) ≥ ∫_{D_n} υ(S_z) E_{p_0,S_z} ‖ĥ_n − S_z‖² μ_κ(dz)
                                       ≥ υ* ∫_{D_n} E_{p_0,S_z} ‖ĥ_n − S_z‖² μ_κ(dz),   (6.6)

where

    D_n = { z = (z_1, ..., z_{d_n})′ ∈ R^{d_n} : Σ_{j=1}^{d_n} a_j z_j² ≤ r }

and

    υ* = inf_{‖S‖_∞≤δ_n} υ(S) → 1/(b − a)^{2k/(2k+1)} as n → ∞.

Using the distribution μ_κ we introduce the following Bayes risk:

    R̃_0(Ŝ_n) = Ẽ_0 ‖Ŝ_n − S_z‖²,

where for any random variable ξ measurable with respect to σ{y_1, ..., y_n}

    Ẽ_0 ξ = ∫_{R^{d_n}} E_{p_0,S_z} ξ μ_κ(dz).   (6.7)

Now, taking into account that ‖ĥ_n‖² ≤ r, we get

    sup_{S∈W_{k,r}} υ(S) R_{p_0}(Ŝ_n, S) ≥ υ* R̃_0(ĥ_n) − 2 υ* R_{0,n}   (6.8)

with

    R_{0,n} = ∫_{D_n^c} (r + ‖S_z‖²) μ_κ(dz) = ∫_{D_n^c} ( r + Σ_{j=1}^{d_n} z_j² ) μ_κ(dz).

In Lemma A.3 we study the last term of this inequality. Now it is easy to see that for any z = (z_1, ..., z_{d_n})′ ∈ R^{d_n}

    ‖ĥ_n − S_z‖² ≥ Σ_{j=1}^{d_n} (ẑ_j − z_j)²,

where ẑ_j = ∫_a^b ĥ_n(t) φ_j(t) dt.

Now we will use Lemma 5.1 with u_j(z) = ρ_n(z/s_j)/s_j and g(θ) = θ_j for 1 ≤ j ≤ d_n. It is clear that this is a probability density with support [−s_j ln n, s_j ln n] and Fisher information

    I_j = s_j^{−2} J_n with J_n = ∫_{−ln n}^{ln n} (ρ̇_n(t))²/ρ_n(t) dt.

Therefore, Lemma 5.1 implies that for any 1 ≤ j ≤ d_n and any F_n^y-measurable random variable κ̂_j

    Ẽ_0 (κ̂_j − κ_j)² ≥ 1/( Ẽ_0 Ψ_{n,j} + s_j^{−2} J_n ), where Ψ_{n,j} = Σ_{l=1}^{n} φ_j²(x_l) y_{l−1}².

(21)

Therefore, the Bayes risk can be estimated from below as
$$\widetilde{\mathcal{R}}_{0}(\widehat{h}_{n})\ \ge\ \sum_{j=1}^{d_{n}}\frac{1}{\widetilde{\mathbf{E}}_{0}\,\Psi_{n,j}+s_{j}^{-2}J_{n}}\ =\ \frac{b-a}{n}\,\sum_{j=1}^{d_{n}}\frac{1}{\overline{\Psi}_{n,j}+(s_{j})^{-1}J_{n}}\,,$$
where
$$\overline{\Psi}_{n,j}=\frac{b-a}{n}\,\widetilde{\mathbf{E}}_{0}\,\Psi_{n,j}=\frac{b-a}{n}\sum_{l=1}^{n}\phi^{2}_{j}(x_{l})\,\widetilde{\mathbf{E}}_{0}\,y^{2}_{l-1}\,.\qquad(6.9)$$
Note here that, in view of Lemma 5.1 and Lemma A.2, the term $J_{n}\to 1$ and, uniformly over $1\le j\le d$, the term $\overline{\Psi}_{n,j}\to 1$ as $n\to\infty$. Therefore, for any $0<\rho_{1}<1$ and for sufficiently large $n\ge 1$ we obtain that
$$\max_{1\le j\le d}\overline{\Psi}_{n,j}\le 1+\rho_{1}\quad\mbox{and}\quad J_{n}\le 1+\rho_{1}\,,$$
i.e.

$$\widetilde{\mathcal{R}}_{0}(\widehat{h})\ \ge\ \frac{b-a}{(1+\rho_{1})\,n}\sum_{j=1}^{d_{n}}\frac{s_{j}}{1+s_{j}}\ =\ \frac{b-a}{(1+\rho_{1})\,n}\sum_{j=1}^{d_{n}}\Big(1-\frac{j^{k}}{d_{n}^{k}}\Big)\,.$$

Note here that
$$\lim_{d\to\infty}\frac{1}{d}\sum_{j=1}^{d}\Big(1-\frac{j^{k}}{d^{k}}\Big)=\int_{0}^{1}(1-t^{k})\,\mathrm{d}t=\frac{k}{k+1}\,.$$
Therefore, for sufficiently large $n$
$$\widetilde{\mathcal{R}}_{0}(\widehat{h})\ \ge\ \frac{(b-a)(1-\rho_{1})\,d_{n}}{(1+\rho_{1})\,n}\ \frac{k}{k+1}\,.$$
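The Riemann-sum limit invoked here is easy to confirm numerically; a minimal sketch:

```python
def riemann_gap(d, k):
    """| (1/d) * sum_{j=1}^{d} (1 - (j/d)^k)  -  k/(k+1) |"""
    s = sum(1.0 - (j / d) ** k for j in range(1, d + 1)) / d
    return abs(s - k / (k + 1.0))

# the gap shrinks as d grows, for several smoothness orders k
for k in (1, 2, 5):
    assert riemann_gap(10_000, k) < riemann_gap(100, k) < riemann_gap(10, k)
    assert riemann_gap(10_000, k) < 1e-3
print("Riemann-sum limit k/(k+1) confirmed")
```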

Therefore, using this bound in (6.8) together with the definition (6.2), Lemma A.3 and the inequality (6.1), we obtain that for any $0<\rho,\rho_{1}<1$
$$\liminf_{n\to\infty}\ \inf_{\widehat{S}_{n}\in\Xi_{n}}\ n^{2k/(2k+1)}\ \sup_{S\in W_{k,r}}\upsilon(S)\,\mathcal{R}(\widehat{S}_{n},S)\ \ge\ \rho^{1/(2k+1)}\,\frac{1-\rho_{1}}{1+\rho_{1}}\ l_{*}(r)\,.$$
Taking here the limit as $\rho\to 1$ and $\rho_{1}\to 0$, we arrive at Theorem 4.1.

7 Upper bound

7.1 Known regularity

We start with the estimation problem for functions $S$ from $W_{k,r}$ with known parameters $k$, $r$ and $\varsigma^{*}$ defined in (4.4). In this case we use the estimator from the family (3.22):
$$\widetilde{S}=\widehat{S}_{\widetilde{\alpha}}\quad\mbox{and}\quad \widetilde{\alpha}=(k,\widetilde{t}_{n})\,,\qquad(7.1)$$
where $\widetilde{t}_{n}=[r(S)/\varepsilon]\,\varepsilon$, $r(S)=r/\varsigma^{*}$ and $\varepsilon=1/\ln n$. Note that for sufficiently large $n$ the parameter $\widetilde{\alpha}$ belongs to the set (3.18). In this section we obtain an upper bound for the empirical risk (3.11).
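The discretization $\widetilde{t}_{n}=[r(S)/\varepsilon]\,\varepsilon$ snaps $r/\varsigma^{*}$ down to the grid of step $\varepsilon=1/\ln n$, so the rounding error is below one grid step. A small sketch with hypothetical values of $r$ and $\varsigma^{*}$:

```python
import math

def t_tilde(r, sigma, n):
    """Snap r(S) = r / sigma down to a multiple of eps = 1 / ln(n)."""
    eps = 1.0 / math.log(n)
    r_s = r / sigma
    return math.floor(r_s / eps) * eps, r_s, eps

for n in (100, 10_000, 1_000_000):
    t, r_s, eps = t_tilde(r=3.0, sigma=1.7, n=n)
    assert 0.0 <= r_s - t < eps      # grid error stays below one step 1/ln(n)
print("discretization error below 1/ln n")
```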

Theorem 7.1. The estimator $\widetilde{S}$ constructed on the trigonometric basis satisfies the following asymptotic upper bound:
$$\limsup_{n\to\infty}\ n^{2k/(2k+1)}\ \sup_{S\in W_{k,r}}\upsilon(S)\,\mathbf{E}_{p,S}\,\|\widetilde{S}-S\|^{2}_{d}\,\mathbf{1}_{\Gamma}\ \le\ l_{*}(r)\,.\qquad(7.2)$$
Proof. We denote $\widetilde{\lambda}=\lambda_{\widetilde{\alpha}}$ and $\widetilde{\omega}=\omega_{\widetilde{\alpha}}$. Now we recall that on the set $\Gamma$ the Fourier coefficients can be written as
$$\widehat{\theta}_{j,d}=\theta_{j,d}+\zeta_{j,d}\quad\mbox{with}\quad \zeta_{j,d}=\sqrt{\frac{b-a}{d}}\,\eta_{j,d}+\varpi_{j,d}\,.$$
Therefore, on the set $\Gamma$ we can represent the empirical squared error as

$$\|\widetilde{S}-S\|^{2}_{d}=\sum_{j=1}^{d}(1-\widetilde{\lambda}(j))^{2}\theta^{2}_{j,d}-2M_{n}-2\sum_{j=1}^{d}(1-\widetilde{\lambda}(j))\,\widetilde{\lambda}(j)\,\theta_{j,d}\,\varpi_{j,d}+\sum_{j=1}^{d}\widetilde{\lambda}^{2}(j)\,\zeta^{2}_{j,d}\,,$$
where
$$M_{n}=\sqrt{\frac{b-a}{d}}\,\sum_{j=1}^{d}(1-\widetilde{\lambda}(j))\,\widetilde{\lambda}(j)\,\theta_{j,d}\,\eta_{j,d}\,.$$
Now, for any $0<\check{\varepsilon}<1$,
$$2\sum_{j=1}^{d}(1-\widetilde{\lambda}(j))\,\widetilde{\lambda}(j)\,\theta_{j,d}\,\varpi_{j,d}\ \le\ \check{\varepsilon}\sum_{j=1}^{d}(1-\widetilde{\lambda}(j))^{2}\theta^{2}_{j,d}+\check{\varepsilon}^{-1}\sum_{j=1}^{d}\varpi^{2}_{j,d}\,.$$

(23)

Taking into account here the definition (3.6), we can rewrite this inequality as

2

d

X

j=1

(1 − e λ(j))e λ(j)θ

j,d

$

j,d

≤ ε ˇ

d

X

j=1

(1 − e λ(j))

2

θ

j,d2

+ $

n

ˇ εn . Therefore,

k S e − Sk

2d

≤ (1 + ˇ ε)

d

X

j=1

(1 − e λ(j))

2

θ

2j,d

− 2M

n

+ $

n

ˇ εn +

d

X

j=1

e λ

2

(j)ζ

j,d2

. By the same way we estimate the last term on the right-hand side of this inequality as

d

X

j=1

e λ

2

(j) ζ

j,d2

≤ (1 + ˇ ε)(b − a) d

d

X

j=1

λ e

2

(j) η

j,d2

+ (1 + ˇ ε

−1

) $

n

n .

Thus, on the set Γ we find that for any 0 < ε < ˇ 1

k S e

n

− Sk

2d

≤ (1 + ˇ ε)Υ

n

(S) − 2M

n

+ (1 + ˇ ε)U

n

+ 3$

n

ˇ

εn , (7.3) where

Υ

n

(S) =

d

X

j=1

(1 − e λ(j))

2

θ

2j,d

+ ς

d

2

d

X

j=1

e λ

2

(j ) (7.4) and

U

n

= 1 d

2

d

X

j=1

λ e

2

(j)

d(b − a)η

j,d2

− ς

.

We recall that the variance ς

is defined in (4.4). In view of Lemma A.8 E

p,S

M

n2

≤ σ

1,∗

(b − a)

d

d

X

j=1

θ

2j,d

= σ

1,∗

(b − a)

d kSk

2d

≤ σ

1,∗

(b − a)

2

d ,

where σ

1,∗

is given in (2.31). Moreover, using that E

p,S

M

n

= 0, we get

|E

p,S

M

n

1

Γ

| = |E

p,S

M

n

1

Γc

| ≤ (b − a) s

σ

1,∗

P

p,S

c

)

d .

Therefore, Proposition 2.2 yields
$$\lim_{n\to\infty}\ n^{2k/(2k+1)}\ \sup_{S\in W_{k,r}}\,|\mathbf{E}_{p,S}\,M_{n}\,\mathbf{1}_{\Gamma}|=0\,.\qquad(7.5)$$
Now, the property (3.7), Lemma A.7 and Lemma A.9 imply the inequality (7.2). Hence Theorem 7.1.


7.2 Unknown smoothness

Theorem 3.1 and Theorem 7.1 imply Theorem 4.2.

Acknowledgements. This paper is supported by the Normandy RIN project FuMa. The last author was partially supported by the Normandy RIN project MaSyComB and by the Russian Federal Professor program (project no. 1.472.2016/1.4, the Ministry of Education and Science of the Russian Federation).

A Appendix

A.1 Properties of the prior distribution (6.4)

Lemma A.1. For any $N>2$ there exists a continuously differentiable probability density $\rho_{N}(\cdot)$ on $\mathbb{R}$ with support on the interval $[-N,N]$, i.e. $\rho_{N}(z)>0$ for $-N<z<N$ and $\rho_{N}(z)=0$ for $|z|\ge N$, such that for any $N>2$ the integral $\int_{-N}^{N}z\,\rho_{N}(z)\,\mathrm{d}z=0$ and, moreover,
$$\int_{-N}^{N}z^{2}\rho_{N}(z)\,\mathrm{d}z\to 1\quad\mbox{and}\quad J_{N}=\int_{-N}^{N}\frac{(\dot{\rho}_{N}(z))^{2}}{\rho_{N}(z)}\,\mathrm{d}z\to 1\quad\mbox{as}\quad N\to\infty\,.$$

Proof. First we set
$$V(z)=\frac{1}{v^{*}}\,e^{-\frac{1}{1-z^{2}}}\,\mathbf{1}_{\{|z|\le 1\}}\quad\mbox{and}\quad v^{*}=\int_{-1}^{1}e^{-\frac{1}{1-t^{2}}}\,\mathrm{d}t\,.\qquad(\mbox{A.1})$$
It is clear that this function is infinitely continuously differentiable, that $V(z)>0$ for $|z|<1$, $V(z)=0$ for $|z|\ge 1$ and $\int_{-1}^{1}V(z)\,\mathrm{d}z=1$. Using this function, for any $N\ge 2$ we define
$$\chi_{N}(z)=\int_{-1}^{1}\mathbf{1}_{\{|z+u|\le N-1\}}\,V(u)\,\mathrm{d}u=\int_{\mathbb{R}}\mathbf{1}_{\{|t|\le N-1\}}\,V(t-z)\,\mathrm{d}t\,.$$
Using here the properties of the function (A.1), we obtain directly that $\chi_{N}(z)=\chi_{N}(-z)$ for $z\in\mathbb{R}$, $\chi_{N}(z)=1$ for $|z|\le N-2$, $\chi_{N}(z)>0$ for $N-2<|z|<N$ and $\chi_{N}(z)=0$ for $|z|\ge N$. Moreover, it is clear that the derivative
$$\dot{\chi}_{N}(z)=-\int_{\mathbb{R}}\mathbf{1}_{\{|t|\le N-1\}}\,\dot{V}(t-z)\,\mathrm{d}t=-\int_{-1}^{1}\mathbf{1}_{\{|u+z|\le N-1\}}\,\dot{V}(u)\,\mathrm{d}u\,.$$
Note here that
$$|\dot{V}(z)|=\frac{2|z|}{(1-z^{2})^{2}}\,V(z)\ \le\ c^{*}\sqrt{V(z)}\quad\mbox{and}\quad c^{*}=\sup_{|z|\le 1}\frac{2\sqrt{V(z)}}{(1-z^{2})^{2}}\,.$$
Now, through the Bunyakovsky–Cauchy–Schwarz inequality, we get
$$\dot{\chi}^{2}_{N}(z)\ \le\ 2c^{*}\,\chi_{N}(z)\quad\mbox{for}\quad |z|<N\,.\qquad(\mbox{A.2})$$
Now, denoting by $\varphi(z)$ the $(0,1)$ Gaussian density, we set
$$\rho_{N}(z)=\frac{\varphi(z)\,\chi_{N}(z)}{\rho^{*}_{N}}\quad\mbox{and}\quad \rho^{*}_{N}=\int_{-N}^{N}\varphi(t)\,\chi_{N}(t)\,\mathrm{d}t\,.$$
It is clear that $\rho_{N}(z)$ is a continuously differentiable probability density with support $[-N,N]$ such that for any $N$ the integral $\int_{-N}^{N}z\,\rho_{N}(z)\,\mathrm{d}z=0$ and, as $N\to\infty$,
$$\rho^{*}_{N}\to 1\quad\mbox{and}\quad \int_{-N}^{N}z^{2}\rho_{N}(z)\,\mathrm{d}z\to 1\,.$$
Moreover, note that
$$J_{N}=\frac{1}{\rho^{*}_{N}}\int_{-N}^{N}\frac{\dot{\varphi}^{2}(t)}{\varphi(t)}\,\mathrm{d}t+\frac{1}{\rho^{*}_{N}}\,\Delta_{N}\,,$$
where, taking into account that the derivative $\dot{\chi}_{N}(z)=0$ for $|z|\le N-2$, the last term can be represented as
$$\Delta_{N}=2\int_{N-2\le |z|\le N}\dot{\varphi}(z)\,\dot{\chi}_{N}(z)\,\mathrm{d}z+\int_{N-2\le |z|\le N}\varphi(z)\,\frac{\dot{\chi}^{2}_{N}(z)}{\chi_{N}(z)}\,\mathrm{d}z\,.$$
Using here the bound (A.2), it is easy to conclude that $\Delta_{N}\to 0$ as $N\to\infty$. Hence Lemma A.1.
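If one ignores the smooth cutoff $\chi_{N}$ (i.e. takes $\chi_{N}\equiv 1$ on $[-N,N]$), the two limits in Lemma A.1 reduce to moments of a truncated standard Gaussian, which can be checked with a crude Riemann sum. This is only an illustration under that simplifying assumption, not the lemma's proof:

```python
import math

def truncated_moments(N, steps=200_000):
    """Mass and second moment of the standard Gaussian restricted to [-N, N]."""
    h = 2.0 * N / steps
    mass = m2 = 0.0
    for i in range(steps):
        z = -N + (i + 0.5) * h                      # midpoint rule
        p = math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)
        mass += p * h
        m2 += z * z * p * h
    return mass, m2 / mass

mass4, m2_4 = truncated_moments(4.0)
mass8, m2_8 = truncated_moments(8.0)

# both the normalizing mass rho*_N and the second moment approach 1 as N grows
assert abs(mass8 - 1.0) < abs(mass4 - 1.0) < 1e-3
assert abs(m2_8 - 1.0) < abs(m2_4 - 1.0) < 1e-2
print("truncated-Gaussian moments approach 1")
```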

Lemma A.2. We have
$$\lim_{n\to\infty}\ \max_{1\le j\le n-1}\,|\overline{\Psi}_{n,j}-1|=0\,,\qquad(\mbox{A.3})$$
where the term $\overline{\Psi}_{n,j}$ is given in (6.9).

Proof. First, note that for any $l\ge 1$
$$y_{l}=y_{0}\prod_{i=1}^{l}S_{\kappa}(x_{i})+\sum_{\iota=1}^{l}\ \prod_{i=\iota+1}^{l}S_{\kappa}(x_{i})\ \xi_{\iota}\,.$$
Therefore,
$$\widetilde{\mathbf{E}}_{0}\,y^{2}_{l}=y^{2}_{0}\,\widetilde{\mathbf{E}}_{0}\prod_{i=1}^{l}S^{2}_{\kappa}(x_{i})+\sum_{\iota=1}^{l-1}\widetilde{\mathbf{E}}_{0}\prod_{i=\iota+1}^{l}S^{2}_{\kappa}(x_{i})+1\,,$$
and due to (6.5) we obtain that for any $n\ge 1$ for which $\delta_{n}<1$
$$\sup_{l\ge 1}\,|\widetilde{\mathbf{E}}_{0}\,y^{2}_{l}-1|\ \le\ \delta^{2}_{n}\,\frac{y^{2}_{0}+1}{1-\delta^{2}_{n}}\ \to\ 0\quad\mbox{as}\quad n\to\infty\,.$$
Taking into account that for any $n\ge 3$ and $1\le j\le n-1$
$$\frac{b-a}{n}\sum_{l=1}^{n}\phi^{2}_{j}(x_{l})=1\,,$$
we get (A.3). Hence Lemma A.2.
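Both ingredients of this proof can be checked numerically: the second moment $\widetilde{\mathbf{E}}_{0}y_{l}^{2}$ of the autoregression obeys the recursion $m_{l}=S^{2}\,m_{l-1}+1$, and the trigonometric basis satisfies the discrete normalization above. A hedged sketch, assuming a constant coefficient $S$ with $|S|\le\delta_{n}<1$ and the standard cosine elements of the trigonometric basis on $[a,b]$:

```python
import math

# 1) second-moment recursion m_l = s^2 * m_{l-1} + 1 for y_l = s * y_{l-1} + xi_l
y0, s = 2.0, 0.3                     # constant coefficient with |s| < 1
delta = abs(s)
m = y0 * y0
bound = delta ** 2 * (y0 * y0 + 1.0) / (1.0 - delta ** 2)
for _ in range(200):
    m = s * s * m + 1.0
    assert abs(m - 1.0) <= bound + 1e-12   # matches the bound in the proof

# 2) discrete normalization (b - a)/n * sum_l phi_j^2(x_l) = 1
a, b, n = 0.0, 1.0, 64
x = [a + (l + 1) * (b - a) / n for l in range(n)]
for j in range(1, 10):               # cosine elements of the trigonometric basis
    phi = [math.sqrt(2.0 / (b - a)) * math.cos(2.0 * math.pi * j * (t - a) / (b - a))
           for t in x]
    total = (b - a) / n * sum(v * v for v in phi)
    assert abs(total - 1.0) < 1e-9
print("Lemma A.2 ingredients verified numerically")
```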

Lemma A.3. For any $b>0$ and $0<\rho<1$
$$\lim_{n\to\infty}\ n^{b}\,\mathcal{R}_{0,n}=0\,,\qquad(\mbox{A.4})$$
where the term $\mathcal{R}_{0,n}$ is introduced in (6.8).

Proof. First note that, from the definition of the term $\mathcal{R}_{0,n}$ in (6.8), the Bunyakovsky–Cauchy–Schwarz inequality gives
$$\mathcal{R}_{0,n}\ \le\ \Big(r+\sqrt{3}\,\sum_{j=1}^{d}s^{2}_{j}\Big)\,\sqrt{\mu_{\kappa}(D^{c}_{n})}\,.$$
Therefore, to show (A.4) it suffices to check that for any $b>0$
$$\lim_{n\to\infty}\ n^{b}\,\mu_{\kappa}(D^{c}_{n})=0\,.\qquad(\mbox{A.5})$$
To this end, note that from the definition of $D_{n}$ in (6.6) we obtain
$$\mu_{\kappa}(D^{c}_{n})\ \le\ \mathbf{P}(\zeta_{n}>r)\,,$$
