Adaptive efficient estimation for generalized semi-Markov Big Data models


HAL Id: hal-03207874

https://hal.archives-ouvertes.fr/hal-03207874

Preprint submitted on 27 Apr 2021



Vlad Stefan Barbu, Slim Beltaief, Serguei Pergamenchtchikov

To cite this version:

Vlad Stefan Barbu, Slim Beltaief, Serguei Pergamenchtchikov. Adaptive efficient estimation for generalized semi-Markov Big Data models. 2021. hal-03207874


Abstract In this paper we study generalized semi-Markov high dimension regression models in continuous time observed at fixed discrete time moments. The generalized semi-Markov process has dependent jumps and, therefore, it is an extension of the semi-Markov regression introduced in Barbu, Beltaief and Pergamenshchikov (2019a). For such models we consider estimation problems in the nonparametric setting. To this end we develop model selection procedures for which sharp non-asymptotic oracle inequalities for the robust risks are obtained. Moreover, we give constructive sufficient conditions which provide, through the obtained oracle inequalities, the adaptive robust efficiency property in the minimax sense. It should be noted also that for these results we use neither sparsity conditions nor the parameter dimension in the model. As examples, we consider regression models constructed through spherically symmetric noise impulses and truncated fractional Poisson processes. Numerical Monte-Carlo simulations confirming the theoretical results are given in the supplementary materials.

This research was supported by RSF, project no 20-61-47043 (National Research Tomsk State University, Russia)

V. S. Barbu
Laboratoire de Mathématiques Raphaël Salem, UMR 6085 CNRS-Université de Rouen Normandie, France
E-mail: [email protected]

S. Beltaief
ALTEN de Toulouse (https://www.alten.fr), France
E-mail: [email protected]

S. M. Pergamenchtchikov
Laboratoire de Mathématiques Raphaël Salem, Université de Rouen, Avenue de l’Université, BP.12, 76801 Saint-Etienne-du-Rouvray, France and International Laboratory of Statistics of Stochastic Processes and Quantitative Finance, Tomsk State University, Tomsk, Russia
E-mail: [email protected]


Keywords Regression model · Generalized semi-Markov processes · Fractional Poisson processes · Non-asymptotic estimation · Robust estimation · Model selection · Sharp oracle inequalities · Asymptotic efficiency · Adaptive estimation

1 Introduction

1.1 Motivations

In this paper we study the following linear regression model in continuous time

$$dy_t = \Big(\sum_{j=1}^{q} \beta_j u_j(t)\Big)\,dt + d\xi_t, \quad 0 \le t \le T, \tag{1}$$

where the functions $(u_j)_{1\le j\le q}$ are known linearly independent 1-periodic $\mathbb{R}\to\mathbb{R}$ functions, the duration of observations $T$ is an integer and $(\xi_t)_{t\ge 0}$ is an unobservable noise process defined in Section 2. The process (1) is observed only at the fixed time moments

$$(y_{t_j})_{0\le j\le n}, \quad t_j = \frac{j}{p} \quad\text{and}\quad n = pT, \tag{2}$$

where the observation frequency $p$ is some fixed integer. We consider the model (1) in the case when the parameter dimension is greater than the number of observations, i.e., $q > n$. Such models are called big data or high dimension regression models in continuous time (see, for example, Fujimori (2019) for diffusion processes). The problem is to estimate the unknown parameters $(\beta_j)_{1\le j\le q}$ on the basis of the observations (2). Usually for such problems one uses either the Lasso algorithm or the Dantzig selector method. It should be emphasized that to apply these methods one needs to assume sparsity conditions, which ensure that the number of non-zero unknown parameters is not too large (“reasonable”), and, moreover, the parameter dimension $q$ must be known (see, for example, Hastie, Friedman and Tibshirani (2008)). It should be noted also that the case of unknown parameter dimension $q$ is one of the crucial points in important practical problems such as, for example, statistical signal and image processing (see, for example, Beltaief, Chernoyarov and Pergamenshchikov (2020) and the references therein). In this paper we study the model (1) in the nonparametric setting as the estimation problem for the function

$$S(t) = \sum_{j=1}^{q} \beta_j u_j(t), \quad\text{i.e.}\quad dy_t = S(t)\,dt + d\xi_t, \quad 0 \le t \le T, \tag{3}$$

where $S$ is an unknown 1-periodic $\mathbb{R}\to\mathbb{R}$ function from $L_2[0,1]$. Here, we assume neither sparsity conditions, nor that the parameter dimension is known; in particular, we can assume that $q = +\infty$. Now the problem is to estimate the unknown function $S$ in the model (3) on the basis of the observations (2). Originally, such problems were considered in the framework of “signal+white noise” models (see, for example, Ibragimov and Khasminskii (1981); Kutoyants (1994); Pinsker (1981)). Later, they were extended to the “color noise” models defined through non-Gaussian Ornstein-Uhlenbeck processes (see Barndorff-Nielsen and Shephard (2001); Konev and Pergamenshchikov (2012, 2015)). The problem here is that the dependence defined on the basis of the Ornstein-Uhlenbeck processes disappears very fast, at a geometric rate. This means that such models are asymptotically equivalent to models with independent observations. To keep the dependence in the observations over long time periods, Barbu, Beltaief and Pergamenshchikov (2019a) proposed, for the estimation problem based on the complete data, to define the model (3) through semi-Markov processes with jumps. Such models considerably extend the potential applications of statistical results in many important practical fields such as finance, insurance, signal and image processing, reliability and biology (see, for example, Barbu, Beltaief and Pergamenshchikov (2019a); Barbu and Limnios (2008) and the references therein). In this paper we extend the semi-Markov regression models to the generalized semi-Markov processes by introducing an additional dependence in the jump sizes of $(\xi_t)_{t\ge 0}$.

1.2 Methods

In this paper, in order to estimate the function $S$, we develop model selection methods using the quadratic risks defined as

$$\mathcal{R}_Q(\widehat{S}_T, S) = \mathbf{E}_{Q,S}\,\|\widehat{S}_T - S\|^2, \quad \|f\|^2 = \int_0^1 f^2(s)\,ds, \tag{4}$$

where $\widehat{S}_T(\cdot)$ is some estimate (i.e. any periodic function measurable with respect to the observations $\sigma\{y_{t_0},\dots,y_{t_n}\}$) and $\mathbf{E}_{Q,S}$ is the expectation with respect to the distribution $\mathbf{P}_{Q,S}$ of the process (3) corresponding to the unknown noise distribution $Q$ in the Skorokhod space $D[0,T]$ and to the function $S$. We assume that this distribution belongs to some distribution family $\mathcal{Q}_T$ specified in Section 2. To study the properties of the estimators uniformly over the noise distribution (which is really needed in practice), we use the robust risk defined as

$$\mathcal{R}^*_T(\widehat{S}_T, S) = \sup_{Q\in\mathcal{Q}_T} \mathcal{R}_Q(\widehat{S}_T, S). \tag{5}$$

It should be noted that statistical procedures which are optimal in the sense of this risk possess stable mean square accuracy uniformly over all admissible noise distributions in the model (3). This means that the corresponding optimal statistical algorithms have high noise immunity and, therefore, significantly improve the quality and reliability of statistical inferences obtained on their basis.

To construct model selection procedures on the basis of the discrete data (2) we use the approach proposed in Konev and Pergamenshchikov (2015). It should be noted that the main analytic tool in that paper is based on the exponential decay of the dependence in Ornstein-Uhlenbeck models and, therefore, these methods cannot be applied to semi-Markov models, which can retain dependence in the noise for a long time. So, in this paper, to study the estimation problem on the discrete observations (2) for the model (3) with noises defined through semi-Markov processes, we develop new methods based on the special renewal theory from Barbu, Beltaief and Pergamenshchikov (2019a). Using these techniques we can analyse the approximation errors in the discrete observations and obtain non-asymptotic sharp oracle inequalities.

Moreover, as a consequence, we find constructive sufficient conditions on the observation frequency which provide the robust efficiency of the proposed model selection procedures in the adaptive setting, i.e. in the case when the regularity properties of the function $S$ are unknown.

1.3 Main contributions of this paper

In this paper we use for the first time nonparametric adaptive methods for estimation problems in the framework of the big data generalized semi-Markov regression models. To this end we develop model selection procedures and corresponding analytical tools providing, under some constructive sufficient conditions, the optimality in the sharp oracle inequality sense and the robust adaptive efficiency in the minimax sense for the proposed estimators. It turns out that these conditions hold true for important practical cases such as, for example, regression models constructed through truncated fractional Poisson processes introduced in Barbu, Beltaief and Pergamenshchikov (2019b). Moreover, in this paper, we extend for the first time the model from Barbu, Beltaief and Pergamenshchikov (2019a) using the generalized semi-Markov models obtained by introducing a dependence structure in the sizes of the jumps. As an example, we use spherically symmetric random variables, which play a very important role in many practical applications (see, for example, Fourdrinier and Pergamenshchikov (2007) and the references therein).

1.4 Organization of the paper

The rest of the paper is organized as follows. In Section 2 we state the main conditions under which we consider the model (3). In Section 3 we present fractional Poisson processes and their main properties. In Section 4 we construct model selection procedures on the basis of weighted least squares estimates. In Section 5 we state the main results. In Section 6 we develop the stochastic calculus for the generalized semi-Markov processes. Section 7 gives the proofs of the main results. Some auxiliary tools are given in an Appendix.

2 Main conditions

First, we assume that the noise process $(\xi_t)_{t\ge 0}$ in the model (3) is defined as

$$\xi_t = \varrho_1 w_t + \varrho_2 L_t + \varrho_3 z_t, \tag{6}$$

where $\varrho_1$, $\varrho_2$ and $\varrho_3$ are unknown coefficients, $(w_t)_{t\ge 0}$ is a standard Brownian motion, $L_t = \int_0^t\int_{\mathbb{R}_*} x\,(\mu(ds,dx) - \widetilde{\mu}(ds,dx))$, $\mu(ds,dx)$ is the jump measure with deterministic compensator $\widetilde{\mu}(ds,dx) = ds\,\Pi(dx)$, $\Pi(\cdot)$ is the Lévy measure on $\mathbb{R}_* = \mathbb{R}\setminus\{0\}$ (see, for example, Liptser and Shiryaev (1989) for details), with

$$\Pi(x^2) = 1 \quad\text{and}\quad \Pi(x^8) < \infty. \tag{7}$$

Here we use the usual notation $\Pi(|x|^m) = \int_{\mathbb{R}_*} |z|^m\,\Pi(dz)$. Note that $\Pi(|x|)$ may be equal to $+\infty$. In this paper we assume that the “dependent part” in the noise (6) is modelled by the generalized semi-Markov process $(z_t)_{t\ge 0}$ defined as

$$z_t = \sum_{i=1}^{N_t} \zeta_i, \tag{8}$$

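where $(\zeta_i)_{i\ge 1}$ are random variables satisfying the following conditions:

C₁) For all $i\ge 1$, $\mathbf{E}\,\zeta_i = 0$, $\mathbf{E}\,\zeta_i^2 = 1$ and $\sup_{l\ge 1}\mathbf{E}\,\zeta_l^4 < \infty$;

C₂) $\mathbf{E}\,\zeta_i\zeta_j = 0$ for any $i\ne j$;

C₃) For any $1\le k_1 < k_2 < k_3 < k_4$ the random variables $(\zeta_{k_i})_{1\le i\le 4}$ are such that $\mathbf{E}\,\zeta_{k_1}^{\iota_1}\zeta_{k_2}^{\iota_2}\zeta_{k_3}^{\iota_3}\zeta_{k_4}^{\iota_4} = 0$ for any $\iota_1,\dots,\iota_4\in\{0,1,2,3\}$ for which $3\le\sum_{i=1}^{4}\iota_i\le 4$ and at least one among them is equal to one.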

Now we give some examples for the correlation conditions C₁)–C₃). To this end, we first recall the definition of a spherically symmetric distribution (see, for example, Fourdrinier and Pergamenshchikov (2007)). A random vector $\zeta = (\zeta_1,\dots,\zeta_d)'$ is called spherically symmetric if its density in $\mathbb{R}^d$ has the form $g(|\cdot|^2)$ for some nonnegative function $g$. Here the prime denotes the transposition. Note that there is a very important particular case of the spherically symmetric vectors represented by Gaussian mixture distributions.

The vector $\zeta = (\zeta_1,\dots,\zeta_d)'$ is called a Gaussian mixture in $\mathbb{R}^d$ if it has the spherically symmetric distribution with

$$g(t) = \mathbf{E}\,\frac{1}{(2\pi s)^{d/2}}\,e^{-\frac{t}{2s^2}}, \tag{9}$$

where $s$ is a nonnegative random variable. It should be emphasized that in radio-physics such distributions are very popular for statistical signal processing (see, for example, Middleton (1979); Kassam (1988)). Using these definitions it is easy to see that the following random variables satisfy the conditions C₁)–C₃):


– $(\zeta_j)_{j\ge 1}$ are i.i.d. random variables satisfying condition C₁);

– for some $d > 1$ the random vector $(\zeta_1,\dots,\zeta_d)'$ has a spherically symmetric distribution in $\mathbb{R}^d$ with $\mathbf{E}\,\zeta_1^2 = 1$ and $\mathbf{E}\,\zeta_1^4 < \infty$, and the random variables $(\zeta_j)_{j>d}$ are independent and satisfy condition C₁);

– for any $d\ge 1$ the random vector $(\zeta_1,\dots,\zeta_d)'$ is a Gaussian mixture with mixture variable $s$ for which $\mathbf{E}\,s^2 = 1$ and $\mathbf{E}\,s^4 < \infty$.

In (8) the process $N_t$ is a general counting process defined as

$$N_t = \sum_{k=1}^{\infty} \mathbf{1}_{\{\sum_{l=1}^{k}\tau_l\le t\}} \tag{10}$$

with $(\tau_l)_{l\ge 1}$ an i.i.d. sequence of positive integrable random variables with the distribution $\eta$ and mean $\tau = \mathbf{E}_Q\,\tau_1 > 0$. We assume that the processes $(N_t)_{t\ge 0}$, $(\zeta_i)_{i\ge 1}$ and $(L_t)_{t\ge 0}$ are independent. In the sequel we will use the renewal measure defined as

$$\check{\eta} = \sum_{l=1}^{\infty} \eta^{(l)}, \tag{11}$$

where $\eta^{(l)}$ is the $l$th convolution power of the measure $\eta$.

Remark 1 Note that in the case when the random variables $(\zeta_j)_{j\ge 1}$ are i.i.d., the process (8) is the semi-Markov process used in Barbu, Beltaief and Pergamenshchikov (2019a).
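As an illustration of the construction (8), (10), the following minimal sketch simulates one path of $z_t$ on the observation grid $t_j = j/p$. It is only a sketch under stated assumptions: the gamma waiting times, the two-point law of the scale `s`, the block size `d` and the function name `simulate_z` are illustrative choices that do not come from the paper.

```python
import numpy as np

def simulate_z(T, p, rng, m=2, a=2.0, d=4):
    """Sample z_t = sum_{i <= N_t} zeta_i on the grid t_j = j/p, 0 <= j <= pT.

    Assumptions of this sketch (not taken from the paper):
    - waiting times tau_l are gamma distributed with shape m and rate a;
    - the first d jumps share one random scale s with E s^2 = 1
      (a Gaussian scale mixture, hence dependent but uncorrelated),
      the remaining jumps are i.i.d. standard normal.
    """
    # Renewal times t_k = tau_1 + ... + tau_k that fall into [0, T].
    jump_times = []
    t = rng.gamma(m, 1.0 / a)
    while t <= T:
        jump_times.append(t)
        t += rng.gamma(m, 1.0 / a)
    jump_times = np.asarray(jump_times, dtype=float)

    # Jump sizes: common random scale on the first d coordinates.
    zeta = rng.standard_normal(jump_times.size)
    s = np.sqrt(rng.choice([0.5, 1.5]))  # s^2 is 0.5 or 1.5, so E s^2 = 1
    zeta[:d] *= s

    # N_{t_j} = number of renewal times <= t_j, then partial sums of the jumps.
    grid = np.arange(p * T + 1) / p
    counts = np.searchsorted(jump_times, grid, side="right")
    partial_sums = np.concatenate(([0.0], np.cumsum(zeta)))
    return grid, partial_sums[counts], jump_times

rng = np.random.default_rng(0)
grid, z, jump_times = simulate_z(T=5, p=10, rng=rng)
```

The path is piecewise constant and changes only at the renewal times, so the number of distinct increments on the grid never exceeds the number of jumps.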

To use the renewal methods from Barbu, Beltaief and Pergamenshchikov (2019a) we assume that the distribution $\eta$ has a density $g$ for which the following conditions hold true.

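H₁) Assume that, for any $x\in\mathbb{R}$, there exist the finite limits $g(x-) = \lim_{z\to x-} g(z)$ and $g(x+) = \lim_{z\to x+} g(z)$ and, for any $K > 0$, there exists $\delta = \delta(K) > 0$ for which

$$\sup_{|x|\le K}\int_0^{\delta}\frac{|g(x+t) + g(x-t) - g(x+) - g(x-)|}{t}\,dt < \infty.$$

H₂) For any $\gamma > 0$, $\sup_{z\ge 0} z^{\gamma}\,|2g(z) - g(z-) - g(z+)| < \infty$.

H₃) There exists $\beta > 0$ such that $\int_{\mathbb{R}_+} e^{\beta x} g(x)\,dx < \infty$.

H₄) There exists $t^* > 0$ such that the function $\widehat{g}(\theta - it)$ belongs to $L_1(\mathbb{R})$ for any $0\le t\le t^*$, where $\widehat{g}(z) = (2\pi)^{-1}\int_{\mathbb{R}} e^{izv} g(v)\,dv$ is the Fourier transform of $g$.

Moreover, to check these conditions we will use the following assumption.

H₄^*) The density $g$ is twice continuously differentiable on $\mathbb{R}_+$ with $g(0) = 0$ and there exists $\beta > 0$ such that

$$\int_0^{+\infty} e^{\beta x}\big(g(x) + |g'(x)| + |g''(x)|\big)\,dx < \infty \quad\text{and}\quad \lim_{x\to\infty} e^{\beta x}\big(g(x) + |g'(x)|\big) = 0.$$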


It is clear that the conditions H₁)–H₃) hold true in this case. To obtain the condition H₄) it suffices to integrate by parts twice in the integral defining $\widehat{g}$. For example, one can take the gamma distribution of order $m\ge 2$

$$g(x) = \frac{a^m x^{m-1}}{(m-1)!}\,e^{-ax}\,\mathbf{1}_{\{x\ge 0\}} \quad\text{and}\quad a > 0. \tag{12}$$

It should be noted that in view of Proposition 5.2 from Barbu, Beltaief and Pergamenshchikov (2019a), Conditions H₁)–H₄) imply that the renewal measure (11) has a continuous density $\rho$ such that

$$\|\Upsilon\|_1 = \int_0^{+\infty} |\Upsilon(x)|\,dx < \infty \quad\text{and}\quad \Upsilon(x) = \rho(x) - \frac{1}{\tau}. \tag{13}$$

Remark 2 It should be noted that Condition H₄) does not hold for exponential random variables $(\tau_j)_{j\ge 1}$ since their density is not continuous at zero. But for exponential random variables, i.e. in the case when $(N_t)_{t\ge 0}$ is a Poisson process, the renewal density can be calculated directly, i.e. $\rho(x)\equiv 1/\tau$ and $\Upsilon\equiv 0$.
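The direct calculation mentioned in Remark 2 can be sketched as follows, writing $a > 0$ for the rate of the exponential distribution (a notational assumption made here), so that $\tau = 1/a$. The $l$th convolution power of the exponential law has the gamma density $a^l x^{l-1} e^{-ax}/(l-1)!$, and summing over $l$ as in (11) gives, for $x > 0$,

```latex
\rho(x) \;=\; \sum_{l=1}^{\infty} \frac{a^{l} x^{l-1}}{(l-1)!}\, e^{-ax}
\;=\; a\, e^{-ax} \sum_{l=0}^{\infty} \frac{(ax)^{l}}{l!}
\;=\; a\, e^{-ax}\, e^{ax}
\;=\; a \;=\; \frac{1}{\tau},
\qquad\text{hence}\qquad
\Upsilon(x) \;=\; \rho(x) - \frac{1}{\tau} \;=\; 0 .
```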

Now we describe the class of admissible noise distributions used in the robust risk (5). To this end we set

$$\sigma_Q = \varrho_1^2 + \varrho_2^2 + \frac{\varrho_3^2}{\tau}. \tag{14}$$

As to the parameters in (6), we assume that

$$\varsigma_* \le \sigma_Q \le \varsigma^*, \tag{15}$$

where the unknown bounds $0 < \varsigma_*\le\varsigma^*$ can be functions of $T$, i.e. $\varsigma_* = \varsigma_*(T)$ and $\varsigma^* = \varsigma^*(T)$, such that for any $b > 0$

$$\lim_{T\to\infty} T^{b}\,\varsigma_*(T) = +\infty \quad\text{and}\quad \lim_{T\to\infty}\frac{\varsigma^*(T)}{T^{b}} = 0. \tag{16}$$

We denote by $\mathcal{Q}_T$ the family of all distributions of the process (6) in $D[0,T]$ satisfying the properties (15)–(16).

Remark 3 As we will see later, the parameter (14) is the limit of the Fourier transform of the noise process (6). This limit is called the variance proxy (see Konev and Pergamenshchikov (2012)).


3 Truncated fractional Poisson processes

As an example of a process (10) satisfying the conditions H₁)–H₄) we give the truncated fractional Poisson process introduced in Barbu, Beltaief and Pergamenshchikov (2019b). To this end, we recall the definition of the fractional Poisson process (see, for example, Biard and Saussereau (2014); Laskin (2003)). The process (10) is called a fractional Poisson process if the i.i.d. random variables $(\tau_j)$ have the Mittag-Leffler distribution which, for some $a > 0$, is defined as

$$\mathbf{P}(\tau_1 > t) = E_H(-a t^H), \tag{17}$$

where $0 < H\le 1$ is called the Hurst index,

$$E_H(z) = \sum_{k=0}^{\infty}\frac{z^k}{\Gamma(1 + Hk)} \quad\text{and}\quad \Gamma(x) = \int_0^{+\infty} t^{x-1} e^{-t}\,dt.$$
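The Mittag-Leffler function in (17) can be evaluated numerically by truncating its series. The sketch below is a minimal illustration, not a robust implementation: the function names and the truncation level `n_terms` are assumptions of this example, and the plain series is only reliable for moderate values of $a t^H$.

```python
import math

def mittag_leffler(z, H, n_terms=100):
    """Truncated series E_H(z) = sum_{k >= 0} z^k / Gamma(1 + H k)."""
    total = 0.0
    for k in range(n_terms):
        # exp(-lgamma(.)) equals 1 / Gamma(.) without overflowing for large k
        total += z**k * math.exp(-math.lgamma(1.0 + H * k))
    return total

def survival(t, a, H):
    """P(tau_1 > t) = E_H(-a t^H) as in (17); intended for moderate a * t^H."""
    return mittag_leffler(-a * t**H, H)

# For H = 1 the series is the exponential series, so the law is exponential.
value_H1 = survival(0.5, 1.0, 1.0)
```

For $H = 1$ the value agrees with $e^{-at}$, and for $H = 1/2$ the series can be compared with the classical closed form $E_{1/2}(z) = e^{z^2}\,\mathrm{erfc}(-z)$.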

Note that, if $H = 1$, then we obtain the exponential distribution with parameter $a > 0$ and, therefore, the process (10) is a Poisson process. If $0 < H < 1$, then the density of the distribution (17) (see, for example, Repin and Saichev (2000)) can be represented as

$$f_H(t) = \frac{a\sin(\pi H)}{\pi}\int_0^{+\infty}\frac{x^{H} e^{-tx}}{x^{2H} + a^2 + 2a x^{H}\cos(\pi H)}\,dx. \tag{18}$$

From here we can directly obtain that

$$f_H(t)\sim t^{H-1}, \quad f_H'(t)\sim t^{H-2}, \quad f_H''(t)\sim t^{H-3} \quad\text{as}\quad t\to 0 \tag{19}$$

and

$$f_H(t)\sim t^{-H-1}, \quad f_H'(t)\sim t^{-H-2}, \quad f_H''(t)\sim t^{-H-3} \quad\text{as}\quad t\to\infty. \tag{20}$$

In particular, this implies that the Mittag-Leffler distribution has a heavy tail, i.e.

$$\mathbf{P}(\tau_1 > t)\sim t^{-H} \quad\text{as}\quad t\to\infty, \tag{21}$$

and hence $\mathbf{E}\,\tau_1 = +\infty$. Therefore, the condition H₃) does not hold for the distribution (17). To correct this effect, in Barbu, Beltaief and Pergamenshchikov (2019b) it is proposed to replace the Mittag-Leffler random variables in (10) with i.i.d. random variables distributed as $\tau_1^* = \min(X_*^{b}, X^*)$, where $X_*$ is a Mittag-Leffler random variable with $0 < H < 1$, $0 < b\le H/3$ and $X^*$ is a positive random variable satisfying the condition H₄^*). Such processes are called truncated fractional Poisson processes. Using the asymptotic properties (19) and (20) one can check directly that the random variable $\tau_1^*$ satisfies the condition H₄^*) and, therefore, the conditions H₁)–H₄) hold true for this case.


Remark 4 It should be noted also that the process (10) with the Mittag-Leffler random variables has a “memory” in its increments (see, for example, Maheshwari and Vellaisamy (2016)) in the sense that, for any $\delta > 0$ and $s > 0$, the correlation coefficient

$$\mathrm{Corr}\big(N_{s+\delta} - N_s,\; N_{t+\delta} - N_t\big)\sim t^{-\frac{3-H}{2}} \quad\text{as}\quad t\to\infty.$$

It should be noted that this property is very important for many practical problems and considerably expands the possible applications of statistical results. Unfortunately, we cannot use the fractional Poisson process directly in the regression model (3): since the time between jumps is not integrable, i.e. very large, the noise impulses of the fractional Poisson process are very rare and, therefore, have almost negligible influence in the observation models. On the contrary, the truncated process has an exponential moment, i.e. the same property as Poisson processes, and, moreover, it keeps a dependence on large time intervals.

4 Model selection

In this section we construct a model selection procedure for estimating the unknown function $S$ given in (3) starting from the discrete-time observations (2) and we establish the oracle inequality for the associated risk. To this end, note that for any function $f:[0,T]\to\mathbb{R}$ from $L_2[0,T]$, the integral

$$I_T(f) = \int_0^T f(s)\,d\xi_s \tag{22}$$

is well defined, with $\mathbf{E}_Q\,I_T(f) = 0$. Moreover, as shown in Lemma 1, under the conditions H₁)–H₄),

$$\mathbf{E}_Q\,I_T^2(f)\le\kappa_Q\int_0^T f^2(s)\,ds \quad\text{and}\quad \kappa_Q = \varrho_1^2 + \varrho_2^2 + \varrho_3^2\,|\rho|_*, \tag{23}$$

where $|\rho|_* = \sup_{t\ge 0}|\rho(t)| < \infty$.

In this paper we will use the trigonometric basis $(\phi_j)_{j\ge 1}$ in $L_2[0,1]$ defined as

$$\phi_1 = 1, \quad \phi_j(x) = \sqrt{2}\,\mathrm{Tr}_j(2\pi[j/2]x), \quad j\ge 2, \tag{24}$$

where the function $\mathrm{Tr}_j(x) = \cos(x)$ for even $j$ and $\mathrm{Tr}_j(x) = \sin(x)$ for odd $j$, and $[x]$ denotes the integer part of $x$. Note that these functions are orthonormal on the points $(t_j)_{1\le j\le p}$, i.e. for any $1\le i,j\le p$

$$(\phi_i,\phi_j)_p = \frac{1}{p}\sum_{l=1}^{p}\phi_i(t_l)\phi_j(t_l) = \mathbf{1}_{\{i=j\}}. \tag{25}$$
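The discrete orthonormality (25) is easy to check numerically. The sketch below builds the empirical Gram matrix of the basis (24) on the grid $t_l = l/p$; the helper names `phi` and `gram_matrix` are illustrative. The check uses an odd $p$, which is an assumption of this example: for even $p$ the last function $\phi_p$ reduces to $\sqrt{2}\,(-1)^l$ on the grid and its empirical squared norm is 2 rather than 1.

```python
import numpy as np

def phi(j, x):
    """Trigonometric basis (24): phi_1 = 1, phi_j = sqrt(2) * Tr_j(2 pi [j/2] x)."""
    x = np.asarray(x, dtype=float)
    if j == 1:
        return np.ones_like(x)
    arg = 2.0 * np.pi * (j // 2) * x
    # cosine for even j, sine for odd j
    return np.sqrt(2.0) * (np.cos(arg) if j % 2 == 0 else np.sin(arg))

def gram_matrix(p):
    """Matrix of the empirical inner products (phi_i, phi_j)_p, 1 <= i, j <= p."""
    t = np.arange(1, p + 1) / p                          # grid t_1, ..., t_p
    B = np.stack([phi(j, t) for j in range(1, p + 1)])   # row j-1 holds phi_j(t_l)
    return B @ B.T / p

G_odd = gram_matrix(11)    # identity matrix up to rounding
G_even = gram_matrix(10)   # last diagonal entry equals 2
```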


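In the sequel we denote $\|x\|_p^2 = (x,x)_p$. Now note that, for any $1\le l\le p$,

$$S(t_l) = \sum_{j=1}^{p}\theta_{j,p}\,\phi_j(t_l) \quad\text{and}\quad \theta_{j,p} = (S,\phi_j)_p. \tag{26}$$

Using the approach from Konev and Pergamenshchikov (2015), we estimate the Fourier coefficients $\theta_{j,p}$ as

$$\widehat{\theta}_{j,p} = \frac{1}{T}\int_0^T\psi_{j,p}(t)\,dy_t \quad\text{and}\quad \psi_{j,p}(t) = \sum_{l=1}^{n}\phi_j(t_l)\,\mathbf{1}_{\{t_{l-1} < t\le t_l\}}. \tag{27}$$

It is clear that the functions $(\psi_{j,p})_{1\le j\le p}$ are orthonormal in $L_2[0,1]$, i.e.

$$(\psi_{j,p},\psi_{i,p}) = \int_0^1\psi_{j,p}(t)\psi_{i,p}(t)\,dt = (\phi_j,\phi_i)_p = \mathbf{1}_{\{i=j\}}. \tag{28}$$

The Fourier coefficients of $S$ in this basis can be represented as

$$\overline{\theta}_{j,p} = (S,\psi_{j,p}) = \int_0^1 S(t)\psi_{j,p}(t)\,dt = \theta_{j,p} + h_{j,p}, \tag{29}$$

where $h_{j,p} = h_{j,p}(S) = \sum_{l=1}^{p}\int_{t_{l-1}}^{t_l}\phi_j(t_l)\,(S(t) - S(t_l))\,dt$. Therefore, (27) implies

$$\widehat{\theta}_{j,p} = \overline{\theta}_{j,p} + \frac{1}{\sqrt{T}}\,\xi_{j,p} \quad\text{and}\quad \xi_{j,p} = \frac{1}{\sqrt{T}}\,I_T(\psi_{j,p}). \tag{30}$$

As in Barbu, Beltaief and Pergamenshchikov (2019a) we use the model selection procedures based on the following weighted least squares estimators

$$\widehat{S}_\lambda(t) = \sum_{j=1}^{p}\lambda(j)\,\widehat{\theta}_{j,p}\,\psi_{j,p}(t), \quad 0\le t\le 1, \tag{31}$$

where the weight vector $\lambda = (\lambda(1),\dots,\lambda(p))'$ belongs to some finite set $\Lambda$ from $[0,1]^p$. Here the prime denotes the transposition. Moreover, we set

$$m_* = \mathrm{card}(\Lambda) \quad\text{and}\quad \Lambda_* = \max_{\lambda\in\Lambda}\sum_{j=1}^{p}\mathbf{1}_{\{\lambda(j) > 0\}}, \tag{32}$$

where $\mathrm{card}(\Lambda)$ is the cardinal number of the set $\Lambda$. We assume that $\Lambda_*\le n$.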
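Since $\psi_{j,p}$ is piecewise constant, the stochastic integral in (27) reduces to a finite sum over the observed increments, $\widehat{\theta}_{j,p} = T^{-1}\sum_{l=1}^{n}\phi_j(t_l)\,(y_{t_l} - y_{t_{l-1}})$. The following minimal sketch computes these estimates and evaluates the weighted estimator (31); the function names and the noise-free toy data are assumptions of this example.

```python
import numpy as np

def phi(j, x):
    """Trigonometric basis (24), evaluated at the points x."""
    x = np.asarray(x, dtype=float)
    if j == 1:
        return np.ones_like(x)
    arg = 2.0 * np.pi * (j // 2) * x
    return np.sqrt(2.0) * (np.cos(arg) if j % 2 == 0 else np.sin(arg))

def fourier_estimates(y, p, T):
    """theta_hat_{j,p} = (1/T) sum_l phi_j(t_l) (y_{t_l} - y_{t_{l-1}}), j = 1..p."""
    y = np.asarray(y, dtype=float)
    n = p * T
    if y.shape != (n + 1,):
        raise ValueError("y must contain the n + 1 observations y_{t_0}, ..., y_{t_n}")
    t = np.arange(1, n + 1) / p   # t_1, ..., t_n
    dy = np.diff(y)               # increments of the observed path
    return np.array([phi(j, t) @ dy for j in range(1, p + 1)]) / T

def weighted_lse(theta_hat, lam, t):
    """Evaluate S_hat_lambda(t) from (31) at points t in (0, 1]."""
    p = len(theta_hat)
    # psi_{j,p} equals phi_j(t_l) on (t_{l-1}, t_l], so round t up to the grid;
    # the small shift guards against floating-point error at grid points
    t_right = np.ceil(np.asarray(t, dtype=float) * p - 1e-9) / p
    return sum(lam[j - 1] * theta_hat[j - 1] * phi(j, t_right) for j in range(1, p + 1))

# Noise-free toy data (hypothetical): S = 2, so y_t = 2 t on the grid.
p, T = 11, 3
y = 2.0 * np.arange(p * T + 1) / p
theta_hat = fourier_estimates(y, p, T)
S_hat = weighted_lse(theta_hat, np.ones(p), [0.25, 0.5])
```

In this toy case only the first coefficient is non-zero and the estimator recovers the constant signal; with noisy data the same functions return the noisy estimates that enter the selection step.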

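Now we use the same criterion as in Barbu, Beltaief and Pergamenshchikov (2019a) to choose a weight vector in $\Lambda$, i.e. we minimize the empirical error

$$\mathrm{Err}(\lambda) = \|\widehat{S}_\lambda - S\|^2, \tag{33}$$

which can be represented as

$$\mathrm{Err}(\lambda) = \sum_{j=1}^{p}\lambda^2(j)\,\widehat{\theta}_{j,p}^{\,2} - 2\sum_{j=1}^{p}\lambda(j)\,\widehat{\theta}_{j,p}\,\overline{\theta}_{j,p} + \|S\|^2. \tag{34}$$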
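One way to see the representation (34) is to expand the square in (33) and use the orthonormality (28) of the functions $\psi_{j,p}$ together with the definition (31). The display below is a short worked sketch of this step, in which $(S,\psi_{j,p})$ is the coefficient from (29):

```latex
\|\widehat{S}_\lambda - S\|^2
 = \|\widehat{S}_\lambda\|^2 - 2\,(\widehat{S}_\lambda, S) + \|S\|^2
 = \sum_{j=1}^{p} \lambda^2(j)\,\widehat{\theta}_{j,p}^{\,2}
   - 2 \sum_{j=1}^{p} \lambda(j)\,\widehat{\theta}_{j,p}\,(S,\psi_{j,p})
   + \|S\|^2 .
```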


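Note that the Fourier coefficients of $S$ are unknown. Therefore, using the approach from Barbu, Beltaief and Pergamenshchikov (2019a), to minimize this function we replace the terms $\widehat{\theta}_{j,p}\,\overline{\theta}_{j,p}$ by their estimators

$$\widetilde{\theta}_{j,p} = \widehat{\theta}_{j,p}^{\,2} - \frac{\sigma_Q}{T},$$

where the proxy variance $\sigma_Q$ is defined in (14). In the case when this variance is unknown we use its estimator, i.e.

$$\widetilde{\theta}_{j,p} = \widehat{\theta}_{j,p}^{\,2} - \frac{\widehat{\sigma}_T}{T} \quad\text{and}\quad \widehat{\sigma}_T = \frac{T}{p}\sum_{j=[\sqrt{T}]}^{p}\widehat{\theta}_{j,p}^{\,2}. \tag{35}$$

Now, using this estimator we define the penalty term as

$$\widehat{P}_T(\lambda) = \frac{\widehat{\sigma}_T\,|\lambda|^2}{T} \quad\text{and}\quad |\lambda|^2 = \sum_{j=1}^{p}\lambda^2(j). \tag{36}$$

In the case when the variance $\sigma_Q$ is known we set

$$P_T(\lambda) = \frac{\sigma_Q\,|\lambda|^2}{T}. \tag{37}$$

Finally, we define the cost function as

$$J_T(\lambda) = \sum_{j=1}^{p}\lambda^2(j)\,\widehat{\theta}_{j,p}^{\,2} - 2\sum_{j=1}^{p}\lambda(j)\,\widetilde{\theta}_{j,p} + \delta\,\widehat{P}_T(\lambda), \tag{38}$$

where $\delta > 0$ is some threshold which will be specified later. Now we set the model selection procedure as

$$\widehat{S}_* = \widehat{S}_{\widehat{\lambda}} \quad\text{and}\quad \widehat{\lambda} = \mathrm{argmin}_{\lambda\in\Lambda}\,J_T(\lambda). \tag{39}$$

In the case when $\widehat{\lambda}$ is not unique we take any one of the minimizers.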
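The selection rule (35)–(39) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names are hypothetical, the family $\Lambda$ is passed as a list of arrays, and ties are broken by taking the first minimizer.

```python
import numpy as np

def sigma_hat(theta_hat, T):
    """Estimator (35): (T / p) * sum of theta_hat_j^2 over j = [sqrt(T)], ..., p."""
    p = len(theta_hat)
    j0 = max(int(np.sqrt(T)), 1)          # integer part of sqrt(T)
    return T / p * np.sum(theta_hat[j0 - 1:] ** 2)

def cost(lam, theta_hat, T, delta, sigma):
    """Cost J_T(lambda) from (38); sigma is sigma_Q or its estimate."""
    theta_tilde = theta_hat**2 - sigma / T        # estimators of the cross terms
    penalty = sigma * np.sum(lam**2) / T          # penalty (36) or (37)
    return (np.sum(lam**2 * theta_hat**2)
            - 2.0 * np.sum(lam * theta_tilde)
            + delta * penalty)

def select(Lambda, theta_hat, T, delta, sigma=None):
    """Index of a minimizer of J_T over the finite family Lambda, as in (39)."""
    if sigma is None:
        sigma = sigma_hat(theta_hat, T)
    costs = [cost(lam, theta_hat, T, delta, sigma) for lam in Lambda]
    return int(np.argmin(costs))                  # first minimizer if not unique

# Toy example (hypothetical numbers): one informative coefficient, known variance.
theta_hat = np.array([1.0, 0.0, 0.0, 0.0])
Lambda = [np.array([1.0, 0.0, 0.0, 0.0]), np.ones(4), np.zeros(4)]
best = select(Lambda, theta_hat, T=100, delta=0.1, sigma=1.0)
```

In this toy example the rule picks the weight vector that keeps only the informative coefficient, because the extra coordinates only add penalty and bias-correction terms to the cost.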

5 Main results

5.1 Oracle inequalities

Firstly, we obtain the non-asymptotic oracle inequality for the model selection procedure (39). To this end we need a condition on the observation frequency.

H₅) Assume that the frequency $p$ is a function of $T$, i.e. $p = p_T$, such that

$$\liminf_{T\to\infty}\frac{p_T}{T^{5/6}} > 0 \quad\text{and}\quad \limsup_{T\to\infty}\frac{p_T}{T} < \infty. \tag{40}$$


Theorem 1 Assume that the conditions C₁)–C₃) and H₁)–H₅) hold true. Then, there exists some constant $c^* > 0$ such that for any $T\ge 1$, any noise distribution $Q\in\mathcal{Q}_T$ and $0 < \delta\le 1/6$, the procedure (39) satisfies the following oracle inequality

$$\mathcal{R}_Q(\widehat{S}_*, S)\le\frac{1+3\delta}{1-3\delta}\,\min_{\lambda\in\Lambda}\mathcal{R}_Q(\widehat{S}_\lambda, S) + \frac{c^* m_*}{\delta T}\Big(1 + \sigma_Q^2 + \Lambda_*\,\mathbf{E}_Q|\widehat{\sigma}_T - \sigma_Q|\Big). \tag{41}$$

In the case when $\sigma_Q$ is known the inequality (41) has the following form.

Corollary 1 Assume that the conditions C₁)–C₃) and H₁)–H₅) hold true and that the proxy variance $\sigma_Q$ is known. Then there exists some constant $c^* > 0$ such that for any $T\ge 1$, any noise distribution $Q\in\mathcal{Q}_T$ and $0 < \delta\le 1/6$, the procedure (39) with $\widehat{\sigma}_T = \sigma_Q$ satisfies the following oracle inequality

$$\mathcal{R}_Q(\widehat{S}_*, S)\le\frac{1+3\delta}{1-3\delta}\,\min_{\lambda\in\Lambda}\mathcal{R}_Q(\widehat{S}_\lambda, S) + \frac{c^*(1+\sigma_Q^2)\,m_*}{\delta T}.$$

Now we study the estimator $\widehat{\sigma}_T$ defined in (35).

Proposition 1 Assume that the conditions C₁)–C₃) and H₁)–H₅) hold true for the model (3) and that $S(\cdot)$ is continuously differentiable. Then, there exists a constant $c^* > 0$ such that for any $T\ge 2$, $Q\in\mathcal{Q}_T$ and $p > \sqrt{T}$,

$$\mathbf{E}_{Q,S}|\widehat{\sigma}_T - \sigma_Q|\le c^*\,(1+\|\dot{S}\|^2)\,(1+\sigma_Q)^2\,g^*_{T,p}, \tag{42}$$

where $g^*_{T,p} = \sqrt{T}/p + 1/\sqrt{p}$.

Now Theorem 1 and this proposition imply directly the following result.

Theorem 2 Assume that the conditions C₁)–C₃) and H₁)–H₅) hold true. Then there exists some constant $c^* > 0$ such that for any continuously differentiable function $S$, any $T\ge 2$, any noise distribution $Q\in\mathcal{Q}_T$, $p > \sqrt{T}$ and $0 < \delta\le 1/6$,

$$\mathcal{R}_Q(\widehat{S}_*, S)\le\frac{1+3\delta}{1-3\delta}\,\min_{\lambda\in\Lambda}\mathcal{R}_Q(\widehat{S}_\lambda, S) + \frac{c^* m_*}{\delta T}\,(1+\sigma_Q)^2\,\big(1+\|\dot{S}\|^2\big)\big(1+\Lambda_*\,g^*_{T,p}\big).$$

To study robust properties of the procedure (39) we need a condition on the weights.

H₆) The parameters $m_*$ and $\Lambda_*$ defined in (32) can be functions of $T$, i.e. $m_* = m_*(T)$ and $\Lambda_* = \Lambda_*(T)$, such that for any $b > 0$

$$\lim_{T\to\infty} T^{-b}\,m_*(T) = 0 \quad\text{and}\quad \lim_{T\to\infty} T^{-1/3-b}\,\Lambda_*(T) = 0.$$

Now, Theorem 2 implies the following oracle inequality.


Theorem 3 Assume that the function $S$ is continuously differentiable and that the conditions C₁)–C₃) and H₁)–H₆) hold true. Then, for any $T\ge 2$, $p > \sqrt{T}$ and $0 < \delta < 1/6$, the procedure (39) satisfies the following oracle inequality

$$\mathcal{R}^*_T(\widehat{S}_*, S)\le\frac{1+3\delta}{1-3\delta}\,\min_{\lambda\in\Lambda}\mathcal{R}^*_T(\widehat{S}_\lambda, S) + \frac{U^*_T}{T\delta},$$

where the term $U^*_T > 0$ is such that for any $r > 0$ and $b > 0$,

$$\lim_{T\to\infty}\,\sup_{\|\dot{S}\|\le r} T^{-b}\,U^*_T = 0. \tag{43}$$

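In order to obtain the efficiency property, we specify the weight coefficients in the procedure (39). Consider, for some fixed $0 < \varepsilon < 1$, a numerical grid of the form

$$\mathcal{A} = \{1,\dots,k^*\}\times\{\varepsilon,\dots,m\varepsilon\}, \quad m = [1/\varepsilon^2], \tag{44}$$

where $k^*\ge 1$ and $\varepsilon$ are functions of $T$, i.e. $k^* = k^*(T)$ and $\varepsilon = \varepsilon(T)$, such that

$$\lim_{T\to\infty} k^*(T) = +\infty, \quad \lim_{T\to\infty}\frac{k^*(T)}{\ln T} = 0, \quad \lim_{T\to\infty}\varepsilon(T) = 0 \quad\text{and}\quad \lim_{T\to\infty} T^{b}\,\varepsilon(T) = +\infty \tag{45}$$

for any $b > 0$. One can take, for example, for $T\ge 2$

$$\varepsilon(T) = \frac{1}{\ln T} \quad\text{and}\quad k^*(T) = k^*_0 + \sqrt{\ln T},$$

where $k^*_0\ge 0$ is a fixed constant. For each $\alpha = (k,r)\in\mathcal{A}$, we set the vector $\lambda_\alpha = (\lambda_\alpha(j))_{1\le j\le p}$ through its components, which are defined as

$$\lambda_\alpha(j) = \mathbf{1}_{\{1\le j < \ln T\}} + \big(1 - (j/\omega_\alpha)^{k}\big)\,\mathbf{1}_{\{\ln T\le j\le\omega_\alpha\}},$$

where

$$\omega_\alpha = \Big(\frac{(k+1)(2k+1)}{\pi^{2k}\,k}\,r\,\upsilon_T\Big)^{1/(2k+1)}, \quad \upsilon_T = T/\varsigma^*$$

and $\varsigma^*$ is introduced in (15). Now we define the set $\Lambda$ as

$$\Lambda = \{\lambda_\alpha,\ \alpha\in\mathcal{A}\}. \tag{46}$$

These weight coefficients are used in Konev and Pergamenshchikov (2009a) and Konev and Pergamenshchikov (2012, 2015) for continuous time regression models to show the asymptotic efficiency. Note also that in this case the cardinality of the set $\Lambda$ is $m_* = k^* m$. Moreover, taking into account that for $k\ge 1$ the coefficient $\omega_\alpha < (r\upsilon_T)^{1/(2k+1)}$, we obtain that the norm of the set $\Lambda$ defined in (32) can be bounded as $\Lambda_*\le\sup_{\alpha\in\mathcal{A}}\omega_\alpha\le(\upsilon_T/\varepsilon)^{1/3}$. Therefore, the properties (45) imply the condition H₆).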
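The family (46) is straightforward to generate. The sketch below uses the example choices $\varepsilon(T) = 1/\ln T$ and $k^*(T) = k^*_0 + \sqrt{\ln T}$ from the text; taking the integer part of $k^*(T)$, passing the bound $\varsigma^*$ as the argument `varsigma_upper`, and the function names are assumptions of this illustration.

```python
import math
import numpy as np

def omega(k, r, upsilon):
    """omega_alpha = ((k+1)(2k+1) / (pi^{2k} k) * r * upsilon_T)^{1/(2k+1)}."""
    c = (k + 1) * (2 * k + 1) / (math.pi ** (2 * k) * k)
    return (c * r * upsilon) ** (1.0 / (2 * k + 1))

def weight_vector(k, r, T, p, varsigma_upper=1.0):
    """Weights lambda_alpha(j), j = 1..p, for alpha = (k, r)."""
    w = omega(k, r, T / varsigma_upper)          # upsilon_T = T / varsigma^*
    j = np.arange(1, p + 1)
    lam = np.where(j < math.log(T), 1.0, 0.0)    # indicator of 1 <= j < ln T
    mid = (j >= math.log(T)) & (j <= w)          # indicator of ln T <= j <= omega
    lam[mid] = 1.0 - (j[mid] / w) ** k
    return lam

def weight_family(T, p, k0=0.0, varsigma_upper=1.0):
    """All lambda_alpha for alpha in {1..k*} x {eps, ..., m*eps}, as in (44), (46)."""
    eps = 1.0 / math.log(T)
    k_star = int(k0 + math.sqrt(math.log(T)))    # integer part: an assumption
    m = int(1.0 / eps**2)
    return [weight_vector(k, i * eps, T, p, varsigma_upper)
            for k in range(1, k_star + 1) for i in range(1, m + 1)]

family = weight_family(T=100, p=21)
```

For $T = 100$ this gives $k^* = 2$ and $m = 21$, hence $m_* = k^* m = 42$ weight vectors, each with components in $[0,1]$.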


5.2 Robust asymptotic efficiency

Now we study the asymptotic efficiency properties for the procedure (39), (46) with respect to the robust risks (5) defined by the distribution family (15)–(16). To this end, we assume that the unknown function $S$ in the model (3) belongs to the Sobolev ball

$$W_{r,k} = \Big\{ f\in C^{k}_{per}[0,1] \,:\, \sum_{j=0}^{k}\|f^{(j)}\|^2\le r \Big\}, \tag{47}$$

where $r > 0$, $k\ge 1$ are some unknown parameters and $C^{k}_{per}[0,1]$ is the set of $k$ times continuously differentiable functions $f:[0,1]\to\mathbb{R}$ such that $f^{(i)}(0) = f^{(i)}(1)$ for all $0\le i\le k$. Note that the class (47) is an ellipsoid, i.e.

$$W_{r,k} = \Big\{ f = \sum_{j\ge 1}\theta_j\phi_j \,:\, \sum_{j=1}^{\infty} a_j\theta_j^2\le r \Big\}, \tag{48}$$

where $a_j = \sum_{i=0}^{k}(2\pi[j/2])^{2i}$. Similarly to Barbu, Beltaief and Pergamenshchikov (2019a) we will show here that the asymptotic sharp lower bound for the normalized robust risk (5) is given by the well-known Pinsker constant defined as

$$l_* = l_*(r) = \big((2k+1)\,r\big)^{1/(2k+1)}\Big(\frac{k}{(k+1)\pi}\Big)^{2k/(2k+1)}. \tag{49}$$
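For orientation, the Pinsker constant (49) is easy to evaluate for given $r$ and $k$. The function name below is an assumption of this sketch.

```python
import math

def pinsker_constant(r, k):
    """l_*(r) = ((2k+1) r)^{1/(2k+1)} * (k / ((k+1) pi))^{2k/(2k+1)}, as in (49)."""
    return ((2 * k + 1) * r) ** (1.0 / (2 * k + 1)) \
        * (k / ((k + 1) * math.pi)) ** (2.0 * k / (2 * k + 1))

# For k = 1 the constant satisfies l_*^3 = 3 r / (2 pi)^2.
l_star = pinsker_constant(1.0, 1)
```

The constant grows with the radius $r$ like $r^{1/(2k+1)}$, so larger Sobolev balls lead to a larger asymptotic minimax risk.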

To study efficiency properties we need to use the set $\Xi_T$ of all possible estimators $\widehat{S}_T$ measurable with respect to the sigma-algebra $\sigma\{y_t,\,0\le t\le T\}$.

Theorem 4 For the risk (5) with the coefficient rate $\upsilon_T = T/\varsigma^*$

$$\liminf_{T\to\infty}\,\upsilon_T^{2k/(2k+1)}\inf_{\widehat{S}_T\in\Xi_T}\,\sup_{S\in W_{r,k}}\mathcal{R}^*_T(\widehat{S}_T, S)\ge l_*. \tag{50}$$

Note that, if the radius $r$ and the regularity $k$ are known, i.e. for the non-adaptive estimation problem on the continuous observations $(y_t)_{0\le t\le T}$, in Barbu, Beltaief and Pergamenshchikov (2019a) it is proposed to use the estimate $\widehat{S}_{\lambda_0}$ defined in (31) with the weights (46)

$$\lambda_0 = \lambda_{\alpha_0}, \quad \alpha_0 = (k, r_0) \quad\text{and}\quad r_0 = [r/\varepsilon]\,\varepsilon. \tag{51}$$

Now, we show the same result for the discrete observations (2).

Proposition 2 Assume that the conditions C₁)–C₂) and H₁)–H₅) hold true. Then

$$\lim_{T\to\infty}\upsilon_T^{2k/(2k+1)}\sup_{S\in W_{r,k}}\mathcal{R}^*_T(\widehat{S}_{\lambda_0}, S)\le l_*. \tag{52}$$


For the adaptive estimation we use the model selection procedure (39) with the parameter $\delta$ defined as a function of $T$, i.e. $\delta = \delta_T$, such that

$$\lim_{T\to\infty}\delta_T = 0 \quad\text{and}\quad \lim_{T\to\infty} T^{b}\,\delta_T = +\infty \tag{53}$$

for any $b > 0$. For example, we can take $\delta_T = (6 + \ln T)^{-1}$.

Theorem 5 Assume that the conditions C₁)–C₃) and H₁)–H₆) hold true. Then the robust risk (5) for the procedure (39) with the coefficients (46) and the parameter $\delta = \delta_T$ satisfying (53) has the following upper bound

$$\limsup_{T\to\infty}\,\upsilon_T^{2k/(2k+1)}\sup_{S\in W_{r,k}}\mathcal{R}^*_T(\widehat{S}_*, S)\le l_*.$$

Theorem 4 and Theorem 5 imply the following result.

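Theorem 6 Assume that the conditions of Theorem 5 hold true. Then the procedure (39) with the weight coefficients (46) and the parameter $\delta = \delta_T$ satisfying (53) is asymptotically efficient, i.e.

$$\lim_{T\to\infty}\frac{\inf_{\widehat{S}_T\in\Xi_T}\sup_{S\in W_{r,k}}\mathcal{R}^*_T(\widehat{S}_T, S)}{\sup_{S\in W_{r,k}}\mathcal{R}^*_T(\widehat{S}_*, S)} = 1$$

and

$$\lim_{T\to\infty}\upsilon_T^{2k/(2k+1)}\sup_{S\in W_{r,k}}\mathcal{R}^*_T(\widehat{S}_*, S) = l_*.$$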

Remark 5 It is well known that the optimal (minimax) risk convergence rate for the Sobolev ball $W_{r,k}$ is $T^{2k/(2k+1)}$ (see, for example, Pinsker (1981); Konev and Pergamenshchikov (2009b)). We see here that the efficient robust rate is $\upsilon_T^{2k/(2k+1)}$, i.e. if the distribution upper bound $\varsigma^*\to 0$ as $T\to\infty$ we obtain a faster rate with respect to $T^{2k/(2k+1)}$, and if $\varsigma^*\to\infty$ as $T\to\infty$ we obtain a slower rate. In the case when $\varsigma^*$ is constant the robust rate is the same as the classical non-robust convergence rate.

5.3 Big data analysis for the model (1)

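Now we consider the estimation problem for the parameters $(\beta_j)_{1\le j\le q}$ in (1) with unknown $q$. In this case we have to estimate the sequence $\beta = (\beta_j)_{j\ge 1}$ in which $\beta_j = 0$ for $j\ge q+1$. To this end we assume that the functions $(u_j)_{j\ge 1}$ are orthonormal in $L_2[0,1]$, i.e. $(u_i,u_j) = \mathbf{1}_{\{i=j\}}$. Indeed, we can always use the Gram-Schmidt orthogonalization procedure to provide this property. Thus, in this case we estimate the parameters $\beta = (\beta_j)_{j\ge 1}$ through the estimator (39) as $\widehat{\beta}_* = (\widehat{\beta}_{*,j})_{j\ge 1}$ and $\widehat{\beta}_{*,j} = (u_j,\widehat{S}_*)$. Similarly, using the weighted estimators (31) we define the basic estimators $(\widehat{\beta}_\lambda)_{\lambda\in\Lambda}$ as $\widehat{\beta}_\lambda = (\widehat{\beta}_{j,\lambda})_{j\ge 1}$ and $\widehat{\beta}_{j,\lambda} = (u_j,\widehat{S}_\lambda)$. Taking into account that in this case

$$|\widehat{\beta}_* - \beta|^2 = \sum_{j=1}^{\infty}(\widehat{\beta}_{*,j} - \beta_j)^2 = \|\widehat{S}_* - S\|^2 \quad\text{and}\quad |\widehat{\beta}_\lambda - \beta|^2 = \|\widehat{S}_\lambda - S\|^2,$$

Theorem 3 implies the following oracle inequality.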


Theorem 7 Assume that the function (3) is continuously differentiable and the conditions C₁)–C₃), H₁)–H₅) and (15)–(16) hold true. Then, for any $n\ge 1$ and $0 < \delta < 1/6$, the following oracle inequality holds true

$$\sup_{Q\in\mathcal{Q}_T}\mathbf{E}_{Q,S}|\widehat{\beta}_* - \beta|^2\le\frac{1+3\delta}{1-3\delta}\,\min_{\lambda\in\Lambda}\,\sup_{Q\in\mathcal{Q}_T}\mathbf{E}_{Q,S}|\widehat{\beta}_\lambda - \beta|^2 + \frac{U^*_T}{T\delta},$$

where the term $U^*_T > 0$ satisfies the property (43).

Moreover, Theorem 6 implies the following efficiency property.

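Theorem 8 Assume that the conditions of Theorem 6 hold true. Then the estimator $\widehat{\beta}_*$ constructed through the procedure (39) with the weight coefficients (46) and the parameter $\delta = \delta_T$ satisfying (53) is asymptotically efficient in the minimax sense, i.e.

$$\lim_{T\to\infty}\frac{\inf_{\widehat{\beta}_T}\sup_{S\in W_{r,k}}\sup_{Q\in\mathcal{Q}_T}\mathbf{E}_{Q,S}|\widehat{\beta}_T - \beta|^2}{\sup_{S\in W_{r,k}}\sup_{Q\in\mathcal{Q}_T}\mathbf{E}_{Q,S}|\widehat{\beta}_* - \beta|^2} = 1 \tag{54}$$

and

$$\lim_{T\to\infty}\upsilon_T^{2k/(2k+1)}\sup_{S\in W_{r,k}}\,\sup_{Q\in\mathcal{Q}_T}\mathbf{E}_{Q,S}|\widehat{\beta}_* - \beta|^2 = l_*,$$

where the infimum is taken over all possible estimators $\widehat{\beta}_T$ measurable with respect to the $\sigma$-field $\sigma\{y_t,\,0\le t\le T\}$ and the lower bound $l_*$ is defined in (49).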

Remark 6 It should be emphasized that the efficiency properties (54) are obtained without sparsity conditions on the number of non-zero parameters $\beta_j$ in the model (1) (see, for example, Hastie, Friedman and Tibshirani (2008)). Moreover, we do not even use the parameter dimension $q$, which can be equal to $+\infty$.

6 Stochastic calculus for generalized semi-Markov processes

In this section we study some properties of the stochastic integrals (22). First, note that using the conditions C₁) and C₂) and the stochastic calculus developed in Barbu, Beltaief and Pergamenshchikov (2019a) for semi-Markov processes we can show the following Lemmas 1 and 2.

Lemma 1 Assume that the conditions C₁)–C₂) and H₁)–H₄) hold true. Then, for any non-random functions $f$ and $h$ from $L_2[0,T]$

$$\mathbf{E}_Q\,I_t(f)I_t(h) = (\varrho_1^2 + \varrho_2^2)\,(f,h)_t + \varrho_3^2\,(f,h\rho)_t, \tag{55}$$

where $(f,h)_t = \int_0^t f(s)h(s)\,ds$ and $\rho$ is the density of the renewal measure (11).


It should be noted that this lemma implies directly that the stochastic integral (22) satisfies the properties (23).

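Lemma 2 Assume that the conditions C₁)–C₂) and H₁)–H₄) hold true. Then, for any bounded $[0,\infty)\to\mathbb{R}$ functions $f$ and $h$ and for any $k\ge 1$,

$$\mathbf{E}_Q\big(I_{t_k-}(f)\,I_{t_k-}(h)\,|\,\mathcal{G}\big) = (\varrho_1^2 + \varrho_2^2)\,(f,h)_{t_k} + \varrho_3^2\sum_{l=1}^{k-1} f(t_l)h(t_l),$$

where $t_k = \sum_{j=1}^{k}\tau_j$ and $\mathcal{G} = \sigma\{t_l,\ l\ge 1\}$.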

Lemma 3 Assume that the conditions C₁)–C₃) and H₁)–H₄) hold true. Then, for any non-random bounded $[0,T]\to\mathbb{R}$ functions $f$ and $h$,

$$\mathbf{E}_Q\int_0^T I_{t-}^2(f)\,I_{t-}(h)\,h(t)\,d\xi_t = 0.$$

Proof. Setting $\check{L}_t = \varrho_1 w_t + \varrho_2 L_t$, we can represent the integral (22) as

$$I_t(f) = \check{I}_t(f) + \varrho_3\,I_t^z(f), \tag{56}$$

where $\check{I}_t(f) = \int_0^t f(u)\,d\check{L}_u$ and $I_t^z(f) = \int_0^t f(u)\,dz_u$. Note here that, using the condition (7) and the inequality for martingales from Novikov (1975), we obtain that $\mathbf{E}_Q\sup_{0\le t\le T}\check{I}_t^{\,8}(f) < \infty$. Since $\check{L}_t$ and $z_t$ are independent, we get $\mathbf{E}_Q\int_0^T I_{t-}^2(f)\,I_{t-}(h)\,h(t)\,d\check{L}_t = 0$. Moreover, the conditions C₁)–C₃) yield that, for any non-random $(c_{i,j})$ and $k\ge 1$,

$$\mathbf{E}\Big(\sum_{j=1}^{k-1} c_{1,j}\zeta_j\Big)^{2}\zeta_k = 0 \quad\text{and}\quad \mathbf{E}\Big(\sum_{j=1}^{k-1} c_{1,j}\zeta_j\Big)^{2}\Big(\sum_{j=1}^{k-1} c_{2,j}\zeta_j\Big)\zeta_k = 0.$$

Therefore, taking into account that the sequence $(\zeta_k)_{k\ge 1}$ does not depend on the moments $(t_k)_{k\ge 1}$ and the process $(\check{L}_t)_{t\ge 0}$, and using the same method as in the proof of Lemma 8.4 from Barbu, Beltaief and Pergamenshchikov (2019a), we obtain

$$\mathbf{E}_Q\int_0^T I_{t-}^2(f)\,I_{t-}(h)\,h(t)\,dz_t = \mathbf{E}_Q\sum_{k\ge 1}\mathbf{1}_{\{t_k\le T\}}\,I_{t_k-}^2(f)\,I_{t_k-}(h)\,h(t_k)\,\zeta_k = 0.$$

This implies Lemma 3.

Now we study the integrals defined in (64) as functions of $f$.

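Proposition 3 Assume that the conditions C₁)–C₃) and H₁)–H₄) hold true. Then, for any $[0,\infty)\to\mathbb{R}$ functions $f$, $h$ such that $|f|_*\le 1$ and $|h|_*\le 1$, one has

$$\big|\mathbf{E}_Q\,\widetilde{I}_T(f)\,\widetilde{I}_T(h)\big|\le 12\,\sigma_Q^2\,(1+\tau)^2\big((f,h)_T^2 + T\,\widetilde{c}\,\big), \tag{57}$$

where $\widetilde{c} = (2 + \Pi(x^4) + 2|\rho|_*)(1 + \|\Upsilon\|_1^2)$ and $|f|_* = \sup_{t\ge 0}|f(t)|$.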


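Proof. First of all, note that in view of the Ito formula and using the fact that for the process (6) the jumps satisfy $\Delta z_s\,\Delta L_s = 0$ a.s. for any $s\ge 0$, we obtain that

$$dI_t^2(f) = 2I_{t-}(f)\,dI_t(f) + \varrho_1^2 f^2(t)\,dt + \varrho_2^2\,d\sum_{0\le s\le t} f^2(s)(\Delta L_s)^2 + \varrho_3^2\,d\sum_{0\le s\le t} f^2(s)(\Delta z_s)^2.$$

Note also that Lemma 1 yields $\mathbf{E}_Q\,I_t^2(f) = (\varrho_1^2 + \varrho_2^2)\|f\|_t^2 + \varrho_3^2\|f\sqrt{\rho}\|_t^2$ with $\|f\|_t^2 = \int_0^t f^2(s)\,ds$. Therefore,

$$d\widetilde{I}_t(f) = 2I_{t-}(f)f(t)\,d\xi_t + f^2(t)\,d\widetilde{m}_t, \quad \widetilde{m}_t = \varrho_2^2\,\check{m}_t + \varrho_3^2\,m_t,$$

where $\check{m}_t = \sum_{0\le s\le t}(\Delta L_s)^2 - t$ and $m_t = \sum_{0\le s\le t}(\Delta z_s)^2 - \int_0^t\rho(s)\,ds$. Thus,

$$\mathbf{E}_Q\,\widetilde{I}_T(f)\widetilde{I}_T(h) = \mathbf{E}_Q\int_0^T\widetilde{I}_{t-}(f)\,d\widetilde{I}_t(h) + \mathbf{E}_Q\int_0^T\widetilde{I}_{t-}(h)\,d\widetilde{I}_t(f) + \mathbf{E}_Q\,[\widetilde{I}(f),\widetilde{I}(h)]_T.$$

Using here Lemma 3 and taking into account that $(\check{m}_t)_{t\ge 0}$ is a square integrable martingale, we get

$$\mathbf{E}_Q\int_0^T\widetilde{I}_{t-}(f)\,d\widetilde{I}_t(h) = \mathbf{E}_Q\int_0^T\widetilde{I}_{t-}(f)\,h^2(t)\,d\widetilde{m}_t = \varrho_3^2\,\mathbf{E}_Q\int_0^T I_{t-}^2(f)\,h^2(t)\,dm_t.$$

The last integral can be represented as

$$\mathbf{E}_Q\int_0^T I_{t-}^2(f)\,h^2(t)\,dm_t = J_1 - J_2, \tag{58}$$

where $J_1 = \mathbf{E}_Q\sum_{k\ge 1} I_{t_k-}^2(f)\,h^2(t_k)\,\mathbf{1}_{\{t_k\le T\}}$ and $J_2 = \int_0^T\mathbf{E}_Q\,I_t^2(f)\,h^2(t)\rho(t)\,dt$. By Lemma 2 we get

$$J_1 = \mathbf{E}_Q\sum_{k\ge 1}\mathbf{E}_Q\big(I_{t_k-}^2(f)\,|\,\mathcal{G}\big)\,h^2(t_k)\,\mathbf{1}_{\{t_k\le T\}} = (\varrho_1^2 + \varrho_2^2)\,J_{1,1} + \varrho_3^2\,J_{1,2},$$

where

$$J_{1,1} = \mathbf{E}_Q\sum_{k\ge 1}\|f\|_{t_k}^2\,h^2(t_k)\,\mathbf{1}_{\{t_k\le T\}} = \int_0^T\|f\|_t^2\,h^2(t)\rho(t)\,dt$$

and

$$J_{1,2} = \mathbf{E}_Q\sum_{k\ge 1}\sum_{l=1}^{k-1} f^2(t_l)\,h^2(t_k)\,\mathbf{1}_{\{t_k\le T\}} = \mathbf{E}_Q\sum_{l\ge 1} f^2(t_l)\sum_{k\ge l+1} h^2(t_k)\,\mathbf{1}_{\{t_k\le T\}} = \int_0^T f^2(x)\Big(\int_0^{T-x} h^2(x+t)\rho(t)\,dt\Big)\rho(x)\,dx.$$

Moreover, using Lemma 1 for the last term in (58), we obtain that

$$J_2 = (\varrho_1^2 + \varrho_2^2)\int_0^T\|f\|_t^2\,h^2(t)\rho(t)\,dt + \varrho_3^2\int_0^T\|f\sqrt{\rho}\|_t^2\,h^2(t)\rho(t)\,dt$$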