ANNALES DE L'I. H. P., SECTION B

E. EWEDA, O. MACCHI

Quadratic mean and almost-sure convergence of unbounded stochastic approximation algorithms with correlated observations

Annales de l'I. H. P., section B, tome 19, n° 3 (1983), p. 235-255
<http://www.numdam.org/item?id=AIHPB_1983__19_3_235_0>
Quadratic mean and almost-sure convergence of unbounded stochastic approximation algorithms with correlated observations

E. EWEDA (*) and O. MACCHI (**)

Ann. Inst. Henri Poincaré, Vol. XIX, n° 3, 1983, p. 235-255. Calcul des Probabilités et Statistique.
RÉSUMÉ. - In this work we prove the almost-sure and quadratic-mean convergence of a stochastic gradient algorithm with decreasing step sizes governing an adaptive linear estimator. We use no unrealistic hypothesis such as the independence of successive observations or the boundedness of the algorithm, as in the earlier literature. For correlated observations we use a finite-memory model, which covers a wide class of applications, since observations sufficiently separated in time can generally be assumed independent. The finite-memory model has the advantage, compared with other ergodic models such as models with decreasing covariance, of allowing a relatively simple analysis of the convergence of the gradient algorithm. Moreover, the model makes it possible to prove quadratic-mean convergence, which had not previously been done for any ergodic model.
ABSTRACT. - In this paper we prove the almost-sure (a. s.) and quadratic mean (q. m.) convergence of a stochastic gradient algorithm with decreasing step size governing an adaptive linear estimator. We do not use unrealistic assumptions such as the independence of successive observations or the boundedness of the algorithm, as has previously been done in the literature. The model we use for the correlated observations is a finite memory model. This model agrees with a wide class of applications due to the fact that sufficiently time-separated observations can usually be assumed independent. The finite memory model has the advantage, with respect to other ergodic models such as models with decreasing covariance, that it allows a relatively simple convergence analysis of the gradient algorithm. Moreover, it permits the proof of the q. m. convergence, which has not yet been attained with any kind of ergodic model.

(*) Military Technical College, Cairo, Egypt. Mailing address: 12 A Alexandre Zaki Street, Helmiat el-Zaitoun, Cairo, Egypt.
(**) C. N. R. S.-E. S. E., Plateau du Moulon, 91190 Gif-sur-Yvette, France.

I. INTRODUCTION
In this paper we prove both the almost-sure (a. s.) and quadratic mean (q. m.) convergence of the stochastic gradient algorithm (1) to the optimum vector h* given by (2), where X^T and X* denote respectively the transpose and complex conjugate of the vector X. The step size of the algorithm is a decreasing sequence of positive numbers satisfying (3). The algorithm (1) is intended to calculate iteratively the vector h* that minimizes the quadratic mean error E(|a_j - h^T X_j|²) between the message a_j and its linear estimate based on the observation vector X_j. It is assumed that the sequence (a_j, X_j); j = 1, 2, ... is stationary and that the covariance matrix R in (2) is invertible.
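The displayed equations (1)-(3) did not survive digitization. For orientation only, the following minimal sketch (in Python) shows the kind of adaptive linear estimator under discussion, assuming the standard LMS-type update h_{k+1} = h_k + μ_k (a_k - h_k^T X_k) X_k with a hypothetical decreasing step size μ_k = μ_1/k and a hypothetical data model; the authoritative forms of (1), (2) and (3) are those of the original printing.

import numpy as np

# Hedged sketch: the update below is the standard LMS-type stochastic gradient
#   h_{k+1} = h_k + mu_k * (a_k - h_k^T X_k) * X_k,   mu_k = mu1 / k,
# assumed here only to illustrate an algorithm of the kind of (1); the data
# model (sliding window over a noisy filtered signal) is likewise hypothetical.
rng = np.random.default_rng(0)
N = 4                                         # dimension of the observation vector X_k
h_star = np.array([1.0, -0.5, 0.25, 0.1])     # hypothetical optimum vector h*

def run_lms(n_iter=20000, mu1=0.5):
    h = np.zeros(N)                           # initial estimate h_1
    s = rng.standard_normal(n_iter + N)       # sampled analog-like signal
    for k in range(1, n_iter + 1):
        X = s[k:k + N]                        # successive X_k share N - 1 components
        a = h_star @ X + 0.01 * rng.standard_normal()   # message correlated with X_k
        mu = mu1 / k                          # decreasing step size
        h = h + mu * (a - h @ X) * X          # stochastic gradient correction
    return h

print("final estimate:", run_lms())
print("optimum h*    :", h_star)

Running this sketch shows the estimate approaching h*; the rigorous justification of that behaviour, under correlated observations and without a reflecting barrier, is the object of the theorem proved below.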
The algorithm (1) has been used for many years in adaptive signal processing. The experimental and simulation results have shown the convergence of h_j to h*. However, there is not yet a satisfactory mathematical proof of that convergence under completely realistic assumptions. In earlier analyses of similar algorithms, e.g. in [1], one finds two major assumptions, usually not fulfilled in the applications encountered in the field of signal processing. The first one is the independence of successive pairs (a_j, X_j). Now in signal processing, X_j is often made of successive time samples of the same analog signal, in such a way that successive observation vectors share N - 1 components and are, therefore, strongly correlated. The second unrealistic assumption often used in the literature is that the vector h_j is bounded, which implies the use of a reflecting barrier that brings h_j back into a compact set every time it leaves that set. Now applications of the algorithm have shown that such a barrier is not necessary for the convergence. Afterwards, the boundedness assumption has been omitted in [2-6]. However, the independence assumption remains in those papers. In a recent work [7] Kushner and Clark have overcome the independence problem and proved a. s. convergence of algorithm (1) with correlated observations. Other papers have dealt with the correlated case, e.g. Ljung [8]. However, the boundedness assumption reappears in these papers. Namely, it is assumed that h_j will return indefinitely within a (random) compact set. Moreover the proofs are extremely difficult to follow.

In the present paper, we overcome both barrier and independence problems, and for correlated observations we prove the a. s. and q. m. convergence of h_j to h*. The assumption we use to attain this result is that the pair (a_j, X_j) has a finite memory and finite moments. The finite memory assumption is plausible in a wide class of applications since sufficiently time-separated observations can usually be assumed independent. Also the finite moments assumption is satisfied in a wide class of applications. Two practical examples in which the finite memory and finite moments assumptions are both satisfied are evoked in section III of this paper. It should be mentioned that we do not assume a linear filtering relationship between a_j and X_j. On the contrary, when applying Ljung's theorem [8] to the algorithm (1), one is obliged to assume that a_j and X_j are related by linear filtering, up to some independent additive noise.
This is due to the fact that Ljung uses a state space model in his theorem. In the present work we do not use a state space model and the linear filtering relation is not needed; a_k and X_k can be related through any specific non-linearity.

The restrictive independence and boundedness assumptions have also been recently suppressed by Farden [9] independently and at the same time as by our own previous works [10-12]. In both works [9] and [10], the convergence proof relies upon an assumption of rational decay for the covariance of elements such as a_k X_k and other second-order observed variables. Admittedly, the theorem presented in this paper is on one hand more restrictive than those of [9] and [10], because it uses a less general model for the observations (i.e. finite memory). However, it is on the other hand more powerful because it reaches the result of quadratic mean convergence in addition to the almost-sure convergence proved in [9], [10]. Actually, the finite memory model derives its interest from the fact that it is a model with dependence for which we prove the q. m. convergence in addition to the a. s. convergence, the former being itself based upon simple moment bounds. Similar bounds could probably be derived for the decaying covariance model, but at the expense of much more technical labor.

The convergence analysis presented in this paper concerns the real version of the algorithm (1), i.e., we assume that a_k, X_k and h_k are real-valued. However, without any additional difficulty, it can be extended to the complex case. An example of the latter case in the context of data transmission is the quadrature amplitude modulation, in which the data signal a_k modulates the amplitude and phase of a carrier wave that is demodulated at the receiver by two carriers in quadrature to give the observation signal X_k. In the following we assume implicitly that the sequence (a_k, X_k) is strictly stationary and that its correlation matrix R is invertible.

II. THEOREM
THEOREM. - If there exist a finite integer M' ≥ 1 and a finite positive M such that the assumptions (A.1)-(A.3) hold, in particular such that observations separated in time by at least M' are statistically independent, then the vector h_k defined by the algorithm (1) tends to h* in both the quadratic mean and almost-sure senses as k tends to infinity.
The validity of the assumptions (A.1)-(A.3) in applications is emphasized in section III. The assumption (A.1) states that the sequence (a_k, X_k) has a finite strong memory M'. As already mentioned, this assumption is a realistic one since sufficiently time-separated observations can usually, in practice, be assumed independent. On the other hand, from a theoretical viewpoint this assumption is very convenient because it allows the computation of moments of h_k, due to factorization properties. This leads naturally to a q. m. type convergence. Such a computation of moments is not easy in a decaying covariance model, and therefore with the latter model the ergodicity will rather be used to deal with individual samples and prove a. s. convergence.
The boundedness assumption (A.2) for the moments of the observations is satisfied in a wide class of practical applications. Notice that in signal processing applications, where X_k represents a sliding time-window of width N from a sampled signal, the memory M' is at least equal to N. Hence in assumptions (A.2), (A.3), the integer M can be set at the value M' of the memory. Therefore we see that the greater the memory, the larger the order of the |X_k|-moments that are assumed bounded. Such a relationship is not surprising. Neither is it isolated in the literature; e.g. in condition B.6 of [9] it appears that the required order of bounded moments increases as the rate of decay of the covariance decreases, i.e., as the memory increases. The order of the moments of a_k that are assumed bounded is greater than (but very near to) 4 and less than 5. The reason why the orders of the moments assumed bounded are respectively 24 M for |X_k| and slightly more than 4 for a_k is a technical one. It is made clear in the course of the proof that follows.

III. PRACTICAL APPLICATIONS
In this section we give two important practical applications of the theorem in the field of data transmission. In the first example the algorithm (1) is the adaptation algorithm of a transversal equalizer. In such a case a_k is the data transmitted at time k, the observation vector X_k is made of N successive time samples of the channel output, the vector h_k is composed of the equalizer tap coefficients at time k, and h* is the optimum equalization vector. Thus two successive observation vectors share N - 1 components and are, therefore, strongly correlated. Now, a_k is bounded because it is a digital datum; thus (A.3) is satisfied. The observation vector X_k results from filtering the sequence of data by a stable filter F, corresponding to the filtering effect of the transmission channel, to which is added a Gaussian noise. Consequently, all moments of X_k are bounded; thus the assumption (A.2) of the theorem is satisfied. The data sequence is usually independent, and so also is the noise sequence. Hence, the memory M' in the assumption (A.1) depends on the memory of F, which can be assumed finite without loss of generality in practical applications. Consequently, the theorem presented in this paper can be used to prove the a. s. and q. m. convergence of the classical adaptive equalizer to the optimum one, a proof which seems never to have been completed before.
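As an illustration of the sliding-window structure invoked in this example (successive observation vectors sharing N - 1 components), here is a short hedged sketch in Python; the channel filter, noise level and data alphabet are hypothetical choices, not taken from the paper.

import numpy as np

# Hypothetical illustration of the equalizer example: X_k is a sliding
# time-window of width N over the channel output, so X_k and X_{k+1}
# share N - 1 components and are strongly correlated.
rng = np.random.default_rng(1)
N = 5                                      # equalizer length (window width)
channel = np.array([0.8, 0.4, -0.2])       # hypothetical stable channel filter F
data = rng.choice([-1.0, 1.0], size=200)   # bounded digital data a_k
output = np.convolve(data, channel)[:200] + 0.05 * rng.standard_normal(200)

def observation(k):
    """Observation vector X_k: N successive samples of the channel output."""
    return output[k:k + N]

X10, X11 = observation(10), observation(11)
print("X_10         :", X10)
print("X_11         :", X11)
print("N - 1 shared :", np.allclose(X10[1:], X11[:N - 1]))   # prints True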
In the second practical example, the algorithm (1) is the adaptation algorithm of an echo canceller in full-duplex data transmission over two-wire telephone channels. In such a case a_k is a noisy copy of the echo at time k. Thus, apart from the additive (Gaussian) noise, a_k results from the transmitted digital data X_k by a stable filtering F' corresponding to the reflection properties of the channel. Hence both a_k and X_k are bounded, and thus the assumptions (A.2), (A.3) are satisfied. Again the memory M' in (A.1) depends on the memory of F', which can be assumed finite without loss of generality in practical applications.
IV. PROOF OF THE THEOREM

IV.1. Notations

The following notations are used in this work. The norm of a matrix U is denoted by ||U||. In (4), v_k is the shift between h_k, given by the algorithm (1), and the optimum vector h*. To prove the convergence of h_k to h*, we shall prove the convergence of v_k to the zero vector. Equations (2) and (5) imply that z_k is zero mean. From (1), (4) and (5) the recursion (9) follows. Let us put (9) in the form (10), used in the proof, where the terms W_k and H_k are defined in (11) and (12).
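Since the displayed definitions (4)-(12) are missing from this copy, a plausible reconstruction of the error recursion that these notations suggest is sketched below; the specific formulas for v_k, z_k, U_{j,t}, W_t and H_t are assumptions made for orientation only, and the authoritative definitions are those of the original printing.

% Hedged reconstruction (real case) of the quantities suggested by the text:
% v_k is the shift from the optimum, z_k the residual of the optimum estimator,
\[
  v_k = h_k - h^{*}, \qquad z_k = a_k - h^{*T} X_k .
\]
% Substituting into an LMS-type update (1) would give the recursion
\[
  v_{k+1} = \bigl(I - \mu_k X_k X_k^{T}\bigr) v_k + \mu_k z_k X_k ,
\]
% whose iteration splits the error into an initial-condition term and a
% residual-driven term, presumably the W_t and H_t of (10)-(12):
\[
  v_{t+1} = U_{1,t} v_1 + \sum_{j=1}^{t} U_{j+1,t}\, \mu_j z_j X_j ,
  \qquad
  U_{j,t} = \prod_{i=j}^{t} \bigl(I - \mu_i X_i X_i^{T}\bigr),
\]
% with the convention U_{t+1,t} = I.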
IV.2. Basic idea of the proof

The idea of the proof can be figured out by the following heuristic arguments. To prove the algorithm convergence, it is sufficient to show that both terms W_k and H_k tend to zero; in that case U_{0,k} and all the terms inside H_k will tend to zero. Except for the rather special case where the variable z_j is identically zero, z_j as given by (5) is a stationary non-zero random variable. Hence the product multiplying z_j in H_k must tend to zero also. For j in the vicinity of k, the product matrix U_{j,k}, (6), has very few factors and is not small. This requires that μ_j → 0 as j → ∞, which is valid for the sequence (3). Now for small values of j, μ_j is not small; thus the convergence requires that ||U_{j,k}|| « 1 or, equivalently, that (13) holds.

In this paper, an inequality similar to (13) is proved on the average. In fact, we show in step 3 of the proof that, for a given positive m, the bound (14) holds with fixed positive C and β. Assume for simplicity that k = j + nP; then, due to the specific product structure of the matrix U_{j,k}, one gets (15). Now when j is large, μ_i is small for i > j, and it does not change significantly over the range [j + lP, j + (l + 1)P]. Thus the first order development (16) is suitable, whence (17). A basic step of the proof (Lemma 1) is to show the « mixing » property that there exist P and δ_0 > 0 such that (18) holds. It is easily conceivable that taking the suitable expectation of (15), along the lines of (17), (18), will yield the property (14), which expresses the decrease of U_{j,k} to zero.
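Because the displays (13)-(18) are missing here, the mechanism of the heuristic can be illustrated by a scalar analogue of the first order development mentioned above; this is only an illustration of the kind of bound involved, not the exact inequality of the paper.

% Scalar illustration: a product of factors close to 1 shrinks like an
% exponential of the accumulated step sizes (using 1 - u <= e^{-u}),
\[
  \prod_{i=j+1}^{k} \bigl(1 - c\,\mu_i\bigr)
  \;\le\; \exp\Bigl(-\,c \sum_{i=j+1}^{k} \mu_i\Bigr),
  \qquad 0 < c\,\mu_i < 1,
\]
% which tends to zero as k \to \infty with j fixed whenever
% \sum_i \mu_i = \infty, as is typical for decreasing step-size sequences.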
The outline of the proof is thus:

- to prove (18) for the finite memory model (A.1) - step 1;
- to deduce from (18) a bound for moments of certain order of ||U_{r,r+P}|| - step 2;
- then to prove that moments of certain order of ||U_{j,k}|| decrease to zero when k → ∞, j « k - step 3;
- finally to use these results to prove the q. m. convergence of h_k - step 4;
- and lastly its a. s. convergence - step 5.

IV.3. Proof of the theorem
As just mentioned, the proof proceeds in five steps. Some of the steps are stated as lemmas. We use the notation (19).

Step 1 (LEMMA 1). - Under the assumption (A.1) of finite memory, the condition (18) is satisfied, provided |X_k| has finite moments of order greater than 2N, as stated in (20). As we have shown, the ergodic property (18) plays a crucial role in the theorem. Therefore the proof of Lemma 1 is reported in Appendix I, although already given by the authors in a work [11] that deals with the constant step-size algorithm. Notice that the expectation in (18) is a non-decreasing function of P. Thus one can assume that P > M', which is useful hereafter.
Step 2. - This step derives a bound to E(||U_{r,r+P}||^m), where the exponent m is positive. It is based upon the following lemma.

LEMMA 2.a. - If, for a given positive even integer m, the sequence X_k satisfies the multivariate finite moments assumption (B), stated for all distinct indices i_1, ..., i_K, then, for every P, there exists a triple of positive numbers r_m, F_m, ... such that (21) holds.

The order 2m of the moments will be chosen hereafter according to technical requirements. This lemma is completed by Lemma 2.b:

LEMMA 2.b. - If the sequence X_k has finite memory M' according to (A.1), and finite univariate moments according to (A.2)', it satisfies assumption (B).

Lemma 2.a is proved in Appendix II. It uses the decreasing nature of the sequence μ_k. To prove Lemma 2.b, in the product (23) we form M' interlaced groups, each one containing only factors |X_i|^p whose indices i are exactly separated by multiples of M'. Then, using for L = M' the classical inequality (22), which is a direct consequence of Hölder's inequality, one obtains an upper bound for (23) which contains M' factors such as (24). Due to assumption (A.1), the quantities (24) can be factorized. Therefore (23) is finite, provided (25) holds. In particular assumption (A.2)' implies that (B) holds.
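The displayed inequality (22) is missing from this copy; it is described as a direct consequence of Hölder's inequality applied to L factors, so presumably it is of the following standard form, stated here only as a plausible reconstruction.

% Presumed form of (22): generalized Hölder inequality for L factors,
\[
  \mathbf{E}\Bigl(\prod_{l=1}^{L} |Y_l|\Bigr)
  \;\le\;
  \prod_{l=1}^{L} \Bigl(\mathbf{E}\,|Y_l|^{L}\Bigr)^{1/L}.
\]
% With L = M' and Y_l the product of the |X_i|-factors of the l-th interlaced
% group, the expectation of (23) is bounded by M' factors of the type (24).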
As a consequence of Lemma 2.a, the moment bound (21) is fulfilled under assumptions (A.1) and (A.2)'. The combination of (21) with the result (18) of Lemma 1 provides the desirable moment bound, namely (26), which was the purpose of step 2, subject to condition (20). Notice that (26) can be rewritten equivalently (due to (3)) as (27).
Step 3. - This step consists of the proof of the bound (14) for a given pair of positive numbers (C, β). The result (14) is an expression of the fact that ||U_{j,t}|| → 0 when t → ∞ with j « t, as was discussed in the previous intuitive section. In order to prove (14), let us organize the sequence of indices k within [j, t] into the three groups (28), where n is the integer part of (t - j)/2P. The groups Γ_1 and Γ_2 are interlaced, with an interval of P > M' indices. According to the multiplicative property of the product matrix one gets (29). Using inequality (22) with L = 3, it comes (30). Thanks to the independence (A.1) and to the structure (28) of the subsets Γ_1 and Γ_2, each of the first two moments in (30) can be split into n factors of the type E(||U_{r,r+P}||^m) studied in Lemma 2.a.

Consider first the case j > r_m for which (27) is valid. A bound to the third factor in the RHS of (30) involves a finite number of multivariate moments of order less than or equal to 2m with respect to the variables |X_k|. Due to Lemma 2.b all these moments are finite. Moreover the step-sizes μ_k that appear in this factor are bounded. Hence this third factor is uniformly bounded. Therefore inequality (30) implies (31). Using the fact that 1 - u ≤ e^{-u} for all real u, the inequality (14) follows from (31) after straightforward calculations.

For the case j ≤ r_m ≤ t, we use an analogous decomposition and apply a similar analysis to the factor which corresponds to an increasing number of indices. The remaining factors correspond to a finite number of indices and can be dealt with as in the previous case. Finally, and for the same reason, in the case t ≤ r_m the moments are bounded. This achieves the proof of (14).
Step 4. - In this step we prove the q. m. convergence of h_t to the optimum vector h*; on the basis of (10), it follows from the bounds (32) and (33). Inequality (32) follows directly from (11) and the a. s. boundedness of the initial vector v_1, provided m/3 > 2. The detailed proof of (33) is given in Appendix III. It relies upon the fact (34). The inequality (22) with L = 4 is applied to individual terms in the sum of the RHS of (34), and then (14) is used with m/3 = 4. When t is large, most of the terms satisfy |k - j| ≥ M', and due to the independence (A.1) and to the zero-mean property of z_k, they get very small. At this point, the reason why the order 24 M of the bounded moments for |X_k| is at least 24 M' in the theorem becomes apparent: q. m. convergence can be proved with m = 12. Thus assumption (A.2)' of Lemma 2.b implies M ≥ M' in assumption (A.2) of the theorem.
Step 5. - In this step, the a. s. convergence of h_t is proved. It consists of the proofs of the a. s. convergence of (i) W_t to zero and (ii) H_t to zero. For that, let us assume, which is possible, that β is less than one.

(i) a. s. convergence of W_t to 0.

Let q be a finite integer, chosen large enough with respect to β, and let t_n, n = 1, 2, ... be a subsequence such that t_n = n^q; then (32) implies (35). The bound (35) is sufficient for (36). Due to (3), μ_j → 0 as j → ∞, which implies (37) for large enough indices j. Thanks to the definition (11) and to (36), (37), one gets (38). It follows from (38) that (39) holds.
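The displays (35)-(39) are missing from this copy; the standard route from a summable moment bound along the subsequence t_n to almost-sure convergence is presumably of the following kind, sketched here under that assumption.

% Presumed mechanism behind (35)-(39): a summable moment bound along t_n,
\[
  \sum_{n \ge 1} \mathbf{E}\,|W_{t_n}|^{2} < \infty
  \;\Longrightarrow\;
  \sum_{n \ge 1} \Pr\bigl(|W_{t_n}| > \varepsilon\bigr) < \infty
  \quad \text{for every } \varepsilon > 0,
\]
% by Chebyshev's inequality, hence W_{t_n} tends to 0 a.s. by the
% Borel-Cantelli lemma; the smallness of the step sizes \mu_j for large j,
% as in (37), then controls W_t between consecutive indices t_n and t_{n+1}.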
(ii) a. s. convergence of H_t to 0.

Let q_1 be a finite integer, chosen large enough with respect to β, and let t_n, n = 1, 2, ... be a subsequence defined analogously; then (33) implies (40), from which (41) follows. From (12) one has (42). According to (3), (43) holds. The inequalities (37) and (44) imply (45). The ergodicity of the observations, which results from the assumption (A.1), implies [14] that (46) holds, where the last equation in (46) results from (40) and the boundedness (A.2), (A.3) of the corresponding moments. The equations (42), (45) and (46) imply (47), and the convergence of h_t to h* results from (39) and (47).

End of the proof.

In the above proof, several steps make use of the finite memory model (A.1). For instance, in step 3 the moment appearing in (30) has been factorized. Although not proved here, it is presumable that the same steps can be carried through for more general models of ergodicity, e.g. for the decaying covariance models.

V. CONCLUSION
The convergence analysis of the stochastic gradient algorithm with decreasing step size presented in this paper concerns both the q. m. and a. s. convergence of the algorithm when it governs an adaptive linear estimator. Unrealistic assumptions such as independence of successive observations and boundedness of the algorithm are avoided in this paper and replaced by the two plausible assumptions of bounded moments and finite memory for the observations. One of the contributions of this paper is the q. m. convergence analysis of algorithms without barrier and with correlated observations, which is completely new in the literature. The other contribution is the simplicity of the proof of the a. s. convergence, which does not appeal to elaborate technical arguments.
APPENDIX I

(Proof of Lemma 1)

The purpose of this lemma is to establish (I.1) under the assumption (A.1) of finite memory and some suitable assumption on the finiteness of the |X_k|-moments, which will turn out to be (I.2). Using the notation (19), one has (I.3) and, due to stationarity, the average is solely a function of P, which is non-decreasing starting from zero. Thus we are searching for an integer P at which this function departs from zero. To prove the lemma, we shall show that P = NM' is suitable, i.e., that (I.5) holds.

Now, due to definition (19), consider the determinant of the matrix Σ_{i=0}^{N-1} X_{r+1+iM'} X^T_{r+1+iM'} and denote by x_n the nth component of the vector X_1. Due to the multi-linearity of the determinant with respect to each column, one gets (I.6), where d(U_1, ..., U_N) denotes the determinant of the matrix U with columns U_1, ..., U_N. If the indices i_1, ..., i_N are not all distinct, the latter determinant is zero. Consequently one has (I.7), where 𝒫 is the set of all permutations of [0, 1, 2, ..., N - 1]. Using the assumption (A.1) we can show that each term on the R. H. S. of (I.7) is det(R). Therefore (I.8) holds.

Denoting by λ_1, λ_2, ..., λ_N the eigenvalues of Σ_{i=0}^{N-1} X_{r+1+iM'} X^T_{r+1+iM'}, arranged in increasing order, we get (I.9) from (I.8). Obviously one has, for every α ∈ ]0, 1[, the bound (I.10). It follows from Hölder's inequality that (I.11) holds. Suppose in addition the validity of condition (I.2), i.e. condition (20) of section IV.3. Then the obvious bound (I.12), together with assumption (A.1), results in the inequality (I.13). Combining the inequalities (I.10), (I.11), (I.13), we obtain a strictly positive expectation, i.e., (I.14). Together with (I.5) this completes the proof of the lemma.
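The displays (I.9)-(I.14) are missing from this copy; the eigenvalue step presumably rests on the following standard fact, recalled here only for orientation.

% For a symmetric positive semi-definite matrix with eigenvalues
% \lambda_1 \le \dots \le \lambda_N (increasing order, as above),
\[
  \det\Bigl(\sum_{i=0}^{N-1} X_{r+1+iM'} X^{T}_{r+1+iM'}\Bigr)
  \;=\; \prod_{n=1}^{N} \lambda_n
  \;\le\; \lambda_1\, \lambda_N^{\,N-1},
\]
% so a lower bound on the determinant, combined with a moment bound on the
% largest eigenvalue (hence on |X_k|), yields control of the smallest
% eigenvalue \lambda_1 from below.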
APPENDIX II

(Proof of Lemma 2.a)

Remember that the sequence μ_i is decreasing and consider the development of ||U_{r,r+P}|| according to (II.0). Notice that, since the sequence is decreasing, μ_i ≤ μ_{r+1} when r + 1 ≤ i ≤ r + P. Thus (II.0) can be written as (II.1), where 1 ≤ p ≤ P and where i_1, ..., i_p are distinct integers, with the agreement that the product over the remaining indices is 1 for p < 2. Putting (II.1) to the power m and averaging, one gets (II.2), where 1 ≤ k ≤ m. The bracketed average in the second term of (II.2) is bounded by a polynomial of the type (II.4), where 0 ≤ n ≤ m - k. Hence each coefficient of this polynomial is a moment of the type (II.5). According to assumption (B), all these moments are finite. Since μ_1 is bounded, the polynomial (II.4) itself is bounded. Hence, for some fixed positive constant D_1, the bound (II.6) holds.

Let ω denote an arbitrary point in the space Ω of random events, equipped with its probability measure. Then (II.7) holds. Because m is even, one has (II.8). Using assumption (B) again for the moments of Y_i, as was done for (II.3), one gets (II.9) for some positive constant D_2. It follows from (II.7), (II.8) that (II.10) holds. The combination of (II.6) and (II.10) brings the result (21) of Lemma 2.a.
APPENDIX III

(A bound to E(|H_t|²))

From (12) we have (III.0). Using the fact that |z_j| ≤ |a_j| + |h*| |X_j| and the assumption (A.2), we see that a sufficient condition to achieve the inequality (III.1) is (III.2). In turn, according to Hölder's inequality and to (A.2), (III.2), and thus (III.1), will hold thanks to assumption (A.3) on the moments of a_k.

Then, applying the Schwarz inequality to individual terms in the first sum in the R. H. S. of (III.0) and using (14) and the inequality β < 1, one obtains (III.3) for some finite positive constant G_1.

Now, consider the second term in the R. H. S. of (III.0). We have (III.4), where S_1 and S_2 are defined by (III.5). The definitions of S_1 and S_2 are illustrated by figure 1.

FIG. 1. - Illustration of the definitions in (III.5).

A bound to the first term in the R. H. S. of (III.4). - We have (III.6). Let P̃ be defined by (III.7). Because in S_1 one has k ≥ j + M', assumption (A.1) and the fact that z_j in (5) is zero-mean both imply (III.8). According to Lemma 2.b and to (3), there exists G_2 > 0 such that (III.9) holds. Using (III.8) and (22) with L = 4 one obtains (III.10). Since z_j and z_k are independent, it then follows from (III.1), (III.9), (III.10) that (III.11) holds. Using (14), (III.11) and (3) one obtains (III.12). From (III.12) there exists a positive constant G_3 such that (III.13) holds.

A bound to the second term in the R. H. S. of (III.4). - Using the inequality (22) with L = 4, then (14) and (III.1), one gets (III.14). Thus, there exists a positive constant G_4 such that (III.15) holds. From (III.4), (III.13), (III.15) we have (III.16) for some positive constant G_5. The result (33) follows immediately from (III.0), (III.3), (III.16).
[1] H. ROBBINS, S. MONRO, « A Stochastic Approximation Method », The Annals of
Mathematical Statistics, t. 22, 1951, p. 400-407.
[2] L. SCHMETTERER, « Stochastic Approximation », Proc. of the Fourth Berkeley Symp.
on Math. Stat. and Proba., p. 587-609.
[3] D. SAKRISON, « Stochastic Approximation, a Recursive Method for Solving Regression Problems », Adv. in Comm. Systems, A. V. Balakrishnan Éditeur, Acad. Press, 1966, p. 51-106.
[4] C. MACCHI, Itération stochastique et Traitements numériques adaptatifs, Thèse d'État, Paris, 1972.
[5] B. WIDROW, J. M. MCCOOL, M. G. LARIMORE, C. R. JOHNSON, « Stationary learning characteristics of the LMS adaptive filter » Proc. IEEE, t. 64, n° 8, 1976, p. 1151-1162.
[6] O. MACCHI, « Résolution adaptative de l’équation de Wiener-Hopf. Cas d’un canal
de données affecté de gigue », Ann. Inst. Henri Poincaré, t. 14, n° 3, 1978, p. 355-377.
[7] H. J. KUSHNER, D. S. CLARK, « Stochastic approximation methods for constrained and unconstrained systems », Appl. Math. Sci. Series, n° 26, Springer-Verlag, Berlin, 1978, ch. 1.
[8] L. LJUNG, « Analysis of a recursive stochastic algorithm », IEEE Trans. on Automatic Control, t. AC 22, 1977, p. 551-575.
[9] D. C. FARDEN, « Stochastic Approximation with Correlated Data », IEEE Trans.
on Information Theory, t. IT 27, n° 1, 1981, p. 105-113.
[10] E. EWEDA, O. MACCHI, « Convergence of an adaptive linear estimation algorithm », IEEE Trans. on Automatic Control, t. AC 28, n° 10, octobre 1983.
[11] O. MACCHI, E. EWEDA, « Second order convergence analysis of stochastic adaptive
linear filtering ». IEEE Trans. on Automatic Control, t. 28, n° 1, janvier 1983, p. 76-85.
[12] E. EWEDA, Egalisation adaptative d’un canal filtrant non stationnaire. Thèse Ingénieur- Docteur, Orsay, 19 mars 1980.
[13] E. EWEDA, O. MACCHI, « Poursuite adaptative du filtrage optimal non stationnaire ».
C. R. Acad. Sci., t. 293, série I, 1981, p. 497-500.
[14] H. CRAMER and M. R. LEADBETTER, Stationary and Related Stochastic Processes.
New York: Wiley, 1967.