Multidimensional version of the results of Komlós, Major and Tusnády for vectors with finite exponential moments

(1)

MULTIDIMENSIONAL VERSION

OF THE RESULTS OF KOML

OS, MAJOR AND TUSN

ADY

FOR VECTORS WITH FINITE EXPONENTIAL MOMENTS

A.YU. ZAITSEV

Abstract. A multidimensional version of the results of Komlos, Ma- jor and Tusnady for the Gaussian approximation of the sequence of successive sums of independent random vectors with nite exponential moments is obtained.

1. Introduction

The statement of the problem is well known. It is required to construct on a probability space a sequence of independent random vectors

X

¹

:::X

n

(with given distributions) and a corresponding sequence of independent Gaussian random vectors

Y

¹

:::Y

n (this means that

Y

ihas the same mean and the same covariance operator as

X

i,

i

= 1

:::n

) so that the quantity

(

XY

) = max

1 k n k

P

i⁼¹

X

i^;^Pk

i⁼¹

Y

i (1.1)

would be so small as possible with large probability. Here^j

x

^j= max^j

x

_j^j, for

x

= (

x

¹

:::x

d) ² ^R^d. Sometimes we shall use another notation for summands and, by analogy with notation in Sakhanenko (1984), write (

X

^e

Y

^e)=

max

1 k n k

P

i⁼¹

X

ei^; ^Pk i⁼¹

Y

ei and so on. In some sense this problem is one of the most important in probability approximations because many well-known probability theorems can be considered as consequences of results about strong approximation of sequences of sums by corresponding Gaussian sequences. This is related to the law of iterated logarithm, to several theorems about large deviations, to the estimates for the rate of convergence of the Prokhorov distance in the invariance principles (see Prokhorov (1956), Skorokhod (1961), Borovkov (1973)) as well as to the Strassen-type (1964) approximations (see, for example, Csorg}o and Hall (1984)).

The rate of approximation in the invariance principle was studied by many authors (see, e.g., Prokhorov (1956), Skorokhod (1961), Borovkov (1973), Csorg}o and Revesz (1975) and the bibliography in Csorg}o and Revesz (1981),

URL address of the journal: http://www.emath.fr/ps/

Research supported by the Alexander von Humboldt Foundation, by the Russian Foun- dation of Basic Research, grants 93-11-01454 and 96-01-00672, by grant RFBR-DFG 96- 01-00096-ge, and by the International Science Foundation, grant R3600.

Received by the journal April 30, 1996. Revised November 18, 1997. Accepted for publication January 16, 1998.

c

Societe de Mathematiques Appliquees et Industrielles. Typeset by TEX.

(2)

Csorg}o and Hall (1984), Shao (1995)). Skorokhod (1961) worked out a powerful method known now as Skorokhod embedding. For a long time it seemed that it is impossible to propose a method which could be better.

However, in 1975, Komlos, Major and Tusnady (KMT) (1975{76) worked out a new method of dyadic approximation. With the help of this method they obtained an exhaustive solution of the problem mentioned above for sequences of independent identically distributed random variables. Sakha- nenko (1984) generalized and essentially improved KMT results for the case of non-identically distributed random variables.

The rst attempts to extend the KMT approximations to the multidimensional case (see Berkes and Philipp (1979), Philipp (1979), Berger (1982), Einmahl (1987a,b)) had a partial success only. Comparatively recently U. Einmahl (1989) obtained multidimensional analogs of KMT results which are close to optimal. However, in his results (in general case of vectors with nite exponential moments) there is an unnecessary logarithmic factor. In this paper we improve the result of Einmahl (1989) and obtain multivariate analogs of KMT results.

Notation 1.1. Writing

z

² ^R^d (resp. ^C^d), we always keep in mind that

z

= (

z

¹

:::z

d) =

z

¹

e

¹++

z

d

e

d, where

z

j ²^R¹ (resp. ^C¹) and the

e

j

are the basis vectors. The scalar product of vectors

xy

² ^R^d(resp. ^C^d) is denoted by

xy

=

x

¹

y

¹++

x

d

y

_d. For

z

²^R^d(or^C^d) we shall use the Euclidean norm ^k

z

^k =

zz

¹⁼² and the maximum norm ^j

z

^j= max

1 j k^j

z

j^j. For

b >

0 we denote

L

(

b

) = max1

log

b

. The distribution and the corresponding covariance operator of a random vector

will be denoted by ^L(

) and cov

(or cov

F

, if

F

=^L(

)). The letter

I

will be used for the iden- tity operator. The symbols

cc

¹

c

²

:::

(without arguments) will be used for absolute positive constants. The letter

c

can denote dierent constants when we do not need to x their numerical values. The numbering of constants starts in each new section again, except Sections 6{9, where we use the common enumeration.

Definition 1.2. (Zaitsev (1986)) Denote by ^Ad(

) a class of probability distributions depending on a parameter

0. The class ^Ad(

) consists of

d

-dimensional distributions

F

for which the function

'

(

z

) =

'

(

Fz

) = log

Z

Rd

e

^h^zxⁱ

F

^f

dx

^g (

'

(0) = 0) (1.2) is dened and analytic for^k

z

^k

<

1,

z

²^C^d, and

d

u

d

²_v

'

(

z

) ^k

u

^k

Dvv

(1.3) for all

u

,

v

²^R^dand ^k

z

^k

<

1, where

D

= cov

F

, and

d

u

'

is the derivative of

'

in the direction

u

.

In Section 2 we consider some properties of classes^Ad(

). As examples of distributions from ^Ad(

c

) one can consider the distributions concentrated on the ball ^B =

x

² ^R^d: ^k

x

^k

, more general distributions satisfying Bernstein-type inequality conditions and innitely divisible distributions

(3)

with spectral measures concentrated on ^B (see Zaitsev (1986), pp. 205{

207).

The following Theorem 1.3 is the main result of the paper. It was an- nounced in Zaitsev (1995a, 1996b, 1997) and, in somewhat weakened form, in Zaitsev (1995b).

Theorem 1.3. Suppose that

1,

>

0 and

¹

:::

nare random vectors with distributions^L(

k)²^Ad(

), ^E

k = 0, cov

k =

I

,

k

= 1

:::n

. Then one can construct on a probability space a sequence of independent random vectors

X

¹

:::X

n and a corresponding sequence of independent Gaussian random vectors

Y

¹

:::Y

n so that ^L(

X

k) =^L(

k),

k

= 1

:::n

, and

E exp

c

¹( )(

XY

)

d

³

L

(

d

)

exp^;

c

²( )

d

⁹⁼⁴⁺

L

^;

n=

²

(1.4) where

c

¹( ),

c

²( ) are positive quantities depending only on .

In a particular case, when

d

= 1 and all summands have a common variance, Theorem 1.3 is equivalent to the main result of Sakhanenko (1984).

One should take into account that, using the Holder inequality, we can trans- fer the factors under exponential sign in the right-hand side of (1.4) to the denominator of the fraction in the left-hand side of (1.4). Naturally, this makes the result weaker. Einmahl (1989) considered multidimensional vectors with identical covariance operators, satisfying multidimensional analogs of Sakhanenko's conditions.

One can easily verify that if a vector

has nite exponential moments

E

e

^h^hⁱ, for

h

²

V

, where

V

^R^d is some neighborhood of zero, then

F

=^L(

)²^Ad(

c

(

F

)). Therefore, Theorem 1.3 can be considered as a generalization and renement of the main result of KMT (1975{76). In particular, from Theorem 1.3 one can easily derive the following result, obtained by KMT (1975{76) in the one-dimensional case.

Corollary 1.4. Suppose that a random vector

has nite exponential moments ^E

e

^h^hⁱ, for

h

²

V

, where

V

^R^d is some neighborhood of zero.

Then one can construct on a probability space a sequence of independent random vectors

X

¹

X

²

:::

and a corresponding sequence of independent Gaussian random vectors

Y

¹

Y

²

:::

so that ^L(

X

k) = ^L(

),

k

= 1

2

:::

,

and _n

P

j⁼¹

X

j^; ^Pn

j⁼¹

Y

j =

O

(log

n

) a.s. (1.5) An analog of this result was obtained by Einmahl (1989) under additional smoothness-type restrictions on the distributions^L(

). Einmahl (1989, The- orem 10), has also proved an analog of Theorem 1.3 but only for suciently smooth distributions ^L(

). As it is noted in KMT (1975{76), from the results of Bartfai (1966) it follows that the accuracy of approximation in (1.5) is the best possible.

The following Theorem 1.5 (by means of its consequence, Theorem 9.1) will be used in the proof of Theorems 1.3 and 1.6. However, it is of independent interest because sometimes it can give a more exact dependence of constants on the characteristics of distributions of summands.

(4)

1,

>

0 and

^e¹

:::

^en are random vectors with distributions^L(

^ek)²^Ad(

), ^E

^ek= 0, cov

^ek=

I

,

k

= 1

:::n

. Assume that there exist random vectors^e

k which have the same moments of the rst three orders as the vectors

^ek,^L(

^ek)²^Ad(

) and ^P

^ek

= 1,

k

= 1

:::n

. Then one can construct on a probability space a sequence of independent random vectors

X

^e¹

::: X

^e_n and a corresponding sequence of independent Gaussian random vectors

Y

¹

:::Y

n so that ^L(

^ek) = ^L(

X

^ek),

k

= 1

:::n

, and

E exp

c

³( )(

XY

^e )

d

³

²

exp^;

c

⁴( )(

d

³⁼²

)³⁼²⁺

L

(

n

)

where

c

³( ),

c

⁴( ) are positive quantities depending only on .

The following Theorem 1.6 shows that if all moments of the third order of the vectors

¹

:::

nin Theorem 1.3 are equal to zero, then the dependence of constants on the dimension can be slightly sharpened.

1,

>

0 and

¹

:::

n are random vectors with distributions^L(

_k)²^Ad(

), ^E

_k= 0, cov

_k=

I

,

k

= 1

:::n

. Assume that for all

uvw

²^R^d we have

E

k

u

k

v

k

w

= 0

k

= 1

:::n:

(1.6) Then one can construct on a probability space a sequence of independent random vectors

X

¹

:::X

n and a corresponding sequence of independent Gaussian random vectors

Y

¹

:::Y

n so that ^L(

X

k) = ^L(

k),

k

= 1

:::n

, and

E exp

c

⁵( )(

XY

)

d

³

exp^;

c

⁶( )

d

⁹⁼⁴⁺

L

^;

n=

²

(1.7) where

c

⁵( ),

c

⁶( ) are positive quantities depending only on .

Remark 1.7. It is evident that the condition (1.6) is automatically fullled if the vectors

¹

:::

n have symmetric distributions.

In Theorems 1.3, 1.5 and 1.6 the random vectors

¹

:::

nand

^e¹

:::

^en

are, generally speaking, non-identically distributed. However, they have the same covariance operator

I

. Thus, the problem of obtaining an adequate multidimensional generalization of the main result of Sakhanenko (1984) remains open. By the author's opinion, the proof of Theorem 1.3 can be transformed in the proof of an analogous result for vectors with non-identical covariance operators. The conditions cov

j =

I

,

1 can be apparently changed by

B

_j²

²,

B

²_j

_j², where

B

_j² _j² are respectively the maximal and the minimal eigenvalues of cov

j,

j

= 1

:::n

. The constants will depend, naturally, on

, and

n

should be changed in the right-hand side of (1.4) by the maximal eigenvalue of the covariance operator of the last sum

¹ ++

n. The author hopes to consider this situation in a separate paper. Moreover, it is well known that the results similar to The- orems 1.3, 1.5 and 1.6 imply many useful consequences. For example, one can derive estimates in the invariance principle in the case when the summands

X

j satisfy less restrictive moment conditions (see, e.g., Shao (1995) and the bibliography in Shao (1995)). The author is going to devote to the consequences of Theorems 1.3, 1.5 and 1.6 also a separate publication.

(5)

In Theorems 1.3, 1.5 and 1.6 we consider the case

1. Obviously, the assertions of these theorems become stronger for small

. An analog of The- orem 1.3 in a particular case when the summands have smooth distributions and

can be arbitrarily small is obtained in Gotze and Zaitsev (1997). Note that for small

the distributions of summands are close to Gaussian ones (see Zaitsev (1986)).

The proof of Theorems 1.3, 1.5 and 1.6 consists of several steps, many of which are close to corresponding steps from the proofs of the main results of KMT (1975{76), Sakhanenko (1984) and Einmahl (1989). In Section 2 we introduce the necessary notation and formulate auxiliary results. In particular (see Denition 2.2), we dene the classes^Rd^;

"

^Ad(

)^Ad(

) of pairs of distributions with close cumulants of the second and third orders and discuss the properties of classes^Ad(

) and^Rd^;

0 . We dene as well the classes ^A_d(

)^Ad(

) of distributions satisfying some smoothness conditions which guarantee the existence of smooth densities (see Deni- tion 2.12).

The main tool for the proof of Theorems 1.3, 1.5 and 1.6 is the estimates for the closeness of quantiles of one-dimensional conditional distributions contained in Lemmas 3.1 and 4.1. For the proof of Lemmas 3.1 and 4.1 we use Lemmas 2.14 and 2.15 which are completely proved in Za- itsev (1996b), Lemmas 6.1 and 7.1. In Zaitsev (1996b) one can also nd a sketch of the proof of Lemmas 3.1 and 4.1. We give the complete proof of these lemmas in Sections 3 and 4. In Lemma 2.14 we give a non-uniform estimate for large deviations in the local CLT for conditional distributions of the last coordinates of vectors with smooth distributions from^Ad(

) under condition that the rst

d

^;1 coordinates are xed (see (2.17)). Analogously, Lemma 2.15 contains a non-uniform estimate for the closeness of conditional densities of smooth distributions from ^Rd^;

0 (see (2.23)). Moreover, Lemma 2.15 states that if

w >

0 is separated from zero and from inn- ity, then the derivative of the logarithm of conditional density at point

w

is negative. An estimate of this derivative is also presented (see (2.24)). This estimate is essentially used in the proof of Lemma 4.1.

In Lemma 3.1 we give the estimates for the closeness of quantiles in the CLT for conditional distributions of the last coordinates of vectors with smooth distributions from ^Ad(

) under condition that the rst

d

^;1 coordinates are xed (see (3.1)). Analogously, in Lemma 4.1 we provide an estimate for the closeness of quantiles of conditional distribution functions of smooth distributions from^Rd^;

0 (see (4.1)). The additional information about the closeness of cumulants leads to the better orders of estimates in Lemma 4.1 in comparison with Lemma 3.1.

In Section 5 we describe the KMT (1975{76) scheme of dyadic approximation. This scheme was essentially used by Sakhanenko (1984) and Ein- mahl (1989). Note, however, that the formulations and the proofs of KMT (1975{76, inequality (2.5)), Sakhanenko (1984, Lemma 1, p. 34) and Ein- mahl (1989, inequality (4.11)) are close but slightly dierent. Our approach practically coincides with Einmahl's one. As in KMT (1975{76), Sakha- nenko (1984) and Einmahl (1989), we suppose that some independent random vectors with smooth distributions are already constructed and dene in-

(6)

dependent random vectors with given distributions by successive construct- ing sums of blocks of summands, containing 2^N

2^N^;1

2^N^;2

:::

4

2

1 summands. For this we apply the so-called Rosenblatt (1952) quantile transformation which was already used by Einmahl (1989). This transformation is a natural generalization of the usual one-dimensional quantile transformation which have been applied by KMT (1975{76) and Sakhanenko (1984).

The scheme of the proof of Theorem 1.3 is very close to that of the main result of Sakhanenko (1984). At rst we suppose that the Gaussian vectors

Y

¹

:::Y

_n,

n

= 2^N, are already constructed and construct the independent vectors which are bounded with probability one, have suciently smooth distributions and the same moments of the rst, second and third orders as the needed independent random vectors

X

¹

:::X

n. Then we construct the vectors

X

¹

:::X

n in several steps. After each step the number of

X

j

which are not constructed becomes smaller in 2^ptimes, where

p

is a suitably chosen positive integer. In each step we begin with already constructed vectors which are bounded with probability one and have suciently smooth distributions and the needed moments up to the third order. Then we construct the vectors such that, in each block of 2^p summands, only the rst vector has the initial bounded smooth distribution. The rest 2^p^;1 vectors have the needed distributions^L(

_j). These 2^p^;1 vectors from each block will be chosen as

X

j and will be not involved in the next steps of the procedure. The coincidence of third moments will allow us to use more precise estimates of the closeness of quantiles of conditional distributions obtained in Section 4.

In Section 6 we realize the rst step of the procedure just described.

We estimate the rate of approximation in the case when we construct on a probability space the independent Gaussian vectors

Y

j and the independent vectors

X

_j with smooth distributions and the needed moments up to the third order. The main result of Section 6 is formulated in Theorem 6.4.

Sections 7 and 8 are devoted to the estimation of the rate of approximation in the case when we begin with the vectors

Y

^e¹

::: Y

^e_n,

n

= 2^N, having smooth distribution with bounded supports and with the same moments up to the third order as the needed vectors

X

^e¹

::: X

^en and construct the vectors

X

^e¹

::: X

^en in several steps, diminishing after each step the number of

X

^ej which are not constructed in 2^p times. The scheme of the proof is a modied version of that from Sakhanenko (1984). We use the induction with respect to the number of steps of the procedure and prove that this procedure provides the good estimate for

j(

N

) = _k^P^j

=1

Y

ek^; ^Pj k⁼¹

X

ek (1.8)

(see (8.28), (8.94), (8.97)). It is very important that in (1.8) there is no supremum over

j

= 1

:::

2^N. This supremum is taken in the last moment only (see (8.98)). At the end of Section 8 we give the proof of Theorem 1.5.

Theorems 1.3 and 1.6 are proved in Section 9. At rst we prove an auxiliary result (Theorem 9.1) which is in fact a simple consequence of The- orem 1.5. However, its formulation is more convenient than that of The- orem 1.5 when we investigate the character of dependence of constants of

(7)

the parameters. This is taken into account in the proofs of Theorems 1.3 and 1.6. For the proof of Theorem 1.3 we need also Lemma 9.2 which shows that there exist absolute positive constants

c

⁷

c

⁸ such that for any distribution

F

²^Ad(1) with mean zero and cov

F

=

I

there exist a distribution

G

²^Ad(

c

⁷) with the same moments up to the third order and such that

G

x

:^j

x

^j

c

⁸

L

(

d

)= 1. It can be veried that this statement is valid without

L

(

d

) if all the moments of the third order of the distribution

F

are equal to zero. Therefore, the factor

L

(

d

) from (1.4) is absent in the denominator of the fraction in the left-hand side of the inequality (1.7) of Theorem 1.6.

2. Notation and auxiliary results

Notation 2.1. Let ^Fd be the set of all

d

-dimensional probability distributions dened on the

-eld of Borel subsets of the Euclidean space ^R^d. Let ^Gd be the collection of all Gaussian distributions from ^Fd. Below

symbolizes dierent quantities not exceeding one in absolute value,

E

a is the distribution concentrated at a point

a

. By

F

^b(

t

),

t

²^R^d, we denote the characteristic function of a distribution

F

²^Fd. The product of measures is understood as their convolution:

F G

=

F G

. By ^N(0

I

)²^Gd we denote the Gaussian distribution with mean zero and covariance operator

I

. The notation (

F

) will be used for the Gaussian distribution whose mean and covariance operator are the same as those of a distribution

F

²^Fd. In one- dimensional case we denote by () the distribution function of the normal law with mean zero and variance

² and by

'

() the corresponding density.

We shall identify the covariance operators with corresponding covariance matrices.

Let

0,

F

= ^L(

) ² ^Ad(

), ^k

h

^k

<

1,

h

² ^R^d. Then the conjugate distribution

F

=

F

(

h

) is dened by

F

^f

dx

^g=^;^E

e

^h^hⁱ ^;1

e

^h^hxⁱ

F

^f

dx

^g (2.1) (sometimes it is called the Cramer transform). Denote by

(

h

) a random vector with^L^;

(

h

) =

F

(

h

). It is clear that

F

(0) =

F

and

if

U

¹

:::U

n²^Ad(

)

^k

h

^k

<

1

U

=_j^Qⁿ

=1

U

j

then

U

(

h

) =_j^Qⁿ

=1

U

j(

h

)

:

(2.2) Definition 2.2. For

"

0 denote by ^Rd^;

"

the collection of pairs of

d

-dimensional probability distributions (

F

¹

F

²) such that

F

k ² ^Ad(

),

k

= 1

2, and

d

u

d

²_v

g

¹(0)^;

d

u

d

_v²

g

²(0)

"

^k

u

^k

Dvv

d

²_v

g

¹(0)^;

d

_v²

g

²(0)

"

²

Dvv

^(2.3) for all

uv

²^R^d, where

D

=

D

¹+

D

²,

g

k(

z

) =

'

(

F

k

z

),

D

k= cov

F

k. Remark 2.3. If

"

= 0, then the relation (

F

¹

F

²)²^Rd^;

"

means that the distributions

F

¹

F

² ² ^Ad(

) have identical cumulants of second and third orders.

(8)

Classes^Ad(

) and^Rd^;

"

have properties which are very convenient for applying. Some of them were considered in Zaitsev (1986, 1988, 1996a).

It is evident that if

¹

<

², then^Ad(

¹)^Ad(

²). Moreover, it is easy to see that, for xed

, the class^Ad(

) is closed with respect to convolution: if

F

¹

F

²

:::F

n²^Ad(

), then

F

¹

F

²

F

n ²^Ad(

). The class^Ad(0) coincides with the class^Gdof all

d

-dimensional Gaussian distributions. The following inequality was proved in Zaitsev (1986) and can be considered as an estimate of stability of this characterization: if

F

²^Ad(

),

>

0, then

^;

F

(

F

)

cd

²

L

(

^;1), where

(

) is the Prokhorov distance. Other useful properties of classes ^Ad(

) and ^Rd^;

0 are contained in the following Lemmas 2.4, 2.5, 2.7, 2.8 and 2.11.

Lemma 2.4. Suppose that

0,

F

²^A_d(

),

h

²^R^d, ^k

h

^k

<

1,

F

=^L(

),

D

= cov

,

D

(

h

) = cov

(

h

). ^E

= 0. Then

a) for all

u

²^R^dthe following relations are valid:

D

(

h

)

uu

=

Duu

^;1 +

^k

h

^k

(2.4) log ^E

e

^h^hⁱ_{= 12}

Dhh

_{1 + 13}

^k

h

^k

(2.5) log^E

e

ⁱ^h^hⁱ=^; 1

2

Dhh

_{1 + 13}

^k

h

^k

(2.6) b) if ^k

h

^k

¹², then

F

(

h

)²^Ad(2

).

Lemma 2.4 is contained in Zaitsev (1986, Lemma 2.1).

0,

F

=^L(

)²^Ad(

),

y

²^R^m, ²^R¹, and

A

:^C^d ^!^C^m is a linear operator such that

A

(^R^d)^R^m. Let

^e²^R^kbe the vector consisting of a subset of coordinates of the vector

. Then

L(

A

+

y

)²^Am^;^k

A

^k

where ^k

A

^k= sup_x

2Rd^kx^k ¹^k

Ax

^k

L( )²^Ad(^j ^j

)

^L(

^e)²^Ak(

)

:

Proof. Put

(

z

) = log ^E

e

^h^zA⁺^yⁱ = log ^E

e

^h^{A z}ⁱ+

zy

(2.7) where

A

:^C^m ^!^C^d is the adjoint operator for

A

,

z

²^C^m. Let

D

= cov

,

M

= cov

A

. By virtue of (1.2), (1.3), (2.7), for all

uv

² ^R^m,

z

² ^C^m,

k

A

z

^k

<

1, the following relations are valid:

d

u

d

_v²

(

z

) =

d

A u

d

_{A v}²

'

(

Fw

) _w⁼_{A z}

k

A

u

^k

DA

vA

v

^k

A

^k^k

u

^k

^E

^; ^E

A

v

²

=^k

A

^k^k

u

^k

^E

A

^; ^E

Av

²=^k

u

^k^k

A

^k

M v v

:

(2.8) If ^k

z

^k ^k

A

^k

<

1 then ^k

A

z

^k

A

^k^k

z

^k

= ^k

z

^k ^k

A

^k

<

1 and the inequality (2.8) is also valid. From (2.7), (2.8) and from denition of classes ^Ad(

) it follows that ^L(

A

+

y

) ² ^Am^;^k

A

^k

. The relations

(9)

L( ) ² ^Ad(^j ^j

), ^L(

^e)²^Ak(

) follow directly from the statement just

proved. ^t^u

Corollary 2.6. Let the conditions of Lemma 2

:

5 be satised, ² ^R¹, (

F

¹

F

²) ²^Rd^;

0 ,

F

j = ^L(

j),

y

j ²^Rm,

j

= 1

2, and let

^e¹

^e² ² ^R^k be the vectors consisting of identical subsets of coordinates of the vectors

¹

² respectively. Then

;

L(

A

¹+

y

¹)

^L(

A

²+

y

²) ²^Rm^;^k

A

^k

0

;

L( ¹)

^L( ²) ²^Rd^;^j ^j

0

^;^L(

^e¹)

^L(

^e²) ²^Rk(

0)

:

Proof. It is sucient to use Lemma 2.5, the denition of classes^Rd^;

"

and the corresponding analogs of the relations (2.7), (2.8), taking into account

Remark 2.3. ^t^u

0,

F

k =^L^;

⁽^k⁾ ² ^Adk(

), and the vectors

⁽^k⁾,

k

= 1

2, are independent. Let

²^R^d¹⁺^d² be the vector with the rst

d

¹ coordinates coinciding with those of

⁽¹⁾ and with the last

d

² coordinates coinciding with those of

⁽²⁾. Then

F

=^L(

)²^Ad¹⁺d²(

).

Proof. Let

D

= cov

,

D

⁽^k⁾ = cov

⁽^k⁾,

k

= 1

2. Using the natural analogous notation, we see that for all

u

,

v

²^R^d¹⁺^d² and

z

²^C^d¹⁺^d²,^k

z

^k

<

1,

d

u

d

²_v

'

(

Fz

) =

d

u

d

_v²^;

'

(

F

¹

z

⁽¹⁾) +

'

(

F

²

z

⁽²⁾)

=

d

_u⁽¹⁾

d

_v²⁽¹⁾

'

(

F

¹

z

⁽¹⁾) +

d

_u⁽²⁾

d

_v²⁽²⁾

'

(

F

²

z

⁽²⁾)

k

u

⁽¹⁾^k

D

⁽¹⁾

v

⁽¹⁾

v

⁽¹⁾+^k

u

⁽²⁾^k

D

⁽²⁾

v

⁽²⁾

v

⁽²⁾

k

u

^k

Dvv

:

It remains to use the denition of classes^Ad(

). ^t^u Lemma 2.8. Suppose that

0, ^;^L(

¹⁽^k⁾)

^L(

²⁽^k⁾) ² ^Rdk(

0),

k

= 1

2, and the pairs of vectors

⁽¹⁾_j

_j⁽²⁾,

j

= 1

2, are the pairs of independent vectors. Let

j ²^Rd¹⁺d²,

j

= 1

2, be the vectors with the rst

d

¹ coordinates coinciding with those of

_j⁽¹⁾ and with the last

d

² coordinates coinciding with those of

_j⁽²⁾. Then ^;^L(

¹)

^L(

²) ²^Rd¹⁺d²(

0).

Proof. It suces to apply Lemma 2.7, taking into account Remark 2.3. ^t^u Remark 2.9. Naturally, Lemmas 2.7 and 2.8 can be evidently extended on the case of any nite number of independent vectors.

Remark 2.10. Lemma 2.7 can be easily derived from Lemma 2.5, using the completeness of classes^Ad(

) with respect to convolution.

Lemma 2.11. Let

0 and

X

¹

:::X

n be independent random vectors with ^L(

X

i) ²^Ad(

), ^E

X

i = 0,

S

i =

X

¹ ++

X

i, for

i

= 1

:::n

. Let

ht

² ^R¹, 0

h <

1, 0

t <

¹²,

= max

1 i n

S

i . Denote by

B

² the maximal eigenvalue of cov

S

n. Then

E

e

^h^j^Sⁿ^j2

de

^h²^B²

^E

e

^t 3

de

⁴^t²^B² and

P

x

2

d

maxexp^;^;

x

²4

B

²

exp^;^;

x

4

x

0

:

(2.9)

(10)

Proof. By Lemma 2.5, the distributions of the coordinates of the vectors

X

i

belong to the class^A¹(

) and it suces to prove the statement of the lemma for

d

= 1 (recall that^j^jmeans the maximum norm). Using the completeness of the classes^A¹(

) with respect to convolution, we see that by Lemma 2.4, inequality (2.5),

E

e

^h^j^Sⁿ^j ^E

e

^hSⁿ + ^E

e

^;^hSⁿ 2exp 1

2

h

²

B

²_{1 + 13}

h

2

e

^h²^B²

:

(2.10) Note that the random sequence exp^;

h

^j

S

i^j (indexed by

i

) is a positive submartingale. Therefore, by Doob's inequality (see Doob (1953), p. 314) and by (2.10),

P

x

= ^P

e

^h

e

^hx ^E exp^;

h

^j

S

n^j^;

hx

2exp^;

h

²

B

² ^;

hx :

(2.11) We then deduce (2.9) from (2.11) via the classical Bernstein{Cramer{Cher- no calculation. Integrating by parts and applying (2.11) with

h

= 2

t

, we get

E

e

^t= 1 +

Z

1

0

te

^tx^P

x

dx

3

e

⁴^t²^B²

:

t u

Definition 2.12. Let

0,

>

0,

>

0. Denote by ^A_d(

) the class of distributions

F

² ^Ad(

) such that for

h

² ^R^d, ^k

h

^k

<

1, and for all

v

²^R^dthe following inequalities hold:

(2

)^;^d

Z

^kt^kd¹

F

b_h(

t

)

dt

²

d

²

²(2

)^d=²(det

D

)¹⁼²

(2.12) (2

)^;^d

Z

^kt^kd¹

tv

F

^bh(

t

)

dt

D

^;1

vv

¹⁼²

(2

)^d=²(det

D

)¹⁼²

(2.13) where

F

h=

F

(

h

),

D

= cov

F

and

²

>

0 is the minimal eigenvalue of

D

.

According to (2.1), we have (if

F

=^L(

))

F

bh(

t

) = ^E

e

^h^it⁽^h^{) i}=^;^E

e

^h^hⁱ ^;1^E

e

^h^h⁺^itⁱ

:

(2.14) Therefore, the conditions (2.12), (2.13) play in this paper the role which is similar to that of Sakhanenko (1984, inequality (49), p. 9) or Einmahl (1989, inequality (1.5)). Note, however, that the condition (2.13) is needed for estimating the derivatives of densities (see (2.24)). Such estimates are necessary for obtaining more precise bounds for the closeness of conditional distributions when the compared distribution have the coinciding third moments.

Notation 2.13. Below for

x

= (

x

¹

:::x

d) ² ^R^d,

d

2, we shall denote by

x

⁰the vector

x

⁰= (

x

¹

:::x

d^;1) consisting of the rst

d

^;1 coordinates of

x

. Analogously, we shall use a prime to denote the matrix

D

⁰ of size

(11)

(

d

^;1)(

d

^;1) consisting of the rst

d

^;1 rows and rst

d

^;1 columns of a matrix

D

of size

d

. If

F

=^L(

) and

²^R^dwe shall write

F

⁰=^L(

⁰).

The following Lemmas 2.14 and 2.15 are completely proved in Zaitsev (1996b, Lemmas 6.1 and 7.1).

0,

F

=^L(

)²^A_d(

4

4) and, if

d

2

F

⁰=^L(

⁰)²^A_d^;1(

4

4)

:

(2.15) Assume that ^E

= 0 and the last coordinate

d of the vector

is not corre- lated with the previous ones. Let

² =

De

d

e

d= ^E

_d²

>

0 be the minimal eigenvalue of

D

= cov

F

. Denote by

p

(

x

),

x

²^R^d, the probability density of distribution

F

. Let

p

⁰(

x

⁰),

x

⁰²^R^d^;1,

d

2, be the density of the vector

⁰. Denote by

p

⁽^d⁾(

x

d^j

x

⁰) =

p

(

x

)

p

⁰(

x

⁰),

x

d²^R¹, the conditional density of the

d

^th coordinate

d of the vector

²^R^dfor a xed value of

⁰ =

x

⁰ (for

d

= 1 we dene

p

⁽¹⁾(

x

¹^j

x

⁰) =

p

(

x

)). Denote by

G

the corresponding (depending on

x

⁰) one-dimensional conditional distribution.

Then there exist absolute positive constants

c

¹

:::c

⁶ such that the following statements are true for

d

³⁼²

c

¹, ^;

D

⁰ ^;1⁼²

x

⁰

c

²

d

³⁼²

.

For

d

2 there exists a parameter

h

⁰ =

h

⁰(

x

⁰) ² ^R^d^;1 which gives the solution of the equation ^E

⁰(

h

⁰) =

x

⁰. Dene the parameter

h

²^R^d which has the rst

d

^;1 coordinates coinciding with corresponding coordinates of

h

⁰ and the last

d

^th coordinate

h

d = 0. If

d

= 1, we take

h

= 0. Denote

y

⁰ =

E

(

h

)

e

d. Then

j

y

⁰^j2

:

88

^;

D

⁰ ^;1⁼²

x

⁰² 2

:

88

^k

x

⁰^k²

²

(2.16)

and, if^j

w

^j ^c³_d² , then for the density

v

(

w

) of the distribution

GE

^;y⁰ the following representation is valid:

v

(

w

) =

'

(

w

)exp

c

⁴

d

³⁼²+

d

^;

D

⁰ ^;1⁼²

x

⁰1 +

w

²

+ ^j

w

^j³

³

(2.17)

and

2^p12

^exp

;

w

²

v

(

w

) 2

p2

^exp

;

w

² 4

²

:

(2.18)

Moreover, for all

w

²^R¹,

v

(

w

)

c

⁵

^exp

;minⁿ

w

²

4

²

c

⁶^j

w

^j

d

o

:

(2.19)

Note that for

d

= 1 we use in the formulations the natural agreement

;

D

⁰ ^;1⁼²

x

⁰ =^;

D

⁰ ^;1⁼²

x

⁰=^j

x

⁰^j=^k

x

⁰^k= 0.

0, (

F

¹

F

²)²^Rd^;

0 . Let, for

k

= 1

2,

F

k =^L(

k)²^A_d(

4

4) and, if

d

2

F

_k⁰ =^L(

_k⁰)²^A_d^;1(

4

4)

:

(2.20)