

Optimal weighted least-squares methods for high-dimensional approximation

Giovanni Migliorati

Université Pierre et Marie Curie, Paris, France

Joint work with Albert Cohen (UPMC)

Journée scientifique du groupe SMAI-SIGMA


Outline

1 Motivations and example of applications

2 Notation and definitions

3 Stability and accuracy of standard least squares with evaluations at random points

4 Stability and accuracy of weighted least squares with evaluations at random points

5 Sampling algorithms for the optimal density

6 Conclusions


Motivations and example of applications

Fast solution to parametric / stochastic PDEs

PDE model $F(u, y) = 0$ depending on a parameter vector $y \in \Gamma \subset \mathbb{R}^d$, $d \gg 1$.

For each $y \in \Gamma$ the PDE model is well-posed in some Hilbert space $V$.

Example of PDE model:

$$-\nabla \cdot (a \nabla u) = f \ \text{ in } D \subset \mathbb{R}^2, \qquad u = 0 \ \text{ on } \partial D.$$

$D = (0,1)^2$ is a checkerboard with $d = 2^{2k}$ squares $D_1, \ldots, D_d$, $k \ge 1$, and the diffusion coefficient $a$ is piecewise constant on $D_1, \ldots, D_d$ with values $a_1, \ldots, a_d$ that define the parameter vector

$$y = (a_1, \ldots, a_d) \in \Gamma = [a_{\min}, a_{\max}]^d, \qquad 0 < a_{\min} \le a_{\max} < +\infty.$$

Examples of goals: using the evaluations $u(y^1), \ldots, u(y^m)$ with $y^j \in \Gamma$, reconstruction of the solution map $y \mapsto u(y) \in V$, or approximation of quantities of interest, like $y \mapsto \int_D u(x, y)\,dx$.

Each evaluation of $u$ is computationally expensive.

Evaluations of $u$ could be affected by measurement and numerical errors.


Notation and definitions

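For any $d \ge 1$, let $\Gamma \subseteq \mathbb{R}^d$, $d\rho$ a probability measure on $\Gamma$, and $u : \Gamma \to \mathbb{R}$. Let $d\mu$ be a sampling measure on $\Gamma$ such that

$$w \, d\mu = d\rho$$

for some $w : \Gamma \to \mathbb{R}^+$ defined everywhere and with $\int_\Gamma w^{-1} \, d\rho = 1$.

$$\langle f, g \rangle := \int_\Gamma f(y) g(y) \, d\rho(y), \qquad \langle f, g \rangle_m := \frac{1}{m} \sum_{j=1}^m w(y^j) f(y^j) g(y^j),$$

$$\| \cdot \| := \langle \cdot, \cdot \rangle^{1/2}, \qquad \| \cdot \|_m := \langle \cdot, \cdot \rangle_m^{1/2},$$

with $y^1, \ldots, y^m$ being i.i.d. according to $\mu$.

Goal: approximation of $u$ in $L^2(\Gamma, d\rho)$ using pointwise evaluations $u(y^j)$.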


Approximation space

Choose an orthonormal basis $(L_j)_{j \ge 1}$ of $L^2(\Gamma, d\rho)$.

Assumption: $\forall y \in \Gamma$ there exists an index $k$ s.t. $L_k(y) \ne 0$.

Define the linear space

$$V_n := \mathrm{span}\{L_1, \ldots, L_n\}, \quad \text{where } n = \dim(V_n).$$

A minimal sufficient condition to satisfy this assumption is that $V_n$ contains the functions that are constant over $\Gamma$.


Observation models

Assumption: the function $u$ is well-defined at any point in $\Gamma$ except possibly on a set of zero $d\rho$-measure, and $u \in L^2(\Gamma, d\rho)$.

• noiseless observation model:

$$z^i = u(y^i), \quad i = 1, \ldots, m, \qquad y^1, \ldots, y^m \overset{\text{i.i.d.}}{\sim} \mu;$$

• noisy observation model:

$$z^i = u(y^i) + \eta^i, \quad i = 1, \ldots, m.$$

This talk: only the noiseless model. Analogous results are proven for the noisy observation model, with several different assumptions on the noise type.


Discrete least-squares approximation

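Continuous and discrete $L^2$ projections of $u$ over $V_n$, defined as

$$\operatorname*{argmin}_{v \in V_n} \| u - v \|,$$

$$u_W := \operatorname*{argmin}_{v \in V_n} \| u - v \|_m = \operatorname*{argmin}_{v \in V_n} \sum_{i=1}^m w(y^i) \, |v(y^i) - z^i|^2.$$

Normal equations:

$$G \beta = b, \quad \text{with} \quad [G]_{ij} = \langle L_i, L_j \rangle_m, \qquad [b]_j = m^{-1} \sum_{i=1}^m w(y^i) \, z^i L_j(y^i),$$

and $\beta$ contains the coefficients of the expansion $u_W = \sum_{j=1}^n \beta_j L_j$.

Standard least squares: $w \equiv 1$ and therefore $d\mu = d\rho$.

Weighted least squares: $w \not\equiv 1$ plus previous conditions, thus $d\mu \ne d\rho$.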
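The normal equations can be assembled directly from the samples. The following is a minimal sketch, not the authors' implementation, assuming $d = 1$, $\Gamma = [-1, 1]$, $d\rho = dy/2$ and the orthonormal Legendre basis; the helper names `legendre_basis` and `weighted_least_squares` are hypothetical.

```python
import numpy as np
from numpy.polynomial.legendre import legvander

def legendre_basis(y, n):
    """Evaluate L_1, ..., L_n at the points y; returns an array of shape (m, n).

    Assumption: Legendre polynomials, orthonormal w.r.t. d rho = dy/2 on [-1, 1].
    """
    return legvander(y, n - 1) * np.sqrt(2 * np.arange(n) + 1)

def weighted_least_squares(y, z, w, n):
    """Solve the normal equations G beta = b for the weighted estimator u_W."""
    m = len(y)
    L = legendre_basis(y, n)
    WL = L * w[:, None]            # rows scaled by the weights w(y^i)
    G = WL.T @ L / m               # [G]_ij = <L_i, L_j>_m
    b = WL.T @ z / m               # [b]_j = m^{-1} sum_i w(y^i) z^i L_j(y^i)
    beta = np.linalg.solve(G, b)
    return beta, G

# Usage: standard least squares (w = 1, so mu = rho) for a function lying in V_3.
rng = np.random.default_rng(0)
y = rng.uniform(-1.0, 1.0, size=50)
u = lambda t: 1.0 + 2.0 * t - t**2
beta, G = weighted_least_squares(y, u(y), np.ones_like(y), n=3)
```

Since $u \in V_3$ in this usage example, the estimator reproduces $u$ up to rounding; for a general $u$ the error is governed by the results that follow.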


For a given function $u : \Gamma \to \mathbb{R}$ in $L^2(\Gamma, d\rho)$ and a given $V_n$ with $\dim(V_n) =: n \le m$:

i) how stable is the weighted discrete least-squares approximation of $u$ from $V_n$ using $m$ evaluations at random points?

ii) how accurate is the weighted least-squares estimator $u_W$ of $u$?

Comparison of the approximation error $\| u - u_W \|$ with the best approximation error of $u$ on $V_n$.

Least squares with evaluations at random points

The function

$$y \mapsto k_n(y) = \sum_{j=1}^n |L_j(y)|^2$$

is the diagonal of the integral kernel of the projector on $V_n$, and depends only on $V_n$ and $d\rho$.

In general we have the lower bound

$$K_n := \| k_n \|_{L^\infty(\Gamma)} \ge n.$$

First limitation: cannot address relevant situations like $\Gamma = \mathbb{R}^d$, $d\rho$ Gaussian measure on $\Gamma$ and $(L_j)_j$ Hermite polynomials, where $k_n$ is unbounded and hence $K_n = +\infty$.
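As a concrete illustration of $k_n$ and $K_n$, here is a small sketch assuming $d = 1$, the uniform measure $d\rho = dy/2$ on $[-1, 1]$ and the orthonormal Legendre basis (these choices are assumptions for the example). In that case $k_n$ peaks at the endpoints with $K_n = n^2$, while its mean under $d\rho$ equals $n$, which is where the lower bound $K_n \ge n$ comes from.

```python
import numpy as np
from numpy.polynomial.legendre import legvander, leggauss

def k_n(y, n):
    """k_n(y) = sum_j |L_j(y)|^2 for Legendre polynomials orthonormal w.r.t. dy/2."""
    L = legvander(np.atleast_1d(y), n - 1) * np.sqrt(2 * np.arange(n) + 1)
    return np.sum(L**2, axis=1)

n = 10
grid = np.linspace(-1.0, 1.0, 2001)            # includes the endpoints -1 and 1
K_n = k_n(grid, n).max()                       # sup of k_n, attained at y = -1 and y = 1

nodes, weights = leggauss(n)                   # exact for polynomials of degree <= 2n - 1
mean_k_n = np.sum(weights * k_n(nodes, n)) / 2 # integral of k_n w.r.t. d rho = dy/2
```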


Chernoff bound for random matrices [Tropp 2011]

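$G = m^{-1} \sum_{i=1}^m H(y^i)$ where $H_{jk}(y) = L_j(y) L_k(y)$.

Since $|||H||| \le K_n$ a.s., it holds

$$\Pr_\rho\big( |||G - I||| > \delta \big) \le 2n \exp\left( -\frac{m \, c(\delta)}{K_n} \right),$$

where $c(\delta) = \delta + (1 - \delta) \ln(1 - \delta) > 0$.

Choose $\delta = 1/2$, for which $c(1/2) \approx 0.153 \ge 0.15$.

For any $r > 0$, if

$$\frac{0.15}{1 + r} \, \frac{m}{\ln m} \ge K_n$$

then

$$\Pr_\rho\big( |||G - I||| > 1/2 \big) \le 2 m^{-r}.$$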
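The sampling condition is implicit in $m$, so in practice one searches for the smallest admissible $m$. A minimal sketch follows; the function names `condition_holds` and `min_samples` are hypothetical, and the search relies on $m / \ln m$ being increasing for $m \ge 3$. Under the condition, the Chernoff bound gives $2n \, m^{-(1+r)} \le 2 m^{-r}$ because $n \le K_n \le m$.

```python
import math

def condition_holds(m, K, r, c=0.15):
    """Check c/(1+r) * m/ln(m) >= K."""
    return c / (1.0 + r) * m / math.log(m) >= K

def min_samples(K, r, c=0.15):
    """Smallest integer m >= 3 satisfying the condition.

    Uses that m/ln(m) is increasing for m >= 3, so bisection applies.
    """
    hi = 3
    while not condition_holds(hi, K, r, c):
        hi *= 2
    if hi == 3:
        return 3
    lo = hi // 2                      # the previous candidate, known to violate the condition
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if condition_holds(mid, K, r, c):
            hi = mid
        else:
            lo = mid
    return hi

# Usage: number of samples required for K_n = 200 and r = 1.
m_needed = min_samples(K=200, r=1.0)
```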


Norm equivalence on $V_n$

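For some $\delta \in (0, 1)$ it holds

$$(1 - \delta) \| v \|^2 \le \| v \|_m^2 \le (1 + \delta) \| v \|^2, \qquad \forall v \in V_n.$$

For any $v \in V_n$, let $\mathbf{v} = (v_j)_j$ be the coefficients of the expansion $v = \sum_{j=1}^n v_j L_j$. Since $\| v \|_m^2 = \langle G \mathbf{v}, \mathbf{v} \rangle_{\mathbb{R}^n}$ and $\| v \|^2 = \langle \mathbf{v}, \mathbf{v} \rangle_{\mathbb{R}^n}$, the matrix $G$ satisfies

$$|||G||| = \sup_{v \in V_n \setminus \{v \equiv 0\}} \frac{\| v \|_m^2}{\| v \|^2}, \qquad |||G^{-1}||| = \sup_{v \in V_n \setminus \{v \equiv 0\}} \frac{\| v \|^2}{\| v \|_m^2}.$$

Hence, norm equivalence on $V_n$ w.h.p. iff concentration bounds

$$1 - \delta \le |||G||| \le 1 + \delta, \qquad \frac{1}{1 + \delta} \le |||G^{-1}||| \le \frac{1}{1 - \delta}, \qquad |||G - I||| \le \delta.$$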
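The step from the concentration bound to a condition-number bound can be written out explicitly. This is a short derivation, assuming $|||\cdot|||$ is the spectral norm and using that $G$ is symmetric positive semidefinite by construction:

```latex
|||G - I||| \le \delta
\;\Longrightarrow\;
1 - \delta \le \lambda_{\min}(G) \le \lambda_{\max}(G) \le 1 + \delta
\;\Longrightarrow\;
\operatorname{cond}(G) = \frac{\lambda_{\max}(G)}{\lambda_{\min}(G)} \le \frac{1 + \delta}{1 - \delta},
\qquad \text{so } \delta = \tfrac{1}{2} \text{ gives } \operatorname{cond}(G) \le 3 .
```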


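$\Gamma \subset \mathbb{R}^d$ bounded. Assume that $|u| \le \tau$ almost surely w.r.t. $d\rho$ and define

$$T_\tau(t) := \operatorname{sign}(t) \min\{\tau, |t|\}, \qquad u_T := T_\tau \circ u_W.$$

Theorem ([CCMNT-ESAIM:M2AN 2015])

In any dimension $d$, for any $r > 0$ and any $n \ge 1$, if

$$\frac{0.15}{1 + r} \, \frac{m}{\ln m} \ge K_n,$$

then it holds that

$$\Pr_\rho\big( \operatorname{cond}(G) \le 3 \big) \ge 1 - 2 m^{-r},$$

$$\Pr_\rho\Big( \| u - u_W \| \le (1 + \sqrt{2}) \inf_{v \in V_n} \| u - v \|_{L^\infty} \Big) \ge 1 - 2 m^{-r},$$

$$\mathbb{E}_\rho\big( \| u - u_T \|^2 \big) \le \left( 1 + \frac{0.6}{(1 + r) \ln m} \right) \min_{v \in V_n} \| u - v \|^2 + 8 \tau^2 m^{-r}.$$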


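Second limitation: superlinear growth of $K_n$ w.r.t. $n$.

Example: multivariate approximation with polynomials:

$$\Gamma = [-1, 1]^d, \qquad d\rho = \bigotimes_{j=1}^d (1 - y_j)^{\theta_1} (1 + y_j)^{\theta_2} \, dy_j, \qquad \theta_1, \theta_2 \ge -1/2,$$

$\Lambda \subset \mathbb{N}_0^d$ downward closed: $\nu \in \Lambda$ and $\nu' \le \nu \implies \nu' \in \Lambda$,

$$V_n = P_\Lambda := \mathrm{span}\{ y^\nu : \nu \in \Lambda \} \quad \text{with } n = \dim(P_\Lambda) = \#(\Lambda).$$

Proven upper bounds ([CCMNT-ESAIM:M2AN 2015], [M-JAT 2015])

$$K_n \le \begin{cases} n^{\ln 3 / \ln 2}, & \text{if } \theta_1 = \theta_2 = -1/2, \\ n^{2 \max\{\theta_1, \theta_2\} + 2}, & \text{if } \theta_1, \theta_2 \in \mathbb{N}_0. \end{cases}$$

Equality attained for index sets of anisotropic tensor product type.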
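The Chebyshev bound can be checked by hand on a small tensor-product set. The sketch below assumes $\theta_1 = \theta_2 = -1/2$ with the orthonormal Chebyshev basis ($\phi_0 = 1$, $\phi_k = \sqrt{2}\, T_k$ for $k \ge 1$) and the index set $\Lambda = \{0, 1\}^d$, chosen here only as an example; since $|T_k| \le 1$ with equality at $y = 1$, the supremum of $k_n$ is attained at the corner $(1, \ldots, 1)$. The function name `K_n_chebyshev` is hypothetical.

```python
import math
from itertools import product

def K_n_chebyshev(Lambda):
    """K_n for the orthonormal tensorized Chebyshev basis indexed by Lambda.

    At y = (1, ..., 1): |phi_0|^2 = 1 and |phi_k|^2 = 2 for k >= 1, so each
    multi-index nu contributes 2 ** (number of nonzero entries of nu).
    """
    total = 0.0
    for nu in Lambda:
        term = 1.0
        for k in nu:
            term *= 1.0 if k == 0 else 2.0
        total += term
    return total

d = 5
Lambda = list(product(range(2), repeat=d))   # {0,1}^d, a downward closed set
n = len(Lambda)                              # n = 2^d
K = K_n_chebyshev(Lambda)                    # equals 3^d
bound = n ** (math.log(3) / math.log(2))     # n^{ln 3 / ln 2}
```

Here $n = 2^d$ and $K_n = 3^d = n^{\ln 3 / \ln 2}$, so the bound holds with equality for this particular set.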

Weighted least squares with evaluations at random points

Two "limitations": superlinear growth of $K_n$ w.r.t. $n$ and $\Gamma$ bounded.

How to circumvent them?

Back to the general setting: $\Gamma \subseteq \mathbb{R}^d$, $(L_j)_j$ orthonormal basis in $L^2(\Gamma, d\rho)$.

$$k_{n,w}(y) := w(y) \, k_n(y) = w(y) \sum_{j=1}^n |L_j(y)|^2,$$

$$K_{n,w} := \| k_{n,w} \|_{L^\infty(\Gamma)} \ge n.$$

Pros: freedom of choice for $w \ge 0$ (only need $\int_\Gamma w^{-1} \, d\rho = 1$).


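$\Gamma \subseteq \mathbb{R}^d$. Assume that $|u| \le \tau$ almost surely w.r.t. $d\rho$ and define

$$T_\tau(t) := \operatorname{sign}(t) \min\{\tau, |t|\}, \qquad u_T := T_\tau \circ u_W,$$

$$u_C := \begin{cases} u_W, & \text{if } \operatorname{cond}(G) < 3, \\ 0, & \text{otherwise.} \end{cases}$$

Theorem ([CM-SMAI JCM 2017])

In any dimension $d$, for any $r > 0$ and any $n \ge 1$, if

$$\frac{0.15}{1 + r} \, \frac{m}{\ln m} \ge K_{n,w},$$

then it holds that

$$\Pr_\mu\big( \operatorname{cond}(G) \le 3 \big) \ge 1 - 2 m^{-r},$$

$$\Pr_\mu\Big( \| u - u_W \| \le (1 + \sqrt{2}) \inf_{v \in V_n} \| u - v \|_{L^\infty} \Big) \ge 1 - 2 m^{-r},$$

$$\mathbb{E}_\mu\big( \| u - u_T \|^2 \big) \le \left( 1 + \frac{0.6}{(1 + r) \ln m} \right) \min_{v \in V_n} \| u - v \|^2 + 8 \tau^2 m^{-r},$$

$$\mathbb{E}_\mu\big( \| u - u_C \|^2 \big) \le \left( 1 + \frac{0.6}{(1 + r) \ln m} \right) \min_{v \in V_n} \| u - v \|^2 + 2 \| u \|^2 m^{-r}.$$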
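Both modified estimators are simple post-processing steps on $u_W$. A minimal sketch, assuming the coefficient vector `beta` and Gram matrix `G` are already available (for instance from solving the normal equations); the function names are hypothetical and the condition number is taken in the spectral norm.

```python
import numpy as np

def truncate(t, tau):
    """T_tau(t) = sign(t) * min(tau, |t|), applied elementwise to values of u_W."""
    return np.sign(t) * np.minimum(tau, np.abs(t))

def conditioned_coefficients(G, beta):
    """Coefficients of u_C: those of u_W if cond(G) < 3, and zero otherwise."""
    if np.linalg.cond(G) < 3:
        return beta
    return np.zeros_like(beta)

# Usage: truncate estimator values at level tau = 1.
values = truncate(np.array([-3.0, 0.5, 2.0]), tau=1.0)
```

Unlike $u_T$, the estimator $u_C$ needs no a priori bound $\tau$ on $u$, only the computable quantity $\operatorname{cond}(G)$.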


Optimal weighted least squares

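Choose the weight function as

$$w = \frac{n}{k_n} = \frac{n}{\sum_{j=1}^n |L_j|^2},$$

and thus

$$d\mu = w^{-1} d\rho = \frac{\sum_{j=1}^n |L_j|^2}{n} \, d\rho =: d\mu_n.$$

$$k_{n,w} \equiv n \implies K_{n,w} = n.$$

In general $d\mu_n$ is not a product measure on $\Gamma$.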
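The two defining properties of the optimal weight can be verified numerically. The sketch below assumes $d = 1$, $d\rho = dy/2$ on $[-1, 1]$ and the orthonormal Legendre basis (assumptions made for the example only), and checks that $\int_\Gamma w^{-1} d\rho = 1$ and that $w \, k_n \equiv n$ on a grid.

```python
import numpy as np
from numpy.polynomial.legendre import legvander, leggauss

def k_n(y, n):
    """k_n(y) = sum_j |L_j(y)|^2 for Legendre polynomials orthonormal w.r.t. dy/2."""
    L = legvander(np.atleast_1d(y), n - 1) * np.sqrt(2 * np.arange(n) + 1)
    return np.sum(L**2, axis=1)

def optimal_weight(y, n):
    """w = n / k_n, the weight associated with the sampling measure mu_n."""
    return n / k_n(y, n)

n = 8
nodes, quad = leggauss(n)                                  # exact for degree <= 2n - 1
mass = np.sum(quad / optimal_weight(nodes, n)) / 2.0       # integral of w^{-1} d rho
grid = np.linspace(-1.0, 1.0, 501)
K_nw = np.max(optimal_weight(grid, n) * k_n(grid, n))      # sup of k_{n,w} on the grid
```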


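From the previous theorem, using the optimal choice of $w$ we obtain:

Corollary ([CM-SMAI JCM 2017])

In any dimension $d$, for any $r > 0$ and any $n \ge 1$, if

$$\frac{0.15}{1 + r} \, \frac{m}{\ln m} \ge n,$$

then it holds that

$$\Pr_\mu\big( \operatorname{cond}(G) \le 3 \big) \ge 1 - 2 m^{-r},$$

$$\Pr_\mu\Big( \| u - u_W \| \le (1 + \sqrt{2}) \inf_{v \in V_n} \| u - v \|_{L^\infty} \Big) \ge 1 - 2 m^{-r},$$

$$\mathbb{E}_\mu\big( \| u - u_T \|^2 \big) \le \left( 1 + \frac{0.6}{(1 + r) \ln m} \right) \min_{v \in V_n} \| u - v \|^2 + 8 \tau^2 m^{-r},$$

$$\mathbb{E}_\mu\big( \| u - u_C \|^2 \big) \le \left( 1 + \frac{0.6}{(1 + r) \ln m} \right) \min_{v \in V_n} \| u - v \|^2 + 2 \| u \|^2 m^{-r}.$$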

Sampling algorithms for the optimal density

Multivariate polynomial approximation

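We use orthogonal polynomials, orthonormalized in $L^2(\Gamma, d\rho)$.

Assume $\Gamma$ has a Cartesian structure, e.g. $\Gamma = [-1, 1]^d$ or $\Gamma = \mathbb{R}^d$. Given univariate orthonormal polynomials $(\phi_k)_{k \ge 0}$ and a multi-index set $\Lambda \subset \mathbb{N}_0^d$, for any $\nu \in \Lambda$ we define

$$L_\nu(y) := \prod_{i=1}^d \phi_{\nu_i}(y_i), \qquad y \in \Gamma,$$

$$P_\Lambda := \mathrm{span}\{ L_\nu : \nu \in \Lambda \}, \quad \text{with } \dim(P_\Lambda) = \#(\Lambda).$$

Then choose the approximation space as $V_n = P_\Lambda$.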


Connections with equilibrium measure

In some specific settings $d\mu_n$ converges in the weak-star sense to the equilibrium measure $d\mu^*$.

Example: choose the uniform measure on $\Gamma = [-1, 1]$ and $P_k = \mathrm{span}\{ y^j : 0 \le j \le k - 1 \}$. Then

$$d\mu_n \xrightarrow[n \to \infty]{} d\mu^* = \frac{1}{2 \pi \sqrt{1 - y^2}} \, d\lambda.$$

Whenever asymptotic equivalences are available,

$$c \, d\mu^* \le d\mu_n \le C \, d\mu^*,$$

the previous results on stability and accuracy carry over by choosing $w$ such that $d\mu = d\mu^*$, but under the more demanding condition

$$\frac{0.15}{1 + r} \, \frac{c}{C} \, \frac{m}{\ln m} \ge n.$$


How to sample efficiently from the optimal density?

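Algorithm 1: Sequential conditional sampling for $\mu_n$.

INPUT: $m$, $d$, $\Lambda$, $\rho_i$, $(\phi_j)_{j \ge 0}$ for $i = 1, \ldots, d$.

OUTPUT: $y^1, \ldots, y^m \overset{\text{i.i.d.}}{\sim} \mu_n$.

for $k = 1$ to $m$ do

    $\alpha_\nu \leftarrow (\#(\Lambda))^{-1}$, for any $\nu \in \Lambda$.

    Sample $y_1^k$ from $t \mapsto \varphi_1(t) = \rho_1(t) \sum_{\nu \in \Lambda} \alpha_\nu |\phi_{\nu_1}(t)|^2$.

    for $q = 2$ to $d$ do

        $\alpha_\nu \leftarrow \dfrac{\prod_{j=1}^{q-1} |\phi_{\nu_j}(y_j^k)|^2}{\sum_{\tilde{\nu} \in \Lambda} \prod_{j=1}^{q-1} |\phi_{\tilde{\nu}_j}(y_j^k)|^2}$, for any $\nu \in \Lambda$.

        Sample $y_q^k$ from $t \mapsto \varphi_q(t) = \rho_q(t) \sum_{\nu \in \Lambda} \alpha_\nu |\phi_{\nu_q}(t)|^2$.

    end for

    $y^k \leftarrow (y_1^k, \ldots, y_d^k)$.

end for

Overall computational cost of generating $m$ independent samples from $\mu_n$ is linear in both $d$ and $m$.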
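The steps of Algorithm 1 can be sketched in code. This is an illustration, not the authors' implementation. It assumes each $\rho_i$ is the Chebyshev (arcsine) measure on $[-1, 1]$ with orthonormal Chebyshev polynomials, and it draws from each univariate mixture by first picking a component $\nu$ with probability $\alpha_\nu$ and then using rejection sampling with the bound $|\phi_k|^2 \le 2$. The rejection step and all function names are choices made here; the slides do not specify how the univariate densities are sampled.

```python
import numpy as np

def phi_sq(k, t):
    """|phi_k(t)|^2 for Chebyshev polynomials orthonormal w.r.t. dt/(pi*sqrt(1-t^2))."""
    k = np.asarray(k)
    return np.where(k == 0, 1.0, 2.0 * np.cos(k * np.arccos(t)) ** 2)

def sample_component(k, rng):
    """Draw t with density |phi_k|^2 rho by rejection from rho, using |phi_k|^2 <= 2."""
    while True:
        t = np.cos(np.pi * rng.random())          # t ~ rho (arcsine law on [-1, 1])
        if 2.0 * rng.random() <= phi_sq(k, t):
            return t

def sample_optimal(m, Lambda, rng):
    """Return an (m, d) array of i.i.d. draws from mu_n for the index set Lambda."""
    Lambda = np.asarray(Lambda)
    n, d = Lambda.shape
    Y = np.empty((m, d))
    for k in range(m):
        alpha = np.full(n, 1.0 / n)               # alpha_nu = 1 / #(Lambda)
        for q in range(d):
            nu = rng.choice(n, p=alpha)           # pick one mixture component
            Y[k, q] = sample_component(Lambda[nu, q], rng)
            # New alpha_nu is proportional to prod_{j<=q} |phi_{nu_j}(y_j)|^2.
            alpha = alpha * phi_sq(Lambda[:, q], Y[k, q])
            alpha = alpha / alpha.sum()
    return Y

# Usage with a small downward closed set in N_0^2 (chosen only for illustration).
rng = np.random.default_rng(0)
Lambda = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0)]
Y = sample_optimal(2000, Lambda, rng)
```

A quick sanity check on the output is that the empirical mean of $w = n / k_n$ over the samples is close to $\int_\Gamma w \, d\mu_n = 1$.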


Pr{cond(G) ≤ 3}, d = 1: weighted LS vs LS

[Figure: grid of panels. Columns: dρ uniform measure, dρ Gaussian measure, dρ Chebyshev measure. Rows: weighted LS, LS. Axes: n and m/ln m.]


Pr{cond(G) ≤ 3}, d = 10: weighted LS vs LS

[Figure: grid of panels. Columns: dρ uniform measure, dρ Gaussian measure, dρ Chebyshev measure. Rows: weighted LS, LS. Axes: n and m/ln m.]


method        dρ          d = 1   d = 2   d = 5   d = 10   d = 50   d = 100
weighted LS   uniform     1       1       1       1        1        1
weighted LS   Gaussian    1       1       1       1        1        1
weighted LS   Chebyshev   1       1       1       1        1        1
standard LS   uniform     0       0       0.54    1        1        1
standard LS   Gaussian    0       0       0       0        0        0
standard LS   Chebyshev   1       1       1       1        1        1

Table: Pr{cond(G) ≤ 3}, with m = 26559 and n = 200.

method        dρ          d = 1     d = 2     d = 5     d = 10    d = 50    d = 100
weighted LS   uniform     1.5593    1.4989    1.4407    1.4320    1.4535    1.4179
weighted LS   Gaussian    1.5994    1.5698    1.4743    1.4643    1.4676    1.4237
weighted LS   Chebyshev   1.5364    1.4894    1.4694    1.4105    1.4143    1.4216
standard LS   uniform     19.9584   29.8920   3.0847    1.9555    1.7228    1.5862
standard LS   Gaussian    ∼10¹⁹     ∼10¹⁹     ∼10¹⁹     ∼10¹⁶     ∼10⁹      ∼10³
standard LS   Chebyshev   1.5574    1.5367    1.5357    1.4752    1.4499    1.4625

Table: Average of cond(G), with m = 26559 and n = 200.


Conclusions - analysis weighted least squares

RANDOM POINTS: analysis w.r.t. $m$, $n$, $d$, $d\rho$ and the smoothness of $u$: in any dimension $d$, with any measure $d\rho$ (e.g. Jacobi or Gaussian), proven stability and accuracy w.h.p. and in expectation provided that

$$\frac{m}{\ln m} \ge C n = C \dim(V_n), \quad \text{with } C \text{ independent of } d.$$

However:

• results are for a given approximation space; adaptivity could be an issue.

• we have developed efficient algorithms for sampling the optimal density $\mu_n$, but they require $d\rho$ to be a product measure.


Thank you for your attention!


Some references

A. Cohen, G. Migliorati: Optimal weighted least-squares methods, SMAI Journal of Computational Mathematics, 2017.

A. Chkifa, A. Cohen, G. Migliorati, F. Nobile, R. Tempone: Discrete least squares polynomial approximation with random evaluations; application to parametric and stochastic elliptic PDEs, ESAIM: M2AN, 2015.

G. Migliorati: Multivariate Markov-type and Nikolskii-type inequalities for polynomials associated with downward closed multi-index sets, J. Approximation Theory, 2015.

G. Migliorati, F. Nobile, R. Tempone: Convergence estimates in probability and in expectation for discrete least squares with noisy evaluations at random points, J. Multivariate Analysis, 2015.

A. Cohen, G. Migliorati, F. Nobile: Discrete least-squares approximations over optimized downward closed polynomial spaces in arbitrary dimension, Constructive Approximation, 2017.