Studying convergence of gradient algorithms via optimal experimental design theory


Academic year: 2021


Full text


Table 1.2. Values of ξ*(m), Φ(M(ξ*)) and v(ξ*)

|           | 0 < ε ≤ 2m/(m+M) | 2m/(m+M) ≤ ε ≤ 2M/(m+M)  | ε ≥ 2M/(m+M) |
|-----------|------------------|--------------------------|--------------|
| ξ*(m)     | 1                | (2M − ε(M+m)) / (2(M−m)) | 0            |
| Φ(M(ξ*))  | m²(ε − 1)        | ε²(m+M)²/4 − εmM         | M²(ε − 1)    |
| v(ξ*)     | (1 − ε)²         | R_max                    | (ε − 1)²     |
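The regimes in Table 1.2 can be checked numerically with a minimal sketch of steepest descent with relaxation on a quadratic with spectrum in [m, M]; the dimension, spectrum and relaxation value below are illustrative assumptions, and the empirical one-step rate should approach (1 − ε)² when ε lies below 2m/(m+M).

```python
import numpy as np

# Steepest descent with relaxation on f(x) = 0.5 * x'Ax: the exact
# line-search step is scaled by a relaxation factor eps. For
# eps <= 2m/(m+M), Table 1.2 predicts the asymptotic rate (1 - eps)^2.

def relaxed_steepest_descent(A, x0, eps, n_iter=500):
    """Return the sequence of f-values for steepest descent with relaxation."""
    x = x0.copy()
    fvals = []
    for _ in range(n_iter):
        g = A @ x                              # gradient of 0.5 * x'Ax
        step = eps * (g @ g) / (g @ A @ g)     # relaxed exact line-search step
        x = x - step * g
        fvals.append(0.5 * x @ A @ x)
    return np.array(fvals)

rng = np.random.default_rng(0)
m, M = 1.0, 10.0
A = np.diag(np.linspace(m, M, 6))              # illustrative spectrum in [m, M]
x0 = rng.standard_normal(6)

eps = 0.15                                     # below 2m/(m+M) = 2/11
fv = relaxed_steepest_descent(A, x0, eps)
rate = fv[-1] / fv[-2]                         # empirical one-step rate
print(rate, (1 - eps) ** 2)                    # both close to 0.7225
```

The iterates align with the eigendirection of m (the slowest-contracting component for small ε), so the observed rate matches the one-point design in the first column of the table.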
Fig. 1.1. Asymptotic rate of convergence as a function of ε for the steepest-descent algorithm with relaxation ε
Fig. 1.2. Asymptotic rate of convergence as a function of ̺ for steepest descent with relaxation coefficients ε = 0.97 and ε = 0.99
Fig. 1.4. Log-rates − log(v(ξ_k)) (750 < k ≤ 1000) for steepest descent with relaxation; varying ε, ̺ = 10

Related documents

In this paper we study the convergence properties of Nesterov's family of inertial schemes, a specific case of the inertial gradient descent algorithm, in the context of a…
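A Nesterov-type inertial scheme of the kind mentioned in that excerpt can be sketched on a convex quadratic; the momentum rule t_{k+1} = (1 + √(1 + 4t_k²))/2 below is the classical choice and an illustrative assumption, not taken from this page.

```python
import numpy as np

# Minimal sketch of a Nesterov-type inertial gradient scheme on
# f(x) = 0.5 * x'Ax with gradient step 1/L. The momentum sequence
# t_k below is the classical update and is an illustrative assumption.

def nesterov(A, x0, n_iter=200):
    L = np.linalg.eigvalsh(A).max()            # Lipschitz constant of the gradient
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(n_iter):
        x_next = y - (A @ y) / L               # gradient step from extrapolated point
        t_next = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x_next + (t - 1) / t_next * (x_next - x)   # inertial extrapolation
        x, t = x_next, t_next
    return x

A = np.diag([1.0, 5.0, 25.0])                  # illustrative quadratic
x = nesterov(A, np.ones(3))
print(np.linalg.norm(x))                       # close to 0, the minimizer
```

The standard O(1/k²) guarantee on function values bounds the distance to the minimizer here, since the quadratic is strongly convex.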

The two contexts (globally and locally convex objective) are introduced in Section 3, as well as the rate of convergence in quadratic mean and the asymptotic normality of the…

Rates for time-discretization seem to be unknown except in some special cases (see below), and a fortiori no rates are known when the basic process is…

We show the convergence and obtain convergence rates for this algorithm, then extend our method to the Roothaan and Level-Shifting algorithms, using an auxiliary energy…

Finally, we apply our analysis beyond the supervised learning setting to obtain convergence rates for the averaging process (a.k.a. gossip algorithm) on a graph depending on…
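The averaging (gossip) process mentioned in that excerpt can be sketched as synchronous neighbor averaging with a doubly stochastic mixing matrix; the cycle topology and the 1/2–1/4–1/4 weights below are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of the synchronous averaging (gossip) process on a
# cycle graph: each node replaces its value by a convex combination of
# its own value and its two neighbors'. The cycle topology and the
# 1/2-1/4-1/4 weights are illustrative assumptions.

n = 10
W = np.zeros((n, n))                   # doubly stochastic mixing matrix
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

rng = np.random.default_rng(1)
x = rng.standard_normal(n)
mean = x.mean()                        # each gossip round preserves the mean

for _ in range(500):
    x = W @ x                          # one synchronous averaging round

print(np.abs(x - mean).max())          # all nodes close to the global mean
```

The convergence rate is governed by the second-largest eigenvalue of W, which is where the graph-dependent rates alluded to above enter.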

Similar results hold in the partially 1-homogeneous setting, which covers the lifted problems of Section 2.1 when φ is bounded (e.g., sparse deconvolution and neural networks…

Once proved that this (convex) functional is monotone decreasing in time (this property justifies the name given to the functional, by analogy with the physical entropy), if some…

With suitable assumptions on the function ℓ, the random case can be treated with well-known stochastic approximation results [1, 5]. Various theoretical works [2, 3, 6] indicate…
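The stochastic approximation setting referred to in that excerpt can be sketched with a Robbins–Monro iteration using steps γ_k = 1/k on noisy gradients; the quadratic objective, target value and noise level below are illustrative assumptions.

```python
import numpy as np

# Minimal Robbins-Monro sketch: x_{k+1} = x_k - gamma_k * (grad f(x_k) + noise)
# with gamma_k = 1/k, here on f(x) = 0.5 * (x - 3)^2. The target value 3
# and the noise scale are illustrative assumptions.

rng = np.random.default_rng(0)
x, target = 0.0, 3.0
for k in range(1, 20001):
    noisy_grad = (x - target) + rng.normal(scale=0.5)
    x -= noisy_grad / k                # Robbins-Monro step gamma_k = 1/k
print(x)                               # close to the root x* = 3
```

The 1/k steps satisfy the classical conditions Σγ_k = ∞ and Σγ_k² < ∞, so the iterate averages out the noise and converges to the root of the expected gradient.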