Studying convergence of gradient algorithms via optimal experimental design theory
In this paper we study the convergence properties of Nesterov's family of inertial schemes, a specific case of the inertial gradient descent algorithm, in the context of a
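The excerpt above refers to Nesterov-type inertial schemes. As a point of reference, here is a minimal sketch of the classical Nesterov accelerated gradient iteration with the standard momentum sequence (k - 1)/(k + 2); the paper's family may use other momentum sequences, and the function and step size below are our own illustrative choices.

```python
import numpy as np

def nesterov_inertial_gd(grad, x0, step, n_iters=100):
    """Inertial (Nesterov-type) gradient descent sketch.

    grad : callable returning the gradient of the objective at a point.
    The momentum coefficient (k - 1) / (k + 2) is the classical
    Nesterov choice; other inertial sequences fit the same template.
    """
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()
    for k in range(1, n_iters + 1):
        momentum = (k - 1) / (k + 2)
        y = x + momentum * (x - x_prev)    # extrapolation (inertial) step
        x_prev, x = x, y - step * grad(y)  # gradient step at the extrapolated point
    return x

# Example: minimize f(x) = 0.5 * ||x||^2, whose gradient is x.
x_star = nesterov_inertial_gd(lambda x: x, x0=[1.0, -2.0], step=0.1, n_iters=200)
```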
The two contexts (globally and locally convex objective) are introduced in Section 3, together with the rate of convergence in quadratic mean and the asymptotic normality of the
Rates for time-discretization seem to be unknown except in some special cases (see below), and a fortiori no rates are known when the basic process is
We show the convergence and obtain convergence rates for this algorithm, then extend our method to the Roothaan and Level-Shifting algorithms, using an auxiliary energy
Finally, we apply our analysis beyond the supervised learning setting to obtain convergence rates for the averaging process (a.k.a. gossip algorithm) on a graph depending on
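The averaging (gossip) process mentioned in the excerpt above can be sketched as pairwise randomized averaging on a graph; the edge-sampling scheme, graph, and values below are our own illustrative assumptions, not the paper's setting.

```python
import numpy as np

def gossip_average(values, edges, n_rounds=500, rng=None):
    """Pairwise gossip averaging sketch on a graph.

    At each round a uniformly random edge (i, j) is drawn and both
    endpoints replace their values by the pair's average; all values
    converge to the global mean, at a rate governed by the graph's
    spectral gap.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(values, dtype=float).copy()
    for _ in range(n_rounds):
        i, j = edges[rng.integers(len(edges))]
        x[i] = x[j] = 0.5 * (x[i] + x[j])  # local averaging preserves the sum
    return x

# Example: a 4-cycle graph; values converge toward the global mean 2.5.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
x = gossip_average([1.0, 2.0, 3.0, 4.0], edges, n_rounds=500, rng=0)
```

Note that each local update preserves the global sum exactly, so the consensus value is the initial mean.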
Similar results hold in the partially 1-homogeneous setting, which covers the lifted problems of Section 2.1 when φ is bounded (e.g., sparse deconvolution and neural networks
Once it is proved that this (convex) functional is monotonically decreasing in time (a property that justifies the name given to the functional, by analogy with physical entropy), if some
With suitable assumptions on the function ℓ, the random case can be treated with well-known stochastic approximation results [1, 5]. Various theoretical works [2, 3, 6] indicate
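The stochastic approximation results alluded to above are of Robbins-Monro type. A minimal sketch, assuming an unbiased noisy gradient of ℓ and the classical step sizes a/(k+1) (which satisfy Σ step = ∞ and Σ step² < ∞); the quadratic objective and noise model below are our own illustrative choices.

```python
import numpy as np

def robbins_monro(noisy_grad, x0, n_iters=5000, a=1.0, rng=None):
    """Robbins-Monro stochastic approximation sketch.

    Iterates x_{k+1} = x_k - a/(k+1) * g_k, where g_k is an unbiased
    noisy estimate of the gradient of ell at x_k.
    """
    rng = np.random.default_rng(rng)
    x = float(x0)
    for k in range(n_iters):
        g = noisy_grad(x, rng)     # unbiased noisy gradient estimate
        x -= a / (k + 1) * g       # decreasing step size
    return x

# Example: ell(x) = 0.5 * (x - 3)^2 observed through additive Gaussian noise.
noisy_grad = lambda x, rng: (x - 3.0) + rng.normal(scale=0.1)
x_hat = robbins_monro(noisy_grad, x0=0.0, rng=0)
```

With these step sizes the iterate averages out the noise, so x_hat approaches the minimizer 3.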