Calculating Gauss-Newton Matrix-Vector Product by Vector-Jacobian Products

Pengrui Quan UCLA

November 5, 2019

1 Introduction

In a Newton method for training deep neural networks, a Gauss-Newton matrix-vector product is computed at each conjugate gradient step. Here we discuss our implementation using two vector-Jacobian products, a type of operation that is commonly available in packages such as TensorFlow. Our derivation follows the description at https://j-towns.github.io/2017/06/12/A-new-trick.html [2], though we use the notation of Wang et al. [3].

2 Deriving the right multiplication of a Jacobian matrix

The key step in calculating $Gv$ is to derive the right multiplication of a Jacobian matrix.

In most deep learning libraries, e.g. TensorFlow and PyTorch, the left multiplication of a Jacobian matrix, denoted as $v^T J_\theta$, is well established. We can make use of the technique explained below to develop the right multiplication of the Jacobian matrix, $J_\theta v$.

First we denote $f(\theta) : \mathbb{R}^n \to \mathbb{R}^m$ with parameters $\theta \in \mathbb{R}^n$. Let $v \in \mathbb{R}^n$ be the right-multiplying vector and $u \in \mathbb{R}^m$ be a dummy variable.

The Jacobian matrix of $f$ with respect to $\theta$ is

$$
J_\theta =
\begin{pmatrix}
\frac{\partial f_1}{\partial \theta_1} & \frac{\partial f_1}{\partial \theta_2} & \frac{\partial f_1}{\partial \theta_3} & \cdots & \frac{\partial f_1}{\partial \theta_n} \\
\frac{\partial f_2}{\partial \theta_1} & \frac{\partial f_2}{\partial \theta_2} & \frac{\partial f_2}{\partial \theta_3} & \cdots & \frac{\partial f_2}{\partial \theta_n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial \theta_1} & \frac{\partial f_m}{\partial \theta_2} & \frac{\partial f_m}{\partial \theta_3} & \cdots & \frac{\partial f_m}{\partial \theta_n}
\end{pmatrix} \tag{1}
$$

The left multiplication of the Jacobian matrix can be calculated by backpropagation, which is standard in reverse-mode automatic differentiation packages [1, p. 12]:

$$
u^T \frac{\partial f}{\partial \theta^T} = u^T J_\theta \tag{2}
$$
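In PyTorch, for instance, this left multiplication is exactly what `torch.autograd.grad` returns when the vector $u$ is supplied as `grad_outputs`. A minimal sketch, with all names our own rather than from the paper:

```python
import torch

# Toy vector-valued function f : R^5 -> R^3 of the parameters theta.
theta = torch.randn(5, requires_grad=True)
A = torch.randn(3, 5)
y = torch.sin(A @ theta)  # f(theta)

u = torch.randn(3)  # left-multiplying vector u in R^m
# One reverse-mode pass computes u^T J_theta, as in equation (2).
uTJ = torch.autograd.grad(y, theta, grad_outputs=u)[0]  # shape (5,)
```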


To use (2) for calculating $J_\theta v$, we define $g(u) = u^T J_\theta$. Since $u^T J_\theta$ is a vector in $\mathbb{R}^n$, $g$ is a mapping $g : \mathbb{R}^m \to \mathbb{R}^n$. Hence, we can take the derivative of $g(u)$ with respect to $u$, providing $v$ as the left-multiplying vector:

$$
\begin{aligned}
v^T \frac{\partial g}{\partial u} &= v^T \frac{\partial (u^T J_\theta)}{\partial u} && (3) \\
&= v^T J_\theta^T && (4) \\
&= (J_\theta v)^T && (5)
\end{aligned}
$$

In a practical implementation, $u$ can be any dummy vector, such as the vector of all ones, because $g(u)$ is linear in $u$.
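Putting equations (2)-(5) together, a minimal PyTorch sketch of the trick might look as follows (the function and variable names are our own):

```python
import torch

def jacobian_vector_product(f, theta, v):
    """Compute J_theta v via two vector-Jacobian products (eqs. (2)-(5))."""
    theta = theta.detach().requires_grad_(True)
    y = f(theta)
    # Dummy vector u; its value is irrelevant since g(u) = u^T J_theta
    # is linear in u, so the vector of all ones works.
    u = torch.ones_like(y, requires_grad=True)
    # First VJP, kept differentiable: g(u) = u^T J_theta, a vector in R^n.
    g = torch.autograd.grad(y, theta, grad_outputs=u, create_graph=True)[0]
    # Second VJP: v^T (dg/du) = v^T J_theta^T = (J_theta v)^T.
    return torch.autograd.grad(g, u, grad_outputs=v)[0]

# Example: f : R^3 -> R^2
W = torch.randn(2, 3)
f = lambda t: torch.tanh(W @ t)
print(jacobian_vector_product(f, torch.randn(3), torch.randn(3)))
```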

3 Deriving the Gauss-Newton matrix-vector product Gv

Following the notation of [3], the loss function is defined as $\xi(z^{L+1}(\theta))$, where $z^{L+1}$ is the pre-softmax output of the network, and the Gauss-Newton matrix is

$$
G = J^T B J \tag{6}
$$

where $J = \partial z^{L+1} / \partial \theta^T$ is the Jacobian of $z^{L+1}$ with respect to $\theta$ and $B$ is the Hessian of $\xi$ with respect to $z^{L+1}$.

We first take the derivative of the loss $\xi$ with respect to $z^{L+1}$ to obtain $\partial \xi / \partial z^{L+1}$.

Then we calculate the right multiplication by $v$ of the Jacobian matrix of the vector $\partial \xi / \partial z^{L+1}$, using the technique from Section 2 with $\partial \xi / \partial z^{L+1}$ substituted for $f(\theta)$:

$$
\begin{aligned}
\frac{\partial (\partial \xi / \partial z^{L+1})}{\partial \theta^T} \, v &= \frac{\partial^2 \xi}{\partial (z^{L+1})^T \, \partial z^{L+1}} \cdot \frac{\partial z^{L+1}}{\partial \theta^T} \, v && (7) \\
&= B J v && (8)
\end{aligned}
$$

Finally we calculate $Gv$ by the left multiplication of a Jacobian matrix, treating $BJv$ as the left-multiplying vector $u$ and $z^{L+1}$ as $f(\theta)$ in equation (2):

$$
\begin{aligned}
(BJv)^T \frac{\partial z^{L+1}}{\partial \theta^T} &= (BJv)^T J && (9) \\
&= v^T J^T B J && (10) \\
&= (Gv)^T && (11)
\end{aligned}
$$

Equation (10) uses the symmetry of the Hessian $B$.
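The two steps combine into the following PyTorch sketch of the full product $Gv$ (a sketch under our own names; `net_fn(theta)` plays the role of $z^{L+1}$ and `loss_fn` the role of $\xi$):

```python
import torch

def gauss_newton_vector_product(loss_fn, net_fn, theta, v):
    """Compute Gv = J^T B J v following equations (7)-(11)."""
    theta = theta.detach().requires_grad_(True)
    z = net_fn(theta)   # pre-softmax output z^{L+1}(theta)
    xi = loss_fn(z)     # scalar loss
    # d(xi)/dz, kept differentiable for the second-order step.
    dxi_dz = torch.autograd.grad(xi, z, create_graph=True)[0]
    # Right-multiply the Jacobian of dxi_dz by v with the double-VJP trick
    # of Section 2: since xi depends on theta only through z, that Jacobian
    # is B J, so this yields B J v (equations (7)-(8)).
    u = torch.ones_like(dxi_dz, requires_grad=True)
    g = torch.autograd.grad(dxi_dz, theta, grad_outputs=u, create_graph=True)[0]
    BJv = torch.autograd.grad(g, u, grad_outputs=v, retain_graph=True)[0]
    # One ordinary VJP gives (BJv)^T J = (Gv)^T (equations (9)-(11)).
    return torch.autograd.grad(z, theta, grad_outputs=BJv)[0]

# Example: a linear model with cross-entropy loss (hypothetical setup).
x = torch.randn(4)                        # fixed input
net_fn = lambda t: t.view(3, 4) @ x       # z^{L+1} in R^3
target = torch.tensor([1])
loss_fn = lambda z: torch.nn.functional.cross_entropy(z.unsqueeze(0), target)
print(gauss_newton_vector_product(loss_fn, net_fn, torch.randn(12), torch.randn(12)))
```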

References

[1] Atilim Gunes Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, and Jeffrey Mark Siskind. Automatic differentiation in machine learning: a survey. Journal of Machine Learning Research, 18(153):1–43, 2018.

[2] Jamie Townsend. A new trick for calculating Jacobian vector products. https://j-towns.github.io/2017/06/12/A-new-trick.html, 2017.

[3] Chien-Chih Wang, Kent Loong Tan, and Chih-Jen Lin. Newton methods for convolutional neural networks. ACM Transactions on Intelligent Systems and Technology, 2019. To appear.

