Calculating Gauss-Newton Matrix-Vector Product by Vector-Jacobian Products

(1)

Calculating Gauss-Newton Matrix-Vector Product by Vector-Jacobian Products

Pengrui Quan UCLA

November 5, 2019

1 Introduction

In a Newton method for training deep neural networks, at each conjugate gradient step a Gauss-Newton matrix-vector product is conducted. Here we discuss our implementation by using two vector-Jacobian products, a type of operations that are commonly available in packages like Tensorflow. Our derivation follows from the description at https://j-towns.

github.io/2017/06/12/A-new-trick.html [2], though we follow the notation in Wang et al [3].

2 Deriving right multiplication of a Jacobian matrix

The key procedure of calculatingGvis to derive the right multiplication of Jacobian matrices.

In most deep learning libraries, the product of left multiplication of a Jacobian matrix, denoted asv^TJ_θ, is well established, i.e. Tensorflow, PyTorch. We can make use of techniques explained below to develop the right multiplication of the Jacobian matrix J_θv.

First we denote f(θ) : Rⁿ → R^m with parameters θ ∈ Rⁿ. Let v ∈ Rⁿ be the right multiplication vector andu ∈R^m be a dummy variable.

The Jacobian matrix of f with respect to θ is:

J_θ =







∂f1

∂θ1

∂f1

∂θ2

∂f1

∂θ3 . . . _∂θ^∂f¹

∂f2 n

∂θ1

∂f2

∂θ2

∂f2

∂θ3 . . . _∂θ^∂f² .. n

. ... ... . .. ...

∂fm

∂θ1

∂fm

∂θ2

∂fm

∂θ3 . . . ^∂f_∂θ^m

n







(1)

The left multiplication of the Jacobian matrix can be calculated by back propagation, which is common in reverse mode automatic differentiation packages [1, p. 12]:

u^T ∂f

∂θ^T =u^TJ_θ (2)

1

(2)

To use (2) for calculating J_θv, we define g(u) = u^TJ_θ. Sinceu^TJ_θ is a vector in Rⁿ, the mapping of v can be defined as a function g(u) =u^TJ_θ, whereg(u) :R^m →Rⁿ. Hence, we can take derivative ofg(u) with respect to u, while providing the left multiplying vectorv:

v^T ∂g

∂u =v^T∂(u^TJθ)

∂u (3)

=v^TJθT

(4)

= (J_θv)^T (5)

In practical implementation, u can be any dummy vector such as the vector of all ones.

3 Deriving Gauss-Newton matrix vector product Gv

According to notations in [3], the loss function is defined as ξ(z^L+1(θ)), where z^L+1 is the pre-softmax layer, and G equals:

G=J^TBJ (6)

We first take derivative of loss ξ with respect z^L+1 to obtain _∂z^∂ξL+1.

Then we calculate the right multiplication of the Jacobian matrix of vector _∂z^∂ξL+1 by v using the technique from section 1, where f(θ) is substituted with _∂z^∂ξL+1:

∂(ξ/∂z^L+1)

∂θ^T v = ∂²ξ

∂(z^L+1)^T∂z^L+1 · ∂z^L+1

∂θ^T v (7)

=BJv (8)

Finally we calculate Gv by the left multiplication of a Jacobian matrix, where we treat BJv as the left multiplying vector u and f(θ) as z^L+1 in equation (2):

(BJv)^T∂z^L+1

∂θ = (BJv)^TJ (9)

=v^TJ^TBJ (10)

= (Gv)^T (11)

References

[1] Atilim Gunes Baydin, Barak A. Pearlmutter, Alexey Andreyevich Radul, and Jef- frey Mark Siskind. Automatic differentiation in machine learning: a survey. Journal of Machine Learning Research, 18(153):1–43, 2018.

[2] Jamie Townsend. A new trick for calculating Jacobian vector products, 2017.

[3] Chien-Chih Wang, Kent Loong Tan, and Chih-Jen Lin. Newton methods for convolu- tional neural networks. ACM Transactions on Intelligent Systems and Technology, 2019.

To appear.

2