Mathematical derivation of LocalGaussian computation for Manifold Parzen, and errata

Pascal Vincent

Département d'Informatique et Recherche Opérationnelle, Université de Montréal

P.O. Box 6128, Downtown Branch, Montreal, H3C 3J7, Qc, Canada
vincentp@iro.umontreal.ca

Technical Report 1259

Département d'Informatique et Recherche Opérationnelle
March 16, 2005

Abstract

The aim of this report is to correct an inconsistency in the mathematical formulas that appeared in our previously published Manifold Parzen article [1], regarding the computation of the density of "oriented" high dimensional Gaussian "pancakes" for which we store only the first d leading eigenvectors and eigenvalues (rather than a full n×n covariance matrix, or its inverse). We give a detailed derivation leading to the correct formulas.

1 Detailed mathematical derivation of LocalGaussian evaluation

We consider, in IR^n, the multivariate Gaussian density N_{μ,C} parameterized by mean vector μ ∈ IR^n and n×n covariance matrix C. The density at any point x ∈ IR^n is given by

N_{μ,C}(x) = 1/√((2π)^n |C|) · e^{−(1/2)(x−μ)' C^{−1} (x−μ)}    (1)

where |C| is the determinant of C.

Let x̃ = x − μ.

Let C = V D V' be the eigendecomposition of C, where the columns of V are the orthonormal eigenvectors and D is a diagonal matrix with the eigenvalues sorted in decreasing order (λ_1, …, λ_n).
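For concreteness, equation (1) can be evaluated numerically through this eigendecomposition. The following is a minimal numpy sketch; the function name `gaussian_density` is ours and is not from the original experiments' code:

```python
import numpy as np

def gaussian_density(x, mu, C):
    """Density N_{mu,C}(x) of equation (1), evaluated through the
    eigendecomposition C = V D V'."""
    n = len(mu)
    x_c = x - mu                       # centered input, x tilde
    eigvals, V = np.linalg.eigh(C)     # columns of V: orthonormal eigenvectors
    # log of the normalization constant: -0.5 * (n log(2 pi) + log|C|)
    log_norm = -0.5 * (n * np.log(2 * np.pi) + np.sum(np.log(eigvals)))
    # quadratic form x' C^{-1} x, expressed in the eigenbasis
    q = np.sum((V.T @ x_c) ** 2 / eigvals)
    return np.exp(log_norm - 0.5 * q)
```

In one dimension with μ = 0 and C = 1 this reduces to the standard normal density, so `gaussian_density` at x = 0 returns 1/√(2π).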


We replace D with D̃, a diagonal matrix containing modified eigenvalues λ̃_1..n such that all eigenvalues after λ̃_d are given the same value σ², i.e.

λ̃_1..n = (λ̃_1, …, λ̃_d, σ², …, σ²)

Using C̃ = V D̃ V' instead of C in equation 1, we get:

Ñ_{μ,C}(x) = 1/√((2π)^n |C̃|) · e^{−(1/2) x̃' C̃^{−1} x̃}
           = e^{−(1/2) log((2π)^n |C̃|)} · e^{−(1/2) x̃' C̃^{−1} x̃}
           = e^{−0.5 (n log(2π) + log(|C̃|) + x̃' C̃^{−1} x̃)}    (2)

In other words,

Ñ_{μ,C}(x) = e^{−0.5 (r + q)}    (3)

with

r = n log(2π) + log(|C̃|)    (4)

and

q = x̃' C̃^{−1} x̃    (5)

Moreover, since V is an orthonormal basis, we have ‖V' x̃‖² = ‖x̃‖², i.e.

Σ_{i=1}^{n} (V_i' x̃)² = ‖x̃‖²    (6)

where V_i' x̃ is the usual dot product between the i-th eigenvector and the centered input x̃.

Proof:

‖V' x̃‖² = (V' x̃)' (V' x̃)
         = x̃' V V' x̃
         = x̃' I x̃
         = ‖x̃‖²
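This norm-preservation property is easy to verify numerically. The following sketch, which is our illustration and not part of the original derivation, builds an orthonormal V via numpy's QR factorization and checks equation (6):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
# QR factorization of a random matrix yields orthonormal columns for V
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
x_c = rng.standard_normal(n)          # a centered input, x tilde

# sum_i (V_i' x)^2 equals ||x||^2 because V V' = I
lhs = np.sum((V.T @ x_c) ** 2)
rhs = np.sum(x_c ** 2)
assert np.isclose(lhs, rhs)
```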

In addition, having the above eigendecomposition, we have

|C̃| = Π_{i=1}^{n} λ̃_i

thus

log(|C̃|) = Σ_{i=1}^{n} log(λ̃_i)


log(|C̃|) = (n−d) log(σ²) + Σ_{i=1}^{d} log(λ̃_i)    (7)

Replacing 7 in 4, we get

r = n log(2π) + (n−d) log(σ²) + Σ_{i=1}^{d} log(λ̃_i)    (8)

In addition we have

q = x̃' C̃^{−1} x̃
  = x̃' (V D̃ V')^{−1} x̃
  = x̃' V D̃^{−1} V' x̃
  = Σ_{i=1}^{n} (1/λ̃_i) (V_i' x̃)²
  = ( Σ_{i=1}^{n} (1/λ̃_i − 1/σ²) (V_i' x̃)² ) + (1/σ²) Σ_{i=1}^{n} (V_i' x̃)²

Since λ̃_i = σ² for all i > d, we have (1/λ̃_i − 1/σ²) = 0 for i > d. As a consequence the first sum can be replaced by a sum from 1 to d (instead of from 1 to n). Also, from equation 6, the second sum can be replaced by ‖x̃‖². This yields:

q = (1/σ²) ‖x̃‖² + Σ_{i=1}^{d} (1/λ̃_i − 1/σ²) (V_i' x̃)²    (9)

We have thus eliminated the need to store and compute with eigenvectors V_{d+1}, …, V_n.
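As an illustration of this saving, the following numpy sketch computes q via equation (9) from only the d leading eigenvectors and checks it against the full quadratic form x̃' C̃^{−1} x̃. The names are ours, and the check keeps the d leading eigenvalues unmodified (λ̃_i = λ_i for i ≤ d):

```python
import numpy as np

def q_local(x_c, V_d, lam_d, sigma2):
    """Quadratic form of equation (9): uses only the d leading
    eigenvectors (columns of V_d) and eigenvalues lam_d."""
    proj = V_d.T @ x_c                 # dot products V_i' x, i = 1..d
    return (np.sum(x_c ** 2) / sigma2
            + np.sum((1.0 / lam_d - 1.0 / sigma2) * proj ** 2))
```

Checking it against x̃' C̃^{−1} x̃ built from the full modified spectrum confirms the two agree.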

2 Summing up: the correct formulas

To sum this all up, we can compute the density as follows:

Ñ_{μ,C}(x) = e^{−0.5 (r + q)}    (10)

with

r = n log(2π) + (n−d) log(σ²) + Σ_{i=1}^{d} log(λ̃_i)    (11)

and

q = (1/σ²) ‖x̃‖² + Σ_{i=1}^{d} (1/λ̃_i − 1/σ²) (V_i' x̃)²    (12)
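The summary formulas above translate directly into code. The following numpy sketch (our own naming, not the original implementation) returns the log-density −0.5(r + q) of equations (10)-(12):

```python
import numpy as np

def local_gaussian_log_density(x, mu, V_d, lam_d, sigma2):
    """log of equation (10), i.e. -0.5 * (r + q), with r from (11)
    and q from (12). Only the d leading eigenvectors (columns of V_d)
    and eigenvalues lam_d are needed; all trailing eigenvalues of the
    modified covariance are sigma2."""
    n = len(mu)
    d = len(lam_d)
    x_c = x - mu                       # centered input, x tilde
    proj = V_d.T @ x_c                 # dot products V_i' x, i = 1..d
    r = (n * np.log(2 * np.pi)
         + (n - d) * np.log(sigma2)
         + np.sum(np.log(lam_d)))
    q = (np.sum(x_c ** 2) / sigma2
         + np.sum((1.0 / lam_d - 1.0 / sigma2) * proj ** 2))
    return -0.5 * (r + q)
```

Working in the log domain avoids underflow for the very small densities that arise in high dimension; equation (10) is recovered by exponentiating the result.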


3 Errata for Manifold Parzen article

There is a typo in the pseudo-code for MParzen::train: step 4) should be λ_{ij} = σ² + s²_j / k instead of σ² + s²_j / l.

Also, in our initial experiments, we actually considered two possible choices for λ̃_i (for i ≤ d) and σ²:

a) (λ̃_1, …, λ̃_d) = (λ_1, …, λ_d) and σ² = λ_{d+1}, which leads to:

r = n log(2π) + (n−d) log(σ²) + Σ_{i=1}^{d} log(λ_i)

q = (1/σ²) ‖x̃‖² + Σ_{i=1}^{d} (1/λ_i − 1/σ²) (V_i' x̃)²

b) σ² is a user-specified value and (λ̃_1, …, λ̃_d) = (λ_1 + σ², …, λ_d + σ²), which leads to:

r = n log(2π) + (n−d) log(σ²) + Σ_{i=1}^{d} log(λ_i + σ²)

q = (1/σ²) ‖x̃‖² + Σ_{i=1}^{d} (1/(λ_i + σ²) − 1/σ²) (V_i' x̃)²

We mentioned only scenario b) in the Manifold Parzen article [1] (due to space constraints). But somehow these two slightly different versions got mixed up in the write-up, leading to the somewhat inconsistent formulas that appear in the article (taking r from b) and q from a)). In addition, we mistakenly wrote d log(2π) instead of n log(2π).

However, after verification, the actual code used to perform the experiments reported in the article (implementing scenario b)) appears correct.
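The two eigenvalue choices above can be sketched as small helper functions; the names and signatures here are ours, for illustration only:

```python
import numpy as np

def modified_eigvals_a(eigvals, d):
    """Scenario a): keep the d leading eigenvalues unchanged and set
    sigma^2 to the (d+1)-th eigenvalue. `eigvals` must be sorted in
    decreasing order."""
    sigma2 = eigvals[d]                # lambda_{d+1} (0-based index d)
    return eigvals[:d].copy(), sigma2

def modified_eigvals_b(eigvals, d, sigma2):
    """Scenario b): sigma^2 is user-specified and is added to each of
    the d leading eigenvalues."""
    return eigvals[:d] + sigma2, sigma2
```

Either pair (λ̃_1..d, σ²) can then be plugged directly into equations (11) and (12).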

References

[1] Pascal Vincent and Yoshua Bengio. Manifold Parzen windows. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 825-832, Cambridge, MA, 2003. MIT Press.
