
4.4 Learning Phases

As pointed out in the introduction, in the biological world we find many types of learning. The PSOM approach itself offers several forms.

A core idea of the PSOM is the construction of a continuous mapping manifold from a topologically ordered set of reference vectors $\mathbf{w}_{\mathbf{a}}$. Therefore the first learning question can be formulated: how is the topological order of the reference vectors $\mathbf{w}_{\mathbf{a}}$ obtained? The second question is: how can the mapping accuracy be adapted and improved in an on-line fashion, allowing the system to cope with drifting and changing tasks and target mappings?

The PSOM algorithm offers two principal ways for this initial learning phase:

PSOM Unsupervised Self-Organization Mode: One way is to employ the Kohonen adaptation rule Eq. 3.10 described in Sect. 3.7. The advantage is that no structural information is required. On the other hand, this process is iterative and can be slow; it might require much more data to recover the correct structural order (see also the stimulus-sampling theory in the introduction). Numerous examples of the self-organizing process are given in the literature, e.g. (Kohonen 1990; Ritter, Martinetz, and Schulten 1992; Kohonen 1995).

PSOM Supervised Construction (PAM: Parameterized Associative Map): In some cases, the information on the topological order of the training data set can be inferred otherwise, for example generated by actively sampling the training data. In the robotics domain this can frequently be done by structuring the data gathering process, often without any extra effort.

The learning of the PSOM is then sped up tremendously: the iterative self-organization step can be replaced by an immediate construction.

The prototypically sampled data vectors $\mathbf{w}_{\mathbf{a}}$ are simply assigned to the node locations $\mathbf{a}$ in the set $A$, as sketched in the example below. In the following chapter, several examples illustrate this procedure.
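As a minimal sketch of such a construction, assuming the training data can be sampled actively on a regular grid, the map could be built directly as follows. The grid shape, the node lattice in $[0,1]^m$, and the probing routine `sample_system` are hypothetical placeholders for illustration, not part of the original text.

```python
import numpy as np

def build_psom(grid_shape, sample_system):
    """Assign actively sampled data vectors w_a directly to node locations a."""
    # Node locations a in A: here an equidistant grid in [0, 1]^m (an assumption).
    axes = [np.linspace(0.0, 1.0, n) for n in grid_shape]
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1)
    nodes = grid.reshape(-1, len(grid_shape))

    # Reference vectors w_a: one embedded input-output vector per node,
    # obtained by probing the system at the corresponding grid position.
    W = np.array([sample_system(a) for a in nodes])
    return nodes, W

# Example: a 3x3 PSOM for a toy system mapping (a1, a2) -> (a1, a2, a1*a2).
nodes, W = build_psom((3, 3), lambda a: np.concatenate([a, [a[0] * a[1]]]))
```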

Irrespective of the way in which the initial structure of the PSOM mapping manifold was established, the PSOM can be continuously fine-tuned by an error-minimization process. This is particularly useful to improve a PSOM that was constructed from a noisy sampled training set, or to adapt it to a changing or drifting system.

In the following we propose a supervised learning rule for the reference vectors $\mathbf{w}_{\mathbf{a}}$ that minimizes the obtained output error. Required is the target, i.e. the embedded input–output vector, here also denoted $\mathbf{x}$. The best-match $\mathbf{w}(\mathbf{s}) \in M$ for the current input $\mathbf{x}$ is found according to the current input sub-space specification (set of components $I$; distance metric $\mathbf{P}$). Each reference vector is then adjusted, weighted by its contribution factor $H(\cdot)$, minimizing the deviation from the desired joint input–output vector $\mathbf{x}$:

$$\Delta \mathbf{w}_{\mathbf{a}} \;=\; \varepsilon \, H(\mathbf{a},\mathbf{s}) \, \big( \mathbf{x} - \mathbf{w}(\mathbf{s}) \big) \qquad (4.14)$$
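As a rough sketch of how one such update might be realized, the function below applies a single step of Eq. 4.14. The helpers `basis_H` (standing in for the basis functions of Eq. 4.1) and `best_match` (standing in for the best-match search of Eq. 4.4) are assumed placeholders passed in by the caller; they are not part of the original text.

```python
def adapt_step(nodes, W, x, eps, basis_H, best_match, input_idx):
    """One supervised adaptation step following Eq. 4.14 (sketch).

    nodes      : (N, m) NumPy array of node locations a in A
    W          : (N, d) NumPy array of reference vectors w_a, updated in place
    x          : (d,)   embedded target input-output vector
    eps        : learning rate
    basis_H    : assumed callable H(a, s), the contribution factor of node a
    best_match : assumed callable returning the best-match parameter s for the
                 input components input_idx of x (cf. Eq. 4.4)
    input_idx  : indices I selecting the input sub-space
    """
    s = best_match(nodes, W, x, input_idx)                        # best match in S
    w_s = sum(basis_H(a, s) * W[i] for i, a in enumerate(nodes))  # w(s), Eq. 4.1
    error = x - w_s                                               # deviation from target
    for i, a in enumerate(nodes):
        W[i] += eps * basis_H(a, s) * error                       # Delta w_a, Eq. 4.14
    return W
```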


Figure 4.8: The mapping result of a three-node PSOM before and after learning by means of Eq. 3.10, with the "+"-marked learning examples (generated by $x_2 = x_1^2$ plus normally distributed noise with mean 0 and standard deviation 0.15). The positions of the reference vectors $\mathbf{w}_{\mathbf{a}}$ are indicated by the asterisks.

The same adaptation mechanism can be employed for learning the output components of the reference vectors from scratch, i.e. even in the absence of initial corresponding output data. As a prerequisite, the input sub-space $X^{\mathrm{in}}$ of the mapping manifold must be spanned in a topological order to allow the best-match dynamic Eq. 4.4 to work properly.

Assuming that a proper subset of input components $I$ has been specified (usually when the number of input components equals the mapping dimension, $|I| = m$), an input vector $\mathbf{P}\mathbf{x}$ can be completed with a zero-distance ($E(\mathbf{s}) = 0$) best-match location $\mathbf{P}\mathbf{w}(\mathbf{s}) = \mathbf{P}\mathbf{x}$ in the mapping manifold $M$. Then, only the output components are effectively adapted by Eq. 4.14 (equivalent to the selection matrix $\mathbb{1} - \mathbf{P}$). If the measurements of the training vector components are not equally reliable or noisy, the learning parameter $\varepsilon$ might be replaced by a suitable diagonal matrix controlling the adaptation strength per component (we return to this issue in Sect. 8.2).

Fig. 4.8 illustrates this procedure for an $m = 1$ dimensional, three-node PSOM. We consider the case of a $d = 2$ dimensional embedding space $(x_1, x_2)$ and $I = \{1\}$, i.e. a mapping $x_1 \rightarrow x_2 = x_1^2$. The embedded manifold $M$ is drawn before and after 300 adaptation steps (using Eq. 4.14 with a learning step parameter $\varepsilon$ linearly decreased from 0.8 to 0.1), together with the reference vectors and the noisy training data set.
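A concrete, self-contained sketch of this Fig. 4.8 setting is given below, under stated assumptions: a three-node PSOM for $x_1 \rightarrow x_2 = x_1^2$, noisy samples, 300 steps of Eq. 4.14, and $\varepsilon$ decreased linearly from 0.8 to 0.1. The one-dimensional Lagrange basis from Sect. 4.5 stands in for $H(\mathbf{a}, s)$, and a simple grid scan replaces the best-match dynamic of Eq. 4.4; both choices are illustrative simplifications, not the implementation of the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three node locations a in A (here the values -1, 0, 1) and their reference
# vectors w_a = (x1, x2); the output component x2 starts at a rough guess 0.
A = np.array([-1.0, 0.0, 1.0])
W = np.column_stack([A, np.zeros(3)])

def H(a, s):
    """1-D Lagrange basis factor for node a (cf. Eq. 4.17)."""
    others = A[A != a]
    return np.prod((s - others) / (a - others))

def w_of_s(s):
    """Evaluate the embedded manifold w(s) (cf. Eq. 4.1 / 4.16)."""
    return sum(H(a, s) * W[i] for i, a in enumerate(A))

def best_match(x1, grid=np.linspace(-1.0, 1.0, 201)):
    """Grid-scan substitute for the best-match dynamic Eq. 4.4, with I = {1}."""
    return min(grid, key=lambda s: (w_of_s(s)[0] - x1) ** 2)

# 300 adaptation steps of Eq. 4.14, eps decreasing linearly from 0.8 to 0.1.
for t in range(300):
    eps = 0.8 + (0.1 - 0.8) * t / 299
    x1 = rng.uniform(-1.0, 1.0)
    x = np.array([x1, x1**2 + rng.normal(0.0, 0.15)])   # noisy training vector
    s = best_match(x[0])
    err = x - w_of_s(s)
    for i, a in enumerate(A):
        W[i] += eps * H(a, s) * err                      # Delta w_a, Eq. 4.14
```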

Note that Eq. 4.14 takes the entire PSOM output $\mathbf{w}(\mathbf{s})$ into account, in contrast to $\mathbf{w}_{\mathbf{a}}$ in the "locally acting" Kohonen step of Eq. 3.10. But the alternatively proposed learning rule (Ritter 1993)

$$\Delta \mathbf{w}_{\mathbf{a}} \;=\; \varepsilon \, H(\mathbf{a},\mathbf{s}) \, \big( \mathbf{x} - \mathbf{w}_{\mathbf{a}} \big) \qquad (4.15)$$

turns out to be unstable: a correctly trained PSOM is even corrupted by a sequence of further training examples.

We demonstrate this fact by an example. For the same learning task, Fig. 4.9 depicts the initially correct, parabola-shaped mapping manifold $M$ and the asterisk-marked locations of the three reference vectors $\mathbf{w}_{\mathbf{a}}$ in the front curve. The other curves, shifted to the back, show $M$ after several epochs of 25 adaptation steps each, using Eq. 4.15 (uniformly random sampled training vectors, $\varepsilon = 0.15$). The adaptation leads to a displacement of the reference vectors in such a way that the symmetric parabola becomes distorted. This shows that Eq. 4.15 is not able to maintain the correct mapping: the reference vectors are shaken around too much, in a non-localized manner, and in both the output and the input space. Note that the mapping manifold $M$ in Fig. 4.9 does not display one simple parabola but in fact the parameterized curve $(w_1(s), w_2(s))$, the joint ordinate projection of two parabolas $w_1(s)$ and $w_2(s)$ with the invisible abscissa $s$ in the parameter manifold $S$; initially $w_1(s)$ is constructed here as a linear function (a degenerate parabola) in $s$, similar to Fig. 4.8.


Figure 4.9: In contrast to the adaptation rule Eq. 4.14, the modified rule Eq. 4.15 is unstable here; see text.

4.5 Implementation Aspects and the Choice of Basis Functions

As discussed previously in this chapter, the PSOM utilizes a set of basis functions $H(\mathbf{a},\mathbf{s})$, which must satisfy the orthonormality and the division-of-unity conditions. Here we explain one particular class of basis functions. This section is devoted to the reader who is also interested in the numerical and implementation aspects. (It is not a prerequisite for understanding the following chapters.)

A favorable choice for $H(\mathbf{a},\mathbf{s})$ is the multidimensional extension of the well-known Lagrange polynomial. In one dimension ($m = d = 1$), $a$ is taken from a set $A = \{\, a_i \mid i = 1, 2, \ldots, n \,\}$ of discrete values, and Eq. 4.1 can be written as the Lagrange polynomial interpolating through the $n$ support points $(a_i, \mathbf{w}_i)$:

$$\mathbf{w}(s) \;=\; l_1(s; A)\,\mathbf{w}_1 + l_2(s; A)\,\mathbf{w}_2 + \cdots + l_n(s; A)\,\mathbf{w}_n \;=\; \sum_{k=1}^{n} l_k(s; A)\,\mathbf{w}_k \qquad (4.16)$$

where the Lagrange factors $l_i(s; A)$ are defined as

$$l_i(s; A) \;=\; \prod_{\substack{j=1 \\ j \neq i}}^{n} \frac{s - a_j}{a_i - a_j}\,. \qquad (4.17)$$
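As a small numerical illustration of Eqs. 4.16 and 4.17, the sketch below evaluates the one-dimensional Lagrange factors and the interpolating manifold $\mathbf{w}(s)$; the function names are only illustrative.

```python
import numpy as np

def lagrange_factor(i, s, A):
    """Lagrange factor l_i(s; A) of Eq. 4.17."""
    A = np.asarray(A, dtype=float)
    others = np.delete(A, i)
    return np.prod((s - others) / (A[i] - others))

def w_of_s(s, A, W):
    """Interpolated point w(s) = sum_k l_k(s; A) w_k of Eq. 4.16."""
    W = np.asarray(W, dtype=float)
    return sum(lagrange_factor(k, s, A) * W[k] for k in range(len(A)))

# Example: n = 3 support points (a_i, w_i) taken from the parabola w = a^2.
A = [-1.0, 0.0, 1.0]
W = [[a, a**2] for a in A]       # embedded vectors (x1, x2)
print(w_of_s(0.5, A, W))         # -> [0.5, 0.25], exact on the parabola
```

Because three support points determine a quadratic polynomial uniquely, the interpolation reproduces the parabola exactly at every $s$ in this example.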

If $m > 1$, Eqs. 4.16 and 4.17 can be generalized in a straightforward fashion, provided the set of node points forms an $m$-dimensional hyper-
