• Aucun résultat trouvé

The Role of Internal Oscillators for the One-Shot Learning of Complex Temporal Sequences

N/A
N/A
Protected

Academic year: 2021

Partager "The Role of Internal Oscillators for the One-Shot Learning of Complex Temporal Sequences"

Copied!
11
0
0

Texte intégral

(1)

HAL Id: hal-00356671

https://hal.archives-ouvertes.fr/hal-00356671

Preprint submitted on 28 Jan 2009

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Learning of Complex Temporal Sequences

Matthieu Lagarde, Pierre Andry, Philippe Gaussier

To cite this version:

Matthieu Lagarde, Pierre Andry, Philippe Gaussier. The Role of Internal Oscillators for the One-Shot

Learning of Complex Temporal Sequences. 2007. �hal-00356671�

(2)

Learning of Complex Temporal Sequenes.

MatthieuLAGARDE,PierreANDRY,PhilippeGAUSSIER

ETIS,NeuroybernetiTeam,UMRCNRS8051

2,avenueAdolphe-Chauvin,UniversityofCergy-Pontoise,Frane

{lagarde,andry,gaussier}ensea. fr

Abstrat. Wepresentanartiialneuralnetworkusedtolearnonline

omplextemporalsequenesofgesturestoarobot.Thesystemisbased

on a simple temporal sequenes learning arhiteture, neurobiologial

inspiredmodelusing someof the properties ofthe erebellumand the

hippoampus, plus a diversity generator omposed of CTRNN osilla-

tors.The useof osillators allows toremove theambiguity of omplex

sequenes. The assoiations with osillators allow to build an internal

stateto disambiguatethe observable state.Tounderstandtheeetof

thislearningmehanism,we omparetheperformaneof(i)ourmodel

with (ii) simple sequene learning model and with (iii) the simplese-

quene learning model plus a ompetitive mehanism between inputs

andosillators.Finally,wepresentanexperimentshowingaAIBOrobot,

whihlearnsandreproduesasequeneofgestures.

1 INTRODUCTION

Ourlongtermgoalistobuild anautonomousrobotabletolearnsensorimotor

tasks. Suh a systemshould be(i) able to aquire new behaviors : gestures,

objetsmanipulation assequenes ombining multimodal elementsof dierent

levels.Todothis,anautonomoussystemmust(ii)takeadvantageofinformation

oftheassoiationsbetweenvisionandmotorapabilities.Thispaperfouseses-

sentiallyon the rst point : learning, prediting and reprodution of omplex

sensorimotorsequenes.

Inthissope,solutionsbasedonneuralnetworksareaninterestingsolution.Neu-

ralnetworksareabletolearnsequenesusingassoiativemehanisms.Moreover,

thesenetworksoeralevelofoding(neuron)thattakesintoaountinforma-

tionaboutthelowersensorimotorsystem;suhsystemsavoidtheuseofsymbols

or information that ould separate the sequene learningomponent from the

building of assoiations between sensation and ation. Networks are adapted

to online learning favoring easier interations with humans and other robots.

Among these models, haoti neural networks are based onreurrent network

(RN).In[1℄,afullyonnetedRNlearnsasequenethankstoasinglelayerof

neurons.Thedynamisgeneratedbythenetworkhelptolearnashortsequene.

Afterafewiterations,thelearnedsequenevanishesprogressively.In[2℄,aran-

(3)

neurons. The rst layer generates an internal dynami by means of a RRN.

Theseond layergeneratesaresonanephenomenon.Thenetworklearnsshort

sequenes of 7or 8 states. But this model is highly sensitive to noises orthe

stimulusvariationsanddoesnotlearnlongperiods sequenes.Asimilarmodel

is the Eho States Network (ESN) based on RRN for short term memory [3℄

(STM).Under ertainonditions(detailedin [4℄),theativationofeahneuron

in thehiddenlayer,isafuntion oftheinputhistorypresentedtothenetwork;

thisistheehofuntion.Oneagain,theideaistouseareservoirofdynamis

from whih the desired output islearned in onjuntion with the eet of the

inputativity.

Intheontextof robotis,manymodels onerngesturelearning.Bymeansof

nonlinear dynamial systems, [5℄ develops ontrol poliies to approximate the

reorded movements and to learn them with atting of mixture model using

areursiveleastsquareregressiontehnique.In[6℄,thetrajetoriesofgestures

areaquiredbytheonstrutionofmotorskillswithaprobalistirepresentation

ofthemovement.Trajetoriesanbelearntthroughviapoints[7℄withparallel

vetor-integration-to-endpointmodels[8℄.Inourwork,wewishtobeabletore-

useanddetetsubsequenesandpossibly,ombinethem.Thus,weneedtolearn

someoftheimportantomponentsofthesequeneandnotonlytoapproximate

thetrajetoryofthegesture.

In this paper, we present a biologially inspired model of neural network for

temporalomplexsequeneslearning.Arstapproahdesribedin[9℄proposes

aneuralnetworkfortheonlinelearningofthetimingbetweeneventsforsimple

sequenes (with non ambiguous stateslike A B C). We propose a model for

omplexsequenes(withambiguousstateslikeAandBinABACB).Inor-

dertoremovetheambiguousstatesortransitions,weusebatteriesofosillators

asareservoirofdiversityallowingtoseparate theinputsappearing repeatedly

in thesequene. Insetion 3, weshowresults from simulationsomparing the

performanes of 3 dierent systemsinvolved in the learningand reprodution

of thesameset of omplexsequenes: (i)the systemdesribed in [9℄,(ii) this

system plus a simple ompetitive mehanism between the osillators and the

input(showingtheeet ofaddinginternal dynamis inorder toseparate am-

biguousstates) and(iii)asystemoptimizingtheuseoftheosillatorsbyusing

anassoiativelearningruleinorder toreruitnewinternalstateswhenneeded

(repetition of the same input state). Setion 4 details the appliation of our

modelonarealrobotforthelearningofaomplexgesture.Finally,weonlude

andpointoutsomeopenproblems.

2 A MODEL FOR TIMING AND SEQUENCE

LEARNING

Thearhiteture(Fig. 1)isbasedonaneurobiologialmodel[10℄inspiredfrom

some of the properties of the erebellum and the hippoampus. This model

uses assoiative learning rules between past inputs memorized as a STM and

(4)

here to the sequenes in whih the same state appears only one. The main

advantageofthismodelisthattheassoiativemehanismalsolearnsthetiming

ofthesequene,whihallowsauratepreditionsofthetransitionsthatompose

the sequene. In order to learnomplex sequenes in whih the same state is

repeatedseveraltimes,inourmodelwehaveaddedamehanismthatgenerates

internal dynamis and that an be assoiated with the repeated inputs of the

sequene. The assoiation between therepeated inputsand dierentativities

of the osillator allowsto ode hidden stateswith dierent and un-ambiguous

patternsofativities.Asaresult,ourarhiteturemanagestolearn/preditand

reprodueomplextemporalsequenes.

Fig.1.Complexsequeneslearningmodel.Barredlinksaremodiableonnexions.The

othersareassoiatedtounmodiableonnexions.Theleftpartisdetailedingure3.A

and3.B.Therightpartisdetailedingure4.

2.1 Generating internal diversity

Osillators are very muh used in roboti appliations like loomotion using

entralpatterngenerator(CPG)[11℄.Anosillatorisaontinuoustimereurrent

neural network (CTRNN) omposed of twoneurons (Fig. 2.A). The study on

CTRNNsanbefoundin[12℄.Thiskindofosillatorsisknownforstability,and

resistanetothenoises.CTRNNareeasytoimplementtoo.ACTRNNoupling

twoneuronsproduesanosillator(Fig.2.B):

τ e . dx

dt = −x + S((w ii ∗ x) − (w ji ∗ y) + w e

const

)

(1)

τ i . dy

dt = −y + S((w jj ∗ y) + (w ij ∗ x) + w i

const

)

(2)

with

τ e

atimeonstantfortheexitatoryneuronand

τ i

fortheinhibitoryneuron.

x

and

y

aretheativityoftheexitatoryandtheinhibitoryneuronrespetively.

w ii

istheweightofthe reurrentlink of theexitatoryneuron,

w jj

the weight

ofthereurrentlink oftheinhibitory neuron.

w ij

istheweightofthelinkfrom

theexitatoryneurontoinhibitoryneuron.

w ji

istheweightofthelinkfromthe

inhibitoryneurontoexitatoryneuron.

w e

const and

w i

const aretheweightsofthe

links from the onstantinputs. And

S

is thetransferfuntion of eah neuron.

Inourmodel,weusethreeosillatorswith

τ e = τ i

.

(5)

A. B.

0 0.2 0.4 0.6 0.8

0 50 100 150 200 250 300

time

Fig.2.A. Osillator model.Left neuronisexitatoryand right neuronis inhibitory.

Exitatory links are :

W

ii

= 1

,

W

jj

= 1

,

W

ij

= 1

. Inhibitory links is :

W

ji

= −1

,

Constantinputvalueisequalto

1

withonstantlinks

W

e

const

= 0 . 45

and

W

i

const

= 0

.

Initialativitiesofneuronsare

X (0) = 0

,

Y (0) = 0

.

B. Display oftheinstantaneousmeanfrequenyativity of3osillatorssystemswith

τ

1

= 20

(plainline),

τ

2

= 30

(longdashedline),

τ

3

= 50

(shortdashedline).

2.2 learning ofinternal states

Inordertouserepeatedlythesameinputinagivensequene,dierentongu-

rationsofosillatorsanbeassoiatedwiththesameinput.Tounderstandthe

generation of diversity and its impliation in our learning algorithm,we have

tested twomehanisms: asimpleompetitionouplinginputstateswith osil-

lators (Fig. 3.A) and an assoiativemehanismbased on alearning rule (Fig.

3.B) that reruitsneuronsaordingto theativitiesof theosillatorsand the

repeatedinputs.

CompetitivemehanismTheompetitionisomputedasfollow:eahneuron

ij

ofthe Competition groupatsasanneuron performingthe logialoperator ANDbetweentheneuronsoftheInputsgroupandoftheOsillatorsgroup:

P ot ij = (w input

i

∗ x input

i

+ w osci

j

∗ x osci

j

) − threshold ij

(3)

with

w input

i

= 1

,

w osci

j

= 1

,

threshold ij = 1.2

,

x input

i theativityoftheinput

atindex

i

and

x osci

j theativityoftheosillatoratindex

j

.

Inaseondstep,aompetitionbetweenallneurons

ij

oftheCompetitiongroup isapplied:

W inner ij =

1 if ij = Argmax ij (P ot ij )

0 otherwise

(4)

Thewinnerneuronbeomestheinputofthetemporalsequenelearningnetwork

(subsetion2.3).Inthisway,areservoirofosillatorneuronsanbeusedasa

waytoassoiatethesameinputwithdierentinternalpatterns.Intuitively,the

simpleompetition(nolearningisrequiredhere)allowstodiretlyseletdier-

entinternalstatesorrespondingtothesameinputrepeatedmanytimesinthe

sequene. For examplein Fig. 3.A,eah input (A,B,C,D) anappears upto 3

times(orrespondingtothenumberofosillators)in thesamesequene.More-

(6)

of thesequene. Obviously, iftheompetition betweenosillators is anavenue

worthexploring,itisstillpossibletohaveambiguity.Aninputanbeassoiated

withsamewinnerosillatortwoormoretimes.Consequently,thereisstillpoten-

tialambiguitiesontheinternal statesofourmodel,andsomesequenesould

notbereproduedorretly. A preisemeasure of thisproblem orresponds to

the probability that the samestateanbeassoiatedwith the sameosillator

several timesand therefore theinternal statepartially depends ofthe shape,

phaseand numberofosillators. Typially,the problemhappenswhen agiven

stateomesbakwiththesamefrequenyastheseletedosillator.Theurves

C2andC3ongure5showtheperformanesoftheompetitivemehanism.To

solvethisproblem,anassoiativemehanismallowingtoreruitneuronsoding

internal stateshasbeenadded.

A. B.

Fig.3.A.Modeloftheneuralnetworkouplinganinputstatewithanosillator.All

linksarexedonnetions.B.Modeloftheneuralnetworkusedtoassoiate aninput

statewithaongurationofosillators.Onlyfewlinksarerepresentedforthelegibility.

Dashedlinksaremodiableonnetions.Solidlinksarexedonnetions.

Assoiative mehanism The learning proess of an assoiation between an

inputstateandaongurationofosillatorsis:

U S = w i ∗ x i

(5)

with

w i

the weightof the link from input state

i

, and

x i

ativity of theinput

state

i

. If

U S > threshold

, we ompute the potential and the ativity of the

neuronasfollow:

P ot j =

M

osci

X

j=0

|(w j − e j )| Act j = 1 1 + P ot j

(6)

with

M osci

thenumberofosillators,

w j

theweightofthelinkfromosillator

j

,

and

e j

theativityoftheosillator

j

.Theneuronthathastheminimumativity

is reruited :

W in = Argmin j (Act j )

. Initial weigths of onnexions have high

values.Theosillatorsongurationislearntaordingto theerrorof distane

∆w j = ε(e j − w j )

with

ε

alearning rate,

w j

weight of link from Osillator

j

,

and

e j

ativityof osillator

j

. The Assoiationsgroup beomes thenew input

(7)

of the temporal sequene learningnetwork(subsetion 2.3). As showed onthe

gure3.B,aninputallowsreruiting3dierentneuronsodinginternalstates.

Theyorrespond totheonnetivityoftheunonditionallinks hosenbetween

theInputsgroupandtheAssoiationgroup.Theassoiativemehanismensures

toreruitanewinternal stateforeahinput(A,B,CorD)fromthesequene.

The onnetivity oflinks betweentheInput group andthe Assoiationgroup,

has been hosento haveanumberof hidden statesequal foreah input. This

allows theomparison between the dierentmodels in our simulations.But it

ouldbepossibletohangetheonnetivityofthelinkstoallowthereruitement

of morehiddenstatesforeah repeatedinput in thesequene.We havetested

thismehanismin ourarhitetureinsimulationandrobotiappliation.

2.3 Temporal sequeneslearning

Fig.4.Representation ofhippoampus.EntorhinalCortex (EC) reeivesinputs and

transmits them to Dentate Gyrus (DG)and CA3 pyramidal ells. Between the DG

groupandtheCA3 groupthereare fullyonnetedwithmodiableonnetions.Be-

tweentheECgroupandCA3group,andtheECgroupandDGgroup,therearexed

onetoneighborhoodonnetions.

Thispartofmodelisbasedonashematirepresentationofhippoampus[10℄

(Fig. 4). DG represents past state (STM), and develops a temporal ativity

spetrum.CA3linksallowpatternompletionandreognitionbetweeninoming

statefromECandpreviousstatemaintainedinDG.WesupposetheDGativity

anbemodelledasfollow:

Act DG j,l (t) = 1 m j

· exp − (t − m j ) 2 2 · σ j

(7)

with

Act DG j,l

theativityoftheellatindex

l

ontheline

j

,

t

thetime,

m j

atime

onstantand

σ j

thestandarddeviation.Neuronsononelinesharetheirativity

in thetimeandrepresentatemporaltraeofEC.Learningofanassoiationis

ontheweightsoflinksbetweenCA3andDG.Thenormalizationoftheativity

omingfromDGneuronsisperformedduetothenormalizationoftheDG-CA3

weights.

W CA3(i,j) DG(j,l) =

Act

DGj,l

P

j,l

( Act

DGj,l

)

2

if Act DG j 6= 0 unchanged otherwise

(8)

Interestingly, this model has the property to work when a same input omes

(8)

isstoredduringthetotaltimeofitspresene.Consequently,twosuessivestates

arenotambiguousforthesystem(AA =A).

3 SIMULATION RESULTS

Atemporalsequeneofstatesisrarelyreplayedtwotimesexatlywiththesame

rhythm.Timeanvarybetweentwostatesespeiallywhendemonstratingase-

quene toarobot. Inoursimulationsweapply atimevariationbetweenstates

and observethe onsequenes on three arhitetures. The rst arhiteture is

themodel ofsimplesequeneslearningpresentedin subsetion2.3. Theseond

isthesamemodelplustheompetitivemehanismpresentedinsubsetion 2.2.

The third arhitetureis thesame astherst oneplusthe assoiativemeha-

nism (Fig. 1) seenin subsetion 2.2. Referenessequenes are generated to be

suessfully reprodued by the seond arhiteture with a timing variation of

0%. All arhitetures aretrainedwith thesamesequenesand thesamemaxi-

mumoftiming variation(0%,5%or10%), but withatimevariationrandomly

hosenbetween0and themaximumvariationofthetime. Inourexperiments,

tobootstrapasequene,weprovidetherststate.Consequently,thisstatewill

not be ambiguousin thesequenes. Forexample,a omplexsequenes an be

DBCBACAB :D isthestartingstateanditwillnotberepeatedafter.

Fig.5showstheperformanesofeaharhiteture.Weanseethattherstar-

0 10 20 30 40 50 60 70 80 90 100

2 3 4 5 6 7 8

% of sequences correctly reproduced

sequence length

"C1"

"C2"

"C3"

"C4"

Fig.5.C1:rstarhiteture:simplesequeneslearning.Theresultsarethesamewith

timevariationof0%,5%and10%.C2:seondarhiteture:omplexsequeneslearn-

ingwithaompetitivemehanismandatimevariationof5%.C3:seondarhiteture:

omplexsequeneslearningwithaompetitivemehanismandatimevariationof10%.

C4:urrentarhiteture:omplexsequeneslearningwithanassoiativemehanism.

Theresultsarethesametimingvariationof0%,5%and10%.

hiteture(subsetion2.3)hasverygoodperformaneswithsequenesof3and

4 states, beause those sequenes have no repeated states (simple sequenes).

With sequenes havingmore than 4 states, the performanes fall drastially ,

(9)

time variationhas noeet on theperformanes.The arhiteture annotre-

produe them, beauseCA3group learnstwotransitions and, thus itpredits

twostates for eah repeatedinput. The seond arhiteture using ompetitive

mehanism,hasbetterperformanes,but,aswehaveseenpreviouslyinsubse-

tion2.2,ambiguousinternalstatesanappearandreduethisgainofsequenes

orretly reprodued. Consequently, like the rst arhiteture, the CA3group

learnstwointernal statesand, thus, itpreditstwostatesfromoneinputre-

peated.Wean see,theperformaneshangeaordingto thetimingvariation

between states in the sequenes : a sameinput from agiven sequene an be

assoiatedwith twodierentosillatorsand, onsequentlyadierentinternal

statewins.Thankstothereruitmentmehanism,thethirdarhiteture,hasthe

best performanes : 100% with all tested sequenes. There are notambiguous

statesor internal states.Thetimevariationhasnoeetontheperformanes

ofthemodel.

4 ROBOTIC APPLICATION

A. B.

Fig.6.A. Representation ofdesired sequene.It beginsfrom thestart point. B.We

manipulate Aibo passively. It learns the suession of orientations of the movement

fromthesefrontleftlegmotorinformation.

Therobot used in ourexperiments is an Aibo ERS7 (Sony). In ourappli-

ation, we use onlythe front left leg,in apassivemovement mode to learna

sequeneofgestures.ThesequenetobelearnedandreproduedisshowedFig.

6.A.Inthisappliation,wetestthethird arhiteturepreviouslydesribed.

During learning, we manipulate the front left leg of the robot passively (Fig.

6.B). Duringtheexeutionof themovement,theneural network learnsonline,

and in one shot the suession of the joints orientation thanks to the motors

feedbakinformationofitsleg(proprioeptivesignal).Hene,theinputsofour

model are theorientations/anglesof theleg.The reordedmotors information

while learningareshownin Fig.7.X-learning (horizontal movements) andFig.

7.Y-learning(vertialmovements).

Toinitiatethereprodutionofthesequenebytherobot,wegivetherststate

of the sequene (down). As Aibo an not be manipulated when motors are

ativated, wesendtheommanddiretlytotherobot.Next,Aiboplaysthese-

quene autonomously(Fig.7,top).With thestartingstate,ourmodelpredits

(10)

0.5 0.55 0.6 0.65 0.7 0.75

0 200 400 600 800 1000 1200 1400 1600 1800

angle

time

0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

0 200 400 600 800 1000 1200 1400 1600 1800

angle

time

0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1

0 200 400 600 800 1000 1200

angle

time

0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.6 0.65 0.7

0 200 400 600 800 1000 1200

angle

time

X-learning Y-learning X-reprodution Y-reprodution

0.1 0 0.5 1

0 0.5 1 0

0.5 1

0 0.5 1

Learntgesture Reproduedgesture

Fig.7.Top:Aiboreproduesthelearntsequene.Middle:X-learningandY-learning

arerespetivelythehorizontalandthevertialmotorsinformationwhilerobotlearns

the sequene.X-reprodution and Y-reprodution are respetively the horizontal and

thevertialmotorsinformationduringthereprodutionofthesequene.Onthegure

Y-reprodution,therstmovementisnotreprodued(notpredited),butgivenbythe

userinordertotriggerthereallitisourbootstrapstatetostartthesequene.X-axis

arethetimeandY-axisare theanglesofthemotors.

5 CONCLUSIONS AND DISCUSSIONS

Wehaveproposedamodelofneuralnetworkforthelearningofomplextempo-

ralsequenes.Thismodelintroduesanassoiativemehanismtakingadvantage

of adiversitygeneratoromposed ofosillators. These osillatorsare basedon

oupled CTRNN. This model is eient in theframe of autonomous robotis

andsueedin learninginoneshotthetimingofsequenesofgestures.

Duringtherobotisappliation,wehavenotiedthat therobot reproduesthe

sequene with adierent amplitude of the movement. This eet omes from

thespeedofthedisplaementofthelegofAibo.Inourappliation,thespeedof

thereprodutionisapredenedonstantdierentfromtheuserdynamiduring

learning.Therhythmofthesequeneisrespetedthankstoatemporalgroupof

neurons.ApossibleimprovementwouldbetoaddamodellikeCPG[5℄foreah

movements(up,down,left andright)omposing sequeneswithvariable

speeds. Inour model, the numberof neurons oding the assoiations between

theinputsandtheosillators,representsthesizeoftheshorttermmemory.In

oursimulationsandappliation,thesequeneslearntdonotsaturatethemem-

ory.Itwouldbeinterestingtoanalyzethebehavioroftheneuralnetworkwith

longer sequenes, andtest the limitations ofthe systemwhen the neurallimit

hasbeenreahedbythereruitmentmehanism.Inthepresentsystem,itwould

meanthatthealreadyreruitedneuronsouldbeerasedinordertoenodenew

Références

Documents relatifs

Two complementary systems are proposed in comparison: ZH is a baseline system without on-line learn- ing using the initial ZSSP and a handcrafted dialogue man- ager policy, whereas

This motivates us to study a problem where the representation is obtained from a classifier pre-trained on a large-scale dataset of a different domain, assuming no access to

1) Given the complexity and contradictory nature of results emerging from other studies, we exhaus- tively evaluate the impact of different learners and data sets. We believe this to

By distinguishing between chronological, narrative and generational time we have not only shown how different notions of time operate within the life stories, but have also tried

the above-mentioned gap between the lower bound and the upper bound of RL, guarantee that no learning method, given a generative model of the MDP, can be significantly more

The comparison between the hypothesized and the actual learning process shows that it is very important how students relate to diagrams on the different levels in

In order to be able to detect possible causal chains of verbs leading to the accident, we marked some verbs with the attributes causal and impact (causal when the verb

Event predictions are learned for both spatial information (a change of the most active place cell) and non-spatial information (a repeated sequence of events, i.e. All the