
Des comportements flexibles aux comportements habituels : meta-apprentissage neuro-inspiré pour la robotique autonome (From flexible to habitual behaviours: neuro-inspired meta-learning for autonomous robotics)


HAL Id: tel-01526482

https://tel.archives-ouvertes.fr/tel-01526482

Submitted on 23 May 2017

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Des comportements flexibles aux comportements habituels : meta-apprentissage neuro-inspiré pour la robotique autonome

Erwan Renaudo

To cite this version:

Erwan Renaudo. Des comportements flexibles aux comportements habituels : meta-apprentissage neuro-inspiré pour la robotique autonome. Robotique [cs.RO]. Université Pierre et Marie Curie - Paris VI, 2016. Français. ⟨NNT : 2016PA066508⟩. ⟨tel-01526482⟩


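Markov decision process $\langle S, A, T, R \rangle$:

– $S$: set of states, $s \in S$
– $A$: set of actions, $a \in A$
– $T : S \times A \times S \to [0,1]$: transition function
– $R : S \to \mathbb{R}$ or $R : S \times A \to \mathbb{R}$: reward function

Policy $\pi$: deterministic, $\pi : S \to A$, or stochastic, $\pi : S \times A \to [0,1]$.

Factored state: $S_k = (D_0, \dots, D_n)$.

Discounted return, with reward $r_t$ and discount factor $\gamma \in [0,1]$:

$$R_t = \sum_{i=0}^{\infty} \gamma^i \cdot r_{t+i+1}$$

With $\gamma = 0$, only the immediate reward $r_{t+1}$ counts.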
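As a minimal sketch of the discounted return $R_t$ above, the sum can be evaluated backwards over a finite reward sequence. The function name `discounted_return` and the finite truncation are assumptions made for illustration; they are not taken from the thesis.

```python
def discounted_return(rewards, gamma):
    """Compute sum_i gamma**i * rewards[i] for a finite reward list.

    rewards[0] plays the role of r_{t+1}, rewards[1] of r_{t+2}, and so on.
    The infinite sum of the text is truncated to the rewards provided.
    """
    total = 0.0
    # Accumulate from the last reward backwards (Horner's scheme).
    for r in reversed(rewards):
        total = r + gamma * total
    return total


if __name__ == "__main__":
    print(discounted_return([1.0, 1.0, 1.0], 0.5))
```

With `gamma = 0` the function returns the first reward only, and with `gamma = 1` it returns the plain sum.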

[Figure residue: states $S_0$ ($r = 0$) and $S_1$ ($r = 1$); labels $\alpha$ and $1 - \alpha$.]

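Finite-horizon return:

$$R_t = \sum_{i=0}^{T} r_{t+i+1}$$

State-value and action-value functions of a policy $\pi$:

$$V^\pi(s) = E_\pi\{R_t \mid s_t = s\} = E_\pi\Big\{\sum_{i=0}^{\infty} \gamma^i \cdot r_{t+i+1} \,\Big|\, s_t = s\Big\}$$

$$Q^\pi(s,a) = E_\pi\{R_t \mid s_t = s, a_t = a\} = E_\pi\Big\{\sum_{i=0}^{\infty} \gamma^i \cdot r_{t+i+1} \,\Big|\, s_t = s, a_t = a\Big\}$$

Bellman equation for $V^\pi$, with $a = \pi(s)$:

$$V^\pi(s) = \sum_{s' \in S} T(s, \pi(s), s') \big(R(s, a, s') + \gamma V^\pi(s')\big)$$

A policy $\pi^*$ is optimal if, for every policy $\pi$, $\forall s \in S,\ V^{\pi^*}(s) \ge V^\pi(s)$. The optimal value functions satisfy:

$$\forall s \in S,\quad V^*(s) = \max_\pi V^\pi(s) = \max_a \sum_{s' \in S} T(s, a, s') \big(R(s, a, s') + \gamma V^*(s')\big)$$

$$\forall (s,a) \in S \times A,\quad Q^*(s,a) = \max_\pi Q^\pi(s,a) = \sum_{s' \in S} T(s, a, s') \big(R(s, a, s') + \gamma \max_b Q^*(s', b)\big)$$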

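Policy iteration (from an initial policy $\pi_0$ towards $\pi^*$ and $V^*$):

1. Policy evaluation. Repeat:
     $\Delta \leftarrow 0$
     for each $s \in S$:
       $v \leftarrow V^\pi(s)$
       $V^\pi(s) \leftarrow \sum_{s'} T(s, \pi(s), s') [R^\pi(s) + \gamma V^\pi(s')]$
       $\Delta \leftarrow \max(\Delta, |v - V^\pi(s)|)$
   until $\Delta < \varepsilon$, with $\varepsilon$ a small positive threshold.
2. Policy improvement.
     stable $\leftarrow$ true
     for each $s \in S$:
       $b \leftarrow \pi(s)$
       $\pi(s) \leftarrow \arg\max_a \sum_{s'} T(s, a, s') [R^\pi(s) + \gamma V^\pi(s')]$
       if $b \ne \pi(s)$ then stable $\leftarrow$ false
3. If stable, return $V^\pi$ and $\pi$; otherwise go back to step 1.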
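The policy iteration loop above can be sketched in a few lines of Python. This is an illustrative sketch under assumptions, not the thesis implementation: the transition function is a nested list `T[s][a][s2]`, the reward is stored per state-action pair as `R[s][a]` (the text writes $R^\pi(s)$), and the two-state example at the bottom is made up.

```python
def policy_iteration(T, R, gamma=0.9, eps=1e-8):
    """Tabular policy iteration. T[s][a][s2] is a probability, R[s][a] a reward."""
    n_states, n_actions = len(T), len(T[0])
    V = [0.0] * n_states
    pi = [0] * n_states          # arbitrary initial policy pi_0

    def backup(s, a):
        # Expected one-step return of action a in state s under the current V.
        return sum(T[s][a][s2] * (R[s][a] + gamma * V[s2])
                   for s2 in range(n_states))

    while True:
        # 1. Policy evaluation: sweep until the largest change is below eps.
        while True:
            delta = 0.0
            for s in range(n_states):
                v = V[s]
                V[s] = backup(s, pi[s])
                delta = max(delta, abs(v - V[s]))
            if delta < eps:
                break
        # 2. Policy improvement: act greedily with respect to V.
        stable = True
        for s in range(n_states):
            b = pi[s]
            pi[s] = max(range(n_actions), key=lambda a: backup(s, a))
            if b != pi[s]:
                stable = False
        # 3. Stop once the greedy policy no longer changes.
        if stable:
            return V, pi


# Hypothetical two-state example: action 1 switches state, action 0 stays;
# only staying in state 1 is rewarded.
T_EXAMPLE = [[[1.0, 0.0], [0.0, 1.0]],
             [[0.0, 1.0], [1.0, 0.0]]]
R_EXAMPLE = [[0.0, 0.0],
             [1.0, 0.0]]

if __name__ == "__main__":
    print(policy_iteration(T_EXAMPLE, R_EXAMPLE))
```

In the example, the greedy policy moves from state 0 to state 1 and then stays there.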

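Value iteration:

Repeat:
  $\Delta \leftarrow 0$
  for each $s \in S$:
    $v \leftarrow V(s)$
    $V(s) \leftarrow \max_a \sum_{s'} T(s, a, s') [R^\pi(s) + \gamma V(s')]$
    $\Delta \leftarrow \max(\Delta, |v - V(s)|)$
until $\Delta < \varepsilon$, with $\varepsilon$ a small positive threshold.
For each $s \in S$: $\pi(s) \leftarrow \arg\max_a \sum_{s'} T(s, a, s') [R^\pi(s) + \gamma V(s')]$

[Residue: $T$, $R$; 1 m × 1 m; 5 × 5.]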
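A comparable sketch for value iteration follows. As before, the data layout (`T[s][a][s2]`, `R[s][a]`) and the small example are assumptions chosen for illustration rather than details from the thesis.

```python
def value_iteration(T, R, gamma=0.9, eps=1e-8):
    """Tabular value iteration. T[s][a][s2] is a probability, R[s][a] a reward."""
    n_states, n_actions = len(T), len(T[0])
    V = [0.0] * n_states

    def backup(s, a):
        # Expected one-step return of action a in state s under the current V.
        return sum(T[s][a][s2] * (R[s][a] + gamma * V[s2])
                   for s2 in range(n_states))

    while True:
        delta = 0.0
        for s in range(n_states):
            v = V[s]
            V[s] = max(backup(s, a) for a in range(n_actions))
            delta = max(delta, abs(v - V[s]))
        if delta < eps:
            break

    # Extract the greedy policy once the values have converged.
    pi = [max(range(n_actions), key=lambda a: backup(s, a))
          for s in range(n_states)]
    return V, pi


# Hypothetical two-state example: action 1 switches state, action 0 stays;
# only staying in state 1 is rewarded.
T_EXAMPLE = [[[1.0, 0.0], [0.0, 1.0]],
             [[0.0, 1.0], [1.0, 0.0]]]
R_EXAMPLE = [[0.0, 0.0],
             [1.0, 0.0]]

if __name__ == "__main__":
    print(value_iteration(T_EXAMPLE, R_EXAMPLE))
```

Unlike policy iteration, the policy is computed only once, after the value sweep has converged.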

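Estimates $V(s)$ and $Q(s,a)$ of $V^\pi$ and $Q^\pi$ are updated incrementally, with learning rate $\alpha \in [0,1]$ and error $\delta_k$ at iteration $k$:

$$V_{k+1}(s) = V_k(s) + \alpha \cdot \delta_k$$

TD(0), after moving from $s$ to $s'$:

$$V_{k+1}(s) = V_k(s) + \alpha \underbrace{\big(r(s) + \gamma V_k(s') - V_k(s)\big)}_{\delta_k}$$

Action values, with $a'$ the action taken in $s'$ under $\pi$:

$$Q_{k+1}(s,a) = Q_k(s,a) + \alpha \Big(r(s,a) + \gamma \underbrace{Q_k(s',a')}_{\equiv V^\pi(s')} - Q_k(s,a)\Big)$$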
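The two updates above translate directly into code. This sketch assumes dictionary-backed tables and hypothetical function names (`td0_update`, `on_policy_q_update`); it only illustrates the arithmetic of the updates.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """TD(0): move V[s] towards r + gamma * V[s_next]; return the error delta."""
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha * delta
    return delta


def on_policy_q_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    """On-policy update: bootstrap on the action a_next actually chosen in s_next."""
    delta = r + gamma * Q[(s_next, a_next)] - Q[(s, a)]
    Q[(s, a)] += alpha * delta
    return delta


if __name__ == "__main__":
    V = {0: 0.0, 1: 1.0}
    print(td0_update(V, s=0, r=0.5, s_next=1, alpha=0.5, gamma=0.9), V)
```

Returning `delta` makes the prediction error available to the caller, for instance for monitoring convergence.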

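Update bootstrapping on the best action in $s'$:

$$Q_{k+1}(s,a) = Q_k(s,a) + \alpha \Big(r(s,a) + \gamma \underbrace{\max_b Q_k(s',b)}_{\equiv V^\pi(s')} - Q_k(s,a)\Big)$$

[Residue: $p(s,a)$; experience tuples $(s, a, s', r(s,a))$.]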
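The max-based update above (the form usually called Q-learning) can be sketched as follows. The table layout, a `defaultdict` keyed by `(state, action)`, and the function name are assumptions for illustration.

```python
from collections import defaultdict


def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Off-policy update: bootstrap on the best action available in s_next."""
    best_next = max(Q[(s_next, b)] for b in actions)
    delta = r + gamma * best_next - Q[(s, a)]
    Q[(s, a)] += alpha * delta
    return delta


if __name__ == "__main__":
    Q = defaultdict(float)       # unseen (state, action) pairs default to 0
    Q[(1, 0)] = 2.0
    Q[(1, 1)] = 1.0
    print(q_learning_update(Q, s=0, a=1, r=1.0, s_next=1, actions=[0, 1],
                            alpha=0.5, gamma=0.5))
    print(Q[(0, 1)])
```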

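Model learning from counts, where $C(s,a,s')$ is the number of times $s'$ was reached after doing $a$ in $s$:

$$C(s,a) = \sum_{u \in S} C(s,a,u)$$

$$P(s' \mid s,a) = T(s,a,s') = \frac{C(s,a,s')}{C(s,a)}$$

$$R(s,a) = \frac{\sum_t r_t(s,a)}{C(s,a)}$$

[Residue: $E^3$; $(s,a)$.]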
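A count-based model of this kind might be kept as below. The class name `CountModel` and its methods are hypothetical; the estimates follow the ratios given above, with the value for unvisited pairs (zero) being an assumption of this sketch.

```python
from collections import defaultdict


class CountModel:
    """Maximum-likelihood model built from transition counts C(s, a, s')."""

    def __init__(self):
        self.c_sas = defaultdict(int)      # C(s, a, s')
        self.c_sa = defaultdict(int)       # C(s, a) = sum over s' of C(s, a, s')
        self.r_sum = defaultdict(float)    # sum of rewards observed for (s, a)

    def observe(self, s, a, s_next, r):
        self.c_sas[(s, a, s_next)] += 1
        self.c_sa[(s, a)] += 1
        self.r_sum[(s, a)] += r

    def T(self, s, a, s_next):
        n = self.c_sa.get((s, a), 0)
        # Assumption: an unvisited (s, a) pair yields probability 0.
        return self.c_sas.get((s, a, s_next), 0) / n if n else 0.0

    def R(self, s, a):
        n = self.c_sa.get((s, a), 0)
        return self.r_sum.get((s, a), 0.0) / n if n else 0.0


if __name__ == "__main__":
    m = CountModel()
    for _ in range(3):
        m.observe(0, 0, 1, 1.0)
    m.observe(0, 0, 0, 0.0)
    print(m.T(0, 0, 1), m.R(0, 0))
```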

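Initialise $Q(s,a)$ and Model$(s,a)$ for all $(s,a) \in S \times A(s)$
Repeat:
  $s \leftarrow$ current state
  $a \leftarrow$ action selected from $(s, Q)$
  execute $a$; observe $s'$ and $r$
  $Q(s,a) \leftarrow Q(s,a) + \alpha [r + \gamma \max_b Q(s',b) - Q(s,a)]$
  Model$(s,a) \leftarrow s', r$
  Repeat $N$ times:
    $s \leftarrow$ random previously observed state
    $a \leftarrow$ random action previously taken in $s$
    $s', r \leftarrow$ Model$(s,a)$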
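The loop above resembles the Dyna-Q scheme, in which each real step is followed by $N$ simulated updates drawn from the learned model. The sketch below makes several assumptions that are not confirmed by the text: the model is deterministic and stored as a dictionary, simulated pairs are drawn uniformly among observed `(s, a)` pairs, and each simulated sample is used for a further Q update (the extracted listing stops before that line).

```python
import random
from collections import defaultdict


def dyna_q_step(Q, model, s, a, r, s_next, actions,
                alpha=0.1, gamma=0.9, n_planning=10, rng=random):
    """One real-experience update followed by n_planning simulated updates."""

    def update(s0, a0, r0, s1):
        best = max(Q[(s1, b)] for b in actions)
        Q[(s0, a0)] += alpha * (r0 + gamma * best - Q[(s0, a0)])

    update(s, a, r, s_next)            # learn from the real transition
    model[(s, a)] = (s_next, r)        # remember it (deterministic model)

    for _ in range(n_planning):
        # Assumption: uniform draw over previously observed (s, a) pairs.
        ps, pa = rng.choice(list(model))
        ps_next, pr = model[(ps, pa)]
        update(ps, pa, pr, ps_next)    # assumed: same update on simulated data


if __name__ == "__main__":
    Q, model = defaultdict(float), {}
    dyna_q_step(Q, model, s=0, a=0, r=1.0, s_next=1, actions=[0],
                alpha=0.5, gamma=0.0, n_planning=1)
    print(Q[(0, 0)], model)
```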

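Action selection in state $s$. With $X$ drawn uniformly in $[0,1]$ and $a^*$ the greedy action:

$$a = \begin{cases} \text{random action} & \text{if } X \le \varepsilon \\ a^* = \arg\max_a Q(s,a) & \text{otherwise} \end{cases}$$

Probabilistic selection, with temperature $\tau$:

$$P(a \mid s) = \frac{\exp(Q(s,a)/\tau)}{\sum_{b \in A} \exp(Q(s,b)/\tau)}$$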
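Both selection rules can be sketched in a few lines. This assumes the first rule is the usual $\varepsilon$-greedy scheme and the second a Boltzmann (softmax) distribution with temperature $\tau$; the function names are hypothetical, and subtracting the maximum before exponentiating is a numerical-stability choice of this sketch, not something stated in the text.

```python
import math
import random


def softmax_probs(q_values, tau):
    """P(a|s) proportional to exp(Q(s,a) / tau)."""
    m = max(q_values)                      # shift for numerical stability
    exps = [math.exp((q - m) / tau) for q in q_values]
    z = sum(exps)
    return [e / z for e in exps]


def epsilon_greedy(q_values, epsilon, rng=random):
    """Random action with probability epsilon, greedy action otherwise."""
    if rng.random() <= epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


if __name__ == "__main__":
    q = [0.1, 0.9, 0.3]
    print(softmax_probs(q, tau=0.5))
    print(epsilon_greedy(q, epsilon=0.1))
```

A high `tau` flattens the distribution towards uniform, while a low `tau` concentrates it on the greedy action.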


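With $N$ experts, the preference $p_t(s_t, a[i])$ for action $a[i]$ in state $s_t$ at time $t$ is obtained in one of the following ways, where $a_t^e$ is the action chosen by expert $e$ and $\pi_t^e$ its policy:

– $p_t(s_t, a[i]) = \sum_{e=1}^{N} I(a[i], a_t^e)$, where $I(x,y)$ equals 1 if $x = y$ (here $a[i] = a_t^e$) and 0 otherwise;
– $p_t(s_t, a[i]) = \sum_{e=1}^{N} w_t^e(a[i])$, where $w_t^e(a[i])$ is the weight expert $e$ gives to $a[i]$ (values $|A|$, $|A| - 1$, …);
– $p_t(s_t, a[i]) = \prod_{e=1}^{N} \pi_t^e(s_t, a[i])$;
– $p_t(s_t, a[i]) = \sum_{e=1}^{N} \pi_t^e(s_t, a[i])$.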
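Three of the combination rules above (vote counting, product of policies, sum of policies) are sketched below. The function names are hypothetical, and the weighted variant is left out because its weighting scheme cannot be recovered from the text.

```python
def vote_count(chosen_actions, n_actions):
    """p(a[i]) = number of experts whose chosen action equals a[i]."""
    prefs = [0] * n_actions
    for a in chosen_actions:
        prefs[a] += 1
    return prefs


def policy_product(policies):
    """p(a[i]) = product over experts of pi_e(s, a[i])."""
    prefs = [1.0] * len(policies[0])
    for pi in policies:
        prefs = [p * q for p, q in zip(prefs, pi)]
    return prefs


def policy_sum(policies):
    """p(a[i]) = sum over experts of pi_e(s, a[i])."""
    return [sum(pi[i] for pi in policies) for i in range(len(policies[0]))]


if __name__ == "__main__":
    experts = [[0.5, 0.5], [0.25, 0.75]]   # each row: one expert's pi(s, .)
    print(vote_count([0, 1, 1], n_actions=3))
    print(policy_product(experts), policy_sum(experts))
```

The resulting preferences are unnormalised; turning them into a selection probability is a separate step.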


$$Q_{s,a}(q) = P(Q(s,a) = q)$$


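[Residue: actions $A_1$, $A_2$ and sequence $\{A_1, A_2\}$ from state $S$.]

$$A(s,a) = Q(s,a) - V(s)$$

$$C(s,a,a') = \sum_{s'} \dots$$

(the right-hand side is truncated in the source)

$$-C(s,a,a') < \hat{R}\,\tau$$

$(s, a) \leftarrow (s, \{a, a'\})$; $(s, \{a, a'\}) \leftarrow (s, a)$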


[Figure residue: state $S$ with components $s_i$, weights $W$, outputs $Q(S, a_j)$.]

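Action values are computed from the state $S_t$ with one weight vector per action $a_j$:

$$W_j = (w_{0j}, \dots, w_{Nj})$$

$$Q_t(S_t, a_j) = a_t^j = W_j^t \cdot (S_t, 1)$$

$$\delta = r_t + \gamma_{Hab} \cdot \max_b \big(W_b^{t-1} \cdot S_t\big) - \big(W_a^{t-1} \cdot S_{t-1}\big)$$

$$W_a^t = W_a^{t-1} + \alpha_{Hab}\, \delta \Big/ \sum_n s_n$$

with $r_t$ the reward obtained for action $a$ in $S_{t-1}$, and parameters $\alpha_{Hab}$ and $\gamma_{Hab}$.

Transitions $(S, a, S')$ and rewards are stored in a model:

$$T_t(S,a,S') = T_{t-1}(S,a,S') + \alpha_{MB} \cdot \big(1 - T_{t-1}(S,a,S')\big)$$

$$R_t(S,a) = r_t$$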

$$Q_t(s,a) = \max\Big(r_t(s,a),\ \gamma_{MB} \cdot \sum_{s'} T_{t-1}(s,a,s') \cdot \max_{a'} Q_t(s',a')\Big)$$
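The transition update and the value backup above can be illustrated as follows. This is a sketch under stated assumptions: the model is a nested dictionary `T[s][a][s2]`, unseen transitions start at probability 0, and the backup is read as the maximum of the immediate reward and the discounted expected best value of the successor (the inner maximum over actions is a reconstruction). Function names are hypothetical.

```python
def update_transition(T, s, a, s_next, alpha_mb=0.2):
    """T(s,a,s') <- T(s,a,s') + alpha_mb * (1 - T(s,a,s')) for the observed s'."""
    successors = T.setdefault(s, {}).setdefault(a, {})
    old = successors.get(s_next, 0.0)      # assumption: unseen means 0
    successors[s_next] = old + alpha_mb * (1.0 - old)
    return successors[s_next]


def mb_backup(Q, T, R, s, a, gamma_mb=0.95):
    """Q(s,a) = max(r(s,a), gamma_mb * sum_s' T(s,a,s') * max_a' Q(s',a'))."""
    expected = sum(p * max(Q[s2]) for s2, p in T[s][a].items())
    return max(R[s][a], gamma_mb * expected)


if __name__ == "__main__":
    T = {}
    print(update_transition(T, 0, 0, 1))   # first observation of (0, 0, 1)
    print(update_transition(T, 0, 0, 1))   # second observation
    Q = {0: [0.0, 0.0], 1: [1.0, 0.5]}
    print(mb_backup(Q, {0: {0: {1: 1.0}}}, {0: {0: 0.0}}, 0, 0, gamma_mb=0.5))
```

Note that the update only raises the probability of the observed successor; how the other successors are adjusted is not specified in the recovered text.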

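Let $V_S$ be the variable ranging over the states $S \in \mathcal{S}$, with probability $P(S)$, entropy $H(V_S)$ and maximum entropy $H_{max}$. The criterion $R_c$ is the normalised entropy:

$$H(V_S) = -\sum_{S \in \mathcal{S}} P(S) \cdot \log_2(P(S))$$

$$R_c = \frac{H(V_S)}{H_{max}}, \qquad H_{max} = \log_2 |\mathcal{S}|$$

$R_c$ is then combined into $R_n$:

$$R_n = (1 - \omega) + \omega \cdot R_c, \qquad \omega = \frac{1}{1 + e^{-\sigma |S - S_0|}}$$

with $S_0 = 50$ and $\sigma = 0.25$.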
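The entropy criterion above can be computed as follows. The function names are hypothetical, and the weight `omega` is passed in directly rather than derived, since the exact form of its sigmoid is hard to read in the source.

```python
import math


def normalized_entropy(probs):
    """R_c = H / H_max, with H in bits and H_max = log2(number of states)."""
    h = -sum(p * math.log2(p) for p in probs if p > 0)
    h_max = math.log2(len(probs))
    # A single-state distribution has H_max = 0; return 0 by convention.
    return h / h_max if h_max > 0 else 0.0


def blended_criterion(r_c, omega):
    """R_n = (1 - omega) + omega * R_c."""
    return (1.0 - omega) + omega * r_c


if __name__ == "__main__":
    uniform = [0.25] * 4
    print(normalized_entropy(uniform))
    print(blended_criterion(normalized_entropy([0.5, 0.5, 0.0, 0.0]), omega=0.5))
```

A uniform distribution over the states gives $R_c = 1$, and a distribution concentrated on one state gives $R_c = 0$.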


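The perceptions $p_{bt}$ and $p_{bs}$ enter the memories $M_{bt}$ and $M_{bs}$ at position 0, $m_{bt}(0)$ and $m_{bs}(0)$, and older entries are shifted by one position at each step $t$ ($t_{max} = 0.1$ s):

$$m^t_{bs,bt}(i) = m^{t-1}_{bs,bt}(i-1) \quad \forall i \in |M|, \qquad m^t_{bs,bt}(0) = p_{bs,bt}$$

$p_{bt} = C_a \cdot \dots$ and $p_{bs} = C_c \cdot \dots$ (right-hand sides lost in extraction).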

[Table residue: parameters $\alpha$, $\gamma$, $\tau$; reward values $R_t = 0$, $R_t = -0.03$ and $R_t = 0.97$; labels LC, DN, PA, $C_c$, $C_a$; values of $v_b$ and $t$ not recoverable.]

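For the experts $E_{hab}$ and $E_{GD}$, the reward at time $t$ is filtered exponentially with factor $\lambda$, and its variation $\Delta \bar{r}_t$ is tracked:

$$\bar{r}_t = (1 - \lambda) \cdot \bar{r}_{t-1} + \lambda \cdot r_t$$

$$\Delta \bar{r}_t = \bar{r}_t - \bar{r}_{t-1}$$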
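As an illustration, the filtered reward $\bar{r}_t$ and its variation $\Delta \bar{r}_t$ can be tracked over a reward sequence as below. The function name and the initial value `r0 = 0` are assumptions of this sketch.

```python
def filtered_reward(rewards, lam, r0=0.0):
    """Return a list of (r_bar_t, delta_r_bar_t) pairs for a reward sequence.

    r_bar_t = (1 - lam) * r_bar_{t-1} + lam * r_t
    delta_r_bar_t = r_bar_t - r_bar_{t-1}
    """
    r_bar = r0
    history = []
    for r in rewards:
        previous = r_bar
        r_bar = (1.0 - lam) * previous + lam * r
        history.append((r_bar, r_bar - previous))
    return history


if __name__ == "__main__":
    for r_bar, delta in filtered_reward([1.0, 1.0, 0.0], lam=0.5):
        print(r_bar, delta)
```

A small `lam` gives a slowly varying average; a `lam` close to 1 follows the latest reward almost directly.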


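In addition to $\Delta \bar{r}_t$, the entropy $H_t^{Hab,GD}$ of each expert's action distribution is used. For an expert $E$:

$$H_t^E(x) = -\sum_{i=0}^{|A|} P_i \cdot \log_2(P_i), \qquad P_i = p(a = a_i \mid s)$$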


[Figure: mean planning time, panels RC and SS. x-axis: time (decisions), 0 to 2500; y-axis: time (s), 0.06 to 0.14; curves: RaZ, Cons.]


[Figure: distribution of the evaluated parameter sets, panels GD and Hab. x-axis: cumulative reward; y-axis: reward variance.]


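Quantities used: the times $T_{Hab}$ and $T_{GD}$, and the variations $\delta Q$ and $\delta P$ ($\alpha = 0.2$, $\alpha = 0.02$).

Value $V_E$ of an expert $E$, with weights $\alpha$ and $\beta$:

$$V_{Hab} = -(\alpha_{Hab} \cdot \delta Q + \beta_{Hab} \cdot T_{Hab})$$

$$V_{GD} = -(\alpha_{GD} \cdot \delta P + \beta_{GD} \cdot T_{GD})$$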
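The two expert values share one form, so a single helper covers both. The sketch below only evaluates the formula; the function name and the numeric inputs are hypothetical placeholders, not values from the thesis.

```python
def expert_value(alpha, delta, beta, duration):
    """V_E = -(alpha_E * delta_E + beta_E * T_E).

    delta is the variation term (delta Q or delta P) and duration is T_E.
    Larger variation or longer duration both lower the value.
    """
    return -(alpha * delta + beta * duration)


if __name__ == "__main__":
    # Placeholder numbers, for illustration only.
    v_hab = expert_value(alpha=1.0, delta=0.5, beta=2.0, duration=0.25)
    v_gd = expert_value(alpha=1.0, delta=0.1, beta=2.0, duration=1.0)
    print(v_hab, v_gd)
```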


– 30 αGD=1 αHab= 12 β THab TGD , × ,


