• Aucun résultat trouvé

The challenges of ac�ve explora�on and learning in high-­‐dimensional con�nuous spaces

N/A
N/A
Protected

Academic year: 2022

Partager "The challenges of ac�ve explora�on and learning in high-­‐dimensional con�nuous spaces"

Copied!
21
0
0

Texte intégral

(1)

The  challenges  of  ac�ve  explora�on  and   learning  in  high-­‐dimensional    

con�nuous  spaces  

Pierre-­‐Yves  Oudeyer  

Project-­‐Team  INRIA-­‐ENSTA-­‐ParisTech  FLOWERS   h�p://www.pyoudeyer.com    

h�p://flowers.inria.fr    

(2)

Π

Space of Controllers Task Space = Space of Effects

Reachable Space of Effects

Forward Model Inverse Model

T

πθ1

πθ2 πθ3 πθ4

τλ1, Rλ1 τλ2, Rλ2

τλ3, Rλ3

λj Rm

Parameterized by

θi Rn

Parameterized by

Learning  (generalized)     sensorimotor  mappings  

M S

Probabilistic models P(S|M)

Joint mapping P(M,S): notion of forward and inverse model depend on use

(3)

Why  difficult  to  learn?  

P High-dimensional and continuous P Redundant

P Stochastic, inhomogeneous in terms of learnability

P Physical experiments time + limited life time = limited number of training data

è Guide actively collection of data/experiments to maximize what can be learnt within a life-time;

è Whole joint P(M,S) mapping, even P(S|M), even MàS, cannot be learnt (data too sparse)

(4)

simple complexe

complexe

(S(t),!") simple

complexe

complexe

(S(t),!") Explorer zones:

P Incertitude/erreurs maximales

P Les moins explorées Suppose:

P Stationarité spatiale et temporelle

P Tout est apprenable P Modèle de la fonction

d’erreur

èQuel expérimenter ?

x n+1

(S(t),!") S(t + !)

(S(0),!",1, proj(S(1)))

! pred1

(S(1),!",2, proj(S(2)))

! pred2

!

(S(n"1),!",n, proj(S(n)))

! predn

(S(n+1),!",n+1)

Approche

développementale

Explorer zones:

Progrès en apprentissage empiriquement

mesuré est maximal

Appren�ssage  ac�f  de  modèles  

(5)

Spontaneous  ac�ve  explora�on,   ar�ficial  curiosity  

predict :(S(t),!" ) ! S(t! + ")

!(S(t),"# ) = S(t! + !)" S(t + !) R(S(t),!" ) = # ?

R(S(t),!" ) = ! d#

dt in the vicinity of (S(t),!" )

èNon-stationary function, difficult to model

èAlgorithms for empirical evaluation of de/dt with statistical regression

è IAC (2004, 2007), R-IAC (2009), SAGG-RIAC (2010) McSAGG-RIAC (2011), SGIM (2011)

Non !

(SVR,GPR,NN,...)

Intrinsic  Mo�va�on  

Berlyne  (1960),  Csikszentmihalyi  (1996)   Dayan  and  Belleine  (2002)  

R :!" ! r " #

Quelle fonction de récompense

générique ?

(6)

Es�ma�ng  Learning  Progress:    

a  Regression  Problem  

Challenge 1) Estimate R with few samples

Challenge 2) Meta-exploration/meta-exploitation problem

No   rela�on   between  

X  and  Y  

M M

S R

(7)

How  to  es�mate  it  usefully  for   future  experiments?  

Update locally, but not too much for generalization

No   rela�on   between  

X  and  Y  

M M

S

R ?

(8)

Region-­‐based  evalua�on  of  LP  

No   rela�on   between  

X  and  Y  

M

S R

R1 R2 R3 R4 R5

Coarse representation of R: Fast to estimate (sample and comp. complexity) R1 R2 R3 R4 R5

M

Pred errors

Episodic Time

(9)

From  coarse  to  fine  regions:    

Sub-­‐divide  where  it  is  most   interes�ng/explored  

Progressive categorization

(10)

Sensori state

   

Action state Context

state

Classic machine learner

(e.g. neural net, SVM, Gaussian process) M

Meta machine learner metaM

Progressive categorization

Local model of learning progress

. . .

e(t) =

S(t +1)! S(t! +1) S(t)

S(t +1) S(t! +1)

Sensori state at t+1

Prediction

LP(t,R1) LP(t,R2)

Error feedback

Action selection

system Intrinsic reward

Local model of learning progress R1

R2 R3

IAC, IEEE Trans. EC (2007)

R-IAC, IEEE Trans. AMD (2009)

(11)

Bootstrapping  a  useful  model  of  R  

(even  step-­‐wise)  can  take  �me  in  high-­‐

dimensions  (or  just  large  domains)  

Time Quality

of model

LIFETIME ?

ACTIVE LEARNING: can be worse than random in bootstrap phase

RANDOM

EXPLORATION

(12)

Strategies  for  scalable  ac�ve   learning  in  very  large  spaces  

Control

Social guidance, scaffolding

(Adaptive)

Maturation and embodiment

Task space exploration

(13)

Task  space  ac�ve  explora�on  

What is often most useful is knowing P(M|S) è Leveraging of redundancy by learnng only what is enough

(14)

No   rela�on   between  

X  and  Y  

M S

Knowing these parts of the forward model is sufficient to know how to produce all possible effects in the Task space

(15)

Ac�ve  learning  of  inverse  models   SAGG-­‐RIAC  (RAS,  2012)  

predict :(S(t),!" ) ! proj(S(t! + "))

(Context, Movement) à

Effect

Redundancy of sensorimotor spaces

From the active choice of action, followed by observation of effect …

… to the active choice of effect, followed by the search of a corresponding action policy through goal-directed

optimization (e.g. using NAC, POWER, PI^2-CMA, …)

è self-defined RL problem

control :(S(t), R! )optimisation!

"#!

R! :"# ! "

competence progress : R! ("!!,new) ! R" (#!!,init )

Spontaneous active exploration of a space of fitness functions parameterized by where one iteratively chooses the which maximizes the empirical evaluation of: R!

!

(16)

No   rela�on   between  

X  and  Y  

M S

Episodic Time

Target errors

S

R

R = Competence Progress

(17)

Matura�onal  constraints  

Control

Perception

infant vision adult vision

S,!"

P Progressive growths of DOF number and spatio-temporal resolution

P Adaptive maturational schedule controlled by active learning/learning progress

(Bjorklund, 1997; Turkewitz and Kenny, 1985)

(18)

Learning omnidirectional locomotion

èPerformance higher than more classical active learning algorithms in real sensorimotor spaces (non-stationary, non homogeneous)

(IEEE TAMD 2009; ICDL 2010, 2011; IROS 2010; RAS 2012)

θ

u

v

3 DOF/Leg

!1;1

[ ]24 [!1;1]3

Experimental  evalua�on  

Control Space: Task Space:

(19)
(20)

SGIM:  Socially  Guided  Intrinsic  Mo�va�on  

ICDL-Epirob, 2011

(21)

Baranes, A., Oudeyer, P-Y. (2012) Active Learning of Inverse Models with Intrinsically Motivated Goal Exploration in Robots, Robotics and Autonomous Systems.

http://www.pyoudeyer.com/RAS-SAGG-RIAC-2012.pdf

Baranes, A., Oudeyer, P-Y. (2011) The Interaction of Maturational Constraints and Intrinsic Motivation in Active Motor Development, in Proceedings of IEEE ICDL-Epirob 2011.

http://flowers.inria.fr/BaranesOudeyerICDL11.pdf

Lopes, M., Melo, F., Montesano, L. (2009) Active Learning for Reward Estimation in Inverse Reinforcement Learning, European Conference on Machine Learning (ECML/PKDD), Bled, Slovenia, 2009.

http://flowers.inria.fr/mlopes/myrefs/09-ecml-airl.pdf

Nguyen, M., Baranes, A., Oudeyer, P-Y. (2011) Bootstrapping Intrinsically Motivated Learning with Human Demonstrations, in Proceedings of IEEE ICDL-Epirob 2011.

http://flowers.inria.fr/NguyenBaranesOudeyerICDL11.pdf

Oudeyer P-Y, Kaplan , F. and Hafner, V. (2007) Intrinsic Motivation Systems for Autonomous Mental Development, IEEE Transactions on Evolutionary Computation, 11(2), pp. 265--286.

http://www.pyoudeyer.com/ims.pdf Baranes, A., Oudeyer, P-Y. (2009)

R-IAC: Robust intrinsically motivated exploration and active learning, IEEE Transactions on Autonomous Mental Development, 1(3), pp. 155--169.

Exploration in Model-based Reinforcement Learning by Empirically Estimating Learning Progress, Manuel Lopes, Tobias Lang, Marc Toussaint and Pierre-Yves Oudeyer. Neural Information Processing Systems (NIPS 2012), Tahoe, USA. (pdf)

The Strategic Student Approach for Life-Long Exploration and Learning, Manuel Lopes and Pierre- Yves Oudeyer. under review, . (pdf)

Références

Documents relatifs

This dose blocks fever and reverses endotoxin-induced suppression of pulsatile GnRH and LH secretion in ovariectomized ewes (8, 9). The experiment was conducted according to a

Atelier n°23 : Mise en œuvre de la classe inversée dans le supérieur - L’approche par “ Learning Outcomes ” et par compétences associée au dispositif de la Classe

Ecrire un script "Existfile.sh" qui comporte un paramètre $1 indiquant le fichier dont on veut connaitre l'existence. S'il existe montrer que c'est un fichier ou

Figure 1- 12.Waterfilling implémenté avec les antennes à gain variable ... Guide d'onde rectangulaire ... Décomposition de l'onde à l'intérieur d'un guide d'onde rectangulaire

Catie [25] a étudié le contrôle en boucle retour du moulage de pièces par injection, en choisissant la température et la pression du front d'écoulement dans

Keywords: Blood-brain barrier, mouse brain microvascular endothelial cells, primary cell culture, bEnd.3 cell line, drug permeability, log BB, drug screening;

Il semble qu’enfin, en 1896, les représentations liées à l’implication publique des candi- dats présidentiels dans leur campagne aient rattrapé les pratiques existantes tout

Douze ans de recherche en alphabétisation des adultes en français au Canada : 1994-2005 22/137 Les autres thèmes les plus étudiés sont alphabétisme et travail (13, soit 20 %) ;