Synthesising Reinforcement Learning Policies Through Set-Valued Inductive Rule Learning

Vrije Universiteit Brussel

Synthesising Reinforcement Learning Policies Through Set-Valued Inductive Rule Learning

Coppens, Youri

Publication date: 2021

License: Unspecified

Document Version: Final published version

Citation for published version (APA):

Coppens, Y. (2021). Synthesising Reinforcement Learning Policies Through Set-Valued Inductive Rule Learning. Poster session presented at AI Flanders Research Days 2021.

General rights

Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.

• You may not further distribute the material or use it for any profit-making activity or commercial gain.

• You may freely distribute the URL identifying the publication in the public portal.

Take down policy

If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.



Synthesising Reinforcement Learning Policies Through Set-Valued Inductive Rule Learning

Youri Coppens, Denis Steckelmacher, Catholijn M. Jonker, Ann Nowé

Towards Explainable Reinforcement Learning

Model Distillation via Inductive Rule Learning

Contribution 1: Exploit meta-information on "almost" equally good actions

Contribution 2: Adapt the WRA heuristic of CN2

Contribution 3: Interactive Hierarchical Rule Refinement

➢ Increase accuracy by capturing exceptions per rule when a rule and the policy prediction mismatch, e.g. in a muddy grid world

This work is part of the Flanders AI research program WWW.AIRESEARCHFLANDERS.BE

1. IF X THEN Class=A
2. IF Y THEN Class=B

...

N. IF TRUE THEN Class=B
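The distilled policy takes the form of an ordered decision list like the template above: rules are tried top to bottom, the first matching condition fires, and the final IF TRUE rule acts as a default. A minimal Python sketch of these evaluation semantics (the predicate encoding and function names are illustrative assumptions, not the released svcn2 code):

```python
# Each rule is a (condition, class) pair; conditions are predicates
# over the state. The list is scanned top to bottom and the first
# matching rule fires; the final catch-all rule is the default.
rules = [
    (lambda s: s["X"] <= 18, "RIGHT"),
    (lambda s: s["X"] == 19, "UP"),
    (lambda s: True,         "UP"),   # default rule: IF TRUE THEN ...
]

def predict(rules, state):
    for condition, cls in rules:
        if condition(state):
            return cls

predict(rules, {"X": 5})   # -> "RIGHT"
predict(rules, {"X": 19})  # -> "UP"
```

Because evaluation stops at the first match, later rules only need to cover the states the earlier rules leave over, which is what makes per-rule refinement and exceptions possible.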

Final Paper (SpringerLink): https://doi.org/10.1007/978-3-030-73959-1_15

Preliminary Paper (Open Access): https://www.ida.liu.se/~frehe08/tailor2020/TAILOR_2020_paper_48.pdf

Code: https://gitlab.ai.vub.ac.be/yocoppen/svcn2

Talk: https://youtu.be/8ech6kv3ebkc

Contact: yocoppen@ai.vub.ac.be

$$\forall s \in S:\quad \mathcal{L}(s) = \left\{\, a \in A(s) \;:\; \pi(s, a) \ge \tau \cdot \max_{a' \in A(s)} \pi(s, a') \,\right\}$$

where $\max_{a' \in A(s)} \pi(s, a')$ is the probability of the most likely (best) action to take.
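This labelling can be sketched in a few lines of Python; the dictionary representation of π(s, ·) and the example probabilities are illustrative assumptions, not the authors' implementation:

```python
def set_valued_label(action_probs, tau):
    """Return the set of actions whose probability is within a
    factor tau of the best action's probability."""
    best = max(action_probs.values())
    return {a for a, p in action_probs.items() if p >= tau * best}

# With tau = 0.9, UP (0.48) and RIGHT (0.45) count as
# "almost equally good", while DOWN (0.07) is excluded.
probs = {"UP": 0.48, "RIGHT": 0.45, "DOWN": 0.07}
set_valued_label(probs, 0.9)  # -> {"UP", "RIGHT"}
```

Lowering τ widens the label sets (more indifference between actions); τ = 1 recovers the single best action.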

The adapted weighted relative accuracy heuristic for set-valued labels:

$$WRA_{\text{set}} = \frac{\hat{E}}{E} \times \left( \frac{\hat{P}}{\hat{E}} - \frac{P}{E} \right)$$

where $E$ is the number of samples overall, $\hat{E}$ the number of covered samples, and $P$ ($\hat{P}$) the number of samples overall (covered) consistent with the rule's predicted class.
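With Ê and E as annotated above, and taking P and P̂ as the standard WRA positive counts (samples whose label set contains the rule's predicted class, an assumption based on the usual CN2 heuristic rather than the authors' exact code), the heuristic can be sketched as:

```python
def wra_set(covered, positives_covered, total, positives_total):
    """Set-valued weighted relative accuracy of a candidate rule.

    covered (E-hat): samples the rule covers; total (E): all samples;
    positives_covered (P-hat): covered samples whose label set contains
    the rule's predicted class; positives_total (P): such samples overall.
    """
    if covered == 0:
        return 0.0
    return (covered / total) * (positives_covered / covered
                                - positives_total / total)

# A rule covering 40 of 100 samples, 30 of them consistent, when 50
# of the 100 samples overall are consistent with the predicted class:
wra_set(40, 30, 100, 50)  # -> 0.4 * (0.75 - 0.5) = 0.1
```

The first factor rewards coverage, the second rewards accuracy relative to the base rate, giving the trade-off between the two that rule selection needs.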

[Diagram: reinforcement learning loop. The agent observes a state, its policy π maps the state to an action, and the environment returns the next state and a reward.]

➢ Autonomous learning through experience and feedback

➢ Neural networks represent complex policies, mapping states to action values

➢ Interpret and verify learned behavior in a human-understandable form

CN2 rule learning

Policy recording

Use case: Mario AI Benchmark

1 IF X<=18 THEN Class=RIGHT
2 IF X=19 THEN Class=UP

Refined:
1.1 IF X<=18 AND Y=10 AND X>=8 AND X<=9 THEN Class=UP
1.2 IF X<=18 AND X>=10 THEN Class=RIGHT
1.3 IF X<=18 AND X<=8 THEN Class=RIGHT
1.4 IF X<=18 AND Y=11 THEN Class=UP
1.5 IF X<=18 AND Y=9 THEN Class=DOWN
1.6 IF X<=18 THEN Class=RIGHT
2 IF X=19 THEN Class=UP

➢ Desired trade-off in coverage and accuracy

Distilled rule list

➢ Some actions have similar/close quality according to the learned policy

➢ Allow indifference between such actions

➢ Transform policy recordings to multi-valued sets based on a threshold 𝜏
