Synthesising Reinforcement Learning Policies Through Set-Valued Inductive Rule Learning

Vrije Universiteit Brussel

Synthesising Reinforcement Learning Policies Through Set-Valued Inductive Rule Learning

Coppens, Youri

Publication date: 2021

License: Unspecified

Document Version: Final published version

Citation for published version (APA):

Coppens, Y. (2021). Synthesising Reinforcement Learning Policies Through Set-Valued Inductive Rule Learning. Poster session presented at AI Flanders Research Days 2021.

General rights

Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.

• You may not further distribute the material or use it for any profit-making activity or commercial gain.

• You may freely distribute the URL identifying the publication in the public portal.

Take down policy

If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.



Synthesising Reinforcement Learning Policies Through Set-Valued Inductive Rule Learning

Youri Coppens, Denis Steckelmacher, Catholijn M. Jonker, Ann Nowé

Towards Explainable Reinforcement Learning

Model Distillation via Inductive Rule Learning

Contribution 1: Exploit meta-information on "almost" equally good actions

Contribution 2: Adapt the WRA heuristic of CN2

Contribution 3: Interactive Hierarchical Rule Refinement

➢ Increase accuracy by capturing exceptions per rule when a rule and the policy prediction mismatch, e.g. in a muddy grid world

This work is part of the Flanders AI research program WWW.AIRESEARCHFLANDERS.BE

1. IF X THEN Class=A
2. IF Y THEN Class=B

...

N. IF TRUE THEN Class=B
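The distilled policy takes the form of an ordered decision list like the template above: rules are tried top to bottom, the first matching condition fires, and the final IF TRUE rule acts as a default. A minimal Python sketch of these evaluation semantics (the predicate encoding and function names are illustrative assumptions, not the released svcn2 code):

```python
# Each rule is a (condition, class) pair; conditions are predicates
# over the state. The list is scanned top to bottom and the first
# matching rule fires; the final catch-all rule is the default.
rules = [
    (lambda s: s["X"] <= 18, "RIGHT"),
    (lambda s: s["X"] == 19, "UP"),
    (lambda s: True,         "UP"),   # default rule: IF TRUE THEN ...
]

def predict(rules, state):
    for condition, cls in rules:
        if condition(state):
            return cls

predict(rules, {"X": 5})   # -> "RIGHT"
predict(rules, {"X": 19})  # -> "UP"
```

Because evaluation stops at the first match, later rules only need to cover the states the earlier rules leave over, which is what makes per-rule refinement and exceptions possible.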

Final Paper (SpringerLink): https://doi.org/10.1007/978-3-030-73959-1_15

Preliminary Paper (Open Access): https://www.ida.liu.se/~frehe08/tailor2020/TAILOR_2020_paper_48.pdf

Code: https://gitlab.ai.vub.ac.be/yocoppen/svcn2

Talk: https://youtu.be/8ech6kv3ebkc

Contact: yocoppen@ai.vub.ac.be

$$\forall s \in S:\quad \mathcal{L}(s) = \left\{\, a \in A(s) \;:\; \pi(s, a) \ge \tau \cdot \max_{a' \in A(s)} \pi(s, a') \,\right\}$$

where $\max_{a' \in A(s)} \pi(s, a')$ is the probability of the most likely (best) action to take.
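This labelling can be sketched in a few lines of Python; the dictionary representation of π(s, ·) and the example probabilities are illustrative assumptions, not the authors' implementation:

```python
def set_valued_label(action_probs, tau):
    """Return the set of actions whose probability is within a
    factor tau of the best action's probability."""
    best = max(action_probs.values())
    return {a for a, p in action_probs.items() if p >= tau * best}

# With tau = 0.9, UP (0.48) and RIGHT (0.45) count as
# "almost equally good", while DOWN (0.07) is excluded.
probs = {"UP": 0.48, "RIGHT": 0.45, "DOWN": 0.07}
set_valued_label(probs, 0.9)  # -> {"UP", "RIGHT"}
```

Lowering τ widens the label sets (more indifference between actions); τ = 1 recovers the single best action.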

The adapted weighted relative accuracy heuristic for set-valued labels:

$$WRA_{\text{set}} = \frac{\hat{E}}{E} \times \left( \frac{\hat{P}}{\hat{E}} - \frac{P}{E} \right)$$

where $E$ is the number of samples overall, $\hat{E}$ the number of covered samples, and $P$ ($\hat{P}$) the number of samples overall (covered) consistent with the rule's predicted class.
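With Ê and E as annotated above, and taking P and P̂ as the standard WRA positive counts (samples whose label set contains the rule's predicted class, an assumption based on the usual CN2 heuristic rather than the authors' exact code), the heuristic can be sketched as:

```python
def wra_set(covered, positives_covered, total, positives_total):
    """Set-valued weighted relative accuracy of a candidate rule.

    covered (E-hat): samples the rule covers; total (E): all samples;
    positives_covered (P-hat): covered samples whose label set contains
    the rule's predicted class; positives_total (P): such samples overall.
    """
    if covered == 0:
        return 0.0
    return (covered / total) * (positives_covered / covered
                                - positives_total / total)

# A rule covering 40 of 100 samples, 30 of them consistent, when 50
# of the 100 samples overall are consistent with the predicted class:
wra_set(40, 30, 100, 50)  # -> 0.4 * (0.75 - 0.5) = 0.1
```

The first factor rewards coverage, the second rewards accuracy relative to the base rate, giving the trade-off between the two that rule selection needs.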

[Diagram: reinforcement learning loop. The agent observes a state, its policy π maps the state to an action, and the environment returns the next state and a reward.]

➢ Autonomous learning through experience and feedback

➢ Neural networks represent complex policies, mapping states to action values

➢ Interpret and verify learned behavior in a human-understandable form

CN2 rule learning

Policy recording

Use case: Mario AI Benchmark

1 IF X<=18 THEN Class=RIGHT
2 IF X=19 THEN Class=UP

Refined:
1.1 IF X<=18 AND Y=10 AND X>=8 AND X<=9 THEN Class=UP
1.2 IF X<=18 AND X>=10 THEN Class=RIGHT
1.3 IF X<=18 AND X<=8 THEN Class=RIGHT
1.4 IF X<=18 AND Y=11 THEN Class=UP
1.5 IF X<=18 AND Y=9 THEN Class=DOWN
1.6 IF X<=18 THEN Class=RIGHT
2 IF X=19 THEN Class=UP

➢ Desired trade-off in coverage and accuracy

Distilled rule list

➢ Some actions have similar/close quality according to the learned policy

➢ Allow indifference between such actions

➢ Transform policy recordings to multi-valued sets based on a threshold 𝜏
