
A neuro-computational model showing the effects of ventral striatum lesion on the computation of reward prediction error in VTA


HAL Id: hal-02388198

https://hal.inria.fr/hal-02388198

Submitted on 1 Dec 2019

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


A neuro-computational model showing the effects of ventral striatum lesion on the computation of reward prediction error in VTA

Pramod Kaushik, Frédéric Alexandre

To cite this version:

Pramod Kaushik, Frédéric Alexandre. A neuro-computational model showing the effects of ventral striatum lesion on the computation of reward prediction error in VTA. NeuroFrance, the international conference of the French Society of Neuroscience, May 2019, Marseille, France. 2019. ⟨hal-02388198⟩


Pramod KAUSHIK (1,2,3), Frédéric ALEXANDRE (1,2,3)

(1) INRIA Bordeaux Sud-Ouest, 200 Avenue de la Vieille Tour, 33405 Talence, France

(2) LaBRI, Université de Bordeaux, Bordeaux INP, CNRS, UMR 5800, Talence, France

(3) IMN, Université de Bordeaux, CNRS, UMR 5293, Bordeaux, France

NeuroFrance 2019, Marseille, France, 22-24 May 2019

Poster panels: Abstract; Model and Experimentation; Conclusion and Acknowledgements

Abstract

● The work models Pavlovian learning, a fundamental learning mechanism in animals.

● Pavlovian learning pairs a neutral conditioned stimulus (CS) with a rewarding unconditioned stimulus (US); after training, the CS itself becomes a rewarding stimulus.

● The model focuses on the mechanism of reward prediction error (RPE) within Pavlovian learning and on the effects of ventral striatum (VS) lesions, which illustrate a fundamental dissociation between magnitude and timing and replicate experimental studies.
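To make the notion of RPE in CS-US pairing concrete, here is a minimal sketch using the textbook Rescorla-Wagner rule. It is not the circuit model presented on this poster; the function name, the learning rate `alpha` and the reward size are assumptions chosen for illustration only.

```python
def rescorla_wagner(n_trials, reward=1.0, alpha=0.2):
    """Return per-trial (cs_value, us_rpe) for one CS always followed by the US.

    Textbook Rescorla-Wagner rule, used here only as an illustration;
    alpha and reward are hypothetical values, not taken from the poster.
    """
    value = 0.0                      # learned CS value (reward expectation)
    history = []
    for _ in range(n_trials):
        rpe = reward - value         # prediction error at US delivery
        history.append((value, rpe))
        value += alpha * rpe         # move the expectation toward the reward
    return history


if __name__ == "__main__":
    for trial, (value, rpe) in enumerate(rescorla_wagner(30)):
        if trial % 5 == 0:
            print(f"trial {trial:2d}  CS value {value:.3f}  US RPE {rpe:.3f}")
```

Early in training the US produces a large error and the CS carries no value; midway through, the CS has acquired part of the value and the US error is only partly cancelled; late in training the error at the US approaches zero.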

Conclusion

- The model represents a model-free reinforcement learning system and learns the CS-US association in classical conditioning.

- It posits that the brain could be solving the dimensions involved in classical conditioning separately, in a distributed manner.

- Such distributed processing could allow the same dimensions to be reused to process other natural phenomena.

REFERENCES

Fiorillo, Christopher D., Philippe N. Tobler, and Wolfram Schultz. "Discrete coding of reward probability and uncertainty by dopamine neurons." Science 299.5614 (2003): 1898-1902.

Keiflin, Ronald, and Patricia H. Janak. "Dopamine prediction errors in reward learning and addiction: from theory to neural circuitry." Neuron 88.2 (2015): 247-263.

Johnson, M., K. Hofmann, T. Hutton, and D. Bignell. "The Malmo platform for artificial intelligence experimentation." IJCAI (2016): 4246-4247.

Takahashi, Yuji K., et al. "Temporal specificity of reward prediction errors signaled by putative dopamine neurons in rat VTA depends on ventral striatum." Neuron 91.1 (2016): 182-193.

Keywords: Computational neuroscience, Dopamine, Reinforcement Learning

- Virtual lesions of the VS projection to VTA GABA were made by disconnecting the link between them.

- The magnitude of the reward is still conserved after the lesions.

- Timing information is lost, indicating that Pavlovian conditioning has two dimensions, timing and magnitude, which is a departure from standard RL models.
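The virtual lesion described above amounts to cutting one link in the circuit. The following is a minimal sketch of that operation on a toy connectivity table. The link names come from the poster, but the weights, the `CIRCUIT` table and the `lesion` helper are hypothetical and do not reproduce the authors' implementation.

```python
# Hypothetical toy connectivity: (source, target) -> connection weight.
# Weights are placeholders; 0.0 means the link is disconnected.
CIRCUIT = {
    ("VS", "VTA_GABA"): 1.0,
    ("PPN", "VTA_GABA"): 1.0,
    ("VTA_GABA", "VTA_dopamine"): 1.0,
}


def lesion(circuit, source, target):
    """Return a copy of the circuit with the source -> target link cut."""
    if (source, target) not in circuit:
        raise KeyError(f"no link {source} -> {target}")
    lesioned = dict(circuit)             # leave the intact circuit untouched
    lesioned[(source, target)] = 0.0     # virtual lesion: disconnect the link
    return lesioned


if __name__ == "__main__":
    lesioned = lesion(CIRCUIT, "VS", "VTA_GABA")
    for link, weight in lesioned.items():
        print(link, weight)
```

Returning a copy keeps the intact circuit available, so the same stimulus protocol can be run on both versions and compared.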

Features of the Model

- Portrays partial conditioning, in which VTA dopamine has acquired some CS firing and the resulting partial expectation reduces US firing.

- Early rewards do not all elicit the same firing: rewards delivered sooner elicit more firing than early rewards delivered later, in accordance with experiments (Fiorillo 2003).

- Proposes a new circuit with VTA GABA as a more biologically plausible expectation signal than VS (Keiflin 2015).

- VS lesions affect only the timing of the stimulus, not its magnitude encoding (Takahashi 2016).
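The graded response to early rewards can be illustrated with a toy calculation. The sketch below assumes, purely for illustration, that the expectation signal ramps linearly from zero at CS onset to the learned magnitude at the expected reward time; the ramp shape, the function name and all parameter values are assumptions, not the poster's mechanism.

```python
def early_reward_response(t, expected_time=2.0, reward=1.0, learned_magnitude=1.0):
    """Toy dopamine response to a reward delivered at time t after CS onset.

    Assumes a hypothetical linear ramp of expectation between CS onset (t = 0)
    and the expected reward time; the response is the rectified difference.
    """
    if not 0.0 <= t <= expected_time:
        raise ValueError("t must lie between CS onset and the expected time")
    expectation = learned_magnitude * t / expected_time   # assumed linear ramp
    return max(0.0, reward - expectation)                 # firing cannot go below zero


if __name__ == "__main__":
    for t in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(f"reward at t={t:.1f}s -> response {early_reward_response(t):.2f}")
```

Under this assumption, a reward delivered shortly after the CS meets little expectation and produces a large response, while a reward delivered close to the expected time is almost fully cancelled.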

Model Diagram illustrating structures involved in RPE Computation

Pointed arrows represent excitatory connections, while rounded arrows represent inhibitory projections. Dashed lines represent learnable connections, while solid lines represent fixed connections in the model.

Acknowledgements

We thank Bapi Raju at IIIT Hyderabad for constant guidance and feedback on this work.

Model Terms

VTA: Ventral Tegmental Area
VS: Ventral Striatum
LH: Lateral Hypothalamus
IT: Inferior Temporal Cortex
BLA: Basolateral Amygdala
CE: Central Amygdala
OFC: Orbitofrontal Cortex
PPN: Pedunculopontine Nucleus

Computational Principles

The proposed model is composed of computational units, where each unit represents a population and computes the mean activity of that population.

V(t) represents the membrane potential of the unit, and the firing rate U(t) is a positive scalar function of V(t).

In the unit's dynamics, τ is the time constant of the cell, B is the baseline firing rate, and η(t) is an additive noise term drawn at each time step from a uniform distribution between −0.01 and 0.01.
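The unit equations themselves do not appear in this text, so the sketch below assumes a common rate-coded form: a leaky integrator, τ dV/dt = −V(t) + input(t) + B + η(t), with U(t) taken as the positive part of V(t). That form, the Euler step, and the values of `tau`, `baseline` and `dt` are assumptions; only the noise range is taken from the description above.

```python
import random


def simulate_unit(net_input, tau=10.0, baseline=0.1, dt=1.0, noise=0.01, seed=0):
    """Euler-integrate one rate-coded unit and return its firing rates U(t).

    Assumed dynamics (not confirmed by the poster text):
        tau * dV/dt = -V + input + B + eta,   U = max(0, V)
    eta is drawn uniformly from [-noise, noise] at every time step.
    """
    rng = random.Random(seed)        # seeded for reproducible runs
    v = baseline                     # start the membrane potential at baseline
    rates = []
    for x in net_input:
        eta = rng.uniform(-noise, noise)
        v += dt * (-v + x + baseline + eta) / tau
        rates.append(max(0.0, v))    # firing rate is the positive part of V
    return rates


if __name__ == "__main__":
    rates = simulate_unit([1.0] * 100)
    print(f"rate after 100 steps of constant input: {rates[-1]:.3f}")
```

With a constant input the unit relaxes toward input plus baseline at a speed set by `tau`, and a strongly negative net input drives the rate to zero rather than below it.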

Predictions

- CE and PPN FT encode the magnitude of expectation.

- PPN, through VTA GABA, cancels dopamine.

- A new circuit with VTA GABA as a more biologically plausible expectation signal than VS (Keiflin 2015).

- Early-reward cancellation of expectation happens within PPN.

- Learning of time occurs before learning of magnitude.
