
Cortico-striatal activity driving compulsive reward seeking


Academic year: 2022


Thesis

Reference

Cortico-striatal activity driving compulsive reward seeking

HARADA, Masaya

Abstract

Addiction is characterized by compulsive drug consumption, which involves the cortico-striatal pathways. However, the neuronal causes underlying the emergence of compulsive drug seeking remain poorly identified. In this study, optogenetic stimulation of dopamine neurons of the ventral tegmental area (VTA) is used as a reward. Some mice persevere in performing the operant conditioning task known as the “seek-take chain” to obtain this reward despite the risk of punishment. Synaptic transmission between the orbitofrontal cortex (OFC) and the striatum, and consequently the activity of striatal projection neurons (SPNs), is potentiated in these persevering mice. Chemogenetic inhibition of OFC pyramidal neurons reduces this activity peak and thereby curbs compulsive drug seeking. This study demonstrates that the strengthening of OFC-DS synaptic transmission changes the activity of striatal neurons, thereby durably encouraging compulsive behavior.

HARADA, Masaya. Cortico-striatal activity driving compulsive reward seeking. Doctoral thesis: Univ. Genève et Lausanne, 2021, no. Neur. 295

DOI : 10.13097/archive-ouverte/unige:154020 URN : urn:nbn:ch:unige-1540200

Available at:

http://archive-ouverte.unige.ch/unige:154020

Disclaimer: layout of this document may differ from the published version.


Faculty of Medicine

DOCTORATE IN NEUROSCIENCES of the Universities of Geneva and Lausanne

Professor Dr. Méd. Christian Lüscher, thesis director; Dr. Vincent Pascoli, thesis co-director

THESIS TITLE

Cortico-striatal activity driving compulsive reward seeking

THESIS

submitted to the Faculty of Medicine to obtain the degree of Doctor in Neurosciences

by Masaya HARADA, from Japan

Thesis no. 295

Publisher or printer: Faculty of Medicine

Geneva

2021


Acknowledgement

First, I would like to thank my thesis director, Prof. Christian Lüscher, for letting me work in an excellent environment and for his guidance and support during my PhD. I would also like to express my gratitude to Dr. Vincent Pascoli for his teaching and guidance. I also appreciate the translation of the abstract into French, which I could not have done myself.

I also thank the members of my thesis committee: Prof. Anthony Holtmaat, Prof. Camilla Bellone and Prof. Carl Petersen for taking their time to read, discuss and evaluate my thesis work.

I want to thank all the current and former members of the Lüscher lab. All the feedback I received improved my work a lot. Ruud and Michael read my thesis and improved it considerably. Agnes helped me with the experiments and Jennifer took great care of my animals.

My deepest thanks go to my family, especially to my wife Ruri. We survived the COVID-19 crisis together. I am sure I could not have survived the strict lockdown without her. I will not forget this experience for the rest of my life. And thanks to her great effort, we have a beautiful daughter, Mirei. She was born only in November 2020 but has already changed my life a lot.

Of course, my special thanks go to my parents, who supported me unconditionally from Japan.


Abstract

Addicts compulsively seek drug rewards. In mice, top-down cortico-striatal projections have been implicated in persevering consumption of rewards even when punished. The temporo-spatial determinants of the activity underlying the emergence of compulsive reward seeking, however, remain elusive. Here we take advantage of a defining commonality of addictive drugs to train mice in a seek-take chain, rewarded by optogenetic dopamine neuron self-stimulation (oDASS) in the ventral tegmental area (VTA). We found that mice that persevered when seeking was punished maintained an increased AMPA to NMDA ratio selectively at orbitofrontal cortex (OFC) to dorsal striatum (DS) synapses, but not at other cortico-striatal synapses. In addition, in these mice an activity peak of spiny projection neurons (SPNs) in the area of the dorsal striatum receiving OFC projections was detected at the moment of seeking lever retraction, which signals reward availability, even under risk of punishment.

Conversely, in renouncing individuals, the activity peak degraded once punishment was introduced. The causal link is further supported by chemogenetic inhibition of OFC pyramidal neurons, which curbed the activity peak and reduced compulsion, as did brief optogenetic hyperpolarization of SPNs time-locked to the seeking lever retraction. Taken together, we conclude that the strengthening of OFC to DS synapses drives SPN activity when a reward-predictive cue is delivered, encouraging reward seeking in subsequent trials.


RESUME EN FRANCAIS

Addiction is characterized by compulsive drug consumption. In rodents, the cortico-striatal pathway has been implicated in the control of persevering consumption despite negative consequences. However, the neuronal causes underlying the emergence of compulsive drug seeking remain poorly identified. In this study, optogenetic stimulation of dopamine neurons of the ventral tegmental area (VTA) is used as a reward, and mice are trained in an operant conditioning task known as the “seek-take chain” to obtain this reward. Synaptic transmission between the orbitofrontal cortex (OFC) and the striatum is potentiated, as indicated by an increased AMPA/NMDA ratio, only in mice that persevere despite punishment. Other cortico-striatal synapses remain unchanged.

In addition, the activity of striatal projection neurons (SPNs) innervated by the OFC increases upon retraction of the seeking lever, which signals the availability of the reward. Once the risk of punishment is introduced, this activity is detected only in persevering mice. Conversely, in mice that renounce the reward when there is a risk of punishment, this activity peak is reduced. The causal link is further supported by the fact that chemogenetic inhibition of OFC pyramidal neurons reduces this activity peak and thereby curbs compulsive drug seeking. A brief optogenetic inhibition, through hyperpolarization of striatal neurons at the moment of seeking lever retraction, is sufficient to reduce compulsion. This study demonstrates that the strengthening of OFC-DS synaptic transmission changes the activity of striatal neurons when a reward carrying a risk of punishment is predicted, thereby durably encouraging compulsive seeking.


LIST OF ABBREVIATIONS

A2AR  Adenosine A2A receptor
AAV  Adeno-associated virus
ACC  Anterior cingulate cortex
AMPA  α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid
A/N ratio  AMPA/NMDA ratio
A-O  Action-outcome
BNST  Bed nucleus of the stria terminalis
CeA  Central amygdala
CNO  Clozapine-N-oxide
CPA  Conditioned place aversion
CPP  Conditioned place preference
CRF  Corticotropin-releasing factor
CS  Conditioned stimulus
D1R  Dopamine receptor type 1
D2R  Dopamine receptor type 2
DA  Dopamine
DAT  Dopamine transporter
DBS  Deep brain stimulation
DLPFC  Dorsolateral prefrontal cortex
DLS  Dorsolateral striatum
DMS  Dorsomedial striatum
DREADD  Designer receptors exclusively activated by designer drugs
DS  Dorsal striatum
DSM  Diagnostic and Statistical Manual of Mental Disorders
EP  Entopeduncular nucleus
EPSC  Excitatory postsynaptic current
FR  Fixed ratio
FRET  Förster resonance energy transfer
GABA  Gamma-aminobutyric acid
GPe  Globus pallidus externus
HPA  Hypothalamic-pituitary-adrenal
LH  Lateral hypothalamus
LTP  Long-term potentiation
mPFC  Medial prefrontal cortex
MSN  Medium spiny neuron
NAc  Nucleus accumbens
NET  Norepinephrine transporter
NMDA  N-methyl-D-aspartic acid
oDASS  optogenetic dopamine neuron self-stimulation
OFC  Orbitofrontal cortex
PKA  Protein kinase A
PL  Prelimbic cortex
PVT  Paraventricular thalamus
RPE  Reward prediction error
RL  Reinforcement learning
RI  Random interval
SERT  Serotonin transporter
SNc  Substantia nigra pars compacta
SNr  Substantia nigra pars reticulata
S-R  Stimulus-response
STN  Subthalamic nucleus
TD  Temporal difference
TMS  Transcranial magnetic stimulation
US  Unconditioned stimulus
VP  Ventral pallidum
VTA  Ventral tegmental area
VTE  Vicarious trial and error


Table of contents

Acknowledgement
Abstract
RESUME EN FRANCAIS
List of abbreviations
Table of contents
List of figures

1. Introduction
1.1 Addiction: From recreational use to compulsive use.
1.2.1 Increase of the dopamine concentration in the Nucleus Accumbens: Defining commonality of addictive drugs.
1.2.2 Nucleus Accumbens: A key structure for reinforcement learning
1.2.3 Dorsal striatum: Output of the basal ganglia
1.2.4 Dopamine receptors: Proteins essential for neuronal plasticity
1.2.5 Dopamine in Nucleus Accumbens: A key structure for reinforcement learning.
1.2.6 Maladaptive reward prediction errors in addiction.
1.3 Behavioral models of addiction.
1.3.1 Non-contingent models
1.3.1.1 Behavioral sensitization.
1.3.1.2 Conditioned place preference (CPP).
1.3.2 Contingent models
1.3.2.1 Reinstatement: model of relapse
1.3.2.2 Escalation
1.3.2.3 Progressive ratio
1.3.2.4 Individual vulnerability: 3 criteria model
1.3.2.5 Compulsivity.
1.3.3 Neuronal mechanism underlying transition to compulsion.
1.3.3.1 Loss of top down control
1.3.3.2 Goal-directed and habitual behavior
1.3.3.3 Goal-directed and habitual behavior in addiction
1.3.3.4 Lateralization of behavioral control in the striatum
1.4 Aim of this study

2. Methods
2.1 Animals
2.2 Stereotaxic injections.
2.3 Optogenetic self-stimulation apparatus.
2.4 Acquisition of taking response.
2.5 Training of the seek-take chain.
2.6 Evaluation of compulsivity.
2.7 Slice electrophysiology.
2.7.1 Ex vivo synaptic properties of the striatum.
2.7.2 Functional connectivity
2.7.3 DREADD validation.
2.7.4 eArchT3.0 validation.
2.8 Fibre photometry recordings.
2.9 Acute inhibition of the DS
2.10 Tissue preparation for imaging.
2.11 Statistics
2.12 Clustering method

3. Results
3.1 Compulsive oDASS seeking in a fraction of mice
3.2 Parallel cortico-striatum streams
3.3 Selective plasticity of Cortico-striatal synapses in compulsive mice
3.4 Activity peak at seek lever retraction in persevering mice
3.5 Continuous OFC inhibition during punished oDASS and photometry recording
3.6 Time locked inhibition of SPNs activity

4. Discussion
4.1 optogenetic Dopamine neurons self-stimulation (oDASS)
4.2 Parallel cortico-striatum projections
4.3 Neuronal circuit promoting compulsive reward seeking.
4.4 Role of OFC-DS pathway
4.5 Neuronal circuit involved in inhibitory control.
4.6 Conclusion

5. References

List of figures

Introduction

1.1 Increase of the dopamine concentration in the Nucleus Accumbens: Defining commonality of addictive drugs.

1.2 Major dopaminergic, glutamatergic and GABAergic connections to and from the VTA and NAc.

1.3 Firing of a DA neuron during Pavlovian conditioning.

1.4 Differences between natural rewards and drug of abuse.

1.5 Individual vulnerability to addiction.

1.6 Correlations between striatal D2R availability and glucose metabolism in frontal cortex.

1.7 Diagram of the organization of striatonigrostriatal projections.

Results

3.1 oDASS seeking despite negative consequences in a subset of mice.

3.2 Three parallel cortico-striatal pathways.

3.3 Functional connectivity at three parallel cortico-striatal pathways.

3.4 Specific potentiation at OFC-cDS synapses in persevering mice.

3.5 Persistent hyperactivity in cDS around seek lever retraction in perseverer mice.

3.6 Attenuation of compulsive reward seeking and flattening of calcium signal around seek lever retraction by chemogenetic inhibition of the OFC.

3.7 Attenuation of compulsive reward seeking by time locked inhibition in the cDS.

S.3.1 Examples of outcomes of reward seeking behaviour from two mice.

S.3.2 Behavioural parameters of perseverers and renouncers.

S.3.3 Rectification index at three synapses.


S.3.4 Calcium signals in the cDS during oDASS seeking.

S.3.5 Seek lever retraction, not take lever presentation, induced robust calcium signal.

S.3.6 No difference in calcium signals in lDS around seek lever retraction between perseverers and renouncers.

S.3.7 Calcium signals in the lDS during oDASS seeking.

S.3.8 Expression of hM4D in the OFC.

S.3.9 Time locked inhibition in the cDS did not modify reward seeking behavior not associated with punishment.


Introduction

1.1 Addiction: From recreational use to compulsive use.

To maximize the probability of survival or reproduction, organisms need to seek out rewards and avoid harmful events. The reward system has evolved in a way that ensures behavioral adaptation to reward availability and, ultimately, species survival. Rewards (e.g. food, sex, social interaction) activate the system, shaping future behavior through learning. Addictive drugs also activate the reward system, i.e. the mesolimbic dopamine system (Lüscher and Ungless, 2006), and thereby promote behaviours to procure them.

Addiction is a chronic disease. In the first phase, addictive drugs are used for recreational purposes. Only after an extended period of drug intake do some individuals lose control over consumption (Anthony et al., 1994). Drug seeking despite negative consequences, also called compulsive drug seeking, is a defining symptom of addiction (DSM-5). Clinical observations indicate that not all people exposed to addictive drugs become compulsive users. About 20% of people who use cocaine eventually become addicted (Anthony et al., 1994). Compulsive drug use thus develops only in a small subset of individuals who have experienced addictive drugs. However, the neuronal adaptations underlying compulsive drug seeking remain to be elucidated.

1.2.1 Increase of the dopamine concentration in the Nucleus Accumbens: Defining commonality of addictive drugs.

Dopamine neurons originate in the ventral tegmental area (VTA) of the mesencephalon. They project to the Nucleus Accumbens (NAc) (Russo and Nestler, 2013), but also to other brain regions (e.g. prefrontal cortex, hippocampus, amygdala). Although addictive drugs have diverse molecular targets, all addictive drugs induce a surge of dopamine (DA) in the NAc (Di Chiara and Imperato, 1988). How this increase of DA modulates behavior remains controversial. For example, tyrosine hydroxylase knockout mice, which cannot produce DA, still show conditioned place preference for morphine (Hnasko et al., 2005). However, optogenetic experiments show that stimulation of VTA DA neurons triggers conditioned place preference (Tsai et al., 2009) and that animals willingly self-stimulate their VTA DA neurons (Covey and Cheer, 2019; Pascoli et al., 2015, 2018), indicating that DA release in the NAc has a positive reinforcing effect. In addition, it is widely accepted that DA release is crucial for the neuronal adaptations induced by addictive drugs.

Addictive drugs have diverse molecular and cellular targets. Some drugs inhibit VTA GABAergic interneurons, leading to the disinhibition of DA neurons. For example, opioids bind to μ-opioid receptors, which are mostly located on GABA neurons in the VTA (Corre et al., 2018). Stimulation of μ-opioid receptors inhibits GABA release through presynaptic control and hyperpolarizes the GABA neurons. Benzodiazepines positively modulate GABA-A receptors on GABA neurons in the VTA, disinhibiting DA neurons (Tan et al., 2010). Other drugs, such as cocaine or methamphetamine, act mainly at the terminals of VTA DA neurons in the NAc. Cocaine blocks the DA transporter (DAT) and inhibits the reuptake of DA (Lüscher and Ungless, 2006). Amphetamine has a more complex molecular mechanism. It is taken up into the terminals of DA neurons, acting as a “false substrate” at the DAT. Inside the terminals, it blocks the vesicular monoamine transporter, leading to reverse transport of DA through the DAT (Ago et al., 2016). These findings suggest that DA release in the NAc is essential to mediate the effects of addictive drugs. Some drugs have more than one molecular target. Cocaine blocks the serotonin and norepinephrine transporters (SERT and NET, respectively) as well as the DAT, increasing serotonin and norepinephrine levels. These effects could also cause neuronal and behavioral adaptations. The recent development of optogenetics has allowed cell-type-specific activation. Stimulation of VTA DA neurons reproduced the main neuronal adaptations evoked by cocaine administration (Pascoli et al., 2014, 2015; Terrier et al., 2015), suggesting that the surge of DA commonly induced by addictive drugs is crucially involved in drug-evoked long-lasting adaptations.


Figure 1.1 Increase of the dopamine concentration in the Nucleus Accumbens: Defining commonality of addictive drugs.

Class I and II drugs inhibit GABA neurons (green) in the ventral tegmental area (VTA), leading to the disinhibition of dopamine neurons (red). Class III drugs act on the dopamine terminals in NAc. (Lüscher and Ungless, 2006)

1.2.2 Nucleus Accumbens: A key structure for reinforcement learning

The Nucleus Accumbens (NAc) contains several types of neurons. Approximately 95% of NAc neurons are GABAergic medium-sized spiny neurons (MSNs). MSNs express either dopamine receptor type 1 (D1) or type 2 (D2) and are called D1-MSNs or D2-MSNs, respectively. A small population of MSNs expresses both D1 and D2 receptors (Bertran-Gonzalez et al., 2008). D1- and D2-MSNs differ in their expression of neuropeptides: D1-MSNs express dynorphin and substance P, while D2-MSNs express enkephalin (Gerfen and Surmeier, 2011). These neuropeptides have been shown to activate autoreceptors, leading to presynaptic depression (Creed et al., 2016), but their function has yet to be fully investigated. D1- and D2-MSNs also have different projection targets. The VTA and the lateral hypothalamus (LH) receive input mainly from D1-MSNs (Bocklisch et al., 2013; O’Connor et al., 2015), while the ventral pallidum (VP) receives input from both D1- and D2-MSNs (Creed et al., 2016). In the LH, D1-MSNs project preferentially to GABAergic neurons, authorizing feeding behavior (O’Connor et al., 2015; Thoeni et al., 2020). D1-MSNs also preferentially project to GABAergic interneurons of the VTA (Bocklisch et al., 2013; Borgland et al., 2006; Yang et al., 2018). The activation of this projection disinhibits DA neurons in the VTA, triggering DA release in the NAc. Recently, it has been demonstrated that D1-MSNs also project directly to DA neurons in the VTA. However, these synapses only have GABA-B receptors and lack GABA-A receptors (Edwards et al., 2017; Watabe-Uchida et al., 2012). Besides MSNs, about 1-2% of NAc neurons are cholinergic interneurons and another 1-2% are GABAergic interneurons (Tepper and Bolam, 2004). The role of acetylcholine in the NAc has yet to be fully investigated. It has been shown that VTA GABAergic neurons control the firing of cholinergic interneurons in the NAc, enhancing associative learning (Brown et al., 2012). The firing of cholinergic interneurons is also modulated by D1-MSNs via release of substance P (Francis et al., 2019), which eventually leads to synaptic potentiation in D2-MSNs through the muscarinic acetylcholine type 1 receptor (M1R). In contrast, D1-MSNs express the muscarinic acetylcholine type 4 receptor (M4R). Since M4R couples to Gi proteins, its activation is expected to oppose signaling through D1R, which couples to Gs proteins.


Figure 1.2 Major dopaminergic, glutamatergic and GABAergic connections to and from the VTA and NAc. The primary reward system includes dopaminergic projections from the ventral tegmental area (VTA) to the nucleus accumbens (NAc). The NAc receives several glutamatergic inputs from the medial prefrontal cortex (mPFC), ventral hippocampus (Hipp), basolateral amygdala (Amy), and other cortical areas and the thalamus. LH, lateral hypothalamus; LHb, lateral habenula; LDTg, lateral dorsal tegmentum; RMTg, rostromedial tegmentum. Red, green and blue arrows show glutamatergic, dopaminergic and GABAergic projections, respectively (Russo and Nestler, 2013).

1.2.3 Dorsal striatum: Output of the basal ganglia

The nucleus accumbens (NAc) is part of the striatum, which has been functionally divided into the ventral striatum (VS or NAc) and the dorsal striatum. The dorsal striatum can be further subdivided into the dorsomedial striatum (DMS) and the dorsolateral striatum (DLS). The VS, DMS and DLS serve limbic, associative and sensorimotor functions, respectively (Balleine and O’Doherty, 2010a; Belin et al., 2009; Gruber and McDonald, 2012; Shiflett et al., 2010; Thorn et al., 2010; Yin and Knowlton, 2006). Consistent with this functional classification, a recent study mapped the connectome of specific cortical areas to sub-regions of the dorsal striatum (Hunnicutt et al., 2016). The DMS receives inputs from cortices such as the prelimbic (PL) and orbitofrontal cortex (OFC). The DLS, on the other hand, receives inputs from sensorimotor cortices such as the primary motor cortex (M1) and the somatosensory cortex (S1). The cellular organization of the dorsal striatum is similar to that of the NAc. The majority (about 95%) of striatal neurons are GABAergic medium spiny neurons (MSNs), which can be further divided into D1- and D2-expressing MSNs. The remaining neurons are GABAergic or cholinergic interneurons. In the dorsal striatum, D1-MSNs project to the internal segment of the globus pallidus (GPi) and the substantia nigra pars reticulata (SNr), while D2-MSNs project to the external segment of the globus pallidus (GPe), from which neurons project to the subthalamic nucleus (STN). STN neurons project to the GPi and SNr, where the projections of D1-MSNs and D2-MSNs converge (Vicente et al., 2020).

The DS receives DA projections mainly from the substantia nigra pars compacta (SNc), degeneration of which causes Parkinson’s disease. Both the ventral and dorsal striatum play crucial roles in reinforcement learning (RL) (Graybiel, 2008). We will discuss later how different parts of the striatum control goal-directed and habitual behaviour.


1.2.4 Dopamine receptors: Proteins essential for neuronal plasticity

D1 and D2 receptors have different molecular properties. D1 receptors have a lower affinity for DA than D2 receptors (Marcellino et al., 2012). In the basal state, because of this low affinity, most D1 receptors are not bound to DA (occupancy of roughly 20%). When the DA concentration increases, DA binds to D1 receptors. Because D1 receptors couple to Gs or Gs-like proteins (Gαolf), PKA is activated when DA binds to D1 receptors (Gerfen and Surmeier, 2011). D2 receptors, on the other hand, have a relatively high affinity, so many of them (~70%) are tonically activated by DA in the basal state. When the DA concentration decreases, D2 receptors become free of DA. Since D2 receptors couple to Gi proteins, the decrease of DA disinhibits PKA activity (Gerfen and Surmeier, 2011). In other words, because of the difference in affinity, D1 receptors mainly respond to surges of DA and D2 receptors to dips of DA (Iino et al., 2020), and PKA is activated via activation of Gs signaling or inactivation of Gi signaling, respectively. What is the role of PKA in MSNs? PKA activity enhances excitability via negative modulation of potassium channels (Lahiri and Bevan, 2020). In addition, PKA activation leads to the phosphorylation of DARPP-32 (Svenningsson et al., 2004) and other signaling molecules essential for the induction of long-term potentiation (LTP).

However, an increase of PKA activity is not sufficient to induce LTP. Yagishita et al. showed that in D1-MSNs, a surge of DA promotes LTP only during a narrow time window (~2 s) after the glutamatergic input (Yagishita et al., 2014). The same group also demonstrated that a dip of dopamine coinciding with glutamatergic input triggers LTP in D2-MSNs (Iino et al., 2020). D2-MSNs also express the adenosine A2A receptor (A2AR), which couples to Gs proteins. The activation of A2AR is essential for the induction of LTP (Shen et al., 2008).

To summarize, in MSNs the activation of Gs signaling or the inactivation of Gi signaling promotes LTP and opposes LTD, and vice versa.
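The affinity argument in this section can be illustrated with a simple one-site binding model, in which fractional receptor occupancy is [DA] / ([DA] + Kd). The Python sketch below is an illustration only and is not taken from the cited studies. It assumes that the ~20% and ~70% figures quoted above are basal occupancies, expresses DA concentration in arbitrary units relative to the basal level, and uses hypothetical dissociation constants (`KD_D1`, `KD_D2`) chosen solely to reproduce those two basal values.

```python
def occupancy(da, kd):
    """Fractional occupancy for one-site binding: da / (da + kd)."""
    return da / (da + kd)


# All values below are illustrative assumptions, not measured constants.
BASAL_DA = 1.0       # basal DA level, arbitrary units
KD_D1 = 4.0          # hypothetical low-affinity Kd: 20% occupancy at basal DA
KD_D2 = 3.0 / 7.0    # hypothetical high-affinity Kd: 70% occupancy at basal DA


def occupancy_change(da, kd):
    """Change in occupancy relative to the basal state."""
    return occupancy(da, kd) - occupancy(BASAL_DA, kd)


if __name__ == "__main__":
    # Compare a tenfold dip, the basal level and a tenfold surge of DA.
    for label, da in [("dip", 0.1), ("basal", 1.0), ("surge", 10.0)]:
        d1 = occupancy(da, KD_D1)
        d2 = occupancy(da, KD_D2)
        print(f"{label:>5}: D1 occupancy = {d1:.2f}, D2 occupancy = {d2:.2f}")
```

Under these assumptions, a tenfold surge of DA changes D1 occupancy more than D2 occupancy, whereas a tenfold dip changes D2 occupancy more than D1 occupancy, which is the qualitative asymmetry described in the text.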


1.2.5 Dopamine in Nucleus Accumbens: A key structure for reinforcement learning.

We have seen that DA is a key molecule in the regulation of neuronal plasticity. So how does it affect behavior? It has been proposed that DA encodes a reward prediction error (RPE) (Schultz et al., 1997). In a classical Pavlovian conditioning experiment, an initially neutral conditioned stimulus (CS, e.g. a tone) is paired with an unconditioned stimulus (US, e.g. a food reward). Before training, dopamine neurons do not respond to the neutral CS, but they show increased firing when the reward is delivered. After extended repetition of CS-US pairing, DA neurons display phasic firing exclusively during CS delivery, and delivery of the US no longer increases their firing rate.

These studies led to the theory that DA neurons signal discrepancies between predicted and actual reward. In other words, animals ‘predict’ the reward when the CS is delivered, and when the reward is as good as predicted, dopamine neurons do not change their firing frequency. If the reward is better or worse than predicted, they increase or decrease their firing rate, respectively (positive or negative reward prediction error). This idea is supported by experiments in which the size of the reward is changed after conditioning (Cohen et al., 2012; Schultz et al., 1997). The recent development of optogenetics has enabled time-locked manipulation of neurons. Activation of DA neurons per se triggers the emergence of the reward prediction signal and of seeking behavior toward the CS (Saunders et al., 2018). Inhibition of DA neurons at the moment of reward delivery reduces reward seeking during the CS, and activation of DA neurons during reward omission prevents extinction (Lee et al., 2020b). These findings indicate that the activity of DA neurons during the US is responsible for updating behavior. Interestingly, when DA neurons are silenced during the CS, no change in behavior is observed in a classical Pavlovian conditioning configuration (Lee et al., 2020b). Further investigation is required to decipher the behavioral role of the DA signal during the CS. The RPE theory has been developed over decades, but it has also drawn criticism. DA neurons increase their firing rate in response to reward, but when punishment is delivered some DA neurons fire less (Cohen et al., 2012) while others fire more (Bromberg-Martin et al., 2010; Chakraborty et al., 2009).

These responses to punishment cannot be explained by the classical RPE theory and show that DA neurons are a heterogeneous population. This heterogeneity could be explained by the diversity of their projections. The NAc is divided into subregions: the NAc core and the NAc shell, the latter being further divided into the medio-ventral, medio-dorsal and lateral shell. When a punishment or a punishment-predictive cue is delivered, DA concentration is depressed in the NAc core, lateral shell and medio-dorsal shell, whereas it is increased in the NAc medio-ventral shell (de Jong et al., 2018). Other dopamine projections, including those to the medial prefrontal cortex, basolateral amygdala and tail of the striatum, have also been shown to be crucial for aversive learning (Verharen et al., 2020). These data show that a subset of DA neurons does not follow RPE rules.

Figure 1.3 Firing of a DA neuron during Pavlovian conditioning. (Top) Before learning, the dopamine neuron is activated by the reward in the absence of prediction. (Middle) After learning, the reward-predictive cue, but not the reward delivery, triggers an increase of the firing rate of the dopamine neuron. (Bottom) After learning, the reward-predictive cue increases the firing of the dopamine neuron, and the omission of the reward suppresses firing at the time when the reward would have been delivered. CS, conditioned, reward-predicting stimulus; R, primary reward (Schultz et al., 1997).
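The three cases of Figure 1.3 can be reproduced with a minimal prediction-error calculation. The Python sketch below is a simplified illustration, not the model used in the cited studies. It assumes a single cue followed by a single reward, a cue that itself arrives unpredicted, and a hypothetical learning rate; the function names are introduced here for illustration only.

```python
def prediction_errors(cue_value, reward):
    """Return (error at cue onset, error at reward time) for one trial.

    The cue arrives unpredicted, so the error at cue onset equals the value
    the cue has acquired. The error at reward time is the difference between
    the received reward and the reward predicted by the cue.
    """
    delta_cue = cue_value - 0.0
    delta_reward = reward - cue_value
    return delta_cue, delta_reward


def train(n_trials, reward=1.0, learning_rate=0.2):
    """Update the cue value over repeated cue-reward pairings."""
    cue_value = 0.0
    for _ in range(n_trials):
        _, delta_reward = prediction_errors(cue_value, reward)
        cue_value += learning_rate * delta_reward
    return cue_value


if __name__ == "__main__":
    naive = 0.0
    learned = train(100)
    print("before learning:", prediction_errors(naive, reward=1.0))
    print("after learning: ", prediction_errors(learned, reward=1.0))
    print("reward omitted: ", prediction_errors(learned, reward=0.0))
```

In this sketch, the error is positive at reward time before learning, moves to the cue after learning, and becomes negative at the expected reward time when the reward is omitted.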


1.2.6 Maladaptive reward prediction errors in addiction.

So far, we have seen that both natural rewards (e.g. food) and drugs of abuse (e.g. cocaine) induce an increase of DA concentration in the NAc. One might ask: what is the difference between natural rewards and addictive drugs? One critical difference is the duration of the DA transient during reward delivery. Natural rewards trigger a short DA transient, while drugs of abuse trigger a long-lasting transient (Keiflin and Janak, 2015).

Another important difference follows from the RPE theory. In the case of natural rewards, the DA transient encodes the reward prediction error. During reward delivery, animals evaluate the reward, and the amplitude of the DA transient represents the discrepancy between the predicted and actual reward (Cohen et al., 2012; Schultz, 1998). In the case of a drug of abuse, on the other hand, a strong DA transient is always induced during reward delivery because of the drug's pharmacological properties. Therefore, according to the RPE theory, drug rewards produce a positive reward prediction error even after repeated intake, resulting in a continuous increase of the value assigned to the drug reward. After repeated exposure, the value of the states associated with the drug reward becomes greater than that of any other state, which means that the reward system is ‘hijacked’ by addictive drugs. This model suggests that the same neuronal circuit is responsible for natural reward seeking and drug reward seeking. Indeed, this has been observed in several brain regions. After self-administration of natural or drug rewards, synaptic potentiation is observed in VTA dopamine neurons (Chen et al., 2008; Ungless et al., 2001). In the case of cocaine this potentiation lasts for at least three months, whereas in the case of food reward synaptic strength returns to the basal level after 3 weeks (Chen et al., 2008). Another example is the medial prefrontal cortex (mPFC) to dorsal striatum (DS) pathway. Both alcohol and sucrose self-administration induce potentiation of this pathway (Ma et al., 2018), but only alcohol changes the composition of NMDA receptors, making the synapses prone to further potentiation (Ma et al., 2018). These examples suggest that the potentiation induced by self-administration of drugs of abuse is longer lasting than that induced by natural rewards because of differences in molecular mechanisms. As we have discussed, all addictive drugs induce an increase of DA concentration in the NAc. After the administration of cocaine (Ma et al., 2014; Pascoli et al., 2012) or morphine (Hearing et al., 2016), synaptic potentiation is induced at several excitatory synapses in the NAc. This plasticity is crucial for locomotor sensitization (Pascoli et al., 2012) or cue-induced seeking (Hearing et al., 2016), as we will discuss later.

Figure 1.4 Differences between natural rewards and drugs of abuse. (A and B) DA release evoked by a natural reward (A) and a drug of abuse (B). Before learning, presentation of the reward-related cue does not induce a dopamine transient for either natural or drug rewards, but delivery of the reward results in phasic dopamine release in the NAc (top). After extended training, presentation of the cue triggers a dopamine transient for both natural and drug rewards. However, in the case of natural rewards, dopamine no longer responds to reward delivery, while the drug reward still causes massive dopamine release in the NAc (bottom). (C) Theoretically, the cue value of an addictive drug never reaches a plateau because of the drug's pharmacological properties (Keiflin and Janak, 2015).
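
The argument that the value of a drug reward never reaches a plateau can be illustrated with a toy simulation. The sketch below is not taken from the cited studies. It assumes a single-state prediction-error update and a hypothetical parameter `drug_da`, a pharmacological dopamine component that learning cannot cancel, so that the effective prediction error never falls below `drug_da`. The learning rate and reward size are arbitrary.

```python
def simulate_value(n_trials, reward, drug_da=0.0, alpha=0.1):
    """Return the learned value after each trial.

    delta = reward - value is the ordinary reward prediction error.
    drug_da (hypothetical) is a pharmacological dopamine component:
    when it is present, the effective error is never smaller than drug_da.
    """
    value = 0.0
    history = []
    for _ in range(n_trials):
        delta = reward - value
        if drug_da > 0.0:
            # The drug-evoked dopamine adds to the error and sets a floor on it.
            delta = max(delta + drug_da, drug_da)
        value += alpha * delta
        history.append(value)
    return history


# Natural reward: the error shrinks to zero and the value plateaus at the reward size.
natural = simulate_value(500, reward=1.0)
# Drug reward: the error stays positive, so the value keeps growing.
drug = simulate_value(500, reward=1.0, drug_da=0.2)
```

Under these assumptions the natural-reward value settles at the reward magnitude, whereas the drug value continues to increase by a fixed amount on every trial, which is the qualitative behavior sketched in panel (C).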

1.3 Behavioral models of addiction.

Behavioral models of addiction have advanced over several decades. Here we discuss the pros and cons of each model and the neuronal mechanisms underlying it.

1.3.1 Non-contingent models

Behavioral models in which drugs are administered in a non-contingent way (i.e. by the experimenter) are probably the easiest way to study neuronal adaptations induced by drugs of abuse. Because of this advantage, these models have been used in many studies.


1.3.1.1 Behavioral sensitization.

According to the ‘incentive-sensitization theory’, addictive behavior is due to neuronal adaptations induced by repeated drug consumption. These neuronal adaptations are considered responsible for the increase in ‘incentive salience’ (not pleasure) attributed to reward-associated stimuli (Robinson and Berridge, 1993). This theory was inspired by the finding that locomotor activity increases with repeated exposure to a constant dose of an addictive drug. This behavioral sensitization has been observed with many addictive drugs, including cocaine (Creed et al., 2015; Deguchi et al., 2016; Pascoli et al., 2012), amphetamine (Vries et al., 1999) and morphine (Badiani et al., 2000). Cross-sensitization between two different drugs, such as morphine and cocaine, has also been observed (Valjent et al., 2010), suggesting that the neuronal adaptations underlying locomotor sensitization are shared between several addictive drugs. The neuronal adaptations underlying behavioral sensitization have been investigated for decades. It has been suggested that AMPA receptor-mediated transmission in the NAc plays a critical role in behavioral sensitization (Bell and Kalivas, 1996; Boudreau and Wolf, 2005). After 5 daily cocaine administrations and 10-14 days of incubation, the AMPA/NMDA ratio is increased in the NAc (Kourrich et al., 2007). However, when the AMPA/NMDA ratio is measured 24 h after the last cocaine injection, it is lower than in saline-treated mice (Kourrich et al., 2007). This bidirectional change in the AMPA/NMDA ratio looks puzzling, but a molecular mechanism has been suggested. When animals are exposed to cocaine, silent synapses, which lack AMPA receptors but have NMDA receptors, emerge in the NAc (Huang et al., 2009). Since these synapses contain only NMDA receptors, the AMPA/NMDA ratio is decreased. During the incubation period, AMPA receptors are inserted into these silent synapses and they become mature synapses (Huang et al., 2009).
This insertion results in the potentiation of the synapses. Reversal of this potentiation completely abolishes behavioral sensitization (Pascoli et al., 2012), showing that the potentiation is causally related to the behavior. Behavioral sensitization is a good model for studying neuronal adaptations induced by drug exposure, but it has very poor face validity.


1.3.1.2 Conditioned place preference (CPP).

CPP is used to measure the rewarding or aversive properties of a stimulus. The CPP apparatus has at least two chambers that animals can tell apart. In a standard procedure, one context is associated with non-contingent drug delivery. The behavioral paradigm is divided into three phases: preconditioning, conditioning and test (Kuhn et al., 2019). During the preconditioning phase, no stimulus is delivered and the innate preference for each chamber is evaluated. During the conditioning phase, the drug is delivered in one specific chamber, while in the other chamber animals are exposed to neutral stimuli.

During the test phase, animals freely explore the CPP apparatus without any drug. If the administered drug is rewarding, animals spend more time in the drug-conditioned chamber.
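
The outcome of such an experiment is often summarized as a single preference score per animal. The sketch below is illustrative only: the function name, the choice of score (the change in time spent in the drug-paired chamber between preconditioning and test) and the example numbers are assumptions, not the procedure of any cited study.

```python
def cpp_score(pre_paired_s, test_paired_s):
    """Change in time (seconds) spent in the drug-paired chamber.

    Positive values suggest a conditioned place preference, negative values
    a conditioned place aversion. Subtracting the preconditioning time
    corrects for the innate preference measured in the first phase.
    """
    return test_paired_s - pre_paired_s


# Hypothetical animals: (preconditioning, test) times in the paired chamber.
animals = {"m1": (440.0, 610.0), "m2": (455.0, 300.0)}
scores = {name: cpp_score(pre, test) for name, (pre, test) in animals.items()}
```

With these made-up numbers, the first animal shows a preference and the second an aversion, which is why the same apparatus can be used for both CPP and CPA.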

CPP has several advantages over behavioral sensitization. During the test session, animals receive no drug, which makes it possible to record neuronal activity during reward-seeking behavior without interference from the massive monoamine release induced by the drug (Belin-Rauscent et al., 2016). A photometry experiment revealed that when animals enter the cocaine-conditioned chamber, D1-MSNs in the NAc are activated while D2-MSNs are inactivated (Calipari et al., 2016). Another advantage is that the same procedure allows the aversiveness of a drug or stimulus to be evaluated. Injection of naloxone (an opioid receptor antagonist) after repeated opioid treatment is highly aversive and induces robust conditioned place aversion (CPA). Morphine injection induces synaptic potentiation at the paraventricular thalamus (PVT) to NAc D2-MSN pathway. The place aversion and naloxone-precipitated withdrawal symptoms are suppressed by normalization of this transmission (Zhu et al., 2016). Aversive properties of drugs cannot be evaluated by self-administration, because when animals do not self-administer a drug, it could be aversive, but it could also be neutral. This is an advantage of CPP over other behavioral paradigms.

1.3.2 Contingent models

In contingent models, animals perform an action (e.g. lever press or nose poke) that triggers the infusion of a drug. Because the action is reinforced by the drug, the likelihood of that action increases. Because this procedure has high face validity, it has been used as the gold standard in the field of addiction since its inception in 1962 (Kuhn et al., 2019; Panlilio and Goldberg, 2007). Since then, behavioral models have been further refined to investigate specific symptoms of addiction.

1.3.2.1 Reinstatement: model of relapse

Relapse is one of the major problems in the treatment of addiction. Even after a long period of successful abstinence, some patients resume drug consumption (Ramo and Brown, 2008). Relapse can be triggered by re-exposure to the drug (Chornock et al., 1992), cues associated with the drug (Wallance, 1989) or stress (Sinha, 2007). This observation has been translated into animal models. These behavioral models can be divided into three phases: acquisition, abstinence and relapse. In the acquisition phase, all or most of the responses (typically lever presses or nose pokes) are reinforced by the drug. During abstinence, animals either stay in their home cage (forced abstinence) or are given the chance to seek the drug, but the same response no longer results in delivery of the cue or drug (extinction).

Relapse is triggered by drug priming (Girardeau et al., 2019), contextual cues (Gibson et al., 2018), drug cues (Ma et al., 2014; Pascoli et al., 2014), or stressful stimuli (Shaham et al., 2000). During acquisition, silent synapses emerge in the NAc (Lee et al., 2013; Ma et al., 2014; Wright et al., 2020) and they become mature synapses during abstinence. Depotentiation of glutamatergic synapses, especially at mPFC inputs to NAc D1-MSNs, reverses the maturation of silent synapses (Ma et al., 2014; Pascoli et al., 2014), abolishing cue-induced reinstatement. Upon re-exposure to the drug cue, matured synapses become silent again and re-mature ~6 h later (Wright et al., 2020). This dynamic trafficking of AMPA receptors in the NAc is responsible for reinstatement. D1-MSNs project to the VP, LH and VTA. It has been suggested that the NAc D1-MSN to LH pathway is involved in mediating abstinence, while the VTA output promotes relapse (Gibson et al., 2018). It has been shown that the NAc D1-MSN to LH pathway controls feeding behavior (O’Connor et al., 2015), but in general the outputs of D1-MSNs have not been well investigated in the context of addiction. Further studies should dissect how the potentiation in D1-MSNs affects the downstream pathways.


1.3.2.2 Escalation

Addicted patients often lose control over drug intake. Overdose deaths related to opioids are a major problem all over the world. In 2018, about 46,000 people died from opioid overdose in the United States (Centers for Disease Control and Prevention).

Clinical observations suggest that as addicted patients continue to use a drug, the dose needed to obtain the same effect increases. In other words, tolerance develops during prolonged drug intake (Morgan and Christie, 2011). When laboratory animals have short access (typically 1 h per session) to a drug, intake quickly reaches a plateau that maintains an optimal plasma drug concentration (titration). However, the introduction of extended access to the drug (6-23 h per session) disrupts this titration and leads to a progressive increase of drug intake (Pelloux et al., 2012, 2018; Wade et al., 2015). Based on this observation, the ‘antireward system’ hypothesis has been proposed (Koob, 2015). When the reward system is excessively activated by drug intake, a negative feedback system is recruited, which is called the ‘antireward system’. In a physiological state, this system is activated by stressful events. Because drug intake also activates this system, animals experience more stress as they increase their drug intake, and to counter this stress they increase their intake further. Several experiments support this hypothesis. It is well known that stressful events activate the hypothalamic-pituitary-adrenal (HPA) axis (Ann and Wand, 2012). Corticotropin-releasing factor (CRF) plays an essential role in this axis. Systemic injection of a CRF antagonist attenuates the escalation of cocaine intake (Specio et al., 2008). Local infusion of a CRF antagonist into the central amygdala (CeA) ameliorates the anxiety-like behavior observed after long access to alcohol (Rassnick et al., 1993). Optogenetic inhibition of CeA CRF receptor 1-positive neurons also decreases the escalation of alcohol intake (de Guglielmo et al., 2019). Further investigation demonstrated that those neurons project to the bed nucleus of the stria terminalis (BNST) and that inhibition of the CeA to BNST pathway attenuates alcohol intake (de Guglielmo et al., 2019).
Another piece of evidence supporting the opponent system is the increase in reward threshold (Koob, 2017). Electrical stimulation of the medial forebrain bundle is positively reinforcing (Negus and Miller, 2014). The minimum amplitude of electrical stimulation needed to maintain self-stimulation of this structure is called the reward threshold. After long access to the drug, the reward threshold is elevated, and this elevation is highly correlated with escalation (Koob, 2017).

1.3.2.3 Progressive ratio

According to the DSM-5, one diagnostic criterion is “Cravings and urges to use the substance”. This means that addicted patients have an unusually high motivation to take a drug. To measure the motivation of animals to obtain a drug, progressive ratio schedules are widely used (Brown et al., 2017; Wade et al., 2015). Within a single session, the number of responses required for each successive drug infusion increases exponentially. When an animal fails to earn a new infusion within a set amount of time, the session ends. The highest response requirement completed is called the breakpoint. It has been demonstrated that D1R or D2R antagonists lower the breakpoint (Randall et al., 2014), suggesting that DA increases motivation in general. A more recent study demonstrated that chemogenetic activation of DA neurons in the VTA, but not the SNc, increases responding for sucrose under a progressive ratio schedule (Boekhoudt et al., 2018). Nociceptin-positive neurons in the VTA project locally to DA neurons in the VTA. These neurons are active when animals are demotivated to obtain a new reward, and stimulation of these neurons reduces the breakpoint (Parker et al., 2019). In this study, a natural reward (sucrose solution) was used as the reinforcer. It remains to be determined whether this pathway is responsible for the aberrantly high motivation observed in drug addicts. Progressive ratio schedules are often combined with other paradigms to assess individual vulnerability, as we will see below.
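
The logic of a progressive ratio session can be sketched in a few lines. The text does not state which progression is used, so the formula below (requirement for infusion n = 5·e^(0.2·n) rounded to the nearest integer, minus 5) is an assumption: it is one widely used exponential progression, chosen here only for illustration. The function names and the example response times are hypothetical.

```python
import math


def pr_requirement(n):
    """Responses required for the n-th infusion (n starts at 1).

    Assumed progression: round(5 * e^(0.2 * n)) - 5, which gives
    1, 2, 4, 6, 9, 12, 15, 20, ...
    """
    return round(5 * math.exp(0.2 * n)) - 5


def highest_requirement_met(response_times, limit_s):
    """Return the breakpoint: the highest requirement completed before the
    animal goes more than limit_s seconds without earning a new infusion."""
    n = 1                 # index of the next infusion to be earned
    count = 0             # responses emitted toward the current requirement
    last_infusion = 0.0   # time of the last earned infusion (session start = 0)
    highest_met = 0
    for t in response_times:
        if t - last_infusion > limit_s:
            break         # time limit exceeded: the session has ended
        count += 1
        if count == pr_requirement(n):
            highest_met = pr_requirement(n)
            last_infusion = t
            n += 1
            count = 0
    return highest_met
```

For example, an animal that emits ten responses in quick succession completes the requirements 1, 2 and 4 but not 6, so its breakpoint under this assumed progression is 4.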

1.3.2.4 Individual vulnerability: 3 criteria model

In the models of addiction we have discussed so far, there is one assumption: all animals that consume a drug are at risk of developing addiction. However, clinical studies show that only a small proportion of individuals who have used a drug for a long time eventually become addicted (Anthony et al., 1994; Piazza and Deroche-Gamonet, 2013).


To assess individual vulnerability to addiction, a ‘three criteria model’ has been developed (Belin et al., 2011; Deroche-Gamonet et al., 2004). In this model, three addiction-like behaviors are evaluated that resemble the defining symptoms of addiction in the DSM-IV: i) Difficulty in refraining from drug seeking. Daily drug-taking sessions are separated into “drug on” and “drug off” periods, and the difference between the periods is signaled by the illumination of a light. A high response rate during the drug off period is considered an addiction-like behavior. ii) High motivation to take the drug. This is measured with a progressive ratio schedule, as we have seen before. iii) Persistence of drug consumption despite negative consequences, or compulsive drug use. This is measured by drug seeking behavior associated with punishment (e.g. electrical foot shock), as we will discuss later. Animals that meet none of these criteria are considered resistant to addiction, while animals that meet all three criteria are considered “addicted”; around 20% of animals fall into this category (Deroche-Gamonet et al., 2004). This model was designed to evaluate individual vulnerability, which had been neglected for decades.

However, the model has attracted some criticism. Three criteria are used, but it is unlikely that the same neuronal adaptation underlies all three addiction-like behaviors. It has been proposed that mPFC function is important for inhibitory control of behavior (Goldstein and Volkow, 2002). Inhibition of mPFC neurons promotes compulsive reward seeking (Domingo-rodriguez et al., 2020). However, this manipulation does not affect the other two addiction-like behaviors (Domingo-rodriguez et al., 2020), indicating that this brain region specifically controls compulsivity. Moreover, if we look at the shape of the distribution of individual animals' behavior, only compulsivity shows a bimodal distribution. This has been reproduced repeatedly by different research groups with different reinforcers (Belin et al., 2011; Marchant et al., 2018; Pascoli et al., 2015, 2018). Because of this bimodal distribution, the cutoff between compulsive and non-compulsive animals is relatively clear. However, motivation and inability to refrain from drug seeking show unimodal distributions, so the cutoff is inevitably arbitrary.


Figure 1.5 Individual vulnerability to addiction. Distributions of motivation for the drug (a), persistence of drug seeking (b) and resistance to punishment (c). Only resistance to punishment shows bimodal distribution, while the other two parameters show unimodal distribution (Belin et al., 2011).

1.3.2.5 Compulsivity.

Compulsive drug seeking is defined as continued drug use despite harmful consequences.

The DSM-5 has been updated, placing more emphasis on compulsivity for the diagnosis of substance use disorder (American Psychiatric Association). Compulsivity is used in the three criteria model as we have seen already, but this single criterion has several advantages over the other two. Because of this, compulsivity alone is often used as a model of addiction (Chen et al., 2013; Pascoli et al., 2018; Pelloux et al., 2007, 2012).

Behavioral outcomes on compulsivity scores show a bimodal distribution, while the other two criteria have unimodal distributions (Belin et al., 2011). This bimodal distribution makes the cutoff clearer. Furthermore, if a sufficient number of animals is used in a single study, mathematical clustering can be applied to separate the two groups without an arbitrary cutoff (Harada et al., 2019; Pascoli et al., 2018). Another advantage is face validity. As reflected in the updated DSM-5, compulsivity is central to addiction. Addicted patients often lose their job or family, and they have health or financial problems due to their drug use.
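
The idea of separating two groups by clustering rather than by a fixed cutoff can be sketched as follows. The text does not specify the clustering method used in the cited studies, so the example below simply assumes a one-dimensional two-means procedure. The function names and the scores (fraction of punished trials completed) are made up for illustration.

```python
def split_by_nearest(values, lo, hi):
    """Assign each value to the nearer of the two cluster centers."""
    low = [v for v in values if abs(v - lo) <= abs(v - hi)]
    high = [v for v in values if abs(v - lo) > abs(v - hi)]
    return low, high


def two_means_1d(values, n_iter=100):
    """Split 1-D scores into a low and a high cluster (two-means).

    Centers start at the minimum and maximum score, so both clusters
    stay non-empty as long as the scores are not all identical.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("need at least two distinct scores")
    low, high = split_by_nearest(values, lo, hi)
    for _ in range(n_iter):
        new_lo = sum(low) / len(low)
        new_hi = sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # centers no longer move: converged
        lo, hi = new_lo, new_hi
        low, high = split_by_nearest(values, lo, hi)
    return low, high


# Hypothetical compulsivity scores for nine animals.
scores = [0.05, 0.10, 0.0, 0.15, 0.9, 0.85, 1.0, 0.1, 0.95]
non_compulsive, compulsive = two_means_1d(scores)
```

With a clearly bimodal distribution such as this one, the boundary between the two clusters falls in the empty region between the modes, so the grouping does not depend on a threshold chosen by the experimenter.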

Because of these advantages, compulsive drug use has been studied intensively and models of compulsivity have been developed. In several studies, the response that delivers the drug is itself punished (Deroche-Gamonet et al., 2004; Pascoli et al., 2015, 2018). This is called compulsive drug taking, but it has been argued that studying drug taking alone is insufficient to understand the many aspects of addiction (Belin-Rauscent et al., 2016).

Addicted people spend most of their time looking for the drug, not consuming it. To separate drug seeking from drug taking, seeking-taking chained schedules have been used (Everitt and Robbins, 2016). Briefly, in an operant box with two levers, animals press the seeking lever during a random interval to gain access to the taking lever. The first press on the seeking lever after the end of the random interval triggers the extension of the taking lever. A press on the taking lever delivers the reward. To evaluate compulsion, the last seeking lever press triggers a foot shock in 30-50% of trials. This paradigm separates drug seeking from drug taking, making it possible to investigate the seeking behavior. Another advantage is that the drug seeking behavior is not affected by the drug, so it can be evaluated without interference from the drug's effects, and neuronal recordings are also possible.
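
The trial structure described above can be written as a small state machine. This is a minimal sketch under stated assumptions: the function name, the event labels and the example press times are hypothetical, and the code assumes that a punished trial ends with the shock and gives no access to the taking lever, which the text does not specify.

```python
import random


def run_trial(presses, random_interval_s, shock_prob, rng):
    """Simulate one seek-take trial.

    presses: list of (time_s, lever) tuples, lever is "seek" or "take".
    Returns the list of (time_s, event) outcomes for the trial.
    """
    events = []
    take_available = False
    for t, lever in presses:
        if lever == "seek" and not take_available and t >= random_interval_s:
            # First seek press after the random interval has elapsed.
            if rng.random() < shock_prob:
                # Assumption: a punished trial ends here, without the taking lever.
                events.append((t, "shock"))
                return events
            take_available = True
            events.append((t, "take_lever_extended"))
        elif lever == "take" and take_available:
            events.append((t, "reward"))
            return events
        # Seek presses during the random interval have no programmed consequence.
    return events


presses = [(2.0, "seek"), (7.0, "seek"), (12.0, "seek"), (13.0, "take")]
unpunished = run_trial(presses, 10.0, 0.0, random.Random(0))
punished = run_trial(presses, 10.0, 1.0, random.Random(0))
```

Setting `shock_prob` to a value between 0.3 and 0.5 would reproduce the proportion of punished trials mentioned in the text. The two extreme values are used here only to make the two possible outcomes explicit.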

1.3.3 Neuronal mechanism underlying transition to compulsion.

As we have seen, compulsivity is widely used as a model of addiction. Therefore, several physiological and biological models have been proposed to explain the behavior. There is agreement that addiction is a chronic disease. Only after prolonged drug consumption does a subset of the population develop addiction. The transition to addiction is separated into three steps (Piazza and Deroche-Gamonet, 2013): (i) Recreational drug use. At this phase, drug taking can be considered normal behavior and is seen in a large proportion of individuals. (ii) Intensified, sustained and escalated drug use. At this phase, the frequency and the amount of drug intake increase. The motivation to take the drug is also intensified. The behavior is moderately pathological but still organized. (iii) Loss of control over drug intake. At this phase, individuals have difficulty stopping drug taking despite negative consequences. They spend most of their time procuring the drug, inevitably giving up other normal behaviors.

1.3.3.1 Loss of top down control

Clinical observations indicate that addicted patients are fully aware of the catastrophic consequences of their drug consumption and yet cannot stop their drug seeking behavior. The weakening of self-control mechanisms strongly correlates with the impairment of PFC function. Clinical imaging studies show that addicted patients have low glucose metabolic activity in the prefrontal cortex, including the orbitofrontal cortex (OFC), anterior cingulate cortex (ACC), and dorsolateral prefrontal cortex (DLPFC). These brain regions, especially the ACC and DLPFC, are responsible for inhibitory control (Goldstein and Volkow, 2002, 2011; Volkow and Morales, 2015). Animal studies also show that compulsive cocaine seeking correlates with hypoactivity of the mPFC (Chen et al., 2013).

Indeed, rescuing mPFC hypoactivity with optogenetics ameliorated compulsive cocaine seeking (Chen et al., 2013). Furthermore, it has been shown that inactivation of the mPFC to NAc core pathway promotes compulsive food seeking, indicating that this pathway is involved in inhibitory control (Domingo-rodriguez et al., 2020). Based on these findings, the PFC has become a target of transcranial magnetic stimulation (TMS) for the treatment of substance use disorder (Bolloni et al., 2018). It has also been shown that glucose metabolic activity in the PFC correlates with D2R availability in the striatum (Goldstein and Volkow, 2002). Consistent with their lower PFC metabolism, addicted patients also show lower D2R availability. It is still unclear whether the reduction of D2R availability is a cause or a consequence of addiction. In other words, it has not been established whether reduced D2R availability represents a risk factor for addiction or whether chronic drug intake induces the reduction of D2R. One preclinical study shows that highly impulsive rats, which are more likely to become compulsive, have lower expression of D2R in the ventral striatum but not in the dorsal striatum (Dalley et al., 2007), supporting the former hypothesis. The role of the D2R in addiction has yet to be fully understood. As we have discussed, all addictive drugs increase the DA concentration in the NAc (Lüscher and Ungless, 2006), whereas the D2R, because of its high affinity for DA, responds to decreases rather than increases of the DA concentration (Iino et al., 2020; Marcellino et al., 2012). These findings suggest that the reduction of D2R does not affect the rewarding properties of addictive drugs. It has been shown that in D2-MSNs, a dip of DA coinciding with glutamate input induces LTP (Iino et al., 2020). This LTP is necessary for the cue discrimination task.
When the LTP is blocked, mice show reward seeking behavior in response to a cue not associated with reward delivery, suggesting that LTP in D2-MSNs plays a crucial role in inhibiting inappropriate behaviors. Moreover, aversive events such as foot shock trigger a dip of DA in the NAc (De Jong et al., 2018). Aversive events and the silencing of DA neurons in the VTA per se induce robust active avoidance (Danjo et al., 2014). However, knockdown of D2R in the NAc core completely abolishes active avoidance behavior (Danjo et al., 2014). These data indicate that the expression of D2R in the NAc determines the sensitivity to aversive events. The reduction of D2R therefore induces insensitivity to negative consequences, resulting in continued drug seeking even after facing problems. It has been shown that D2R is crucial for nicotine-induced CPP (Wilar et al., 2019), suggesting that the D2R also plays a role in reward-based learning. An in vivo Förster resonance energy transfer (FRET) experiment shows that cocaine administration decreases PKA activity in D2-MSNs (Goto et al., 2015), suggesting that D2Rs also respond to a surge of DA despite their high affinity.

Figure 1.6 Correlations between striatal D2R availability and glucose metabolism in frontal cortex. (A) Addicted patients show a reduction of D2R availability (top) and glucose metabolism (bottom) in frontal cortex. (B) Correlation between striatal D2R availability and glucose metabolism in the OFC in cocaine (top) and methamphetamine (bottom) addicted patients (Volkow et al., 2011).

1.3.3.2 Goal-directed and habitual behavior

In 1948, a publication described a typical learning experiment in which animals had to solve a maze to obtain a food reward (Dolan and Dayan, 2013; Tolman, 1948). As animals receive more training, the time needed to solve the maze becomes shorter. Stimulus-response (S-R) theories tried to explain this phenomenon by proposing that instrumental behaviors reflect the emergence of an associative memory structure. With increased training, the connection between the representation of a stimulus context and the mechanism generating the behavior becomes stronger. The idea of a cognitive map explained this in a different way. During the task, animals create a field map of the environment, which provides the guidance mechanism. To probe cognitive map theory, Edward Tolman performed a famous latent learning experiment, in which rats were exposed to the maze without reward in the first phase. In the next phase, these rats were rewarded at the end of the maze. The pre-exposed rats learned faster than naïve rats, indicating that they had learned the maze without any reward delivery (Tolman, 1948).

Interestingly, at the choice points of the maze, rats showed a hesitation-like behavior called ‘vicarious trial and error’ (VTE). Rats that showed more VTE behavior eventually became better learners (Tolman, 1948). It has been shown that lesions of the hippocampus decrease VTE behavior (Hu and Amsel, 1995). It is well known that place cells exist in the hippocampus (Eichenbaum et al., 1999; Kaufman et al., 2020; Lever et al., 2002), indicating that the hippocampus plays a critical role in the formation of the cognitive map (Bradfield et al., 2020). In these early experiments, experimenters observed that rats show less VTE with increased training (Redish, 2016). This observation suggested the idea of a transition from map-based to S-R-based behavioral control. The transition from cognitive map to S-R control was first tested with nonspatial instrumental behaviors, mainly for technical reasons. The notions of cognitive map and S-R then developed into those of goal-directed and habitual behavior, respectively. Goal-directed instrumental behavior depends on knowledge of the relationship between action and outcome (A-O relationship). Because animals know that a specific action results in a desirable outcome, they perform that action. Habitual behavior, on the other hand, is maintained by the past experience of reinforcement and is therefore insensitive to devaluation. The notion of goal-directed and habitual behavior has seen further development recently. Reinforcement learning is one of the major interests in the field of computational neuroscience (Dabney et al., 2020; Huys et al., 2016). The dichotomy of model-based versus model-free learning is often discussed (Dolan and Dayan, 2013).

Model-based and model-free learning are associated with goal-directed and habitual behavior, respectively. When animals perform goal-directed behavior, they try to understand the environment, then evaluate all the possible choices and choose one of them to obtain the desirable outcome. In model-based learning, the subject builds an internal model of the environment, like the cognitive map, creates a decision tree, predicts all the possible outcomes and executes one of the choices. Because the choice is based on the prospective outcome, it is immediately sensitive to devaluation. Since the subject takes all the possible choices into consideration, the behavior is flexible, but executing it requires a lot of computational effort. Model-free behaviors, on the other hand, depend not on an internal model but on a prediction error, for example the temporal difference (TD) prediction error (Starkweather et al., 2017). In TD learning, a prediction about the next step is produced at each step, and the choice made at that step is evaluated by the discrepancy between the utility predicted before the choice and the utility actually observed. Since in model-free behavior each decision is evaluated after it is executed, it is fundamentally retrospective. To update the behavior, direct experience is necessary. This means that model-free behaviors, for example habitual behaviors, are not immediately sensitive to devaluation (Dolan and Dayan, 2013).
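
The difference in sensitivity to devaluation can be made concrete with a toy example. The sketch below is a deliberate simplification and not a model from the cited literature: it uses a single action and a one-step prediction-error update rather than a full TD algorithm, and all names and numbers (`ALPHA`, `q_cached`, the outcome values) are hypothetical.

```python
ALPHA = 0.2  # learning rate (arbitrary)

outcome_value = {"food": 1.0}        # current worth of each outcome
action_outcome = {"press": "food"}   # internal model: action -> outcome (A-O knowledge)
q_cached = {"press": 0.0}            # model-free cached value of the action


def model_based_value(action):
    """Look up the outcome predicted by the internal model and its current worth."""
    return outcome_value[action_outcome[action]]


# Training: each press is followed by food, and the cached value is
# updated from the prediction error experienced after the action.
for _ in range(50):
    reward = outcome_value[action_outcome["press"]]
    q_cached["press"] += ALPHA * (reward - q_cached["press"])

# Devaluation outside the task (e.g. pairing the food with malaise).
outcome_value["food"] = 0.0

mb_after = model_based_value("press")  # drops immediately
mf_after = q_cached["press"]           # unchanged: no new experience yet

# The cached value only falls once the devalued outcome is experienced again.
for _ in range(50):
    q_cached["press"] += ALPHA * (outcome_value["food"] - q_cached["press"])
mf_relearned = q_cached["press"]
```

In this sketch the model-based estimate follows the devaluation at once, whereas the cached model-free value stays high until the action is performed again and its new consequence is experienced, which mirrors the insensitivity of habitual responding to devaluation.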

1.3.3.3 Goal-directed and habitual behavior in addiction

Experimentally, habit formation has been studied extensively with natural rewards (Gremel and Costa, 2013; Nelson and Killcross, 2013; Renteria et al., 2018). In the acquisition phase, animals are trained to perform a specific action to get the reward.

During the test session, the contingency is degraded, meaning that the reward is delivered with or without the specific action. If the performance is goal-directed, animals reduce their number of responses. If the behavior is habitual, responding is insensitive to degradation of the contingency. Another way of distinguishing goal-directed from habitual behavior is to devalue the outcome. This is achieved by giving animals access to the reward before the test session, or by pairing the reward with an aversive event, typically a systemic injection of lithium chloride, which causes gastro-intestinal malaise (Balleine, 2019; Vandaele and Janak, 2017). In the field of addiction, it has been proposed that an imbalance between goal-directed and habitual behavior, especially through abnormal forms of habitual behavior, underlies compulsive drug seeking (Belin and Everitt, 2008; Everitt, 2014; Everitt and Robbins, 2013, 2016; Vandaele and Janak, 2017). However, it has been difficult to evaluate habitual behavior with drug rewards, because the pharmacological effects of the drug make devaluation impossible. This problem has been overcome by using a seek-take chained task, in which responses on the seeking lever give access to the taking lever, and pressing the taking lever results in cocaine delivery (Zapata et al., 2010). Devaluation is achieved by extinction of the taking lever, which means that the response on the taking lever is devalued while cocaine per se retains its value. Extinction of the taking lever reduces seeking responses early in training (Olmstead et al., 2001; Zapata et al., 2010), indicating that the behavior is goal-directed. However, after extended training, the behavior becomes insensitive to devaluation, meaning that animals show habitual behavior only after extended training (Zapata et al., 2010). These observations support the idea of a transition from goal-directed to habitual control. Interestingly, non-contingent drug exposure promotes habit formation (Keiflin and Janak, 2015; Nelson and Killcross, 2013), implying that exposure to the addictive drug per se promotes habitual control. Several neuronal circuits have been proposed to control goal-directed or habitual behavior (Bradfield et al., 2020; Hart and Balleine, 2016; Hart et al., 2014, 2018; Matamales et al., 2020; Shiflett et al., 2010). Two components of the cortico-striatal pathways are implicated in goal-directed behavior: the prelimbic cortex and the medial part of the dorsal striatum (Hart et al., 2018; Shiflett et al., 2010). These two brain regions are anatomically connected. Lesions of the dorsomedial striatum impair goal-directed behavior (Hart et al., 2018).
This brain region is involved in both the acquisition and the expression of such behavior: lesions of this region impair goal-directed behavior whether they are made before or after training. On the other hand, the prelimbic cortex plays a critical role in the acquisition of goal-directed behavior but not in its expression: lesions of the prelimbic cortex impair goal-directed behavior if they are made before training but not after training (Hart et al., 2018). These findings imply that the prelimbic-dorsomedial striatum pathway is important for the acquisition of goal-directed behavior but not for its expression. Since lesions of the dorsomedial striatum also impair the expression of the behavior, other cortical inputs must be crucial for expression. Recently, it has been shown that the OFC to dorsal striatum pathway is responsible for goal-directed behavior, suggesting that OFC input is crucial for expression (Gremel et al., 2016; Renteria et al., 2018). By contrast, it has been shown that animals with lesions of the lateral part of the dorsal striatum maintain goal-directed behavior even after extended training (Balleine, 2019; Corbit et al., 2014). In these studies, natural rewards were used as reinforcers, but in the case of cocaine self-administration, habitual cocaine seeking is blocked by silencing the dorsolateral striatum (Zapata et al., 2010). Several cortical areas project to the dorsolateral striatum, including the primary and secondary motor cortices and the somatosensory cortex, and it has been suggested that the motor cortex is involved in habitual behavior (Everitt and Robbins, 2016). However, since lesions of the motor cortex cause general motor deficits, lesion studies are not feasible. Because of this limitation, this hypothesis has not been tested empirically.

1.3.3.4 Lateralization of behavioral control in the striatum

These findings suggest that in the early phase of training, behavior is controlled by the dorsomedial striatum. With extended training, the dorsolateral striatum takes over control of the behavior. The mechanism of this lateralization is yet to be fully understood. One attractive idea is that a spiraling pattern of connectivity between the striatum and the midbrain controls the lateralization (Keiflin and Janak, 2015; Lüscher et al., 2020; Volkow et al., 2019). DA neurons in the VTA project to the NAc, and NAc D1-MSNs then project back to GABAergic neurons in the midbrain. Some D1-MSNs project to the same region of the VTA, forming a closed reciprocal loop, while others project to a more lateral part of the midbrain, forming an open non-reciprocal loop. The DA neurons in this more lateral region in turn project to more dorsal and lateral parts of the striatum. Ultimately, this spiral reaches the lateral part of the dorsal striatum and the substantia nigra pars compacta (SNc). The existence of this spiral loop has been shown in an anatomical study (Haber et al., 2000). Several other studies support this hypothesis. In a classical Pavlovian conditioning experiment in untrained rats, unpredicted food reward delivery induces a DA transient in the NAc core but not in the dorsomedial or dorsolateral striatum. Only after training do reward cues and unpredicted food rewards evoke DA transients in the dorsomedial striatum (Brown
