
Reward and reward prediction error in declarative memory

Kate Ergo

A dissertation submitted to Ghent University in partial fulfilment of the requirements for the degree of Doctor of Psychology

Academic year 20202021

Guidance Committee

Prof. Dr. Tom Verguts (promotor)

Department of Experimental Psychology, Ghent University
Prof. Dr. Durk Talsma (co-promotor)

Department of Experimental Psychology, Ghent University


“The impediment to action advances action. What stands in the way becomes the way.”

Marcus Aurelius


Table of Contents

Chapter 1 General Introduction
  Long-Term Memory: Non-Declarative versus Declarative Memory
  Memory, Reward and Reward Prediction Error
  Reward Prediction Error and Non-Declarative Learning
  Reward Prediction Error in Declarative Learning
  Open Issues
    RPE: Signed or Unsigned?
    Timing Issues of RPEs
    RPE: Why and How?
    The Effect of Test Delay on Declarative Memory
    Reconsolidation
  Outline of the Dissertation
  References

Chapter 2 Reward prediction error and phase connectivity during declarative learning
  Abstract
  Introduction
  Methods
    Participants
    Material
    Procedure
      Familiarization Task
      Acquisition Task
      Design
      Filler Task
    Non-Parametric Clustering Analysis
      Power
      iPLV
  Results
    Behavioral Results
      Recognition Accuracy
      Certainty Rating
    EEG Results
      Power
      iPLV
      SRPE
      SRPE and Left/Right-Hand Button Presses
  Discussion
  References
  Appendix

Chapter 3 Failure to modulate reward prediction errors in declarative learning with theta (6 Hz) frequency transcranial alternating current stimulation
  Abstract
  Introduction
  Methods
    Participants
    Material
    Experimental Paradigm
      Familiarization Task
      Acquisition Task
      Design
    Recognition Test
    Sensations Questionnaire
    tACS Stimulation
    Data Analysis
  Results
    Sensations Questionnaire
    Recognition Accuracy
  Appendix

Chapter 4 Reward prediction errors drive declarative learning irrespective of agency
  Abstract
  Introduction
  Methods
    Participants
    Material
    Procedure
      Familiarization Task
      Acquisition Task
      Design
      Filler Task
    Recognition Test
    Data Analysis
  Results
    Recognition Accuracy
    Certainty Ratings
  Discussion
  References
  Appendix

Chapter 5 Extreme reward prediction errors support a signed reward prediction error account in declarative learning
  Abstract
  Introduction
  Filler Task
  Recognition Test
  Data Analysis
  Results
    Recognition Accuracy
    Certainty Rating
  Discussion
  References
  Appendix

Chapter 6 Subjective reward prediction errors generate an unsigned reward prediction error effect in declarative learning
  Abstract
  Introduction
  Methods
    Participants
    Material
    Procedure
      Acquisition Phase
      Testing Phase
      Design
  Results
    Recognition Accuracy on Filler Trials
    The Effect of Negative Subjective Reward Prediction Error Trials on Recognition Accuracy
  Discussion
  References
  Appendix

Chapter 7 General Discussion
  Summary of the Main Findings of the Doctoral Research
    Investigating How RPEs Facilitate Declarative Learning
    Do RPEs Elicit Changes in Brain Connectivity While Learning?
    Testing the Causal Role of Theta Phase Synchronization
  Addressing a Few of the Remaining Open Issues
  Towards a New Paradigm
  Implications of the Doctoral Research
    The Effect of RPE on Declarative Learning Might Be Subject to a Continuum
    An Alternative Explanation for the Testing Effect
    Retrieving Information from Memory Induces RPEs That Drive Learning
  Limitations and Future Directions
    In Search of Causality
  Transferring Scientific Insights from Theory to Practice
    From the Laboratory to the Classroom
    Nurturing Learning
    Into the Clinical Field
  Advancing Future Work by Employing a Neurocomputational Approach
  Conclusions
  References

English Summary
  References

Nederlandstalige Samenvatting
  Referenties

Data Storage Fact Sheets
  Data Storage Fact Sheet Chapter 2
  Data Storage Fact Sheet Chapter 3
  Data Storage Fact Sheet Chapter 4
  Data Storage Fact Sheet Chapter 5
  Data Storage Fact Sheet Chapter 6

Acknowledgements

After four and a half years, I have finally submitted my PhD thesis. I have to admit: it still feels surreal. Looking back at my time as a PhD student, I can't help but feel a bit emotional. What a rollercoaster ride it has been, filled with ups and downs. Pursuing this PhD allowed me not only to learn more about the sexiest organ in history, the human brain, but also to connect with like-minded people from all over the world. These past years have fostered both professional and personal growth. Although pursuing a PhD is mostly considered solitary work, I did not travel this path alone. In what follows, I would like to thank several people who have made this PhD journey an unforgettable, once-in-a-lifetime experience.

First and foremost, I owe a deep sense of gratitude to my promotor, Tom Verguts.

Tom, thanks for giving me the opportunity to pursue this PhD. No matter how busy you were, you always found the time to give me feedback on my writing or advice on one of our ongoing projects. I really admire the way you approach and analyze everything with great poise. I will never forget that (terrifying) moment when I told you I discovered a mistake in one of my scripts after analyzing the data for months. You reassured me by saying that it’s OK to make mistakes and that the whole point of doing a PhD is to make mistakes and learn from them.

Thanks for being patient with me, for believing in me, and for encouraging me to keep going.

I would also like to thank my co-promotor, Durk Talsma, for giving me the freedom to pursue this topic. Durk, thank you for exchanging thoughts on my PhD, and for giving me the chance to be your teaching assistant. On a side note, I still want to apologize for spilling that


never a dull moment when you were around. Thank you so much for being a part of this journey.

Next, I would like to offer my special thanks to Bernhard Pastötter. Bernhard, throughout my PhD I could always count on your support. During my committee meetings, you were actively thinking along and asking intriguing questions. You always made sure to emphasize the positive. Although my research stay at Universität Trier was cut short due to COVID-19, I really enjoyed being there. I would also like to thank your family for welcoming me and making me feel at home.

I also wish to extend my thanks to Roeljan Wiersema for being a doctoral advisory committee member. Your questions always sparked valuable insights and I greatly appreciated your Dutch humor.

Next up, I want to thank my fellow lab members for their support, insightful feedback, informal chats, and fun moments. Pieter Verbeke, you are probably the most chill hard-working person I know in academia. I very much enjoyed going to MXC's course together with you and spending time at Nijmegen's finest bars, where we talked about our lives and future plans. You're a gifted researcher and computational modeler. Thanks for being an awesome colleague. Pieter Huycke, the department's programming whizkid. You always knew when I was having a bad day. You'd pull me out of the office to get some coffee and knew exactly what to say to keep my spirits up. Thanks for supporting me and for reminding me that we all suffer from impostor syndrome. Esin Turkakin, thanks for keeping me motivated with all your runs on Strava. Jacqueline (Jacki) Janowich Wasserott, you were always so kind to me. Thanks for being a great officemate. Thanks to Cristian (Cris) Buc Calderon for briefly sharing an office with me and for never running out of questions during lab meetings. Mehdi Senoussi, thank you for brainstorming with me when I got stuck (once again) analyzing my EEG data. Hey, Kobe Desender, congrats on tenure! And thanks for spicing up the lab meetings with your musical talent. Anna Marzecová, thanks for genuinely caring about me. I very much enjoyed talking to you in real life and over Zoom. Your encouragement has been of great value to me. Elise Lesage, thanks for not abandoning me during the lab trip's mountain bike ride in the hilly Ardennes and for being a friendly colleague whose door was always open to talk about life in academia and beyond. Jonas Simoens, the junior of the lab. I wish you good luck with the rest of your PhD. You'll do great. Irene Cogliati Dezza, thanks for bringing back some Italian vibes into the lab. Also a big thank you to some former lab members who guided me through my first months as an academic: Massimo Silvetti, during my first months as a PhD student, we shared an office. Because you had to juggle your time and presence between Ghent and Rome, we didn't get to spend a lot of time together. Nevertheless, you managed to give me the perfect gift: a kickboxing set (and yes, it has come in handy). Eliana Vassena, thanks for being a great role model in neuroscience and for making time to talk to me when I was having doubts about life in academia. Thanks, Clio Janssens, for instilling a sense of academic realism into me. William Alexander, thanks for your memorable rendition of Africa by Toto.

Thanks should also go to my former classmates who later turned into friends and eventually also into colleagues. Merel Muylle, Sofie Van Den Bossche, and Judith Goris: Who would have ever thought we would become ‘doctors’ someday? We did a lot of activities together: Late-night karaoke, Harry Potter marathon, all-you-can-eat ribs night, pizza night, getting drunk together, cooking together, retreating to nature for a relaxing walk, kayaking in Ghent, among many other things. Thanks for helping me decompress after a stressful day at work and for making the past few years so much more enjoyable! A special thank you goes to James Deraeve who took such good care of us back when we were just getting started. James, thanks for your friendship throughout the years.

I am also grateful to all my colleagues in academia with whom I had the pleasure of spending some time. Thanks for the informal chit-chats, smiles, and words of encouragement. Special thanks go to: Robin Gerrits, Lien Naert, Haeme Park, Chi Zhang, Vesal Rasoulzadeh, Silvia Formica, Raquel London, Roma Siugzdaite, Mario Carlo Severo, Katharina Paul, Davide


hearted person I've ever met. I'm grateful our PhDs have brought us together. Thanks for all the jolly evenings and for loving French fries as much as I do. Sarah Cauchie, thank you for the many supportive messages and for all the fun moments we shared together in Canada. My thanks also go to my other Canada travel companions who made that dream trip possible, and in particular to Ilse Slechten, our travel guide, for the excellent planning and her Limburg enthusiasm and hospitality.

Over the years I had the opportunity to (co-)supervise PEP, RPEP, and master's thesis students: Jasper, Eva, Robin, Stefanie, Marie, Emma, Gillian, Toon, Anna, Steven, Luna, and Cathy. Thanks for your enthusiasm, and I wish all of you the best with your future endeavors.

I want to thank Lies Baten for her administrative support. Lies, no matter how many times I asked the same questions over and over again, you always answered them with a smile. Thank you! Thanks should also go to Christophe Blanckaert for his technical support and for making exam supervisions a bit less boring.

I also wish to thank my family in the Philippines for providing mental support and comic relief from miles away. Finally, I want to thank my parents, Jezzebel and Willy, for their love and support throughout my life, but most importantly for always believing in me when I doubted myself. Without their love and support, I would have never gotten this far. And to Fellow and Toby: ahwoo.

Kate
March 14, 2021


Chapter 1

General Introduction

It is a well-established notion that the human brain houses at least two distinct memory systems, namely declarative and non-declarative memory (Squire, 2004). These memory systems are characterized by their own input modalities, learning algorithms, and brain structures (Poldrack et al., 2001). Although different brain regions underpin declarative versus non-declarative memory (Poldrack & Foerde, 2008), a recent body of work has demonstrated that the same computational principles may apply to both memory systems. We propose that the predictive Reinforcement Learning framework (Friston, 2010; Silvetti et al., 2018) provides such shared computational principles.

In Chapter 1, we introduce the role of reward and reward prediction error (RPE) in learning. For the purpose of cognitive functioning, we must continuously make predictions while interacting with the world. As a result, the human brain is often considered a predictive machine (Clark, 2013; Friston, 2003). One aspect of the world we make predictions about is reward. Whenever a discrepancy arises between reward expectation (or reward prediction) and reward outcome, a RPE is elicited (Rescorla & Wagner, 1972). Learning based on RPE was originally proposed in the context of non-declarative memory. We postulate that RPE may support declarative memory as well. Indeed, recent years have witnessed several independent empirical studies reporting effects of RPE on declarative memory. We first provide a brief overview of these studies, identify emerging patterns, and discuss open issues such as the sign, timing, and underlying mechanism of RPE effects, before outlining the remainder of the dissertation.


Long-Term Memory: Non-Declarative versus Declarative Memory

A tennis player knows how to perform a perfect serve, and also knows the opponent's name. But how are these two types of "knowing" similar, if at all? In long-term memory, such informative knowledge is stored for an extended period of time. There is a general consensus that the human brain houses at least two broad and distinct (long-term) memory systems (Doll et al., 2015; Ryle, 2009; Squire, 2004; Squire & Dede, 2015), each with its own learning algorithms and neural correlates. The first is non-declarative (or habit, or implicit) memory. Non-declarative memory refers to the acquisition of different types of knowledge, including procedural memory ("knowing how"). This involves acquiring a motor or cognitive skill (procedure) through repeated practice (e.g., learning to play tennis). The process of acquiring non-declarative memories is called non-declarative learning. Non-declarative learning does not require conscious awareness of how learning took place (Squire & Knowlton, 1995) and is independent of any medial temporal lobe (MTL) structure (Knowlton & Foerde, 2008; Roediger et al., 2008). The second is declarative (or, in humans, propositional, or explicit) memory ("knowing what"). In declarative memory, facts, events, and concepts are stored that can (at least in humans) be consciously declared. It is considered to consist of episodic memory (memory for single episodes) and semantic memory (memory for information aggregated across several episodes) (Anderson, 2013). The process of acquiring declarative memory is referred to as declarative learning. The encoding of declarative memories can happen very quickly (Eichenbaum, 2004; Shohamy & Adcock, 2010) and relies heavily on the hippocampus (Eichenbaum, 2004; Squire et al., 2004) and other structures located in the MTL (Roediger et al., 2008).

Memory, Reward and Reward Prediction Error

Traditional work on the role of reward in learning primarily focused on animal behavior. Ample evidence for the importance of reward has been provided in non-declarative learning studies. Indeed, the theoretical ground for reward-based learning was established with the use of classical (or Pavlovian) (Pavlov, 1902) and operant (or instrumental) (Skinner, 1990; Thorndike, 1932) conditioning paradigms in which stimulus-response associations are learned by means of repeated exposure. Given that human beings are hedonistic creatures that constantly seek to be rewarded, studies investigating the role of reward and its prediction were soon thereafter applied to human learning as well (e.g., Adcock et al., 2006; Mather et al., 2011; Scimeca et al., 2016; Wimmer et al., 2014; Wittmann et al., 2005).

Over time, an extensive literature has developed on the implications of reward in learning (for a review, see Banich & Floresco, 2019; Nuttin & Greenwald, 2014). A full discussion of that topic is beyond the scope of this doctoral dissertation. Instead, we focus on the role of reward prediction error (RPE) in learning. The computational principle of RPE-based learning (Sutton & Barto, 1998; Wang et al., 2018) is generally thought to drive non-declarative learning (i.e., RPE = reward outcome − the predicted (or expected) reward). In contrast, until recently, RPE was not studied in the context of declarative memory. However, several empirical studies have reported effects of RPE on declarative memory as well, suggesting that some of the same computational principles shape non-declarative and declarative memory systems. In Chapter 1, we review these recent studies and discuss the most important open questions concerning the role of RPEs in declarative memory. First, however, we provide a brief overview of the computational models linking RPE with non-declarative learning.

Reward Prediction Error and Non-Declarative Learning

As human beings, we are continuously interacting with our environment. It is through this act that we experience certain stimuli and events, learn from them, develop predictions and build mental models. For this reason, the human brain is often equated to being a prediction machine (Clark, 2013). One of the most influential theories in current cognitive neuroscience is predictive coding (Friston, 2003; Rao & Ballard, 1999). According to this theory, the brain continuously generates predictions about incoming input and compares them against the actual input; the discrepancy between the two is a prediction error (PE), which is used to update subsequent predictions.

Predictions can be made about several variables, such as tomorrow's weather, the next action I (or somebody else) will perform, our partner's mood, and so on. Given that humans are hedonistic creatures, one particularly relevant variable to make predictions about is reward; a PE in reward is (by definition) a RPE. The concept of RPE has been very influential in non-declarative learning. In particular, RPEs have been implemented in a wide range of computational models. For example, in Kamin's (1969) seminal blocking experiment, the important discovery was made that a reward itself is not the driving force behind learning, but the (R)PE is. More specifically, in this blocking experiment, an event A is consistently followed by an unconditioned stimulus (US). Subsequently, an additional event B is added to A (again followed by the US). After conditioning, it is observed that the animal in the experiment has not learned the association between event B and the US, suggesting that the simple co-occurrence of two events is insufficient to drive learning. To account for the experimental phenomenon of blocking in non-declarative learning, Rescorla and Wagner (RW; Rescorla & Wagner, 1972; Table 1) developed their now-classic model, according to which learning depends on PE. According to the Rescorla-Wagner rule, the interpretation of the blocking effect is that event A blocks the B-US association. Specifically, due to event A, the appearance of the US is no longer surprising. As a consequence, no PE is elicited and learning will not take place.

Interestingly, neurophysiological studies revealed that the activity of dopaminergic neurons located in the mammalian midbrain changes in relation to RPE (Schultz, 1998). In particular, dopaminergic activity increases when a reward is better than expected (or predicted) (i.e., a positive RPE), decreases when a reward is worse than expected (i.e., a negative RPE), and remains unchanged when a reward is exactly as expected (i.e., no RPE). As such, the activity pattern of these dopaminergic neurons suggests that the valence (positive versus negative) of the RPE matters and is taken into account, consistent with a signed RPE (SRPE). This dopaminergic signal is subsequently broadcast to other brain regions (Schultz, 2013). In sum, RPEs thus act as teaching signals that are used to correct erroneous predictions, facilitate learning, and allow flexible behavior to occur; a necessary skill to survive in an ever-changing world.

Further computational development of RW led to the temporal difference (TD; Table 1) Reinforcement Learning (RL) model (Sutton & Barto, 1998). The TD model improved upon the RW model because it also allows learning when the reward is not immediately present, by taking into account the timing within a trial and producing a chain of (reward) predictions.

However, the main success of the RPE concept as implemented in TD probably stems from its close match to neurophysiological data (Enomoto et al., 2011). In particular, single-unit recordings from the midbrain revealed that dopaminergic neurons in the ventral tegmental area (VTA) and substantia nigra (SN) implement a TD-like RPE signature of reward processing (Eshel et al., 2016; Ljungberg et al., 1992; Schultz et al., 1997), such that dopaminergic neurons encode the long-term value of a chain of future rewards while ascribing less value to remote rewards. In recent years, the role of TD-based RPEs in non-declarative learning has become well established in psychology, neuroscience, and Artificial Intelligence. For example, deep RL models, which combine RL principles with artificial neural networks, use TD-based RPEs to solve tasks (e.g., playing Atari games) that were long considered beyond the capacity of artificial agents (Mnih et al., 2015; Silver et al., 2016). As such, deep RL allows artificial agents to learn the best actions possible in order to realize their goals and to achieve human-level performance.

In contrast to the RW and TD models, which are SRPE-based, Pearce and Hall (1980) proposed that learning occurs whenever reward is surprising (either better or worse, that is, different than expected), consistent with an unsigned RPE (URPE), in which the sign is not considered but instead the absolute value is taken into account (Table 1). The amount of surprise modulates how much attention is allocated to the (surprising) reward, which in turn determines how much learning will take place. Neurally, URPEs are encoded by norepinephrine (or noradrenaline) neurons located in the locus coeruleus (LC) (Sara, 2009). It is noteworthy that normative, Bayesian models of learning exhibit features of both SRPE- and URPE-based learning. For example, the Kalman filter (Dayan et al., 2000) utilizes a sequence of outcomes to update both a value estimate (via a signed error term) and the uncertainty of that estimate, which scales the learning rate (akin to an unsigned-surprise term).

In sum, the concept of PE, and specifically of RPE, has turned out to be fruitful for understanding non-declarative learning at neurophysiological, behavioral, and computational levels.

Table 1
Models of Learning

Rescorla-Wagner model (Rescorla & Wagner, 1972)
This model describes learning the value (expected (or predicted) reward) of specific events (say, events A and B). This information is encoded in their associative strength to a "value" unit, symbolized as wA and wB for events A and B, respectively. Specifically, based on whether events A and B occur (xA = 1 and xB = 1, respectively) or not (xA = 0 and xB = 0, respectively), an additive prediction is made about the occurrence of reward (V = xA × wA + xB × wB). When reward finally occurs (or not), a reward prediction error is calculated (R − V), where occurrence of reward (denoted R) is typically coded as R = 0 (when there is no reward) or R = 1 (when there is reward). This reward prediction error is then used to change the connection strength between cells encoding A and B on the one hand, and reward on the other: Δwi = α × xi × (R − V), with i ∈ {A, B}. After repeated application of this learning rule, the weights wA and wB allow the model to accurately predict reward, based on the (A, B) input combination. In the Rescorla-Wagner model, learning is driven by a "better-than-expected" signal or signed RPE (SRPE).

Temporal Difference model (Sutton & Barto, 1998)
The Rescorla-Wagner model can only learn from external feedback (R − V). This is computationally inefficient because reward may not be delivered at each time point where relevant information is provided to the organism. In temporal difference learning, learning can also occur if the prediction of reward changes between two time points t and t + 1. Formally, the learning rule becomes (now with explicit time index t): Δwi(t) = α × xi(t) × (R(t) + γV(t + 1) − V(t)), with i ∈ {A, B}. If γ = 0, the rule reduces to the Rescorla-Wagner rule. In case γ > 0, learning can also proceed at times t where no actual reward was delivered, rendering the algorithm more powerful than the Rescorla-Wagner rule. Here again, learning is driven by a "better-than-expected" signal or signed RPE (SRPE).

Pearce-Hall model (Pearce & Hall, 1980)
According to this model, learning only occurs when a reward is surprising. Specifically, it uses the absolute value of a RPE (a "different-than-expected" signal), consistent with an unsigned RPE (URPE) approach. Formally, (one variant of) the learning rule can be written as: Δwi(t) = xi(t) × R(t) × |R(t) − V(t)|.

Note. Three models of learning are described. Two of these models are SRPE-based ("better-than-expected" signals): the Rescorla-Wagner model and the Temporal Difference model. The Pearce-Hall model instead learns from URPE ("different-than-expected") signals.
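To make the update rules in Table 1 concrete, the minimal Python sketch below implements all three and uses the Rescorla-Wagner rule to reproduce Kamin's blocking effect; the learning-rate and discount values are illustrative and not taken from the original studies.

```python
import numpy as np

def rescorla_wagner(w, x, r, alpha=0.1):
    """One Rescorla-Wagner step: delta_w_i = alpha * x_i * (R - V)."""
    v = np.dot(x, w)                    # additive prediction V = xA*wA + xB*wB
    return w + alpha * x * (r - v)      # SRPE-driven update

def temporal_difference(w, x_t, r_t, v_next, alpha=0.1, gamma=0.9):
    """One TD step: delta_w_i = alpha * x_i * (R(t) + gamma*V(t+1) - V(t))."""
    v_t = np.dot(x_t, w)
    return w + alpha * x_t * (r_t + gamma * v_next - v_t)  # gamma=0 recovers RW

def pearce_hall(w, x_t, r_t):
    """Pearce-Hall variant: delta_w_i = x_i * R * |R - V| (URPE-driven)."""
    v_t = np.dot(x_t, w)
    return w + x_t * r_t * np.abs(r_t - v_t)

# Blocking demo: pair A alone with reward, then A+B together with reward.
w = np.zeros(2)                         # weights for events A and B
for _ in range(200):                    # phase 1: A -> reward
    w = rescorla_wagner(w, np.array([1.0, 0.0]), r=1.0)
for _ in range(200):                    # phase 2: A+B -> reward
    w = rescorla_wagner(w, np.array([1.0, 1.0]), r=1.0)
print(np.round(w, 2))                   # [1. 0.]: no RPE remains, so B is blocked
```

After phase 1, event A fully predicts the reward, so during phase 2 the RPE (R − V) is essentially zero and wB never grows, which is exactly the blocking result described above.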

Reward Prediction Error in Declarative Learning

Although the role of RPEs in non-declarative learning has been studied extensively and formalized in a number of computational models, their role in declarative learning has only recently become a topic of interest. Generally speaking, two main approaches exist for elucidating the RPE effect on declarative learning (for a non-exhaustive overview of studies, see Table 2). First, in the reward-prediction approach (Table 3), a statistical distribution determines the probability of reward. The participant knows or estimates this reward distribution.

For example, in Bunzeck et al. (2010), the same medium reward could be better than expected (i.e., preceded by a cue indicating that low or medium reward were equally likely to follow) or worse than expected (i.e., preceded by a cue indicating that high or medium reward were equally likely to follow), and subsequent scene memory tracked this signed difference, consistent with a SRPE effect. However, later work could not replicate the SRPE effect in this specific experimental paradigm (Mason, Farrell, et al., 2017; Mason, Ludwig, et al., 2017).

A second implementation of the reward-prediction approach is the recently developed variable-choice paradigm (Figure 1a and Table 2; De Loof et al., 2018). Here, participants learn Dutch-Swahili word associations under different RPE value conditions, ranging from −0.5 to 0.75. Predicting the reward probability is again quite easy; participants can deduce it from the number of eligible options (i.e., one, two, or four Swahili translations). Behaviorally, memory performance showed a SRPE effect in declarative learning: Recognition accuracy and certainty increased linearly with larger and more positive RPEs (Figure 1b). These results were replicated with image-word associations (De Loof et al., 2018) and face-word associations (Calderon et al., 2020).

In another instantiation of the reward-prediction approach, participants actively track and estimate the reward probability distribution. Here, on each trial, they experience a RPE relative to that (estimated) distribution (Figure 1c-d and Table 2) (Davidow et al., 2016; Rouhani et al., 2018; Wimmer et al., 2014). Based on this feedback, participants can update their estimate for subsequent trials. For example, in one study (Davidow et al., 2016), participants estimated the (fixed) probability of reward attached to specific stimuli. At reward feedback, a trial-novel image was presented. Subsequent memory performance for these trial-novel images displayed a SRPE effect, which was more pronounced in adolescents than in adults. In another study (Rouhani et al., 2018), participants tracked the reward associated with different indoor and outdoor scenes. On each trial, participants estimated the reward (for a particular scene), and subsequently received feedback about their estimate; from this difference (feedback − estimated reward), an RPE could be calculated. Scene memory was probed after this initial task (via old/new judgments). Here, a clear URPE effect was observed: Scenes associated with a higher URPE during the initial task (i.e., with more surprising rewards, in either positive or negative direction) were afterward better remembered in a surprise memory test (Figure 1c-d).


A more challenging study (Jang et al., 2019) used a reward-prediction paradigm to disentangle effects of SRPE, surprise (which corresponds to URPE), and uncertainty. On each trial, participants saw a value and an image (animate or inanimate) from which the reward probability could be estimated (Figure 1e). Moreover, unlike in the other paradigms previously discussed, reward probability was not fixed but instead jumped to a different level at unpredictable time points during the experiment, making it more difficult for participants to estimate the actual reward distribution. Participants had to decide, on each trial, to play or to pass. After making a choice, the image was shown together with reward feedback. Recognition memory for the images was tested through old/new judgments. The results revealed that only SRPE affected subsequent memory performance (Figure 1f). A similar effect of SRPE was produced in a study by Aberg et al. (2017). Here, participants associated cartoon characters with the character's preferred object. Crucially, the ratio of positive and negative feedback attributed to a particular character was manipulated. Characters were divided into three categories: In the high-reward category, participants received (irrespective of their choice) positive feedback 80% of the time, whereas this was reduced to 50% in the medium-reward category and 20% in the low-reward category. Finally, in Wimmer et al. (2014), the reward probability would fluctuate slowly but unpredictably on each trial, making the reward-prediction task even more challenging. In this experiment, unlike the other discussed paradigms, a negative effect of (S)RPE was observed. Specifically, trials (and participants) with stronger and more positive RPEs were associated with impaired declarative learning.

As a second approach, in a multiple-repetition paradigm (Table 3), a set of general information questions is repeated a number of times. Trial-specific confidence ratings ("How certain are you that you answered correctly?") and feedback are used to compute trial-specific RPEs. A classic finding in this literature is the hypercorrection effect: errors committed with high confidence are more likely to be corrected after feedback than errors committed with low confidence (Butterfield & Metcalfe, 2001; Marsh, 2009; Metcalfe, 2017; Metcalfe & Finn, 2011). High-confidence errors occur on those trials during which positive feedback was expected but not obtained; thus, this effect is consistent with a URPE effect. In another experiment (Pine et al., 2018), participants studied a text and were tested on its contents after a two-day delay. Here, a hypercorrection effect was also reported, which the authors interpreted as a URPE effect. Additionally, in this second experiment, participants received false feedback on a small fraction of trials (i.e., trials that were answered correctly but labeled as false), and received novel feedback (i.e., a novel "correct" answer) on those trials. In those false-feedback trials, a URPE effect was also observed: On trials that were answered with high certainty but that were not rewarded (high URPE), the novel feedback was subsequently recalled more confidently.

While overviewing and categorizing these paradigms, we note that a main difference between the reward-prediction and multiple-repetition approaches is the origin of the RPE: An independent reward generation mechanism in the former, and the participant's own confidence (or certainty) in his or her memory in the latter. Another difference is that, in the reward-prediction approach, RPEs are usually computed or estimated, whereas RPEs are deduced from confidence measures in the multiple-repetition approach. There are some exceptions to the latter rule: For example, Rouhani et al. (2018) implemented a reward-prediction paradigm where confidence is used to calculate a RPE. Finally, in the reward-prediction paradigm, memoranda are usually trial-unique, whereas (by definition) they are not in the multiple-repetition approach. These are just a few of the relevant dimensions of RPE-based declarative learning; we discuss some other potentially relevant dimensions in the next section.

Table 2
Non-Exhaustive Overview of Studies on RPE in Declarative Memory

Bunzeck et al. (2010)
Approach: Reward-prediction. Task and stimuli: Each of three cues (colored squares) is followed by one of two potential reward values (medium-low, medium-high, and low-high), so a medium reward can be better or worse than expected depending on the other reward value it is paired with. After reward feedback, a novel (indoor or outdoor) scene is presented. Scene recognition is probed after a one-day delay. RPE type: SRPE. Effect on memory: Positive.

De Loof et al. (2018) (see also Figure 1a-b)
Approach: Reward-prediction. Task and stimuli: On each trial, participants see one Dutch word together with four (trial-novel) Swahili words and choose a translation from either one, two, or four of these Swahili words. Manipulating the number of eligible options (1, 2, or 4) and whether a trial is rewarded or not allowed manipulation of RPEs. RPE type: SRPE. Effect on memory: Positive.

Davidow et al. (2016)
Approach: Reward-prediction. Task and stimuli: A cue is presented with two targets linked to different reward values. Subjects must (learn to predict and) choose the high-value target. Trial-novel images are shown during subsequent reward feedback. Image memory is probed afterward via old/new judgments. RPE type: SRPE. Effect on memory: Positive.

Rouhani et al. (2018) (see also Figure 1c-d)
Approach: Reward-prediction. Task and stimuli: Participants track the reward associated with different indoor and outdoor scenes. On each trial, participants predict the reward (for a particular scene) and subsequently receive feedback about their estimate. From this difference (feedback − predicted reward), a RPE can be calculated. Scene memory is probed after this initial task via old/new judgments. RPE type: URPE. Effect on memory: Positive.

Jang et al. (2019) (see also Figure 1e-f)
Approach: Reward-prediction. Task and stimuli: On each trial, participants see a value and a stimulus (animate or inanimate) for that trial and decide to play or pass on that trial (Figure 1e). After each choice, the image is shown with reward feedback. Afterward, recognition memory for the images is probed via old/new judgments. RPE type: SRPE. Effect on memory: Positive.

Wimmer et al. (2014)
Approach: Reward-prediction. Task and stimuli: Participants track the drifting reward probability of colored squares, which are overlaid with incidental trial-unique images and followed by feedback. Recognition memory for the images is probed via old/new judgments after a one-day delay. RPE type: SRPE. Effect on memory: Negative.

Butterfield and Metcalfe (2001)
Approach: Multiple-repetition. Task and stimuli: Participants are presented with questions for which they have to generate an answer and rate their confidence, followed by a surprise retest. RPE type: URPE. Effect on memory: Positive.

Metcalfe et al. (2012)
Approach: Multiple-repetition. Task and stimuli: Participants are presented with general information questions. In a first test phase, participants provide answers and rate their confidence. In the subsequent phase, they receive feedback about their answers. Finally, participants are retested on a subset of questions in a second test phase. RPE type: URPE. Effect on memory: Positive.

Pine et al. (2018)
Approach: Multiple-repetition. Task and stimuli: Participants study a text and are tested after two days, at which time they also provide confidence ratings for their answers. On a small fraction of trials, participants receive false feedback (i.e., trials that were answered correctly but labeled as false), together with novel feedback (i.e., a novel "correct" answer) on those trials. A second (incidental) test is given after 7 days. RPE type: URPE. Effect on memory: Positive.

Note. A non-exhaustive overview of the studies included in the literature overview. Each study is reported together with its approach (i.e., reward-prediction or multiple-repetition), the task and stimuli used, the type of RPE involved (SRPE or URPE), and the direction of its effect on memory (positive or negative).

Table 3
How to Generate and Measure RPEs: Experimental Approaches

Reward-prediction approach
Here, participants must both learn declarative information (e.g., word pairs) and simultaneously estimate a (potentially non-stationary) reward distribution throughout the task (Jang et al., 2019; Rouhani et al., 2018; Wimmer et al., 2014). In some cases, the correct RPE can be easily derived analytically; in other cases, RPE can only be calculated after fitting a reinforcement learning model and deriving the RPEs from the model estimates (Jang et al., 2019; Wimmer et al., 2014). One example of a reward-prediction approach is the variable-choice paradigm (De Loof et al., 2018; Ergo et al., 2019) (Figure 1a), in which participants learn stimulus pairs, such as Dutch-Swahili word pairs or image-Swahili stimulus pairs (De Loof et al., 2018). In the former example, on each trial, a Dutch word is shown together with four Swahili words. Critically, the number of eligible options is manipulated: In the one-option, two-option, and four-option conditions, 1, 2, or 4 Swahili words are eligible (framed), respectively, and the probability of choosing the correct translation is thus 100%, 50%, or 25%, respectively. Feedback is given on every trial. Signed and unsigned trial-by-trial RPEs are calculated based on the difference between actual and predicted reward. Memory is probed in a subsequent recognition test.

Multiple-repetition approach
Here, general information questions are repeatedly presented, and a RPE is estimated based on previous presentations of each question. For example, in Pine et al. (2018), participants first studied a text and subsequently received (multiple-choice) questions about the text. After each question, they rated their confidence and received feedback. The trial-by-trial RPE was calculated using the confidence rating and feedback. Hypercorrection effect studies also typically use a multiple-repetition paradigm (Butterfield & Metcalfe, 2006; Metcalfe et al., 2012).

Note. Surveying the literature revealed two approaches to studying RPE-based declarative learning: The reward-prediction approach and the multiple-repetition approach.
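As a worked example of the variable-choice computation just described, the short sketch below derives signed and unsigned trial-by-trial RPEs from the number of eligible options and the reward outcome. The function name and trial values are illustrative, not taken from the original studies.

```python
import numpy as np

def variable_choice_rpes(n_options, rewarded):
    """SRPE and URPE per trial in a variable-choice design.
    Predicted reward is 1 / number of eligible options (100%, 50%, or 25%);
    the reward outcome is coded 1 (reward) or 0 (no reward)."""
    predicted = 1.0 / np.asarray(n_options, dtype=float)
    outcome = np.asarray(rewarded, dtype=float)
    srpe = outcome - predicted          # signed RPE: outcome minus prediction
    return srpe, np.abs(srpe)           # URPE is the absolute value

# One-option trials are always rewarded (SRPE = 0); unrewarded two- and
# four-option trials produce the negative SRPEs in this design.
srpe, urpe = variable_choice_rpes([1, 2, 4, 2, 4], [1, 1, 1, 0, 0])
print(srpe)   # [ 0.    0.5   0.75 -0.5  -0.25]
print(urpe)   # [ 0.    0.5   0.75  0.5   0.25]
```

Note how this simple scheme yields exactly the SRPE range from −0.5 to 0.75 mentioned earlier for the variable-choice paradigm.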

Figure 1
RPE in Declarative Memory

Note. Reward-prediction approach applied in three paradigms and their typical findings. a) Variable-choice paradigm from De Loof et al. (2018). b) Variable-choice paradigm behavioral results (De Loof et al., 2018) show a SRPE signature for recognition accuracy in both the immediate and delayed test groups; recognition of word pairs increased linearly with larger and more positive RPEs. c) Paradigm reproduced from Rouhani et al. (2018). d) Rouhani et al. (2018) found a URPE (U-shaped) signature, with memory improving for both large negative and large positive RPEs. e) Paradigm reproduced from Jang et al. (2019). f) Jang et al. (2019) found a SRPE signature: Memory score increased with increasing RPE.

Open Issues

Despite growing evidence that RPEs drive declarative memory as well as non-declarative memory, many questions remain unanswered. We discuss a few of them in the next paragraphs. In the current dissertation, we tried to tackle some of these open issues, which are listed below.

RPE: Signed or Unsigned?

Studies with a multiple-repetition paradigm typically observed URPE (i.e., surprise or "different-than-expected") effects. In contrast, the reward-prediction paradigm has tended to yield SRPE ("better-than-expected") effects, although URPE effects have occasionally been documented as well (Figure 1d) (Rouhani et al., 2018). Why do different designs tend to generate SRPE versus URPE effects on declarative learning? One potentially relevant factor is the range of the RPEs probed. In particular, studies that found a behavioral SRPE effect (i.e., most reward-prediction paradigms) might simply not have investigated the full range of RPEs. For example, in the variable-choice paradigm (Calderon et al., 2020; De Loof et al., 2018; Ergo et al., 2019), RPEs spanned a rather limited range and were skewed toward the positive side, which may have favored detecting a SRPE pattern.

However, this is unlikely to be the full story, because both RPE signatures have been observed even within a single study. In an EEG study with the variable-choice paradigm (Ergo et al., 2019), a URPE pattern was observed during reward feedback in the theta (4-8 Hz) frequency band, consistent with literature implicating theta in URPE processing (Cavanagh & Frank, 2014). In contrast, SRPE signatures were found in the high-beta (20-30 Hz) and high-alpha (10-15 Hz) frequency ranges, consistent with a functional role of both beta and alpha power in reward feedback processing (HajiHosseini et al., 2012; Kleberg et al., 2014).

Furthermore, in an fMRI study using a multiple-repetition paradigm, Pine et al. (2018) found SRPE-consistent activation in several areas (including striatum), but URPE signatures in others (including insula). Together, these findings suggest that both SRPE and URPE are important for declarative learning (Rouhani & Niv, 2021); and that we need an account identifying the functional role of each, in behavior, time, (neural) space, and frequency band. The Bayesian learning model mentioned earlier, which naturally incorporates both, may be a useful starting point in this respect. Specifically, as this hybrid model suggests, it may be that URPE drives learning rate, SRPE drives update, and their combination (learning rate × update) determines a (R)PE that drives declarative learning. Future research should be devoted to the development of computational models of RPE-based declarative learning.
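One way to read this suggestion is as a hybrid Rescorla-Wagner/Pearce-Hall learner in which the unsigned RPE updates an associability (learning-rate) term while the signed RPE drives the value update. The sketch below is speculative, our own illustration of the "learning rate × update" idea rather than a model from the dissertation; parameter values are illustrative.

```python
def hybrid_step(v, assoc, r, eta=0.3, kappa=0.5):
    """One trial of a hybrid SRPE/URPE learner.
    v: current reward prediction; assoc: associability (learning rate)."""
    srpe = r - v                            # signed RPE ("better than expected")
    urpe = abs(srpe)                        # unsigned RPE ("different than expected")
    v = v + kappa * assoc * srpe            # SRPE supplies the update...
    assoc = eta * urpe + (1 - eta) * assoc  # ...URPE sets the next learning rate
    return v, assoc

# Surprising outcomes raise associability, so subsequent learning speeds up.
v, assoc = 0.5, 0.5
for reward in [1, 1, 0, 1, 1, 1]:           # a mostly rewarded stimulus
    v, assoc = hybrid_step(v, assoc, reward)
    print(round(v, 3), round(assoc, 3))
```

In this scheme, the effective learning signal on each trial is the product of a URPE-driven rate and a SRPE-driven update, which is one concrete way both signatures could coexist in behavior.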

Timing Issues of RPEs

In most paradigms, a novel declarative memorandum is presented on each trial, followed by a RPE, followed by declarative feedback about what the correct answer should have been (see Figure 1a, word pair encoding, for an example). Here, RPE can have either a retrograde effect (if it interacts with the originally presented memoranda) or instead an anterograde effect (if it interacts with the declarative feedback). Concerning the anterograde effect, in studies using the variable-choice paradigm, the declarative feedback appeared either simultaneously with the RPE (delay of 0 ms; De Loof et al., 2018) or with a delay of 3000 ms (Ergo et al., 2019). The fact that we find very similar results in the two cases suggests that the timing of the RPE-feedback interval may not be crucial, at least within the first few hundred milliseconds. An interesting parallel can be drawn here with the test-potentiated learning effect from the declarative memory literature. Here, taking a test potentiates the learning of (old or novel) material that is subsequently presented (Arnold & McDermott, 2013; Pastötter & Bäuml, 2014). Also, for a retrograde effect (of RPE on the originally presented memorandum), an interesting analogy can be made with earlier literature. In particular, Braun et al. (2019) found a retrograde effect of reward on declarative memory, with objects that were (temporally) closer to (subsequent) reward being better remembered afterward. In the reward-prediction approach, it remains to be shown which of these two (anterograde or retrograde effect of RPE) is crucial for driving the RPE-based declarative memory improvement.

A RPE can also appear at cue rather than at feedback. Only two papers thus far have investigated both cue- and feedback-locked RPE effects. In Jang et al. (2019), the authors observed cue- but not feedback-locked RPE effects; however, in their experiment, there was both a cue- and a feedback-locked RPE on each trial. It is quite possible that an initial RPE suppresses a second RPE occurring (e.g., a few hundred ms later) in that same trial. In a more recent study by Rouhani and Niv (2021), two experiments were used to dissociate the effects of SRPE and URPE during the presentation of cue versus feedback. Here, they found that SRPE at cue and URPE at feedback both drive declarative learning. The effect was found for both implicit and explicit memory tests. We conclude that RPE timing issues need to be studied more systematically. In particular, if this research is to have practical application in education, such studies will be imperative.

RPE: Why and How?

In non-declarative learning, a normative argument for why RPE is useful is well established: Calculating RPE is necessary for online (i.e., while interacting with the world) reward maximization (Sutton & Barto, 1998); this idea is inherent in the RW, TD, and Pearce-Hall models (Table 1). Does this argument apply to declarative memory as well? An intuitive answer comes from hippocampal replay, whereby recent memory traces are spontaneously reactivated in order to increase memory consolidation (Skaggs & McNaughton, 1996; Wilson & McNaughton, 1994). According to this principle, RPE prioritizes which memories to replay, with episodic events associated with large RPEs having a higher priority on the replay list. Consequently, these highly prioritized episodic events are replayed more often and thus better remembered afterward. Building on this interpretation, one would expect that imposing longer delays between study and test should enhance the RPE-boosted effect on recognition memory. In De Loof et al. (2018), some participants were tested immediately after learning, whereas others were tested after a one-day delay. In this experiment, we indeed found support for the idea that increasing the period between learning and testing leads to stronger SRPE effects in declarative learning. In particular, the delayed testing group showed higher recognition accuracies and a stronger SRPE effect compared to the immediate testing group.
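A toy illustration of this replay-prioritization idea: episodes tagged with larger absolute RPEs sit higher on the replay list and are reactivated first. This is our own sketch of the principle, not a model taken from the cited studies; the class and episode labels are hypothetical.

```python
import heapq

class ReplayList:
    """Replay queue that pops the episode with the largest |RPE| first."""
    def __init__(self):
        self._heap, self._n = [], 0

    def store(self, episode, rpe):
        # heapq is a min-heap, so store the negated |RPE| as the priority
        heapq.heappush(self._heap, (-abs(rpe), self._n, episode))
        self._n += 1                      # insertion index breaks ties

    def replay(self):
        return heapq.heappop(self._heap)[2]

replay = ReplayList()
replay.store("word pair A", rpe=0.75)
replay.store("word pair B", rpe=0.10)
replay.store("word pair C", rpe=-0.50)
print(replay.replay())                    # word pair A (|RPE| = 0.75) comes first
```

Under this reading, longer study-test delays give the high-|RPE| items more replay opportunities, which is consistent with the stronger SRPE effect in the delayed testing group.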

Another issue is how RPE improves memory. One potential mechanism is via phase-locking to neural oscillations in specific frequency bands. In particular, neural theta phase synchronization (i.e., synchronization of two brain areas in the theta frequency band (4-8 Hz)) may provide one (but not an exclusive) solution. Such synchronization can be achieved by making the theta phase of the two areas identical, so that theta waves in both areas "go up and down" together. Brain areas in theta phase synchrony are thought to communicate and learn more efficiently (Fries, 2015), thus facilitating memory integration (Backus et al., 2016). Indeed, episodic memory is enhanced when multimodal (audio-visual) stimuli are synchronously presented in theta phase, with stronger theta phase synchronization predicting better memory performance (Clouter et al., 2017; Wang et al., 2018). Dopaminergic midbrain neurons have also been found to phase-lock to (cortical) theta during encoding, with stronger phase-locking for subsequently remembered (versus forgotten) memoranda (Kamiński et al., 2018). Thus, it is possible that RPEs (via neuromodulatory signaling) increase theta (phase) synchrony, which subsequently allows the relevant brain areas to "glue" the episode(s) together more efficiently (Berens & Horner, 2017). The EEG variable-choice paradigm study mentioned above (Ergo et al., 2019) provides preliminary evidence for this view. Further, computational models that consider RPE-theta interactions to drive learning have started to appear (Verbeke & Verguts, 2019).
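To make the synchronization measure concrete: theta phase-locking between two channels can be quantified with the (imaginary) phase-locking value, the iPLV listed in the Chapter 2 contents. The sketch below is a generic recipe under stated assumptions, not the dissertation's actual analysis pipeline; the band edges and filter order are illustrative.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def theta_iplv(sig_a, sig_b, fs, band=(4.0, 8.0)):
    """Imaginary phase-locking value between two signals in the theta band.
    iPLV = |Im(mean(exp(i * phase difference)))|; it ranges from 0 to 1 and
    discounts zero-lag coupling, which often reflects volume conduction."""
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    phase_a = np.angle(hilbert(filtfilt(b, a, sig_a)))
    phase_b = np.angle(hilbert(filtfilt(b, a, sig_b)))
    return np.abs(np.imag(np.mean(np.exp(1j * (phase_a - phase_b)))))

# Two noisy 6 Hz signals with a fixed 90-degree lag are strongly theta-locked.
fs = 250.0
t = np.arange(0, 4, 1 / fs)
sig1 = np.sin(2 * np.pi * 6 * t) + 0.1 * np.random.randn(t.size)
sig2 = np.sin(2 * np.pi * 6 * t - np.pi / 2) + 0.1 * np.random.randn(t.size)
print(theta_iplv(sig1, sig2, fs))         # close to 1 for a consistent 90-degree lag
```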


Whereas dopaminergic RPEs likely support non-declarative learning via basal ganglia pathways, dopaminergic RPEs may support declarative memory via the hippocampus (Lisman et al., 2011). In their neoHebbian framework, Lisman et al. (2011) claim that during initial memory encoding, novel (i.e., surprising) information is accompanied by dopamine bursts. Specifically, the theory postulates that the hippocampus detects RPEs and projects these to the VTA, which in turn enables long-term potentiation (LTP) (i.e., the strengthening of synapses), thereby facilitating (declarative) learning. Given the close relationship between dopaminergic firing and RPE (Schultz & Dickinson, 2000), it is plausible that declarative learning is indeed modulated by such a (dopaminergic) RPE signal. Standard theory holds that the (dopaminergic) VTA calculates SRPE, but a substantial number of URPE neurons have also been observed in the VTA and nearby midbrain areas (Matsumoto & Hikosaka, 2009). Moreover, the noradrenergic LC also projects to the hippocampus and may thus exert URPE effects (Kempadoo et al., 2016; Wagatsuma et al., 2017). Earlier work proposed that VTA-hippocampus interactions originate in the hippocampus (Lisman & Grace, 2005). Importantly, we propose that VTA-hippocampus interactions may also originate in the VTA, and that SRPEs (encoded by the VTA, possibly based on input from the ventral striatum; Takahashi et al., 2016) and URPEs (encoded in the VTA and LC) may modulate LTP in the hippocampus for episodic memory encoding. Consistent with this, several studies have demonstrated that midbrain VTA activation (triggered by reward or RPE) is associated with improved episodic learning (Calderon et al., 2020; Gruber et al., 2016; Wittmann et al., 2005). Taking this evidence together, it is plausible that a bidirectional flow of information between the VTA and hippocampus supports RPE-based declarative learning.

The Effect of Test Delay on Declarative Memory


al., 2016; Patil et al., 2016). However, a systematic comparison of the delay-by-RPE interaction on declarative memory remains to be carried out.

Reconsolidation

When information is retrieved from memory, it enters a plastic, labile state, allowing the information to be changed, strengthened, or weakened; this process is called reconsolidation (Alberini & Ledoux, 2013; Fernández et al., 2016). Reconsolidation is most intensively studied in animal behavior (Alfei et al., 2015; Fernández et al., 2016), but has also been evidenced in humans (Elsey et al., 2018), where it is observed in both non-declarative memory (Nader et al., 2000) and declarative memory (Forcato et al., 2007; Sinclair & Barense, 2018). Importantly, PE is required for reconsolidation (Exton-McGuinness et al., 2015), both in non-declarative (Sevenster et al., 2013, 2014) and in declarative memory (Sinclair & Barense, 2018, 2019). Given the significant role of RPE in declarative learning, and given that similar principles drive learning and reconsolidation (Sinclair & Barense, 2019), we predict that RPE may modulate reconsolidation too. The multiple-repetition approach, where declarative memory is probed iteratively, can be considered a first attempt at investigating the role of RPE in the context of reconsolidation. This remains, however, to be further investigated.

Outline of the Dissertation

Learning, RPEs, and declarative memory are sometimes treated as separate topics, each with their own prominent paradigms, findings, and theories. The current perspective suggests instead that they are intimately related. Briefly, learning is modulated by RPEs and leads to (declarative) memory traces in the brain. In Chapter 1, we discussed a few recent paradigms that have begun to explore such interactions. In the Open Issues section, we highlighted a number of dimensions of those paradigms that, if addressed, could greatly facilitate further development of the research field. Although much remains to be discovered, concrete models and predictions are beginning to emerge, with relevance for both Natural and Artificial Intelligence. Concerning the latter, it is of interest that recent deep neural networks integrate non-declarative learning (as in standard neural networks) with declarative memory (Botvinick et al., 2019; Graves et al., 2016). In such artificial systems, RPEs may determine when (or how strongly) to store a declarative memorandum. We are excited about what the (near) future will bring in this domain, not only because of its conceptual unification but also because of its promise for informing educational policy and practice.

What follows is a brief description of the chapters containing the empirical studies that were conducted during the PhD. The chapters are presented as independent studies, some of which have been published in scientific journals. Although each of the studies was designed to tackle a separate research question (some of which have been discussed in the Open Issues section), there might be some overlap between studies in terms of the methodologies that were used or the way participants were recruited.

In Chapter 2, we address the open issue of how RPEs improve declarative memory. Here, we investigate the neural underpinnings of RPE-based declarative learning in an EEG study. More concretely, we put forward theta phase synchronization as a possible mechanism modulating this effect. Theta phase synchronization between (distant) brain regions is believed to facilitate communication and efficient learning (Fries, 2015) and has also been implicated in memory integration (Backus et al., 2016). To test this hypothesis, we measured EEG activity while participants associated Swahili words with one out of four word categories. Two of these categories were coupled with left-hand responses, whereas the remaining two categories were coupled with right-hand responses. We were particularly interested in how RPEs influenced connectivity between the frontal and motor cortex during declarative learning.

Chapter 3 continues along the path of our search into the underlying neural mechanism of RPE-based declarative learning. More specifically, in this chapter, we formally test the causal role of theta phase synchronization in RPE-based declarative learning, by applying theta (6 Hz) frequency transcranial alternating current stimulation (tACS) while participants performed the learning task.

In Chapter 4, we changed course a bit and focused on the interplay between RPE-based declarative learning and agency. Agency is defined as the perceived control over learning and the opportunity to make choices (Murty et al., 2015). Although several studies already focused on the role of agency in declarative learning, how agency interacts with RPE remains unknown. More concretely, we investigated whether the RPE must derive from the participant's own response, or whether instead any RPE is sufficient to obtain the (declarative) learning effect. To test this, we introduced trials on which participants made a choice themselves (agency condition) and trials on which the computer chose for the participant (non-agency condition).

Chapter 5 explored the open issue of RPE range. In particular, we investigated how extreme RPEs (i.e., infrequent, large, and negative RPEs) influence declarative memory. Previous versions of the variable-choice paradigm used a rather limited range of RPEs, with RPEs being skewed to the positive side. This might have biased our results toward finding a SRPE effect instead of a URPE effect. As such, this experiment was a first attempt at reconciling the URPE and SRPE effects that have both been documented in the literature.

In the last empirical chapter, Chapter 6, we probed the role of objective versus subjective RPEs in declarative learning. In our previous experiments, RPEs were calculated by subtracting the objective reward probability (i.e., one divided by the number of options) from the reward outcome (i.e., reward/no reward). In Chapter 6, we looked at the effect on declarative learning of subjective RPEs, estimated by subtracting the participant's certainty (i.e., ranging from very certain to very uncertain) from the reward outcome, using a paradigm in which participants were repeatedly tested.


References

Aberg, K. C., Müller, J., & Schwartz, S. (2017). Trial-by-trial modulation of associative memory formation by reward prediction error and reward anticipation as revealed by a biologically plausible computational model. Frontiers in Human Neuroscience, 11, 56. https://doi.org/10.3389/fnhum.2017.00056

Adcock, R. A., Thangavel, A., Whitfield-Gabrieli, S., Knutson, B., & Gabrieli, J. D. (2006). Reward-motivated learning: Mesolimbic activation precedes memory formation. Neuron, 50, 507–517. https://doi.org/10.1016/j.neuron.2006.03.036

Alberini, C. M., & LeDoux, J. E. (2013). Memory reconsolidation. Current Biology, 23(17), R746–R750. https://doi.org/10.1016/j.cub.2013.06.046

Alfei, J. M., Monti, R. I. F., Molina, V. A., Bueno, A. M., & Urcelay, G. P. (2015). Prediction error and trace dominance determine the fate of fear memories after post-training manipulations. Learning and Memory, 22(8), 385–400. https://doi.org/10.1101/lm.038513.115

Anderson, J. R. (2013). Language, memory, and thought. Psychology Press.

Arnold, K. M., & McDermott, K. B. (2013). Test-potentiated learning: Distinguishing between direct and indirect effects of tests. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39(3), 940–945. https://doi.org/10.1037/a0029199

Backus, A. R., Schoffelen, J.-M., Szebényi, S., Hanslmayr, S., & Doeller, C. F. (2016). Hippocampal-prefrontal theta oscillations support memory integration. Current Biology, 26(4), 450–457. https://doi.org/10.1016/j.cub.2015.12.048

Banich, M. T., & Floresco, S. (2019). Reward systems, cognition, and emotion: Introduction to the special issue. Cognitive, Affective, & Behavioral Neuroscience, 19(3), 409–414.


Current Biology, 27(20), R1110–R1112. https://doi.org/10.1016/j.cub.2017.08.048

Botvinick, M., Ritter, S., Wang, J. X., Kurth-Nelson, Z., Blundell, C., & Hassabis, D. (2019). Reinforcement learning, fast and slow. Trends in Cognitive Sciences, 23(5), 408–422.

Braun, E. K., Wimmer, G. E., & Shohamy, D. (2018). Retroactive and graded prioritization of memory by reward. Nature Communications, 9, 4886. https://doi.org/10.1038/s41467-018-07280-0

Bunzeck, N., Dayan, P., Dolan, R. J., & Duzel, E. (2010). A common mechanism for adaptive scaling of reward and novelty. Human Brain Mapping, 31(9), 1380–1394. https://doi.org/10.1002/hbm.20939

Butterfield, B., & Metcalfe, J. (2001). Errors committed with high confidence are hypercorrected. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27(6), 1491–1494. https://doi.org/10.1037/0278-7393.27.6.1491

Butterfield, B., & Metcalfe, J. (2006). The correction of errors committed with high confidence. Metacognition and Learning, 1(1), 69–84. https://doi.org/10.1007/s11409-006-6894-z

Calderon, C. B., De Loof, E., Ergo, K., Snoeck, A., Boehler, C. N., & Verguts, T. (2020). Signed reward prediction errors in the ventral striatum drive episodic memory. Journal of Neuroscience. https://doi.org/10.1101/2020.01.03.893578

Cavanagh, J. F., & Frank, M. J. (2014). Frontal theta as a mechanism for cognitive control. Trends in Cognitive Sciences, 18(8), 414–421. https://doi.org/10.1016/j.tics.2014.04.012

Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. The Behavioral and Brain Sciences, 36(3), 181–204. https://doi.org/10.1017/S0140525X12000477

Clouter, A., Shapiro, K. L., & Hanslmayr, S. (2017). Theta phase synchronization is the glue that binds human associative memory. Current Biology, 27(23), 3143–3148. https://doi.org/10.1016/j.cub.2017.09.001

Courville, A. C., Daw, N. D., & Touretzky, D. S. (2006). Bayesian theories of conditioning in a changing world. Trends in Cognitive Sciences, 10(7), 294–300. https://doi.org/10.1016/j.tics.2006.05.004

Davidow, J. Y., Foerde, K., Galvan, A., & Shohamy, D. (2016). An upside to reward sensitivity: The hippocampus supports enhanced reinforcement learning in adolescence. Neuron, 92(1), 93–99. https://doi.org/10.1016/j.neuron.2016.08.031

Dayan, P., Kakade, S., & Montague, P. R. (2000). Learning and selective attention. Nature Neuroscience, 3(Suppl.), 1218–1223. https://doi.org/10.1038/81504

De Loof, E., Ergo, K., Naert, L., Janssens, C., Talsma, D., Van Opstal, F., & Verguts, T. (2018). Signed reward prediction errors drive declarative learning. PLOS ONE, 13(1), e0189212. https://doi.org/10.1371/journal.pone.0189212

Den Ouden, H. E. M., Kok, P., & de Lange, F. P. (2012). How prediction errors shape perception, attention, and motivation. Frontiers in Psychology, 3, 548. https://doi.org/10.3389/fpsyg.2012.00548

Doll, B. B., Shohamy, D., & Daw, N. D. (2015). Multiple memory systems as substrates for multiple decision systems. Neurobiology of Learning and Memory, 117, 4–13. https://doi.org/10.1016/j.nlm.2014.04.014

Eichenbaum, H. (2004). Hippocampus: Cognitive processes and neural representations that underlie declarative memory. Neuron, 44(1), 109–120. https://doi.org/10.1016/j.neuron.2004.08.028

Elsey, J. W. B., Van Ast, V. A., & Kindt, M. (2018). Human memory reconsolidation: A guiding framework and critical review of the evidence. Psychological Bulletin, 144(8), 797–848. https://doi.org/10.1037/bul0000152

Enomoto, K., Matsumoto, N., Nakai, S., Satoh, T., Sato, T. K., Ueda, Y., Inokawa, H., Haruno, M., & Kimura, M. (2011). Dopamine neurons learn to encode the long-term value of multiple future rewards. Proceedings of the National Academy of Sciences, 108(37), 15462–15467.

Ergo, K., De Loof, E., Janssens, C., & Verguts, T. (2019). Oscillatory signatures of reward prediction errors in declarative learning. NeuroImage, 186, 137–145. https://doi.org/10.1016/j.neuroimage.2018.10.083

Eshel, N., Tian, J., Bukwich, M., & Uchida, N. (2016). Dopamine neurons share common response function for reward prediction error. Nature Neuroscience, 19(3), 479–486. https://doi.org/10.1038/nn.4239

Exton-McGuinness, M. T. J., Lee, J. L. C., & Reichelt, A. C. (2015). Updating memories—The role of prediction errors in memory reconsolidation. Behavioural Brain Research, 278, 375–384. https://doi.org/10.1016/j.bbr.2014.10.011

Fazio, L. K., & Marsh, E. J. (2009). Surprising feedback improves later memory. Psychonomic Bulletin and Review, 16(1), 88–92. https://doi.org/10.3758/PBR.16.1.88

Fernández, R. S., Boccia, M. M., & Pedreira, M. E. (2016). The fate of memory: Reconsolidation and the case of prediction error. Neuroscience and Biobehavioral Reviews, 68, 423–441.

Forcato, C., Burgos, V. L., Argibay, P. F., Molina, V. A., Pedreira, M. E., & Maldonado, H. (2007). Reconsolidation of declarative memory in humans. Learning & Memory, 14(4), 295–303. https://doi.org/10.1101/lm.486107

Fries, P. (2015). Rhythms for cognition: Communication through coherence. Neuron, 88(1), 220–235. https://doi.org/10.1016/j.neuron.2015.09.034

Friston, K. (2003). Learning and inference in the brain. Neural Networks, 16(9), 1325–1352. https://doi.org/10.1016/j.neunet.2003.06.005

Friston, K. (2010). The free-energy principle: A unified brain theory? Nature Reviews Neuroscience, 11(2), 127–138. https://doi.org/10.1038/nrn2787

Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., Colmenarejo, S. G., Grefenstette, E., Ramalho, T., Agapiou, J., Badia, A. P., Hermann, K. M., Zwols, Y., Ostrovski, G., Cain, A., King, H., Summerfield, C., & Blunsom, P. (2016). Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626), 471–476. https://doi.org/10.1038/nature20101

Gruber, M. J., Ritchey, M., Wang, S. F., Doss, M. K., & Ranganath, C. (2016). Post-learning hippocampal dynamics promote preferential retention of rewarding events. Neuron, 89(5), 1110–1120. https://doi.org/10.1016/j.neuron.2016.01.017

HajiHosseini, A., Rodríguez-Fornells, A., & Marco-Pallarés, J. (2012). The role of beta-gamma oscillations in unexpected rewards processing. NeuroImage, 60(3), 1678–1685. https://doi.org/10.1016/j.neuroimage.2012.01.125

Jang, A. I., Nassar, M. R., Dillon, D. G., & Frank, M. J. (2019). Positive reward prediction errors during decision-making strengthen memory encoding. Nature Human Behaviour, 3(7), 719–732. https://doi.org/10.1038/s41562-019-0597-3

Kamin, L. J. (1969). Predictability, surprise, attention, and conditioning. In B. A. Campbell & R. M. Church (Eds.), Punishment and aversive behavior (pp. 279–296). Appleton-Century-Crofts.

Kamiński, J., Mamelak, A. N., Birch, K., Mosher, C. P., Tagliati, M., & Rutishauser, U. (2018). Novelty-sensitive dopaminergic neurons in the human substantia nigra predict success of declarative memory formation. Current Biology, 28(9), 1333–1343.e4. https://doi.org/10.1016/j.cub.2018.03.024

Kempadoo, K. A., Mosharov, E. V., Choi, S. J., Sulzer, D., & Kandel, E. R. (2016). Dopamine release from the locus coeruleus to the dorsal hippocampus promotes spatial learning and memory. Proceedings of the National Academy of Sciences of the United States of America, 113(51), 14835–14840. https://doi.org/10.1073/pnas.1616515114

Kleberg, F. I., Kitajo, K., Kawasaki, M., & Yamaguchi, Y. (2014). Ongoing theta oscillations predict encoding of subjective memory type. Neuroscience Research, 83, 69–80. https://doi.org/10.1016/j.neures.2014.02.010

Knowlton, B. J., & Foerde, K. (2008). Neural representations of nondeclarative memories. Current Directions in Psychological Science, 17(2), 107–111.
