
Chapter 7

General Discussion

The overarching goal of this doctoral dissertation was to provide further empirical support for the role of reward prediction errors (RPEs) in declarative learning. Whereas the importance of RPEs has been well established in non-declarative learning (O’Doherty et al., 2004; Steinberg et al., 2013), very few studies have paid attention to their role in declarative learning. Indeed, in reviewing the literature it became clear that RPEs and declarative learning have primarily been studied independently of each other. Although declarative and non-declarative learning rely on different neural circuits (Cohen & Squire, 1980; Poldrack & Foerde, 2008; Willingham, 1997), operate on different timescales, and are supported by distinct neural mechanisms (Squire, 2004), we hypothesized that an important similarity between them is that both depend on RPEs. The predictive Reinforcement Learning (RL) framework (Friston, 2010; Silvetti et al., 2018) provides a theoretical scaffolding for this hypothesis. According to this framework, RPEs function as teaching signals that update future predictions, optimize our predictive model of the world, and thereby guide learning. In brief, by building upon formal models of learning and applying computational principles to memory research, we tried to bridge this gap in the literature.
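The teaching-signal idea can be illustrated with the classic delta rule from RL, a generic textbook formulation (cf. Sutton & Barto, 2018) rather than the specific computational model used in this dissertation: the prediction is nudged toward the observed outcome in proportion to the RPE.

```python
def delta_rule_update(value, reward, learning_rate=0.1):
    """One delta-rule (Rescorla-Wagner-style) learning step.

    The reward prediction error (RPE) is the difference between the
    obtained reward and the current prediction; as a teaching signal,
    it shifts the prediction toward the observed outcome.
    """
    rpe = reward - value  # signed RPE: positive = better than expected
    return value + learning_rate * rpe, rpe

# An initially unexpected reward produces a large RPE; with repeated
# rewards the prediction converges and the RPE shrinks toward zero.
value = 0.0
for trial in range(50):
    value, rpe = delta_rule_update(value, reward=1.0)
```

As the prediction approaches the true reward rate, the RPE approaches zero and learning slows down, which is exactly the "optimized predictive model" behavior the framework describes.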

Using previous studies from the lab (De Loof et al., 2018; Ergo et al., 2019) as a starting point, we examined the validity, robustness, and generality of their findings. In the first two empirical chapters (Chapter 2 and Chapter 3), we studied how RPEs influence declarative learning. More specifically, we used electroencephalography (EEG) and transcranial Alternating Current Stimulation (tACS) to test, correlationally and causally, the role of theta-frequency oscillations in RPE-driven declarative learning. In the remaining chapters (Chapter 4, Chapter 5, and Chapter 6), we presented behavioral studies addressing some of the open issues discussed in Chapter 1.

Below, we summarize the findings of the empirical studies that were conducted, followed by a discussion of their implications and limitations. We end the general discussion with a paragraph dedicated to future research and how the work presented in this doctoral dissertation might help shape future directions.

Summary of the Main Findings of the Doctoral Research Investigating How RPEs Facilitate Declarative Learning

Do RPEs Elicit Changes in Brain Connectivity While Learning?

In Chapter 2, we examined how RPEs facilitate declarative learning on the neural level.

Here, we proposed that theta phase (4-8 Hz) synchronization modulates the RPE effect in declarative learning. According to the literature, brain regions that are synchronized in the theta phase are thought to communicate and learn more efficiently (Fell & Axmacher, 2011; Fries, 2015), thus facilitating memory integration (Backus et al., 2016) by gluing individual episodes together into one coherent memory (Clouter et al., 2017). One assumption is that RPEs, originating from the ventral tegmental area (VTA) and substantia nigra (SN), are projected via dopaminergic neuromodulatory signaling to the medial frontal cortex (MFC) (Nieuwenhuis et al., 2004), where they induce a theta phase reset followed by an increase in theta phase synchronization (Canavier, 2015). When neurons are synchronized, spike-timing-dependent plasticity is facilitated, which in turn enables long-term potentiation (LTP) to occur and synapses to be strengthened. It is this change in synaptic strength that lies at the basis of memory formation (Caporale & Dan, 2008). To test this hypothesis, we measured oscillatory power, phase synchronization, and phase connectivity (between the frontal and motor cortex) while participants associated 60 Swahili words with 4 word categories.

We found that oscillatory power and phase synchronization scaled with RPEs during performance feedback processing. More precisely, at the start of performance feedback processing, oscillatory delta power scaled with the unsigned RPE (URPE), followed by oscillatory power in the theta, delta, and high-alpha frequency bands scaling with the signed RPE (SRPE). We also observed increased delta phase synchronization predicted by URPE. Crucially, and in line with our hypothesis, increased theta phase synchronization between the frontal and right motor cortex during performance feedback processing was predicted by URPE. The latter result thus provides preliminary evidence that RPEs modulate theta phase synchronization between brain regions relevant to the task at hand. However, one major drawback of our methodology is that the phase synchronization findings might be biased by the low number of trials in our experiment (Vinck et al., 2010). As such, more work is needed to verify the reliability of this effect, and follow-up studies should focus on increasing statistical power.

Testing the Causal Role of Theta Phase Synchronization

In the next empirical chapter, Chapter 3, we continued our investigation of the mechanism underlying the effect of RPEs on declarative learning. Here, we presented a neurostimulation study in which we formally tested the causal role of theta phase synchronization in RPE-based declarative learning. Prior work has consistently shown a role of theta frequency in both (R)PE processing (Cavanagh et al., 2009, 2010, 2012; Mas-Herrero & Marco-Pallarés, 2014) and memory (Sederberg et al., 2003). To this end, we applied a 6 Hz alternating current to the MFC while participants learned Dutch-Swahili word pairs coupled with RPEs of different signs and sizes. We hypothesized that: (1) if declarative learning is modulated by theta oscillations in the MFC, then recognition accuracy and certainty should be modulated by tACS; more concretely, the real stimulation group should show higher recognition accuracies and certainty ratings than the sham stimulation group; and (2) if theta oscillations are driven by RPEs, then tACS and RPE should interact. While we again replicated our behavioral result of SRPE-driven declarative learning, tACS failed to modulate the SRPE effect in this particular task and did not alter memory performance. Based on these findings, we cannot provide empirical support for the mechanism of theta phase synchronization in our declarative learning task. However, as we did not concurrently measure brain signals with EEG, it remains unclear whether our stimulation protocol influenced the underlying brain activity in the first place. To rule out this potential flaw, future studies should tailor neurostimulation parameters to each participant individually while measuring ongoing brain activity, or use a subject-specific head model in which current flows can be estimated with, for example, the recently developed Stochastic HeAd Modelling (shamo) tool (Grignard, 2020).

Addressing a Few of the Remaining Open Issues

In the remaining chapters of the doctoral dissertation, we tried to address some of the open issues raised in Chapter 1. Given that the crossover between RL principles (such as the RPE) and declarative learning is a relatively young research field, much remains to be explored.

Examining the Importance of Making Our Own Mistakes

The next empirical chapter, Chapter 4, investigated the role of agency in declarative learning. Here, we asked whether an RPE must stem from the participants' own choices or whether any RPE arising from the participant's environment suffices to induce the RPE-based learning effect. According to RL theory, participants can use RPEs to learn about their own actions (as per convention) or about states (i.e., their environment) (Mattar & Daw, 2018; Sutton & Barto, 2018). To test this, participants learned Dutch-Swahili word pairs while RPEs were generated. As a crucial manipulation, our paradigm included an agency condition (where participants made their own choice) and a non-agency condition (where the computer chose for the participant). We expected an interaction between RPE and agency if RPEs allow learning only about one's own actions; conversely, there should be no interaction if RPEs also aid in learning about states. We replicated the main effect of SRPE:

Recognition accuracy increased for large, positive RPEs. Importantly, the interaction between RPE and agency was absent, consistent with the idea that RPEs also support learning about states. Thus, when RPEs stem from the participant's environment, active engagement of the participant might be less important than previously thought.

Expanding the Range of RPE

Chapter 5 addressed the open issue of the limited range of RPEs. In particular, in previous versions of our experimental design, the maximal RPE was .75, whereas the minimal RPE was -.50 (hence, smaller in absolute value). One repercussion of this design is that the range of RPEs was skewed to the positive side, which might have biased our previous results toward an SRPE effect, as opposed to the URPE effect that has been found in studies using other experimental paradigms. To address this potential flaw, we increased the range of negative RPEs by including a few extreme RPE trials (i.e., highly infrequent, large, negative RPEs). By expanding the negative range of RPEs, the effects of both large negative and large positive RPEs on declarative learning could be explored within the same paradigm. In line with the SRPE hypothesis, extreme RPE trials were accompanied by the lowest recognition accuracies, even lower than on trials where no RPE was experienced. Importantly, we ruled out the possibility that participants discounted the extreme RPE trials over the course of the experiment. As such, these findings lend further support to the SRPE account.
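As a concrete illustration of where the .75 and -.50 bounds come from: objective RPEs in these designs were computed as the reward outcome minus one divided by the number of response options, and the bounds are consistent with designs offering between two and four options (an inference for illustration, not a description of the exact trial structure).

```python
def objective_rpe(rewarded, n_options):
    """Objective RPE: reward outcome (1 or 0) minus the objective
    reward probability (one divided by the number of options)."""
    return (1.0 if rewarded else 0.0) - 1.0 / n_options

# A rewarded guess among 4 options gives the largest positive RPE,
# while an unrewarded guess among 2 options gives the largest
# negative RPE; the range is therefore skewed to the positive side.
rpe_max = objective_rpe(True, 4)   # +0.75
rpe_min = objective_rpe(False, 2)  # -0.50
```

Because |rpe_min| < rpe_max, large negative surprises are structurally under-represented, which is precisely the asymmetry the extreme-RPE trials in Chapter 5 were designed to compensate for.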

Towards a New Paradigm

In the final empirical chapter, Chapter 6, we looked into the effect of subjective (or internal) RPEs on declarative learning. Subjective RPEs were estimated by subtracting the subjective reward probability (i.e., the participant's choice certainty) from the reward outcome (i.e., reward/no reward). In our previous studies (Calderon et al., 2020; De Loof et al., 2018; Ergo et al., 2019, 2020), objective (or external) RPEs were computed through a fairly artificial manipulation: RPEs were calculated by subtracting the objective reward probability (i.e., one divided by the number of options) from the reward outcome (i.e., reward/no reward). In this experiment, we tried to create a more realistic learning environment in which RPEs are based on the participant's choice certainty.
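The subjective computation can be sketched as follows (a minimal illustration; the 0-1 certainty scale and the example values are assumptions for exposition, not the experiment's actual response scale):

```python
def subjective_rpe(rewarded, choice_certainty):
    """Subjective RPE: reward outcome (1 or 0) minus the participant's
    subjective reward probability, i.e., their choice certainty (0-1)."""
    return (1.0 if rewarded else 0.0) - choice_certainty

# A reward after a highly confident choice is barely surprising...
small_rpe = subjective_rpe(True, 0.9)    # ~ +0.10
# ...whereas the same reward after a low-certainty guess yields a
# large positive subjective RPE...
large_rpe = subjective_rpe(True, 0.25)   # +0.75
# ...and an unrewarded confident choice yields a large negative one.
neg_rpe = subjective_rpe(False, 0.9)     # ~ -0.90
```

The key difference from the objective computation is that the expectation term now varies trial by trial with the participant's internal state rather than being fixed by the number of options, which is what makes the resulting learning environment more realistic.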