Does long-term memory affect refreshing in verbal working memory?

Does Semantic Long-Term Memory Impact Refreshing in Verbal Working Memory?

Valérie Camos

Université de Fribourg

Gérôme Mora

Université de Bourgogne Franche-Comté

Anne-Laure Oftinger, Stéphanie Mariz Elsig, and Philippe Schneider

Université de Fribourg

Evie Vergauwe

Université de Genève

Attentional refreshing allows the maintenance of information in working memory and has received growing interest in recent years. However, it is still ill-defined and several proposals have been put forward to account for its functioning. Among them, some proposals suggest that refreshing relies on the retrieval of knowledge from semantic long-term memory. To test such a proposal, the present study examined the impact on refreshing of two effects known to affect retrieval from semantic long-term memory: word frequency and lexicality. In working memory span tasks, participants had to maintain memoranda varying in either frequency or lexicality while performing concurrent tasks. By examining recall performance in complex span tasks and response times for the concurrent task in Brown-Peterson tasks, the present study provided evidence that long-term memory effects (a) affected recall without interacting with manipulations of refreshing and (b) did not affect refreshing speed. These findings challenge the idea that refreshing acts through the retrieval of knowledge from semantic long-term memory. Different WM models are discussed to account for these findings.

Keywords: working memory, long-term memory, refreshing, frequency, lexicality

Supplemental materials: http://dx.doi.org/10.1037/xlm0000657.supp

Working memory (WM) is a system in charge of the maintenance and processing of information. It is often conceived as the hub of cognition, and measures of WM capacity are the best predictors of achievement in many high-level cognitive activities (see Conway, Jarrold, Kane, Miyake, & Towse, 2007, for a review). To account for the maintenance of information in WM, one process has grown in importance in recent years. This mechanism, named attentional refreshing, was first introduced by Johnson (1992), and is now included in several prominent WM models (Baddeley, 2012; Barrouillet, Bernardin, & Camos, 2004; Cowan, 1995). Although the amount of research studying or simply evoking this mechanism is increasing, we are still far from fully understanding its functioning. Among the different proposals that describe refreshing, some have suggested that it relies on retrieval of information from semantic long-term memory (LTM) to reconstruct or enrich WM traces. Thus, the aim of the present study was to examine the impact of factors known to impact retrieval from semantic LTM on refreshing.

Maintenance in Verbal WM

Recent studies have dissociated two mechanisms involved in the maintenance of verbal information in WM: articulatory rehearsal and attentional refreshing (Camos, 2015, 2017; Camos & Barrouillet, 2014; Camos, Lagner, & Barrouillet, 2009; Camos, Mora, & Barrouillet, 2013; Camos, Mora, & Oberauer, 2011; Hudjetz & Oberauer, 2007; Mora & Camos, 2013, 2015; Rose, Craik, & Buchsbaum, 2015; Vergauwe, Camos, & Barrouillet, 2014). On the one hand, articulatory rehearsal relies on processes akin to those involved in language production and is domain-specific, maintaining verbal information in a phonological form (although see Lewandowsky & Oberauer, 2015, for a contradictory view). On the other hand, refreshing is an attention-based mechanism that allows the maintenance of verbal as well as visuospatial information (Souza, Rerko, & Oberauer, 2015; Vergauwe, Barrouillet, & Camos, 2009, 2010; Vergauwe et al., 2014).

This article was published Online First October 8, 2018.

Valérie Camos, Department of Psychology, Université de Fribourg; Gérôme Mora, Department of Psychology, Université de Bourgogne Franche-Comté; Anne-Laure Oftinger, Stéphanie Mariz Elsig, and Philippe Schneider, Department of Psychology, Université de Fribourg; Evie Vergauwe, Department of Psychology, Université de Genève.

Experiments 1 and 2 are part of Gérôme Mora’s PhD thesis, funded by a grant from the French Ministère de l’Enseignement Supérieur et de la Recherche and awarded by Valérie Camos. This work was also supported by the Institut Universitaire de France (grant to Valérie Camos) and the Swiss National Science Foundation (grant 100019_175960 to Valérie Camos and PZ00P1_154911 to Evie Vergauwe).

Correspondence concerning this article should be addressed to Valérie Camos, Department of Psychology, University of Fribourg, rue de Faucigny 2, 1700 Fribourg, Switzerland. E-mail: [email protected]

This document is copyrighted by the American Psychological Association or one of its allied publishers. This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

Learning, Memory, and Cognition

0278-7393/19/$12.00 http://dx.doi.org/10.1037/xlm0000657

Although the two mechanisms can be jointly used to maintain verbal information (Camos et al., 2009, 2011), each of them is subject to different constraints. Because rehearsal relies on language production processes, it can be impeded by concurrent articulation (Baddeley, 1986; Baddeley, Thomson, & Buchanan, 1975). Refreshing is constrained by the availability of attention, and any concurrent task that distracts attention away from maintenance reduces its use (e.g., Barrouillet, Portrat, & Camos, 2011; see Barrouillet & Camos, 2015, for a review, and Oberauer, Lewandowsky, Farrell, Jarrold, & Greaves, 2012, for an alternative proposal). When these two kinds of constraints are orthogonally manipulated in complex span tasks, both reduce recall performance and often have additive effects, suggesting independence between the two maintenance mechanisms (see Camos, 2015, 2017, for reviews). Bringing further support for the independence of rehearsal and refreshing, Camos et al. (2013) and Mora and Camos (2013) have shown that phonological effects, such as the phonological similarity effect and the word length effect, are dependent on the use of rehearsal but not of refreshing. To summarize, there is substantial evidence that rehearsal and refreshing are two distinct mechanisms involved in verbal maintenance in WM.

In the same way as the identification of factors that interact with the use of articulatory rehearsal (e.g., the phonological characteristics of the memory items) sheds light on the nature of this mechanism, identifying factors that interact with the use of refreshing should contribute to a better understanding of this process. Indeed, though researchers agree that refreshing requires attention, no unified view of refreshing has yet emerged, and several descriptions of its functioning can be found in the literature (see Camos et al., 2018, for a review). Refreshing can be conceived as briefly thinking of one or more previously activated representations by bringing them into the focus of attention, resulting in augmented or extended activity (Cowan, 1995, 1999; Johnson, 1992; Johnson, Reeder, Raye, & Mitchell, 2002; Raye, Johnson, Mitchell, Greene, & Johnson, 2007; Vergauwe & Langerock, 2017). It was more recently described as a rapid scanning through items represented in WM (Vergauwe & Cowan, 2015). Refreshing has also been described as a covert retrieval from secondary memory (Loaiza & McCabe, 2012, 2013; McCabe, 2008), as the act of bringing back into primary memory items that were displaced in secondary memory during the execution of a concurrent task. Alternatively, refreshing has been described as the reconstruction of WM representations (Barrouillet & Camos, 2015), in which semantic knowledge is used to reconstruct degraded WM representations. It has also been mentioned that refreshing might be akin to elaborative rehearsal (e.g., Hudjetz & Oberauer, 2007; Loaiza & McCabe, 2012), that is, the enrichment of WM representations based on LTM knowledge (Craik & Lockhart, 1972; Greene, 1987).

Among this diversity of conceptions, some commonalities appear and we can broadly segregate these views into two distinct streams, depending on the role they attribute to semantic LTM in refreshing of WM traces. On the one hand, refreshing is conceived as a boost in activation of traces either maintained in the focus of attention or in a heightened accessibility state with no special role for semantic LTM knowledge (e.g., Cowan, 1995; Johnson, 1992; Vergauwe & Cowan, 2015). On the other hand, refreshing is proposed to be directly dependent on the retrieval of semantic LTM knowledge (e.g., Barrouillet & Camos, 2015). The conception of refreshing as a covert retrieval from secondary memory (Loaiza & McCabe, 2012, 2013; McCabe, 2008) falls in between these two streams.¹ To summarize, the impact of semantic LTM knowledge is a determinant criterion that differentiates models of WM in terms of their conception of refreshing.

Some studies have examined the role of semantic effects, which are traditionally investigated in the LTM literature, in WM span tasks (see next section), strengthening the idea that semantic LTM affects verbal WM (Martin, 2005; Shivde & Anderson, 2011). However, it remains to be demonstrated whether refreshing in particular is sensitive to semantic LTM effects. The aim of the present study was then to examine the impact of effects known to affect retrieval from semantic LTM on refreshing. Doing so will shed light on the relationship between LTM and this particular WM maintenance mechanism, and more importantly, it could help in deciding between the two main streams of conceptions of refreshing.

Semantic LTM Influences on Verbal WM

For ease of communication, “LTM effect” refers here to the effect of any factor that impacts the retrieval of information from semantic LTM. For example, the frequency effect is an LTM effect wherein high-frequency words are retrieved better and faster from LTM than low-frequency words (e.g., Morton, 1979). The impact of several LTM effects, such as the effects of frequency, lexicality, meaningfulness, or concreteness, has been examined in WM recall. Generally, effects that favor retrieval from LTM lead to improvement in WM recall performance. For example, Engle, Nations, and Cantor (1990) manipulated the frequency of to-be-remembered words in an operation span task in which participants verified arithmetic operations after reading each to-be-remembered word aloud. The same word lists were also presented in a simple span task requiring their serial order recall. In both span tasks, high-frequency words were better recalled than low-frequency words (see also Roodenrys, Hulme, Alban, Ellis, & Brown, 1994; Tehan & Humphreys, 1988; Watkins, 1977), although this difference was less pronounced in the operation span task, leading to an interaction between frequency and type of span tasks. In a similar operation span task, Loaiza, Duperreault, Rhodes, and McCabe (2015a) manipulated the lexicality of memoranda (words vs. nonwords). The words were all high-frequency nouns and changing one letter of each of them created pronounceable nonwords. Mirroring Engle et al.’s (1990) findings, words were better recalled than nonwords in both span tasks (see also Besner & Davelaar, 1982; Crowder, 1976), although the lexicality effect was reduced in the operation span task compared with the simple span task. In a subsequent study, Loaiza, Rhodes, and Anglin (2015b) manipulated the meaningfulness of memoranda in an operation span task, including words that young adults knew or did not know because they were outdated words. The known words were better recalled than the unknown words.
Another finding supporting the participation of semantic LTM in verbal WM is the concreteness effect. Concrete or imageable words (like pencil) are better recalled than abstract or nonimageable ones (like peace; Acheson, Postle, & MacDonald, 2010; Allen & Hulme, 2006; Bourassa & Besner, 1994; Romani, McAlpine, & Martin, 2008; Walker & Hulme, 1999). Although the impact of LTM effects on WM recall performance has been frequently reported, a straightforward explanation of this phenomenon is still lacking. Moreover, concerning the particular aim of this study and the possible role of LTM effects on refreshing, direct tests are still missing, and, as we will see in what follows, the few studies that provide indirect evidence concerning the role of semantic LTM in refreshing are actually contradictory.

¹ As in the former, refreshing reactivates memory traces by bringing them back in primary memory in which they benefit from a heightened state of accessibility. However, because working memory (WM) is conceived as being embedded within the larger context of long-term memory (LTM), prior knowledge stored in semantic LTM would facilitate refreshing as in the latter stream of models.

Examining the impact of the concreteness effect in an immediate serial recall task, Campoy, Castellà, Provencio, Hitch, and Baddeley (2015) varied the availability of attentional resources by asking participants to concurrently perform either a simple tapping or a highly demanding random tapping task. Although the introduction of a higher concurrent attentional demand reduced recall performance, it did not moderate the concreteness effect. This absence of interaction between the concreteness effect and variation in concurrent attentional demand was replicated in another experiment using different concurrent tasks. This absence of interaction could suggest that refreshing, which depends on the availability of attention, does not rely on semantic LTM. However, another study suggests that LTM knowledge does influence refreshing. Ricker and Cowan (2010) reported that blocking refreshing has an increasingly negative impact across longer retention intervals for English letters, but not for unfamiliar visual stimuli. Similarly, using a paradigm that assesses refreshing speed (see below for a description of this paradigm), Vergauwe et al. (2014) showed that maintenance of letters or words resulted in a postponement of an attentionally demanding task proportionate to the memory load, while maintaining unknown fonts did not. Such results indicate that memoranda that are not represented in LTM are less likely to be refreshed in WM. Finally, if one agrees that complex span tasks provide more refreshing opportunities than simple span tasks (see McCabe, 2008), comparing LTM effects between these two types of tasks might indicate whether LTM is implicated in refreshing. Both Engle et al. (1990) and Loaiza et al. (2015a) found that LTM effects were enhanced in the simple span task compared with the operation span task, which suggests that WM recall benefits from LTM knowledge when refreshing opportunities are reduced.
To summarize, there are currently no studies that directly address the question of the relationships between refreshing and semantic LTM, and the few relevant studies resulted in divergent findings that do not allow a straightforward conclusion on the issue. The present study directly tested how the efficiency of refreshing is moderated by retrieval from semantic LTM.

The Present Study

The present study reports two series of two experiments. In both series, we examined how LTM effects affect refreshing in WM. More specifically, and according to the conception that refreshing relies on some retrieval from semantic LTM, we expected that reducing refreshing opportunities should have a stronger detrimental impact on recall for memoranda that are easy (vs. difficult) to retrieve from LTM. It may seem counterintuitive that the more easy-to-retrieve items would suffer more from a reduction of refreshing. However, by reducing refreshing opportunities, the difference between easy- and difficult-to-retrieve items will diminish, and could even vanish in the absence of refreshing opportunity. Alternatively, models that put forward that semantic LTM has no specific role in refreshing expect similar refreshing efficiency for easy- and difficult-to-retrieve items.

In the first series (Experiments 1 and 2), we examined whether refreshing efficiency is moderated by LTM effects in recall performance. Across two experiments using complex span tasks, we examined two different LTM effects by manipulating the frequency (Experiment 1) and the lexicality (Experiment 2) of the memory items. We already described these effects known to affect LTM retrieval. There are remarkably few studies in the literature that have investigated them in complex span tasks (but see Engle et al., 1990; Loaiza et al., 2015a). Though these experiments observed that material known for being easier to retrieve from LTM (i.e., high-frequency words and words) was also better recalled in a WM recall test, none of these experiments examined whether these LTM effects moderate refreshing efficiency. Thus, to examine to what extent the effect of frequency or lexicality impacts refreshing, we implemented two different methods to disrupt refreshing in complex span tasks. According to Barrouillet, Bernardin, Portrat, Vergauwe, and Camos (2007), increasing the pace at which the distracting items have to be processed in the concurrent task reduces the availability of attention for active maintenance. In other words, it impairs refreshing. An even stronger manipulation of refreshing is to introduce or not a concurrent task in complex span tasks. While attention could be fully dedicated to the active maintenance of memory items when there is no concurrent task, adding the requirement to carry out a concurrent task will distract attention away from item maintenance. As already described, these manipulations are specific to refreshing rather than rehearsal, the two mechanisms being independent from each other (Camos et al., 2009, 2011, 2013). The first manipulation (increasing the pace of the concurrent task) was implemented in Experiment 1 and the second (introducing a concurrent task) in Experiment 2.
In both experiments, we expected that our manipulations to reduce refreshing opportunities (by introducing a concurrent task, or increasing its pace) would replicate well-known effects and would, thus, lead to a reduction of recall performance (e.g., Barrouillet et al., 2004, 2007, 2011). We also expected to replicate the established effects demonstrating that (a) high-frequency words are better recalled than low-frequency words, and (b) words are better recalled than nonwords. More importantly for the purpose of this study, we examined the interaction between LTM effects and the disruption of refreshing. For both experiments, according to the conception that refreshing relies on some retrieval from semantic LTM, we expected that disrupting refreshing by reducing the time during which attention is available for maintenance should have a stronger detrimental effect on recall performance for the most easily retrieved memory items (i.e., high-frequency words in Experiment 1 and words in Experiment 2). Alternatively, if semantic LTM has no impact on the functioning of refreshing, the same disruptive impact on recall performance should be observed for easy- and difficult-to-retrieve items.
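The two competing predictions amount to a difference of differences on the cell means of a 2 × 2 design. The following is a minimal sketch of that contrast in Python. The cell means are invented purely for illustration and are not data from the present study, and the function name `interaction_contrast` is hypothetical.

```python
def interaction_contrast(cells):
    """Difference of differences for a 2 x 2 design.

    `cells` maps (item_type, refreshing_opportunity) -> mean recall (%).
    Returns the cost of disrupting refreshing for easy-to-retrieve items
    minus the same cost for difficult-to-retrieve items.
    """
    cost_easy = cells[("easy", "high")] - cells[("easy", "low")]
    cost_hard = cells[("hard", "high")] - cells[("hard", "low")]
    return cost_easy - cost_hard


# Hypothetical pattern if semantic LTM plays no role in refreshing:
# both main effects are present, but the disruption cost is the same.
additive = {
    ("easy", "high"): 80, ("easy", "low"): 70,
    ("hard", "high"): 65, ("hard", "low"): 55,
}

# Hypothetical pattern if refreshing relies on retrieval from semantic LTM:
# easy-to-retrieve items lose more when refreshing is disrupted.
interactive = {
    ("easy", "high"): 80, ("easy", "low"): 62,
    ("hard", "high"): 65, ("hard", "low"): 57,
}

print(interaction_contrast(additive))     # 0
print(interaction_contrast(interactive))  # 10
```

A contrast near zero corresponds to additive effects, whereas a positive contrast corresponds to the interaction predicted by accounts in which refreshing draws on semantic LTM.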

The second series of experiments used a more fine-grained measure of refreshing efficiency, measuring its speed of reactivating memory items. If refreshing relies on the retrieval from semantic LTM, then refreshing speed should be faster for items that are easy to retrieve from LTM. Experiments 3 and 4 implemented a paradigm created by Vergauwe et al. (2014), which is a variant of the Brown-Peterson paradigm that estimates the speed of refreshing by manipulating the number of items that need to be maintained. Participants maintain a series of memory items while performing at their own pace a concurrent task during the 12-s retention delay introduced before recall. While they have to correctly perform the concurrent task, they are instructed to prioritize maintenance. Consequently, participants are not constrained on the amount or speed at which they have to perform the concurrent task, contrary to the previous experiments. Only trials with perfect recall performance are examined. It is assumed in this paradigm that to achieve perfect recall, a participant would have to entirely dedicate attention to the maintenance of memoranda, which should postpone the execution of the concurrent task. Examining the variations of response time for the concurrent task as a function of the number of memory items gives an evaluation of the speed of attentional refreshing. Besides being, to our knowledge, the first method to assess refreshing speed, this paradigm has also provided direct evidence for the involvement of attention in the maintenance of memory items. Indeed, when articulatory rehearsal was possible, participants were able to maintain some verbal items (four letters, but fewer words) without any postponement of the concurrent task, such a postponement appearing when concurrent articulation (like repeating ba-bi-boo) was performed. This result supports the idea that attention is involved in attentional refreshing but not in rehearsal. Experiments 3 and 4 implemented this paradigm in which either the frequency or the lexicality of the memory items was manipulated, respectively.
We expected to replicate the effects of frequency and lexicality on recall performance, and also the postponement of the processing activity in such a way that response times in the intervening task linearly increase with the number of items to be remembered. More important, if refreshing relies on retrieval from semantic LTM, we predicted that such an increase should be reduced for high-frequency words, relative to low-frequency words, and for words relative to nonwords, reflecting a faster speed for the information that is easier to retrieve from LTM. Conversely, if retrieval from semantic LTM does not play a specific role in refreshing, the same refreshing speed is expected for both categories of items.²
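The logic of estimating refreshing speed from the increase in response times with memory load can be sketched as a simple least-squares slope. The Python sketch below assumes mean response times per load are already available for perfectly recalled trials. The response-time values are invented for illustration only, and `refreshing_slope` is a hypothetical helper, not the analysis code used by the authors.

```python
def refreshing_slope(loads, mean_rts):
    """Ordinary least-squares slope of response time (ms) on memory load.

    The slope is read as the postponement of the concurrent task per
    additional memory item (ms per item).
    """
    n = len(loads)
    mean_x = sum(loads) / n
    mean_y = sum(mean_rts) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(loads, mean_rts))
    sxx = sum((x - mean_x) ** 2 for x in loads)
    return sxy / sxx


loads = [1, 2, 3, 4]              # number of memory items (hypothetical)
rt_easy = [500, 530, 560, 590]    # invented mean RTs (ms), easy-to-retrieve items
rt_hard = [500, 550, 600, 650]    # invented mean RTs (ms), difficult-to-retrieve items

print(refreshing_slope(loads, rt_easy))  # 30.0
print(refreshing_slope(loads, rt_hard))  # 50.0
```

Under the retrieval-based account, the slope should be shallower for easy-to-retrieve items, as in this invented example. Under the alternative account, the two slopes should be equivalent.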

Experiment 1

In Experiment 1, participants maintained low- and high-frequency words while performing a location judgment task on series of squares sequentially presented on the screen in a complex span paradigm. This location judgment task is a visuospatial task and should induce minimal representation-based interference with the material to be maintained. Moreover, this task is also known for impacting the refreshing of memory items (e.g., Camos et al., 2011, 2013). To vary the opportunities of active maintenance via refreshing, the distracting items to be judged after each word were presented at either a slow or fast pace: the fast pace reduced the time during which attention was available for refreshing. We expected to replicate the finding that high-frequency words are better recalled than low-frequency words, and that increasing the pace of the location judgment task leads to poorer recall performance. Moreover, if refreshing relies on retrieval from LTM, the pace effect should be stronger for the high-frequency than the low-frequency words.

Method

Participants. Thirty French native speakers (M_age = 20.50, range = 17–23) who were undergraduate students at the Université de Bourgogne (first 20 participants) or Université de Fribourg (10 additional participants) received course credit to participate. The ethics committees of the Université de Bourgogne (Experiments 1 and 2) and the Université de Fribourg (Experiments 1–4) approved the ethics applications for the study.

Materials. Monosyllabic words were chosen from the French database Lexique3 (New, Pallier, Ferrand, & Matos, 2001) to create 40 lists of six words. Twenty lists included high-frequency words and 20 lists low-frequency words. High-frequency words had a frequency greater than 70 per million (M = 230, SD = 209), whereas low-frequency words had a frequency less than 1 per million (M = 0.48, SD = 0.26). For each frequency, the lists were divided into two sets of 10 lists, each set being allocated to one pace. The association between sets and paces was counterbalanced across participants.

A pilot study of 22 students who did not participate in the current study verified that the low-frequency words were known and not considered as nonwords. Moreover, to verify that our material induced a frequency effect, 16 other students performed a simple span task with these lists. Words were successively presented on the center of the screen for 1,000 ms each (see Figure 1). After the last word, “Rappel” (Recall) prompted participants to recall the words aloud in their order of presentation. If a word was forgotten, participants were instructed to say “je ne sais pas” (I do not know) for the serial position of this word. The lists were presented in a random order, and the corresponding words were also in a different random order for each participant. As expected, high-frequency words (63%, SD = 11) were better recalled than low-frequency words (38%, SD = 10), BF₁₀ = 6.9 × 10⁵ (Bayesian two-sided t test).

Procedure. These pretested memory lists were introduced in a complex span task in which participants had to maintain words while judging the location of squares. Each series began with a ready signal centered on screen for 500 ms followed by the first word (see Figure 1). The words were presented in red font for 1,000 ms. After a delay of 500 ms, six squares successively appeared on screen either for 500 ms followed by a 250-ms delay for the fast pace (i.e., 1 processing item every 750 ms) or for 1,000 ms and a 500-ms delay for the slow pace (i.e., 1 processing item every 1,500 ms). The squares of 18-mm side appeared randomly and with equal probability 15 mm above or below the center of the screen. Participants pressed either a left or right key for the upper or lower location, respectively. Then the second word appeared, followed by six squares, and so on. At the end of the list, Rappel prompted the oral recall of the words in their order of presentation.

² Note that our prediction was directional and could call for one-sided Bayesian (BF₁₀) statistical tests. However, we chose to report the results of two-sided tests, because the alternative conception of refreshing seeks evidence of absence of difference (BF₀₁). Moreover, by reporting two-sided tests, readers can easily double the BF₁₀ values to get the one-sided tests if necessary.

Participants said “I don’t know” if they forgot a word. Participants performed 40 trials in a fully crossed design, each list being associated to a single trial. The words within a list were presented in a different random order for each participant. The types of lists and the paces were also presented in a random order to each participant. At the beginning of each trial, the pace of the location judgment task was specified.

A training phase familiarized participants with the location judgment task with 36 stimuli for each pace. Then, participants performed one series of the complex span task for each pace. For these trials, memory items were forenames to avoid any interference with the experimental lists.
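The design and timing described in this section can be summarized in a short sketch. The Python code below builds the 40 trials (2 frequencies × 2 paces × 10 lists) and computes the duration of the location judgment phase that follows each word. The durations come from the Procedure. The counterbalancing rule (alternating the set-to-pace assignment by participant index), the list numbering, and the function names are assumptions made for illustration only.

```python
import random

SQUARES_PER_WORD = 6  # squares judged after each word
# (square duration, blank delay) in ms, as described in the Procedure
PACE_MS = {"fast": (500, 250), "slow": (1000, 500)}


def processing_phase_ms(pace):
    """Duration of the six-square judgment phase following one word."""
    on, off = PACE_MS[pace]
    return SQUARES_PER_WORD * (on + off)


def build_trials(participant_index, seed=0):
    """Return 40 shuffled trials for one participant.

    Assumption: the two sets of 10 lists per frequency swap paces
    for odd versus even participant indices.
    """
    rng = random.Random(seed)
    trials = []
    for frequency in ("high", "low"):
        for set_index in (0, 1):
            pace = ("slow", "fast")[(set_index + participant_index) % 2]
            for list_index in range(10):
                trials.append({
                    "frequency": frequency,
                    "pace": pace,
                    "list": set_index * 10 + list_index,  # hypothetical list id
                })
    rng.shuffle(trials)  # list types and paces appear in random order
    return trials


print(processing_phase_ms("fast"))  # 4500
print(processing_phase_ms("slow"))  # 9000
print(len(build_trials(0)))         # 40
```

With six squares per word, the fast pace leaves 4,500 ms and the slow pace 9,000 ms between successive words, excluding the 500-ms delay that follows each word.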

Results and Discussion

All participants reached at least 70% of correct responses on the location judgment task (M = 91%, SD = 5%). All analyses were run using JASP (JASP Team, 2018), with the default settings. Bayesian analyses of variance (ANOVAs) on the percentage of correct responses and response times (RTs) in the location judgment task with word frequency (high vs. low) and pace (slow vs. fast) as within-subject factors revealed that the best model of both performance measures included the pace effect only, BF₁₀ = 74.8 and BF₁₀ = 3.6 × 10⁹, respectively. Participants’ location judgments were faster but less accurate in the fast (375 ms, SD = 29, and 90%, SD = 6) than in the slow pace condition (407 ms, SD = 36, and 93%, SD = 5).

A similar Bayesian ANOVA was performed on our main variable of interest, the percentage of words correctly recalled in their position of presentation (see online supplementary material for analysis on arcsin-transformed recall accuracy). The best model that accounted for the data included the two main effects of pace and frequency, but not their interaction, BF₁₀ = 4.8 × 10¹⁶ (see Figure 2). As often reported in the literature, recall performance was better for the slow (76%, SD = 10) than fast pace condition (63%, SD = 11), BF_inclusion³ = 2.1 × 10⁸, and high-frequency words (77%, SD = 9) were better recalled than low-frequency words (61%, SD = 13), BF_inclusion = 1.4 × 10¹². It should be noted that this two-main-effects-only model was 3.2 times⁴ more preferable than the model with the two main effects and the interaction, resulting in evidence against the interaction of interest, BF_inclusion = 0.31.
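The link between the model comparison and the inclusion Bayes factor for the interaction is simple arithmetic, shown here as a short Python check. It assumes, as in the matched-models comparison reported above, that only the main-effects model and the model adding the interaction are compared. The variable names are illustrative.

```python
# Reported preference for the main-effects-only model over the model
# that adds the pace x frequency interaction.
bf_main_effects_vs_full = 3.2

# With a single pair of matched models, the inclusion Bayes factor for
# the interaction is the reciprocal of that model comparison.
bf_inclusion_interaction = 1 / bf_main_effects_vs_full

print(round(bf_inclusion_interaction, 2))  # 0.31
```

A value below 1 indicates that the data favor the model without the interaction.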

Thus, contrary to the hypothesis that an increased pace of the concurrent task leads to a stronger detrimental effect on recall for the high-frequency than low-frequency words, Experiment 1 provided evidence that refreshing efficiency is not moderated by the frequency effect. However, our manipulations were successful as the two manipulated factors impacted recall performance overall. As expected, a fast pace reduced recall relative to a slow pace. We also observed that low-frequency words were less accurately recalled than high-frequency words, as reported by Engle et al. (1990). Thus, we were able to replicate this effect although the two experiments differed in some aspects. Whereas Engle et al. (1990) used English words in an operation span task in which the concurrent task was self-paced and the list length increased, our complex span task involved lists of constant length of French words and a computer-paced location judgment task. These differences did not seem to affect the occurrence of the frequency effect. Before drawing further conclusions, we performed a second experiment, as the current findings could be specific to the frequency effect and the way in which we disrupted refreshing. Thus, in Experiment 2, we introduced different manipulations of LTM and of refreshing opportunities, both known for having stronger effects on recall performance than the manipulations used in Experiment 1.

Experiment 2

Experiment 2 aimed at testing the same prediction as in Experiment 1 by implementing two stronger manipulations.⁵ First, we varied the lexicality of the memory items, by presenting either words or nonwords to be remembered in a complex span task.

³ In JASP, BF_inclusion is the change from prior to posterior inclusion odds, with prior (posterior) inclusion probability being the sum of the prior (posterior) probabilities of all models that include the effect. We report here the inclusion Bayes factors based on matched models.

⁴ This value resulted from the comparison of the model without the interaction of interest to the model with this interaction. Note that the BF_inclusion for the interaction is equal to 1/model-comparison value.

⁵ Although nonwords should have no representations in long-term memory (LTM), representations of the pronounceable pseudo-words we used in this experiment (i.e., words that were transformed by the replacement of one letter) can be built up based on features stored in LTM that are temporally chunked, which makes their LTM accessibility more difficult than for the directly retrieved representations for words.

Figure 1. Illustration of the simple span task and the complex span paradigm involving a location judgment with fast and slow paces in Experiment 1.

Second, to induce a stronger manipulation of the disruption of refreshing, we contrasted two conditions in which the between-item delay was either filled by the location judgment task or unfilled. In the latter condition, attention remained available for active maintenance in WM, and this strongly contrasted with the impediment of refreshing during the location judgment task in the former condition. Contrasting these two conditions had the great advantage of maximizing the difference in refreshing opportunities, but at the cost of a difference in representation-based interference generated by the concurrent task, which was inevitably stronger in the filled-interval condition. Finally, we took the opportunity to examine the role of articulatory rehearsal on the interaction between the LTM factor (i.e., lexicality) and refreshing in this experiment.

Previous work has shown that refreshing and rehearsal are distinct mechanisms that are involved in the maintenance of verbal information (see Camos, 2015, 2017, for reviews). While concurrent attentional demand impedes refreshing, concurrent articulation prevents rehearsal. In Experiment 1, although we varied the availability of attention to actively maintain information across conditions, participants were still free to concurrently rehearse in all conditions. As a consequence, we cannot be fully sure that the observed pattern of findings in Experiment 1 did not rely on the use of rehearsal. Thus, to clarify this issue, Experiment 2 disentangled the impact of the two mechanisms by orthogonally manipulating the introduction of concurrent articulation and of a concurrent task during the delay between the memory items. An absence of concurrent articulation would permit the use of rehearsal. Conversely, concurrent articulation promotes the use of refreshing, as shown by Camos et al. (2011). As a consequence, conditions under concurrent articulation constituted the most adequate conditions to examine the prediction that the LTM effect moderates refreshing.

Method

Participants. Thirty-three French native speakers (M_age = 20.12, range = 18–28) who were students at the Université de Bourgogne (first 20 participants) and Université de Fribourg (10 additional participants) received course credit for participating. None of them participated in Experiment 1.

Materials. A total of 120 words were selected from a set of 327 monosyllabic singular French nouns with a Consonant-Vowel-Consonant phonological structure, extracted from Lexique3. Based on these words, 120 pronounceable nonwords were created using Lexique Toolbox by substituting one phoneme of a word by another phoneme randomly selected by the software. Iterations resulting in unpronounceable nonwords and pseudohomophones were replaced by another iteration. As in Experiment 1, the material was tested in a simple span task in which series of 6 items had to be memorized for immediate serial recall (see Figure 3). The same participants who participated in the experiment performed this pretest. We were then able to verify that they recalled more words (49%, SD = 21) than nonwords (24%, SD = 11), BF₁₀ = 5.7 × 10⁵.

Procedure. Besides the simple span task, participants performed four conditions of a complex span task resulting from the crossing of two conditions of concurrent task (with vs. without) and two conditions of concurrent articulation (with vs. without). Eight trials were presented in each condition, four trials with words and four trials with nonwords. For each trial, six memory items were randomly selected from the corresponding pool without replacement. The presentation of items was interleaved with a 6-s delay (see Figure 3). After the last delay, the word Rappel was displayed and replaced after 1,000 ms by “1,” indicating that participants had to type the first item on the keyboard. When finished, they pressed “Enter,” then “2” appeared, and so on. When participants did not remember an item, they pressed Enter to go to the next word to recall. They were informed that they could not go back after pressing Enter. Once the recall was completed, participants pressed the spacebar to start the next trial.

Figure 2. Mean percentages of recall in correct position according to the type of lists (high vs. low-frequency) and the pace of the concurrent task (slow vs. fast) in Experiment 1. Error bars represent SE.

According to the conditions, the delay of the complex span task was filled with different tasks. In the unfilled-delay condition, it was left empty, and participants had no task to perform except maintaining items. In the articulatory-suppression condition, after a 500-ms delay, a series of 12 10-ms tones (32 bits, 44,100 Hz) interleaved with 490-ms silence were presented through headphones. Participants were instructed to say “oui” (yes) each time they heard a tone. This helped participants to keep the rhythm of utterances, and allowed a strict control on the amount of concurrent articulation across participants and conditions including articulatory suppression. Furthermore, the number of utterances was counted by the experimenter in each trial. In the location-judgment condition, participants performed the same location judgment task as in Experiment 1 on a series of six 667-ms squares interleaved with 333-ms white screen.

Finally, in the articulatory-suppression-and-location-judgment condition, both series of tones and squares were presented simultaneously, and participants performed both the concurrent articulation and the location judgment task. Participants performed these conditions in four successive blocks, the order of which was counterbalanced across participants. Series of words and nonwords were randomly intermixed within each block.

Each span task began with some practice trials. For simple-span, unfilled-delay, and articulatory-suppression conditions, participants received one practice trial. They received one more trial in the location-judgment condition, and two more trials in the articulatory-suppression-and-location-judgment condition. In the practice trials, memory items were forenames.

Figure 3. Illustration of the simple span task and the four complex span tasks used in Experiment 2.


Results and Discussion

The data of three participants (M = 65%, SD = 5) were discarded because they achieved less than 70% of correct responses in the location judgment task. The remaining participants achieved a mean percentage of correct responses of 86% (SD = 8). In a first step, we analyzed performance on the secondary tasks. Bayesian ANOVAs were performed on accuracy and RTs for the location judgment task with lexicality (words vs. nonwords) and concurrent articulation (with vs. without) as within-subject factors. For accuracy (see online supplementary material), a model with the two main effects only accounted best for performance in the location judgment task, BF₁₀ = 2.5 × 10⁵. Accuracy was reduced by the concurrent articulation (without: 88%, SD = 7 vs. with: 84%, SD = 9), BF_inclusion = 9.3 × 10⁴, and slightly lower in nonword (85%, SD = 9) than in word (87%, SD = 8) conditions, BF_inclusion = 5.0. For the RTs, BFs for inclusion of each main effect and the interaction were all inferior to 1. The null model was the favored model, BF₀₁ = 27.8.

A Bayesian ANOVA was also performed on the number of uttered “oui” with lexicality and concurrent task (with vs. without) as within-subject factors. The number of utterances remained fairly high across conditions. The best model included the main effect of the concurrent task only and showed that fewer utterances were produced under location judgment task (with: 11.2, SD = 1.4 vs. without: 11.9, SD = 0.6), BF₁₀ = 8.4 × 10³. This small decline may result from the auditory presentation of beeps to which participants had to pay attention to keep the rhythm. When concurrently performing the location judgment task that also required attention, attention may have been slightly distracted away from the beeps.

Concerning our main analysis of interest, we analyzed recall performance scored in percentage of items recalled in correct position, as it is usually done for complex span tasks and as in Experiment 1. A Bayesian ANOVA was performed on this score with lexicality, concurrent articulation and concurrent task as within-subject factors. The best model included the three main effects but none of the interactions, BF₁₀ = 1.7 × 10⁵². As expected, words (61%, SD = 20) were better recalled than nonwords (30%, SD = 15), BF_inclusion = 1.3 × 10⁴², while introducing concurrent articulation (with: 37%, SD = 20, vs. without: 55%, SD = 17), BF_inclusion = 6.9 × 10¹⁹, or the location judgment task (with: 41%, SD = 19, vs. without: 52%, SD = 18), BF_inclusion = 4.3 × 10⁷, reduced recall performance (see Figure 4). This model was 2.6 times more preferable than the second best model, which included the three main effects and the interaction between concurrent articulation and lexicality. Of particular interest here, among the models that included the expected Lexicality × Task interaction, the model with the three main effects and this interaction had the highest BF (BF = 3.4 × 10⁵¹), but it was 5.2 times less preferable than the model without the interaction (i.e., the best model that only included the three main effects). The BF_inclusion for the interaction of interest was below 1, BF_inclusion = 0.21, providing evidence against the interaction of interest.
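As an illustration only, the recall score analyzed here (percentage of items recalled in correct position) could be computed along the following lines in Python. The function name, the exact-match rule, and the example items are hypothetical and are not taken from the reported materials or scoring script.

```python
def percent_correct_position(presented, recalled):
    """Percentage of presented items recalled at their original serial position.

    Assumption: an answer counts only if it matches the target exactly;
    omitted answers (empty strings or missing positions) never count.
    """
    if not presented:
        raise ValueError("presented list must not be empty")
    # Align the response with the presented list: truncate extra answers,
    # pad missing positions with None.
    padded = list(recalled[:len(presented)])
    padded += [None] * (len(presented) - len(padded))
    hits = sum(1 for target, answer in zip(presented, padded) if answer == target)
    return 100.0 * hits / len(presented)


# Hypothetical six-item trial: positions 1, 2 and 4 are correct.
trial = ["bol", "car", "dent", "fil", "gaz", "lune"]
answer = ["bol", "car", "fil", "fil", "", "gaz"]
print(percent_correct_position(trial, answer))  # 50.0
```

Under this scoring, an item recalled in the wrong position (here "gaz") earns no credit, which is what distinguishes it from a free-recall score.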

Figure 4. Mean percentages of recall in correct position according to the type of lists (words vs. nonwords), the introduction of an articulatory suppression and of a concurrent task in Experiment 2. Error bars represent SE.

It is difficult to attribute the lack of interactions to a ceiling or floor effect, as recall performance was far from ceiling and the SEs were similar across experimental conditions (see Figure 4). However, it is possible that participants can favor either rehearsal or refreshing for maintaining verbal information depending on the constraints of the task (Camos et al., 2011). Thus, we performed a further analysis restricted to the conditions with concurrent articulation, because it is assumed to promote the use of refreshing (Camos et al., 2011). In such conditions, the best model included only the lexicality and task main effects, BF₁₀ = 5.8 × 10¹⁶, and was 3.0 times more preferable than the model with the two main effects and the Lexicality × Task interaction (BF = 1.7 × 10¹⁷), the BF_inclusion for this interaction being inferior to 1, BF_inclusion = 0.33.

Experiment 2 strengthened the pattern of findings of Experiment 1 by extending it to another LTM effect. As expected, manipulations impeding either refreshing or rehearsal were both successful in reducing recall performance. Moreover, words were better recalled than nonwords in the complex span task. This effect has been reported only once before, by Loaiza et al. (2015a) in an operation span task. The emergence of a lexicality effect is rather resistant to change in the type of concurrent task. Finally, although we introduced a stronger manipulation of the availability of refreshing and also manipulated the possibility to rehearse, the lexicality effect remained similar across conditions. Thus, again, the expected interaction was not observed. This is at odds with our prediction, but also contradicts the alternative view in which increasing refreshing opportunities should result in the reduction of the lexicality effect. Indeed, if word repetition reduces differences in LTM accessibility (Scarborough, Cortese, & Scarborough, 1977), and if refreshing relies on retrieval from LTM (as we hypothesized), then the repeated refreshing of the memory items in conditions in which attention can be fully dedicated to refreshing (e.g., in conditions without a concurrent task) should lead to a reduction, and even the disappearance, of the frequency and lexicality effects.

To summarize, in Experiments 1 and 2, two different effects known to influence retrieval and recall from LTM proved to impact recall in complex span tasks as well. More importantly, for both of them, different experimental manipulations of refreshing have shown no evidence in favor of the expected pattern, that is, a stronger effect of refreshing disruption for high-frequency words or words. Across all analyses, the best model never included an interaction between the LTM effect and the refreshing manipulation. We even gathered evidence against this interaction. However, the absence of interaction could result from a sort of compensation between two effects. If fewer low-frequency words or nonwords are encoded in WM while being refreshed more slowly than high-frequency words or words, the constant duration available for refreshing may lead to two main effects without interaction. Moreover, the search for an interaction in recall performance may not be sensitive enough to capture the expected effect of semantic LTM on refreshing. In Experiments 3 and 4, we examined how frequency and lexicality effects affected refreshing by assessing a more fine-grained aspect of refreshing, its speed.

Experiment 3

Vergauwe et al. (2014) developed a paradigm that allows one to measure refreshing speed. In a Brown-Peterson paradigm, participants were given a list of items for further recall and asked to perform an intervening task over a fixed retention interval before recall. However, participants were instructed to perform this intervening activity in such a way that, though aiming at the best performance, they did not forget the memoranda. For example, participants were presented with series of 0 to 7 letters to be remembered, and asked during a 12-s retention interval to judge the parity of as many numbers as they could by pressing keys, each key press displaying on screen a new number for judgment (Vergauwe et al., 2014, Experiment 4). Because articulatory rehearsal is blocked by concurrent articulation, it is assumed that participants would refresh memoranda quickly before each processing episode to achieve perfect recall. This should result in a linear increase of the processing times as a function of the number of items to be maintained. More important, following Vergauwe et al. (2014), because testing this hypothesis requires the assurance that participants effectively maintained the memoranda, processing time analyses were restricted to those trials in which participants achieved perfect recall. Thus, in Experiment 3, we examined the effect of maintenance on processing latencies.

As in Vergauwe et al. (2014), memory items were successively presented on screen (from 0 to 4 items), followed by a 12-s delay during which participants had to perform a parity judgment task on digits that appeared successively on screen. Participants were asked to process correctly as many digits as they could, without forgetting the memoranda. To ensure participants were using refreshing and not rehearsal, they concurrently and continuously repeated ba-bi-boo. As explained for the previous experiment, concurrent articulation prevents rehearsal and favors the use of refreshing. Memory items were either high- or low-frequency words. We aimed to replicate the linear increase of processing times with the number of items, which indexes the use of an attentional maintenance mechanism and allows one to measure refreshing speed. Moreover, according to the hypothesis that refreshing relies on the retrieval from semantic LTM, we predicted that the linear increase of the parity judgment response times with memory load would be steeper for the low- than the high-frequency words, reflecting the expected slower refreshing speed for low-frequency words.

Method

Participants and design. A total of 50 native French-speaking undergraduate psychology students (40 females, mean age = 22.07 years, SD = 46 months) enrolled at the Université de Fribourg participated for course credit. All participants had normal or corrected-to-normal vision. Memory load (from 0 to 4 items) and Type of items (high-frequency vs. low-frequency words) were manipulated within-subjects.

Materials and procedure. The experiment was administered using E-Prime software (Psychology Software Tools, Inc., Pittsburgh, PA). Each experimental trial started with a screen that informed the participant about the number of words to be maintained. Series to be maintained consisted of French words of either high or low frequency. The same pool of words as in Experiment 1 was used. They were segregated into two lists, with half of the participants assigned to each list.

The experiment consisted of 50 trials with 10 trials for each memory load (from 0 to 4), half of which used low-frequency words and half high-frequency words (i.e., 5 trials per condition). After a fixation cross for 500 ms, participants saw a series of lower-case memory items (Courier New, 18) centrally displayed on screen in black on a white background at a rate of 1 item/s (see Figure 5). The presentation of the last stimulus was followed by a 12-s processing phase during which participants performed a parity judgment task. For this task, numbers from 1 to 10 were centrally displayed on screen (Courier New, 24) and participants were asked to press the right- or left-hand key for even and odd numbers, respectively (a and l on the keyboard). Throughout the experiment, all numbers were used approximately equally often. Numbers were presented in gray on a black background and remained on screen until response, which triggered the appearance of the next number. During the parity judgment task, participants were required to utter the syllables ba bi boo continuously throughout the entire 12-s processing phase before recall. To remind the participant of this, these syllables were presented centrally on top of each screen displaying a number. When the 12 s had elapsed, recall was probed; the word Rappel (recall in French) appeared on screen. Participants were invited to write down the words on response sheets containing 50 lines of four boxes (representing the maximum of four memory items per series).
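The trial structure described above (5 memory loads × 2 word types × 5 repetitions = 50 trials) can be sketched in Python as follows. This is an illustrative sketch only: the experiment itself was run in E-Prime, the function and field names are hypothetical, and the random ordering of trials is an assumption that is not stated in the text.

```python
import itertools
import random
from collections import Counter


def build_trials(loads=range(5), item_types=("high", "low"), reps=5, seed=None):
    """Return a list of trial descriptions for a load x item-type design."""
    rng = random.Random(seed)
    trials = [
        {"load": load, "item_type": item_type}
        for load, item_type in itertools.product(loads, item_types)
        for _ in range(reps)
    ]
    # Assumption: trials are presented in random order.
    rng.shuffle(trials)
    return trials


trials = build_trials(seed=1)
per_condition = Counter((t["load"], t["item_type"]) for t in trials)
print(len(trials))                  # 50
print(set(per_condition.values()))  # {5}
```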

Participants were instructed to perform the processing task in such a way that, though aiming at responding as fast and as accurately as possible, they remembered the memoranda in their order of presentation. Participants were further asked to rest their index fingers on the two response keys from the fixation screen onward. In a training session, participants were first familiarized with the memory task (one series of 2 low-frequency words, one series of 3 low-frequency words, one series of 4 frequency words, and one series of 3 high-frequency words) before practicing the processing task (20 numbers to be judged). Finally, three practice trials combined the memory task with the processing task (with 0, 2 and 4 memory items). A separate pool of memory items was used for the practice trials.

Results and Discussion

As in Vergauwe et al. (2014), we checked that all participants paid sufficient attention to the task and recalled more than 50% of the series of words correctly. However, the data from one participant were excluded after the experiment because he was not a native French speaker, contrary to the other participants. To assess recall performance of the remaining 49 participants, we computed the rate of series in which recall was perfect (i.e., series in which all memoranda were recalled in their order of presentation). All participants reached 50% of correct recall and, as expected, the rate of perfectly recalled series with high-frequency words (.78, SD = .16) was higher than for series with low-frequency words (.61, SD = .15). A Bayesian paired two-sided t test provided strong evidence in favor of a difference between high- and low-frequency words, BF₁₀ = 1.2 × 10¹⁰, replicating the frequency effect observed in Experiment 1.
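For illustration, the rate of perfectly recalled series could be computed as in the following Python sketch. The function name and the example words are hypothetical; the only rule taken from the text is that a series counts as perfect when all memoranda are recalled in their order of presentation.

```python
def perfect_recall_rate(series):
    """Proportion of series whose recall matches the presented list exactly, in order.

    series: list of (presented, recalled) pairs for one participant and condition.
    """
    if not series:
        raise ValueError("at least one series is required")
    perfect = sum(
        1 for presented, recalled in series if list(presented) == list(recalled)
    )
    return perfect / len(series)


# Hypothetical data: 3 of the 4 series are perfectly recalled.
example = [
    (["chat", "lune"], ["chat", "lune"]),
    (["porte", "sel", "mur"], ["porte", "sel", "mur"]),
    (["vent", "roi", "lac"], ["vent", "lac", "roi"]),  # order error
    (["bras"], ["bras"]),
]
print(perfect_recall_rate(example))  # 0.75
```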

To examine processing times (i.e., response times in the parity judgment task), we followed the same procedure as in Vergauwe et al. (2014) and only correct responses were analyzed. Furthermore, we distinguished between the very first and the remaining responses, as has been done in Vergauwe et al. (2014) and other studies (Engle et al., 1990; Friedman & Miyake, 2004; Jarrold, Tam, Baddeley, & Harvey, 2011). For each trial, two dependent measures were taken: (a) the reaction time (RT) for the individual’s first response in the processing phase, referred to as first processing times and (b) the mean of all subsequent RTs in that processing phase, referred to as subsequent processing times. While first processing times have been attributed to the short-term consolidation of memory traces, subsequent processing times are typically attributed to the maintenance of memory traces (e.g., Engle et al., 1990; Jarrold et al., 2011; Vergauwe et al., 2014). Although our prediction concerned refreshing only, we took the opportunity to examine short-term consolidation as well, because it could also shed light on the relationships between WM and LTM. Indeed, some authors propose that short-term consolidation of memory traces benefits from the retrieval of information from LTM (e.g., Blalock, 2015).
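The two dependent measures can be illustrated with a short Python sketch. The function name and data layout are hypothetical. One further assumption is made here that the text does not specify: when the very first response is an error, the first processing time is treated as missing rather than replaced by the next correct response.

```python
from statistics import mean


def split_processing_times(responses):
    """Return (first_rt, mean_subsequent_rt) for one processing phase.

    responses: ordered list of (rt_ms, is_correct) tuples.
    Only correct responses contribute; a missing measure is returned as None.
    """
    if not responses:
        return None, None
    first_rt, first_correct = responses[0]
    later = [rt for rt, is_correct in responses[1:] if is_correct]
    return (
        first_rt if first_correct else None,
        mean(later) if later else None,
    )


# Hypothetical processing phase: the third response is an error and is ignored.
phase = [(910.0, True), (620.0, True), (655.0, False), (700.0, True)]
print(split_processing_times(phase))  # (910.0, 660.0)
```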

Bayesian ANOVAs with memory load (0 to 4) and type of words (high- vs. low-frequency words) as within-subject factors were performed on the first processing times and on the subsequent processing times. The best model for the first processing times included the two main effects only, BF₁₀ = 3.7 × 10¹⁹. Nevertheless, it should be noted that this model was only 1.2 times more preferable than the second best model, which included the effect of memory load only, BF₁₀ = 3.1 × 10¹⁹ (Figure 6A). Moreover, the BFs for inclusion of the frequency effect and the interaction were small, BF_inclusion = 1.1 and 0.35, respectively, while it was large for the memory load effect, BF_inclusion = 4.3 × 10¹⁹. This favored the view that only memory load impacted the first processing times. The analysis on the subsequent processing times led to similar findings (Figure 6B). The model with the memory load effect only was the best model, BF₁₀ = 1.1 × 10¹⁸. It was 4.1 times more preferable than the second best model with the two main effects, BF₁₀ = 2.7 × 10¹⁷. In line with Experiment 1, the BFs for inclusion of the frequency effect and the interaction were inferior to 1, BF_inclusion = 0.25 and 0.02, respectively, while it was large for the memory load effect,⁶ BF_inclusion = 1.2 × 10¹⁸.
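The "times more preferable" values follow from the fact that two models evaluated against the same null model can be compared by taking the ratio of their BF₁₀ values. The Python check below uses the rounded values reported in this paragraph; the function name is hypothetical, and results computed from unrounded Bayes factors may differ slightly.

```python
def compare_models(bf10_a, bf10_b):
    """Bayes factor for model A over model B, given each model's BF against the same null."""
    return bf10_a / bf10_b


# First processing times: two main effects vs. memory load only.
first = compare_models(3.7e19, 3.1e19)
# Subsequent processing times: memory load only vs. two main effects.
subsequent = compare_models(1.1e18, 2.7e17)
print(round(first, 1), round(subsequent, 1))  # 1.2 4.1
```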

⁶ As suggested by a reviewer, a similar analysis performed on imperfectly recalled trials resulted in a smaller BF for the memory load effect, BF_inclusion = .46, which gave some evidence against this effect, as may be expected under unsuccessful refreshing.

Figure 5. Illustration of the Brown-Peterson paradigm used in Experiments 3 and 4.

To obtain an estimate of the refreshing speed, we calculated, for each individual and separately for high-frequency and low-frequency words, the slope of the linear function⁷ relating response latency on the parity judgment task (i.e., subsequent processing times) to the number of words that were to be remembered (i.e., list length). We similarly estimated the consolidation speed by calculating the slope of the linear function relating the response latency on the first processing times to the memory load. We already stressed the necessity to restrict processing time analyses to those trials in which participants achieved perfect recall. However, this skimming procedure would lead to more and more participants being discarded as the list length increases, restricting the analysis to a very small subset of high-span participants. Apart from the fact that the assessment of the memory load effect would involve different samples of participants across list lengths, it has been shown that high-span adults have distinct performance in many cognitive tasks (see Conway et al., 2007, for a review). Thus, to perform slope analyses among the same participants, we reduced the range of list lengths involved in the analyses, while preserving a sufficient number of participants. As in Vergauwe et al. (2014), we chose to extend our analyses toward the longest list length at which two thirds of our sample were still able to achieve perfect recall in at least two trials to allow mean calculation. However, this longest list length differed greatly between high- and low-frequency word conditions, as it reached length 4 for high-frequency lists and only length 3 in low-frequency lists, which was a direct consequence of the frequency effect observed in recall. Thus, to stick to the original procedure and allow comparison with Vergauwe et al. (2014), we first analyzed the slopes between 0 and 3, which was the longest list length in both conditions in which at least two thirds of participants had perfect recall. However, because the slope calculation relied on rather small memory loads, we took into account, in a second series of analyses, the difference in list length between conditions. We analyzed the slope up to the maximum length for each word condition (i.e., between 0 and 4 for the high-frequency and between 0 and 3 for the low-frequency words; named MaxC), and also up to the maximum length (named MaxI) at which each individual achieved perfect recall in at least two trials.
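A minimal Python sketch of the slope estimate is given below, assuming an ordinary least-squares fit of mean processing time on memory load within a chosen range of list lengths. The function names and the example values are hypothetical; the text specifies only that a linear function was fitted per individual and condition.

```python
def ols_slope(loads, mean_rts):
    """Ordinary least-squares slope of mean RT (ms) on memory load."""
    n = len(loads)
    if n < 2 or n != len(mean_rts):
        raise ValueError("need at least two (load, RT) pairs of equal length")
    mean_x = sum(loads) / n
    mean_y = sum(mean_rts) / n
    sxx = sum((x - mean_x) ** 2 for x in loads)
    if sxx == 0:
        raise ValueError("loads must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(loads, mean_rts))
    return sxy / sxx


def speed_slope(rt_by_load, lowest=0, highest=3):
    """Slope over the list lengths lowest..highest (inclusive).

    rt_by_load: {memory load: mean processing time in ms on perfectly recalled trials}.
    """
    loads = [load for load in sorted(rt_by_load) if lowest <= load <= highest]
    return ols_slope(loads, [rt_by_load[load] for load in loads])


# Hypothetical participant, subsequent processing times in one frequency condition.
participant = {0: 500.0, 1: 540.0, 2: 570.0, 3: 610.0, 4: 650.0}
print(speed_slope(participant, 0, 3))  # 36.0
```

In this sketch the slope is read as the additional processing time, in milliseconds, associated with each additional item held in memory; changing `lowest` and `highest` selects the range of list lengths entering the fit.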

In a first step, Bayesian two-sided paired t tests were performed on the slopes computed from 0 to 3 memory items on the first processing times (referred to as consolidation in Table 1) and on the subsequent times (refreshing in Table 1). Bayes factors in favor of a difference between frequency conditions (BF₁₀) and in favor of the null hypothesis (BF₀₁) were computed, the largest of the two being reported in Table 1. While the BF₁₀ were inferior to 1, the BF₀₁ were superior to 5, supporting the null hypothesis. However, one might argue that the slope between 0 and 1 depends more on the introduction of a secondary task than on the consolidation or refreshing of a single item and, thus, represents an estimate of dual-task cost rather than an estimate of maintenance-related processes. Thus, to have a better estimate of the consolidation and refreshing speed that would not be blurred by the dual-task cost, one could argue that slopes should be estimated from memory load 1 onward. Thus, in a next step, similar analyses were performed on slopes from 1 to 3, but also from 0 to 1 to get an estimate of dual-task cost. For the first processing times and the subsequent times, Bayes factors provided good evidence in favor of the null model for 1 to 3 slopes (see Table 1), supporting the idea that frequency does not impact consolidation and refreshing speed. For the 0 to 1 slopes, evidence was not conclusive for the first and the subsequent processing times. Finally, as we mentioned above, in the analyses reported up to this point, the slope calculation was restricted to list lengths up to 3. In a further analysis, the same analysis was performed on slopes calculated from 1 to the maximum length for each condition and for each individual. This confirmed the absence of difference between frequency conditions on consolidation and refreshing speed (see Table 1).

⁷ Regression analysis showed that data were better accounted for by a degree 2 polynomial function (R² = .99 for both high- and low-frequency words for subsequent processing times, and .94 and .98 for the first processing times, respectively) than by a linear function (R² = .94 and .99 for subsequent processing times, and .92 and .87 for the first processing times, respectively). Despite this, we chose to compute slopes based on linear functions because it provided a very good approximation (a loss of only 2.3 and 3.9% of the accounted variance compared with the polynomial functions), a more parsimonious account, and a theoretically based hypothesis on the psychological underlying processes.

Figure 6. Mean response times (in milliseconds) according to the memory load (0 to 4 words) and the type of memory words (high- vs. low-frequency words) in Experiment 3. Error bars represent SE. Panel A represents the response times for the first responses attributed to the consolidation of memory traces and Panel B for the subsequent processing times attributed to the refreshing of memory traces.


In a final step, the same analyses were performed on a restricted sample in which only participants showing a frequency effect in their recall were included, the data from seven participants being discarded because their recall scores did not show better recall performance for high-frequency words. As clearly shown in Table 1, the overall pattern remained similar. Bayes factors provided evidence in favor of the null hypothesis for both consolidation and refreshing speed, estimated from 1 to 3 and from 1 to MaxC or MaxI.

To summarize, we replicated in Experiment 3 that word frequency affects recall performance as in Experiment 1. More important, for the purpose of our study, we provided the first evidence that frequency does not moderate refreshing speed, high- and low-frequency words being refreshed at the same rate. It should also be noted that high-frequency and low-frequency words were also consolidated at a similar pace. Before drawing any further conclusions on the functioning of WM maintenance, we performed a final experiment introducing another LTM effect in this paradigm.

Experiment 4

Experiment 4 was modeled after Experiment 3, using the same paradigm and analysis, to reassess the impact of an LTM effect on refreshing speed through the manipulation of the lexicality of memory items. Accordingly, our predictions were also similar. We expected to replicate the finding that RTs increase as a function of memory load and that lexicality impacts recall. We tested whether words benefit from a faster refreshing speed than nonwords.

Method

Participants and design. A total of 50 native French-speaking undergraduate psychology students (42 females, mean age = 21.52 years, SD = 46 months) enrolled at the University of Fribourg participated for course credit. All participants had normal or corrected-to-normal vision, and none of them participated in Experiment 3. Memory load (from 0 to 4 items) and Type of items (words vs. nonwords) were manipulated within-subjects.

Materials and procedure. The experiment was similar to Experiment 3 (see Figure 5), except that series to be maintained consisted of French words or nonwords. The pool of words and nonwords was the same as in Experiment 2.

Results and Discussion

We followed the same data analysis procedure as for Experiment 3. Data from two participants were discarded because they did not pay sufficient attention to the task and recalled on average less than 50% of the series correctly. As expected, the rate of perfectly recalled series of the remaining 48 participants was higher for word series (.76, SD = .12) than for series with nonwords (.48, SD = .13), BF₁₀ = 9.0 × 10¹⁴, replicating the lexicality effect observed in Experiment 2.

We first examined processing times on the first and subsequent responses with Bayesian ANOVAs with memory load (0 to 4) and type of memory items (words vs. nonwords) as within-subject factors. The best model for the first processing times included the two main effects only, BF₁₀ = 3.3 × 10⁴. It was 3.2 times more preferable than the second best model with the effect of memory load only, BF₁₀ = 1.0 × 10⁴ (Figure 7A). Accordingly, the BF for the memory load effect was larger, BF_inclusion = 1.8 × 10⁴, than for the lexicality effect, BF_inclusion = 3.2, and the interaction, BF_inclusion = 0.22. The analyses on the subsequent processing times led to rather similar findings (Figure 7B). The best model included the effect of memory load only,⁸ BF₁₀ = 126.5. The BFs for inclusion of both the lexicality effect and the interaction were inferior to 1, BF_inclusion = 0.32 and 0.16, respectively.

As discussed in Experiment 3, to estimate consolidation and refreshing speed, we have either to restrict the size of list lengths to the lengths in which a large amount of participants were able to recall memory items (as inVergauwe et al., 2014), or to take into account individual differences on the maximum length participants

8_{As in Experiment 3, BF for the memory load effect was small when}

taking into account not-perfectly recalled trials BFinclusion⫽ .42, which is

expected under unsuccessful refreshing. Table 1

Mean Slope (and SD) for High-Frequency and Low-Frequency Series, and BF for Paired Two-Sided t-Tests in Favor (BF₁₀ in Italics) or Against (BF₀₁ in Bold, i.e., in Favor of the Null Hypothesis) a Difference in Slope Values Between High-Frequency and Low-Frequency Word Series

                        Full sample (n = 49)                    Restricted sample (n = 42)
Process        Slope    High-frequency  Low-frequency  BF       High-frequency  Low-frequency  BF
Consolidation  0–3      133 (222)       141 (220)      6.09     142 (235)       146 (211)      5.93
               0–1      −6 (226)        92 (266)       1.27*    −9 (241)        82 (268)       1.37
               1–3      197 (337)       169 (253)      5.05     211 (357)       176 (251)      4.51
               1-MaxC   155 (236)       169 (253)      5.72     173 (249)       176 (251)      5.97
               1-MaxI   188 (335)       159 (287)      5.27     205 (356)       161 (267)      4.04
Refreshing     0–3      32 (47)         35 (54)        5.22     35 (49)         37 (57)        5.85
               0–1      12 (56)         34 (70)        2.11     12 (56)         35 (74)        2.33
               1–3      42 (74)         35 (74)        5.45     48 (78)         35 (78)        4.38
               1-MaxC   44 (92)         35 (74)        5.41     47 (99)         35 (78)        4.83
               1-MaxI   39 (70)         56 (177)       4.30     43 (74)         61 (190)       4.32

Note. MaxI = the maximum length achieved by a participant; MaxC = the maximum length achieved in the condition (4 in the high-frequency word and 3 in the low-frequency word conditions). Restricted sample included participants who exhibited a frequency effect in their recall performance.
* In grey, inconclusive BFs (< 3); this value is a BF₁₀, all the others are BF₀₁.


could reach. As in Experiment 3, we did both, and we first determined the longest list length at which two thirds of our sample were still able to achieve perfect recall in at least two trials. As could be expected from recall performance, this longest list length differed greatly between types of memory items: it reached length 3 for words and only length 2 for nonwords. Thus, we first analyzed the slopes between 0 and 2, which was the longest list length reached in both conditions.
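The length criterion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis script used in the study: the function name `longest_length`, the data layout (one dictionary per participant, mapping list length to the number of perfectly recalled trials), and the toy counts are all hypothetical.

```python
from fractions import Fraction


def longest_length(perfect_trials, min_trials=2, min_fraction=Fraction(2, 3)):
    """Return the longest list length at which at least `min_fraction` of
    participants have at least `min_trials` perfectly recalled trials.

    perfect_trials: one dict per participant, mapping list length to the
    number of perfectly recalled trials at that length (hypothetical layout).
    Returns 0 if no length meets the criterion.
    """
    n = len(perfect_trials)
    if n == 0:
        raise ValueError("need at least one participant")
    lengths = sorted({length for p in perfect_trials for length in p})
    best = 0
    for length in lengths:
        n_ok = sum(1 for p in perfect_trials if p.get(length, 0) >= min_trials)
        # Exact rational comparison avoids float rounding at the 2/3 boundary.
        if Fraction(n_ok, n) >= min_fraction:
            best = length
    return best


# Toy data (invented for illustration): three participants per condition.
words = [{1: 4, 2: 4, 3: 3, 4: 1}, {1: 4, 2: 3, 3: 2, 4: 0}, {1: 4, 2: 2, 3: 1, 4: 0}]
nonwords = [{1: 4, 2: 3, 3: 1, 4: 0}, {1: 3, 2: 2, 3: 0, 4: 0}, {1: 2, 2: 1, 3: 0, 4: 0}]

max_words = longest_length(words)
max_nonwords = longest_length(nonwords)
# Slopes are compared over the range available in both conditions.
common_length = min(max_words, max_nonwords)
print(max_words, max_nonwords, common_length)  # prints: 3 2 2
```

With these invented counts the criterion yields length 3 for words and length 2 for nonwords, so the shared range runs from 0 to 2, which mirrors the pattern reported for Experiment 4.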

Bayesian two-sided paired t tests were performed on slopes computed from 0 to 2 memory items on the first processing times and on the subsequent times (see Table 2). Contrary to Experiment 3, the type of memory items affected the maintenance processes. The Bayes factors supported the hypothesis that lexicality moderates maintenance processes, although the evidence was stronger for consolidation than for refreshing. However, as explained in the previous experiment, this finding could reflect a different dual-task cost across conditions, rather than a difference in maintenance processes per se. Similar Bayesian t tests were then performed on slopes from 1 to 2, and from 0 to 1. For the first processing times and the subsequent times, Bayes factors provided good evidence in favor of the null model for the 1 to 2 slopes (see Table 2), supporting the idea that lexicality does not impact consolidation and refreshing speed. For the 0 to 1 slopes, Bayes factors supported the existence of a stronger dual-task cost for the nonword series than for the word series in the subsequent processing times, but this evidence was inconclusive for the first processing times. Finally, the same analyses were performed on slopes calculated from 1 to MaxC and from 1 to each participant's MaxI. Because MaxI (i.e., the maximum length with at least two correctly recalled series) was 1 for five participants, their data were discarded for this analysis. This final analysis confirmed the absence of a difference between lexicality conditions on consolidation and on refreshing when slopes from 1 to MaxC were considered. However, the BF was inconclusive for refreshing speed when slopes were computed from 1 to MaxI (see Table 2). Thus, overall, when dual-task costs were excluded, there was no evidence for different slopes between words and nonwords.
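To make the logic of the different slope ranges concrete, the following Python sketch computes a per-participant slope of processing time against memory load over a chosen range. It assumes an ordinary least-squares slope, which the text does not spell out here; the function name `slope` and the response times are hypothetical and chosen only to show how a dual-task cost between loads 0 and 1 inflates the 0 to 2 slope while leaving the 1 to 2 slope untouched.

```python
def slope(rt_by_load, lo, hi):
    """Least-squares slope (ms per item) of mean processing time against
    memory load, restricted to loads between `lo` and `hi` inclusive.

    rt_by_load: dict mapping memory load to mean processing time in ms
    (hypothetical layout for one participant in one condition).
    """
    loads = [load for load in sorted(rt_by_load) if lo <= load <= hi]
    if len(loads) < 2:
        # E.g., a participant whose MaxI is 1 has no 1-to-MaxI slope.
        raise ValueError("a slope needs at least two memory loads")
    mean_x = sum(loads) / len(loads)
    mean_y = sum(rt_by_load[load] for load in loads) / len(loads)
    sxy = sum((load - mean_x) * (rt_by_load[load] - mean_y) for load in loads)
    sxx = sum((load - mean_x) ** 2 for load in loads)
    return sxy / sxx


# Invented times: a 100 ms jump from load 0 to 1 (dual-task cost),
# then 40 ms per additional item.
rt = {0: 600.0, 1: 700.0, 2: 740.0, 3: 780.0}

print(slope(rt, 0, 1), slope(rt, 1, 2), slope(rt, 0, 2))  # prints: 100.0 40.0 70.0
```

In this toy case the 0 to 2 slope (70 ms) mixes the one-off cost of holding any memory load with the per-item cost, whereas the 1 to 2 slope (40 ms) isolates the per-item cost, which is the quantity taken here to index consolidation or refreshing speed.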

In a second step, the same analyses were performed on a restricted sample that included only participants showing the lexicality effect in their recall. The data of one participant were discarded. As clearly shown in Table 2, the overall pattern remained similar: Bayes factors provided evidence against different consolidation and refreshing speeds for words and nonwords when these were estimated from 1 to 2, from 1 to MaxC, and from 1 to MaxI, except that the evidence was inconclusive for refreshing speed when estimated from 1 to MaxI. Finally, it should be noted that the slopes between 0 and 2, and between 0 and 1, which most probably reflect the dual-task cost, were larger for nonword series than for word series for the subsequent processing times.
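The reading of Bayes factors as evidence for a difference, evidence for the null, or inconclusive can be made explicit with a small helper. The sketch below assumes the cutoff of 3 mentioned in the note to Table 1 and the standard relation BF₀₁ = 1 / BF₁₀; the function name `classify_bf10` and its labels are hypothetical. The example values are taken from Table 1 (Experiment 3) and from the recall analysis of Experiment 4, purely for illustration.

```python
def classify_bf10(bf10, threshold=3.0):
    """Label a Bayes factor BF10 (alternative over null).

    Returns "difference" if BF10 >= threshold, "null" if BF01 = 1 / BF10
    >= threshold, and "inconclusive" otherwise. The threshold of 3 is an
    assumption based on the note to Table 1.
    """
    if bf10 <= 0:
        raise ValueError("a Bayes factor must be positive")
    if bf10 >= threshold:
        return "difference"
    if 1.0 / bf10 >= threshold:
        return "null"
    return "inconclusive"


def bf01_to_bf10(bf01):
    """Convert a Bayes factor in favor of the null into BF10."""
    return 1.0 / bf01


print(classify_bf10(9.0e14))              # lexicality effect on recall
print(classify_bf10(bf01_to_bf10(6.09)))  # Table 1, consolidation 0-3, BF01
print(classify_bf10(1.27))                # Table 1, consolidation 0-1, BF10
print(classify_bf10(bf01_to_bf10(2.11)))  # Table 1, refreshing 0-1, BF01
```

Under these assumptions the four calls are labeled "difference", "null", "inconclusive", and "inconclusive", respectively.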

To summarize, the pattern of findings in Experiment 4 was similar to what was revealed in Experiment 3. Despite the fact that Experiment 4 used a different paradigm, recall performance was poorer for nonwords than words, as in the complex span task of Experiment 2. However, refreshing speed as well as consolidation speed were not faster for words than for nonwords, suggesting that the observed difference in recall performance does not result from variation in refreshing efficiency. We discuss how WM models can explain this in the General Discussion.

General Discussion

Figure 7. Mean response times (in milliseconds) according to the memory load (0 to 4 items) and the type of memory items (words vs. nonwords) in Experiment 4. Error bars represent SE. Panel A represents the response times for the first responses, attributed to the consolidation of memory traces, and Panel B the subsequent processing times, attributed to the refreshing of memory traces.

In the past decade, attentional refreshing has gained importance in WM models as one of the main mechanisms to maintain information in the short term. However, its functioning is still unclear and several theoretical proposals have been put forward (e.g., Barrouillet & Camos, 2015; Cowan, 1995; Johnson, 1992; McCabe, 2008; Vergauwe & Cowan, 2015). The aim of the present study was to test proposals suggesting that refreshing relies on retrieval from semantic LTM either to reconstruct degraded WM traces (Barrouillet & Camos, 2015) or to enrich them (e.g., Hudjetz & Oberauer, 2007; Loaiza & McCabe, 2012). Therefore, we examined how LTM effects impact refreshing efficiency in four experiments. The frequency and the lexicality of the memory items were manipulated in complex span tasks (Experiments 1 and 2)