Article

Reference

The cost of accumulating evidence in perceptual decision making

DRUGOWITSCH, Jan, et al.


DRUGOWITSCH, Jan, et al. The cost of accumulating evidence in perceptual decision making. Journal of Neuroscience, 2012, vol. 32, no. 11, p. 3612-3628.

DOI: 10.1523/JNEUROSCI.4010-11.2012
PMID: 22423085

Available at:

http://archive-ouverte.unige.ch/unige:31853

Disclaimer: layout of this document may differ from the published version.


Behavioral/Systems/Cognitive

The Cost of Accumulating Evidence in Perceptual Decision Making

Jan Drugowitsch,1,3* Rubén Moreno-Bote,2,3* Anne K. Churchland,4 Michael N. Shadlen,5 and Alexandre Pouget3,6

1Institut National de la Santé et de la Recherche Médicale, École Normale Supérieure, 75005 Paris, France, 2Foundation Sant Joan de Déu, Parc Sanitari Sant Joan de Déu and Universitat de Barcelona, Esplugues de Llobregat, 08950 Barcelona, Spain, 3Department of Brain and Cognitive Sciences, University of Rochester, Rochester, New York 14627, 4Cold Spring Harbor Laboratory, Cold Spring Harbor, New York 11724, 5Howard Hughes Medical Institute, Department of Physiology and Biophysics and National Primate Center, University of Washington, Seattle, Washington 98195-7330, and 6Département des Neurosciences Fondamentales, Université de Genève, CH-1211 Geneva 4, Switzerland

Decision making often involves the accumulation of information over time, but acquiring information typically comes at a cost. Little is known about the cost incurred by animals and humans for acquiring additional information from sensory variables due, for instance, to attentional efforts. Through a novel integration of diffusion models and dynamic programming, we were able to estimate the cost of making additional observations per unit of time from two monkeys and six humans in a reaction time (RT) random-dot motion discrimination task. Surprisingly, we find that the cost is neither zero nor constant over time, but for the animals and humans features a brief period in which it is constant but increases thereafter. In addition, we show that our theory accurately matches the observed reaction time distributions for each stimulus condition, the time-dependent choice accuracy both conditional on stimulus strength and independent of it, and choice accuracy and mean reaction times as a function of stimulus strength. The theory also correctly predicts that urgency signals in the brain should be independent of the difficulty, or stimulus strength, at each trial.

Introduction

Decision making requires accumulation of evidence bearing on alternative propositions and a policy for terminating the process with a choice or plan of action. In general, this policy depends not only on the state of the accumulated evidence but also on the cost of acquiring this evidence and the expected reward following from the outcome of the decision. Consider, for example, a foraging animal engaged in a search for food in an area with potential predators. As it surveys the area to accumulate evidence for a predator, it loses time in the search for food. This cost is offset by a more significant, existential cost, should it fail to detect a predator. This simple example underscores a common feature of decision making: choosing a time at which to commit to a decision requires balancing decision certainty, accumulated costs, and expected rewards.

Previous research on decision making and its neural basis has focused mainly on the accumulation of evidence over time and the trade-off between accuracy and decision time (Green and Swets, 1966; Laming, 1968; Link and Heath, 1975; Ratcliff, 1978; Vickers, 1979; Gold and Shadlen, 2002, 2007; Bogacz et al., 2006). This trade-off is optimal under assumptions of no or constant costs associated with the accumulation of evidence and knowledge of the task difficulty (Wald and Wolfowitz, 1948). However, under more natural settings, the task difficulty is likely to be unknown, the accumulation costs might change over time, and there might be a loss of rewards due to delayed decisions, or different rewards for different decisions. Modeling and analyzing decision making in such settings requires us to take all of these factors into account.

In this article, we provide a formalism to determine the optimal behavior given a total description of the task, the rewards, and the costs. This formalism is more general than the sequential probability ratio test (SPRT) (Wald, 1947; Wald and Wolfowitz, 1948) as it allows the task difficulty to change between trials. It also differs from standard diffusion models (Laming, 1968; Link and Heath, 1975; Ratcliff, 1978; Ratcliff and Smith, 2004) by incorporating bounds that change as a function of elapsed time.

We apply our theory to data sets of two behaving monkeys and six human observers, to determine the cost of accumulating evidence from observed behavior. Based on these data, we show that the assumptions of no cost and of constant cost are unable to explain the observed behavior. Specifically, we report that, for both the animals and the humans, the cost of accumulating evidence remains almost constant initially, and then rises rapidly.

Furthermore, our theory predicts that the optimal rule to terminate the accumulation of evidence should be the same in all trials regardless of stimulus strength. At the neural level, this predicts that urgency signals should be independent of the difficulty, or stimulus strength. We confirm this prediction in neural recordings from the lateral intraparietal (LIP) area of cortical neurons.
Received July 25, 2011; revised Dec. 5, 2011; accepted Jan. 9, 2012.

Author contributions: J.D., R.M.-B., and A.P. designed research; J.D., R.M.-B., A.K.C., and A.P. performed research; J.D., R.M.-B., A.K.C., and M.N.S. analyzed data; J.D., R.M.-B., M.N.S., and A.P. wrote the paper.

This work was supported by National Science Foundation Grant BCS0446730 and Multidisciplinary University Research Initiative Grant N00014-07-1-0937 (A.P.). M.N.S. and A.P. are jointly supported by National Institute on Drug Abuse Grant BCS0346785 and a research grant from The James S. McDonnell Foundation. R.M.-B. is supported by the Spanish Ramón y Cajal program. We thank Peter Latham for helpful discussions and suggestions.

The authors declare no competing financial interests.

*J.D. and R.M.-B. contributed equally to this work.

Correspondence should be addressed to Jan Drugowitsch, Laboratoire de Neurosciences Cognitives, Département d'Études Cognitives, École Normale Supérieure, 29 rue d'Ulm, 75005 Paris, France. E-mail: jdrugo@gmail.com.

DOI: 10.1523/JNEUROSCI.4010-11.2012

Copyright © 2012 the authors 0270-6474/12/323612-17$15.00/0


Materials and Methods

Decision-making task. Here, we provide a technical description of the task and how to find the optimal behavior. More details and many helpful intuitions are provided in Results. We assume that the state of the world is either $H_1$ or $H_2$, and it is the aim of the decision maker to identify this state (indicated by a choice) based on stochastic evidence. This evidence $\delta x \sim N(\mu\delta t, \delta t)$ is Gaussian for some small time period $\delta t$, with mean $\mu\delta t$ and variance $\delta t$, where $|\mu|$ is the evidence strength, and $\mu \geq 0$ and $\mu < 0$ correspond to $H_1$ and $H_2$, respectively. Such stochastic evidence corresponds to a diffusion model $dx/dt = \mu + \eta(t)$, where $\eta(t)$ is white noise with unit variance, and $x(t)$ describes the trajectory of a drifting/diffusing particle. We assume the value of $\mu$ to be unknown to the decision maker, to be drawn from the prior $p(\mu)$ across trials, and to remain constant within a trial. After accumulating evidence $\delta x_{0 \ldots t}$ by observing the stimulus for some time $t$, the decision maker holds belief $g(t) \equiv p(H_1 \mid \delta x_{0 \ldots t}) = p(\mu \geq 0 \mid \delta x_{0 \ldots t})$ [or $1 - g(t)$] that $H_1$ (or $H_2$) is correct (see Fig. 1A). The exact form of this belief depends on the prior $p(\mu)$ over $\mu$ and will be discussed later for different priors. As long as this prior is symmetric, that is, $p(\mu \geq 0) = p(\mu < 0) = 1/2$, the initial belief at stimulus onset, $t = 0$, is always $g(0) = 1/2$.
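For concreteness, a minimal simulation sketch of this generative process, assuming NumPy; the drift and time step are illustrative values, not fitted parameters:

```python
import numpy as np

def simulate_trial(mu, dt=0.005, t_max=2.0, rng=None):
    """Draw momentary evidence dx ~ N(mu*dt, dt) and return the particle
    trajectory x(t), the cumulative sum of dx sampled every dt seconds."""
    rng = rng or np.random.default_rng()
    n_steps = int(round(t_max / dt))
    dx = rng.normal(loc=mu * dt, scale=np.sqrt(dt), size=n_steps)  # variance dt
    return np.cumsum(dx)

# Example: a weak stimulus with mu > 0 (corresponding to H1).
x = simulate_trial(mu=0.5)
```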

The decision maker receives reward $R_{ij}$ for choosing $H_i$ when $H_j$ is correct. Here, rewards can be positive or negative, allowing, for instance, for negative reinforcement when subjects pick the wrong hypothesis, that is, when $i$ is different from $j$. Additionally, we assume the accumulation of evidence to come at a cost (internal to the decision maker), given by the cost function $c(t)$. This cost is momentary, such that the total cost for accumulating evidence if a decision is made at decision time $T_d$ after stimulus onset is $C(T_d) = \int_0^{T_d} c(t)\,dt$ (see Fig. 1B). Each trial ends after $T_t$ seconds and is followed by the intertrial interval $t_i$ and an optional penalty time $t_p$ for wrong decisions (see Fig. 1C). We assume that decision makers aim at maximizing their reward rate, given by the following:

$$\rho = \frac{\langle R \rangle - \langle C(T_d) \rangle}{\langle T_t \rangle + \langle t_i \rangle + \langle t_p \rangle}, \tag{1}$$

where the averages are over choices, decision times, and randomizations of $t_i$ and $t_p$. We differentiate between fixed-duration tasks and reaction time tasks. In fixed-duration tasks, we assume $T_t$ to be fixed by the experimenter and to be large when compared with $T_d$, and $t_p = 0$. This makes the denominator of Equation 1 constant with respect to the subject's behavior, such that maximizing the reward rate $\rho$ becomes equal to maximizing the expected net reward $\langle R \rangle - \langle C(T_d) \rangle$ for a single trial. In contrast, in reaction time tasks, we need to consider the whole sequence of trials when maximizing $\rho$ because the denominator depends on the subject's reaction time through $T_t$, which, in turn, influences the ratio (provided that $T_t$ is not too short compared with the intertrial interval and penalty time).
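Equation 1 translates directly into an empirical estimate; a minimal sketch, assuming NumPy arrays of per-trial rewards, accumulated costs, and timings (all names are hypothetical):

```python
import numpy as np

def reward_rate(rewards, costs, trial_times, itis, penalties):
    """Empirical reward rate rho (Eq. 1): mean net reward per trial divided
    by the mean total time per trial (trial + intertrial + penalty time)."""
    net_reward = np.mean(rewards) - np.mean(costs)            # <R> - <C(Td)>
    total_time = np.mean(trial_times) + np.mean(itis) + np.mean(penalties)
    return net_reward / total_time
```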

Optimal behavior for fixed-duration tasks. We applied dynamic programming (Bellman, 1957; Bertsekas, 1995; Sutton and Barto, 1998) to derive the optimal strategy for repeated trials in fixed-duration tasks. As above, we present the technical details here and provide additional intuition in Results. At each point in time after stimulus onset, the decision maker can either accumulate more evidence, or choose either $H_1$ or $H_2$. As known from dynamic programming, the behavior that maximizes the expected net reward $\langle R \rangle - \langle C(T_d) \rangle$ can be found by computing the "expected return" $V(g, t)$. This quantity is the sum of all expected costs and rewards from time $t$ after stimulus onset onwards, given belief $g$ at time $t$, and assuming optimal behavior thereafter (that is, featuring behavior that maximizes the expected net reward). Since $g$ is by definition the belief that $H_1$ is correct, this expected return is $g R_{11} + (1 - g) R_{12}$ [or $(1 - g) R_{22} + g R_{21}$] for choosing $H_1$ (or $H_2$) immediately, while collecting evidence for another $\delta t$ seconds promises the expected return $\langle V(g(t + \delta t), t + \delta t) \rangle_{g(t + \delta t) \mid g, t}$ but comes at cost $c(t)\delta t$. The expectation of the future expected return is over the belief $p(g(t + \delta t) \mid g(t), t)$ that the decision maker expects to have at $t + \delta t$, given that she holds belief $g(t)$ at time $t$. This density over future beliefs depends on the prior $p(\mu)$. Performing at each point in time the action (choose $H_1$/$H_2$ or gather more evidence) that maximizes the expected return results in Bellman's equation for fixed-duration tasks as follows:

$$V(g, t) = \max\left\{\, g R_{11} + (1 - g) R_{12},\;\; (1 - g) R_{22} + g R_{21},\;\; \left\langle V(g(t + \delta t), t + \delta t) \right\rangle_{g(t + \delta t) \mid g, t} - c(t)\,\delta t \,\right\}. \tag{2}$$

For any fixed $t$, the $\max\{\cdot,\cdot,\cdot\}$ operator in the above expression partitions the belief space $\{g\}$ into three areas that determine for which beliefs accumulating more evidence is preferable to deciding immediately. For symmetric problems [when $R_{11} = R_{22}$, $R_{12} = R_{21}$, $\forall \mu: p(\mu) = p(-\mu)$], this partition is symmetric around $g = 1/2$. Over time, this results in two time-dependent boundaries, $g^*(t)$ and $1 - g^*(t)$, between which the decision maker ought to accumulate more evidence [Fig. 2 illustrates this concept; Fig. 3 provides examples for $g^*(t)$]. When a boundary is reached, the decision maker should decide in favor of the hypothesis associated with that boundary. The boundary shape is determined by the solution to Equation 2, which we find numerically by backward induction, as shown below.
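A single backward-induction step of Equation 2 can be sketched as follows, assuming NumPy; the transition matrix P is a hypothetical placeholder that must be built from the belief transition density p(g(t+δt) | g(t), t) derived below:

```python
import numpy as np

def bellman_step(V_next, g, P, R11, R12, R22, R21, cost, dt):
    """One step of Eq. 2: V(g, t) from V(g, t + dt) on a belief grid g.
    P[i, j] approximates p(g_j at t + dt | g_i at t)."""
    choose_h1 = g * R11 + (1 - g) * R12          # decide H1 immediately
    choose_h2 = (1 - g) * R22 + g * R21          # decide H2 immediately
    wait = P @ V_next - cost * dt                # accumulate more evidence
    return np.maximum(wait, np.maximum(choose_h1, choose_h2))
```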

Optimal behavior for reaction time tasks. We solved for the optimal behavior in reaction time tasks by maximizing the whole reward rate, Equation 1, rather than just its numerator. This reward rate depends not only on the expected net reward in the current trial, but through the expected trial time $T_t$ and the penalty time $t_p$ also on the behavior in all future trials. Thus, if we were to maximize the expected return $V(g, t)$ to determine the optimal behavior, we would need to formulate it for the whole sequence of trials (with $t = 0$ now indicating the stimulus onset of the first trial instead of that of each of the trials). However, this calculation is impractical for realistic numbers of trials. We therefore exploited a strategy used in "average reward reinforcement learning" (Mahadevan, 1996), which effectively penalizes the passage of time. Instead of maximizing $V(g, t)$, we use the "average-adjusted expected return" $\tilde V(g, t) = V(g, t) - \rho t$, which is the standard expected return minus $\rho t$ for the passage of some time $t$, where $\rho$ is the reward rate (for now assumed to be known). The strategy is based on the following idea. At the beginning of each trial, the decision maker expects to receive the reward rate $\rho$ times the time until the beginning of the next trial. This amount equals the expected return $V(1/2, 0)$ for a single trial, as above. Therefore, removing this amount from the expected return causes the average-adjusted expected return to become the same at the beginning of each trial. This adjustment allows us to treat all trials as if they were the same, single trial.

As for fixed-duration tasks, the optimal behavior is determined by, in any state, choosing the action that maximizes the average-adjusted expected return. Since the probability that option 1 is the correct one is, by definition, the belief $g$, choosing $H_1$ would result in the immediate expected reward $g R_{11} + (1 - g) R_{12}$. The expected intertrial interval is $\langle t_i \rangle + (1 - g)\,\bar t_p$, after which the average-adjusted expected return is $\tilde V(1/2, 0)$, with $t_i$ denoting the standard intertrial interval, $t_p$ the penalty time for wrong choices, and where $\bar t_p = \langle t_p \mid \text{wrong choice} \rangle$ (that is, assuming a penalty time that is randomized, has mean $\bar t_p$, and only occurs in trials where the wrong choice has been made). At the beginning of each trial, at $t = 0$, the belief held by the subject is $g = 1/2$, such that the average-adjusted expected return at this point is $\tilde V(1/2, 0)$. Thus, the average-adjusted expected return for choosing $H_1$ is $g R_{11} + (1 - g) R_{12} - (\langle t_i \rangle + (1 - g)\bar t_p)\rho + \tilde V(1/2, 0)$. Analogously, the average-adjusted expected return for choosing $H_2$ is $(1 - g) R_{22} + g R_{21} - (\langle t_i \rangle + g \bar t_p)\rho + \tilde V(1/2, 0)$. When collecting further evidence, one needs to only wait $\delta t$, such that the average-adjusted expected return for this action is $\langle \tilde V(g(t + \delta t), t + \delta t) \rangle_{g(t + \delta t) \mid g, t} - c(t)\delta t - \rho\delta t$. As we always ought to choose the action that maximizes this return, the expression for $\tilde V(g, t)$ is the maximum of the returns for the three available actions. This expression turns out to be invariant with respect to the resulting behavior under the addition of a constant (that is, replacing all occurrences of $\tilde V(g, t)$ by $\tilde V(g, t) + K$, where $K$ is an arbitrary constant, results in the same behavior). We remove this degree of freedom and at the same time simplify the equation by choosing the average-adjusted expected return at the beginning of each trial to be $\tilde V(1/2, 0) = 0$. This results in Bellman's equation for reaction time tasks to be the following:

$$\tilde V(g, t) = \max\left\{\, g R_{11} + (1 - g) R_{12} - \left(\langle t_i \rangle + (1 - g)\,\bar t_p\right)\rho,\;\; (1 - g) R_{22} + g R_{21} - \left(\langle t_i \rangle + g\,\bar t_p\right)\rho,\;\; \left\langle \tilde V(g(t + \delta t), t + \delta t) \right\rangle_{g(t + \delta t) \mid g, t} - c(t)\,\delta t - \rho\,\delta t \,\right\}. \tag{3}$$

The optimal behavior can be determined from the solution to this equation (see Results) (see Fig. 2A). Additionally, it allows us to find the reward rate: Bellman's equation is only consistent if, according to our previous choice, $\tilde V(1/2, 0) = 0$. Therefore, we can initially guess some $\rho$, resulting in some nonzero $\tilde V(1/2, 0)$, and then improve the estimate of $\rho$ iteratively by root finding until $\tilde V(1/2, 0) = 0$.
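The three arms of Equation 3 translate directly into code; a minimal sketch with hypothetical argument names, where expected_wait_return stands for the expectation term and the convention that the average-adjusted return at trial start is zero is already applied:

```python
def rt_action_values(g, expected_wait_return, rho, R11, R12, R22, R21,
                     ti, tp_bar, cost, dt):
    """Average-adjusted returns of the three actions in Eq. 3.
    ti is <t_i>, tp_bar is the mean penalty time for wrong choices."""
    choose_h1 = g * R11 + (1 - g) * R12 - (ti + (1 - g) * tp_bar) * rho
    choose_h2 = (1 - g) * R22 + g * R21 - (ti + g * tp_bar) * rho
    wait = expected_wait_return - cost * dt - rho * dt  # keep accumulating
    return choose_h1, choose_h2, wait
```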

Optimal behavior with minimum reward time. The experimental protocol used to collect the behavioral data from monkeys differed from a pure reaction time task because of implementation of a "minimum reward time" $t_r$. When the monkeys decided before $t_r$, the delivery of the reward was delayed until $t_r$, whereas decisions after $t_r$ resulted in immediate reward. Since the intertrial interval began after the reward was administered, the minimum reward time effectively extends the time the decision maker has to wait until the next stimulus onset on some trials. This is captured by a decision time-dependent effective intertrial interval, given by $t_{i,\text{eff}}(t) = \langle t_i \rangle + \max\{0, t_r - t\}$. If the decision is made before passage of the minimum reward time such that $t < t_r$, then the effective intertrial interval is $t_{i,\text{eff}}(t) = \langle t_i \rangle + t_r - t$, which is larger than the standard intertrial interval $\langle t_i \rangle$. Otherwise, if $t \geq t_r$, the effective intertrial interval equals the standard intertrial interval, that is, $t_{i,\text{eff}}(t) = \langle t_i \rangle$. This change allows us to apply the Bellman equation for reaction time tasks to the monkey experiments. We simply replace $\langle t_i \rangle$ by $t_{i,\text{eff}}(t)$ in Equation 3.
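In code, the effective intertrial interval is a one-line function; a minimal sketch:

```python
def effective_iti(t, mean_iti, t_r):
    """Decision time-dependent effective intertrial interval,
    t_i_eff(t) = <t_i> + max(0, t_r - t), for minimum reward time t_r."""
    return mean_iti + max(0.0, t_r - t)
```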

Solving Bellman's equation. Finding the optimal behavior in either task type requires solving the respective Bellman equation. To do so, we assume all task contingencies [that is, prior $p(\mu)$, the cost function $c(t)$, and task timings $t_i$ and $t_p$] to be known, such that $V(g, t)$ for $g \in (0, 1)$ and $t \in [0, \infty)$ remains to be computed. As no analytical solution is known for the general case we treat here, we solve the equation numerically by discretizing both belief and time (Brockwell and Kadane, 2003). We discretized $g$ into 500 equally sized steps while skipping $g = 0$ and $g = 1$ to avoid singularities in $p(g(t + \delta t) \mid g(t), t)$. The step size in time $\delta t$ cannot be made too small, as a smaller $\delta t$ requires a finer discretization in $g$ to represent $p(g(t + \delta t) \mid g(t), t)$ adequately. We have chosen $\delta t = 5$ ms for all computations presented in Results, but smaller values (for example, $\delta t = 2$ ms, as applied to some test cases) gave identical results.

For fixed-duration tasks, the discretized version of $V(g, t)$ can be solved by backward induction: if we know $V(g, T)$ for some $T$ and all $g$, we can compute $V(g, T - \delta t)$ by use of Equation 2. Consecutively, this allows us to compute $V(g, T - 2\delta t)$ using the same equation, and in this way evaluate $V(g, t)$ iteratively for all $t = T - \delta t, T - 2\delta t, \ldots, 0$ by working backward in time. The only problem that remains is to find a $T$ for which we know the initial condition $V(g, T)$. We approach this problem by assuming that, after a very long time $T$, the decision maker is guaranteed to commit to a decision. Thus, at this time, the only available options are to choose either $H_1$ or $H_2$. As a consequence, the expected return at time $T$ is $V(g, T) = \max\{g R_{11} + (1 - g) R_{12},\ (1 - g) R_{22} + g R_{21}\}$, which equals Equation 2 if one removes the possibility of accumulating further evidence. This $V(g, T)$ can be evaluated regardless of $V(g, T - \delta t)$ and is thus known. $T$ was chosen to be five times the time frame of interest. For example, if all decisions occurred within 2 s after stimulus onset, we used $T = 10$ s. No significant change in $V(g, t)$ (within the time of interest) was found by setting $T$ to larger values.
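The full backward induction can be sketched as follows, assuming NumPy; the per-step transition matrices P[k] are hypothetical placeholders, to be precomputed from the belief transition densities derived below, and the reward and cost arguments are illustrative:

```python
import numpy as np

def solve_fixed_duration(P, g, R11, R12, R22, R21, cost, dt, T):
    """Backward induction for Eq. 2 on a discretized belief grid g.
    P: list of transition matrices, one per time step.
    cost: callable c(t). Returns V[k, i] ~ V(g_i, k*dt) for k*dt in [0, T]."""
    n_t = int(round(T / dt))
    decide = np.maximum(g * R11 + (1 - g) * R12,
                        (1 - g) * R22 + g * R21)
    V = np.empty((n_t + 1, g.size))
    V[n_t] = decide                          # at time T, deciding is forced
    for k in range(n_t - 1, -1, -1):         # work backward in time
        wait = P[k] @ V[k + 1] - cost(k * dt) * dt
        V[k] = np.maximum(decide, wait)
    return V
```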

We find the average-adjusted expected return $\tilde V(g, t)$ similarly to $V(g, t)$ by discretization and backward induction on Equation 3. However, as $\rho$ is unknown, we need to initially assume its value, compute $\tilde V(1/2, 0)$, and then adjust $\rho$ iteratively by root finding [$\tilde V(1/2, 0)$ changes monotonically with $\rho$] until $\tilde V(1/2, 0) = 0$.
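Since the origin value changes monotonically with ρ, a bracketing root finder suffices; a minimal sketch using scipy.optimize.brentq, where v_tilde_origin is a hypothetical helper that solves Equation 3 by backward induction for a given ρ and returns the resulting value at (g, t) = (1/2, 0):

```python
from scipy.optimize import brentq

def fit_reward_rate(v_tilde_origin, rho_lo=0.0, rho_hi=10.0):
    """Find rho such that V~(1/2, 0; rho) = 0 (consistency of Eq. 3).
    v_tilde_origin must change sign across [rho_lo, rho_hi]."""
    return brentq(v_tilde_origin, rho_lo, rho_hi)
```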

Belief for Gaussian priors. The data we analyzed used a set of discrete motion strengths, and we used a prior on $\mu$ reflecting this structure in our model fits (see next section). Nonetheless, we provide here the expression for belief resulting from assuming a Gaussian prior $p(\mu) = N(\mu \mid 0, \sigma^2)$ for evidence strength over trials, as this prior leads to more intuitive mathematical expressions. The prior assumes that the evidence strength $|\mu|$ is most likely small but occasionally can take larger values.

We find the posterior of $\mu$ given all evidence $\delta x_{0 \ldots t}$ up to time $t$ by Bayes' rule, $p(\mu \mid \delta x_{0 \ldots t}) \propto p(\mu) \prod_n N(\delta x_n \mid \mu\delta t, \delta t)$, resulting in

$$p(\mu \mid \delta x_{0 \ldots t}) = N\left(\mu \,\middle|\, \frac{x(t)}{t + \sigma^{-2}},\ \frac{1}{t + \sigma^{-2}}\right), \tag{4}$$

where we have used the sufficient statistics $t = \sum_n \delta t$ and $x(t) = \sum_n \delta x_n$. With the above, the belief $g(t) = p(\mu \geq 0 \mid \delta x_{0 \ldots t})$ is given by the following:

$$g(t) = \Phi\left(\frac{x(t)}{\sqrt{t + \sigma^{-2}}}\right), \tag{5}$$

where $\Phi(\cdot)$ is the standard normal cumulative function. Thus, the belief is $g(t) > 1/2$ if $x(t) > 0$, $g(t) = 1/2$ if $x(t) = 0$, and $g(t) < 1/2$ if $x(t) < 0$. The mapping between $x(t)$ and $g(t)$ given $t$ is one-to-one and is inverted by the following:

$$x(t) = \sqrt{t + \sigma^{-2}}\ \Phi^{-1}(g(t)), \tag{6}$$

where $\Phi^{-1}(\cdot)$ is the inverse of a standard normal cumulative function.
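Equations 5 and 6 are straightforward to implement; a minimal sketch assuming SciPy:

```python
import numpy as np
from scipy.stats import norm

def belief_gaussian(x, t, sigma2):
    """Eq. 5: g(t) = Phi( x(t) / sqrt(t + sigma^-2) ) for prior N(0, sigma2)."""
    return norm.cdf(x / np.sqrt(t + 1.0 / sigma2))

def particle_gaussian(g, t, sigma2):
    """Eq. 6: inverse mapping x(t) = sqrt(t + sigma^-2) * Phi^-1(g)."""
    return np.sqrt(t + 1.0 / sigma2) * norm.ppf(g)
```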

Belief for general symmetric priors. Assume that the evidence strength can take one of $M$ specific non-negative values $\{\mu_1, \ldots, \mu_M\}$, at least one of which is strictly positive, and some evidence strength $\mu_m$ is chosen at the beginning of each trial with probability $p_m$, satisfying $\sum_m p_m = 1$. Let the prior $p(\mu)$ be given by $p(\mu = \mu_m) = p(\mu = -\mu_m) = p_m/2$ for all $m = 1, \ldots, M$. In some cases, one might want to introduce a point mass at $\mu_m = 0$. To share such a mass equally between $\mu \geq 0$ and $\mu < 0$, we handle this case by transforming this single point mass into equally weighted point masses $p(\mu = \epsilon) = p(\mu = -\epsilon) = p_m/2$ for some arbitrarily small $\epsilon$. The resulting prior is symmetric, that is, $p(\mu) = p(-\mu)$ for all $\mu$, and general, as it can describe all discrete probability distributions that are symmetric around 0. It can be easily extended to also include continuous distributions.

With this prior, the posterior of $\mu$ having the value $\mu_m$ given all evidence $\delta x_{0 \ldots t}$ up to time $t$ is as before found by Bayes' rule, and results in the following:

$$p(\mu = \mu_m \mid x(t), t) = \frac{p_m\, e^{x(t)\mu_m - \frac{t}{2}\mu_m^2}}{\sum_n p_n\, e^{-\frac{t}{2}\mu_n^2}\left(e^{x(t)\mu_n} + e^{-x(t)\mu_n}\right)}. \tag{7}$$

The belief at time $t$ is defined as $g(t) = p(\mu \geq 0 \mid \delta x_{0 \ldots t})$ and is therefore given by the following:

$$g(t) = \frac{\sum_m p_m\, e^{x(t)\mu_m - \frac{t}{2}\mu_m^2}}{\sum_n p_n\, e^{-\frac{t}{2}\mu_n^2}\left(e^{x(t)\mu_n} + e^{-x(t)\mu_n}\right)}. \tag{8}$$

It has the same general properties as for a Gaussian prior and is strictly increasing with $x(t)$, such that the mapping between $g(t)$ and $x(t)$ is one-to-one and can be efficiently inverted by root finding.
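A minimal sketch of Equation 8 and its inversion by root finding, assuming NumPy and SciPy; mus and ps are the arrays of evidence strengths and their probabilities:

```python
import numpy as np
from scipy.optimize import brentq

def belief_discrete(x, t, mus, ps):
    """Eq. 8: belief g(t) for the discrete symmetric prior
    p(mu = +/- mus[m]) = ps[m] / 2. Direct form, adequate for moderate x*mu."""
    w = ps * np.exp(-0.5 * t * mus**2)
    num = np.sum(w * np.exp(x * mus))
    den = np.sum(w * (np.exp(x * mus) + np.exp(-x * mus)))
    return num / den

def particle_discrete(g, t, mus, ps, x_max=100.0):
    """Invert Eq. 8 by root finding; g -> x(t) is one-to-one and increasing."""
    return brentq(lambda x: belief_discrete(x, t, mus, ps) - g, -x_max, x_max)
```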

Accumulating evidence in the presence of a bound. The mapping between $g(t)$ and $x(t)$ in Equations 5 and 8 was derived without a bound in particle space $\{x\}$. Here, we show that it also holds in the presence of a bound (Moreno-Bote, 2010). This property is critical because it underlies the assertion that the diffusion model with time-varying boundaries performs optimally (see Results). Intuitively, the crucial feature of the mapping between $g(t)$ and $x(t)$ is that $g(t)$ does not depend on the whole trajectory of the particle up to time $t$ but only on its current location $x(t)$. As such, it is valid for all possible particle trajectories that end in this location. If we now introduce some arbitrary bounds in particle space and remove all particle trajectories that have crossed the bound before $t$, there are potentially fewer trajectories that lead to $x(t)$, but the endpoint of these leftover trajectories remains unchanged and so does their mapping to $g(t)$. Therefore, this mapping is valid even in the presence of arbitrary bounds in particle space.

More formally, assume two time-varying bounds, upper bound $\theta_1(t)$ and lower bound $\theta_2(t)$, with $\theta_2(0) < 0 < \theta_1(0)$ and $\theta_1(t) > \theta_2(t)$ for all $t \geq 0$. Fix some time $t$ of interest, and let $\bar x$ denote a particle trajectory with location $\bar x(s)$ for $s \leq t$. Also, let $\Omega(x(t)) = \{\bar x : \theta_2(s) < \bar x(s) < \theta_1(s)\ \forall s < t,\ \bar x(t) = x(t)\}$ denote the set of all particle trajectories that have location $x(t)$ at $t$ and did not reach either bound before that. The posterior of $\mu$ given only the trajectory endpoint is thus given by the following:

$$p(\mu \mid x(t)) \propto p(\mu)\, p(x(t) \mid \mu) = p(\mu) \int_{\Omega(x(t))} p(\bar x \mid \mu)\, d\bar x, \tag{9}$$

where $\propto$ denotes a proportionality with respect to $\mu$. The path integral sums over all trajectories that have not reached the bound before $t$. Considering a single one of these trajectories, it can be split into small steps $\delta \bar x(t) = \bar x(t + \delta t) - \bar x(t)$, distributed as $N(\delta \bar x(t) \mid \mu\delta t, \delta t)$, such that its probability is given by the following:

$$p(\bar x \mid \mu) = \prod_{n=0}^{t/\delta t} \frac{1}{\sqrt{2\pi\delta t}}\, e^{-\frac{(\delta \bar x(n\delta t) - \mu\delta t)^2}{2\delta t}} = D(\bar x)\, e^{x(t)\mu - \frac{t}{2}\mu^2}, \tag{10}$$

where $x(t) = \sum_{n=0}^{t/\delta t} \delta \bar x(n\delta t)$ and $t = \sum_{n=0}^{t/\delta t} \delta t$ was used, and $D(\bar x)$ is a function of the trajectory that captures all terms that are independent of $\mu$. This already clearly shows that, as a likelihood of $\mu$, $p(\bar x \mid \mu)$ is proportional to some function of $x(t)$ and $t$, with only its proportionality factor being dependent on the full trajectory. Using this expression in the posterior of $\mu$ results in the following:

$$p(\mu \mid x(t)) \propto p(\mu)\, e^{x(t)\mu - \frac{t}{2}\mu^2} \int_{\Omega(x(t))} D(\bar x)\, d\bar x \propto p(\mu)\, e^{x(t)\mu - \frac{t}{2}\mu^2}. \tag{11}$$

Therefore, the posterior of $\mu$ does not depend on $\Omega(x(t))$ and is thus independent of the presence and shape of boundaries. As the belief $g(t)$ is fully defined by the posterior of $\mu$, it shares the same properties.
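This invariance can also be checked empirically; the following Monte Carlo sketch (purely illustrative, with arbitrary parameter values, and not part of the original analysis) compares the fraction of trajectories generated with μ > 0 among those ending near a fixed x(t), with and without discarding trajectories that crossed a symmetric bound:

```python
import numpy as np

def bound_invariance_check(mu_vals=(0.256, -0.256), dt=0.01, t=1.0,
                           theta=1.5, n=50_000, x0=0.5, tol=0.05, seed=0):
    """Among trajectories ending near x(t) = x0, the fraction generated by
    mu > 0 should be the same whether or not trajectories that crossed
    |x| = theta are discarded (up to sampling noise)."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(mu_vals, size=n)
    steps = rng.normal(mu[:, None] * dt, np.sqrt(dt), size=(n, int(t / dt)))
    x = np.cumsum(steps, axis=1)
    near = np.abs(x[:, -1] - x0) < tol              # endpoint near x0
    survived = np.all(np.abs(x) < theta, axis=1)    # never hit the bound
    p_unbounded = np.mean(mu[near] > 0)
    p_bounded = np.mean(mu[near & survived] > 0)
    return p_unbounded, p_bounded
```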

Belief transition densities. Solving Bellman's equation to find the optimal behavior requires us to evaluate $\langle V(g(t + \delta t), t + \delta t) \rangle_{g(t + \delta t) \mid g, t}$, which is a function of $V(g(t + \delta t), t + \delta t)$ and $p(g(t + \delta t) \mid g(t), t)$. Here, we derive an expression for $p(g(t + \delta t) \mid g(t), t)$ for small $\delta t$, for a Gaussian prior, and for a general symmetric prior on $\mu$.

In both cases, the procedure is the same: using the one-to-one mapping between $g$ and $x$, we get $p(g(t + \delta t) \mid g(t), t)$ from the transformation $p(g(t + \delta t) \mid g(t), t)\, dg(t + \delta t) = p(x(t + \delta t) \mid x(t), t)\, dx(t + \delta t)$. The right-hand side of this expression describes a density over future particle locations $x(t + \delta t)$ after some time $\delta t$ given the current particle location $x(t)$. The latter allows us to infer about the value of $\mu$, from which we can deduce the future location by $p(x(t + \delta t) \mid x(t), t) = \int p(x(t + \delta t) \mid \mu, x(t), t)\, p(\mu \mid x(t), t)\, d\mu$. Given $\mu$, the future particle location is $p(x(t + \delta t) \mid \mu, x(t), t) = N(x(t + \delta t) \mid x(t) + \mu\delta t, \delta t)$ due to the standard diffusion from known location $x$. This expression ignores the presence of a bound during the brief time interval $\delta t$. Therefore, our approach is valid in the limit in which $\delta t$ goes to zero. Practically, a sufficiently small $\delta t$ (as used when solving Bellman's equation) will cause the error due to ignoring the bound to be negligible. Note that $p(\mu \mid x(t), t)$ depends on the prior $p(\mu)$, and so does $dx(t + \delta t)/dg(t + \delta t)$.

If $p(\mu)$ is Gaussian, $p(\mu) = N(\mu \mid 0, \sigma^2)$, the mapping between $g(t)$ and $x(t)$ is given by Equations 5 and 6. For the mapping between $g(t + \delta t)$ and $x(t + \delta t)$, $t$ in these equations has to be replaced by $t + \delta t$. The posterior of $\mu$ given $x(t)$ and $t$ is given by Equation 4, such that the particle transition density results in $p(x(t + \delta t) \mid x(t), t) = N(x(t + \delta t) \mid x(t)(1 + \delta t_{\text{eff}}),\ \delta t (1 + \delta t_{\text{eff}}))$, where we have defined $\delta t_{\text{eff}} = \delta t / (t + \sigma^{-2})$. As $g(t + \delta t)$ is a cumulative function of $x(t + \delta t)$, its derivative with respect to $x(t + \delta t)$ is the Gaussian $dg(t + \delta t)/dx(t + \delta t) = N(x(t + \delta t) \mid 0,\ t + \delta t + \sigma^{-2})$. Combining all of the above and replacing all $x(t)$ and $x(t + \delta t)$ with their mapping to $g(t)$ and $g(t + \delta t)$ results, after some simplification, in the following:

$$p(g(t + \delta t) \mid g(t), t) = \frac{1}{\sqrt{\delta t_{\text{eff}}}} \exp\left( -\frac{\left[\Phi^{-1}(g(t + \delta t)) - \sqrt{1 + \delta t_{\text{eff}}}\, \Phi^{-1}(g(t))\right]^2}{2\,\delta t_{\text{eff}}} + \frac{1}{2}\left[\Phi^{-1}(g(t + \delta t))\right]^2 \right). \tag{12}$$
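A direct transcription of Equation 12; a minimal sketch assuming NumPy and SciPy:

```python
import numpy as np
from scipy.stats import norm

def belief_transition_gaussian(g_next, g, t, dt, sigma2):
    """Eq. 12: density of g(t + dt) given g(t) under the Gaussian prior
    N(0, sigma2), with dt_eff = dt / (t + sigma^-2)."""
    dt_eff = dt / (t + 1.0 / sigma2)
    u = norm.ppf(g_next)                 # Phi^-1(g(t + dt))
    v = norm.ppf(g)                      # Phi^-1(g(t))
    return (1.0 / np.sqrt(dt_eff)) * np.exp(
        -(u - np.sqrt(1.0 + dt_eff) * v) ** 2 / (2.0 * dt_eff) + 0.5 * u ** 2)
```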

If $p(\mu)$ is the previously introduced discrete symmetric prior, we have the mapping between $g(t)$ and $x(t)$ given by Equation 8. To simplify notation, define

$$a_m = p_m\, e^{x(t)\mu_m - \frac{t}{2}\mu_m^2}, \qquad \tilde a_m = p_m\, e^{x(t+\delta t)\mu_m - \frac{t+\delta t}{2}\mu_m^2},$$
$$b_m = p_m\, e^{-x(t)\mu_m - \frac{t}{2}\mu_m^2}, \qquad \tilde b_m = p_m\, e^{-x(t+\delta t)\mu_m - \frac{t+\delta t}{2}\mu_m^2}, \tag{13}$$

such that the mapping between $g(t)$ and $x(t)$ can be written as $g(t) = \sum_m a_m / \sum_n (a_n + b_n) = f_t(x(t))$. Even though $f_t(x(t))$ is not analytically invertible, it is strictly increasing with $x(t)$ and can therefore be easily inverted by root finding. The same mapping is established between $g(t + \delta t)$ and $x(t + \delta t)$ by replacing all $a$ and $b$ by $\tilde a$ and $\tilde b$. Using the same notation, the posterior of $\mu$ given $x(t)$ and $t$ (Eq. 7) is $p(\mu = \mu_m \mid x(t), t) = a_m / \sum_n (a_n + b_n)$ and $p(\mu = -\mu_m \mid x(t), t) = b_m / \sum_n (a_n + b_n)$, such that the particle transition density after some simplification is as follows:

$$p(x(t + \delta t) \mid x(t), t) = N(x(t + \delta t) \mid x(t), \delta t)\, \frac{\sum_m (\tilde a_m + \tilde b_m)}{\sum_n (a_n + b_n)}. \tag{14}$$

Based on the mapping between $g(t + \delta t)$ and $x(t + \delta t)$, we also find the following:

$$\frac{dg(t + \delta t)}{dx(t + \delta t)} = \frac{\left(\sum_m \mu_m \tilde a_m\right)\left(\sum_n (\tilde a_n + \tilde b_n)\right) - \left(\sum_n \tilde a_n\right)\left(\sum_m \mu_m (\tilde a_m - \tilde b_m)\right)}{\left(\sum_n (\tilde a_n + \tilde b_n)\right)^2}. \tag{15}$$

Combining all of the above results in the following belief transition density:

$$p(g(t + \delta t) \mid g(t), t) = N(x(t + \delta t) \mid x(t), \delta t)\, \frac{\left(\sum_n (\tilde a_n + \tilde b_n)\right)^3}{\left[\left(\sum_m \mu_m \tilde a_m\right)\left(\sum_n (\tilde a_n + \tilde b_n)\right) - \left(\sum_m \tilde a_m\right)\left(\sum_n \mu_n (\tilde a_n - \tilde b_n)\right)\right] \sum_n (a_n + b_n)}. \tag{16}$$
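Equations 13-16 combine into a short function; a minimal sketch assuming NumPy, parameterized through the particle locations x(t) and x(t+δt) (obtained from beliefs by inverting Equation 8):

```python
import numpy as np

def belief_transition_discrete(x_next, x, t, dt, mus, ps):
    """Eqs. 13-16: density of g(t + dt) given g(t) for the discrete
    symmetric prior, expressed through x = x(t) and x_next = x(t + dt)."""
    a = ps * np.exp(x * mus - 0.5 * t * mus**2)               # Eq. 13
    b = ps * np.exp(-x * mus - 0.5 * t * mus**2)
    a_t = ps * np.exp(x_next * mus - 0.5 * (t + dt) * mus**2)
    b_t = ps * np.exp(-x_next * mus - 0.5 * (t + dt) * mus**2)
    diffusion = np.exp(-(x_next - x) ** 2 / (2 * dt)) / np.sqrt(2 * np.pi * dt)
    s = np.sum(a_t + b_t)
    p_x = diffusion * s / np.sum(a + b)                       # Eq. 14
    dg_dx = (np.sum(mus * a_t) * s
             - np.sum(a_t) * np.sum(mus * (a_t - b_t))) / s**2  # Eq. 15
    return p_x / dg_dx                                        # Eq. 16
```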

Belief equals choice accuracy. To compute the cost function that corresponds to some observed behavior, we need access to the decision maker's belief at decision time. We do so by establishing (shown below) that, for the family of diffusion models with time-varying boundaries that we use, the decision maker's belief at decision time equals the probability of making the correct choice (as observed by the experimenter) at that time. Note that showing this for the family of diffusion models does not necessarily imply that the belief held by the actual decision maker equals her choice accuracy, especially if the evidence-integration model of the decision maker does not correspond to a diffusion model or she has an incorrect model of the world. However, we will assume that subjects follow such optimal diffusion models, and therefore we infer the decision maker's belief of being correct at some time by counting which fraction of choices performed at this time lead to a correct decision.
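Accordingly, the belief at decision time can be estimated from behavior by binning decisions in time; a minimal sketch assuming NumPy, with hypothetical array names:

```python
import numpy as np

def accuracy_by_decision_time(decision_times, correct, bin_width=0.1):
    """Estimate the belief at decision time as the fraction of correct
    choices among decisions made within each reaction-time bin."""
    bins = np.arange(0.0, decision_times.max() + bin_width, bin_width)
    idx = np.digitize(decision_times, bins) - 1
    acc = np.array([np.mean(correct[idx == k]) if np.any(idx == k) else np.nan
                    for k in range(len(bins) - 1)])
    return bins[:-1] + bin_width / 2, acc  # bin centers, accuracy ~ belief
```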
