Bayesian optimal design for ordinary differential equation models with application in biological science

Antony M. Overstall∗†, David C. Woods† and Ben M. Parker‡

† Southampton Statistical Sciences Research Institute, University of Southampton, United Kingdom

‡ School of Computing and Engineering, University of West London, United Kingdom

Abstract

Bayesian optimal design is considered for experiments where the response distribution depends on the solution to a system of non-linear ordinary differential equations. The motivation is an experiment to estimate parameters in the equations governing the transport of amino acids through cell membranes in human placentas. Decision-theoretic Bayesian design of experiments for such nonlinear models is conceptually very attractive, allowing the formal incorporation of prior knowledge to overcome the parameter dependence of frequentist design and being less reliant on asymptotic approximations. However, the necessary approximation and maximization of the, typically analytically intractable, expected utility results in a computationally challenging problem. These issues are further exacerbated if the solution to the differential equations is not available in closed form. This paper proposes a new combination of a probabilistic solution to the equations embedded within a Monte Carlo approximation to the expected utility with cyclic descent of a smooth approximation to find the optimal design. A novel precomputation algorithm reduces the computational burden, making the search for an optimal design feasible for bigger problems. The methods are demonstrated by finding new designs for a number of common models derived from differential equations, and by providing optimal designs for the placenta experiment.

Keywords: Approximate coordinate exchange algorithm; decision-theoretic design; Gaussian process emulation; nonlinear design.

∗ CONTACT Antony M. Overstall; A.M.Overstall@southampton.ac.uk; Southampton Statistical Sciences Research Institute, University of Southampton, Southampton, SO17 1BJ, UK.


1 Introduction

The dynamics behind a complex physical process can often be described by a set of non-linear ordinary differential equations, where the solution to these equations represents the evolution of system states with respect to time. It is common for the system of equations to depend on some unknown physical properties (parameters) of the process in question and, potentially, on some additional controllable variables. In this paper, new methods are presented for designing experiments for the estimation of statistical models built on the solution to such a system of equations; that is, choosing the most informative combinations of time points and values of the controllable (design) variables at which observations of the physical process should be made. A decision-theoretic approach is adopted, and hence the quality of a design is measured via the expectation of a utility function chosen to encapsulate the aims of the experiment.

We assume equations with s system states u(t; x, θ) = [u_1(t; x, θ), . . . , u_s(t; x, θ)]^T, modeled as a function of time t and v design variables with values held in the treatment vector x ∈ X ⊂ R^v. The p-vector θ ∈ Θ ⊂ R^p holds the physical parameters requiring estimation.

For notational simplicity, the dependence of the system states on x and θ is usually suppressed, with u(t) = u(t; x, θ), unless multiple treatments or parameter vectors are being considered. We mostly find designs for initial value problems, with u(t) defined via equations

$$\dot{u}(t) = f\left(u(t), t, x; \theta\right) \quad \text{for } t \in T = [T_0, T_1],\ 0 \le T_0 < T_1, \qquad u(T_0) = u_0, \tag{1}$$

where u̇(t) is the gradient vector of u(t) with respect to time t, u_0 = (u_{01}, . . . , u_{0s})^T ∈ R^s denotes initial conditions and, for given θ, f : R^s × T × X → R^s is a continuous function satisfying the Lipschitz condition (see Iserles, 2009, p. 3). This latter assumption ensures equation (1) has a unique solution.

Our research is motivated by experiments to study the transport of serine, an amino acid, within a human placenta. Specifically, interest is in the movement of serine across a placental cell membrane (called a vesicle). In the experiments, initial amounts (µl) of both radioactive and non-radioactive serine are placed exterior and interior to the vesicle, and then the amount of radioactive serine interior to the vesicle is measured at a series of time points. The experimenters have control over initial amounts of both the interior and exterior non-radioactive serine for each experiment, and the times (in seconds) at which observations are taken. The theoretical interior amounts of radioactive and non-radioactive serine at time t form the s = 2 system states, u(t) = [u_1(t), u_2(t)]^T, with the v = 2 design variables, x = (x_1, x_2)^T ∈ [0, 1000]^2, being, respectively, the exterior amounts of radioactive and non-radioactive serine at time t = 0. The equations governing the evolution of the system states are

$$\begin{aligned}
\dot{u}_1(t) &= \frac{x_1\left(u_2(t) + \theta_4\right) - u_1(t)\left(x_2 + \theta_3\right)}{u^\star(u(t), t, \theta, x)}, & u_1(0) &= u_{01},\\
\dot{u}_2(t) &= \frac{x_2\left(u_1(t) + \theta_4\right) - u_2(t)\left(x_1 + \theta_3\right)}{u^\star(u(t), t, \theta, x)}, & u_2(0) &= u_{02},
\end{aligned} \qquad t \in [0, 600], \tag{2}$$

where

$$u^\star(u(t), t, \theta, x) = \frac{1}{\theta_1}\left\{2 x^\star_{12}\, u^\star_{12}(t) + (1 + \theta_2)\left[\theta_4 x^\star_{12} + \theta_3 u^\star_{12}(t)\right] + 2\theta_3\theta_4\right\},$$

u^⋆_{12}(t) = u_1(t) + u_2(t), x^⋆_{12} = x_1 + x_2, and initial conditions u_0 = (u_{01}, u_{02})^T ∈ [0, 1000]^2 are the amounts of radioactive and non-radioactive serine interior to the vesicle at time t = 0. Here, the four physical parameters correspond to the maximum uptake (θ_1), the proportion of the reaction occurring through active transport (θ_2) and two reaction rates (θ_3 and θ_4). The values of these parameters are of scientific interest. See Panitchob et al. (2015) and Widdows et al. (2017) for further details of the model and experiment.
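As a concrete check of the dynamics, the system can be integrated numerically. The sketch below uses SciPy's `solve_ivp` with the parameter and design values quoted for Figure 1 in Section 2 (θ = (200, 0.05, 100, 100)^T, u_0 = (0, 1000)^T, x = (7.5, 1000)^T); the right-hand side implements the equations as reconstructed above, so treat it as an illustration rather than the authors' reference implementation.

```python
# Sketch: numerically solving the serine-transport system (2) with SciPy.
import numpy as np
from scipy.integrate import solve_ivp

def serine_rhs(t, u, theta, x):
    """Right-hand side of system (2) for states u = (u1, u2)."""
    th1, th2, th3, th4 = theta
    x1, x2 = x
    u12 = u[0] + u[1]          # u*_12(t) = u1(t) + u2(t)
    x12 = x1 + x2              # x*_12 = x1 + x2
    # u*(u(t), t, theta, x) as defined after equation (2)
    ustar = (2.0 * x12 * u12
             + (1.0 + th2) * (th4 * x12 + th3 * u12)
             + 2.0 * th3 * th4) / th1
    du1 = (x1 * (u[1] + th4) - u[0] * (x2 + th3)) / ustar
    du2 = (x2 * (u[0] + th4) - u[1] * (x1 + th3)) / ustar
    return [du1, du2]

# Values used for Figure 1: theta = (200, 0.05, 100, 100)^T,
# u0 = (0, 1000)^T, x = (7.5, 1000)^T, t in [0, 600] seconds.
theta = (200.0, 0.05, 100.0, 100.0)
x = (7.5, 1000.0)
u0 = (0.0, 1000.0)
sol = solve_ivp(serine_rhs, (0.0, 600.0), u0, args=(theta, x),
                t_eval=np.linspace(0.0, 600.0, 61), rtol=1e-8, atol=1e-8)
u1_final, u2_final = sol.y[0, -1], sol.y[1, -1]
```

With these values u_1(t) rises towards roughly x_1 = 7.5 while u_2(t) dips before recovering towards 1000, consistent with the axis ranges shown for Figure 1 in Section 2.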

To model experimental data from a physical process governed by (1), we build a statistical model linking the physical parameters to noisy observations of the system states, or functions thereof, via an assumed data-generating process dependent on the solution to the equations (see, for example, Ramsay et al. 2007). We also assume that an experiment can be conducted where these observations are collected at various different times and, possibly, from multiple runs of the experiment with different combinations of values of the design variables. Let n denote the number of runs in the experiment, with the jth run being made for treatment x_j = (x_{1j}, . . . , x_{vj})^T and initial conditions u_{0j} = (u_{j01}, . . . , u_{j0s})^T, with observations being made at time points t_j = (t_{j1}, . . . , t_{jn_j})^T (j = 1, . . . , n). At each time point, observations y_{jl} ∈ Y_{jl} ⊂ R^c are taken on c ≤ s different responses. Let y_j^T = (y_{j1}^T, . . . , y_{jn_j}^T) be the cn_j-vector of observations from the jth run, and y = (y_1^T, . . . , y_n^T)^T be the vector of observations from the whole experiment.


We describe the experimental data y using the statistical model

$$y \mid \theta, \gamma, d \sim F\left(\theta, \gamma; d\right), \tag{3}$$

with F a specified probability distribution, γ ∈ Γ ⊂ R^q a q-vector of nuisance parameters, and d ∈ D a vector specifying the design, chosen from the space of possible designs D. The dependence of (3) on physical parameters θ and design d is through the solution to equations (1). The most common form of this dependence, assumed in this paper, is via the expected response,

$$E\left(y_{jl} \mid \theta, x_j, t_{jl}\right) = g\left(u(t_{jl}), \theta\right),$$

with g : R^s × Θ → Y_{jl} an assumed function. However, the methodology developed here is also immediately applicable to other types of dependency.

Here, we find designs for experiments where one or more of the treatments x_1, . . . , x_n, observation times t_{j1}, . . . , t_{jn_j}, for j = 1, . . . , n, and initial conditions u_{01}, . . . , u_{0n} are under the experimenters' control. In practice, some of these may be fixed by the protocol of the experiment. We also find designs where the initial conditions are unknown, and included in the vector of parameters.

In the human placenta experiment, the initial quantities of non-radioactive serine interior (u_{02}) and exterior (x_2) to the vesicle can be varied, with the initial quantities of radioactive serine (u_{01} and x_1) fixed by the experimental protocol. The c = 1 observed response, y_{jl}, is the amount of interior radioactive serine at time t_{jl} (j = 1, . . . , n; l = 1, . . . , n_j). A statistical model is assumed where E(y_{jl} | θ, x_j, t_{jl}) = u_1(t_{jl}; x_j, θ). Hence, for this experiment g(u, θ) = u_1. The design consists of n combinations of initial quantities of exterior and interior non-radioactive serine, x_{2j} and u_{02j}, along with corresponding observation times t_{j1}, . . . , t_{jn_j}; that is, d = [(x_{21}, u_{021})^T, . . . , (x_{2n}, u_{02n})^T, t_1^T, . . . , t_n^T]^T.

Previous research on optimal design for models formed as the solution of ordinary differential equations has focussed on frequentist methods for models with additive normally distributed errors, with a design selected that maximizes a function of the Fisher information matrix for θ (e.g. Atkinson and Bogacka, 2002 and Rodríguez-Díaz and Sánchez-León, 2014). The inverse of the Fisher information matrix provides an asymptotic approximation to the variance-covariance matrix for maximum likelihood estimators of θ. As is usual for nonlinear models, the information matrix depends on the value of θ, which is uncertain prior to the experiment. The most common methodology to overcome this dependence is the adoption of pseudo-Bayesian techniques, where a design is found that maximizes the expectation of the function of the information matrix with respect to a prior distribution for θ. Numerical methods are used to obtain the derivatives of the expected response with respect to θ that are necessary to obtain the information matrix. Most commonly, the "direct method" (Valko and Vajda, 1984) is employed, with an additional set of differential equations being defined that then also require numerical solution. Many developments in this area have occurred in the chemical engineering literature, labeled "model-based design of experiments"; see Franceschini and Macchietto (2008) for a review.

In contrast to the above approaches, in this paper, we present and apply the first methods for decision-theoretic Bayesian optimal design for models formed from ordinary differential equations. Although straightforward in principle, Bayesian optimal design faces a number of practical difficulties. Firstly, assessment of a given design requires evaluation of an expected utility depending on high-dimensional and typically intractable integrals. Secondly, maximization of the expected utility presents a high-dimensional and stochastic optimization problem. See Ryan et al. (2016) and Woods et al. (2017) for recent reviews.

To address the high-dimensional optimization problem, we extend and apply the ap- proximate coordinate exchange (ACE) algorithm recently proposed by Overstall and Woods (2017). A brief description of the algorithm is provided in Section 4.1 and Appendix A.

The computational burden of optimal Bayesian design is exacerbated when the model evaluations (system states) are only available as the numerical solution to the differential equations. In addition to increasing the computational expense of evaluating the expected utility, numerical solutions introduce an additional source of uncertainty through the numerical errors that result from finite discretization of the time interval T. We evaluate the expected utility by embedding within a Monte Carlo approximation scheme an adaptation of the probabilistic solution to systems of differential equations proposed by Chkrebtii et al. (2016); see Section 2. In essence, this approach accounts for uncertainty due to discretization error by placing a joint Gaussian process prior on both the system states and time derivatives, and predicts future system states by conditioning on the derivatives. In Section 3, after introducing the foundations of Bayesian design, we propose innovative precomputation of variance and covariance quantities that substantially reduces the computational burden of incorporating the probabilistic solution into a Bayesian design strategy. Our approach makes it possible to search for multi-variable designs which would otherwise be computationally infeasible.

We demonstrate the effectiveness for optimal design of the combination of Monte Carlo approximation, probabilistic numerics and cyclic descent for a variety of exemplar models in Section 4. The differing complexities of the problems addressed showcase the flexibility of the methodology. In Section 5 we apply the methodology to a realistic statistical model for the human placenta example, based on the solution to (2), and compare to designs proposed by the experimenters. We find designs for the goals of parameter estimation and model selection, where the aim is to determine if a simpler model with θ_3 = θ_4 (i.e. the two reaction rates equal) is an adequate description for the data.

2 Probabilistic solutions to ordinary differential equations

When working with numerical models implemented via computer code, it has become standard to build statistical approximations, or emulators, by performing a computer experiment to obtain model outputs at carefully selected input combinations. Most commonly, a Gaussian process (GP) prior is assumed to describe the output from the model, with the emulator formed from the updated posterior GP (conditioned on the model output from the computer experiment); see Sacks et al. (1989) and Santner et al. (2003). In contrast, central to the Chkrebtii et al. (2016) methodology is the adoption of a GP prior for the hth derivative function u̇_h(·), h = 1, . . . , s, defined via mean and covariance functions ṁ_{0h}(·) and Ċ_0(·, ·), where we assume a common covariance function for each of the s derivatives. Such a prior implies that for any finite collection of times t = (t_1, . . . , t_w)^T, the joint distribution of u̇_h(t) = [u̇_h(t_1), . . . , u̇_h(t_w)]^T will be multivariate normal N(ṁ_{0h}(t), Ċ_0(t, t)), with w-vector ṁ_{0h}(t) having lth entry ṁ_{0h}(t_l) and, for vector t′ = (t′_1, . . . , t′_{w′})^T, w × w′ matrix Ċ_0(t, t′) having lkth entry Ċ_0(t_l, t′_k). A joint Gaussian process prior for both u̇_h(·) and the solution function u_h(·) then follows directly, implying the joint distribution

$$\begin{bmatrix} \dot{u}_h(t) \\ u_h(t') \end{bmatrix} \sim N\left( \begin{bmatrix} \dot{m}_{0h}(t) \\ m_{0h}(t') \end{bmatrix}, \begin{bmatrix} \dot{C}_0(t, t) & \bar{C}_0(t, t') \\ \bar{C}_0(t', t) & C_0(t', t') \end{bmatrix} \right),$$



Figure 1: Plots showing 1000 draws from the probabilistic solution of u_1(t) and u_2(t) against t for system of equations (2) that describe the transport of serine in a human placenta.

with w-vector m_{0h}(t) having lth entry $m_{0h}(t_l) = \int_0^{t_l} \dot{m}_{0h}(z)\,dz + u_{0h}$, w × w′ matrix C_0(t, t′) having lkth entry $C_0(t_l, t'_k) = \int_0^{t_l}\!\int_0^{t'_k} \dot{C}_0(z, z')\,dz\,dz'$, and w × w′ cross-covariance matrix C̄_0(t, t′) having lkth entry $\bar{C}_0(t_l, t'_k) = \int_0^{t'_k} \dot{C}_0(t_l, z)\,dz$; see also Solak et al. (2003) and Holsclaw et al. (2013). Hence, solution vector u_h(t) = [u_h(t_1), . . . , u_h(t_w)]^T follows the multivariate normal distribution N(m_{0h}(t), C_0(t, t)). Note that definition of the covariance function of u_h(t) via integration ensures C_0(0, 0) = 0 and hence enforces the boundary condition u_h(0) = u_{0h}.

For a given x and θ, this prior distribution can be updated using derivative evaluations on a grid τ = (τ_1, . . . , τ_N)^T of time points via Algorithm 1, by sequentially conditioning on f(u, τ_{r+1}, x; θ) calculated for a solution state u sampled from the current posterior distribution. The final marginal Gaussian process for u_h(t) has mean and covariance functions given by

$$m_{Nh}(t) = u_{0h} + \bar{C}_0(t, \tau) B_N F_N e_h, \qquad C_N(t, t') = C_0(t, t') - \bar{C}_0(t, \tau) B_N \bar{C}_0(\tau, t'),$$

for h = 1, . . . , s, where e_h is the hth unit vector, and the N × s matrix of derivative evaluations F_N and the updated N × N derivative covariance matrix B_N are defined as in Algorithm 1.

Chkrebtii et al. (2016) allowed covariance function Ċ_0(t, t′) to depend on hyperparameters controlling the scale and length of the covariances. Given experimental data, a joint posterior distribution for the model parameters and hyperparameters can be sampled by embedding the probabilistic solution to the differential equations within a Markov chain Monte Carlo scheme. Chkrebtii et al. (2016) also suggested possible fixed values for the


Algorithm 1: Sequential updating and sampling for time points t = (t_1, . . . , t_w)^T of the joint Gaussian process for the derivative and solution for the s system states, for initial values u_0, treatment vector x, physical parameters θ and evaluation grid τ = (τ_1, . . . , τ_N)^T, with τ_1 = T_0. (Adapted from Chkrebtii et al., 2016.)

1. Set Λ_1 = 0 and f_1 = f(u_0, T_0, x; θ).

2. For r = 1, . . . , N − 1 do:
   (a) Set τ_r = (τ_1, . . . , τ_r)^T.
   (b) Compute
       B_r = (Ċ_0(τ_r, τ_r) + Λ_r)^{−1},
       a_r = B_r C̄_0(τ_r, τ_{r+1}),
       C_r = C_0(τ_{r+1}, τ_{r+1}) − C̄_0(τ_{r+1}, τ_r) B_r C̄_0(τ_r, τ_{r+1}),
       Ċ_{r+1} = Ċ_0(τ_{r+1}, τ_{r+1}) − Ċ_0(τ_{r+1}, τ_r) B_r Ċ_0(τ_r, τ_{r+1}),
       Λ_{r+1} = diag{Λ_r, Ċ_{r+1}}.
   (c) Compute m_r = u_0 + F_r^T a_r, where F_r is the r × s matrix with kth row f_k (k = 1, . . . , r).
   (d) Sample u(τ_{r+1}) ∼ N(m_r, C_r I_s) and compute f_{r+1} = f(u(τ_{r+1}), τ_{r+1}, x; θ).

3. Compute
   B_N = (Ċ_0(τ_N, τ_N) + Λ_N)^{−1},
   A_N(t) = B_N C̄_0(τ, t),
   M_N(t) = 1_w ⊗ u_0^T + A_N(t)^T F_N, with 1_w the w-vector with all entries equal to one and F_N the N × s matrix with kth row f_k (k = 1, . . . , N),
   C_N(t, t) = C_0(t, t) − C̄_0(t, τ) B_N C̄_0(τ, t).

4. For h = 1, . . . , s, sample [u_h(t_1), . . . , u_h(t_w)]^T ∼ N(M_N(t) e_h, C_N(t, t)), where e_h is the hth unit vector.
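To make the updating recursion concrete, the sketch below implements Algorithm 1 for a single system state (s = 1) in NumPy, applied to the toy problem u̇ = −u, u(0) = 1, whose exact solution is e^{−t}. It uses a squared exponential covariance for Ċ_0 (equation (8) of Section 4.1); the closed-form expressions for the integrated covariances C̄_0 and C_0 are our own error-function derivation, and the small diagonal jitter is a numerical stabilizer that is not part of the algorithm as stated.

```python
# Sketch: Algorithm 1 for one system state, squared exponential covariance.
import numpy as np
from scipy.special import erf

def make_cov(alpha, lam):
    s = np.sqrt(np.pi) * lam / alpha
    def Cdot(t, tp):                     # derivative covariance, equation (8)
        return s * np.exp(-(t - tp) ** 2 / (4.0 * lam ** 2))
    def G(a):                            # antiderivative helper: int erf terms
        return a * erf(a / (2.0 * lam)) + (2.0 * lam / np.sqrt(np.pi)) * np.exp(-a ** 2 / (4.0 * lam ** 2))
    def Cbar(t, tp):                     # int_0^{t'} Cdot(t, z) dz
        return (np.pi * lam ** 2 / alpha) * (erf((tp - t) / (2.0 * lam)) + erf(t / (2.0 * lam)))
    def C(t, tp):                        # double integral of Cdot
        return (np.pi * lam ** 2 / alpha) * (G(t) + G(tp) - G(t - tp) - G(0.0))
    return Cdot, Cbar, C

def prob_solve(f, u0, tau, t_query, alpha, lam, rng):
    """Posterior mean m_N(t) from Algorithm 1 for a single state (s = 1)."""
    N = len(tau)
    Cdot, Cbar, C = make_cov(alpha, lam)
    jitter = 1e-9
    fvals = np.empty(N)
    Lam = np.zeros((N, N))               # grows as diag(Lambda_r, Cdot_{r+1})
    fvals[0] = f(u0, tau[0])
    for r in range(1, N):
        tg = tau[:r]
        K = Cdot(tg[:, None], tg[None, :]) + Lam[:r, :r] + jitter * np.eye(r)
        Br = np.linalg.inv(K)            # B_r
        cross = Cbar(tg, tau[r])         # a_r-style cross-covariances
        m_r = u0 + cross @ Br @ fvals[:r]
        C_r = C(tau[r], tau[r]) - cross @ Br @ cross
        dcross = Cdot(tau[r], tg)
        Lam[r, r] = Cdot(tau[r], tau[r]) - dcross @ Br @ dcross
        u_r = m_r + np.sqrt(max(C_r, 0.0)) * rng.standard_normal()  # step (d)
        fvals[r] = f(u_r, tau[r])
    K = Cdot(tau[:, None], tau[None, :]) + Lam + jitter * np.eye(N)
    BN = np.linalg.inv(K)                # B_N
    Across = Cbar(tau[:, None], t_query[None, :])  # N x w cross-covariances
    return u0 + Across.T @ BN @ fvals    # posterior mean m_N(t)

rng = np.random.default_rng(0)
N = 40
tau = np.linspace(0.0, 2.0, N)
mean = prob_solve(lambda u, t: -u, 1.0, tau, tau, alpha=N, lam=4 * 2.0 / N, rng=rng)
```

For this stable linear ODE the posterior mean tracks the exact solution closely, while the sampling in step (d) propagates discretization uncertainty into the derivative evaluations.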


covariance hyperparameters. In Section 3.2 we demonstrate the computational savings that can be achieved for optimal design via precomputing of various posterior quantities when these parameters are fixed.

Figure 1 presents 1000 draws from probabilistic solutions for the placenta example following equations (2). Updated Gaussian processes for u_1(t) and u_2(t) were generated using Algorithm 1, assuming a squared exponential covariance function for Ċ_0(t, t′) (see Rasmussen and Williams, 2006, p. 83). An evaluation grid τ with N = 501 evenly spaced time points was used, and the solution sampled for time t ∈ [T_0, T_1] = [0, 600] seconds with physical parameters θ = (200, 0.05, 100, 100)^T, initial values u_0 = (0, 1000)^T and treatment x = (7.5, 1000)^T. Note how the uncertainty in the solution increases as t increases away from t = T_0 = 0, where we know, in this example, the true value of u(t).

3 Bayesian design for ordinary differential equation models

3.1 Decision-theoretic Bayesian optimal design

Design of experiments fits naturally within a Bayesian framework, with the decision on what design d to employ made before the data is collected. Hence it is natural to use available prior information to inform this choice. This information includes the form of statistical model (3) including any underpinning physical theory, for example, as encapsulated in equations such as (1). It also includes any prior information on the values of the parameters θ, γ, captured via a prior density π(θ, γ), which we assume is independent of the design.

A decision-theoretic Bayesian optimal design, d⋆, maximizes the expectation of a specified utility function φ(θ, y, d) with respect to the unknowns prior to experimentation,

$$\Phi(d^\star) = \max_{d \in D} E\left[\phi(\theta, y, d) \mid d\right] = \max_{d \in D} \int_{\Theta \times Y} \phi(\theta, y, d)\, \pi(\theta, y \mid d)\, d\theta\, dy,$$

where the joint distribution of the unknown physical parameters and responses, conditional on the design used for data collection, can be decomposed as

$$\pi(\theta, y \mid d) = \int_\Gamma \pi(y \mid \theta, \gamma, d)\, \pi(\theta, \gamma)\, d\gamma,$$


and hence, when regarded as a function of θ alone, is proportional to the posterior density.

See the seminal review paper by Chaloner and Verdinelli (1995).

The function φ(θ, y, d) quantifies the utility, relative to the aims of the experiment, for choosing design d when we obtain data y under physical parameters θ. Its choice should reflect the goals of the experiment. Here, we apply the following exemplar utility functions:

1. Negative squared error loss (NSEL) for estimation of θ:

$$\phi(\theta, y, d) = -\left\|\theta - E(\theta \mid y, d)\right\|_2^2,$$

with ‖·‖_p denoting the l_p-norm and E(θ|y, d) the posterior mean, where expectation is taken with respect to the marginal density π(θ|y, d) = ∫_Γ π(θ, γ|y, d) dγ. It can be shown that the expected utility simplifies to

$$\Phi(d) = -\int_Y \operatorname{tr}\left\{\operatorname{var}(\theta \mid y, d)\right\} \pi(y \mid d)\, dy,$$

the negative expected value of the trace of the posterior variance-covariance matrix for θ with respect to the marginal distribution of the response.

2. Negative absolute error loss (NAEL) for estimation of θ:

$$\phi(\theta, y, d) = -\left\|\theta - \operatorname{Med}(\theta \mid y, d)\right\|_1,$$

with Med(θ|y, d) the vector of marginal posterior medians of the physical parameters.

3. Shannon information gain (SIG) for θ:

$$\phi(\theta, y, d) = \log \pi(y \mid \theta, d) - \log \pi(y \mid d), \tag{4}$$

where

$$\pi(y \mid d) = \int_\Theta \pi(y \mid \theta, d)\, \pi(\theta)\, d\theta, \qquad \pi(y \mid \theta, d) = \int_\Gamma \pi(y \mid \theta, \gamma, d)\, \pi(\gamma)\, d\gamma.$$

Maximizing the expectation of (4) is equivalent to maximizing the expected Kullback-Leibler divergence between the prior and posterior distributions (Chaloner and Verdinelli, 1995).

For the human placenta example, we also employ two bespoke utility functions tailored to

the problems of point estimation and model selection.


4. 0-1 utility for estimation of θ:

$$\phi(\theta, y, d) = \mathbb{1}_{\check{\Theta}}\left[E(\theta \mid y, d)\right],$$

with 1_Θ̌ the indicator function for the product set

$$\check{\Theta} = \prod_{i=1}^p \check{\Theta}_i = \left\{(\check{\theta}_1, \ldots, \check{\theta}_p) \mid \check{\theta}_i \in \check{\Theta}_i\ \forall i \in \{1, \ldots, p\}\right\},$$

where Θ̌_i = {θ̌ | θ_i − δ_i ≤ θ̌ ≤ θ_i + δ_i}, and δ = (δ_1, . . . , δ_p)^T is a specified tolerance vector. That is, the utility is equal to 1 if, for all i = 1, . . . , p, the ith element of the posterior mean vector E(θ|y, d) lies within δ_i of the corresponding element of θ.

For the final utility function considered we redefine the utility as a function of the chosen model m ∈ M.

5. 0-1 utility for model selection:

$$\phi(m, y, d) = \mathbb{1}_m(m^\star),$$

where 1_m is the indicator function for the singleton set with element m and m⋆ ∈ arg max_{m∈M} π(m|y) is the model with maximum posterior probability. For this utility, the expected utility is given by

$$\Phi(d) = \sum_{m \in M} \pi(m) \int_Y \phi(m, y, d)\, \pi(y \mid m, d)\, dy,$$

with

$$\pi(y \mid m, d) = \int_{\Theta^{(m)}} \int_{\Gamma^{(m)}} \pi(y \mid \theta^{(m)}, \gamma^{(m)}, m, d)\, \pi(\theta^{(m)}, \gamma^{(m)} \mid m)\, d\gamma^{(m)}\, d\theta^{(m)},$$

and θ^{(m)} ∈ Θ^{(m)} and γ^{(m)} ∈ Γ^{(m)} the physical and nuisance parameters, respectively, for model m.
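Before turning to the intractable general case, the SIG utility can be checked on a toy model where its expectation is available in closed form. The sketch below is our own illustration, not from the paper: y | θ ~ N(θ, σ²) with θ ~ N(0, τ²), no nuisance parameters and no design dependence, for which the expected SIG is the mutual information 0.5 log(1 + τ²/σ²).

```python
# Sketch: Monte Carlo check of the SIG utility on a conjugate toy model.
import numpy as np

rng = np.random.default_rng(1)
sigma, tau = 1.0, 1.0
B, B_inner = 2000, 2000

theta = rng.normal(0.0, tau, size=B)            # outer draws from the prior
y = theta + rng.normal(0.0, sigma, size=B)      # responses given theta
theta_in = rng.normal(0.0, tau, size=B_inner)   # inner prior sample

def norm_pdf(z, s):
    return np.exp(-0.5 * (z / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

log_lik = np.log(norm_pdf(y - theta, sigma))                        # log pi(y | theta)
evidence = norm_pdf(y[:, None] - theta_in[None, :], sigma).mean(1)  # pi_hat(y)
sig_hat = np.mean(log_lik - np.log(evidence))                       # expected SIG estimate

sig_exact = 0.5 * np.log(1.0 + tau ** 2 / sigma ** 2)
```

The Monte Carlo estimate recovers the analytic value up to sampling noise and the (small, positive) bias from taking the logarithm of the inner-loop evidence estimate.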

A barrier to the application of Bayesian design for most nonlinear models, including those considered in this paper, is the analytic intractability of both the utility function (which typically depends on posterior quantities) and the expected utility. Numerical methods are therefore required, with a double-loop Monte Carlo approximation being commonly employed (Ryan, 2003). Such an approach uses an "inner" Monte Carlo sample of size B̃ to approximate any necessary posterior quantities, and then an "outer" Monte Carlo sample of size B to approximate the expected utility with respect to the joint distribution of y and θ; see also Overstall and Woods (2017).


We use the approximation

$$\hat{\Phi}(d) = \frac{1}{B} \sum_{b=1}^B \hat{\phi}(\theta_b, y_b, d), \tag{5}$$

with {θ_b, y_b}_{b=1}^B a first (outer) sample from the joint distribution of the physical parameters and response, and φ̂(θ, y, d) a further Monte Carlo approximation to the utility function.

Each of the utility functions above can be approximated using a second (inner) Monte Carlo sample {θ̃_b̃, γ̃_b̃}_{b̃=1}^{B̃} from the distribution with density π(θ, γ):

1. NSEL:

$$\hat{\phi}(\theta, y, d) = -\left\|\theta - \hat{E}(\theta \mid y, d)\right\|_2^2,$$

for an importance sampling estimate of E(θ|y, d),

$$\hat{E}(\theta \mid y, d) = \sum_{\tilde{b}=1}^{\tilde{B}} w_{\tilde{b}}\, \tilde{\theta}_{\tilde{b}}, \tag{6}$$

with

$$w_{\tilde{b}} = \frac{\pi(y \mid \tilde{\theta}_{\tilde{b}}, \tilde{\gamma}_{\tilde{b}}, d)}{\sum_{\tilde{b}=1}^{\tilde{B}} \pi(y \mid \tilde{\theta}_{\tilde{b}}, \tilde{\gamma}_{\tilde{b}}, d)}. \tag{7}$$

See Ryan et al. (2016) and references therein.

2. NAEL:

$$\hat{\phi}(\theta, y, d) = -\left\|\theta - \widehat{\operatorname{Med}}(\theta \mid y, d)\right\|_1,$$

with vector M̂ed(θ|y, d) having ith entry M̂ed_i(θ|y, d) = (θ̃_{i(z)} + θ̃_{i(z+1)})/2 (i = 1, . . . , p), where θ̃_{i(1)} ≤ · · · ≤ θ̃_{i(B̃)} are the ordered values taken by the ith element of the sample {θ̃_b̃}_{b̃=1}^{B̃}, z = max{l = 1, . . . , B̃ | Σ_{b̃=1}^l w_{i(b̃)} ≤ 0.5}, and the w_{i(b̃)} are the weights (7) ordered according to θ̃_{i(b̃)}.

3. SIG:

$$\hat{\phi}(\theta, y, d) = \log \hat{\pi}(y \mid \theta, d) - \log \hat{\pi}(y \mid d),$$

with

$$\hat{\pi}(y \mid d) = \frac{1}{\tilde{B}} \sum_{\tilde{b}=1}^{\tilde{B}} \pi(y \mid \tilde{\theta}_{\tilde{b}}, \tilde{\gamma}_{\tilde{b}}, d), \qquad \hat{\pi}(y \mid \theta, d) = \frac{1}{\tilde{B}} \sum_{\tilde{b}=1}^{\tilde{B}} \pi(y \mid \theta, \tilde{\gamma}_{\tilde{b}}, d).$$


4. 0-1 estimation:

$$\hat{\phi}(\theta, y, d) = \mathbb{1}_{\check{\Theta}}\left[\hat{E}(\theta \mid y, d)\right],$$

for Ê(θ|y, d) once again the importance sampling estimate (6) of the posterior mean.

5. 0-1 model selection:

$$\hat{\phi}(m, y, d) = \mathbb{1}_m(\hat{m}^\star),$$

where m̂⋆ ∈ arg max_{m∈M} π(m) Σ_{b̃=1}^{B̃} π(y | θ̃_b̃^{(m)}, γ̃_b̃^{(m)}, m, d)/B̃, for {θ̃_b̃^{(m)}, γ̃_b̃^{(m)}}_{b̃=1}^{B̃} a sample from the prior distribution under model m with density π(θ^{(m)}, γ^{(m)} | m).

The above Monte Carlo approximations φ̂ to the utility functions will introduce some bias into the approximation of the expected utility, as the utilities are nonlinear functions of posterior quantities. In general, this bias will be inversely proportional to the value of B̃, and hence can be made negligible for large inner samples.
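The double-loop scheme and the importance sampling estimates (5)-(7) can be validated on a conjugate toy model where Φ(d) is known exactly. The sketch below is our own example (a scalar linear model, not the paper's ODE setting): y_i = θx_i + ε_i with θ ~ N(0, 1) and ε_i ~ N(0, 1), for which the NSEL expected utility is Φ(d) = −1/(1 + Σ_i x_i²).

```python
# Sketch: double-loop Monte Carlo for the NSEL expected utility, toy model.
import numpy as np

rng = np.random.default_rng(2)
x = np.array([1.0, 1.0])              # a two-run design d = (x_1, x_2)
B, B_inner = 2000, 2000

theta_in = rng.normal(size=B_inner)   # inner (importance) sample from the prior

phi_hat = np.empty(B)
for b in range(B):
    theta_b = rng.normal()                     # outer draw theta_b ~ pi(theta)
    y_b = theta_b * x + rng.normal(size=x.size)
    # unnormalized importance weights pi(y | theta_tilde, d), as in (7)
    resid = y_b[None, :] - theta_in[:, None] * x[None, :]
    logw = -0.5 * np.sum(resid ** 2, axis=1)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    post_mean = np.sum(w * theta_in)           # importance estimate (6)
    phi_hat[b] = -(theta_b - post_mean) ** 2   # NSEL utility
Phi_hat = phi_hat.mean()                       # approximation (5)

Phi_exact = -1.0 / (1.0 + np.sum(x ** 2))
```

The estimate should sit slightly below the exact value, reflecting the positive inner-loop bias discussed above, which shrinks as the inner sample size grows.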

3.2 Extensions to ordinary differential equation models

To apply the methodology outlined in the previous section to models built from systems of ordinary differential equations requires incorporation of further steps to account for discretization errors in the numerical solution to the equations, and to mitigate the additional computational cost of multiple evaluations of the numerical solution. The approximations to the expected utilities require repeated sampling of y from distribution (3), and evaluation of the corresponding density function π(y|θ, γ, d). When the distribution of y depends on the solution vector, the approximations require at least B + B̃ evaluations of a numerical solution to u(t_{jl}; x_j, θ) for each j = 1, . . . , n and l = 1, . . . , n_j. In addition to the computational cost of these repeated evaluations, the necessary discretization of the time domain by a numerical solver introduces an additional source of uncertainty that should be accounted for in both the design of the experiment and the subsequent inference.

The probabilistic solution of Chkrebtii et al. (2016), outlined in Section 2, fits naturally within a Monte Carlo approximation of the expected utility; for each generated value of the physical parameters θ, a solution path for u(t) is generated from an updated Gaussian process. The uncertainty introduced by the discretization of time is quantified, and updated, via the joint Gaussian process prior for the time derivatives and solution. Algorithm 2 outlines the steps in generating an approximation to a general utility function φ using double-loop Monte Carlo. As given, Algorithms 1–2 depend on the initial values u_{0j} for the jth treatment, i.e. the initial values are assumed known. In some situations, learning unknown initial values may be part of the inference problem, i.e. prior distributions are assumed and updated to a posterior distribution in light of the experimental responses. This case can be incorporated into these algorithms by replacing all occurrences of u_{0j} by a value u_{0jb} generated from the prior distribution in Algorithm 2, in analogy to how the physical parameters θ are handled.

Algorithm 2: Evaluation of the approximate expected utility Φ̂(d) when the distribution of the response depends on the solution to an ordinary differential equation.

1. For b̃ = 1, . . . , B̃ do:
   Sample (θ̃_b̃^T, γ̃_b̃^T)^T ∼ π(θ, γ) (the prior distribution).
   For j = 1, . . . , n do:
      For l = 1, . . . , n_j do:
         Sample u(t_{jl}; x_j, θ̃_b̃) using Algorithm 1.

2. For b = 1, . . . , B do:
   Sample (θ_b^T, γ_b^T)^T ∼ π(θ, γ) (the prior distribution).
   For j = 1, . . . , n do:
      For l = 1, . . . , n_j do:
         Sample u(t_{jl}; x_j, θ_b) using Algorithm 1.
   Sample y_b | θ_b, γ_b, d ∼ F(θ_b, γ_b; d).
   Calculate φ̂(θ_b, y_b, d) using the inner sample generated in step 1.

3. Calculate Φ̂(d) = (1/B) Σ_{b=1}^B φ̂(θ_b, y_b, d).

Naive implementation of Algorithm 2 for approximating the expected utility presents a considerable computational challenge, with the matrix computations in steps 2(b) and 3 of Algorithm 1 being undertaken ñ(B + B̃) times, with ñ = Σ_{j=1}^n n_j. In particular, calculation of matrix B_N requires inversion of an N × N matrix. This leads to an algorithm with computational complexity O(ñN³(B + B̃)).

To reduce the computational cost of the algorithm, we can compromise on the choice of covariance function Ċ_0(t, t′). Rather than tune the covariance through the selection of different parameter values for each choice of x and θ, we can fix these parameters (e.g. following recommendations in Chkrebtii et al. (2016); see Section 4 for our choices). This allows precomputation of various covariance matrices and vectors; see Algorithm 3. Such precomputation alleviates the need to invert B_N when sampling u(t), reducing the computational complexity of the approximation to O(N³ + ñN²(B + B̃)).

In fact, this precomputation can be performed just once, prior to any optimization routine being called. Hence for large experiments and Monte Carlo sample sizes, the computational complexity of the precomputation is essentially fixed, and the complexity of the approximation within the optimization becomes O(ñN²(B + B̃)). This computational saving makes the optimization feasible for experiment sizes, evaluation grids and Monte Carlo sample sizes for which designs could not otherwise be found.

Algorithm 3: Precomputation of variances C_r, Ċ_{r+1}, B_r and covariances a_r for evaluation grid τ = (τ_1, . . . , τ_N)^T.

1. Set Λ_1 = 0.

2. For r = 1, . . . , N − 1 do:
   (a) Set τ_r = (τ_1, . . . , τ_r)^T.
   (b) Compute
       B_r = (Ċ_0(τ_r, τ_r) + Λ_r)^{−1},
       a_r = B_r C̄_0(τ_r, τ_{r+1}),
       C_r = C_0(τ_{r+1}, τ_{r+1}) − C̄_0(τ_{r+1}, τ_r) B_r C̄_0(τ_r, τ_{r+1}),
       Ċ_{r+1} = Ċ_0(τ_{r+1}, τ_{r+1}) − Ċ_0(τ_{r+1}, τ_r) B_r Ċ_0(τ_r, τ_{r+1}),
       Λ_{r+1} = diag{Λ_r, Ċ_{r+1}}.

3. Compute B_N = (Ċ_0(τ_N, τ_N) + Λ_N)^{−1}.
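The point of this precomputation can be illustrated with a factorization that is computed once and reused: with the covariance hyperparameters fixed, the N × N matrix underlying B_N never changes, so a single Cholesky factorization replaces repeated O(N³) solves across all (B + B̃) Monte Carlo draws. The sketch below is our own illustration, with a squared exponential covariance and a diagonal jitter standing in for Λ_N.

```python
# Sketch: precompute a Cholesky factor once, reuse it for every draw.
import numpy as np
from scipy.linalg import cho_factor, cho_solve

rng = np.random.default_rng(3)
N = 200
tau = np.linspace(0.0, 600.0, N)
lam = 4.0 * (tau[-1] - tau[0]) / N
# squared exponential derivative covariance, as in equation (8) of Section 4.1
K = (np.sqrt(np.pi) * lam / N) * np.exp(-(tau[:, None] - tau[None, :]) ** 2 / (4.0 * lam ** 2))
K += 1e-4 * np.eye(N)            # diagonal jitter standing in for Lambda_N

factor = cho_factor(K)           # O(N^3) work, done once before optimization

# per-draw work: triangular solves only, O(N^2) each
F = rng.standard_normal((N, 5))  # stand-in for derivative evaluations from 5 draws
BNF = cho_solve(factor, F)       # equals B_N @ F

direct = np.linalg.solve(K, F)   # reference computation, repeated O(N^3) cost
```

Reusing the factor gives the same result as solving from scratch, at a fraction of the per-draw cost, which is exactly the saving exploited inside the design search.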

4 Examples

4.1 Preliminaries

In this section we demonstrate the Bayesian design methodology for three common examples of models formed from the solution of ordinary differential equations:

1. a compartmental model (Section 4.2);

2. a model formed from the FitzHugh-Nagumo equations (Section 4.3);


3. a model of the JAK-STAT mechanism (Section 4.4).

For each, we use the methodology in Section 3.2 to approximate expected utilities for parameter estimation. Bayesian optimal (or near optimal) designs are found by embedding these Monte Carlo approximations within the ACE algorithm (Overstall and Woods, 2017).

The ACE algorithm is a cyclic descent, or coordinate exchange, algorithm (see Meyer and Nachtsheim, 1995, and Lange, 2013, p. 171) that performs a sequence of conditional maximizations for each element (coordinate) of d in turn, keeping all other elements fixed. Each of these one-dimensional maximizations is performed by constructing a Gaussian process smoother, or emulator, for the Monte Carlo approximation as a function of the coordinate.

Use of an emulator alleviates both the computational burden and the lack of smoothness associated with the Monte Carlo approximations. This algorithm extends the optimal design via curve fitting methods originally presented by Müller and Parmigiani (1996) to high-dimensional design problems. The ACE algorithm is outlined in Appendix A and implemented in the acebayes R package (Overstall et al., 2018b; Overstall et al., 2018c), available on CRAN.
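A minimal sketch of the cyclic-descent idea, assuming a generic noisy utility: each coordinate is varied on a one-dimensional grid, a smoother is fitted to the noisy evaluations (a polynomial here stands in for the Gaussian process emulator used by ACE), and the coordinate moves to the smoother's maximizer. Function and argument names are ours, not the acebayes API.

```python
import numpy as np

def ace_like_descent(utility, d0, bounds, n_eval=20, n_cycles=5):
    """Sketch of cyclic descent over design coordinates (not the acebayes API).

    For each coordinate in turn: evaluate the noisy Monte Carlo utility on a
    1-D grid, fit a smoother (here a quartic polynomial; ACE uses a Gaussian
    process emulator), and move the coordinate to the smoother's maximizer."""
    d = np.array(d0, dtype=float)
    for _ in range(n_cycles):
        for j in range(len(d)):
            lo, hi = bounds[j]
            grid = np.linspace(lo, hi, n_eval)
            vals = np.empty(n_eval)
            for k, g in enumerate(grid):
                trial = d.copy()
                trial[j] = g
                vals[k] = utility(trial)          # noisy evaluation
            coef = np.polyfit(grid, vals, deg=4)  # smoother in place of the GP
            fine = np.linspace(lo, hi, 200)
            d[j] = fine[np.argmax(np.polyval(coef, fine))]
    return d
```

Fitting a smoother before maximizing is what makes the coordinate update robust to the Monte Carlo noise in the utility evaluations.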

To employ the probabilistic solution to the ordinary differential equations, a choice of covariance function is required for the Gaussian process prior on the derivative functions.

The choice of covariance function should be determined by the assumed smoothness of the solutions $u_h(t)$. Chkrebtii et al. (2016) suggested two covariance functions: the squared exponential covariance
$$\dot{C}_0(t, t') = \sqrt{\pi}\,\alpha^{-1}\lambda \exp\left\{-(t - t')^2/4\lambda^2\right\}, \qquad (8)$$
which is infinitely differentiable and hence suitable for smooth solutions, and the piecewise linear uniform covariance
$$\dot{C}_0(t, t') = \begin{cases} \alpha^{-1}\left\{\min(t, t') - \max(t, t') + 2\lambda\right\} & \text{for } \left\{\max(t, t') - \min(t, t')\right\}/2 < \lambda\,, \\ 0 & \text{otherwise}\,, \end{cases} \qquad (9)$$

where α, λ > 0. This latter function is non-differentiable and hence suited to non-smooth solutions. We employ these two functions, with fixed values of α and λ to facilitate the precomputation outlined in Section 3.2. Throughout, we assume the probabilistic solution is calculated on a grid τ = (τ₁, …, τ_N)ᵀ of equally-spaced points and, unless otherwise stated, set α = N and λ = 4(τ_N − τ₁)/N.
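The two covariance functions, and the default settings α = N and λ = 4(τ_N − τ₁)/N, can be written directly; this is an illustrative sketch (function names are ours).

```python
import numpy as np

def sq_exp_cov(t, tp, alpha, lam):
    # squared exponential covariance (8); infinitely differentiable
    return np.sqrt(np.pi) * lam / alpha * np.exp(-((t - tp) ** 2) / (4 * lam ** 2))

def uniform_cov(t, tp, alpha, lam):
    # piecewise linear uniform covariance (9); vanishes once |t - t'| >= 2*lam
    return np.maximum(np.minimum(t, tp) - np.maximum(t, tp) + 2 * lam, 0.0) / alpha

# default auxiliary settings from the text, for a grid of N equally-spaced points
N = 200
tau = np.linspace(0.0, 20.0, N)
alpha, lam = N, 4 * (tau[-1] - tau[0]) / N
K = sq_exp_cov(tau[:, None], tau[None, :], alpha, lam)  # N x N covariance matrix
```

The uniform covariance is the triangular ("tent") kernel, which is a valid covariance function but not differentiable at t = t', matching the non-smooth solutions it is intended for.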


The Supplementary Material contains an R package called aceodes and a vignette. The vignette describes how aceodes can be used to reproduce the designs found in the remainder of this section and in Section 5.

4.2 Compartmental model

In pharmacokinetics studies, compartmental models are used to describe the distribution of a drug inside a living body. Such models have been routinely used to demonstrate optimal experimental design methodology (see, for example, Atkinson et al., 1993, Ryan et al., 2014, and Overstall and Woods 2017). To compare designs found using the probabilistic solution to designs found using an exact solution, we use a simple example where an analytical solution to the differential equations is available. An open one-compartment model is considered with first-order absorption, described by the following system of s = 2 ordinary differential equations for t ∈ [0, 24] hours:

$$\dot{u}_1(t) = -\theta_1 u_1(t)\,, \qquad \dot{u}_2(t) = (\theta_2/\theta_3)\, u_1(t) - \theta_2 u_2(t)\,, \qquad u(0) = (D, 0)^T\,,$$

where u₁(t) and u₂(t) are respectively the amounts of drug outside and inside the body, D is the known initial dose, and θ = (θ₁, θ₂, θ₃)ᵀ are unknown parameters.

These equations define a homogeneous linear system with constant coefficients, resulting in the analytical solution
$$u_1(t) = D \exp(-\theta_1 t)\,, \qquad u_2(t) = \frac{D\theta_2}{\theta_3(\theta_2 - \theta_1)}\left\{\exp(-\theta_1 t) - \exp(-\theta_2 t)\right\}\,. \qquad (10)$$

Following Ryan et al. (2014), we assume D = 400 and log θᵢ ∼ N(μᵢ, 0.05), independently, for i = 1, 2, 3, with (μ₁, μ₂, μ₃)ᵀ = (log 0.1, log 1, log 20)ᵀ. The amount of drug inside the body, yₗ, is observed at observation time tₗ, and is modeled by assuming yₗ ∼ N(u₂(tₗ), σ² + τ²u₂(tₗ)²), independently, where σ² = 0.1 and τ² = 0.01. The choice of design here only involves selecting n = 15 observation times: t₁, …, tₙ. We impose the practically realistic constraint that the observation times must be at least 15 minutes apart. Such a constraint is straightforward to incorporate into the ACE algorithm (see Overstall and Woods, 2017).
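The analytical solution (10) can be checked against a direct numerical integration of the system; the sketch below uses a classical fourth-order Runge-Kutta scheme at the prior means of θ (this is ours, not the paper's probabilistic solver).

```python
import numpy as np

# Parameters: D = 400 and the prior means of theta from the text.
D, th1, th2, th3 = 400.0, 0.1, 1.0, 20.0

def u2_exact(t):
    # analytical solution (10) for the amount of drug inside the body
    return D * th2 / (th3 * (th2 - th1)) * (np.exp(-th1 * t) - np.exp(-th2 * t))

def rk4(f, u0, ts):
    # classical fourth-order Runge-Kutta integration on the grid ts
    u, out = np.asarray(u0, dtype=float), [np.asarray(u0, dtype=float)]
    for t0, t1 in zip(ts[:-1], ts[1:]):
        h = t1 - t0
        k1 = f(t0, u)
        k2 = f(t0 + h / 2, u + h / 2 * k1)
        k3 = f(t0 + h / 2, u + h / 2 * k2)
        k4 = f(t1, u + h * k3)
        u = u + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        out.append(u)
    return np.array(out)

f = lambda t, u: np.array([-th1 * u[0], (th2 / th3) * u[0] - th2 * u[1]])
ts = np.linspace(0.0, 24.0, 2401)
num = rk4(f, [D, 0.0], ts)   # num[:, 1] agrees closely with u2_exact(ts)
```

At these parameter values the peak of u₂(t) occurs at t = log(θ₂/θ₁)/(θ₂ − θ₁) ≈ 2.56 hours, consistent with the clustering of optimal observation times near t ≈ 2.5 hours noted in the text.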


When applying the probabilistic solution, we assume the squared exponential covariance (8), as the functions u(t) are known to be smooth, and a discrete evaluation grid, τ, of size N = 501.

For each of the NSEL, NAEL and SIG utility functions from Section 3.1, we compare designs found under the exact and probabilistic solutions using ACE to a uniform design with n = 15 equally-spaced time points in [0, 24] hours. Figure 2 presents boxplots of twenty evaluations of the Monte Carlo approximation to the expected utility for the uniform design and the optimal design found for each utility. There is negligible difference between the designs found under the exact and probabilistic solutions, and these designs are clearly superior to the uniform design. Figure 2 also gives the observation time points from each design being compared. The optimal designs appear to favor observation times near the peak of u 2 (t), at t ≈ 2.5 hours, and then a series of observation times towards the end of the time interval. The optimal design under SIG has two distinct sets of points just before and after the maximum of u 2 (t), whereas the designs under NSEL and NAEL have just one set of points, generally occurring just after the peak response.

4.3 FitzHugh-Nagumo equations

The FitzHugh-Nagumo equations (FitzHugh, 1961 and Nagumo et al., 1962) describe the behavior of spike potential in the giant axon of squid neurons:

$$\dot{u}_1(t) = \theta_3\left[u_1(t) - u_1(t)^3/3 + u_2(t)\right]\,, \qquad \dot{u}_2(t) = -\left[u_1(t) - \theta_1 + \theta_2 u_2(t)\right]/\theta_3\,, \qquad u(0) = (-1, 1)^T\,,$$

where u₁(t) is the voltage across the axon membrane, u₂(t) is the recovery variable giving a summary of outward current, θ = (θ₁, θ₂, θ₃)ᵀ, and t ∈ [0, 20] ms. These equations cannot be solved analytically.

We assume an experiment that measures the voltage, yₗ, at time tₗ, for l = 1, …, n. Following Ramsay et al. (2007), yₗ ∼ N(u₁(tₗ), σ²), independently, where σ ∼ Uniform[1/2, 1].

A priori, we assume θ₁, θ₂ ∼ Uniform[0, 1] and θ₃ ∼ Uniform[1, 5]. As noted by Ramsay et al. (2007), the solution to the FitzHugh-Nagumo equations can alternate between smooth evolution and sharp changes of direction. Hence, we employ the uniform covariance (9) for the probabilistic solution. The evaluation grid has size N = 200.
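For intuition about the spiking behavior the designs must capture, the system can be integrated numerically; the sketch below uses RK4 at the representative values θ = (0.2, 0.2, 3) from Ramsay et al. (2007), which lie in the prior support (a deterministic stand-in for the probabilistic solution).

```python
import numpy as np

th1, th2, th3 = 0.2, 0.2, 3.0   # representative values within the prior support

def f(t, u):
    # FitzHugh-Nagumo right-hand side: u[0] is voltage, u[1] the recovery variable
    return np.array([th3 * (u[0] - u[0] ** 3 / 3 + u[1]),
                     -(u[0] - th1 + th2 * u[1]) / th3])

def rk4(f, u0, ts):
    # classical fourth-order Runge-Kutta integration on the grid ts
    u, out = np.asarray(u0, dtype=float), [np.asarray(u0, dtype=float)]
    for t0, t1 in zip(ts[:-1], ts[1:]):
        h = t1 - t0
        k1 = f(t0, u)
        k2 = f(t0 + h / 2, u + h / 2 * k1)
        k3 = f(t0 + h / 2, u + h / 2 * k2)
        k4 = f(t1, u + h * k3)
        u = u + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        out.append(u)
    return np.array(out)

ts = np.linspace(0.0, 20.0, 2001)
u = rk4(f, [-1.0, 1.0], ts)   # u[:, 0] is the voltage trace
```

At these values the voltage exhibits the relaxation oscillations described in the text, with a steep initial rise for small t.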



Figure 2: Results from the compartmental model in Section 4.2. Top row: boxplots of 20 evaluations of the Monte Carlo approximation to the expected utility for the uniform design and the optimal designs (for the exact and probabilistic solution) found under three different utility functions. Bottom plot: design points from each of the optimal designs and the uniform design, along with 100 draws from the exact solution, u₂(t), giving the amount of drug at time t, for values drawn from the prior distribution of θ.



Figure 3: Results from the FitzHugh-Nagumo equations in Section 4.3. Top row: boxplots of 20 evaluations of the Monte Carlo approximation to the expected utility for the uniform design and the optimal designs found under three different utility functions. Bottom plot: design points from each of the optimal designs and the uniform design, along with 100 draws from the probabilistic solution, u₁(t), giving the voltage at time t, for values drawn from the prior distribution of θ.


The design consists of the n = 21 observation times, t₁, …, tₙ. Similarly to Section 4.2, we stipulate that the observation times must be at least 0.25 ms apart, and find designs under the NSEL, NAEL and SIG utility functions. We compare these optimal designs to a uniform design with n equally spaced points in [0, 20] ms. Figure 3 presents boxplots of twenty evaluations of the Monte Carlo approximation to the expected utility for the uniform design and the optimal designs found via ACE under each utility function. In each case, there is a clear improvement to be made over using the uniform design. Also shown in Figure 3 are the four designs under comparison, along with realizations drawn from the solution u₁(t). Both the NSEL and NAEL optimal designs have a substantial number of observations near the beginning of the experiment. These two designs have around one-third of their observation times before 2.5 ms; the SIG and uniform designs only make three observations before this time. A feature of all of the optimal designs is that they make no observations between about 2.5 and 6 ms, where the voltage is expected to rapidly decrease. The remaining observation times are close to being evenly spaced. The initial phase of high frequency observations provides information about the steep increase in voltage for small t. The remaining observation times aid efficient parameter estimation, occurring within an interval in which different parameter values can produce very different model solutions.

4.4 JAK-STAT mechanism

Chkrebtii et al. (2016), and authors referenced therein, considered Bayesian inference for the JAK-STAT mechanism. A system of s = 4 equations describes changes in the biochemical reaction states of STAT-5 transcription factors that occur in response to binding of the Erythropoietin hormone to cell surface receptors (Pellegrini and Dusanter-Fourt, 1997):

$$\begin{aligned} \dot{u}_1(t) &= -\theta_1 u_1(t)\kappa(t) + 2\theta_4 u_4(t - \omega)\,, \\ \dot{u}_2(t) &= \theta_1 u_1(t)\kappa(t) - \theta_2 u_2(t)^2\,, \\ \dot{u}_3(t) &= -\theta_3 u_3(t) + \tfrac{1}{2}\theta_2 u_2(t)^2\,, \\ \dot{u}_4(t) &= \theta_3 u_3(t) - \theta_4 u_4(t - \omega)\,, \end{aligned} \qquad t \in [0, 60] \text{ seconds}\,,$$
$$u(t) = (u_{01}, 0, 0, 0)^T\,, \qquad t \in [-\omega, 0]\,,$$

with u₀₁ ≥ 0 unknown and κ(t) an unknown forcing function. The transcription states return to the initial state after gene activation in the cell nucleus, modeled via the unknown time delay ω ≥ 0. This system is an example of a delay initial function problem.


Swameye et al. (2003) conducted an experiment that made measurements on the nonlinear transformation of the states given by
$$g(u, \theta) = \begin{pmatrix} \theta_5(u_2 + 2u_3) \\ \theta_6(u_1 + u_2 + 2u_3) \\ u_1 \\ u_3/(u_2 + u_3) \end{pmatrix} = \begin{pmatrix} g_1(u, \theta) \\ g_2(u, \theta) \\ g_3(u, \theta) \\ g_4(u, \theta) \end{pmatrix}\,.$$

The experiment made n = 16 (noisy) observations on g₁ and g₂ at times t₁, …, t₁₆, and one observation on each of g₃ and g₄ at t = 0 and t = t⋆, respectively. The design (choices of time points) used in the experiment reported by Swameye et al. (2003) is given in Figure 4.

The following statistical model is assumed:
$$(y_{1l}, y_{2l})^T \sim N\left([g_1(u(t_l), \theta), g_2(u(t_l), \theta)]^T, A_l\right)\,, \qquad y_3 \sim N\left(g_3(u(0), \theta), \sigma_3^2\right)\,, \qquad y_4 \sim N\left(g_4(u(t^\star), \theta), \sigma_4^2\right)\,,$$
independently, for l = 1, …, n, where $A_l = \mathrm{diag}\{\sigma_{1l}^2, \sigma_{2l}^2\}$.

We design a follow-up experiment using information from this previous study, and choose values of t₁, …, tₙ and t⋆ to maximize different expected utilities assuming, for simplicity, that a single observation of y₃ will also be made at t = 0 (as in the original experiment). We use the posterior distributions from Chkrebtii et al. (2016) as priors for θ, ω and u₀₁. These authors assumed the variance parameters were fixed. Instead, we assume σ²₁ₗ = σ₁² and σ²₂ₗ = σ₂², for all l = 1, …, n, with σ₁, σ₂ ∼ Uniform[0, 0.1], σ₃ ∼ Uniform[0, 20] and σ₄ ∼ Uniform[0, 0.1]. These prior distributions are consistent with the experimentally determined values used for previous analyses (see Raue et al., 2009). The forcing function κ(t) is assumed unknown but has been measured at 16 time points. We follow Chkrebtii et al. (2016) and assume these measurements are made without error, interpolating with a Gaussian process to allow probabilistic prediction of κ(t) for any t ∈ [0, 60].

The nature of the delay initial function problem introduces an added complexity to our implementation of the probabilistic solution. At the end of step 2 of Algorithm 1, we compute f_{r+1} = f(u(τ_{r+1}), τ_{r+1}, θ_b). For this example, to compute f_{r+1}, we require u₄(τ_{r+1} − ω_b), where ω_b is a value generated from the prior distribution of ω. If τ_{r+1} − ω_b ≤ 0, then u₄(τ_{r+1} − ω_b) = 0, as specified by the initial conditions of the system of equations. For τ_{r+1} − ω_b > 0, the conditional distribution of u₄(τ_{r+1} − ω_b) can be derived in the probabilistic solution of Chkrebtii et al. (2016) and a value for u₄ generated. However, this would be computationally expensive to incorporate in the implementation of the probabilistic solution described in Section 3.2 and would prevent the precomputation in Algorithm 3. Hence, if τ_{r+1} − ω_b > 0, we replace u₄(τ_{r+1} − ω_b) by u₄(τ_r̄), where
$$\bar{r} = \arg\min_{r' = 1, \ldots, r+1} \left| \tau_{r+1} - \omega_b - \tau_{r'} \right|\,,$$
i.e., from the series of u₄(τ₁), …, u₄(τ_{r+1}) values generated in step 2 thus far, we choose the value for the time τ_r̄ that is closest to τ_{r+1} − ω_b.
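The nearest-grid-point substitution for the delayed state can be sketched as follows (names and indexing are ours; the text's one-based τ indices become zero-based array indices):

```python
import numpy as np

def delayed_u4(u4_values, tau, r, omega):
    """Nearest-grid-point approximation to the delayed state u4(tau_{r+1} - omega).

    u4_values holds u4(tau_1), ..., u4(tau_{r+1}) generated so far
    (u4_values[0] is u4(tau_1)); r is the zero-based index of tau_{r+1}."""
    target = tau[r] - omega
    if target <= 0.0:
        return 0.0   # initial function: u4 = 0 on [-omega, 0]
    # choose the already-generated value whose grid time is closest to target
    rbar = int(np.argmin(np.abs(tau[: r + 1] - target)))
    return u4_values[rbar]
```

This keeps the delay handling a cheap lookup over values already generated, which is what preserves the one-off precomputation of Algorithm 3.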

We employ uniform covariance (9) as the time delay can cause discontinuities in the derivative, as noted by Chkrebtii et al. (2016). The evaluation grid, τ , has size N = 500, and the auxiliary parameters are set to λ = 0.085 and α = 8000, consistent with the posterior distribution from the original analysis.

We use the methodology from Section 3.2 and the ACE algorithm to find designs that maximize each of the NSEL, NAEL and SIG utilities. We compare these designs to the original design used by Swameye et al. (2003). As in the previous examples, we introduce the constraint that the observation times need to be at least 1 second apart, a requirement also satisfied by the original experiment. Figure 4 presents boxplots of twenty evaluations of the Monte Carlo approximation to the expected utility for the original design and the optimal designs found under each utility function. Once again, in each case, the optimal designs are considerably more efficient. Also shown in Figure 4 are the four designs under comparison. The optimal designs favor having a dense set of points early in the observation window, and then a smaller set of times near the end of the experiment. This is especially true for the designs under NSEL and NAEL, where 75% of the observation times occur before t = 15 seconds, compared to about 60% for the SIG design and 50% for the original design. Early observation times provide information about the peak in g₁ and the sharp decrease in g₂ at about 10 seconds. For the single observation time, t⋆, on g₄, the optimal designs clearly favor making a very early observation. Note that t⋆ for each of the optimal designs is between 1 and 2 seconds.

5 Application: transport of serine across human placenta

We now use the methodology in Section 3 to redesign the experiment for the human placenta

study introduced in Section 1. The experimental protocol specifies fixed initial amounts of



Figure 4: Results from the JAK-STAT example in Section 4.4. Top row: boxplots of 20 evaluations of the Monte Carlo approximation to the expected utility for the original design and the optimal designs found under three different utility functions. Bottom row: design points from each of the optimal designs and the original design at which noisy observations of g₁ (left), g₂ (center) and g₄ (right) are made, along with 100 draws from g₁, g₂ and g₄, at time t, for values drawn from the prior distribution of θ, u₀₁ and ω.


radioactive serine interior (u₀₁) and exterior (x₁) to the placenta (0 and 7.5 µl, respectively). The original design proposed by the experimenters used n = 7 placentas (runs) with differing amounts of non-radioactive serine interior (u₀₂) and exterior (x₂) to the placenta; see Table 1. Noisy observations on the amount of interior radioactive serine (u₁) were made at eight times, common to each of the seven placentas. The experimenters expected greater variability in the concentration of interior radioactive serine near the start of the experiment, before convergence to an equilibrium. Therefore, they chose a design containing a large number of early time points. We broadly follow this protocol, but find optimal designs using n = 2, …, 7 placentas, with each having n_t = 8 observations taken at common times, t₁, …, t₈, chosen from across the interval [0, 600].

A hierarchical statistical model is assumed for the observed responses:

y jl = u 1 (t l ; x j , θ j ) + ε jl , for j = 1, . . . , n; l = 1, . . . , n t ,

where x j = (x 1 , x 2j ) T , ε jl are independent and identically normally distributed with constant variance σ 2 , and θ j holds the p = 4 subject-specific parameters for the jth placenta with elements assumed to follow independent uniform distributions

θ ji ∼ U [θ i (1 − c i ) , θ i (1 + c i )] , c i > 0 , i = 1, . . . , p .

The goal of the experiment is estimation of the population physical parameters θ = (θ 1 , . . . , θ p ) T . A priori, we assume c i ∼ Uniform [0, 0.05] and θ i ∼ Tri[a i , b i ], where Tri[a, b] denotes the symmetric triangle distribution on the interval [a, b]. Reflecting prior knowledge from previous experiments, we set a 1 = a 3 = a 4 = 80, b 1 = b 3 = b 4 = 120, a 2 = 0.02, b 2 = 0.08 and we assume σ 2 ∼ U[0, 1] for the response variance.
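Drawing from this hierarchical prior is straightforward; a sketch with illustrative function names:

```python
import numpy as np

def draw_subject_params(n_placentas, rng=None):
    """Draw population parameters theta_i ~ Tri[a_i, b_i] and subject-specific
    theta_ji ~ U[theta_i(1 - c_i), theta_i(1 + c_i)] with c_i ~ U[0, 0.05],
    using the prior limits given in the text (function name is ours)."""
    rng = np.random.default_rng(rng)
    a = np.array([80.0, 0.02, 80.0, 80.0])
    b = np.array([120.0, 0.08, 120.0, 120.0])
    theta = rng.triangular(a, (a + b) / 2, b)   # symmetric triangular priors
    c = rng.uniform(0.0, 0.05, size=4)          # between-placenta variability
    theta_j = rng.uniform(theta * (1 - c), theta * (1 + c),
                          size=(n_placentas, 4))
    return theta, theta_j
```

Each placenta's parameters thus sit within at most ±5% of the population values, reflecting the modest between-subject variability assumed a priori.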

We expect the solution to the system of equations (2) to be smooth, and so use the squared exponential covariance (8) for the probabilistic solution. The evaluation grid, τ, has size N = 601, and we set the auxiliary correlation parameter α = 10N.

Specifying a design corresponds to specifying the n experimental conditions x₂₁, …, x₂ₙ, the initial values u₀₂₁, …, u₀₂ₙ, and the common n_t = 8 observation times t₁, …, t₈. Hence, for n = 2, …, 7, the design space has between 12 and 22 dimensions. As for the examples in Section 4, we impose a constraint on the observation times and specify that they must be at least 5 seconds apart.


Table 1: Treatments from the optimal and original designs with n = 7 runs for the placenta example in Section 5: initial concentrations (to nearest integer) of interior (u₀₂ = u₂(0)) and exterior (x₂) non-radioactive serine for each run (placenta).

              NSEL         NAEL        Est 0-1†     MS 0-1⋆      Original
Placenta    x₂   u₀₂     x₂   u₀₂     x₂   u₀₂     x₂   u₀₂     x₂    u₀₂
   1         0     0      0     0      0     0      0     0       0      0
   2         0    38      0     0      0     0      0     0     250      0
   3         0    50      0    50      0    56      0     0     250    250
   4         0    68      0    67      0    58      0     0     250   1000
   5       182  1000    160  1000    177  1000      0    38    1000      0
   6       185  1000    175  1000    196  1000      0    41    1000    250
   7       206  1000    211  1000    210  1000    115    62    1000   1000

† 0-1 estimation utility; ⋆ 0-1 model selection utility

We find designs for the NSEL, NAEL, 0-1 estimation and 0-1 model selection utility functions defined in Section 3.1. For the 0-1 estimation utility, we set δ = (5, 5, 0.01, 5)ᵀ; for the utility φ(θ, y, d) to equal 1, the posterior mean for θ must lie in the box set $\prod_{i=1}^{4} [\theta_i - \delta_i, \theta_i + \delta_i]$, which contains 0.5% of the volume of the prior support. For the model selection utility, we suppose interest is in determining whether the reaction rates are equal, i.e., does θ₃ = θ₄? To answer this question, we define two models: m₁ (where θ₃ = θ₄) and m₂ (where θ₃ ≠ θ₄).
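The 0-1 estimation utility for a given posterior mean can be sketched as an indicator of the box set (hypothetical helper name):

```python
import numpy as np

def zero_one_estimation_utility(post_mean, theta, delta):
    """Sketch: returns 1 if the posterior mean lies in the box
    prod_i [theta_i - delta_i, theta_i + delta_i], else 0."""
    post_mean, theta, delta = (np.asarray(v, dtype=float)
                               for v in (post_mean, theta, delta))
    return float(np.all(np.abs(post_mean - theta) <= delta))
```

In the expected-utility approximation this indicator is averaged over Monte Carlo draws of (θ, y), so the expected 0-1 utility is the probability that the posterior mean lands inside the box.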

Figure 5 presents boxplots of twenty evaluations of the Monte Carlo approximation to the expected utility for the optimal design found under each utility function for n = 2, . . . , 7. We also present boxplots of the performance of the original design with n = 7. Unsurprisingly, the expected utility increases with n, and the optimal designs are clearly superior to the original design. For each utility function, the optimal design with n = 2 outperforms the original design with n = 7 placentas, with substantial differences in expected utility.

Table 1 gives the treatments for each design found for n = 7. Figure 6 shows the observation times for the optimal designs under the NSEL, NAEL and 0-1 estimation utilities, along with realizations from the solution u₁(t), for each run of each design. The designs under the NSEL and NAEL utilities have similar treatments and observation times. The initial concentrations in Table 1 lead to three distinct profiles of u₁(t) (labeled placentas 1 and 2; 3



Figure 5: Results from the placenta example in Section 5. Boxplots of 20 evaluations of the Monte Carlo approximation to the expected utility for the original design and the optimal designs found under four different utility functions for n = 2, …, 7.


and 4; and 5, 6 and 7; note, though, that the placentas are exchangeable). The profile for placentas 1 and 2 has a slow steady increase in u₁(t) with respect to t. Placentas 3 and 4 have a steep initial increase and subsequent decrease in u₁(t) with respect to t. Finally, placentas 5 to 7 have a steep initial increase in u₁(t) with respect to t, followed by a slow decrease. The optimal observation times are predominantly at the beginning of the observation window, where u₁(t) is changing most quickly. The designs under the 0-1 estimation utility are also similar, except that a non-zero amount (35 µl) of initial interior non-radioactive serine is applied to placenta 2.

Figure 7 shows the designs from the 0-1 model selection utility, along with realizations of the solutions u₁(t) under models m₁ and m₂. The treatments for the optimal design under the 0-1 utility result in two distinct profiles of u₁(t). For placentas 1–5, u₁(t) increases slowly and steadily with respect to t. Placentas 6 and 7 have a steep initial increase and subsequent decrease in u₁(t) with respect to t. Unlike the other optimal designs, the observation times are predominantly towards the end of the observation window. The u₁(t) profiles are similar under both models, with the most substantial differences occurring in the inter-profile variability towards the middle of the time interval. This region is where the majority of observation times are located.

The original design proposed by the experimenters had an unequal spacing of observation times across the entire interval [0, 600]. There are more observations taken near the start of the interval, and the time points are not dissimilar to those in the optimal designs under the NSEL, NAEL and 0-1 estimation utilities. However, the original design has treatments that are very different from any of the optimal designs, with an almost factorial structure and some treatments with high values of x₂ (exterior initial concentration of non-radioactive serine). None of the optimal designs include treatments with high x₂, demonstrating how it is often difficult to predict by intuition the treatments in a Bayesian optimal design for a complicated nonlinear model. In addition, the designs for point estimation (under the NSEL, NAEL and 0-1 estimation utilities) are quite different from the design for model selection.
