5.2 Materials and methods
5.2.3 Estimation of the elastic properties
In RUS, the elastic properties are estimated by comparing the measured resonant frequencies to model-predicted frequencies. The geometry, mass density, and stiffness of the specimen entirely determine the frequencies, so the stiffness can be adjusted until the best match between model and experiment is reached. This is usually done through the minimization of a quadratic objective function.
An issue with that approach is that it requires correctly pairing the measured frequencies with their predicted counterparts, while the measurement does not contain information on which of the many predicted resonant modes have been measured. If the initial guess is far from the actual elasticity, and if the predicted resonant modes are not all observed experimentally, as is the case for high-damping materials, the correct pairing is difficult to obtain.
In Chapter 4, a Bayesian formulation of the inverse problem was proposed that automatically solves the joint problem of finding the correct pairing of frequencies and estimating the elastic properties. This approach was used here, with an extension allowing one to consider a collection of similar specimens instead of processing each specimen independently.
A transversely isotropic symmetry, which involves 5 independent elastic parameters, was assumed for the elastic properties of tibial cortical bone. In the coordinate system defined by the orthogonal faces of the specimen, which were aligned with the anatomical directions, the elasticity matrix takes the form
\[
C_{ij} = \begin{pmatrix}
C_{11} & C_{12} & C_{13} & 0 & 0 & 0\\
C_{12} & C_{11} & C_{13} & 0 & 0 & 0\\
C_{13} & C_{13} & C_{33} & 0 & 0 & 0\\
0 & 0 & 0 & C_{44} & 0 & 0\\
0 & 0 & 0 & 0 & C_{44} & 0\\
0 & 0 & 0 & 0 & 0 & C_{66}
\end{pmatrix},
\qquad C_{66} = \frac{C_{11}-C_{12}}{2},
\]

with axis 3 along the bone axis and axes 1 and 2 orthogonal to the bone axis. The inverted elasticity matrix, containing the engineering moduli, is isotropic in the plane orthogonal to the bone axis. Transverse isotropy has been reported to be an accurate assumption for the elastic properties of femoral cortical bone at the mid-diaphysis (Espinoza Orias et al., 2008; Granke et al., 2011). The elastic coefficients C_ij are not used directly in the Bayesian estimation; instead, new parameters m, more consistent with an assumption of a normal distribution, are used (see Appendix A).
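For illustration, the transversely isotropic stiffness matrix above can be assembled from the 5 independent constants as follows (function name and numerical values are illustrative only, not taken from the thesis):

```python
import numpy as np

def transverse_isotropic_C(C11, C33, C13, C44, C66):
    """Assemble the 6x6 Voigt stiffness matrix for a transversely
    isotropic material with axis 3 along the symmetry (bone) axis.
    In-plane isotropy imposes C22 = C11, C55 = C44, C23 = C13 and
    C12 = C11 - 2*C66, leaving 5 independent constants."""
    C12 = C11 - 2.0 * C66
    C = np.zeros((6, 6))
    C[0, 0] = C[1, 1] = C11
    C[2, 2] = C33
    C[0, 1] = C[1, 0] = C12
    C[0, 2] = C[2, 0] = C[1, 2] = C[2, 1] = C13
    C[3, 3] = C[4, 4] = C44
    C[5, 5] = C66
    return C

# Indicative values (GPa) of the right order of magnitude for cortical bone
C = transverse_isotropic_C(C11=19.0, C33=29.0, C13=10.0, C44=6.0, C66=4.0)
```

The returned matrix is symmetric by construction and satisfies the in-plane isotropy relation C66 = (C11 - C12)/2.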
Bayesian formulation of the inverse problem The pairing of the frequencies is formally represented by a vector a of integers. The length of this vector is equal to the number of predicted modes in the investigated frequency band; it contains a 0 if a particular mode is not observed and an integer if the mode is observed (see Fig. 4.1 in Chapter 4). The joint posterior probability distribution of the elastic properties m, measurement noise σ², and pairing of predicted and measured resonant modes a (the solution of the inverse problem) is then expressed as the product of prior distributions and the likelihood of the measured data y through Bayes' theorem,
\[
p(\mathbf{m}, \sigma^2, \mathbf{a} \mid \mathbf{y}) \propto p(\mathbf{m})\, p(\sigma^2)\, p(\mathbf{a})\, p(\mathbf{y} \mid \mathbf{m}, \sigma^2, \mathbf{a}), \tag{5.5}
\]
with the Gaussian likelihood

\[
p(\mathbf{y} \mid \mathbf{m}, \sigma^2, \mathbf{a}) \propto (\sigma^2)^{-N/2} \exp\!\left(-\frac{1}{2\sigma^2}\sum_{n=1}^{N}\left[y_n - g(\mathbf{m}, \mathbf{a})_n\right]^2\right), \tag{5.6}
\]

where g(m, a) includes the forward relation between the elastic parameters and the frequencies, and a indicates which output of the model is to be compared with the n-th measured frequency.
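As a minimal sketch of how the pairing vector enters the likelihood (function and variable names are hypothetical; the exact residual definition follows Chapter 4):

```python
import numpy as np

def log_likelihood(y, f_predicted, a, sigma2):
    """Gaussian log-likelihood of the measured frequencies y for a given
    pairing vector a. a[k] = 0 if predicted mode k was not observed,
    otherwise the (1-based) index of the measured frequency paired to it."""
    observed = [(k, a_k) for k, a_k in enumerate(a) if a_k > 0]
    g = np.array([f_predicted[k] for k, _ in observed])        # paired model outputs
    y_paired = np.array([y[a_k - 1] for _, a_k in observed])   # matching measurements
    r = y_paired - g                                           # residuals of paired modes
    n = len(r)
    return -0.5 * n * np.log(2.0 * np.pi * sigma2) - 0.5 * np.sum(r**2) / sigma2
```

Only the modes flagged as observed in a contribute to the misfit; unobserved predicted modes are simply skipped.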
The posterior distribution (5.5) cannot be studied analytically and must be sampled using Markov chain Monte Carlo (MCMC) methods. Gibbs sampling, a particular MCMC method, is well adapted to this problem. In Gibbs sampling, the conditional distributions of each variable given the current state of the others are successively sampled,

\[
\mathbf{m} \sim p(\mathbf{m} \mid \sigma^2, \mathbf{a}, \mathbf{y}), \qquad \sigma^2 \sim p(\sigma^2 \mid \mathbf{m}, \mathbf{a}, \mathbf{y}), \qquad \mathbf{a} \sim p(\mathbf{a} \mid \mathbf{m}, \sigma^2, \mathbf{y}), \tag{5.7}
\]

which asymptotically produces samples of the joint posterior distribution (5.5).
From a large number of samples of the posterior distribution, any characteristic of the distribution can be computed (e.g., mean, maximum posterior estimate, confidence interval).
This Bayesian formulation requires the specification of prior probability densities, which may have a significant impact on the results but are necessary to constrain the problem. Notably, a multivariate normal prior distribution must be given for the elastic parameters, and a prior on a, depending on the number of predicted frequencies missed during the measurement, must be specified. These distributions could have been fixed once and for all before starting the analysis, for example using published data on bone elasticity. This, however, might have biased the results.
Perhaps the best prior distribution for analyzing the collection of bone specimens at hand is the actual distribution of elastic properties in that particular population. In that way, information from the whole population is used to help the estimation of the parameters for each individual specimen, instead of a priori information from another source. The population distribution is, however, unknown before elasticity has been estimated for each specimen.
Bayesian theory provides a tool for jointly estimating the population distribution and the parameters of each specimen. This tool is called hierarchical modeling and is described below.
Hierarchical Bayes modeling Let m_j and y_j denote the parameters and data for the j-th specimen of the collection, while m and y are the parameters and data for the whole collection (to simplify notation, the dependence on the pairing vector a and the experimental error σ² is implicit in the following equations). Hierarchical modeling assumes that the parameters m_j are themselves samples of a distribution characterized by the hyperparameters θ. The joint posterior probability distribution of θ and m writes
\[
p(\boldsymbol{\theta}, \mathbf{m} \mid \mathbf{y}) \propto p(\boldsymbol{\theta})\, p(\mathbf{m} \mid \boldsymbol{\theta})\, p(\mathbf{y} \mid \mathbf{m}, \boldsymbol{\theta}). \tag{5.8}
\]

Common assumptions in hierarchical modeling (Gelman et al., 2013) consider that the hyperparameters influence the likelihood of the data only through the parameters m, i.e.
\[
p(\mathbf{y} \mid \mathbf{m}, \boldsymbol{\theta}) = p(\mathbf{y} \mid \mathbf{m}), \tag{5.9}
\]
and that the j-th data set depends only on the j-th parameter set m_j (i.e., the measurements of different specimens are independent of each other):

\[
p(\mathbf{y} \mid \mathbf{m}) = \prod_{j=1}^{n} p(\mathbf{y}_j \mid \mathbf{m}_j). \tag{5.10}
\]
Moreover, Eq. (5.9) implies that the conditional posterior of the hyperparameters is independent of the data
\[
p(\boldsymbol{\theta} \mid \mathbf{m}, \mathbf{y}) = p(\boldsymbol{\theta} \mid \mathbf{m}). \tag{5.11}
\]
The structure of the model is illustrated in Figure 5.4.
With the above assumptions, the joint posterior distribution (5.8) can easily be sampled using Gibbs sampling. Indeed, given a value of the hyperparameters θ, each set of parameters can be sampled from its conditional distribution. Then, given all the sets of parameters, the hyperparameters can be sampled:
step 1: m_j^1 ∼ p(m_j | y_j, θ^0) for j = 1, …, n;  θ^1 ∼ p(θ | m^1)
step 2: m_j^2 ∼ p(m_j | y_j, θ^1) for j = 1, …, n;  θ^2 ∼ p(θ | m^2)
…
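The alternation above can be sketched as a two-block Gibbs loop; the two conditional samplers are problem-specific callables and the names below are hypothetical placeholders:

```python
def gibbs_hierarchical(y_sets, theta0, sample_m_given, sample_theta_given, n_iter=1000):
    """Two-block Gibbs sampler alternating between the parameter sets
    m_j (given data and hyperparameters) and the hyperparameters theta
    (given all current parameter sets)."""
    theta = theta0
    chain_m, chain_theta = [], []
    for _ in range(n_iter):
        m_all = [sample_m_given(y_j, theta) for y_j in y_sets]  # m_j ~ p(m_j | y_j, theta)
        theta = sample_theta_given(m_all)                       # theta ~ p(theta | m)
        chain_m.append(m_all)
        chain_theta.append(theta)
    return chain_m, chain_theta
```

Each specimen is updated independently given θ, which is what makes the hierarchical structure cheap to exploit.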
Three hyperparameters θ = (λ, µ, Σ) were introduced. First, the number of missed frequencies in each experiment, k_i, is supposed to follow a truncated Poisson distribution of expectation λ:

\[
p(k_i) \propto \begin{cases} \dfrac{\lambda^{k_i}}{k_i!}\exp(-\lambda) & \text{if } k_i \le k_{\max}, \\[4pt] 0 & \text{if } k_i > k_{\max}, \end{cases} \tag{5.12}
\]

where k_max is the maximal number of missed frequencies permitted by the definition of the pairing vector a. The Poisson distribution is adequate to describe the
[Figure 5.4: diagram showing m_1 ∼ p(m|θ), m_2 ∼ p(m|θ), …, m_n ∼ p(m|θ) and y_1 ∼ p(y|m_1), y_2 ∼ p(y|m_2), …, y_n ∼ p(y|m_n).]

Figure 5.4: Hierarchical modeling assumes that the parameter set m_j of each specimen in the population is a sample of a common population distribution p(m|θ), which depends on the hyperparameters θ. Each data set y_j follows the distribution p(y|m_j), independently of all the other parameters.
number of occurrences of a random event. Then, the elastic parameters m in the population are supposed to follow a multivariate normal distribution of mean µ and covariance matrix Σ. A multivariate normal distribution was used because the sampling strategy described in Chapter 4 requires a normal prior on m, and because this gives simple expressions for sampling the hyperparameters given the elastic parameters of each specimen. The prior on σ² was not included in the hierarchical approach but was fixed once and for all.
The conjugate prior for the mean µ and covariance matrix Σ of a normal distribution is a normal-inverse-Wishart distribution (NIW; an inverse-Wishart distribution for the covariance matrix, and a normal distribution for the mean given the covariance matrix; Gelman et al., 2013):
\[
\Sigma \sim \text{Inv-Wishart}_{\nu_0}(\Lambda_0^{-1}), \qquad \mu \mid \Sigma \sim N(\mu_0, \Sigma/\kappa_0), \tag{5.14}
\]

where ν0 and Λ0 are the degrees of freedom and scale matrix of the IW distribution on Σ, µ0 is the prior mean, and κ0 is equivalent to a number of prior measurements. Given m observed sets of elastic parameters m_j, the posterior distribution p(µ, Σ | m) for the collection mean and covariance matrix is then a NIW distribution (5.14), but with parameters
\[
\begin{aligned}
\mu_m &= \frac{\kappa_0}{\kappa_0+m}\,\mu_0 + \frac{m}{\kappa_0+m}\,\bar{\mathbf{m}}, & \kappa_m &= \kappa_0 + m, \\
\Lambda_m &= \Lambda_0 + S + \frac{\kappa_0\, m}{\kappa_0+m}\,(\bar{\mathbf{m}}-\mu_0)(\bar{\mathbf{m}}-\mu_0)^{T}, & \nu_m &= \nu_0 + m,
\end{aligned}
\tag{5.15}
\]

where m̄ is the sample mean of the m parameter sets and S is the sum of squares about the mean. In the noninformative case, p(µ) ∝ const. and the posterior mean of the population is equal to the sample mean m̄.
Noninformative IW priors for covariance matrices are notoriously difficult to specify.
Nevertheless, this prior can be interpreted as equivalent to ν0 prior observations with covariance Λ0. Therefore, we specified these parameters based on previous experimental observations of bone elasticity. The diagonal elastic coefficients C_ii of 21 mid-diaphysis femoral bone specimens measured by Granke et al. (2011) were used to construct Λ0. Transversely isotropic elastic coefficients were inferred (averaging the values of C11 and C22, and of C44 and C55, and assuming C13 = C12 − 2C66 + ε with ε ∼ N(0, 1)). They were then converted to parameters m (see Appendix A) and their empirical covariance matrix Σ_emp. was computed. The prior on the covariance matrix of the collection distribution is then an informative prior, constructed from previous data. However, the prior is expected to be weak, as its relative weight in the posterior covariance matrix Λ_m, expressed in Eq. (5.15), is 6/(59 + 6) ≈ 1/11 (since the sum-of-squares matrix S scales with the number of specimens).
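The conjugate NIW update of Eq. (5.15) is straightforward to implement; the following sketch (hypothetical function name, standard update from Gelman et al., 2013) returns the posterior parameters given the current parameter sets:

```python
import numpy as np

def niw_posterior(m_samples, mu0, kappa0, nu0, Lambda0):
    """Posterior parameters of the normal-inverse-Wishart distribution
    given m observed parameter sets (rows of m_samples), following the
    conjugate update of Eq. (5.15)."""
    m_samples = np.asarray(m_samples, dtype=float)
    m = m_samples.shape[0]
    mbar = m_samples.mean(axis=0)
    S = (m_samples - mbar).T @ (m_samples - mbar)   # sum of squares about the mean
    kappa_m = kappa0 + m
    nu_m = nu0 + m
    mu_m = (kappa0 * mu0 + m * mbar) / kappa_m
    d = (mbar - mu0)[:, None]
    Lambda_m = Lambda0 + S + (kappa0 * m / kappa_m) * (d @ d.T)
    return mu_m, kappa_m, nu_m, Lambda_m
```

Sampling (µ, Σ) then amounts to drawing Σ from the updated inverse-Wishart and µ from N(µ_m, Σ/κ_m).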
For λ, the expected number of missed frequencies, a gamma prior is adequate:

\[
p(\lambda) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,\lambda^{\alpha-1}\exp(-\beta\lambda). \tag{5.19}
\]

Given the number of missed frequencies for each of the m experiments, k_i, i = 1, …, m, the posterior distribution of λ, p(λ | k), is
\[
p(\lambda \mid k) \propto \left[\sum_{k=0}^{k_{\max}} \frac{\lambda^{k}}{k!}\exp(-\lambda)\right]^{-m} \lambda^{\alpha + \sum_i k_i - 1} \exp\!\big(-(\beta + m)\lambda\big), \tag{5.20}
\]

where the bracketed term is due to the truncation of the Poisson distribution. The posterior (5.20) is close to the gamma distribution Gamma(α + Σ_i k_i, β + m) (it would be exactly proportional to it without the truncation on k); therefore, to generate samples of the posterior, a sample is generated from the corresponding gamma approximation and a rejection step is applied. In this work we used a noninformative prior for λ by setting α = 1/2 and β = 0 (Yang and Berger, 1997).
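A sampler for λ built on the gamma approximation can be sketched as follows. As an illustrative variant of the rejection step described above, an independence Metropolis-Hastings correction with the gamma proposal is used here (a plain swap, named explicitly; all function names are hypothetical), since only the truncation term enters the acceptance ratio:

```python
import numpy as np
from math import lgamma

def log_C(lam, kmax):
    """log of the truncated-Poisson normalization C(lambda) = P(K <= kmax)."""
    terms = [k * np.log(lam) - lgamma(k + 1) - lam for k in range(kmax + 1)]
    mx = max(terms)
    return mx + np.log(sum(np.exp(t - mx) for t in terms))

def sample_lambda(lam_current, k_counts, alpha, beta, kmax, rng):
    """One independence MH update of lambda targeting the posterior (5.20),
    using the gamma approximation Gamma(alpha + sum k_i, beta + m) as proposal.
    The acceptance ratio reduces to the truncation term C(lambda)^(-m)."""
    m = len(k_counts)
    a_post = alpha + sum(k_counts)
    b_post = beta + m
    lam_prop = rng.gamma(a_post, 1.0 / b_post)       # numpy uses (shape, scale)
    log_acc = m * (log_C(lam_current, kmax) - log_C(lam_prop, kmax))
    if np.log(rng.uniform()) < log_acc:
        return lam_prop
    return lam_current
```

When λ is small relative to k_max, C(λ) ≈ 1 and nearly every proposal is accepted, so the gamma approximation dominates in practice.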
The noise variance σ² was not treated as a hierarchical parameter; its prior was fixed once and for all. The conjugate prior for this parameter is

\[
p(\sigma^2) = \text{Inv-}\chi^2(\sigma^2 \mid \eta_0, \sigma_0^2). \tag{5.21}
\]

The parameters were set to η0 = 2 and σ0² = 0.05², reflecting the fact that the expected error on the measured frequencies is about 0.5%. This prior is, however, very weak, as an Inv-χ² distribution with η0 = 2 degrees of freedom has infinite variance and no defined mean.
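Draws from the scaled inverse-χ² prior can be generated through its defining relation to the χ² distribution (a standard construction; the parameter values below reproduce the choice stated above):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_scaled_inv_chi2(eta0, s0_sq, size, rng):
    """Draw from the scaled inverse-chi-square distribution
    Inv-chi2(eta0, s0^2): sigma^2 = eta0 * s0^2 / X with X ~ chi2(eta0)."""
    x = rng.chisquare(eta0, size=size)
    return eta0 * s0_sq / x

draws = sample_scaled_inv_chi2(2, 0.05**2, 100_000, rng)
# With eta0 = 2 the distribution has no finite mean: occasional extreme
# draws appear, reflecting how weak this prior is.
```

The heavy right tail of the draws makes the weakness of the prior directly visible.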
Convergence and post processing Convergence of the Markov chains to the target distribution was monitored as in Chapter 4, following the recommendations of Gelman et al. (2013). Eight chains starting from randomly generated points were run in parallel, and the potential scale reduction ratio, comparing between-chain and within-chain variances, was calculated for each scalar parameter of the problem. Ratios lower than 1.1 indicate that all the chains have converged to the same posterior distribution.
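The diagnostic just described is the Gelman-Rubin statistic; a minimal implementation for one scalar parameter (function name assumed, standard formula from Gelman et al., 2013) is:

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for one scalar parameter.
    `chains` has shape (n_chains, n_samples)."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()   # mean within-chain variance
    B = n * chain_means.var(ddof=1)         # between-chain variance
    var_plus = (n - 1) / n * W + B / n      # pooled variance estimate
    return np.sqrt(var_plus / W)
```

For chains that have mixed, W and the pooled estimate agree and R-hat approaches 1 from above.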
After convergence, samples from the 8 chains were pooled. The maxima of the marginal posterior distributions of the elastic parameters were chosen as the estimates. They were obtained from 1D kernel density estimation (Kroese et al., 2011), which provides a smoothed estimate of the marginal distribution (Figs. 5.6 and 5.7). Confidence intervals containing 68% of the marginal distribution were also computed. The boundaries of these intervals were placed such that 16% of the samples lie above the upper boundary and 16% lie below the lower boundary. For a normal distribution, the half-width of this 68% interval would be equal to the standard deviation.
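The post-processing step can be sketched as follows (function name hypothetical; a Gaussian kernel with Silverman's bandwidth stands in for the KDE of Kroese et al., 2011, whose exact settings are not specified here):

```python
import numpy as np

def posterior_summary(samples, grid_size=512):
    """Marginal posterior mode from a 1D Gaussian kernel density estimate
    (Silverman's rule-of-thumb bandwidth) and the central 68% credible
    interval from the 16th and 84th percentiles."""
    samples = np.asarray(samples, dtype=float)
    h = 1.06 * samples.std(ddof=1) * len(samples) ** (-1.0 / 5.0)  # bandwidth
    grid = np.linspace(samples.min(), samples.max(), grid_size)
    # Unnormalized KDE evaluated on the grid (normalization does not shift the mode)
    density = np.exp(-0.5 * ((grid[:, None] - samples[None, :]) / h) ** 2).sum(axis=1)
    mode = grid[np.argmax(density)]
    lo, hi = np.percentile(samples, [16, 84])
    return mode, (lo, hi)
```

Using the smoothed mode rather than the raw histogram maximum avoids sensitivity to binning, and the percentile-based interval needs no distributional assumption.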