**5.2 Materials and methods**

**5.2.3 Estimation of the elastic properties**

In RUS, the elastic properties are estimated by comparing the measured resonant frequencies to model-predicted frequencies. The geometry, mass density and stiffness of the specimen entirely determine the frequencies, so the stiffness can be adjusted until the best match between the model and the experiment is reached. This is usually done through the minimization of a quadratic objective function

F(C_{ij}) = Σ_{n} [ (f_{n}^{exp} − f_{n}^{cal}(C_{ij})) / f_{n}^{exp} ]^{2}.    (5.2)
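As an illustration, the misfit (5.2) can be evaluated in a few lines once the measured and computed frequencies have been paired; the function name and the example values below are purely illustrative.

```python
import numpy as np

# Sketch of the quadratic RUS objective function (Eq. 5.2), assuming the
# measured and computed frequencies are already paired, in order.
def objective(f_exp, f_cal):
    """Sum of squared relative errors between measured and predicted frequencies."""
    f_exp = np.asarray(f_exp, dtype=float)
    f_cal = np.asarray(f_cal, dtype=float)
    return np.sum(((f_exp - f_cal) / f_exp) ** 2)

# Hypothetical example: a 1% relative error on each of two frequencies
F = objective([100.0, 200.0], [101.0, 202.0])  # 2 * 0.01**2 = 2e-4
```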

An issue with that approach is that it requires correctly pairing the measured frequencies with their predicted counterparts, while the measurement does not contain the information on which of the many predicted resonant modes have been measured. If the initial guess is far from the actual elasticity and if the predicted resonant modes are not all observed experimentally, as is the case for high-damping materials, the correct pairing is difficult to obtain.

In Chapter 4, a Bayesian formulation of the inverse problem was proposed that automatically solves the joint problem of finding the correct pairing of frequencies and estimating the elastic properties. This approach was used here, with an extension allowing one to consider a collection of similar specimens instead of processing each specimen independently.

A transversely isotropic symmetry, which requires 5 elastic parameters, was assumed for the elastic properties of tibial cortical bone. In the coordinate system defined by the orthogonal faces of the specimen, which were aligned with the anatomical directions, the elasticity matrix takes the form

C_{ij} =
[ C_{11} C_{12} C_{13}   0     0     0
  C_{12} C_{11} C_{13}   0     0     0
  C_{13} C_{13} C_{33}   0     0     0
    0      0      0    C_{44}  0     0
    0      0      0      0   C_{44}  0
    0      0      0      0     0   C_{66} ],

with C_{12} = C_{11} − 2C_{66}, axis 3 along the axis of the bone and axes 1 and 2 orthogonal to the bone axis. The inverted elasticity matrix, containing the engineering moduli (Young's moduli E_{1} = E_{2} and E_{3}, shear moduli G_{13} = G_{23} = C_{44} and G_{12} = C_{66}, and the associated Poisson's ratios), has the same block structure. Transverse isotropy assumes that the material is isotropic in the plane orthogonal to the bone axis. It has been reported to be an accurate assumption for the elastic properties of femoral cortical bone at the mid-diaphysis (Espinoza Orias et al., 2008; Granke et al., 2011). The elastic coefficients C_{ij} are not used directly in the Bayesian estimation; instead, new parameters m, more consistent with an assumption of a normal distribution, are used (see Appendix A).

Bayesian formulation of the inverse problem The pairing of the frequencies is formally represented by a vector a of integers. The length of this vector is equal to the number of predicted modes in the investigated frequency band; it contains a 0 if a particular mode is not observed, and the index of the corresponding measured frequency if the mode is observed (see Fig. 4.1 in Chapter 4). The joint posterior probability distribution of the elastic properties m, measurement noise σ^{2} and pairing of predicted and measured resonant modes a (the solution of the inverse problem) is then expressed as the product of prior distributions and the likelihood of the measured data y through Bayes' theorem

p(m, σ^{2},a|y)∝p(m)p(σ^{2})p(a)p(y|m, σ^{2},a), (5.5)

with the likelihood

p(y|m, σ^{2}, a) ∝ (σ^{2})^{−N/2} exp( −(1/(2σ^{2})) Σ_{n} [ (y_{n} − g(m,a)_{n}) / y_{n} ]^{2} ),

where g(m,a) includes the forward relation between the elastic parameters and the frequencies, a indicates which output of the model is to be compared with the nth measured frequency, and N is the number of measured frequencies.
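A minimal sketch of how such a likelihood can be evaluated given a pairing vector, assuming Gaussian noise on the relative frequency errors; the function name and the 1-based indexing convention for the entries of a are illustrative assumptions, not the chapter's actual implementation.

```python
import numpy as np

# Hedged sketch of the likelihood evaluation. The pairing vector `a` has one
# entry per predicted mode: 0 if the mode was not observed, otherwise the
# (1-based) index of the measured frequency it corresponds to. The Gaussian
# noise is assumed here to act on the relative frequency errors.
def log_likelihood(y, f_pred, a, sigma2):
    a = np.asarray(a)
    obs = a > 0                                  # predicted modes that were measured
    g = np.asarray(f_pred, dtype=float)[obs]     # model outputs paired to the data
    yn = np.asarray(y, dtype=float)[a[obs] - 1]  # measured frequencies, reordered
    r = (yn - g) / yn                            # relative residuals
    n = r.size
    return -0.5 * n * np.log(sigma2) - 0.5 * np.sum(r ** 2) / sigma2
```

With a perfect match between model and data, the log-likelihood reduces to the normalization term alone.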

The posterior distribution (5.5) cannot be studied analytically and must be sampled using Markov chain Monte Carlo (MCMC) methods. Gibbs sampling, a particular MCMC method, is particularly well adapted to that problem. In Gibbs sampling, the conditional distribution of each variable given the current state of the other variables is successively sampled,

m ∼ p(m|σ^{2}, a, y),
σ^{2} ∼ p(σ^{2}|m, a, y),
a ∼ p(a|m, σ^{2}, y),

which asymptotically produces samples of the joint posterior distribution (5.5).

From a large number of samples from the posterior distribution, any characteristic of the distribution can be computed (e.g. mean, maximum posterior estimate, confidence interval).

This Bayesian formulation requires the specification of prior probability densities, which might have a significant impact on the results but are necessary to constrain the problem. Notably, a multivariate normal prior distribution must be given to the elastic parameters, and a prior on a, depending on the number of predicted frequencies that have been missed during the measurement, must be specified. These distributions could have been fixed once and for all before starting the analysis, for example using published data on bone elasticity. This, however, might have biased the results.

Perhaps the best prior distribution that could be used to analyze the collection of bone specimens at hand is the actual distribution of elastic properties in that particular population. That way, information from the whole population is used to help the estimation of the parameters for each individual specimen, instead of a priori information from another source. The population distribution is, however, unknown before the estimation of elasticity for each specimen has been solved.

Bayesian theory provides a tool to estimate jointly the population distribution and the parameters of each specimen. This tool is called hierarchical modeling and is described below.


Hierarchical Bayes modeling Let m^{j} and y^{j} denote the parameters and data for the j-th specimen of the collection, while m and y are the parameters and data for the whole collection (to simplify notations, the dependence on the pairing vector a and experimental error σ^{2} is implicit in the following equations). Hierarchical modeling assumes that the parameters m^{j} are themselves samples of a distribution characterized by the hyperparameters θ. The joint posterior probability distribution of θ and m writes

p(θ, m|y) ∝ p(θ)p(m|θ)p(y|m, θ). (5.8)

Common assumptions in hierarchical modeling (Gelman et al., 2013) consider that the hyperparameters influence the likelihood of the data only through the parameters m, i.e.

p(y|m,θ) = p(y|m), (5.9)

and that the j-th data set depends only on the j-th parameter set m^{j} (i.e. the measurements on different specimens are independent of each other),

p(y|m) = Π_{j} p(y^{j}|m^{j}). (5.10)

Moreover, Eq. (5.9) implies that the conditional posterior of the hyperparameters is independent of the data

p(θ|m,y) = p(θ|m). (5.11)

The structure of the model is illustrated in Figure 5.4.

With the above assumptions, the joint posterior distribution (5.8) can be easily sampled using Gibbs sampling. Indeed, given a value of the hyperparameters θ, each set of parameters can be sampled from its conditional distribution. Then, given all the sets of parameters, the hyperparameters can be sampled:

step 1: m^{j}_{1} ∼ p(m^{j}|y^{j}, θ_{0}) for j = 1, . . . , n
        θ_{1} ∼ p(θ|m_{1})
step 2: m^{j}_{2} ∼ p(m^{j}|y^{j}, θ_{1}) for j = 1, . . . , n
        θ_{2} ∼ p(θ|m_{2})
. . .    (5.12)
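The alternation (5.12) can be illustrated on a toy conjugate model, a simplified stand-in for the actual RUS problem: scalar parameters m_j ∼ N(θ, τ²) with known τ², data y_j ∼ N(m_j, s²) with known s², and a flat prior on θ. All names and numerical values below are illustrative.

```python
import numpy as np

# Toy hierarchical Gibbs sampler following the scheme (5.12).
rng = np.random.default_rng(0)
tau2, s2 = 1.0, 0.25              # known population and noise variances
theta_true = 5.0
m_true = rng.normal(theta_true, np.sqrt(tau2), size=20)
y = rng.normal(m_true, np.sqrt(s2))

theta = 0.0                        # initial guess theta_0
draws = []
for step in range(2000):
    # m_j | y_j, theta : conjugate normal posterior, specimen by specimen
    prec = 1.0 / tau2 + 1.0 / s2
    mean = (theta / tau2 + y / s2) / prec
    m = rng.normal(mean, np.sqrt(1.0 / prec))
    # theta | m : with a flat prior, the posterior is N(mean(m), tau2 / n)
    theta = rng.normal(m.mean(), np.sqrt(tau2 / m.size))
    draws.append(theta)

theta_hat = np.mean(draws[500:])   # posterior mean after burn-in
```

After burn-in, the chain fluctuates around the population mean, as expected from the conjugate structure.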

Three hyperparameters θ = (λ, µ, Σ) were introduced. First, the number of missed frequencies in each experiment, k_{i}, is supposed to follow a truncated Poisson distribution of expectation λ,

p(k_{i}) ∝ (λ^{k_{i}}/k_{i}!) exp(−λ) if k_{i} ≤ k_{max},
p(k_{i}) = 0 if k_{i} > k_{max},    (5.13)

where k_{max} is the maximal number of missed frequencies permitted by the definition of the pairing vector a. The Poisson distribution is adequate to describe the

[Diagram: θ ∼ p(θ); m^{1}, . . . , m^{n} ∼ p(m|θ); y^{j} ∼ p(y|m^{j})]

Figure 5.4: Hierarchical modeling assumes that the parameter set m^{j} of each specimen in the population is a sample of a common population distribution p(m|θ), which depends on the hyperparameters θ. Each data set y^{j} follows the distribution p(y|m^{j}), independently of all the other parameters.

number of occurrences of a random event. Then, the elastic parameters m in the population are supposed to follow a multivariate normal distribution of mean µ and covariance matrix Σ. A multivariate normal distribution was used because the sampling strategy described in Chapter 4 requires a normal prior on m, and because this gives simple expressions for sampling the hyperparameters given the elastic parameters of each specimen. The prior on σ^{2} was not included in the hierarchical approach but was fixed once and for all.

The conjugate prior for the mean µ and covariance matrix Σ of a normal distribution is a normal-inverse-Wishart distribution (NIW; inverse-Wishart distribution for the covariance matrix, and normal distribution for the mean given the covariance matrix; Gelman et al., 2013):

Σ ∼ Inv-Wishart_{ν_{0}}(Λ^{−1}_{0}),
µ|Σ ∼ N(µ_{0}, Σ/κ_{0}),
(5.14)
where ν_{0} and Λ_{0} are the degrees of freedom and scale matrix of the inverse-Wishart distribution on Σ, µ_{0} is the prior mean, and κ_{0} is equivalent to a number of prior measurements. Given m observed sets of elastic parameters with sample mean m̄, the posterior distribution p(µ, Σ|m) for the collection mean and covariance matrix is then a NIW distribution (5.14) but with parameters

µ_{m} = κ_{0}/(κ_{0}+m) µ_{0} + m/(κ_{0}+m) m̄,
κ_{m} = κ_{0} + m,
ν_{m} = ν_{0} + m,
Λ_{m} = Λ_{0} + S + κ_{0}m/(κ_{0}+m) (m̄−µ_{0})(m̄−µ_{0})^{t},    (5.15)

where S = Σ_{j}(m^{j}−m̄)(m^{j}−m̄)^{t} is the sum-of-squares matrix.
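The update (5.15) translates directly into code; the sketch below uses the standard conjugate normal-inverse-Wishart formulas (Gelman et al., 2013), with the rows of M stacking the parameter sets. Function name and inputs are illustrative.

```python
import numpy as np

# Sketch of the NIW posterior parameter update (Eq. 5.15).
def niw_update(mu0, kappa0, nu0, Lambda0, M):
    M = np.atleast_2d(M)
    m = M.shape[0]                       # number of specimens
    mbar = M.mean(axis=0)                # sample mean of the parameter sets
    S = (M - mbar).T @ (M - mbar)        # sum-of-squares matrix
    kappa_m = kappa0 + m
    nu_m = nu0 + m
    mu_m = (kappa0 * mu0 + m * mbar) / kappa_m
    d = (mbar - mu0)[:, None]
    Lambda_m = Lambda0 + S + (kappa0 * m / kappa_m) * (d @ d.T)
    return mu_m, kappa_m, nu_m, Lambda_m
```

Note how the prior scale matrix Λ_{0} enters additively next to S, which is what makes its relative weight shrink as the number of specimens grows.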

A noninformative prior for the mean is obtained in the limit κ_{0} → 0; in that case, µ_{0} ∝ cst and the mean of the population is equal to the sample mean m̄.

Noninformative IW priors for covariance matrices are notoriously difficult to specify.

Nevertheless, this prior can be interpreted as equivalent to ν_{0} prior observations with covariance Λ_{0}. Therefore, we specified these parameters based on previous experimental observations of bone elasticity. The diagonal elastic coefficients C_{ii} of 21 mid-diaphysis femoral bone specimens measured by Granke et al. (2011) were used to construct Λ_{0}. Transversely isotropic elastic coefficients were inferred (averaging the values of C_{11} and C_{22}, and of C_{44} and C_{55}, and assuming C_{13} = C_{12} − 2C_{66} + ε with ε ∼ N(0,1)). They were then converted to parameters m (see Appendix A) and their empirical covariance matrix Σ_{emp.} computed. We obtained

Σ_{emp.} =

The prior on the covariance matrix of the collection distribution is then an informative prior, constructed from previous data. However, the prior is expected to be weak, as its relative weight in the posterior covariance matrix Λ_{m}, expressed in Eq. (5.15), will be 6/(59 + 6) ≈ 1/11 (since the sum-of-squares matrix S scales with the number of specimens).

For λ, the expected number of missed frequencies, a gamma prior is adequate:

p(λ) = (β^{α}/Γ(α)) λ^{α−1} exp(−βλ). (5.19)

Given the number of missed frequencies for each of the m experiments, k_{i}, i = 1, ..., m, the posterior distribution of λ, p(λ|k), is

p(λ|k) ∝ p(λ) λ^{Σ_{i}k_{i}} exp(−mλ) / (Σ_{k=0}^{k_{max}} λ^{k} exp(−λ)/k!)^{m}, (5.20)

where the denominator term is due to the truncation of the Poisson distribution.

The posterior (5.20) is close to the gamma distribution Gamma(α + Σ_{i}k_{i}, β + m) (it would be exactly proportional to it without the truncation on k). Therefore, to generate samples of the posterior, a sample is generated from the corresponding gamma approximation and a rejection step is applied. In this work we used a noninformative prior for λ by setting α = 1/2 and β = 0 (Yang and Berger, 1997).
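A small numerical sketch of the truncated Poisson model (5.13) and of the gamma approximation to the posterior of λ; the counts and k_max are hypothetical, and the rejection step correcting for the truncation is omitted here.

```python
import math

# Truncated Poisson pmf (Eq. 5.13): standard Poisson weights renormalized
# over the admissible range 0..k_max.
def trunc_poisson_pmf(k, lam, k_max):
    if k > k_max:
        return 0.0
    norm = sum(lam ** j * math.exp(-lam) / math.factorial(j)
               for j in range(k_max + 1))
    return (lam ** k * math.exp(-lam) / math.factorial(k)) / norm

# Gamma approximation to the posterior of lambda (Eq. 5.20), before the
# rejection step described in the text.
alpha, beta = 0.5, 0.0         # noninformative prior (Yang and Berger, 1997)
counts = [1, 0, 2, 1]          # hypothetical missed-frequency counts k_i
a_post = alpha + sum(counts)   # shape: alpha + sum_i k_i
b_post = beta + len(counts)    # rate:  beta + m
```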

The noise variance σ^{2} was not treated as a hierarchical parameter; its prior was fixed once and for all. The conjugate prior for this parameter is

p(σ^{2}) = Inv-χ^{2}(σ^{2}|η_{0}, σ_{0}^{2}). (5.21)

Parameters were set to η_{0} = 2 and σ^{2}_{0} = 0.05^{2}, reflecting the fact that the expected error on the measured frequencies is about 0.5%. This prior is, however, very weak, as an Inv-χ^{2} distribution with η_{0} = 2 degrees of freedom has an infinite variance and no defined mean.

Convergence and post-processing Convergence of the Markov chain to the target distribution was monitored as in Chapter 4, following the recommendations of Gelman et al. (2013). Eight chains starting from randomly generated starting points were run in parallel, and the ratios of within-chain variance to between-chain variance were calculated for each scalar parameter of the problem. Ratios of variances lower than 1.1 indicate that all the chains have converged to the same posterior distribution.
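One common form of this diagnostic, the potential scale reduction factor of Gelman et al. (2013), can be sketched as follows for a single scalar parameter; the synthetic chains are only for illustration.

```python
import numpy as np

# Potential scale reduction factor for one scalar parameter.
# `chains` is an (n_chains, n_samples) array of posterior draws.
def psrf(chains):
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()       # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)     # between-chain variance
    var_hat = (n - 1) / n * W + B / n           # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(1)
mixed = rng.normal(size=(8, 4000))              # 8 well-mixed chains
R = psrf(mixed)                                 # close to 1 when converged
```

Chains that have not mixed (e.g. stuck around different means) yield a ratio well above the 1.1 threshold.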

After convergence, samples from the 8 chains were pooled. The maxima of the marginal posterior distributions of the elastic parameters were chosen as the estimates. They were obtained from 1D kernel density estimation (Kroese et al., 2011), which provides a smoothed estimate of the marginal distribution (Fig. 5.6 and 5.7). Confidence intervals containing 68% of the marginal distribution were also computed. The boundaries of these intervals were placed such that 16% of the samples lie above the upper boundary and 16% lie below the lower boundary. For a normal distribution, the half-width of this 68% interval would be equal to the standard deviation.
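This post-processing step can be sketched as follows, using a simple Gaussian kernel density estimate with Silverman's rule-of-thumb bandwidth (a common default; the actual estimator of Kroese et al. (2011) may use a different bandwidth choice).

```python
import numpy as np

# Point estimate from the maximum of a smoothed 1D marginal, plus the
# 68% interval bounded by the 16% and 84% quantiles.
def map_and_interval(samples, grid_size=512):
    s = np.asarray(samples, dtype=float)
    h = 1.06 * s.std() * s.size ** (-1 / 5)     # Silverman bandwidth
    grid = np.linspace(s.min(), s.max(), grid_size)
    dens = np.exp(-0.5 * ((grid[:, None] - s) / h) ** 2).sum(axis=1)
    mode = grid[np.argmax(dens)]                # maximum of smoothed marginal
    lo, hi = np.percentile(s, [16, 84])         # 68% credible interval
    return mode, lo, hi

rng = np.random.default_rng(2)
mode, lo, hi = map_and_interval(rng.normal(10.0, 2.0, size=10000))
```

For a normal posterior, the recovered mode matches the mean and the interval half-width matches the standard deviation, as stated in the text.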