The inverse problem solutions and resolutions


Report

Reference


ALECU, Teodor

Abstract

The purpose of this document is to investigate what can and what cannot be done in terms of accuracy of the reconstruction of a still image (this term is used here with its most general meaning) from distorted measurements. This reconstruction is generally known as the inverse problem...

ALECU, Teodor. The inverse problem solutions and resolutions . Genève : 2003

Available at:

http://archive-ouverte.unige.ch/unige:47992

Disclaimer: layout of this document may differ from the published version.


UNIVERSITE DE GENEVE

CENTRE UNIVERSITAIRE D’INFORMATIQUE

COMPUTER VISION AND MULTIMEDIA LABORATORY

Date: November 13, 2003

N°: 03.06

TECHNICAL REPORT

The Inverse Problem Solutions and Resolutions

Teodor Iulian Alecu

Computer Vision Group

Computing Science Center, University of Geneva

24 rue du Général Dufour, CH - 1211 Geneva 4, Switzerland

e-mail: [email protected]


1 Introduction

The purpose of this document is to investigate what can and what cannot be done in terms of accuracy of the reconstruction of a still image (this term is used here with its most general meaning) from distorted measurements. This reconstruction is generally known as the inverse problem.

It therefore starts with a concise presentation of the general class of linear inverse problems¹ and their more or less "classic" solutions.

It then discusses the problem of noise-distorted measurements and modelling errors, and tries to evaluate the statistical lower bound on the achievable variance of the estimate, in terms of both amplitude and localization.

The results obtained then allow the derivation of equations for optimal sensor placement and for a consistent solution cell size.

Each section contains a final subsection applying the general theory to a more specific case, illustrating the main ideas presented.

¹ In all that follows the specification "linear" is implicit: any reference to an "inverse problem" should be understood as a reference to a "linear inverse problem".


2 Inverse Problem. Solutions

This section deals with the definition of an inverse problem, the different mathematical formulations and the associated solutions. It is a comprehensive but not exhaustive review of the topic.

It also aims at emphasizing some of the links between the different classes of methods used to solve the inverse problems, which arise in various fields of science.

Generally an inverse problem has a dual counterpart called the direct problem. The direct problem describes a cause-effect relationship, while the inverse problem consists in trying to recover the causes from the measured effects, based on the known direct relationship.

For instance, the direct EEG problem can be formulated as follows: knowing the configuration of sources in the brain, what will be the electric potentials measured by this electrode array?

The inverse problem is then: having measured this set of electric potentials, what was the source configuration that could have produced it?

It can already be seen that this second problem is far more difficult, for three inherent reasons:

• not all the effects are measured (we don’t have electrodes everywhere), thus the data is incomplete

• different causes may produce the same effects

• measurements may be corrupted

The solution can therefore be non-unique and unstable. This undesirable characteristic of inverse problems is called ill-posedness.

The purpose of this section is to see how to overcome these drawbacks.


2.1 General Formulation

The specification of the inverse problem is done through its direct counterpart, which can be written in the most general case as:

$$y(r_y) = \int h(r_y, r_x)\cdot x(r_x) + n \tag{1}$$

Here $y$ denotes the measurements (effects) on the space spanned by $r_y$, $x$ denotes the function which needs to be determined (cause) on the space spanned by $r_x$, $h$ is the operator which maps $x$ to $y$ (the cause-effect relationship), and $n$ represents the additive noise.

The goal of the inversion is naturally to estimate the data $x$ based on the measurements $y$.

Equation (1) can be further simplified under some common assumptions.

If the mapping operator is linear shift-invariant,

$$h(r_y, r_x) = h(r_y - r_x); \qquad h(r_y + \Delta, r_x + \Delta) = h(r_y, r_x), \tag{2}$$

then equation (1) becomes

$$y = h * x + n \tag{3}$$

where $h$ denotes in this case the impulse response of the system. In the Fourier space equation (3) takes a very simple form:

$$Y(\omega) = H(\omega)X(\omega) + N(\omega) \tag{4}$$

If the solution space and the measurement space are discrete, then equation (1) takes the matrix form

$$y = Hx + n \tag{5}$$

where $y$ and $n$ are vectors of length $N$ (number of measurement points), $x$ is a vector of length $M$ (number of solution points) and $H$ is a matrix of size $N \times M$.

Most of the considerations in this document are based on formulations (4) and (5) of the more general equation (1).
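The link between formulations (4) and (5) can be checked numerically. The following is a minimal sketch, assuming a periodic (circular) convolution so that the shift-invariant operator becomes a circulant matrix. The helper `circulant_matrix`, the kernel `h` and the signal `x` are hypothetical and chosen only for illustration.

```python
import numpy as np

def circulant_matrix(h, m):
    """Return the m x m matrix H with H[i, j] = h[(i - j) mod m]."""
    h_padded = np.zeros(m)
    h_padded[:len(h)] = h
    idx = (np.arange(m)[:, None] - np.arange(m)[None, :]) % m
    return h_padded[idx]

rng = np.random.default_rng(0)
x = rng.standard_normal(16)        # hypothetical source
h = np.array([0.5, 0.3, 0.2])      # hypothetical impulse response
H = circulant_matrix(h, x.size)

# Matrix form (5), noise free.
y_matrix = H @ x
# Fourier form (4), noise free: product of the transforms.
y_fourier = np.fft.ifft(np.fft.fft(h, x.size) * np.fft.fft(x)).real
```

Under the circular assumption, `y_matrix` and `y_fourier` hold the same noise-free measurements.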

2.2 Ideal case solution

The first solutions given to this problem are based on the simplest ideal case of noise-free data. It will be shown that in this case a straightforward solution can be found using only intuitive natural constraints.

The equations that will be considered are therefore:

$$Y(\omega) = H(\omega)X(\omega), \qquad y = Hx \tag{6}$$

The first form has an obvious general solution:

$$X(\omega) = \frac{H^*(\omega)}{\|H(\omega)\|^2}\,Y(\omega) \tag{7}$$

The second one can be split into three cases, according to the relationship between the number of measurements $N$ and the number of solution points $M$.²

N = M Ideal system

This is the trivial case in which a simple inversion of the matrix suffices to recover the data:

$$x = H^{-1}y \tag{8}$$

² It is supposed that $\mathrm{rank}(H) = \min(M, N)$, which means that the linear system has been reduced to a non-redundant form.


But what if the matrix $H$ is not square?

N > M Overdetermined system

In this case there is a mismatch between the data and the possible solutions. This usually occurs when a complex system is linearized for simplicity. The best-known example is linear regression.

The natural constraint is to find a solution that minimizes the misfit between the data and the model:

$$\min_x \|y - Hx\|_L \tag{9}$$

Usually the norm $L$ is chosen to be the $L_2$ norm, which leads to the least squares solution:

$$x = \left(H^tH\right)^{-1}H^ty \tag{10}$$

This inversion matrix is known as the Moore-Penrose pseudoinverse.

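Proof of (10):

$$\|y - Hx\|^2 = \min \Rightarrow \frac{\partial \|y - Hx\|^2}{\partial x} = 0 \Leftrightarrow 2H^t(y - Hx) = 0$$

$$\Leftrightarrow H^ty = H^tHx \Leftrightarrow x = (H^tH)^{-1}H^ty$$

N < M Underdetermined system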

Such a situation implies that not enough measurements are taken. The information they provide is not sufficient to discriminate between different configurations of the source data, all of which fit the measured data perfectly. A solution must be chosen, and the criterion will be to choose the one having minimum norm.

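$$\min_x \|x\|_L \tag{11}$$

When the $L_2$ norm is used (energy minimization), this case yields a solution that is different from, but very similar to, the previous one, known as the minimum norm solution:

$$x = H^t\left(HH^t\right)^{-1}y \tag{12}$$

Proof of (12):

Using the method of Lagrange multipliers:

$$\left.\begin{array}{l}\|x\|^2 = \min\\ y = Hx\end{array}\right\} \Rightarrow \min_x \|x\|^2 \Leftrightarrow \min_{x,\lambda}\left(\|x\|^2 + \lambda^t(y - Hx)\right)$$

Then:

$$\frac{d\left(\|x\|^2 + \lambda^t(y - Hx)\right)}{dx} = 0 \Leftrightarrow 2x - H^t\lambda = 0$$

$$\Rightarrow 2y = HH^t\lambda \Leftrightarrow \lambda = 2\left(HH^t\right)^{-1}y$$

$$\Rightarrow x = H^t\left(HH^t\right)^{-1}y$$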

For the sake of simplicity, all three of these inversion matrices will be denoted by $H^+$. The equivalent of equation (7) for the general discrete case is then:

$$x = H^+y \tag{13}$$
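As an illustration, the closed forms (10) and (12) can be compared with a library pseudoinverse. This is a sketch using NumPy on random matrices, which are assumed here to be full rank so that the rank condition of footnote 2 holds. The matrix and vector names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Overdetermined case, N > M: least squares solution (10).
H_over = rng.standard_normal((8, 3))
y_over = rng.standard_normal(8)
x_ls = np.linalg.solve(H_over.T @ H_over, H_over.T @ y_over)

# Underdetermined case, N < M: minimum norm solution (12).
H_under = rng.standard_normal((3, 8))
y_under = rng.standard_normal(3)
x_mn = H_under.T @ np.linalg.solve(H_under @ H_under.T, y_under)

# numpy.linalg.pinv computes the Moore-Penrose pseudoinverse in both cases.
x_ls_pinv = np.linalg.pinv(H_over) @ y_over
x_mn_pinv = np.linalg.pinv(H_under) @ y_under
```

In the underdetermined case the minimum norm solution reproduces the data exactly, while in the overdetermined case only the misfit is minimized.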


2.3 Singularities

Let us now consider the influence of noise on the inversion. Considering equations (7) and (13), the error on the estimate can be simply computed as:

$$E(\omega) = \hat X(\omega) - X(\omega) = \frac{H^*(\omega)}{\|H(\omega)\|^2}\,N(\omega)$$

$$e = \hat x - x = H^+n \tag{14}$$

where $N(\omega)$ and $n$ stand for the additive noise. Although one might think that small noise does not modify the solution, the above formulas show that, if special care is not taken, the error on the reconstruction can be huge when the transfer function approaches singularities.

This is obvious for the first form:

$$\|H(\omega)\| \to 0 \Rightarrow E(\omega) \to +\infty$$

Figure 1: Transfer functions and singularities

For instance, in Figure 1 out-of-band noise (left) can dominate the reconstruction. Even worse, if the transfer function has zeros (right), the reconstruction will be completely degraded.

This instability to noise is going to be studied in the simulation subsection.
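The amplification described by (14) can be made concrete with a small numerical sketch. The low-pass transfer function `H` and the noise level below are hypothetical choices made only to illustrate the effect. They are not the configuration used in the simulation subsection.

```python
import numpy as np

n = 64
omega = 2 * np.pi * np.fft.fftfreq(n)
H = 1.0 / (1.0 + (4.0 * omega) ** 2)    # hypothetical low-pass transfer function

rng = np.random.default_rng(2)
noise = 1e-3 * rng.standard_normal(n)   # hypothetical small additive noise
N = np.fft.fft(noise)

# Error of the naive inverse (7), as given by (14): E = H* N / |H|^2.
E = np.conj(H) * N / np.abs(H) ** 2

# Per-frequency noise amplification, 1 / |H|.
gain = 1.0 / np.abs(H)
```

For this transfer function the gain is 1 at the zero frequency and exceeds 100 at the highest frequencies, so the out-of-band noise dominates the error.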

Even though less intuitive, the same phenomenon can be observed for the discrete system. This property can be shown using singular value decomposition.

$$H = U\Sigma V^t$$

with $U$, $V$ unitary matrices and $\Sigma$ a square diagonal matrix with values $s_1 \ge s_2 \ge \ldots \ge s_k \ge \ldots \ge s_{\mathrm{rank}(H)} > 0$. It can be proven that the $s_i^2$ are the eigenvalues of both matrices $HH^t$ and $H^tH$, and that the columns of $U$ and $V$ are the associated eigenvectors.

In this setup, the least squares solution can be rewritten as:

$$\left.\begin{array}{l} H^+_{LS} = (H^tH)^{-1}H^t = \left((U\Sigma V^t)^tU\Sigma V^t\right)^{-1}(U\Sigma V^t)^t \\ U^tU = V^tV = I_{\mathrm{rank}(H)} \end{array}\right\}$$

$$\Rightarrow H^+_{LS} = (V\Sigma^t\Sigma V^t)^{-1}V\Sigma U^t = V(\Sigma^t\Sigma)^{-1}\Sigma U^t$$

Interestingly enough, one can check that from the SVD point of view the minimum norm solution and the least squares solution are practically identical, both performing a projection through the eigenvectors of the system:

$$H^+_{LS} = V(\Sigma^t\Sigma)^{-1}\Sigma U^t$$

$$H^+_{MN} = V\Sigma^t(\Sigma\Sigma^t)^{-1}U^t \tag{15}$$

Considering that $\Sigma^{-1}$ is the square diagonal matrix with values

$$0 < 1/s_1 \le 1/s_2 \le \ldots \le 1/s_k \le \ldots \le 1/s_{\mathrm{rank}(H)} \tag{16}$$

together with equation (14), it is obvious that even small noise can do tremendous damage if it is present in the subspace spanned by the singular vectors associated with the small singular values of $H$.

One way to deal with this problem is simply to cut off the smaller singular values from $\Sigma^{-1}$, according to some more or less justified criterion, and to perform a truncated reconstruction:

$$1/s_k = 0, \quad k > k_{trunc}$$

The problem with this abrupt decision is that some small singular values could carry important information, which will be irremediably lost. In what follows a softer method of reducing the influence of noise is investigated: the regularization approach.
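The truncated reconstruction can be sketched as follows. The function `truncated_pinv`, the singular values and the noise are hypothetical. The noise is deliberately placed along the weakest singular direction, which is the worst case for the non-truncated inverse.

```python
import numpy as np

def truncated_pinv(H, k_trunc):
    """Pseudoinverse that keeps only the k_trunc largest singular values."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    inv_s = np.zeros_like(s)
    inv_s[:k_trunc] = 1.0 / s[:k_trunc]    # 1/s_k = 0 for k > k_trunc
    return Vt.T @ (inv_s[:, None] * U.T)

rng = np.random.default_rng(3)
U, _ = np.linalg.qr(rng.standard_normal((6, 6)))
V, _ = np.linalg.qr(rng.standard_normal((6, 6)))
s = np.array([1.0, 0.5, 0.2, 0.1, 1e-6, 1e-7])   # hypothetical singular values
H = U @ np.diag(s) @ V.T

x = rng.standard_normal(6)
x /= np.linalg.norm(x)                 # unit-norm source
n = 1e-4 * U[:, -1]                    # small noise along the weakest direction
y = H @ x + n

err_full = np.linalg.norm(np.linalg.pinv(H) @ y - x)
err_trunc = np.linalg.norm(truncated_pinv(H, 4) @ y - x)
```

Here the full inversion amplifies the noise by a factor of about 1e7, while the truncated one only loses the part of `x` lying in the discarded subspace, so its error cannot exceed the norm of `x`.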

2.4 Deterministic Regularization

Unlike truncation, regularization does not completely cut off singular values, but rather modifies them through a smooth trade-off operation.

The best-known method of regularization, owing to its simplicity and physically sound hypothesis, is the one generally referred to as Tikhonov regularization. Historically, the first idea was to try to obtain a solution with minimum energy while keeping the discrepancy with the original data low.

This type of trade-off can be put into an equation through the formalism of Lagrange multipliers, which gives the solution as the answer to the following minimization problem:

$$\min_x\left(\|y - Hx\|^2 + \lambda\|x\|^2\right)$$

The parameter $\lambda$ controls the trade-off between accuracy (data fit) and prior assumption (energy bound).

More complicated constraints could be imposed, such as bounding the energy not of the source data but of its derivative(s). From the physical point of view this makes sense: not only does it constrain the solution to be smooth, but it can also happen that this derivative corresponds to the physically meaningful quantity. For instance, if someone tries to characterize an electric field by measuring the electric potential, a bound on the energy of the signal is meaningless, since the electric potential is measured with respect to an arbitrary value. A bound on the first derivative of this signal, which is the electric field intensity, is a bound on the physical energy of the field ($\sim \|\vec{E}\|^2$), while a bound on the second derivative (Laplacian) is nothing else than a bound on the electric charge density ($\rho \sim \Delta V$).

In a more general sense we could imagine a bound on any linear transformation of the signal.

This yields the following solution:

$$\min_x\left(\|y - Hx\|^2 + \lambda\|Rx\|^2\right) \Rightarrow x = \frac{H^ty}{H^tH + \lambda R^tR} \tag{17}$$

$R$ denotes here a regularization matrix, which can take the form of the identity matrix (energy minimization), of the derivative operator (first derivative constraint), of the Laplacian operator (second derivative constraint), any combination of the previous, or any other meaningful linear operator.

The fraction sign denotes multiplication on the left by the inverse of the denominator, as in the proof below.

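Proof of (17):

$$\min_x\left(\|y - Hx\|^2 + \lambda\|Rx\|^2\right) \Rightarrow \frac{d\left(\|y - Hx\|^2 + \lambda\|Rx\|^2\right)}{dx} = 0 \Rightarrow$$

$$-2H^t(y - Hx) + 2\lambda R^tRx = 0 \Leftrightarrow (H^tH + \lambda R^tR)x = H^ty \Leftrightarrow x = (H^tH + \lambda R^tR)^{-1}H^ty$$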
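The regularized solution (17) can be sketched in a few lines. The function names `tikhonov` and `first_difference` and the random test data are hypothetical. The forward-difference matrix is only one possible discretization of the first derivative operator.

```python
import numpy as np

def tikhonov(H, y, lam, R=None):
    """Solve (H^t H + lam R^t R) x = H^t y, i.e. equation (17)."""
    if R is None:
        R = np.eye(H.shape[1])             # energy constraint, R = I
    return np.linalg.solve(H.T @ H + lam * R.T @ R, H.T @ y)

def first_difference(m):
    """(m-1) x m forward-difference matrix: (R x)_k = x_{k+1} - x_k."""
    return np.diff(np.eye(m), axis=0)

rng = np.random.default_rng(5)
H = rng.standard_normal((8, 5))            # hypothetical transfer matrix
y = rng.standard_normal(8)                 # hypothetical measurements

x_energy = tikhonov(H, y, lam=0.1)                          # energy bound
x_smooth = tikhonov(H, y, lam=0.1, R=first_difference(5))   # first derivative bound
```

With `lam=0` the function falls back to the least squares solution (10), and with `R = I` the norm of the solution decreases as `lam` grows.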

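A similar formula can be derived for the inversion in the frequency domain:

$$X(\omega) = \frac{H^*(\omega)Y(\omega)}{\|H(\omega)\|^2 + \lambda\|R(\omega)\|^2} \tag{18}$$

Considering that the $k$-th derivative operator corresponds, in the frequency domain, to a multiplication by $(i\omega)^k$:

$$\frac{\partial^k}{\partial r_x^k} \longleftrightarrow (i\omega)^k \;\Rightarrow\; X_k(\omega) = \frac{H^*(\omega)Y(\omega)}{\|H(\omega)\|^2 + \lambda\omega^{2k}} \tag{19}$$

which is the inversion obtained while imposing the $k$-th derivative constraint. A special case is obtained when $k = 0$, and that is the signal energy bound:

$$X(\omega) = \frac{H^*(\omega)Y(\omega)}{\|H(\omega)\|^2 + \lambda}$$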

One can now clearly see the role of regularization: it smooths out singularities by the simple operation of adding small positive values to the denominator. It is clear that if $\lambda$ is too big, the solution will deviate completely from the original. In the light of this formulation it is also obvious that derivative constraints work well only if the imaging system or the original data has a low-pass character, since they tend to suppress high frequencies, while the energy constraint on the data itself makes no such assumption, treating all frequencies alike.
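The behaviour of the regularized filter (19) can be examined numerically. This is a sketch with a hypothetical low-pass transfer function `H` and a hypothetical value of `lam`. The function name `regularized_inverse_filter` is introduced here for illustration only.

```python
import numpy as np

def regularized_inverse_filter(H, omega, lam, k=0):
    """Filter H* / (|H|^2 + lam * omega^(2k)) of (19); k = 0 is the energy bound."""
    return np.conj(H) / (np.abs(H) ** 2 + lam * omega ** (2 * k))

n = 64
omega = 2 * np.pi * np.fft.fftfreq(n)
H = 1.0 / (1.0 + (4.0 * omega) ** 2)    # hypothetical low-pass transfer function
lam = 1e-2

G_energy = regularized_inverse_filter(H, omega, lam, k=0)   # all frequencies alike
G_deriv = regularized_inverse_filter(H, omega, lam, k=1)    # penalizes high frequencies
```

For a real positive `H` the energy-bound filter can never exceed 1 / (2 sqrt(lam)), whereas the unregularized inverse 1 / H grows without limit as `H` approaches zero.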

It is possible to go even further in the generalization, and to modify the quadratic form of the functionals:

$$\min_x\left(\|y - Hx\|^2 + \lambda\Phi(x)\right) \tag{20}$$

The solution to this problem generally cannot be found in analytical form and is obtained through iterative methods (e.g. gradient descent). This type of regularization is called half-quadratic regularization, because of the first quadratic form. In order to ensure convergence, special care should be paid to the $\Phi$ function (monotonicity and convexity).
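A plain gradient descent on (20) can be sketched as follows. The penalty used here, $\Phi(x) = \sum_i \sqrt{x_i^2 + \varepsilon^2}$, is a hypothetical smooth convex choice made for illustration. The step size rule and the names `objective` and `gradient_descent` are also assumptions of this sketch rather than the method of the report.

```python
import numpy as np

def objective(H, y, x, lam, eps):
    """||y - Hx||^2 + lam * sum_i sqrt(x_i^2 + eps^2)."""
    return np.sum((y - H @ x) ** 2) + lam * np.sum(np.sqrt(x ** 2 + eps ** 2))

def gradient_descent(H, y, lam, eps=1e-2, n_iter=500):
    """Fixed-step gradient descent on the objective above, starting from zero."""
    # Step 1/L, where L bounds the Lipschitz constant of the gradient.
    step = 1.0 / (2.0 * np.linalg.norm(H, 2) ** 2 + lam / eps)
    x = np.zeros(H.shape[1])
    history = []
    for _ in range(n_iter):
        grad = -2.0 * H.T @ (y - H @ x) + lam * x / np.sqrt(x ** 2 + eps ** 2)
        x = x - step * grad
        history.append(objective(H, y, x, lam, eps))
    return x, history

rng = np.random.default_rng(6)
H = rng.standard_normal((12, 6))           # hypothetical transfer matrix
y = rng.standard_normal(12)                # hypothetical measurements
x_hat, history = gradient_descent(H, y, lam=0.5)
```

Because the penalty is convex and the step is at most the inverse of the Lipschitz constant, the objective values stored in `history` never increase.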

An "en vogue" form of half-quadratic regularization is the so-called "diffusion regularization", which attempts to minimize

$$\min_x\left(\|y - Hx\|^2 + \lambda\Phi(\|\nabla x\|)\right)$$

using an iterative method based on the Euler-Lagrange formalism.

Even more generally, one could attempt to minimize:

$$\min_x\left(\Phi_1(y - Hx) + \lambda\Phi_2(x)\right) \tag{21}$$

In the next section it will be shown that these same formulas can be obtained from a very different point of view, that of stochastic estimators.

2.5 Stochastical Methods

The stochastic estimators arise in a framework very different from that of deterministic regularization. The data and the measurements are considered to be the result of random processes instead of fixed deterministic values.

Thus the estimators do not compute a fixed value, but rather a probability distribution. The final decision is based on this probability distribution and on the cost of a bad choice: such an estimator picks the least risky value. The trade-off, which was previously tuned through the value of $\lambda$, is now inherent in the construction of the estimator, through the choice of the cost function.

In what follows the focus will be on the two most widely used stochastic estimators, the maximum likelihood and maximum a posteriori estimators.


2.5.1 ML : Maximum Likelihood Estimator

The maximum likelihood estimator, as its name indicates, attempts to pick a value for the source data such as to maximize the likelihood of the measured data. In mathematical terms :

$$\max_x\, p(y|x) \tag{22}$$

Considering equation (5), this can be rewritten as:

$$\max_x\, p_n(y - Hx) \tag{23}$$

where $p_n$ denotes the noise probability distribution. Generally, owing to the exponential form of the most common probability distributions, it is customary to maximize not the expression in (23) directly but its logarithm, the so-called log-likelihood.

This gives the first link with the deterministic regularization (21), which is equivalent to (23) iff:

$$\Phi_1 = -\log p_n; \qquad \Phi_2 = 0 \tag{24}$$

Moreover, the parallel can be continued, and it is straightforward to show that if the noise distribution is zero-mean Gaussian the ML estimator reduces to the least squares estimator (9):

$$p_n \sim N(0, \sigma_n^2) \Rightarrow \Phi_1(y - Hx) = -\log\left(\frac{1}{\sqrt{2\pi\sigma_n^2}}\,e^{-\frac{\|y - Hx\|^2}{2\sigma_n^2}}\right) = const + \frac{\|y - Hx\|^2}{2\sigma_n^2}$$

$$\min_x \Phi_1(y - Hx) = \min_x \|y - Hx\|^2$$

The price to pay for using this estimator lies in the difference between the assumed (prior) noise distribution and the real one. If the assumption made is erroneous, the error on the reconstruction is unpredictable.

It will be shown that the MAP estimator displays similar and more comprehensive links to the deterministic regularization.

2.5.2 MAP : Maximum a Posteriori Estimator

Like the ML estimator, the maximum a posteriori estimator picks a value based on the probability distribution, but it goes a step further and integrates information on the source data distribution. Instead of searching for a maximum of the likelihood function, it looks for the most probable source data that could have produced the measured data. In mathematical terms:

$$\max_x\, p(x|y) \tag{25}$$

Using Bayes' identity:

$$p(x|y) = \frac{p(y|x)\,p(x)}{p(y)} = \frac{p(y|x)\,p(x)}{\int_x p(y|x)\,p(x)}$$

Since the denominator is just a normalizing constant term for the optimisation problem, and using again (5), equation (25) can be rewritten as:

$$\max_x\left(p_n(y - Hx)\cdot p_x(x)\right) \tag{26}$$

Using again the natural logarithm, the equivalence with (21) is valid iff:

$$\Phi_1 = -\log p_n; \qquad \lambda\Phi_2 = -\log p_x \tag{27}$$

It is noteworthy that the regularization has lost one degree of freedom, the Lagrange multiplier, which is now imposed by the probability distributions.


As previously for the ML estimator, the parallel can be extended to prove that the MAP estimator reduces to the energy-constrained regularization with fixed $\lambda$ if both the source and the noise are Gaussian:

$$\left.\begin{array}{l}p_n \sim N(0, \sigma_n^2)\\ p_x \sim N(0, \sigma_x^2)\end{array}\right\} \Rightarrow \left\{\begin{array}{l}\Phi_1(y - Hx) = const + \dfrac{\|y - Hx\|^2}{2\sigma_n^2}\\[2mm] \Phi_2(x) = const + \dfrac{\|x\|^2}{2\sigma_x^2}\end{array}\right.$$

$$\min_x\left(\Phi_1(y - Hx) + \Phi_2(x)\right) = \min_x\left(\|y - Hx\|^2 + \frac{\sigma_n^2}{\sigma_x^2}\|x\|^2\right)$$

The imposed Lagrangian parameter is the ratio between the noise variance (energy) and the signal variance, a choice that seems perfectly justified (the solution is regularized more when the noise variance is larger):

$$\lambda = \frac{\sigma_n^2}{\sigma_x^2}$$

Under these assumptions, and considering the source to be an image and the measurements its noisy version ($H = I$), the MAP estimator yields the Wiener filter according to (17):

$$x = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_n^2}\,y \tag{28}$$
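The reduction of (17) to (28) can be verified on a toy example. The variances and the measurement vector below are hypothetical values chosen only for the check.

```python
import numpy as np

sigma_x2, sigma_n2 = 4.0, 1.0              # hypothetical signal and noise variances
lam = sigma_n2 / sigma_x2                  # Lagrangian parameter imposed by the priors
y = np.array([1.0, -2.0, 0.5])             # hypothetical noisy samples

H = np.eye(y.size)                         # measurements are the noisy source, H = I
# Regularized solution (17) with R = I.
x_map = np.linalg.solve(H.T @ H + lam * np.eye(y.size), H.T @ y)
# Wiener filter (28).
x_wiener = sigma_x2 / (sigma_x2 + sigma_n2) * y
```

With these values both expressions shrink every sample by the same factor, 4 / (4 + 1) = 0.8.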

As for the ML estimator, the price to pay is the sensitivity to the validity of the prior, which is now increased by the assumption on the source distribution. If one is not sure about the source information, it is better to assume nothing about the data, or equivalently to assume a uniform distribution, which reduces the MAP estimator to the ML estimator:

$$\min_x\left(\Phi_1(y - Hx) + \lambda\Phi_2(x)\right) = \min_x\Big(\Phi_1(y - Hx)\underbrace{-\log p_x}_{const}\Big) = \min_x \Phi_1(y - Hx)$$

2.6 Simulation

In order to illustrate the effect of singularities and regularization on real images, let us observe the effects of filtering and reconstructing the classical Lena image (Figure 2).

As can be seen, the image is completely destroyed if no regularization is performed, while too high a regularization parameter blurs the image.


Figure 2: Effects of regularization on image restoration


3 Inverse Problem. Resolutions

This section tries to answer the question of precision for a general restoration procedure. Two main factors have to be accounted for when attempting an estimate: the value of the estimate and the fidelity of the positioning of the reconstruction.

Recall that the general formulation of the inverse problem is (1):

$$y(r_y) = \int h(r_y, r_x)\cdot x(r_x) + n$$

In what follows, $r_y$ and $r_x$ will naturally be understood as the positions of the sensors and of the source respectively, while $x$ and $y$ will denote the amplitudes of the source and of the measurements.

In this framework, the first step is to investigate the imprecision on the restoration of the amplitudes when the source positions are considered to be known. This assumption reduces the problem to a linear one, formulated in (5). A powerful tool, the Cramér-Rao lower bound, allows the statistical estimation of this imprecision.

A more complicated problem is the evaluation of the spatial resolution, which is the power of an estimator to localize a source. Unfortunately the dependency of the measurements on the source positioning is non-linear. A model based on the interrelationship (mapping) of the source positions to a measurement space will be proposed, and it will be shown that this model can lead to a statistical estimation of the bias and variance of the reconstruction. Moreover, principles of optimality for the resolution cell size and for sensor placement will be derived. The link between this model and the Cramér-Rao bound will be investigated, and a simple path for deriving the spatial resolution from the Cramér-Rao bound applied to this model will be inferred.


3.1 The Cramér-Rao bound

Coming from the field of stochastic theory, the Cramér-Rao lower bound provides a measure of the imprecision of the estimation of unknown parameters from distorted measured data.

Let $x$ be a set of deterministic parameters that are observed through a set of measurements $y$, and $\varphi = g(x)$ a function of $x$. Then the Cramér-Rao theorem states that the lower bound for the variance of an unbiased estimator of $\varphi$ is:

$$Var(\hat\varphi) \ge \frac{\left(g'(x)\right)^2}{E\left[\left(\dfrac{\partial \ln p(y|x)}{\partial x}\right)^t\left(\dfrac{\partial \ln p(y|x)}{\partial x}\right)\right]} \tag{29}$$

Under the regularity condition

$$E\left[\frac{\partial \ln p(y|x)}{\partial x}\right] = 0$$

and considering that we are estimating $x$ directly, that is $g(x) = x$, equation (29) can be rewritten as:

$$Var(\hat x) \ge \frac{1}{E\left[-\dfrac{\partial^2 \ln p(y|x)}{\partial x^2}\right]} \tag{30}$$

The denominator of the above equation is known as the Fisher Information Matrix. Intuitively, one can see that if the information received ($\ln p(y|x)$) is very sensitive to variations of the parameter, then the parameter can be estimated very precisely. If, on the contrary, the information does not change with the parameter, it is impossible to estimate it accurately.

In what follows the focus will be on the derivation of this lower bound in conditions of noisy observations for our inverse problem.

The influence of additive noise, of modelling errors seen as multiplicative noise, and of source data statistics will be taken into account gradually. For simplicity of computation Gaussian distribution models will be preferred, since this does not affect the generality of the qualitative considerations.

3.1.1 Additive noise

The leading equation for our inverse problem in this simple case is (5):

y=Hx+n

$x$ is supposed to be deterministic and the transfer matrix $H$ perfectly known. As specified, the noise is zero-mean Gaussian, $n \sim N(0, R_{nn})$. With these assumptions and using (30):

$$Var(\hat x) \ge \left(H^tR_{nn}^{-1}H\right)^{-1} \tag{31}$$

This reduces, of course, to the simpler expression

$$Var(\hat x) \ge \frac{\sigma_n^2}{H^tH} \tag{32}$$

when the noise is i.i.d., i.e. $n_i \sim N(0, \sigma_n^2)$.

The above formulas explain, and quantify mathematically, the noise sensitivity of non-regularized inverse solutions observed in the previous section.
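As a numerical check, the covariance of the least squares estimate can be compared with the bound (32). This is a sketch with a hypothetical random transfer matrix and noise variance. It relies on the standard propagation of covariance through a linear map, $Cov(H^+y) = H^+ R_{nn} (H^+)^t$.

```python
import numpy as np

rng = np.random.default_rng(4)
H = rng.standard_normal((10, 4))           # hypothetical transfer matrix, N = 10, M = 4
sigma_n2 = 0.25                            # hypothetical i.i.d. noise variance

# Bound (32), with the fraction read as multiplication by the inverse.
crlb = sigma_n2 * np.linalg.inv(H.T @ H)

# Covariance of x_hat = H+ y when n ~ N(0, sigma_n2 I).
H_pinv = np.linalg.pinv(H)
cov_ls = H_pinv @ (sigma_n2 * np.eye(10)) @ H_pinv.T
```

The two matrices coincide, which indicates that under i.i.d. Gaussian noise the least squares estimate attains the bound, and that small eigenvalues of $H^tH$ translate directly into large variances.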

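Proof of (31):

$$Var(\hat x) \ge \left(E\left[-\frac{\partial^2 \ln p(y|x)}{\partial x^2}\right]\right)^{-1} = \left(E\left[-\frac{\partial^2 \ln p_n(y - Hx)}{\partial x^2}\right]\right)^{-1}$$

$$p_n = \frac{1}{(2\pi)^{N/2}\left(\det R_{nn}\right)^{1/2}}\,e^{-\frac{1}{2}n^tR_{nn}^{-1}n}$$

$$\Rightarrow Var(\hat x) \ge \left(E\left[\frac{\partial^2\left(\overbrace{\ln\left((2\pi)^{N/2}\left(\det R_{nn}\right)^{1/2}\right)}^{\text{constant, its derivative is 0}} + \frac{1}{2}(y - Hx)^tR_{nn}^{-1}(y - Hx)\right)}{\partial x^2}\right]\right)^{-1}$$

$$Var(\hat x) \ge \left(E\left[\frac{\partial\left(-H^tR_{nn}^{-1}(y - Hx)\right)}{\partial x}\right]\right)^{-1} = \Big(E\big[\overbrace{H^tR_{nn}^{-1}H}^{\text{constant}}\big]\Big)^{-1}$$

$$Var(\hat x) \ge \left(H^tR_{nn}^{-1}H\right)^{-1}$$

3.1.2 Modelling errors : multiplicative noise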

Let us make the degradation model slightly more complicated. First assume that, in addition to the additive noise, multiplicative noise degrades the data. This can be integrated in the equation as an error term on the transfer matrix $H$:

$$y = (H + \Delta H)x + n \tag{33}$$

For the sake of simplicity, suppose at first that the elements of the error matrix are i.i.d., $\Delta H_{ij} \sim N(0, \sigma_H^2)$, and that the noise is also i.i.d., $n_i \sim N(0, \sigma_n^2)$. It can be proven that in this simple case:

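$$Var(\hat x) \ge \frac{\left(\sigma_n^2 + \sigma_H^2x^tx\right)^2}{\left(\sigma_n^2 + \sigma_H^2x^tx\right)H^tH + 2N\sigma_H^4xx^t} \tag{34}$$

If the error matrix is small enough, a first order approximation can be performed, leading to the simple formula:

$$Var(\hat x) \ge \frac{\sigma_n^2 + \sigma_H^2x^tx}{H^tH} \tag{35}$$

Equation (34) can be rewritten in order to have a measure in terms of relative values:

$$Var(\hat x_{norm}) \ge \frac{\left(\sigma_{n_{norm}}^2 + \sigma_{H_{norm}}^2\right)^2}{\sigma_{n_{norm}}^2 + \sigma_{H_{norm}}^2 + 2\sigma_{H_{norm}}^4\dfrac{Nxx^t}{x^tx}} \tag{36}$$

with

$$\hat x_{norm} = \frac{\hat x}{x^tx}; \quad \sigma_{n_{norm}}^2 = \frac{\sigma_n^2}{H^tH\,x^tx}; \quad \sigma_{H_{norm}}^2 = \frac{\sigma_H^2}{H^tH}; \quad Tr\left(\frac{Nxx^t}{x^tx}\right) = Tr(I) = N$$

It can be noticed that the last term is an error distribution term, in phase with the energy distribution of the sources.

Of course equation (35) takes an even simpler form:

$$Var(\hat x_{norm}) \ge \sigma_{n_{norm}}^2 + \sigma_{H_{norm}}^2 \tag{37}$$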
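The bound (34) and its first order approximation (35) can be evaluated numerically. In this sketch the fractions are read as multiplication by the inverse of the denominator, and the function names, the transfer matrix and the source amplitudes are hypothetical.

```python
import numpy as np

def crlb_multiplicative(H, x, sigma_n2, sigma_H2):
    """Bound (34): s^2 (s H^t H + 2 N sigma_H^4 x x^t)^-1, s = sigma_H^2 x^t x + sigma_n^2."""
    n_meas = H.shape[0]
    s = sigma_H2 * (x @ x) + sigma_n2
    fisher = (s * H.T @ H + 2.0 * n_meas * sigma_H2 ** 2 * np.outer(x, x)) / s ** 2
    return np.linalg.inv(fisher)

def crlb_first_order(H, x, sigma_n2, sigma_H2):
    """First order approximation (35)."""
    return (sigma_n2 + sigma_H2 * (x @ x)) * np.linalg.inv(H.T @ H)

H = np.vstack([np.eye(3), np.eye(3)])      # hypothetical 6 x 3 transfer matrix
x = np.array([1.0, 0.5, -0.5])             # hypothetical source amplitudes

bound_exact = crlb_multiplicative(H, x, sigma_n2=0.1, sigma_H2=1e-4)
bound_approx = crlb_first_order(H, x, sigma_n2=0.1, sigma_H2=1e-4)
```

With `sigma_H2 = 0` the function reduces to the additive-noise bound (32), and for a small `sigma_H2` the exact and first order bounds are practically indistinguishable.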


It is obvious that this modelling error puts a bound on the variance of the reconstruction, even in the absence of noise, and that this bound is moreover independent of the energy of the signal. This means that however strong the signal is, it is impossible to recover it properly.

For the one-dimensional case, or equivalently for a linear shift-invariant operator in the Fourier domain, the bound (36) simplifies to the scalar expression:

$$Var(\hat x_{norm}) \ge \frac{\left(\sigma_{n_{norm}}^2 + \sigma_{H_{norm}}^2\right)^2}{\sigma_{n_{norm}}^2 + \sigma_{H_{norm}}^2 + 2\sigma_{H_{norm}}^4} \tag{38}$$

The values are now normalized with respect to $\|X(\omega)\|^2$ and $\|H(\omega)\|^2$.

An interesting discussion can be opened with regard to the behaviour of this bound. When the dominant noise is the additive noise, the bound will have the general behaviour of (32), but with a lower bound fixed by the multiplicative noise, as shown on the left side of Figure 3. When the dominant noise is the multiplicative noise, the variance will tend asymptotically to the value 0.5 (SNR ∼ 3 dB). This can be understood by considering that multiplicative noise, even with a huge variance, still carries information on the energy of the signal, since it is multiplied by it (see also 5.1 for the likelihood estimator for multiplicative noise). A typical plot can be seen on the right-hand side of Figure 3.

Figure 3: Variation of the bound with the noise variance

But what happens when both noise variances are high? It is clear that a trade-off will appear, since the additive noise pushes the bound to infinity, while the multiplicative noise ties it very closely to the 0.5 value. This can be seen in the Figure 4, where to the left the same plot as above, but for a higher additive noise variance, is displayed, while to the right is the plot of the minimum variance surface.

The valley to the right of the maximum is essentially determined by the multiplicative noise, while on the left hand of the maximum the additive noise is dominant. The point that separates the two can be easily found for a fixed σ_n² :

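$$\frac{\partial Var(\hat x_{norm})}{\partial \sigma_H^2} = 0 \Rightarrow \sigma_H^2 = \frac{\sigma_n^2}{4\sigma_n^2 - 1} \tag{39}$$

Of course, these stationary points (the maxima of the bound) appear only for $\sigma_n^2 \ge 0.25$. At $\sigma_n^2 = 0.25$ the corresponding $\sigma_H^2$ tends to infinity and the variance bound equals 0.5, as one can easily check.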
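The asymptote of (38) and the stationary point (39) can be checked with a few lines of code. The function names are hypothetical, and the arguments are the normalized variances of (38).

```python
import numpy as np

def bound_38(sigma_n2, sigma_H2):
    """Scalar bound (38) on the normalized variance."""
    total = sigma_n2 + sigma_H2
    return total ** 2 / (total + 2.0 * sigma_H2 ** 2)

def stationary_sigma_H2(sigma_n2):
    """Stationary point (39); only meaningful for sigma_n2 > 0.25."""
    return sigma_n2 / (4.0 * sigma_n2 - 1.0)

# Dominant multiplicative noise: the bound approaches 0.5.
asymptote = bound_38(0.01, 1e8)

# For sigma_n2 = 1 the stationary point is sigma_H2 = 1/3, where the bound is 8/7.
sigma_H2_star = stationary_sigma_H2(1.0)
peak = bound_38(1.0, sigma_H2_star)
```

For a normalized additive variance of 1 the bound therefore rises from 1 (no multiplicative noise) to a maximum of 8/7 before decreasing towards 0.5.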

Figure 4: Additive-multiplicative noise trade-off

But even if this bounding property of the multiplicative noise is very interesting, one has to note that a 3 dB SNR is insufficient for practical applications. Most of them will stay in the linear domain of the bound, plotted in Figure 5 for relative variances from 0 to 0.1 (∼10 dB SNR), where the form (37) is a valid approximation.

After these remarks on the bound on the variance of the estimate, let us go back to the fundamental formula (34) and prove it.

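Proof of (34):

As previously:

$$Var(\hat x) \ge \left(E\left[-\frac{\partial^2 \ln p(y|x)}{\partial x^2}\right]\right)^{-1} = \left(E\left[-\frac{\partial^2 \ln p_n(y - Hx)}{\partial x^2}\right]\right)^{-1}$$

But this time $p_n$ contains both additive and multiplicative noise:

$$\Delta H_{ij} \sim N(0, \sigma_H^2)\ \text{i.i.d.} \Rightarrow \Delta Hx \sim N(0, \sigma_H^2x^tx\,I); \qquad n \sim N(0, \sigma_n^2I)$$

$$\Downarrow$$

$$\Delta Hx + n \sim N\left(0, \left(\sigma_H^2x^tx + \sigma_n^2\right)I\right)$$

$$p_n(y - Hx) = \frac{1}{(2\pi)^{N/2}\left(\det\left(\left(\sigma_H^2x^tx + \sigma_n^2\right)I\right)\right)^{1/2}}\,e^{-\frac{1}{2}(y - Hx)^t\left(\left(\sigma_H^2x^tx + \sigma_n^2\right)I\right)^{-1}(y - Hx)}$$

$$p_n(y - Hx) = \frac{1}{(2\pi)^{N/2}\left(\sigma_H^2x^tx + \sigma_n^2\right)^{N/2}}\,e^{-\frac{1}{2}(y - Hx)^t\left(\left(\sigma_H^2x^tx + \sigma_n^2\right)I\right)^{-1}(y - Hx)}$$

From this point, the derivation proceeds in a relatively straightforward way:

$$Var(\hat x) \ge \left(E\left[-\frac{\partial^2 \ln p_n(y - Hx)}{\partial x^2}\right]\right)^{-1} = \left(E\left[\frac{\partial^2\left(\dfrac{N}{2}\ln\left(\sigma_H^2x^tx + \sigma_n^2\right) + \dfrac{1}{2}\dfrac{(y - Hx)^t(y - Hx)}{\sigma_H^2x^tx + \sigma_n^2}\right)}{\partial x^2}\right]\right)^{-1}$$

$$Var(\hat x) \ge \left(E\left[\frac{\partial\left(\overbrace{\dfrac{N\sigma_H^2x}{\sigma_H^2x^tx + \sigma_n^2}}^{\text{Expression 1}} + \overbrace{\dfrac{-H^t\left(\sigma_H^2x^tx + \sigma_n^2\right)(y - Hx) - \sigma_H^2x\,(y - Hx)^t(y - Hx)}{\left(\sigma_H^2x^tx + \sigma_n^2\right)^2}}^{\text{Expression 2}}\right)}{\partial x}\right]\right)^{-1}$$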


Figure 5: Linear domain of the Minimum Variance Surface


Now compute this expression term by term, writing $s = \sigma_H^2x^tx + \sigma_n^2$ for brevity.

Expression 1:

$$Expr1 = E\left[\frac{d}{dx}\left(\frac{N\sigma_H^2x}{s}\right)\right] = N\left(\frac{\sigma_H^2}{s} - \frac{2\sigma_H^4xx^t}{s^2}\right)$$

Expression 2 can be decomposed in two sub-parts, using the product rule for derivation.

Expression 2.1:

$$Expr2.1 = E\left[\frac{d\left(-H^ts\,(y - Hx) - \sigma_H^2x\,(y - Hx)^t(y - Hx)\right)}{dx}\,\frac{1}{s^2}\right] = \frac{1}{s^2}\,E\left[\frac{d\left(-H^ts\,(y - Hx) - \sigma_H^2x\,(y - Hx)^t(y - Hx)\right)}{dx}\right]$$

Applying the product derivation rule to the first half of the numerator and expanding the term $(y - Hx)^t$ in the second half:

$$Expr2.1 = \frac{1}{s^2}\Bigg(\underbrace{H^tH\,s}_{\text{expectancy of a constant}} + \underbrace{E\left[\frac{d\left(-H^ts\right)}{dx}(y - Hx)\right]}_{0} - E\left[\frac{d\left(\sigma_H^2xy^t(y - Hx)\right)}{dx}\right] + \underbrace{E\left[\frac{d\left(\sigma_H^2xx^tH^t\right)}{dx}(y - Hx)\right]}_{0} - \underbrace{\sigma_H^2xx^tH^tH}_{\text{expectancy of a constant}}\Bigg)$$

using the fact that the mean of $y$ is $Hx$, or $E(y - Hx) = 0$. Applying again the product rule to the remaining term:

$$Expr2.1 = \frac{1}{s^2}\left(H^tH\,s - \sigma_H^2xx^tH^tH + E\left[\sigma_H^2xy^tH\right] - E\left[\sigma_H^2y^t(y - Hx)\right]\right)$$

$$= \frac{1}{s^2}\Big(H^tH\,s \underbrace{-\,\sigma_H^2xx^tH^tH + \sigma_H^2x\,E\left[y^t\right]H}_{0} - E\left[\sigma_H^2y^t(y - Hx)\right]\Big)$$

$$= \frac{1}{s^2}\Big(H^tH\,s - \sigma_H^2E\big[(y - Hx)^t(y - Hx) + \underbrace{x^tH^t(y - Hx)}_{0\ \text{by expectancy}}\big]\Big)$$

$$= \frac{1}{s^2}\left(H^tH\,s - \sigma_H^2E\left[(\Delta Hx + n)^t(\Delta Hx + n)\right]\right)$$

Under the hypothesis of complete independence of the noise components $\Delta H_{ij}$ and $n_i$, and of their zero-mean characteristic:

$$E\left[(\Delta Hx + n)^t(\Delta Hx + n)\right] = E\left[\sum_{i=1}^{N}(\Delta Hx + n)_i^2\right] = N\left(\sigma_H^2x^tx + \sigma_n^2\right) = Ns$$

Therefore:

$$Expr2.1 = \frac{1}{s^2}\left(H^tH\,s - N\sigma_H^2\,s\right)$$

It can already be seen that:

$$Expr1 + Expr2.1 = \frac{H^tH}{s} - \frac{2N\sigma_H^4xx^t}{s^2}$$

due to the cancellation of the term in $\dfrac{N\sigma_H^2}{s}$. Let us now turn to Expression 2.2:

$$Expr2.2 = E\left[-\left(-H^ts - \sigma_H^2x\,(y - Hx)^t\right)(y - Hx)\,\frac{4\sigma_H^2x^t}{s^3}\right]$$

$$= E\Bigg[\Big(\underbrace{\left(H^ts - \sigma_H^2xx^tH^t\right)(y - Hx)}_{0\ \text{by expectancy}} + \sigma_H^2xy^t(y - Hx)\Big)\frac{4\sigma_H^2x^t}{s^3}\Bigg]$$

$$= \frac{4\sigma_H^4xx^t}{s^3}\,E\left[y^t(y - Hx)\right]$$

The remaining expectancy is computed as previously, for Expression 2.1, yielding:

$$Expr2.2 = \frac{4N\sigma_H^4xx^t}{s^2}$$

Finally:

$$Var(\hat x) \ge \left(Expr1 + Expr2.1 + Expr2.2\right)^{-1} = \left(\frac{H^tH\,s + 2N\sigma_H^4xx^t}{s^2}\right)^{-1}$$

$$Var(\hat x) \ge \frac{\left(\sigma_H^2x^tx + \sigma_n^2\right)^2}{H^tH\left(\sigma_H^2x^tx + \sigma_n^2\right) + 2N\sigma_H^4xx^t} \qquad \text{q.e.d.}$$

Physical model for multiplicative noise

This part concerns a special class of multiplicative noise, caused by the positioning of the sensors.

Suppose that the sensor placements are not precisely known. Under the hypothesis of small deviations from the normal positions it can be assumed that:

$$H(r_y + dr_y) = H(r_y) + \frac{\partial H}{\partial r_y}.dr_y \overset{notation}{=} H(r_y) + H_{r_y}.dr_y \tag{40}$$

In the above equation $dr_y$ stands for a vector containing the displacements of all sensors, and the point multiplication means line-by-line multiplication. Each element of the derivative matrix is the derivative of the corresponding cell with respect to the position of the sensor corresponding to its line:

$$\left(H_{r_y}\right)_{ij} = \frac{\partial H_{ij}}{\partial r_{y_i}}$$

By identification with (33), and with the simplifying hypothesis of i.i.d. $(dr_y)_i \sim N\left(0, \sigma_{r_y}^2\right)$:

$$\Delta H = H_{r_y}dr_y \Rightarrow \Delta Hx \sim N\left(0, \sigma_{r_y}^2x^tH_{r_y}^tH_{r_y}x\,I\right) \tag{41}$$

The computations can be carried out similarly to the previous proof to obtain:

$$Var(\hat x) \ge \frac{\left(\sigma_{r_y}^2x^tH_{r_y}^tH_{r_y}x + \sigma_n^2\right)^2}{H^tH\left(\sigma_{r_y}^2x^tH_{r_y}^tH_{r_y}x + \sigma_n^2\right) + 2N\sigma_{r_y}^4\left(H_{r_y}^tH_{r_y}\right)^2xx^t} \tag{42}$$

This formula links the physical displacements of the sensors to the variance of the amplitude of the restored signal. It shows how one can compute the minimum variance of the restoration based only on the knowledge of the imprecision of the sensor positioning. That is, for small displacements and data free of additive noise:

$$Var(\hat x) \ge \frac{\sigma_{r_y}^2x^tH_{r_y}^tH_{r_y}x}{H^tH} \tag{43}$$


Contents

1 Introduction

2 Inverse Problem. Solutions

2.1 General Formulation

2.2 Ideal case solution

2.3 Singularities

2.4 Deterministic Regularization

2.5 Stochastical Methods

2.6 Simulation

3 Inverse Problem. Resolutions

3.1 The Cramér-Rao bound