Inverse Problems

(1)

Gabriel Peyré

www.numerical-tours.com

Low Complexity Regularization of

Inverse Problems

Cours #2

Recovery Guarantees

(2)

Overview of the Course

• Course #1: Inverse Problems

• Course #2: Recovery Guarantees

• Course #3: Proximal Splitting Methods

(3)

Overview

• Low-complexity Regularization with Gauges

• Performance Guarantees

• Grid-free Regularization

(4)

Inverse Problem Regularization

observations y parameter

Estimator: x(y ) depends only on

Observations: y = x ₀ + w 2 R ^P .

(5)

Inverse Problem Regularization

observations y parameter

Example: variational methods

Estimator: x(y ) depends only on

x(y ) 2 argmin

x 2R

^N

1 2 || y x || ² + J (x)

Data fidelity Regularity

Observations: y = x ₀ + w 2 R ^P .

(6)

J (x ₀ )

Regularity of x ₀

Inverse Problem Regularization

observations y parameter

Example: variational methods

Estimator: x(y ) depends only on

x(y ) 2 argmin

x 2R

^N

1 2 || y x || ² + J (x)

Data fidelity Regularity Observations: y = x ₀ + w 2 R ^P .

Choice of : tradeoff

|| w ||

Noise level

(7)

J (x ₀ )

Regularity of x ₀

x

^?

2 argmin

x2R^Q,Kx=y

J (x)

Inverse Problem Regularization

observations y parameter

Example: variational methods

Estimator: x(y ) depends only on

x(y ) 2 argmin

x 2R

^N

1 2 || y x || ² + J (x)

Data fidelity Regularity Observations: y = x ₀ + w 2 R ^P .

No noise: 0 ⁺ , minimize Choice of : tradeoff

|| w ||

Noise level

(8)

J (x ₀ )

Regularity of x ₀

x

^?

2 argmin

x2R^Q,Kx=y

J (x)

This course: Performance analysis.

Fast computational scheme.

Inverse Problem Regularization

observations y parameter

Example: variational methods

Estimator: x(y ) depends only on

x(y ) 2 argmin

x 2R

^N

1 2 || y x || ² + J (x)

Data fidelity Regularity Observations: y = x ₀ + w 2 R ^P .

No noise: 0 ⁺ , minimize Choice of : tradeoff

|| w ||

Noise level

(9)

Coefficients x Image x

Union of Linear Models for Data Processing

Union of models: T 2 T linear spaces.

Synthesis sparsity:

T

(10)

Coefficients x Image x

Union of Linear Models for Data Processing

Union of models: T 2 T linear spaces.

Synthesis sparsity:

T

Structured

sparsity:

(11)

Coefficients x Image x

Union of Linear Models for Data Processing

D

Image x ^Gradient ^D

^⇤

^x Union of models: T 2 T linear spaces.

Synthesis sparsity:

T

Structured sparsity:

Analysis

sparsity:

(12)

Coefficients x Image x

Multi-spectral imaging:

x _i, _· = P r

j =1 A _i,j S _j, _·

Union of Linear Models for Data Processing

D

Image x ^Gradient ^D

^⇤

^x Union of models: T 2 T linear spaces.

Synthesis sparsity:

T

Structured sparsity:

Analysis

sparsity: Low-rank:

S _1, _· S _2, _· S _3, _·

Figure 3. The concept of hyperspectral imagery. Image measurements are made at many narrow contiguous wavelength bands, resulting in a complete spectrum for each pixel.

Hyperspectral Data

Most multispectral imagers (e.g., Landsat, SPOT, AVHRR) measure radiation reflected from a surface at a few wide, separated wavelength bands (Fig. 4). Most hyperspectral imagers (Table 1), on the other hand, measure reflected radiation at a series of narrow and contiguous wavelength bands. When we look at a spectrum for one pixel in a hyperspectral image, it looks very much like a spectrum that would be measured in a spectroscopy laboratory (Fig. 5). This type of detailed pixel spectrum can provide much more information about the surface than a multispectral pixel spectrum.

x

(13)

Gauge: J : R ^N ! R ⁺ ₈ _↵ ₂ _R ⁺ _{, J} _{(↵x) =} _↵J _(x)

Gauges for Union of Linear Models

Convex

(14)

x T

Gauge:

, Union of linear models (T ) _T _2T Piecewise regular ball

J : R ^N ! R ⁺ ₈ _↵ ₂ _R ⁺ _{, J} _{(↵x) =} _↵J _(x)

Gauges for Union of Linear Models

J (x) = || x || ¹ T = sparse

vectors

Convex

(15)

x T

Gauge:

, Union of linear models (T ) _T _2T Piecewise regular ball

J : R ^N ! R ⁺ ₈ _↵ ₂ _R ⁺ _{, J} _{(↵x) =} _↵J _(x)

Gauges for Union of Linear Models

J (x) = || x || ¹ x ⁰

T ⁰

T = sparse vectors

Convex

(16)

x T

Gauge:

, Union of linear models (T ) _T _2T Piecewise regular ball

J : R ^N ! R ⁺ ₈ _↵ ₂ _R ⁺ _{, J} _{(↵x) =} _↵J _(x)

Gauges for Union of Linear Models

J (x) = || x || ¹ x ⁰

T ⁰

T = sparse vectors

| x

₁

| + || x

_2,3

||

x ⁰ T x

T ⁰

T = block vectors sparse

Convex

(17)

x T

Gauge:

, Union of linear models (T ) _T _2T Piecewise regular ball

J : R ^N ! R ⁺ ₈ _↵ ₂ _R ⁺ _{, J} _{(↵x) =} _↵J _(x)

Gauges for Union of Linear Models

J (x) = || x || ¹

T = low-rank matrices

J (x) = || x ||

⇤

x x ⁰

T ⁰

T = sparse vectors

| x

₁

| + || x

_2,3

||

x ⁰ T x

T ⁰

T = block vectors sparse

Convex

(18)

x T

Gauge:

, Union of linear models (T ) _T _2T Piecewise regular ball

J : R ^N ! R ⁺ ₈ _↵ ₂ _R ⁺ _{, J} _{(↵x) =} _↵J _(x)

Gauges for Union of Linear Models

J (x) = || x || ¹

T = low-rank matrices

J (x) = || x ||

⇤

x x ⁰

T ⁰

T = anti- sparse vectors

J (x) = || x ||

1

x x ⁰

T ⁰

T = sparse vectors

| x

₁

| + || x

_2,3

||

x ⁰ T x

T ⁰

T = block vectors sparse

Convex

(19)

@ J (x) = { ⌘ \ 8 y, J (y ) > J (x)+ h ⌘ , y x i}

Subdifferentials and Models

| x |

(20)

I = supp(x) = { i \ x _i 6 = 0 }

@ J (x) = { ⌘ \ 8 y, J (y ) > J (x)+ h ⌘ , y x i}

Subdifferentials and Models

Example: J (x) = || x || ¹

@ || x || ¹ =

⇢

⌘ \ supp(⌘ ) = I,

8 j / 2 I, | ⌘ _j | 6 1

x

@ J (x) 0

| x |

(21)

Definition:

I = supp(x) = { i \ x _i 6 = 0 }

T _x = VectHull(@ J (x)) ^?

@ J (x) = { ⌘ \ 8 y, J (y ) > J (x)+ h ⌘ , y x i}

Subdifferentials and Models

T _x = { ⌘ \ supp(⌘ ) = I } Example: J (x) = || x || ¹

@ || x || ¹ =

⇢

⌘ \ supp(⌘ ) = I,

8 j / 2 I, | ⌘ _j | 6 1

T _x

x

@ J (x) 0

| x |

(22)

Definition:

I = supp(x) = { i \ x _i 6 = 0 }

T _x = VectHull(@ J (x)) ^?

⌘ 2 @ J (x) = ) Proj _T

_x

(⌘ ) = e _x

@ J (x) = { ⌘ \ 8 y, J (y ) > J (x)+ h ⌘ , y x i}

Subdifferentials and Models

e _x = sign(x)

T _x = { ⌘ \ supp(⌘ ) = I } Example: J (x) = || x || ¹

@ || x || ¹ =

⇢

⌘ \ supp(⌘ ) = I,

8 j / 2 I, | ⌘ _j | 6 1

T _x

x

@ J (x)

0 e

_x

| x |

(23)

Examples

` ¹ sparsity: J (x) = || x || ¹

e _x = sign(x) T _x = { z \ supp(z ) ⇢ supp(x) }

x ⁰

x ^@ ^J ^(x)

(24)

Examples

x ⁰ x ^@ ^J ^(x)

` ¹ sparsity: J (x) = || x || ¹

e _x = sign(x) T _x = { z \ supp(z ) ⇢ supp(x) } e _x = ( N (x _b )) _b _2B

N (a) = a/ || a ||

Structured sparsity: J (x) = P

b || x _b ||

T _x = { z \ supp(z ) ⇢ supp(x) }

x ⁰

x ^@ ^J ^(x)

(25)

Examples

x ⁰ x ^@ ^J ^(x)

` ¹ sparsity: J (x) = || x || ¹

e _x = sign(x) T _x = { z \ supp(z ) ⇢ supp(x) } e _x = ( N (x _b )) _b _2B

N (a) = a/ || a ||

Structured sparsity: J (x) = P

b || x _b ||

T _x = { z \ supp(z ) ⇢ supp(x) }

x

@ J (x)

x ⁰ x ^@ ^J ^(x) e _x = U V ^⇤

Nuclear norm: J (x) = || x || _⇤ _SVD: x = U ⇤V ^⇤

T _x = U A + BV ^⇤ \ (A, B ) 2 ( R ⁿ ^⇥ ⁿ ) ²

(26)

Examples

x ⁰ x ^@ ^J ^(x)

x _x ⁰

@ J (x)

I = { i \ | x _i | = || x || ₁ } Anti-sparsity: J (x) = || x || 1

T

_x

= { y \ y

_I

/ sign(x

_I

) }

e _x = | I | ¹ sign(x)

` ¹ sparsity: J (x) = || x || ¹

e _x = sign(x) T _x = { z \ supp(z ) ⇢ supp(x) } e _x = ( N (x _b )) _b _2B

N (a) = a/ || a ||

Structured sparsity: J (x) = P

b || x _b ||

T _x = { z \ supp(z ) ⇢ supp(x) }

x

@ J (x)

x ⁰ x ^@ ^J ^(x) e _x = U V ^⇤

Nuclear norm: J (x) = || x || _⇤ _SVD: x = U ⇤V ^⇤

T _x = U A + BV ^⇤ \ (A, B ) 2 ( R ⁿ ^⇥ ⁿ ) ²

(27)

Overview

• Low-complexity Regularization with Gauges

• Performance Guarantees

• Grid-free Regularization

(28)

Noiseless recovery: ^min

x= x

₀

J (x) ( P ⁰ )

x = x

₀

Dual Certificates

x ^?

(29)

Noiseless recovery: ^min

x= x

₀

J (x) ( P ⁰ )

Dual certificates:

x = x

₀

Proposition:

D (x ₀ ) = Im( ^⇤ ) \ @ J (x ₀ ) x ₀ solution of ( P ⁰ ) () 9 ⌘ 2 D (x ₀ )

Dual Certificates

⌘ @ J (x ₀ )

x ^?

(30)

Noiseless recovery: ^min

x= x

₀

J (x) ( P ⁰ )

Dual certificates:

x = x

₀

Proposition:

D (x ₀ ) = Im( ^⇤ ) \ @ J (x ₀ ) x ₀ solution of ( P ⁰ ) () 9 ⌘ 2 D (x ₀ )

Proof: ⁽ _P 0 ) () min

2 ker( ) J (x ₀ + )

8 (⌘ , ) 2 @ J (x ₀ ) ⇥ ker( ), J (x ₀ + ) > J (x ₀ ) + h , ⌘ i

⌘ 2 Im( ^⇤ ) = ) h , ⌘ i = 0 = ) x ₀ solution.

x ₀ solution = ) 8 , h , ⌘ i 6 0 = ) ⌘ 2 ker( ) ^? .

Dual Certificates

⌘ @ J (x ₀ )

x ^?

(31)

= interior for the topology of a↵(E ) ri(E ) = relative interior of E

Dual Certificates and L2 Stability

x = x

₀

⌘ @ J (x ₀ ) x ^?

Tight dual certificates:

D ¯ (x ₀ ) = Im( ^⇤ ) \ ri(@ J (x ₀ ))

(32)

= interior for the topology of a↵(E ) ri(E ) = relative interior of E

Dual Certificates and L2 Stability

Theorem: [Fadili et al. 2013]

If 9 ⌘ 2 D ¯ (x ₀ ), for ⇠ || w || one has || x ^? x ₀ || = O( || w || )

x = x

₀

⌘ @ J (x ₀ ) x ^?

Tight dual certificates:

D ¯ (x ₀ ) = Im( ^⇤ ) \ ri(@ J (x ₀ ))

(33)

= interior for the topology of a↵(E ) ri(E ) = relative interior of E

Dual Certificates and L2 Stability

[Grassmair 2012]: J (x ^? x ₀ ) = O( || w || ).

[Grassmair, Haltmeier, Scherzer 2010]: J = || · || ¹ .

Theorem: [Fadili et al. 2013]

If 9 ⌘ 2 D ¯ (x ₀ ), for ⇠ || w || one has || x ^? x ₀ || = O( || w || )

x = x

₀

⌘ @ J (x ₀ ) x ^?

Tight dual certificates:

D ¯ (x ₀ ) = Im( ^⇤ ) \ ri(@ J (x ₀ ))

(34)

= interior for the topology of a↵(E ) ri(E ) = relative interior of E

Dual Certificates and L2 Stability

[Grassmair 2012]: J (x ^? x ₀ ) = O( || w || ).

[Grassmair, Haltmeier, Scherzer 2010]: J = || · || ¹ .

! The constants depend on N . . .

Theorem: [Fadili et al. 2013]

If 9 ⌘ 2 D ¯ (x ₀ ), for ⇠ || w || one has || x ^? x ₀ || = O( || w || )

x = x

₀

⌘ @ J (x ₀ ) x ^?

Tight dual certificates:

D ¯ (x ₀ ) = Im( ^⇤ ) \ ri(@ J (x ₀ ))

(35)

Random matrix: 2 R ^P ^⇥ ^N , _i,j ⇠ N (0, 1), i.i.d.

Compressed Sensing Setting

(36)

Theorem:

Random matrix: _i,j ⇠ N (0, 1), i.i.d.

Sparse vectors: J = || · || ¹ . Let s = || x ₀ || ⁰ . If

Then 9 ⌘ 2 D ¯ (x ₀ ) with high probability on . 2 R ^P ^⇥ ^N ,

[Chandrasekaran et al. 2011]

P > 2s log (N/s)

[Rudelson, Vershynin 2006]

Compressed Sensing Setting

(37)

Theorem:

Then 9 ⌘ 2 D ¯ (x ₀ ) with high probability on . Theorem:

Random matrix: _i,j ⇠ N (0, 1), i.i.d.

Sparse vectors: J = || · || ¹ .

Low-rank matrices: J = || · || ⇤ . Let s = || x ₀ || ⁰ . If

Let r = rank(x ₀ ). If

Then 9 ⌘ 2 D ¯ (x ₀ ) with high probability on . 2 R ^P ^⇥ ^N ,

[Chandrasekaran et al. 2011]

P > 2s log (N/s)

P > 3r (N ₁ + N ₂ r ) x ₀ 2 R ^N

¹

^⇥ ^N

²

[Rudelson, Vershynin 2006]

Compressed Sensing Setting

[Chandrasekaran et al. 2011]

(38)

Theorem:

Then 9 ⌘ 2 D ¯ (x ₀ ) with high probability on . Theorem:

Random matrix: _i,j ⇠ N (0, 1), i.i.d.

Sparse vectors: J = || · || ¹ .

Low-rank matrices: J = || · || ⇤ . Let s = || x ₀ || ⁰ . If

Let r = rank(x ₀ ). If

Then 9 ⌘ 2 D ¯ (x ₀ ) with high probability on . 2 R ^P ^⇥ ^N ,

! Similar results for || · || ^1,2 , || · || 1 .

[Chandrasekaran et al. 2011]

P > 2s log (N/s)

P > 3r (N ₁ + N ₂ r ) x ₀ 2 R ^N

¹

^⇥ ^N

²

[Rudelson, Vershynin 2006]

Compressed Sensing Setting

[Chandrasekaran et al. 2011]

(39)

Phase Transitions

P/ N

0 1

s/N r/ p

N

J = || · || ¹ J = || · || ⇤

THE GEOMETRY OF PHASE TRANSITIONS IN CONVEX OPTIMIZATION 7

0 25 50 75 100

0 10 20 30

0 300 600 900

F^IGURE 2.2: Phase transitions for linear inverse problems. [left] Recovery of sparse vectors. The empirical probability that the `₁ minimization problem (2.6) identifies a sparse vector x₀ 2R¹⁰⁰ given random linear measurements z₀ = Ax₀. [right] Recovery of low-rank matrices. The empirical probability that the S₁ minimization problem (2.7) identifies a low-rank matrix X₀ 2 R³⁰^£³⁰ given random linear measurements z₀ =A ^(X0). In each panel, the colormap indicates the empirical probability of success (black = 0%; white = 100%). The yellow curve marks the theoretical prediction of the phase transition from Theorem II; the red curve

traces the empirical phase transition.

fixed dimensions. This calculation gives the exact (asymptotic) location of the phase transition for the S₁ minimization problem (2.7) with random measurements.

To underscore these achievements, we have performed some computer experiments to compare the theoretical and empirical phase transitions. Figure 2.2[left] shows the performance of (2.6) for identifying a sparse vector in R¹⁰⁰; Figure 2.2[right] shows the performance of (2.7) for identifying a low-rank matrix in ^R³⁰^£³⁰. In each case, the colormap indicates the empirical probability of success over the randomness in the measurement operator. The empirical 5%, 50%, and 95% success isoclines are determined from the data. We also draft the theoretical phase transition curve, promised by Theorem II, where the number m of measurements equals the statistical dimension of the appropriate descent cone, which we compute using the formulas from Sections 4.5 and 4.6. See Appendix A for the experimental protocol.

In both examples, the theoretical prediction of Theorem II coincides almost perfectly with the 50% success isocline. Furthermore, the phase transition takes place over a range of O(p

d) values of m, as promised.

Although Theorem II does not explain why the transition region tapers at the bottom-left and top-right corners of each plot, we have established a more detailed version of Theorem I that allows us to predict this phenomenon as well. See the discussion after Theorem 7.1 for more information.

2.4. Demixing problems. In a demixing problem [MT12], we observe a superposition of two structured vectors, and we aim to extract the two constituents from the mixture. More precisely, suppose that we measure a vector z₀ 2R^d of the form

z₀ = x₀+U y₀ (2.8)

where x₀,y₀ 2R^d are unknown and U 2R^d^£^d is a known orthogonal matrix. If we wish to identify the pair (x₀,y₀), we must assume that each component is structured to reduce the number of degrees of freedom.

In addition, if the two types of structure are coherent (i.e., aligned with each other), it may be impossible to disentangle them, so it is expedient to include the matrixU to model the relative orientation of the two constituent signals.

To solve the demixing problem (2.8), we describe a convex programming technique proposed in [MT12].

Suppose that f and g are proper convex functions on R^d that promote the structures we expect to find in x₀

THE GEOMETRY OF PHASE TRANSITIONS IN CONVEX OPTIMIZATION 7

0 25 50 75 100

0 10 20 30

0 300 600 900

FIGURE 2.2: Phase transitions for linear inverse problems. [left] Recovery of sparse vectors. The empirical probability that the `₁ minimization problem (2.6) identifies a sparse vector x₀ 2R¹⁰⁰ given random linear measurements z₀ = Ax₀. [right] Recovery of low-rank matrices. The empirical probability that the S₁ minimization problem (2.7) identifies a low-rank matrix X₀ 2 R³⁰^£³⁰ given random linear measurements z₀=A ^(X0). In each panel, the colormap indicates the empirical probability of success (black = 0%; white = 100%). The yellow curve marks the theoretical prediction of the phase transition from Theorem II; the red curve

traces the empirical phase transition.

fixed dimensions. This calculation gives the exact (asymptotic) location of the phase transition for the S₁ minimization problem (2.7) with random measurements.

To underscore these achievements, we have performed some computer experiments to compare the theoretical and empirical phase transitions. Figure 2.2[left] shows the performance of (2.6) for identifying a sparse vector in R¹⁰⁰; Figure 2.2[right] shows the performance of (2.7) for identifying a low-rank matrix in ^R³⁰^£³⁰. In each case, the colormap indicates the empirical probability of success over the randomness in the measurement operator. The empirical 5%, 50%, and 95% success isoclines are determined from the data. We also draft the theoretical phase transition curve, promised by Theorem II, where the number m of measurements equals the statistical dimension of the appropriate descent cone, which we compute using the formulas from Sections 4.5 and 4.6. See Appendix A for the experimental protocol.

In both examples, the theoretical prediction of Theorem II coincides almost perfectly with the 50% success isocline. Furthermore, the phase transition takes place over a range of O(p

d) values of m, as promised.

Although Theorem II does not explain why the transition region tapers at the bottom-left and top-right corners of each plot, we have established a more detailed version of Theorem I that allows us to predict this phenomenon as well. See the discussion after Theorem 7.1 for more information.

2.4. Demixing problems. In a demixing problem [MT12], we observe a superposition of two structured vectors, and we aim to extract the two constituents from the mixture. More precisely, suppose that we measure a vector z₀ 2R^d of the form

z₀ = x₀+U y₀ (2.8)

where x₀,y₀ 2 R^d are unknown and U 2R^d^£^d is a known orthogonal matrix. If we wish to identify the pair (x₀,y₀), we must assume that each component is structured to reduce the number of degrees of freedom.

In addition, if the two types of structure are coherent (i.e., aligned with each other), it may be impossible to disentangle them, so it is expedient to include the matrixU to model the relative orientation of the two constituent signals.

To solve the demixing problem (2.8), we describe a convex programming technique proposed in [MT12].

Suppose that f and g are proper convex functions on R^d that promote the structures we expect to find in x₀

1 0 1

1 P/ N

From [Amelunxen et al. 20013]

(40)

⌘ 2 D (x ₀ ) = )

⇢ ⌘ = ^⇤ q

Proj _T (⌘ ) = e

Minimal-norm Certificate

T = T

_x₀

e = e

_x₀

(41)

⌘ 2 D (x ₀ ) = )

⇢ ⌘ = ^⇤ q

Proj _T (⌘ ) = e

⌘ ₀ = argmin

⌘=

^⇤

q,⌘

_T

=e || q ||

Minimal-norm Certificate

Minimal-norm pre-certificate:

T = T

_x₀

e = e

_x₀

(42)

⌘ 2 D (x ₀ ) = )

⇢ ⌘ = ^⇤ q

Proj _T (⌘ ) = e

⌘ ₀ = argmin

⌘=

^⇤

q,⌘

_T

=e || q ||

T = Proj _T

Minimal-norm Certificate

Proposition: One has

Minimal-norm pre-certificate:

T = T

_x₀

e = e

_x₀

⌘ ₀ = ( ⁺ _T ) ^⇤ e

(43)

⌘ 2 D (x ₀ ) = )

⇢ ⌘ = ^⇤ q

Proj _T (⌘ ) = e

⌘ ₀ = argmin

⌘=

^⇤

q,⌘

_T

=e || q ||

T = Proj _T

Minimal-norm Certificate

Proposition:

Theorem:

the unique solution x ^? of P (y ) for y = x ₀ + w satisfies T _x

^?

= T _x

₀

and || x ^? x ₀ || = O( || w || ) [Vaiter et al. 2013]

One has

Minimal-norm pre-certificate:

T = T

_x₀

e = e

_x₀

⌘ ₀ = ( ⁺ _T ) ^⇤ e

If ⌘ ₀ 2 D ¯ (x ₀ ) and ⇠ || w || ,

(44)

[Fuchs 2004]: J = || · || ¹ .

[Bach 2008]: J = || · || ^1,2 and J = || · || ⇤ .

[Vaiter et al. 2011]: J = || D ^⇤ · || ¹ .

⌘ 2 D (x ₀ ) = )

⇢ ⌘ = ^⇤ q

Proj _T (⌘ ) = e

⌘ ₀ = argmin

⌘=

^⇤

q,⌘

_T

=e || q ||

T = Proj _T

Minimal-norm Certificate

Proposition:

Theorem:

the unique solution x ^? of P (y ) for y = x ₀ + w satisfies T _x

^?

= T _x

₀

and || x ^? x ₀ || = O( || w || ) [Vaiter et al. 2013]

One has

Minimal-norm pre-certificate:

T = T

_x₀

e = e

_x₀

⌘ ₀ = ( ⁺ _T ) ^⇤ e

If ⌘ ₀ 2 D ¯ (x ₀ ) and ⇠ || w || ,

(45)

Compressed Sensing Setting

Theorem:

Random matrix: _i,j ⇠ N (0, 1), i.i.d.

Sparse vectors: J = || · || ¹ . Let s = || x ₀ || ⁰ . If

2 R ^P ^⇥ ^N ,

Then ⌘ ₀ 2 D ¯ (x ₀ ) with high probability on .

[Dossal et al. 2011]

[Wainwright 2009]

P > 2s log(N )

(46)

P ⇠ 2s log(N ) P ⇠ 2s log(N/s)

L ² stability Model stability Phase

transitions: vs.

Compressed Sensing Setting

Theorem:

Random matrix: _i,j ⇠ N (0, 1), i.i.d.

Sparse vectors: J = || · || ¹ . Let s = || x ₀ || ⁰ . If

2 R ^P ^⇥ ^N ,

Then ⌘ ₀ 2 D ¯ (x ₀ ) with high probability on .

[Dossal et al. 2011]

[Wainwright 2009]

P > 2s log(N )

(47)

! Similar results for || · || ^1,2 , || · || ⇤ , || · || 1 .

P ⇠ 2s log(N ) P ⇠ 2s log(N/s)

L ² stability Model stability Phase

transitions: vs.

Compressed Sensing Setting

Theorem:

Random matrix: _i,j ⇠ N (0, 1), i.i.d.

Sparse vectors: J = || · || ¹ . Let s = || x ₀ || ⁰ . If

2 R ^P ^⇥ ^N ,

Then ⌘ ₀ 2 D ¯ (x ₀ ) with high probability on .

[Dossal et al. 2011]

[Wainwright 2009]

P > 2s log(N )

(48)

! Similar results for || · || ^1,2 , || · || ⇤ , || · || 1 .

! Not using RIP technics (non-uniform result on x ₀ ).

P ⇠ 2s log(N ) P ⇠ 2s log(N/s)

L ² stability Model stability Phase

transitions: vs.

Compressed Sensing Setting

Theorem:

Random matrix: _i,j ⇠ N (0, 1), i.i.d.

Sparse vectors: J = || · || ¹ . Let s = || x ₀ || ⁰ . If

2 R ^P ^⇥ ^N ,

Then ⌘ ₀ 2 D ¯ (x ₀ ) with high probability on .

[Dossal et al. 2011]

[Wainwright 2009]

P > 2s log(N )

(49)

⇥x =

i

x _i ( · i)

Increasing :

reduces correlation.

reduces resolution.

J (x) = || x || ¹

1-D Sparse Spikes Deconvolution

x ₀

(50)

⇥x =

i

x _i ( · i)

Increasing :

reduces correlation.

reduces resolution.

0 10

2 support recovery.

J (x) = || x || ¹

()

|| ⌘ _0,I

^c

|| 1 < 1

⌘ ₀ 2 D ¯ (x ₀ )

I = { j \ x ₀ (j ) 6 = 0 }

|| ⌘ _0,I

^c

|| 1

1-D Sparse Spikes Deconvolution

x ₀ x ₀

20 1

()

(51)

Overview

• Low-complexity Regularization with Gauges

• Performance Guarantees

• Grid-free Regularization

(52)

1 When N ! + 1 , support is not stable: N

|| ⌘ _0,I

^c

|| 1 !

N ! + 1 c > 1.

1 c

|| ⌘ _0,I

^c

|| 1

Stable Unstable

Support Instability and Measures

(53)

1 When N ! + 1 , support is not stable: N

|| ⌘ _0,I

^c

|| 1 !

N ! + 1 c > 1.

Intuition: spikes wants to move laterally.

1 c

|| ⌘ _0,I

^c

|| 1

Stable Unstable

! Use Radon measures m 2 M ( T ), T = R / Z .

Support Instability and Measures

(54)

1 When N ! + 1 , support is not stable: N

|| ⌘ _0,I

^c

|| 1 !

N ! + 1 c > 1.

Intuition: spikes wants to move laterally.

1 c

|| ⌘ _0,I

^c

|| 1

Stable Unstable

Extension of ` ¹ : total variation

Discrete measure: m _x,a = P

i a _{i x}

_i

. One has || m _x,a || ^TV = || a || ¹

! Use Radon measures m 2 M ( T ), T = R / Z .

|| m || ^TV = sup

|| g ||

1

6 1

Z

T g(x) dm(x)

Support Instability and Measures

(55)

Sparse Measure Regularization

Measurements: y = (m ₀ ) + w where

8 <

:

m ₀ 2 M ( T ),

: M ( T ) ! L ² ( T ),

w 2 L ² ( T ).

(56)

Acquisition operator:

(m)(x) = Z

T '(x, x ⁰ )dm(x ⁰ ) where ' 2 C ² ( T ⇥ T )

Sparse Measure Regularization

Measurements: y = (m ₀ ) + w where

8 <

:

m ₀ 2 M ( T ),

: M ( T ) ! L ² ( T ),

w 2 L ² ( T ).

(57)

Acquisition operator:

Total-variation over measures regularization:

m 2M min ( T )

1 2 || (m) y || ² + || m || ^TV (m)(x) =

Z

T '(x, x ⁰ )dm(x ⁰ ) where ' 2 C ² ( T ⇥ T )

Sparse Measure Regularization

Measurements: y = (m ₀ ) + w where

8 <

:

m ₀ 2 M ( T ),

: M ( T ) ! L ² ( T ),

w 2 L ² ( T ).

(58)

Acquisition operator:

Total-variation over measures regularization:

m 2M min ( T )

1 2 || (m) y || ² + || m || ^TV

! Infinite dimensional convex program.

! If dim(Im( )) < + 1 , dual is finite dimensional.

! If is a filtering, re-cast dual as SDP program.

(m)(x) = Z

T '(x, x ⁰ )dm(x ⁰ ) where ' 2 C ² ( T ⇥ T )

Sparse Measure Regularization

Measurements: y = (m ₀ ) + w where

8 <

:

m ₀ 2 M ( T ),

: M ( T ) ! L ² ( T ),

w 2 L ² ( T ).

(59)

Measures: min

m 2M 1

2 || m y || ² + || m || ^TV

1 +1

Fuchs vs. Vanishing Pre-Certificates

(60)

a min 2R

^N

1 2 || ^z a y || ² + || a || ¹ Measures: min

m 2M 1

2 || m y || ² + || m || ^TV

On a grid z :

1 +1

Fuchs vs. Vanishing Pre-Certificates

z _i

(61)

a min 2R

^N

1 2 || ^z a y || ² + || a || ¹ Measures: min

m 2M 1

2 || m y || ² + || m || ^TV On a grid z :

⌘ _F = ^{⇤ ⇤} _I ^,+ sign(a _0,I )

1 +1

For m ₀ = m _z,a

₀

, supp(m ₀ ) = x ₀ , supp(a ₀ ) = I :

Fuchs vs. Vanishing Pre-Certificates

z _i

⌘ _F

(62)

a min 2R

^N

1 2 || ^z a y || ² + || a || ¹ Measures: min

m 2M 1

2 || m y || ² + || m || ^TV On a grid z :

⌘ _F = ^{⇤ ⇤} _I ^,+ sign(a _0,I ) ^⌘ ^V ⁼ ^⇤ ^+, _x

₀

^⇤ ^sign(a ⁰ ^), ⁰ ^⇤

1 +1

For m ₀ = m _z,a

₀

, supp(m ₀ ) = x ₀ , supp(a ₀ ) = I :

where

_x

(a, b) = P

i

a

_i

'( · , x

_i

) + b

_i

'

⁰

( · , x

_i

)

Fuchs vs. Vanishing Pre-Certificates

⌘ _V z _i

⌘ _F

(63)

(holds for || w || small enough and ⇠ || w || )

a min 2R

^N

1 2 || ^z a y || ² + || a || ¹ Measures: min

m 2M 1

2 || m y || ² + || m || ^TV On a grid z :

⌘ _F = ^{⇤ ⇤} _I ^,+ sign(a _0,I ) ^⌘ ^V ⁼ ^⇤ ^+, _x

₀

^⇤ ^sign(a ⁰ ^), ⁰ ^⇤

1 +1

For m ₀ = m _z,a

₀

, supp(m ₀ ) = x ₀ , supp(a ₀ ) = I :

where

_x

(a, b) = P

i

a

_i

'( · , x

_i

) + b

_i

'

⁰

( · , x

_i

)

Fuchs vs. Vanishing Pre-Certificates

If 8 j / 2 I, | ⌘ _F (x _j ) | < 1, then supp(a ) = supp(a ₀ )

Theorem: [Fuchs 2004]

⌘ _V z _i

⌘ _F

(64)

(holds for || w || small enough and ⇠ || w || )

a min 2R

^N

1 2 || ^z a y || ² + || a || ¹ Measures: min

m 2M 1

2 || m y || ² + || m || ^TV On a grid z :

⌘ _F = ^{⇤ ⇤} _I ^,+ sign(a _0,I ) ^⌘ ^V ⁼ ^⇤ ^+, _x

₀

^⇤ ^sign(a ⁰ ^), ⁰ ^⇤

1 +1

For m ₀ = m _z,a

₀

, supp(m ₀ ) = x ₀ , supp(a ₀ ) = I :

where

_x

(a, b) = P

i

a

_i

'( · , x

_i

) + b

_i

'

⁰

( · , x

_i

)

then m = m _{x ,a} with

|| x x ₀ || ₁ = O( || w || )

Theorem: [Duval-Peyr´e 2013]

If 8 t / 2 x ₀ , | ⌘ _V (t) | < 1,

Fuchs vs. Vanishing Pre-Certificates

If 8 j / 2 I, | ⌘ _F (x _j ) | < 1, then supp(a ) = supp(a ₀ )

Theorem: [Fuchs 2004]

⌘ _V z _i

⌘ _F

(65)

Numerical Illustration

1 +1 ⌘ _F ⌘ _V ^Zoom

+1

Ideal low-pass filter: '(x, x ⁰ ) = ^sin((2f _sin(⇡

^c

^{+1)⇡(x x} _{(x x}

₀

₎₎

⁰

⁾⁾ , f _c = 6.

⌘ _F

⌘ _V

(66)

Solution path 7! a

Numerical Illustration

1 +1 ⌘ _F ⌘ _V ^Zoom

+1

Ideal low-pass filter: '(x, x ⁰ ) = ^sin((2f _sin(⇡

^c

^{+1)⇡(x x} _{(x x}

₀

₎₎

⁰

⁾⁾ , f _c = 6.

⌘ _F

⌘ _V

(67)

Solution path 7! a

Theorem: [Duval-Peyr´e 2013]

Discrete ! continuous:

If ⌘ _V is valid, then a

neighbors around supp(m ₀ ).

is supported on pairs of

(holds for ⇠ || w || small enough.

Numerical Illustration

1 +1 ⌘ _F ⌘ _V ^Zoom

+1

Ideal low-pass filter: '(x, x ⁰ ) = ^sin((2f _sin(⇡

^c

^{+1)⇡(x x} _{(x x}

₀

₎₎

⁰

⁾⁾ , f _c = 6.

⌘ _F

⌘ _V

(68)

Gauges: encode linear models as singular points.

Conclusion

(69)

Gauges: encode linear models as singular points.

Performance measures L ² error

model di↵erent CS guarantees

Conclusion

(70)

Gauges: encode linear models as singular points.

Performance measures L ² error

model di↵erent CS guarantees Specific certificate:

⌘ ₀ , ⌘ _F , ⌘ _V , . . .

Conclusion

(71)

– Approximate model recovery T _x

^?

⇡ T _x

₀

Inverse Problems

Gabriel Peyré

www.numerical-tours.com

Low Complexity Regularization of

Inverse Problems

Cours #2

Recovery Guarantees

Overview of the Course

• Course #1: Inverse Problems

• Course #2: Recovery Guarantees

• Course #3: Proximal Splitting Methods

Overview

• Low-complexity Regularization with Gauges

• Performance Guarantees

• Grid-free Regularization

Inverse Problem Regularization

observations y parameter

Estimator: x(y ) depends only on

Observations: y = x 0 + w 2 R P .

Inverse Problem Regularization

observations y parameter

Example: variational methods

Estimator: x(y ) depends only on

x(y ) 2 argmin

x 2R

1

2 || y x || 2 + J (x)

Data fidelity Regularity

Observations: y = x 0 + w 2 R P .

J (x 0 )

Regularity of x 0

Inverse Problem Regularization

observations y parameter

Example: variational methods

Estimator: x(y ) depends only on

x(y ) 2 argmin

x 2R

1

2 || y x || 2 + J (x)

Data fidelity Regularity Observations: y = x 0 + w 2 R P .

Choice of : tradeoff

|| w ||

Noise level

J (x 0 )

Regularity of x 0

x

2 argmin

J (x)

Inverse Problem Regularization

observations y parameter

Example: variational methods

Estimator: x(y ) depends only on

x(y ) 2 argmin

x 2R

1

2 || y x || 2 + J (x)

Data fidelity Regularity Observations: y = x 0 + w 2 R P .

No noise: 0 + , minimize Choice of : tradeoff

|| w ||

Noise level

J (x 0 )

Regularity of x 0

x

2 argmin

J (x)

This course: Performance analysis.

Fast computational scheme.

Inverse Problem Regularization

observations y parameter

Example: variational methods

Estimator: x(y ) depends only on

x(y ) 2 argmin

x 2R

1

2 || y x || 2 + J (x)

Data fidelity Regularity Observations: y = x 0 + w 2 R P .

No noise: 0 + , minimize Choice of : tradeoff

|| w ||

Noise level

Coefficients x Image x

Observations: y = x ₀ + w 2 R ^P .

2 || y x || ² + J (x)

Observations: y = x ₀ + w 2 R ^P .

J (x ₀ )

Regularity of x ₀

2 || y x || ² + J (x)

Data fidelity Regularity Observations: y = x ₀ + w 2 R ^P .

J (x ₀ )

Regularity of x ₀

2 || y x || ² + J (x)

Data fidelity Regularity Observations: y = x ₀ + w 2 R ^P .

No noise: 0 ⁺ , minimize Choice of : tradeoff

J (x ₀ )

Regularity of x ₀

2 || y x || ² + J (x)

Data fidelity Regularity Observations: y = x ₀ + w 2 R ^P .

No noise: 0 ⁺ , minimize Choice of : tradeoff

Image x ^Gradient ^D

^x Union of models: T 2 T linear spaces.

x _i, _· = P r

j =1 A _i,j S _j, _·

Image x ^Gradient ^D

^x Union of models: T 2 T linear spaces.

S _1, _· S _2, _· S _3, _·

Gauge: J : R ^N ! R ⁺ ₈ _↵ ₂ _R ⁺ _{, J} _{(↵x) =} _↵J _(x)

, Union of linear models (T ) _T _2T Piecewise regular ball

J : R ^N ! R ⁺ ₈ _↵ ₂ _R ⁺ _{, J} _{(↵x) =} _↵J _(x)

J (x) = || x || ¹ T = sparse

, Union of linear models (T ) _T _2T Piecewise regular ball

J : R ^N ! R ⁺ ₈ _↵ ₂ _R ⁺ _{, J} _{(↵x) =} _↵J _(x)

J (x) = || x || ¹ x ⁰

T ⁰

, Union of linear models (T ) _T _2T Piecewise regular ball

J : R ^N ! R ⁺ ₈ _↵ ₂ _R ⁺ _{, J} _{(↵x) =} _↵J _(x)

J (x) = || x || ¹ x ⁰

T ⁰

x ⁰ T x

T ⁰