
(1)

Mathematical introduction to Compressed Sensing

Lesson 1 : measurements and sparsity

Guillaume Lecué

ENSAE

(2)

Organization of the course

Every Tuesday (2/02 – 9/02 – 23/02 – 30/02 – 2/03 – 9/03 – 16/03 – 23/03): 3-hour course (15:15 to 18:30) with a 15-minute break.

No lesson (16/02)

Simulations using Python + notebook + cvxopt + cvxpy (to be installed beforehand).

All the course material (Python code and course notebooks) is available on my webpage.

Evaluation:

Python notebook or PDF report (see the details on my webpage).

Register in groups of two students before 22/02 in the comments section of my webpage.

Project defense between 29/03 and 2/04.

(3)

Aim of the course: analyze high-dimensional data

1 Understand low-dimensional structures in high-dimensional spaces

2 Reveal this structure via appropriate measurements

3 Construct efficient algorithms to learn this structure

Tools:

1 approximation theory

2 probability theory

3 convex optimization algorithms

(4)

First lesson is about:

Two central ideas:

1 Sparsity

2 Measurements

through three examples:

1 Single pixel camera

2 face recognition

3 Financial data

(5)

What is Compressed Sensing?

Classical data acquisition system in two steps:

signal → [sensing] → RAW format file → [compression] → JPEG format file

Compressed sensing does it in one step:

signal → [compressed sensing] → data

In French: Compressed Sensing = "acquisition comprimée"

(6)

Q: How is it possible? A: Construct clever measurements!

$x$: a signal (finite-dimensional vector, say $x \in \mathbb{R}^N$).

Take $m$ linear measurements of the signal $x$:

$$y_i = \langle x, X_i \rangle, \quad i = 1, \dots, m,$$

where:

1 $X_i$: $i$-th measurement vector (in $\mathbb{R}^N$),

2 $y_i$: $i$-th measurement (= data = observation).

Problem: reconstruct $x$ from the measurements $(y_i)_{i=1}^m$ and the measurement vectors $(X_i)_{i=1}^m$ with $m$ as small as possible.

(7)

Matrix version of Compressed Sensing

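We denote

$$y = \begin{pmatrix} y_1 \\ \vdots \\ y_m \end{pmatrix} \in \mathbb{R}^m \quad \text{and} \quad A = \begin{pmatrix} X_1^\top \\ \vdots \\ X_m^\top \end{pmatrix} \in \mathbb{R}^{m \times N}.$$

$y$: measurement vector and $A$: measurement matrix.

Problem: find $x$ such that $y = Ax$ when $m \ll N$.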

[Figure: schematic of the linear system $y = Ax$]

CS = solve a highly underdetermined linear system
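The matrix form can be checked numerically. The sketch below is only an illustration: the dimensions, the random seed and the choice of a Gaussian random matrix for $A$ are assumptions made here, not choices stated in the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
N, m, s = 20, 8, 3   # signal dimension, number of measurements, sparsity (illustrative values)

# An s-sparse signal x in R^N
x = np.zeros(N)
support = rng.choice(N, size=s, replace=False)
x[support] = rng.standard_normal(s)

# Measurement matrix A: its rows are the measurement vectors X_1, ..., X_m.
# Gaussian entries are an assumption made for this sketch.
A = rng.standard_normal((m, N))

# Measurements taken one by one: y_i = <x, X_i>
y_rows = np.array([np.dot(x, A[i]) for i in range(m)])

# The same measurements in matrix form: y = A x
y = A @ x
```

Both computations give the same vector in $\mathbb{R}^m$, which is all the matrix notation expresses.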

(8)

Sparsity = low-dimensional structure

Since $m < N$, there is no unique solution to the problem $y = Ax$ ⇒ no hope of reconstructing $x$ from the $m$ measurements $y_i = \langle x, X_i \rangle$ alone.
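A small numerical sketch of this non-uniqueness, assuming a random Gaussian matrix and arbitrary small dimensions (both chosen here for illustration): adding any vector of $\ker(A)$ to $x$ changes the signal but not the measurements.

```python
import numpy as np

rng = np.random.default_rng(1)
m, N = 3, 6                       # fewer measurements than unknowns (illustrative values)
A = rng.standard_normal((m, N))   # assumed Gaussian measurement matrix
x = rng.standard_normal(N)
y = A @ x

# With the full SVD, the last N - rank(A) rows of Vt span ker(A).
# A random Gaussian A has rank m almost surely, so Vt[-1] lies in ker(A).
_, _, Vt = np.linalg.svd(A)
h = Vt[-1]

# A different vector that produces exactly the same measurements
x_other = x + 2.5 * h
```

Here `x_other` differs from `x`, yet `A @ x_other` equals `y` up to rounding errors.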

Idea: Signals to recover have some low-dimensional structure. We assume that $x$ is sparse.

Definition

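Support of $x = (x_1, \dots, x_N)^\top \in \mathbb{R}^N$:

$$\mathrm{supp}(x) = \{ j \in \{1, \dots, N\} : x_j \neq 0 \}$$

Size of the support of $x$:

$$\|x\|_0 = |\mathrm{supp}(x)|$$

$x$ is $s$-sparse when $\|x\|_0 \le s$, and $\Sigma_s = \{x \in \mathbb{R}^N : \|x\|_0 \le s\}$.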
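These definitions translate directly into code. The helper names below (`support`, `l0_norm`, `is_s_sparse`) are hypothetical and chosen for this sketch; note that NumPy indices start at 0 whereas the definition indexes coordinates from 1.

```python
import numpy as np

def support(x):
    """Indices (0-based) of the nonzero coordinates of x."""
    return np.flatnonzero(x)

def l0_norm(x):
    """Size of the support of x, i.e. ||x||_0."""
    return int(np.count_nonzero(x))

def is_s_sparse(x, s):
    """True when x has at most s nonzero coordinates."""
    return l0_norm(x) <= s

x = np.array([0.0, 1.5, 0.0, 0.0, -2.0, 0.0])
```

For this `x`, the support is `[1, 4]` in 0-based indexing, so `x` is 2-sparse but not 1-sparse.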

(9)

Sparsity and the underdetermined system y = Ax

Idea: Maybe the kernel of $A$ is in such a position that the sparsest solution to $y = Ax$ is $x$ itself?

Procedure: Look for the sparsest solution of the system $y = Ax$:

$$\hat{x}_0 \in \operatorname*{argmin}_{At = y} \|t\|_0 \qquad (1)$$

which looks for vector(s) $t$ with the smallest support in the affine set of solutions

$$\{t \in \mathbb{R}^N : At = y\} = x + \ker(A).$$

Idea: Denote $\Sigma_s = \{t \in \mathbb{R}^N : \|t\|_0 \le s\}$. If $\Sigma_s \cap (x + \ker(A)) = \{x\}$ for $s = \|x\|_0$, then the sparsest element in $x + \ker(A)$ is $x$ and so $\hat{x}_0 = x$.

Definition

$\hat{x}_0$ is called the $\ell_0$-minimization procedure (cf. second lesson).
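To make procedure (1) concrete, here is a minimal brute-force sketch. It is an assumption of this example, not the course's implementation: it enumerates candidate supports of increasing size and stops at the first one for which $At = y$ can be solved. The function name `l0_min`, the tolerance and the Gaussian test matrix are all hypothetical. The search is exponential in $N$, so it is only usable on tiny instances.

```python
import itertools
import numpy as np

def l0_min(A, y, tol=1e-9):
    """Brute-force search for a sparsest t such that A t = y (tiny N only)."""
    m, N = A.shape
    if np.linalg.norm(y) <= tol:
        return np.zeros(N)                     # the zero vector is the sparsest solution
    for k in range(1, N + 1):                  # candidate support sizes, smallest first
        for J in itertools.combinations(range(N), k):
            cols = list(J)
            # Best least-squares fit of y using only the columns indexed by J
            coef, *_ = np.linalg.lstsq(A[:, cols], y, rcond=None)
            if np.linalg.norm(A[:, cols] @ coef - y) <= tol:
                t = np.zeros(N)
                t[cols] = coef
                return t                       # first exact fit has minimal support size
    return None                                # no exact solution found

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 8))                # m = 4 measurements, N = 8 (illustrative)
x = np.zeros(8)
x[[2, 5]] = [1.0, -2.0]                        # a 2-sparse signal
y = A @ x
x_hat = l0_min(A, y)
```

On this small example the search returns the 2-sparse vector `x` itself, which illustrates the idea that the sparsest element of $x + \ker(A)$ can be $x$.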

(10)

Compressed sensing: problem statement

Problem 1: Construct a minimal number of measurement vectors $X_1, \dots, X_m$ such that one can reconstruct any $s$-sparse signal $x$ from the $m$ measurements $(\langle x, X_i \rangle)_{i=1}^m$.

Problem 2: Construct efficient algorithms that can reconstruct exactly any sparse signal $x$ from the measurements $(\langle x, X_i \rangle)_{i=1}^m$.

(11)

Is signal x really sparse?

Sparsity of the signal $x$ is the main assumption in Compressed Sensing (and more generally in high-dimensional statistics).

Q.: Is it true that "real signals" are sparse?

Three examples:

1 images

2 face recognition

3 financial data

(12)

Compressed Sensing in images

(13)

Sparse representation of images

An image is a:

1 vector $f \in \mathbb{R}^{n \times n}$

2 function $f : \{0, \dots, n-1\}^2 \to \mathbb{R}$

Images can be expanded in an (orthonormal) basis: $f = \sum_{j=1}^{n^2} \langle f, \psi_j \rangle \psi_j$

Problem in approximation theory: Find a basis $(\psi_j)$ such that $(\langle f, \psi_j \rangle)_{j=1}^{n^2}$ is (approximately) a sparse vector for real-life images $f$.

Solution: Wavelet bases (cf. Gabriel Peyré's course); notebook: wavelet decomposition

(14)

Sparse representation of images

Graphics: Representation of $(\log |\langle f, \psi_j \rangle|)_{j=1}^{n^2}$ in decreasing order for $n = 256$ ($256^2$ = 65,536 coefficients).

Conclusion: When expanded in an appropriate basis, images have an almost sparse representation.

(15)

Sparse representation of images

Idea: Compression of images by thresholding small wavelet coefficients (JPEG 2000).
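The thresholding idea can be sketched on a toy case. The example below makes several simplifying assumptions that are not in the slides: it uses a 1D signal instead of an image, the Haar basis as the simplest wavelet basis, and hypothetical helper names (`haar_matrix`, `keep_largest`, `approximation_error`). It is not the JPEG 2000 codec, only the "keep the largest coefficients" principle.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar basis of R^n, stored as the rows of an n x n matrix.
    n is assumed to be a power of 2."""
    H = np.array([[1.0]])
    while H.shape[0] < n:
        k = H.shape[0]
        top = np.kron(H, np.array([[1.0, 1.0]]))               # coarse part (pairwise sums)
        bottom = np.kron(np.eye(k), np.array([[1.0, -1.0]]))   # detail part (pairwise differences)
        H = np.vstack([top, bottom]) / np.sqrt(2.0)
    return H

def keep_largest(c, k):
    """Set to zero all but the k largest coefficients of c in absolute value."""
    c_k = np.zeros_like(c)
    if k > 0:
        idx = np.argsort(np.abs(c))[-k:]
        c_k[idx] = c[idx]
    return c_k

n = 64
t = np.arange(n) / n
f = np.where(t < 0.3, 1.0, 0.2) + 0.1 * t   # toy piecewise-smooth signal
H = haar_matrix(n)
c = H @ f                                    # coefficients <f, psi_j>

def approximation_error(k):
    """Euclidean error when f is rebuilt from its k largest Haar coefficients."""
    f_k = H.T @ keep_largest(c, k)
    return np.linalg.norm(f - f_k)
```

Because the basis is orthonormal, the error made by keeping $k$ coefficients is the Euclidean norm of the discarded ones, so `approximation_error(k)` decreases as `k` grows and vanishes for `k = n`.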

Remark: these are the only three slides about approximation theory in this course!

(16)

Compressed sensing and images

Two differences with the CS framework introduced above:

1 images are almost sparse

2 images are (almost) sparse not in the canonical basis but in some other (wavelet) basis.

Two consequences:

1 our procedures will be asked to "adapt" to this almost sparse situation: stability property

2 we need to introduce a structured sparsity: being sparse in some general basis.

(17)

Structured sparsity

Definition

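Let $F = \{f_1, \dots, f_p\}$ be a dictionary in $\mathbb{R}^N$. A vector $x \in \mathbb{R}^N$ is said to be $s$-sparse in $F$ when there exists $J \subset \{1, \dots, p\}$ such that

$$|J| \le s \quad \text{and} \quad x = \sum_{j \in J} \theta_j f_j.$$

In that case, $x = F\theta$ where $F = [f_1 | \cdots | f_p] \in \mathbb{R}^{N \times p}$ and $\theta \in \mathbb{R}^p$ is $s$-sparse in the canonical basis of $\mathbb{R}^p$. For CS measurements, one has:

$$y = Ax = AF\theta$$

where $\theta \in \Sigma_s$, and so one just has to replace the measurement matrix $A$ by $AF$.

Conclusion: The whole course deals only with vectors that are sparse in the canonical basis.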
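The reduction to the canonical basis can be checked on a small synthetic example. Everything numerical here is an assumption of the sketch: the dictionary and the measurement matrix are random Gaussian matrices, and the name `A_eff` for the product $AF$ is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p, m, s = 16, 24, 8, 3          # illustrative dimensions

# Dictionary F = [f_1 | ... | f_p]; random columns are an assumption of this sketch
F = rng.standard_normal((N, p))

# theta is s-sparse in the canonical basis of R^p
theta = np.zeros(p)
J = rng.choice(p, size=s, replace=False)
theta[J] = rng.standard_normal(s)

# x is s-sparse in the dictionary F (but in general not in the canonical basis of R^N)
x = F @ theta

# CS measurements of x, and the effective measurement matrix acting on theta
A = rng.standard_normal((m, N))
y = A @ x
A_eff = A @ F
```

Measuring `x` with `A` is the same as measuring the sparse vector `theta` with `A_eff`, so recovering `theta` from `y` is a standard CS problem with measurement matrix `A_eff`.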

(18)

What is a camera using CS?

It should take measurements like:

We take $m$ measurements:

In particular, the measurements $y_1, \dots, y_m$ are real numbers. Each of them can be stored using only one pixel in the camera.

(19)

Single pixel camera from RICE University

DMD: digital micromirror device, randomly oriented

(20)

Single pixel camera from RICE University

Example of reconstruction of an image using the single pixel camera.

Two problems:

1 How do we choose the measurement vectors $X_1, \dots, X_m$?

2 Is there an efficient algorithm to reconstruct the signal from those few measurements?

(21)

CS in face recognition

(22)

Face recognition and Compressed Sensing

Database: $D := \{(\varphi_j, \ell_j) : 1 \le j \le N\}$ where:

1 $\varphi_j \in \mathbb{R}^m$ is a vector representation of the $j$-th image (for instance, the concatenation of the image's pixel values),

2 $\ell_j \in \{1, \dots, C\}$ is a label referring to a person.

The same person may be represented in $D$ several times, under various angles, lighting conditions, etc.

Problem: Given a new image $y \in \mathbb{R}^m$, we want to label it with an element of the set of labels $\{1, \dots, C\}$.

"Classical" solution: use a multi-class classification algorithm.

Here: Face recognition as a CS problem.

(23)

The sparsity assumption in face recognition

Empirical observation: If for each of the $C$ individuals one has:

1 a large enough number of images,

2 enough diversity in terms of angles and brightness,

then for any new image $y \in \mathbb{R}^m$ of individual number $i \in \{1, \dots, C\}$, one expects that

$$y \approx \sum_{j : \ell_j = i} \varphi_j x_j.$$

Consequence: We assume that a new image $y \in \mathbb{R}^m$ can be written as $y = \Phi x + \zeta$ where:

1 $\Phi = [\Phi_1 | \Phi_2 | \cdots | \Phi_C]$ and $\Phi_i = [\varphi_j : \ell_j = i]$ for any $i \in \{1, \dots, C\}$,

2 $x = [0^\top | 0^\top | \cdots | x_i^\top | 0^\top | \cdots | 0^\top]^\top$ where $x_i$ is the restriction of $x$ to the columns of $\Phi_i$ in $\Phi$,

3 $\zeta \in \mathbb{R}^m$ is the error due to the linear approximation of $y$ by columns of $\Phi$.
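The model $y = \Phi x + \zeta$ can be simulated as follows. This is a synthetic sketch under explicit assumptions: the "images" are random Gaussian vectors rather than real faces, every individual has the same number of images, and the sizes and noise level are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
m, C, n_per = 30, 4, 5                   # pixels per image, individuals, images per individual

# labels[j] is the label l_j of column j; columns are grouped by individual,
# so Phi = [Phi_1 | ... | Phi_C] with Phi_i the columns labelled i
labels = np.repeat(np.arange(1, C + 1), n_per)
Phi = rng.standard_normal((m, C * n_per))   # synthetic stand-in for vectorized images

# A new image of individual i: x is supported on the block of columns labelled i only
i = 3
x = np.zeros(C * n_per)
x[labels == i] = rng.standard_normal(n_per)

zeta = 0.01 * rng.standard_normal(m)     # small approximation error
y = Phi @ x + zeta
```

The vector `x` is block-sparse: all its nonzero entries sit in the block of individual `i`, so identifying that block from `y` amounts to labelling the new image.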

(24)

Face recognition as a noisy CS problem

Compared with the benchmark CS setup, there are three differences:

1 there is an additional noise term $\zeta$

2 the sparsity assumption on $x$ is stronger here: $x$ is block-sparse

3 depending on the control one has on the database, we may or may not have the ability to choose (in a restricted way) the measurement matrix.

Three consequences:

1 our procedures will be asked to deal with noisy data: robustness property

2 we will design procedures taking advantage of more "advanced" sparsity, such as block-sparsity

3 when one is in a situation where there is no control on the choice of measurement vectors then one can try several algorithms and see how they behave.

(25)

Construction of a measurement matrix in face recognition

Various angles:

Various brightness:

(26)

CS in Finance

(27)

Finance and CS

Problem: We observe the performance of a portfolio every minute: $y_1, \dots, y_m$. We would like to know how it is structured (shares and quantities).

Data: In addition to $y_1, \dots, y_m$, we know the values of all shares at any time:

(28)

Finance and CS

$x_{i,j}$: value of share $j$ at time $i$. We have the following data:

t = 1: $y_1$: portfolio value; $(x_{1,j})_{j=1}^N$: share values

t = 2: $y_2$: portfolio value; $(x_{2,j})_{j=1}^N$: share values

· · ·

t = m: $y_m$: portfolio value; $(x_{m,j})_{j=1}^N$: share values

Sparsity assumption: The portfolio contains only a limited number of shares and its structure did not change during the observation period.

Problem formulation: find $x \in \mathbb{R}^N$ such that $y = Ax$ where $y = (y_i)_{i=1}^m$ and $A = (x_{i,j} : 1 \le i \le m, 1 \le j \le N)$, and $x$ is assumed to be sparse.
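A synthetic illustration of this formulation, with made-up numbers: the share prices are simulated random walks, and the portfolio holds three hypothetical shares with arbitrary quantities. None of these values come from real data.

```python
import numpy as np

rng = np.random.default_rng(0)
m, N = 12, 50                             # observation times, number of shares on the market

# A[i, j] = x_{i,j}: value of share j at time i (simulated random walks around 100)
A = 100.0 + rng.standard_normal((m, N)).cumsum(axis=0)

# Unknown portfolio: quantities held for each share, fixed over time and sparse
x = np.zeros(N)
x[[3, 17, 42]] = [10.0, 5.0, 20.0]

# Observed portfolio values y_1, ..., y_m
y = A @ x
```

Each observed value is the price-weighted sum of the few shares actually held, for instance `y[0] = 10 * A[0, 3] + 5 * A[0, 17] + 20 * A[0, 42]`. Recovering `x` from `y` and `A` with `m < N` is the CS problem stated above.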

(29)

CS and high-dimensional statistics

Definition

We say that a statistical problem is a high-dimensional statistical problem when one has to estimate an $N$-dimensional parameter / vector / object using $m$ observations and $m < N$.

1 CS is therefore a high-dimensional statistical problem.

2 Noisy CS is exactly the linear regression statistical model when the noise is assumed to be random.
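The correspondence with linear regression can be written out explicitly. The noise symbol $\xi$ is a notation introduced here for illustration; the rows $X_i$ of $A$ play the role of the covariates and $x$ that of the regression parameter.

```latex
y_i = \langle x, X_i \rangle + \xi_i, \quad i = 1, \dots, m,
\qquad \text{i.e.} \qquad
y = Ax + \xi, \quad \xi = (\xi_1, \dots, \xi_m)^\top \text{ random.}
```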