
2.4 Applications in image processing


where the quantities labeled with subscripts n and z produce, respectively, the non-zero and zero components of x. The sparsity constraint

0 = D_{a,z} y_a + D_{m,z} y_m (2.36)

can then be used to obtain one possible estimate of the missing data

ŷ_m = −(D_{m,z})^+ D_{a,z} y_a. (2.37)

The problem with this approach lies in the estimation of the support of x, without which it is impossible to correctly partition D^T as in (2.35). Instead one can build an estimate ŷ_m iteratively by successively enforcing (i) an estimate of the sparsity constraint (i.e., of the support of x) and (ii) the available data constraint. Let the diagonal matrices ∆_n, ∆_a and ∆_m have 1/0-valued diagonals built to select, respectively, the non-zero entries of x and the available and missing entries of y.

We thus express the i-th estimate of y as

ŷ^i = ∆_a y + ∆_m D ∆_n D^+ ŷ^{i−1}, (2.38)

where the first term retains the available data and the second re-synthesizes the signal from the coefficients selected by the current support estimate. It can be shown [Guleryuz 2006] that the estimate ŷ^i_m thus obtained converges to that in (2.37). Yet the iterative nature of the approach in (2.38) permits an adaptive selection of the support of x, wherein the support is re-considered after every few iterations.
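For illustration, the following is a minimal numpy sketch of the iteration in (2.38), under the simplifying assumptions that D is orthonormal (so that D^+ = D^T) and that the support is re-estimated at every iteration by keeping a fixed number of largest-magnitude coefficients; the function and parameter names (estimate_missing, keep, n_iters) are hypothetical, not taken from [Guleryuz 2006].

```python
import numpy as np

def estimate_missing(y, D, avail_mask, n_iters=50, keep=8):
    """Iterative missing-data estimation in the spirit of (2.38).

    Assumes D is orthonormal, so the pseudo-inverse D^+ is D.T.
    `avail_mask` is a boolean vector marking the available entries of y.
    """
    y_hat = np.where(avail_mask, y, 0.0)          # initialize missing entries to zero
    for _ in range(n_iters):
        x = D.T @ y_hat                           # analysis coefficients (D^+ = D.T)
        support = np.argsort(np.abs(x))[-keep:]   # current estimate of the support of x
        x_sparse = np.zeros_like(x)
        x_sparse[support] = x[support]            # apply Delta_n to the coefficients
        y_proj = D @ x_sparse                     # enforce the sparsity constraint
        y_hat = np.where(avail_mask, y, y_proj)   # enforce the available data (Delta_a, Delta_m)
    return y_hat
```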

2.4.2 Denoising

Sparse representations have also been used to carry out image and video denoising [Elad 2006b, Mairal 2008b, Elad 2006a, Protter 2009] by using the sparse prior on the data to formulate the denoising problem as a MAP estimation problem.

The numerical solution of this MAP estimation amounts to obtaining sparse approximations of overlapping image blocks (3-D spatio-temporal blocks, for the case of video) and then averaging over all blocks to obtain the denoised data.

Consider, for example, the vector representation N ∈ R^{r·c} of a noisy image of size r×c. We can stack all unique, overlapping b×b blocks y (in vectorized form) of the noisy image to form a large vector Y ∈ R^{(r−b+1)·(c−b+1)·b²} and accordingly write the 1/0-valued matrix Ω with i-th row specifying all positions in Y where the i-th pixel of N occurs. We then build Ŷ by replacing each block y in Y by its sparse approximation ŷ = Dx chosen, e.g., so that the error norm ‖y − ŷ‖ is commensurate with the noise variance. The denoised image estimate N̂ proposed in [Elad 2006b, Elad 2006a] is thus built as:

N̂ = (Ω Ŷ + λN) / (Ω Ω^T + λI). (2.39)


Note that the denominator of this expression is a diagonal matrix with i-th diagonal entry given by n_i + λ, where n_i is the number of unique, overlapping blocks in the image that contain the i-th pixel. Hence the above expression just specifies an averaging of all sparse approximations of any given pixel with a relaxation using the noisy image N [Elad 2006b].
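A schematic numpy rendering of this aggregation step is sketched below; the per-block sparse approximation is left as a caller-supplied routine approx_block (a hypothetical placeholder for, e.g., OMP over a learned dictionary), so this illustrates the averaging of (2.39) rather than the authors' full implementation.

```python
import numpy as np

def denoise_average(N, approx_block, b=8, lam=0.5):
    """Aggregation step of (2.39): average the sparse approximations of
    all overlapping b x b blocks, relaxed toward the noisy image N.

    approx_block(y) should return the sparse approximation D @ x of the
    vectorized block y (solver left unspecified in this sketch).
    """
    r, c = N.shape
    num = lam * N.astype(float)            # numerator:  Omega @ Yhat + lam * N
    den = np.full((r, c), float(lam))      # denominator entries become n_i + lam
    for i in range(r - b + 1):
        for j in range(c - b + 1):
            y = N[i:i + b, j:j + b].reshape(-1)
            y_hat = approx_block(y)        # sparse approximation of this block
            num[i:i + b, j:j + b] += y_hat.reshape(b, b)
            den[i:i + b, j:j + b] += 1.0   # one more block covers these pixels
    return num / den                       # per-pixel averaging of (2.39)
```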

2.4.3 Texture separation and classification

The application of sparse representations to texture separation has been explored in a series of works [Starck 2004b, Starck 2004a, Peyré 2010] that assume that image blocks y consist of a mixture of overlapping component layers u_k:

y = ∑_k α_k u_k. (2.40)

This model coincides with the case of mixtures of multiple images (e.g., as can occur when taking an image through glass) or of natural images consisting of cartoon-like shape layers with superposed texture [Elad 2005].

Sparse representations can be adapted to this source separation problem by assuming that dictionaries D_k are available that render u_k compressible but not u_j for all j ≠ k. Two issues come up regarding the usage of this tool: (i) obtaining the D_k and (ii) using them to separate the various image layers. We will focus on the more recent approach presented in [Peyré 2010], which uses a combination of off-the-shelf dictionaries (fixed for all the image patches) to model the cartoon layers, while using learned dictionaries (adaptive to the neighborhood of each image patch) for the more complex texture layer. The formulation of the approach uses the sparsity constraint to induce the image layering yet estimates the layers using a separate optimization (H denotes a loss matrix as in (2.22) and is set to the identity for the texture separation application):

argmin_{{x_k},{u_l},{D_l}} (1/2) ‖H(y − ∑_k D_k x_k − ∑_l u_l)‖²_2 + λ ∑_k ‖x_k‖_1 + ∑_l ∑_j ( ‖R_j(u_l) − D_l x_{l,j}‖²_2 + λ ‖x_{l,j}‖_1 ). (2.41)

Here we have used subscript k to denote cartoon layers that use a fixed D_k and subscript l to denote the layers using adaptive dictionaries D_l. The adaptivity of this approach hence concerns the third term in the expression: rather than carrying


out a sparse decomposition of a given texture layer u_l directly, the approach decomposes each such layer into many overlapping sub-blocks R_j(u_l), j = 1, 2, . . ., which serve as a training set for the adaptive dictionary D_l. The above problem is solved by iterating over three stages: (i) solving for the x using linear-programming methods, (ii) solving for the u using conjugate gradient descent and (iii) solving for the D_l using gradient descent with projection onto a convex set of bounded-norm dictionaries.
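For intuition, the sketch below alternates over the layers with fixed dictionaries only, i.e., basic MCA-style block-coordinate separation; the function sparse_code is a hypothetical stand-in for the ℓ1 solver of stage (i), and the conjugate-gradient layer update and adaptive dictionary update of [Peyré 2010] (stages (ii) and (iii)) are deliberately omitted.

```python
import numpy as np

def mca_separate(y, dicts, sparse_code, n_iters=20):
    """Schematic MCA-style layer separation with fixed dictionaries.

    Each layer u_k is alternately re-estimated by sparse-coding the
    residual left by the other layers in its own dictionary D_k.
    sparse_code(D, v) should return sparse coefficients x such that
    D @ x approximates v (any l1/greedy solver can be plugged in).
    """
    u = [np.zeros_like(y) for _ in dicts]       # one layer per dictionary
    for _ in range(n_iters):
        for k, D in enumerate(dicts):
            residual = y - sum(u[j] for j in range(len(u)) if j != k)
            x = sparse_code(D, residual)        # enforce sparsity of layer k
            u[k] = D @ x                        # re-synthesize layer k
    return u
```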

The learned MCA approach described above shares similarities with a new dictionary structure presented in this manuscript (Chapter 4 and Chapter 5). The novel structure proposed also assumes that the signal patches are composed of layers that become more textured with the layer index. Hence the dictionary structure is composed of multiple dictionaries (one or more per layer of the structure), each concerned with one of the different layers of the signal being decomposed.

2.4.4 Image compression

Yet another application of sparse representations is that of image compression, where sparse representations consisting of only a few non-zero coefficients conveniently produce compact representations of image blocks. Indeed the JPEG [Wallace 1991] standard is based on the premise that natural images are compressible in the DCT basis. Its successor, the JPEG2000 standard, instead substitutes a wavelet basis that better satisfies the compressibility requirement for natural images. An adaptation of the JPEG standard is still very much in use for intra-coding of video frames in the H.264 standard [Sullivan 2005].
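As a toy illustration of this compressibility premise (and not of any standard's actual coder), the sketch below approximates an image block from only its few largest 2-D DCT coefficients; the helper names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix, the transform underlying JPEG."""
    k = np.arange(n)
    C = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    C[0, :] *= np.sqrt(1.0 / n)
    C[1:, :] *= np.sqrt(2.0 / n)
    return C

def keep_largest(block, m):
    """Approximate a square block from its m largest DCT coefficients."""
    C = dct_matrix(block.shape[0])
    coeffs = C @ block @ C.T                     # 2-D DCT of the block
    cutoff = np.sort(np.abs(coeffs).ravel())[-m] # magnitude of m-th largest coefficient
    sparse = np.where(np.abs(coeffs) >= cutoff, coeffs, 0.0)
    return C.T @ sparse @ C                      # inverse 2-D DCT
```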

2.4.4.1 The K-SVD facial image encoder

Recent successful attempts have been made towards using learned overcomplete dictionaries adapted to a signal class for the purpose of image compression. By using a learned dictionary, the image encoder can benefit from the resulting greater compressibility of the considered signal class. An example of this approach is embodied in a facial image codec introduced by Bryt and Elad [Bryt 2008] based on the learned K-SVD dictionary [Aharon 2006b] (cf. Section 2.3.3). Their approach nonetheless employs a piecewise-affine warping of the face that ensures that the various facial features coincide with those of a pre-specified face template. Each non-overlapping block (taken over a regular grid) of the face template (corresponding roughly to a facial feature such as the nose) thus defines a class of signals that is then represented with a corresponding K-SVD dictionary. Thus the compressibility of the image blocks in that approach relies, to a large extent, not on the K-SVD dictionary but rather on the affine warping procedure. This warping procedure


in turn increases the codec complexity and is further sensitive to image variations encountered in practical scenarios (e.g., in lighting conditions, pose and particularities of the subject). The approach nonetheless paid off, achieving a sizeable PSNR improvement over the state-of-the-art JPEG2000 [Adams 2005]

algorithm, although this comes at the expense of a large storage overhead related to the feature-dependent dictionaries.

2.4.4.2 Sparse orthonormal transforms for image encoding

Another example of an image compression system based on trained overcomplete dictionaries is that developed by Sezer et al. [Sezer 2008]. Their dictionary structure (cf. Section 2.3.2) consists of a concatenation of orthonormal bases. Each image block is encoded using a single one of these bases. This approach reduces the overhead related to coding the atom indices at the expense of reducing the effective size of the dictionary.
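To illustrate what selecting a single basis per block can look like, the sketch below scores each orthonormal basis with a simple distortion-plus-sparsity cost after hard thresholding and keeps the cheapest; this cost weighting is an illustrative assumption, not the rate-distortion criterion actually used in [Sezer 2008].

```python
import numpy as np

def encode_block(y, bases, thresh):
    """Pick the single orthonormal basis that codes block y best.

    For each orthonormal basis B, the analysis coefficients are
    hard-thresholded and a distortion + sparsity cost is evaluated;
    only the winning basis index and its few nonzero coefficients
    would need to be transmitted.
    """
    best = None
    for idx, B in enumerate(bases):
        x = B.T @ y                                # exact analysis (B orthonormal)
        x_q = np.where(np.abs(x) > thresh, x, 0.0) # hard-threshold the coefficients
        cost = np.sum((y - B @ x_q) ** 2) + thresh ** 2 * np.count_nonzero(x_q)
        if best is None or cost < best[0]:
            best = (cost, idx, x_q)
    _, idx, x_q = best
    return idx, x_q                                # basis index + sparse coefficients
```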
