HAL Id: tel-01750679
https://hal.univ-lorraine.fr/tel-01750679
Submitted on 29 Mar 2018
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
scientific research documents, whether they are
published or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
Sparse representations over learned dictionary for
document analysis
Thanh Ha Do
To cite this version:
Thanh Ha Do. Sparse representations over learned dictionary for document analysis. Other [cs.OH].
Université de Lorraine, 2014. English. NNT: 2014LORR0021. tel-01750679.
NOTICE

This document is the result of a long process approved by the defense jury and made available to the wider academic community.

It is subject to the intellectual property rights of its author. This implies an obligation to cite and reference it whenever this document is used.

Furthermore, any counterfeiting, plagiarism, or illicit reproduction is liable to criminal prosecution.

Contact: ddoc-theses-contact@univ-lorraine.fr

LINKS

Code de la Propriété Intellectuelle, articles L 122.4
Code de la Propriété Intellectuelle, articles L 335.2 to L 335.10
http://www.cfcopies.com/V2/leg/leg_droi.php
Département de formation doctorale en informatique
École doctorale IAEM Lorraine

Sparse Representations over Learned Dictionary for Document Analysis

THESIS
presented and publicly defended on 4 April 2014
for the degree of
Doctorat de l'université de Lorraine
(specialty: computer science)
by
DO Thanh Ha
Composition of the jury

Reviewers:
Christian Viard-Gaudin, Professeur, Institut Universitaire de Technologie de Nantes
Rolf Ingold, Professeur, Université de Fribourg

Examiners:
Jean-Marc Ogier, Professeur, Université de La Rochelle
Laurent Wendling, Professeur, Université Paris Descartes

Thesis advisor:
Salvatore Tabbone, Professeur, Université de Lorraine

Thesis co-advisor:
Oriol Ramos Terrades, Professeur associé à l'UAB Barcelone
I would like to acknowledge with much appreciation the Ministry of Education of Viet Nam, which provided financial support for my study. A special thanks goes to my university, Ha Noi University of Science - VNU, which gave me permission to study at LORIA in France.

A special gratitude I give to my supervisor, Mr. Salvatore Tabbone, for his guidance, suggestions, and encouragement. He dedicated so much time and patience during the years of my study. I thank him not only for his invaluable advice and the valuable expertise that he shared with me, but also for the support which allowed me to complete this thesis in the best working conditions.

Furthermore, a special thanks goes to my co-supervisor, Mr. Oriol Ramos Terrades, for his scientific advice as well as his availability despite the distance. The discussions with him allowed me to better understand the research and to gain more confidence.

I would like to express my deepest appreciation to Mr. Christian Viard-Gaudin and Mr. Rolf Ingold for accepting to review my thesis. I am also grateful to Mr. Jean-Marc Ogier and Mr. Laurent Wendling for accepting to be part of the jury.

I would like to thank all the members of the QGAR team for their friendship and their help during my study within the team; I am especially grateful to Philippe Dosch for his outstanding technical support. Thanks to all my colleagues with whom I shared the office for the good atmosphere.

I am also very grateful to all my friends who, closely or remotely, helped and encouraged me at the convenient moments. I thank them for all the precious time that we spent together.

In addition, I would also like to thank my husband and my daughter for their love, kindness, and support during the past three years of my study.
Abstract

In this thesis, we focus on how sparse representations can help to increase the performance of noise removal, text region extraction, pattern recognition, and symbol spotting in graphical documents. The main goal is to provide new algorithms and applications of sparse representation in redundant dictionaries for graphical images, by addressing the problems from various perspectives.

To do that, first of all, we give a survey of sparse representations and their applications in image processing. Then, we present the motivation for building a learned dictionary and efficient algorithms for constructing one. The techniques used to solve the sparsity problem over a learned dictionary are also presented.

After describing the general idea of sparse representations and learned dictionaries, we present some contributions in the field of symbol recognition and document processing that achieve better performance compared to the state of the art. These contributions begin by finding answers to the following questions.
- The first question is how we can remove the noise from a document when we have no assumptions about the model of noise found in these images. To address this first question, we believe that there is a link between the model of noise and the reconstruction error of the signal in the learned dictionary. Therefore, we propose to compute the noise model automatically from the database, based on the normalized correlation between pairs of noisy and non-noisy images, and then to use this value as the reconstruction error in the basis pursuit denoising algorithm with a learned dictionary. The efficiency of the proposed method has also been validated experimentally on different datasets, for different resolutions and different kinds of noise. All experimental results show that the proposed method outperforms existing ones in most of the cases.
- The second question is how sparse representations over learned dictionaries can separate the text and graphic parts in a graphical document. In fact, we have been strongly motivated by the good performance of the morphological component analysis (MCA) method when applied to separating textures from cartoons and to text detection in scenic images. In the MCA method, one signal is discriminated from another based on the comparison between their sparse representations in two dictionaries. However, when working with graphical images, text characters of different sizes are included in the same document, and they touch either themselves or the graphics; therefore prior methods such as MCA cannot be used efficiently in this case. As a result, we have extended the assumption of MCA by proposing a strategy using multiple learned dictionaries, instead of two dictionaries, for separating the text regions from the graphical part. The experimental results show that the proposed method can be a good choice for the segmentation problem with complex graphical documents, while it overcomes the restrictions of the existing methods, which apply to some documents only.
- This result encourages us to continue with the challenge of symbol recognition. Once again, we desire to answer the question of how we can apply sparse representation to symbol recognition. Now, the difficulty arises because there seems to be no connection between symbol recognition and sparsity, and to the best of our knowledge, there are no previous works in the graphics community using sparse representations to describe graphical symbols. Fortunately, after tireless research, we found the bridge between the literature of sparse representation and visual vocabulary construction. More specifically, we apply the learned dictionary algorithm to learn a visual vocabulary based on local descriptors of symbols. Then, we construct a vector model for each symbol from its sparse representation in this vocabulary, which can help to improve the retrieval performance. We hope that this work will open a new range of applications for symbol recognition, since in our method other kinds of local descriptors can be used.
We complete this thesis by proposing an approach for spotting symbols that uses sparse representations for the coding of a visual vocabulary. This approach also uses learning techniques to adapt such a visual vocabulary to the intrinsic properties of the document datasets. It allows achieving a representation that is sparser than the one obtained by using a pre-fixed basis instead. The contribution made in this work focuses on the symbol retrieval process. The proposed approach follows a two-step architecture including a recall step and a refining step. The main goals of this architecture are to speed up the retrieval process using sparse representation and indexing techniques, and to reserve the more computationally expensive matching methods only for those regions in which the queried symbol may appear. The first experiments on the SESYD dataset for a symbol spotting application seem to agree, and the obtained results are promising.
Key words: sparse representations, learned dictionary, learning algorithms, noise removal, text/graphic separation, symbol recognition, symbol spotting.
List of Figures
List of Tables

1 Introduction

Related Works

2 Sparsity and Learning Dictionary
2.1 Sparse Representation
2.2 Pursuit Algorithms
2.2.1 Greedy Matching Pursuits
2.2.2 Basis Pursuit
2.2.3 l1 Lagrangian Pursuit
2.3 Learning Dictionaries
2.3.1 Core Idea for Learning Dictionary
2.3.2 The K-SVD Algorithm
2.3.3 The MOD Algorithm
2.3.4 The Online Learning Dictionary Algorithm
2.3.5 The RLS-DLA Algorithm
2.3.6 Numerical Demonstration of Learning Algorithms

Contributions on Document Analysis

3 Denoising Graphical Documents
3.1 Introduction
3.2 Problem Statement
3.3 Document Degradation Models
3.4 Proposed Approach
3.4.1 Learning Dictionary for Document Patches
3.4.2 Energy Noise Model
3.5 Experimental Validation
3.6 Conclusion

4 Text/Graphic Separation
4.1 Introduction
4.2 Problem Statement
4.3 Proposed Approach
4.3.1 Learned Dictionaries for Text and Graphic Parts
4.3.2 Text Regions Extraction by Learned Dictionaries
4.4 Experimental Validation
4.5 Conclusions

5 Symbol Recognition
5.1 Introduction
5.2 Shape Context
5.3 Interest Points
5.4 Shape Context of Interest Points
5.5 Proposed Approach
5.5.1 Learned Dictionary of SCIPs
5.5.2 Visual Vector Model
5.5.3 Symbol Retrieval
5.6 Experimental Validation
5.6.1 Datasets and Performance Evaluation
5.6.2 Study of Parameters
5.6.3 Invariance and Robustness
5.6.4 Unseen Symbols

6 Symbol Spotting
6.1 Introduction
6.2 Extension of Shape Context of Interest Points for Documents
6.3 Learned Dictionary of ESCIPs
6.4 Document Indexing by Sparsity over Learned Dictionary
6.5 Locating Symbols in the Graphical Documents
6.5.1 Symbol Recall
6.5.2 Symbol Refining
6.6 Experimental Validation
6.7 Conclusion

7 Conclusions

Bibliography
List of Figures

1.1 Illustration of the image decomposition problem with sparse representation (extracted from [161])
2.1 Left: minimum l1 solution of Ax = h for M = 2. Right: geometry of l1 minimization for M = 3
2.2 Performance of the greedy matching pursuit algorithms
2.3 Performance of IRLS and BP (using Matlab's linear programming) algorithms
2.4 The quality of the obtained solutions
2.5 Average representation errors obtained at each iteration
3.1 Classification of image denoising methods (extracted from [124])
3.1 (a): original binary symbol; (b) to (g): examples of six levels of Kanungo noise from the GREC2005 dataset
3.2 Learned dictionary obtained by using the K-SVD algorithm on patches of size 8 × 8 extracted from DIBCO images after 50 iterations (used in the experiments)
3.3 Illustration of the point spread function
3.4 Normalized cross-correlation between two images
3.5 Results of denoising the noisy images with the Kanungo model at levels 1 and 2 of degradation. Columns 2 and 3 are the denoised images for each method. Columns 4 and 5 are the binarized denoised images of columns 2 and 3, respectively. For the median and OC methods, images are already binarized in columns 2 and 3
3.6 (a), (c): zoom of images denoised by curvelets and our method, respectively. (b), (d): denoised binary versions of (a) and (c), respectively
3.7 One of the scanned documents
3.8 Learned dictionary obtained by using the K-SVD algorithm on patches of size 8 × 8 pixels extracted from real scanned documents after 50 iterations
3.9 The denoised version of the document in Figure 3.7, obtained by our approach
3.10 (a) noisy documents in the DIBCO dataset used in Table 3.6, (b) the denoised documents obtained by our approach before binarization, and (c) after binarization using Otsu's method
4.1 Examples of graphic-rich documents
4.2 Examples when using MCA with undecimated wavelets and curvelets as two overcomplete dictionaries A_t, A_g to extract the text/graphic parts: (a) original document, (b) the text component, and (c) the graphic component (extracted from [72])
4.3 Discriminative dictionary training samples: (a) text training samples; (b) graphic training samples
4.4 A zoom of the trained dictionaries with (a) √s_k = 8, (b) √s_k = 16; graphic (left) and text (right)
4.5 Example of decomposing an input image into K sets of non-overlapping patches by using K sliding windows: (a) input image, (b) K sets of non-overlapping patches for the first four patches in each set
4.6 The optimal value of k_0 as a function of the patch size for the graphic dictionaries (left) and text dictionaries (right), in terms of average representation error
4.7 Examples using two sequences of overcomplete learned dictionaries to separate the text/graphic parts: (a) original document, (b) final graphic layer, and (c) final text layer
4.8 Behaviour of the sparsity of the text and noise components in the text dictionaries (left) and graphic dictionaries (right)
4.9 Example illustrating how to further filter out noise components; the value of the threshold is 6
4.10 Documents used in the evaluation of Table 4.2: original images (left), text layers (middle), text extraction (right)
5.1 Illustration of how to compute the shape context
5.2 Example of a finite multi-model for W = 10 and θ = π/4 (above). The corresponding Laplacian of Gaussian at two scales, σ = 1 (left) and σ = 5 (right)
5.3 Process to create the DoG images
5.4 Maxima and minima of the DoG images are detected by comparing the pixel (marked in red) with its neighbors in the current and adjacent images (marked in black)
5.5 The relative log-polar coordinates of c_j with regard to p_i
5.6 The 24 atoms (columns) of the learned dictionary built from SCIP descriptors of symbols in the GREC2003 database
5.7 Examples of the symbols in different datasets
5.8 Symbol retrieval on the CVC dataset when the request (first column) is rotated and scaled
5.9 Examples of query symbols achieving the best and worst retrieval results in terms of the AUC-PR value. Values correspond to the cosine distance
5.10 Some retrieval examples in the CVC dataset: the query symbol is in the first column; the other columns are the nearest matches ranked from left to right
5.11 Some retrieval examples in the GREC dataset: the query symbol is in the first column; the other columns are the nearest matches ranked from left to right
6.1 Example of the graphic documents
6.2 One of the scanned documents
6.3 An example of the ESCIP descriptor at interest point p_i^j of a document
6.4 The 36 atoms (columns) of the learned dictionary obtained by using ESCIP descriptors as the training dataset
6.5 Illustration of a solution to the optimization problem over a learned dictionary
6.6 The inverted file structure
6.7 Example of how to locate an interest region in the document (right) corresponding to the request symbol (left)
6.8 (a): request symbol, (b): corresponding interest regions in the documents
6.9 Example of a situation where contour points do not belong to the interest region
List of Tables

1.1 Classification accuracy for the digit and texture data [%]. Only testing samples are randomly transformed for the digits (such transformations are natural in the texture dataset) (extracted from [6])
2.1 Time ×10⁻² (seconds) performance of different methods corresponding to the cardinality of the true solution
3.1 The value of r̄ at six levels of noise
3.2 Summary of the denoising results on GREC 2005. (a), (b) are level 1 and level 2 of noise, respectively
3.3 Average value gained by Jaccard's similarity measure
3.4 Average values gained by MSE
3.5 The obtained results with scanned documents using the SSIM measure
3.6 The obtained results with the DIBCO2009 dataset using the SSIM measure
4.1 The sizes of the training databases
4.2 Performance evaluation (see Figure 4.10) with T_0 set as in Figure 4.6 and T_k = 16; 32 for √s_k = 8; 16
5.1 Average AUC-PR values for the rotation and scale dataset
5.2 Average AUC-PR values for the CVC dataset
5.3 Best values for the datasets
5.4 Retrieval effectiveness with AUC-PR values on different datasets
5.5 Average of the AUC-PR values considering several percentages of training set size
6.1 Spotting results for queries in column 1

Introduction
The exponential development of the storage capacity of computers over the last few decades has allowed digital resources to multiply, and so has facilitated numerous tasks. However, this increase raises a significant challenge: finding the relevant information within a huge amount of data. The need to develop fast and efficient methods is thus compulsory.

In the field of document analysis, the process of retrieving information has been based on the analysis of the structural information contained in the document. In general, the structural information in digital documents is partitioned into a hierarchy of physical components, such as pages, columns, paragraphs, text lines, words, tables, figures, etc.; a hierarchy of logical components, for example titles, authors, affiliations, abstracts, sections, etc.; or both. Depending on each kind of structural information, and the forms used to represent the document layout, there exist different layout analysis algorithms. For example, with physical layout analysis, algorithms can be categorized into three classes: top-down approaches [127, 5], bottom-up approaches [131, 88, 176, 59, 75], and hybrid approaches [138]. With logical layout representations and analyses, some typical algorithms can be mentioned, including model-based systems [181, 89], rule-based systems [58], or frame-based systems [32], etc. More details about document structure analysis algorithms can be found in the surveys of Haralick [68], Nagy [126], Jain et al. [76], and Mao [111].
Our work in this domain focuses on technical documents. In particular, we develop pre-processing techniques to upgrade the quality of the document image and to reduce the noise generated by the input devices; we study a new method for text/graphic segmentation and for describing graphical symbols; and finally a new approach for the spotting problem is also proposed. However, instead of extending the methods mentioned above, we approach these problems from a different direction, one that has been developed widely in the computer vision community for scene images, but not really for technical documents: approaches using sparse representation over a learned dictionary. In general, in the domain of sparse representation, elementary atoms chosen from a family of functions called a dictionary represent the relevant information. As a result, searching for this information means finding these atoms. However, how to obtain an ideal dictionary adapted to all images in a large database, and how to find good sparse representations over this dictionary, are fundamental questions with a successful history. One of the first successes, considered the key that opened the door to a huge jungle, is the discovery of wavelet orthogonal bases and local time-frequency dictionaries. The successes in searching for good sparse representations over redundant dictionaries have helped to improve the performance not only of image denoising, but also of source/image separation.
Image Denoising

A successful denoising method using sparsity was first introduced by Elad et al. [43, 42]. In that approach, the authors used the assumption that a patch of a clean image can be approximated by a sparse linear combination of elements from a dictionary A. Other patch-based approaches can be found in the works of Buades et al. [16] and Roth et al. [144]. In general, denoising a patch h ∈ R^L with a dictionary A ∈ R^{L×M} corresponds to solving the sparse decomposition problem [169, 20], where h is a patch of the noisy image and A is either a pre-defined dictionary such as wavelets, ridgelets, curvelets, etc., or a learned dictionary adapted to the patches of the images. Following [43, 41, 107, 106, 105], the energy of the noise ε can be chosen according to the (supposedly known) standard deviation σ of the noise. The value of ε is proportional to both the noise variance and the image size [41, 161, 109]. Mairal et al. [107, 106] used the cumulative distribution function F_m of the χ²_m distribution and chose ε = σ² F_m^{-1}(τ), where F_m^{-1}(τ) is the inverse of F_m.
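As a rough illustration of this patch-level sparse decomposition, the sketch below denoises a toy "patch" with a hand-rolled orthogonal matching pursuit on a random dictionary. It is an invented minimal example, not the method of this thesis (which relies on basis pursuit denoising with a K-SVD-learned dictionary); all names, sizes, and thresholds are assumptions for the demonstration. The tolerance eps plays the role of the noise energy ε above.

```python
import numpy as np

def omp(A, h, eps):
    """Greedy orthogonal matching pursuit: find a sparse x with ||h - Ax||_2 <= eps."""
    L, M = A.shape
    residual = h.copy()
    support = []
    x = np.zeros(M)
    while np.linalg.norm(residual) > eps and len(support) < L:
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j in support:
            break
        support.append(j)
        # least-squares fit of h on the selected atoms
        coef, *_ = np.linalg.lstsq(A[:, support], h, rcond=None)
        x[:] = 0.0
        x[support] = coef
        residual = h - A @ x
    return x

# Toy setup: a random overcomplete dictionary with unit-norm atoms
rng = np.random.default_rng(0)
A = rng.standard_normal((16, 64))
A /= np.linalg.norm(A, axis=0)

# A "clean" patch that is exactly 3-sparse in A, plus additive noise
x_true = np.zeros(64)
x_true[[3, 17, 42]] = [1.0, -0.7, 0.5]
h = A @ x_true + 0.01 * rng.standard_normal(16)

x_hat = omp(A, h, eps=0.05)   # eps stands in for the noise energy
denoised = A @ x_hat          # reconstructed (denoised) patch
```

The point of the sketch is the stopping rule: the pursuit adds atoms only until the residual drops below the noise energy, which is exactly why a good estimate of ε matters when the noise model is unknown.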
However, in the specific applications of noise reduction on images, we usually do not know precisely the model of the noise found in the images; in other words, the noise variance is unknown. In addition, the noise in documents is different from the noise in natural images, which are generated by devices like digital cameras or similar. Thus, we cannot use the noise variance to decide the value of ε. In Chapter 3, we propose an energy noise model which allows us to more easily set the threshold required for noise removal even if the noise model is unknown.

Image Separation
The image separation problem is an extension of the source separation problem, which is fundamental in the processing of acoustic signals. In general, suppose that the observed signal h is a superposition of two different sub-signals h_1 and h_2, that is, h = h_1 + h_2, where h_1 is sparsely generated using the model with dictionary A_1 and h_2 is sparsely generated using another model with dictionary A_2. If the optimal solutions (x̄_1, x̄_2) are obtained by solving Equation (1.1),

    (x̄_1, x̄_2) = argmin_{x_1, x_2} ||x_1||_0 + ||x_2||_0  subject to  ||h − A_1 x_1 − A_2 x_2||_2 ≤ ε_1 + ε_2    (1.1)

then the solutions to the separation problem are calculated as h̄_1 = A_1 x̄_1 and h̄_2 = A_2 x̄_2. This is the basic idea of the morphological component analysis (MCA) algorithm. The success of the MCA separation algorithm rests on the role of the dictionaries A_i in discriminating between content types, preferring the component h_i over the other parts. Figure (1.1) presents one example of how sparse representation is used to solve the image decomposition problem.

Clearly, in the MCA algorithm, one of the important questions is what the proper dictionaries for these kinds of contents are. To identify such dictionaries, we need to know the contents of the images. In fact, in [41] the separation of the content of images has been done based on the assumption that images are linear combinations of cartoon and texture parts, and the pre-defined dictionaries are chosen from the experience of the authors. Following the works in [160, 44, 14, 159], candidate pre-defined dictionaries for the texture part can be the (local) discrete cosine transform or the Gabor transform, while the candidates for the cartoon part include the bi-orthogonal wavelet transform, the isotropic à trous algorithm, the local ridgelet transform, or the curvelet transform. Because the choice of one transform over another is usually made by experience, it is not adapted to various kinds of images. Alternatively, the dictionaries can be learned by applying the K-SVD algorithm on the corrupted patches of the image itself. The separation is then very successful, and another remarkable point is that these results are achieved without the need to pre-specify the dictionaries for the two parts. Learned dictionaries combined with sparsity are also found in the work of Pan et al. [135] and Zhao et al. [192].

However, the performance of the above methods depends strongly on the size of the patch. In fact, if the size of the patch is too large, its sparse representation vector is large, which means the computing cost will increase. If this size is too small, the patch may not contain enough information for discrimination. Therefore, to overcome this shortcoming, we propose in Chapter 4 a method using multi-resolution learned dictionaries for separating text parts from graphical ones. To the best of our knowledge, this is the first time multi-resolution learned dictionaries have been used for the separation task. In general, the proposed method is a patch-based approach using the assumption that the representations of text candidate patches in text learned dictionaries are sparse, but they are not sparse enough in graphic learned dictionaries.
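The two-dictionary decomposition at the heart of MCA can be illustrated with a toy 1-D example. The sketch below is not the multi-dictionary method of this thesis: it separates a signal into a spike part and a smooth part using two fixed orthonormal dictionaries (identity and DCT) and a crude alternating hard-thresholding scheme; all sizes and thresholds are invented for the illustration.

```python
import numpy as np

def dct_dictionary(L):
    """Orthonormal DCT-II basis as columns (smooth, 'texture-like' atoms)."""
    n = np.arange(L)
    D = np.cos(np.pi * (n[:, None] + 0.5) * n[None, :] / L)
    return D / np.linalg.norm(D, axis=0)

def mca_separate(h, A1, A2, n_iter=50, thresh=1.0):
    """Crude MCA: alternately sparse-code the residual in each dictionary by
    hard thresholding of the analysis coefficients (both bases orthonormal)."""
    h1 = np.zeros_like(h)
    h2 = np.zeros_like(h)
    for _ in range(n_iter):
        # update component 1: code (h - h2) in A1, keep only large coefficients
        c1 = A1.T @ (h - h2)
        c1[np.abs(c1) < thresh] = 0.0
        h1 = A1 @ c1
        # update component 2 symmetrically
        c2 = A2.T @ (h - h1)
        c2[np.abs(c2) < thresh] = 0.0
        h2 = A2 @ c2
    return h1, h2

L = 64
A1 = np.eye(L)            # spikes are sparse in the identity basis
A2 = dct_dictionary(L)    # smooth oscillations are sparse in the DCT basis

spikes = np.zeros(L)
spikes[[10, 40]] = [4.0, -3.0]
smooth = A2[:, 2] * 5.0   # one low-frequency DCT atom
h = spikes + smooth

h1, h2 = mca_separate(h, A1, A2, thresh=1.0)
```

Each dictionary gives a very sparse code to "its" component and a dense, small-coefficient code to the other, which is exactly the discriminating role of the A_i described above; the text/graphic separation of Chapter 4 replaces these two fixed bases with several learned, multi-resolution dictionaries.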
Classification

There are some previous works that use sparsity over a learned dictionary in classification tasks to improve the classification performance. To the best of our knowledge, the first approach is the work of Raina et al. with the self-taught learning (STL) method [141]. In [141], the dictionary is learned from an unlabeled dataset; then the sparse coding coefficients obtained when coding the elements of the labeled dataset serve as features, which are fed into an SVM classifier (see Algorithm (1)). The STL algorithm is very efficient in the case where the training and testing sets are aligned.
Algorithm 1 STL algorithm
1: (A, X) ← TrainDictionary(H)
2: Learn a classifier C by a linear SVM with input X
3: X_test ← Lars(H_test, A)    % Lars is a function used to find a sparse solution (see Chapter 2)
4: Classify the set X_test by C
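The pipeline of Algorithm 1 can be sketched end to end. The snippet below is a loose illustration with invented stand-ins, not an actual STL implementation: normalized random unlabeled samples play the role of the trained dictionary, a top-k thresholding coder replaces Lars, and a nearest-class-mean rule replaces the linear SVM.

```python
import numpy as np

rng = np.random.default_rng(1)

def sparse_code(A, H, k=3):
    """Keep the k largest analysis coefficients per sample (a crude stand-in
    for Lars/OMP; see Chapter 2 for the real pursuit algorithms)."""
    C = A.T @ H                           # (M, n_samples) coefficients
    for c in C.T:                         # each row of C.T is one sample's code
        small = np.argsort(np.abs(c))[:-k]
        c[small] = 0.0                    # in-place: zero all but the top k
    return C

# Step 1: "learn" a dictionary from unlabeled data (normalized random samples
# stand in for a properly trained K-SVD dictionary)
H_unlabeled = rng.standard_normal((20, 40))
A = H_unlabeled / np.linalg.norm(H_unlabeled, axis=0)

# Labeled data: two toy classes with well-separated means
H_train = np.hstack([rng.standard_normal((20, 30)) + 2.0,
                     rng.standard_normal((20, 30)) - 2.0])
y_train = np.array([0] * 30 + [1] * 30)

# Step 2: sparse codes of the labeled data become classifier features
# (a nearest-class-mean rule stands in for the linear SVM of Algorithm 1)
X = sparse_code(A, H_train)
means = np.stack([X[:, y_train == c].mean(axis=1) for c in (0, 1)])

# Steps 3-4: code the test set in the same dictionary, then classify
H_test = np.hstack([rng.standard_normal((20, 5)) + 2.0,
                    rng.standard_normal((20, 5)) - 2.0])
X_test = sparse_code(A, H_test)
y_pred = np.argmin(
    np.linalg.norm(X_test.T[:, None, :] - means[None, :, :], axis=2), axis=1)
```

The essential STL idea survives the simplifications: the dictionary is fixed from unlabeled data, and both the labeled training set and the test set are coded in that same dictionary before classification.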
    Dataset     θ [°]   Scale   STL    HIA
1   10 digits   ±50     1       59.1   86.0
2   10 digits   ±50     ±0.2    55.6   83.6
3   Textures    −       −       76.4   94.7
4   Textures    −       −       75.6   91.2

Table 1.1: Classification accuracy for the digit and texture data [%]. Only testing samples are randomly transformed for the digits (such transformations are natural in the texture dataset) (extracted from [6]).
A hierarchical approach is presented in [6]. This approach integrates the ideas of sparse representation theory and hierarchical structures. In fact, the authors used a log-polar mapping to convert rotated and scaled patterns into shifted patterns in the new space, where they operate for learning the dictionary and adding hierarchy. This approach extends the self-taught learning (STL) method [141] to a hierarchical architecture of dictionary learning. The authors showed that this hierarchical approach performs better than the layered one, and that using a learned dictionary instead of a pre-defined one improves the results for classifying handwritten digits and texture images (see Table (1.1)).
There are other state-of-the-art algorithms, based on discriminative dictionary learning models, that give good performance in classification tasks [114, 115, 112]. The authors in [114, 115, 112] proposed a formulation for learning a dictionary tuned for a classification task, which is called supervised dictionary learning. It is a discriminative approach that effectively exploits the corresponding sparse signal decompositions in image classification tasks. This work afforded an effective method for learning a shared dictionary and multiple (linear or bilinear) decision functions. The experiments done on the MNIST [91] and USPS [73] handwritten digit datasets show promising results for this approach. In fact, this approach achieved state-of-the-art results on MNIST with a 0.54 percent error rate, which was similar to the 0.60 percent error rate of [142], and a 2.84 percent error rate on USPS, which was slightly behind the 2.4 percent error rate of [65].

The authors in [182] presented a method that computes a spatial-pyramid image representation based on the sparse representation of local features for image classification. This method uses a selective sparse representation instead of traditional vector quantization to extract the salient properties of local descriptors. In addition, sparse representation is used to operate local max pooling on multiple spatial scales, to incorporate translation and scale invariance. An encouraging aspect of this paper is the use of local features, instead of an image or patches of an image, in the sparsity framework. This method has been validated and works well when combined with simple linear SVMs, improving the scalability of training, the speed of testing, and the classification accuracy.
The methods mentioned above usually work on images, on patches extracted from images, or are even developed to adapt to local features of images as in [182]. However, to the best of our knowledge, these methods offer no characterization of the invariance of sparse representations: if the image/patch is changed under some transformations such as scale, rotation, degradation, etc., then its corresponding sparse representation is not similar (or almost similar) to the sparse representation of the original image in the same dictionary. In addition, in the context of symbol recognition, the description of the symbol must satisfy invariance criteria under affine transformations. Thus, in Chapter (5), our working hypothesis is that 'if a learned dictionary is optimized by taking into account the data properties derived from a descriptor, then it is not only specifically adapted to the descriptor but also provides the optimal approximation'. Experiments show that this approach not only keeps the invariance criteria under affine transformations but also improves the performance of the symbol retrieval system.
Contributions and Organization of the thesis
The contributions presented in this thesis follow the lines of how good sparse representations over redundant dictionaries can help to increase the performance of noise reduction, text/graphics separation, pattern recognition, and localization of elements in graphical documents such as architectural or electrical plans. Before presenting our contributions in the fields of symbol recognition and document processing, and the good performances achieved by our algorithms compared to the state-of-the-art, we review the motivation for constructing a learned dictionary instead of using a pre-defined dictionary, together with efficient algorithmic tools for building a learned dictionary. The methods used to solve the sparsity problem over a learned dictionary are also presented.
Our work in this thesis is organized as follows:
•
The first part includes one chapter, Chapter (2), which describes the background of sparse representation and related work on sparsity. In this chapter, the definitions of sparse representation and learned dictionary are presented. Detailed reviews of state-of-the-art algorithms for finding the sparse representation and for building the dictionary are also given and discussed carefully. Numerical experiments are performed with the purpose of evaluating the complexity and finding out the advantages as well as disadvantages of these algorithms for each particular problem.
•
The second part is a low-level processing stage for document filtering and text/graphics separation:
- We propose in Chapter (3) an algorithm for de-noising document images using sparse representations. From a training set, this algorithm learns not only the main document characteristics, but also the noise included in the documents. In this perspective, we propose to model the noise energy based on the normalized cross-correlation between pairs of noisy and non-noisy documents.
- A new approach to extract text regions from graphical documents is presented in Chapter (4). In the proposed method, we first empirically construct two sequences of learned dictionaries for the text and graphical parts respectively. Then, we compute the sparse representations of all different-sized and non-overlapping document patches in these learned dictionaries. Based on these representations, each patch can be classified into the text or graphics category by comparing its reconstruction errors. Same-sized patches in one category are then merged together to define the corresponding text or graphics layers, which are combined to create a final text/graphics layer. Finally, in a post-processing step, text regions are further filtered out using some learned thresholds.
•
The third part is our contribution to symbol recognition and symbol spotting:
- In Chapter (5), we propose a new approach for symbol description based on the combination of the shape context of interest points (SCIP) descriptor with sparse representation over a learned dictionary. More specifically, a dictionary is learned from a training set of SCIP descriptors, and each column of this dictionary is considered as a visual word. Next, a vector model for the symbol is constructed based on the sparse representations of its SCIPs in the learned dictionary and the tf-idf approach adapted to sparsity.
- We propose an approach to deal with the problem of symbol spotting in graphical documents using sparse representations in Chapter (6). In the proposed method, a dictionary is learned from a training database of local descriptors defined over the documents. Following their sparse representations, interest points sharing similar properties are used to define interest regions. Using an original adaptation of information retrieval techniques, a vector model for interest regions and for a query symbol is built based on its sparsity in a visual vocabulary where the visual words are columns of the learned dictionary. The matching process is then performed by comparing the similarity between vector models.
Publications
The results presented in this thesis are reported in the following publications:
1. T-H Do, S. Tabbone and O. Ramos-Terrades, Noise suppression over bi-level graphical documents using a sparse representation, CIFED 2012, Bordeaux, France.
2. T-H Do, S. Tabbone and O. Ramos-Terrades, Text/graphic separation using a sparse representation with multi-learned dictionaries, ICPR 2012, Tsukuba, Japan.
3. T-H Do, S. Tabbone and O. Ramos-Terrades, New Approach for Symbol Recognition Combining Shape Context of Interest Points with Sparse Representation, ICDAR 2013, Washington DC, USA.
4. T-H Do, S. Tabbone and O. Ramos-Terrades, Document Noise Removal using Sparse Representations over Learned Dictionary, DocEng 2013, Florence, Italy.
5. T-H Do, S. Tabbone and O. Ramos-Terrades, Spotting Symbol using Sparsity over Learned Dictionary of Local Descriptors, the 11th IAPR International Workshop on Document Analysis Systems, 2014, Tours, France.
6. T-H Do, S. Tabbone and O. Ramos-Terrades, Sparse Representation over Learned
Sparsity and Learning Dictionary
Contents
2.1 Sparse Representation
2.2 Pursuit Algorithms
2.2.1 Greedy Matching Pursuits
2.2.2 Basis Pursuit
2.2.3 l_1 Lagrangian Pursuit
2.3 Learning Dictionaries
2.3.1 Core Idea for Learning Dictionary
2.3.2 The K-SVD Algorithm
2.3.3 The MOD Algorithm
2.3.4 The Online Learning Dictionary Algorithm
2.3.5 The RLS-DLA Algorithm
2.3.6 Numerical Demonstration of Learning Algorithms
2.4 Conclusion

This chapter describes the background of sparse representation and related work on sparsity. In particular, the definitions of sparse representation and learned dictionary are presented. Detailed reviews of state-of-the-art algorithms for finding the sparse representation and for building the dictionary are also given and discussed carefully. Numerical experiments are performed with the purpose of evaluating the complexity, finding out the advantages as well as disadvantages of these algorithms for each particular problem, and guiding the choices made in our work.
2.1 Sparse Representation
A signal h ∈ R^L is strictly or exactly sparse if most of its entries are equal to zero, i.e., the cardinality of the support of the signal, #{1 ≤ i ≤ L | h_i ≠ 0}, is much less than L. A k-sparse signal is a signal that has exactly k nonzero entries. We can represent a signal as a linear combination of k columns (or atoms) of a given over-complete dictionary A, such as

h = Ax = Σ_{i=1}^{k} a_i x_i
(2.1)

Mathematically, if A = {a_1, a_2, ..., a_M} ∈ R^{L×M}, with M ≫ L, is a full-rank matrix, then the underdetermined linear system of equations (Equation (2.1)) will have infinitely many different sets of values for the x's (solutions) that satisfy it simultaneously. For example, consider the following system:

x_1 + 3x_2 − 2x_3 = 5
3x_1 + 5x_2 + 6x_3 = 7.
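Before examining the solution set of this small system, Equation (2.1) itself can be illustrated with a short synthetic example (a hypothetical random dictionary; the sizes L, M and the sparsity k are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
L, M, k = 8, 20, 3
A = rng.normal(size=(L, M))
A /= np.linalg.norm(A, axis=0)          # unit-norm atoms, M >> L

x = np.zeros(M)
support = rng.choice(M, size=k, replace=False)
x[support] = rng.normal(size=k)         # k-sparse coefficient vector

h = A @ x                               # h = sum_i a_i x_i, Eq. (2.1)
print(np.count_nonzero(x))              # 3
```

The synthesized h lives in R^L while its representation x lives in R^M, with only k of its M entries active.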
The solution set x = {x_1, x_2, x_3} of this system can be described by the equations x_1 = −7x_3 − 1 and x_2 = 3x_3 + 2, with x_3 the free variable while x_1 and x_2 depend on x_3. Different choices of the free variable may lead to different descriptions of the same solution set: if x_1 is the free variable, and x_2 and x_3 are dependent, then we have x_2 = −(3/7)x_1 + 11/7 and x_3 = −(1/7)x_1 − 1/7. Thus, each free variable gives a description of the solution set.

In the above example, the set of such x can be described using mathematical language. However, from the application point of view, one of the main tasks in dealing with the above system of equations is to find the proper x that describes h better than the others. In fact, this task amounts to computing and optimizing a signal approximation by choosing the best subset of dictionary columns. As a result, to obtain a well-defined solution, a function f(x) is added to assess the desirability of a would-be solution x, with smaller values being preferred:

(P_f) : min_x f(x) subject to Ax = h
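The two points above, that every choice of the free variable solves the system and that a penalty f selects one particular solution, can be checked numerically. Here the minimum l_2-norm penalty, computed via the pseudo-inverse, serves as an illustrative stand-in for f:

```python
import numpy as np

A = np.array([[1.0, 3.0, -2.0],
              [3.0, 5.0,  6.0]])
h = np.array([5.0, 7.0])

# Every value of the free variable x3 yields a valid solution.
for x3 in (-1.0, 0.0, 2.5):
    x = np.array([-7 * x3 - 1, 3 * x3 + 2, x3])
    assert np.allclose(A @ x, h)

# A penalty f selects one solution; with f(x) = ||x||_2 the minimizer
# is given by the pseudo-inverse.
x_min = np.linalg.pinv(A) @ h
assert np.allclose(A @ x_min, h)
print(np.round(x_min, 3))
```

The pseudo-inverse solution also satisfies the parameterization x_1 = −7x_3 − 1, x_2 = 3x_3 + 2, since it lies in the same solution set.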
If f(x) is the l_0 pseudo-norm ||x||_0 (the number of nonzero elements in the vector x), then the problem (P_f) becomes finding the sparse representation x of h satisfying:

(P_0) : min_x ||x||_0 subject to Ax = h    (2.2)

In general, solving Equation (2.2) is difficult (an NP-hard problem) and, therefore, computationally intractable. It is necessary to find sufficiently fast algorithms computing sub-optimal, but 'good enough', solutions. One choice is to use greedy algorithms, such as Matching Pursuit (MP) [110], Orthogonal-MP (OMP) [137], Weak-MP [167], and the Thresholding algorithm. In general, a greedy algorithm solves the problem by the heuristic of making locally optimal single-term updates with the hope of finding a global optimum. In this case, the set of active columns, starting from the empty set, is maintained and expanded by one additional column of A at each iteration. The chosen column is the one that maximally reduces the residual l_2 error in approximating h from the currently active columns. The residual l_2 error is evaluated after constructing an approximant including the new column; if it falls below a specified threshold, the algorithm terminates.

The other choice is to relax the
l_0-norm by replacing it with l_p-norms for some p ∈ (0, 1], or by smooth functions such as Σ_i log(1 + αx_i^2), Σ_i x_i^2/(α + x_i^2), or Σ_i (1 − exp(−αx_i^2)). A well-known algorithm of this family is the FOCal Underdetermined System Solver (FOCUSS) by Gorodnitsky and Rao [64]. In this algorithm, the l_p-norm (for some fixed p ∈ (0, 1]) is represented as a weighted l_2-norm by using the Iterative-Reweighed-Least-Squares (IRLS) method.

Another popular strategy is to replace the
l_0-norm by the l_1-norm, as proposed by Donoho et al. [35]:

(P_1) : min_x ||W^{-1} x||_1 subject to Ax = h    (2.3)

The matrix W is a diagonal positive-definite matrix. A natural choice for the entries of W is w(i, i) = 1/||a_i||_2. Let x̃ = W^{-1} x; then Equation (2.3) is re-formulated as
(P_1) : min_{x̃} ||x̃||_1 subject to Ãx̃ = h    (2.4)

Figure 2.1: Left: minimum l_1-norm solution of Ax = h for M = 2. Right: geometry of l_1 minimization for M = 3.

in which Ã = AW is the normalized version of A. Equation (2.4) is the classic basis pursuit format, and the solution x can be found by de-normalizing x̃. Thus, (P_1) is usually used with a normalized matrix. Because the purpose is to find a sparse representation, when relaxing the
l_0-norm to the l_1-norm, basis pursuit has to ensure that it still recovers sparse solutions. This can be explained using a geometric interpretation of basis pursuit as follows. Let P be the affine subspace of R^M of coefficient vectors that recover h ∈ R^L:

P = {x ∈ R^M : Ax = h}

Basis pursuit finds in P an element x̄ of minimum l_1-norm. It can be found by inflating the l_1-norm ball B_τ, increasing τ until B_τ intersects P, where

B_τ = {x ∈ R^M : ||x||_1 ≤ τ} ⊂ R^M

This geometric configuration is depicted for M = 2 and M = 3 in Figure (2.1). Because the l_1 ball first touches the affine subspace at one of its vertices or edges, the optimal solution x̄ is likely to have zero or close-to-zero coefficients when it is computed by minimizing an l_1-norm.

The solution of the (P_1) problem can be found by several existing numerical algorithms, such as Basis Pursuit by Linear Programming [20] and IRLS (Iterative Reweighed Least Squares) (for p = 1) [28]. Following [109], basis pursuit is computationally more intense than matching pursuit, but it is a more global optimization that yields representations that can be more sparse.

If there exist appropriate conditions on
A and x, such as

||x||_0 ≤ (1/2) (1 + 1/μ(A)),  where μ(A) = max_{i≠j} |a_i^T a_j| / (||a_i||_2 ||a_j||_2)

is the mutual coherence of A, then Basis Pursuit and OMP give the unique solution of (2.4), and it is also the unique solution of (P_0).
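The coherence bound can be evaluated for a random unit-norm dictionary (the sizes below are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 100))
A /= np.linalg.norm(A, axis=0)      # unit-norm atoms

G = np.abs(A.T @ A)                 # |a_i^T a_j| for unit-norm atoms
np.fill_diagonal(G, 0.0)
mu = G.max()                        # mutual coherence of A
bound = 0.5 * (1.0 + 1.0 / mu)      # sparsity level guaranteeing uniqueness
print(mu, bound)
```

The less correlated the atoms (smaller μ(A)), the larger the sparsity level for which uniqueness is guaranteed.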
Sometimes, the exact constraint h = Ax is replaced by a relaxed one using the quadratic penalty function Q(x) = ||Ax − h||_2^2 ≤ ε, with ε ≥ 0 the error tolerance. Thus, an error-tolerant version of (P_0) is defined by:

(P_0^ε) : min_x ||x||_0 subject to ||Ax − h||_2 ≤ ε

In (P_0^ε), the l_2-norm used for evaluating the error Ax − h can be replaced by other options, such as l_1 or l_∞. One of the advantages of this change can be seen through noise removal. Assume that the signal
h contains noise e with finite energy ||e||_2^2 ≤ ε^2, i.e., h = Ax + e. Solving (P_0^ε) helps us to find the solution x̄, from which the unknown denoised signal h̄ can be computed as h̄ = A x̄.
Similarly, when relaxing the l_0-norm to an l_1-norm, we get (P_1^ε), known in the literature as basis pursuit denoising (BPDN):

(P_1^ε) : min_x ||x||_1 subject to ||Ax − h||_2 ≤ ε
(2.5)

The solution to (P_1^ε) is precisely the solution to the following unconstrained optimization problem:

(Q_1^λ) : min_x λ||x||_1 + (1/2)||Ax − h||_2^2    (2.6)

where the Lagrange multiplier λ is a function of A, h and ε. In the statistical machine learning community, the problem (Q_1^λ) is used for regression over a comprehensive set of features, the columns of A. Its goal is to find a simple linear combination of a few features that could explain the output vector h of a complex system. Thus, solving (Q_1^λ) not only provides a way to obtain such a regression, but it also selects a small subset of features. (Q_1^λ) is well known under the name LASSO (Least Absolute Shrinkage and Selection Operator), conceived by Friedman, Hastie and Tibshirani [170]. The LASSO team and Efron also proposed an effective algorithm, called LARS (Least Angle Regression Stagewise) [40], that guarantees the solution path of (Q_1^λ)
is the global optimizer. The generalized version of (Q_1^λ) has the form in Equation (2.7), where ρ(.) is any 'sparsity-promoting' function; for example, when ρ(x) = |x|, 1^T ρ(x) = ||x||_1 and we recover Equation (2.6):

min_x λ 1^T ρ(x) + (1/2) ||Ax − h||_2^2    (2.7)

Minimization of Equation (2.7) can be treated using various classic iterative optimization algorithms, such as Steepest-Descent, Conjugate-Gradient, or interior-point algorithms. However, for high-dimensional problems, these methods perform very poorly [41]. Thus, a new family of numerical algorithms has been developed, called Iterative-Shrinkage algorithms. Some algorithms in this family include the Stage-wise Orthogonal-Matching-Pursuit (StOMP) algorithm [26], EM and Bound-Optimization approaches [57, 55], the IRLS-based shrinkage algorithm, and the Parallel-Coordinate-Descent (PCD) algorithm [41].
2.2 Pursuit Algorithms
In this section, we describe in more depth the algorithms mentioned in Section (2.1). Following the problems we want to deal with, (P_0) and (P_0^ε); (P_1) and (P_1^ε); or (Q_1^λ), we divide the algorithms into three classes: the greedy matching pursuits, the basis pursuit, and the l_1 lagrangian pursuit, respectively.
2.2.1 Greedy Matching Pursuits
The matching pursuit algorithm, introduced by Mallat and Zhang [110], computes signal approximations from the over-complete dictionary. In this method, the columns (or atoms) of the dictionary are selected iteratively, one by one.

Let A = {a_1, a_2, ..., a_M} ∈ R^{L×M} be an over-complete dictionary with unit-norm columns. The matching pursuit begins by projecting the signal h = {h_1, h_2, ..., h_L} on a column a_i ∈ A and computing the residue r_1, with r_0 = h:

h = (Σ_{j=1}^{L} h_j a_i[j]) a_i + r_1 = ⟨h, a_i⟩ a_i + r_1

By construction of this orthogonal projection, r_1 is orthogonal to a_i, which means

||h||^2 = |⟨h, a_i⟩|^2 + ||r_1||^2

To minimize the residue ||r_1||^2, we need to choose the column a_i such that |⟨h, a_i⟩| is maximal. This procedure is iterated by sub-decomposing the residue r_1. For instance, assuming the m-th order residual r_m is already calculated for m > 0, then in the next iteration we need to choose the column a_{i_m} that maximizes |⟨r_m, a_{i_m}⟩|.
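The first projection step and the energy identity above can be verified directly (a hypothetical random dictionary and signal):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(10, 25))
A /= np.linalg.norm(A, axis=0)      # unit-norm atoms
h = rng.normal(size=10)

i = int(np.argmax(np.abs(A.T @ h)))  # atom maximizing |<h, a_i>|
a = A[:, i]
r1 = h - (h @ a) * a                 # first residue r_1

assert np.isclose(r1 @ a, 0.0)       # r_1 is orthogonal to a_i
# ||h||^2 = |<h, a_i>|^2 + ||r_1||^2
assert np.isclose(h @ h, (h @ a) ** 2 + r1 @ r1)
print("energy identity verified")
```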
Weak-Matching-Pursuit is matching pursuit, except that rather than searching for the largest inner-product value |⟨r_m, a_{i_m}⟩|, the selected column a_{i_m} is a column that satisfies Equation (2.9), where α ∈ (0, 1]:

|⟨r_m, a_{i_m}⟩| > α sup_{j=1,...,M} |⟨r_m, a_j⟩|
(2.9)

Orthogonal Matching Pursuit improves the matching pursuit approximations by orthogonalizing the directions of projection. The selected column a_{i_m} now has to be orthogonal to the previously selected columns {a_{i_k}}_{k=1,...,m}. This can be done by projecting the residues on an orthogonal family {u_k}_{1≤k<m} computed from {a_{i_k}}_{1≤k<m} using the Gram-Schmidt algorithm. Let u_0 = a_{i_0}; then the Gram-Schmidt algorithm computes u_k by Equation (2.10):

u_k = a_{i_k} − Σ_{l=0}^{k−1} (⟨u_l, a_{i_k}⟩ / ||u_l||^2) u_l    (2.10)

and the residue r_k is projected on u_k instead of a_{i_k}:

r_k = (⟨r_k, u_k⟩ / ||u_k||^2) u_k + r_{k+1}    (2.11)

Summing Equation (2.11) for 0 ≤ k < m, with r_0 = h, yields:

h = Σ_{k=0}^{m−1} (⟨r_k, u_k⟩ / ||u_k||^2) u_k + r_m

Let P_{V_m} be the orthogonal projector on the space V_m generated by {u_k}_{0≤k<m}; then for any m ≥ 0 the residual r_m is the component of h that is orthogonal to V_m:

r_m = h − Σ_{k=0}^{m−1} (⟨r_k, u_k⟩ / ||u_k||^2) u_k    (2.12)

The orthogonal matching pursuit selects a_{i_m} maximizing |⟨r_m, a_{i_m}⟩|, with r_m calculated using Equation (2.12).
Figure 2.2: Performance of the greedy matching pursuit algorithms (OMP, MP, and Weak-MP with α = 0.5): average l_2-error and average support distance (S̄, S) as a function of the cardinality of the true solution.
Numerical Demonstration of Greedy Matching Pursuits We conclude the discussion on the greedy matching pursuit algorithms by comparing their behaviors on a simple case. To do that, a random dictionary of size 50 × 100, with entries drawn from the normal distribution, and sparse vectors x with independent and identically distributed (iid) random supports of cardinalities in the range [1; 15] are created. Using the l_2-norm to normalize the columns of the random dictionary, we get the normalized dictionary A ∈ R^{50×100}, and a vector h = Ax is computed once A and x are generated. Now, we consider A and h as the input for the MP, OMP or Weak-MP algorithm and seek the solution x̄ that is closest to the original x.

To assess the performance of the greedy matching pursuit algorithms, two measures, the l_2-error and the support recovery, are used. The l_2-error is computed as the ratio ||x̄ − x||_2 / ||x||_2, which indicates the l_2-proximity between the approximate solution x̄ and the ideal solution x. The distance between the supports of the two solutions is calculated as in Equation (2.13), where S̄ and S are the supports of x̄ and x, respectively (we recall that the support of a sparse solution is the set of indices of its non-zero entries):

distance(S̄, S) = (max{|S̄|, |S|} − |S̄ ∩ S|) / max{|S̄|, |S|}    (2.13)

If the two supports are the same, the distance is zero, and if the distance is close to 1, the two supports are almost entirely different.
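The two quality measures can be written directly from their definitions:

```python
import numpy as np

def l2_error(x_hat, x):
    """Relative l_2-error ||x_hat - x||_2 / ||x||_2."""
    return np.linalg.norm(x_hat - x) / np.linalg.norm(x)

def support_distance(S_hat, S):
    """Eq. (2.13): 0 for identical supports, close to 1 for disjoint ones."""
    S_hat, S = set(S_hat), set(S)
    m = max(len(S_hat), len(S))
    return (m - len(S_hat & S)) / m

assert support_distance([1, 2, 3], [1, 2, 3]) == 0.0
print(support_distance([1, 2, 3], [3, 4]))   # 2/3: one shared index
```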
We ran this experiment 200 times and calculated the average values. Figure (2.2) presents the results. Overall, all algorithms perform well for low cardinalities, and the best performing method is OMP. The difference between MP and Weak-MP is quite slight.
2.2.2 Basis Pursuit
Basis Pursuit As we know from the previous section, a matching pursuit performs a local optimization, whereas the basis pursuit performs a more global one. To do that, the convex optimization problem of Equation (2.4) is rewritten as a linear programming problem. Before showing how to recast the convex optimization as linear programming, we first recall the formulation of [63].
Figure 2.3: Performance of the IRLS and BP (using Matlab's Linear-Programming) algorithms, compared with OMP: average l_2-error and average support distance (S̄, S) as a function of the cardinality of the true solution.
Following [63], a linear programming problem is a constrained optimization over positive vectors d = {d_1, d_2, ..., d_L} ∈ (R_+)^L. Let b = {b_1, b_2, ..., b_K} ∈ R^K with K < L, let c = {c_1, ..., c_L} ∈ R^L be a non-zero vector, and let Φ ∈ R^{K×L} be a matrix. Linear programming finds d ∈ (R_+)^L such that d is the solution of the minimization problem

min_{d ∈ (R_+)^L} Σ_i d_i c_i subject to Φd = b
Now, coming back to the basis pursuit optimization and, without loss of generality, letting A be a normalized matrix, we have

min_x ||x||_1 subject to Ax = h    (2.14)

If x ∈ R^M is decomposed into 2 non-negative slack variables t, v ∈ (R_+)^M such that x = t − v, and we define Φ = (A, −A) ∈ R^{L×2M}, c = 1, d = (t, v) ∈ R^{2M}, and b = h, then Equation (2.14) is rewritten as a linear program, since

Σ_i d_i c_i = ||x||_1 and Φd = At − Av = h
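The reformulation can be checked numerically on a small instance (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
L, M = 6, 12
A = rng.normal(size=(L, M))
x = np.zeros(M)
x[[1, 7]] = [2.0, -3.0]
h = A @ x

# Split x into non-negative slack variables t, v with x = t - v.
t, v = np.maximum(x, 0), np.maximum(-x, 0)
Phi = np.hstack([A, -A])               # Phi = (A, -A) in R^{L x 2M}
d = np.concatenate([t, v])             # d = (t, v) in (R_+)^{2M}
c = np.ones(2 * M)

assert np.isclose(c @ d, np.linalg.norm(x, 1))   # sum_i d_i c_i = ||x||_1
assert np.allclose(Phi @ d, h)                    # Phi d = At - Av = h
print("LP reformulation consistent")
```

At the LP optimum, t and v are the positive and negative parts of x (t_i v_i = 0), so the linear objective equals the l_1-norm of x.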
Once the basis pursuit is reformulated as a linear program, it can be solved using modern interior-point methods or simplex methods, which differ from the greedy algorithms in that they obtain the global solution of a well-defined optimization problem.
Numerical Demonstration of Basis Pursuit To evaluate the solution of the Basis Pursuit, two relaxation-based algorithms are run and compared using the same example and the measures mentioned in Section (2.2.1). The first one is the IRLS for p = 1, and the second one is linear programming by Matlab.

Figure (2.3) shows that the IRLS and the BP solved by Matlab's linear programming provide a better approximation to the problem in Equation (2.4), in both evaluated measures of quality, than OMP. However, Table (2.1) shows that computing the solution of Equation (2.4) by BP and IRLS takes much more time than by OMP (the best of the greedy matching algorithms).
            1      2      3      4      5      6      7      8      9      10     11     12     13     14     15     Average
OMP        0.249  2.883  0.139  0.087  0.105  3.162  0.722  0.206  0.205  0.210  0.661  3.243  0.611  5.742  1.415  2.955
IRLS      13.923 12.781 22.167 13.629 13.324 14.383 14.063 13.640 13.993 13.744 17.193 15.290 15.305 18.444 18.901 15.385
BP by LP  36.592  5.373  4.210  4.516  4.639  3.790  4.396  4.330  4.338  4.491  4.861  4.221  4.415  4.506  4.243  6.595

Table 2.1: Time ×10^{-2} (seconds) performance of the different methods, corresponding to the cardinality of the true solution.
2.2.3 l_1 Lagrangian Pursuit
Equation (2.6), hereafter called the l_1 lagrangian pursuit or lagrangian basis pursuit, is often preferred because of its close connection to convex quadratic programming, for which many algorithms are available. These include the gradient pursuits [12], the Gradient Projection for Sparse Reconstruction (GPSR) [56], and the Stage-wise Orthogonal Matching Pursuit (StOMP) method [37]. In [12], a greedy element selection is done similarly to MP and OMP; however, the costly orthogonal projection is accomplished by applying directional optimization schemes based on the gradient, the conjugate gradient, or an approximation to the conjugate gradient. Figueiredo et al. in [56] deal with the l_1 lagrangian pursuit problem by reformulating (Q_1^λ) as a bound-constrained quadratic program, and then solve it using a Barzilai-Borwein gradient projection algorithm originally developed in the context of unconstrained minimization of a smooth nonlinear function. StOMP [37] finds an approximate sparse solution of underdetermined systems with the property that either the dictionary A is random, or the non-zeros in x are randomly located, or both. Another approach, widely used by many researchers to deal with (Q_1^λ), is an iterative procedure based on shrinkage (also called soft thresholding).

Soft thresholding is an iterative algorithm that solves (2.6) using a soft threshold to decrease the l_1-norm of the coefficients x, and a gradient descent step to decrease the value of ||h − Ax||.
1. Initialization: choose x_0 = 0 and let k = 0.
2. Gradient step: update

x̄_k = x_k + γ (A^T h − A^T A x_k)    (2.15)

where γ ≤ 2 ||A^T A||^{-1}.
3. Soft thresholding: compute the components x_{k+1}[p] from x̄_k:

x_{k+1}[p] = ρ_{γλ}(x̄_k[p]),  where  ρ_{γλ}(a) = a max(1 − γλ/|a|, 0)    (2.16)

4. Stop: if ||x_k − x_{k+1}|| ≤ ε, stop the iterations; otherwise set k = k + 1 and go back to the Gradient step.

The condition γ ≤ 2 ||A^T A||^{-1} ensures convergence.
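The scheme above, the gradient step (2.15) followed by the shrinkage (2.16), can be sketched as follows; the test problem and the parameter values are illustrative only:

```python
import numpy as np

def ista(A, h, lam, n_iter=500):
    """Iterative soft thresholding for (Q_1^lambda), Eq. (2.6)."""
    gamma = 1.0 / np.linalg.norm(A.T @ A, 2)   # satisfies the step bound
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        xbar = x + gamma * (A.T @ h - A.T @ A @ x)        # gradient (2.15)
        # rho_{gamma*lam}(a) = sign(a) * max(|a| - gamma*lam, 0), Eq. (2.16)
        x = np.sign(xbar) * np.maximum(np.abs(xbar) - gamma * lam, 0.0)
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 60))
A /= np.linalg.norm(A, axis=0)
x_true = np.zeros(60)
x_true[[5, 20]] = [3.0, -2.0]
h = A @ x_true

x_hat = ista(A, h, lam=0.05)
print(np.flatnonzero(np.abs(x_hat) > 0.5))
```

Note that the multiplicative form a·max(1 − γλ/|a|, 0) of Eq. (2.16) is the same operator as the sign/max form used in the code.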
Backprojection If Equation (2.16) is replaced by an orthogonal projection on the support of x, named Λ̃, as

x_{k+1}[p] = x̄_k[p] if p ∈ Λ̃, and x_{k+1}[p] = 0 if p ∉ Λ̃,

then the approximation error is reduced by a backprojection that recovers