HAL Id: tel-01750679
https://hal.univ-lorraine.fr/tel-01750679
Submitted on 29 Mar 2018
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
scientific research documents, whether they are
published or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
Sparse representations over learned dictionary for
document analysis
Thanh Ha Do
To cite this version:
Thanh Ha Do. Sparse representations over learned dictionary for document analysis. Other [cs.OH].
Université de Lorraine, 2014. English. NNT: 2014LORR0021. tel-01750679.
NOTICE

This document is the result of a long process approved by the defense jury and made available to the wider academic community.

It is subject to the intellectual property rights of its author. This implies an obligation to cite and reference it whenever this document is used.

Furthermore, any counterfeiting, plagiarism, or illicit reproduction is liable to criminal prosecution.

Contact: ddoc-theses-contact@univ-lorraine.fr

LINKS

Code de la Propriété Intellectuelle, articles L 122.4
Code de la Propriété Intellectuelle, articles L 335.2 to L 335.10
http://www.cfcopies.com/V2/leg/leg_droi.php
Département de formation doctorale en informatique
École doctorale IAEM Lorraine

Sparse Representations over Learned Dictionary for Document Analysis

THESIS
presented and publicly defended on 4 April 2014
for the degree of
Doctorat de l'université de Lorraine
(specialty: computer science)
by
DO Thanh Ha
Composition of the jury

Reviewers:
Christian Viard-Gaudin, Professeur, Institut Universitaire de Technologie de Nantes
Rolf Ingold, Professeur, Université de Fribourg

Examiners:
Jean-Marc Ogier, Professeur, Université de La Rochelle
Laurent Wendling, Professeur, Université Paris Descartes

Thesis advisor:
Salvatore Tabbone, Professeur, Université de Lorraine

Thesis co-advisor:
Oriol Ramos Terrades, Professeur associé à l'UAB Barcelone
I would like to acknowledge with much appreciation the Ministry of Education of Viet Nam, which provided financial support for my study. A special thanks goes to my university, Ha Noi University of Science - VNU, which gave me permission to study at LORIA in France.

A special gratitude I give to my supervisor, Mr. Salvatore Tabbone, for his guidance, suggestions, and encouragement. He dedicated so much time and patience during the years of my study. I thank him not only for his invaluable advice and the valuable expertise that he shared with me, but also for the support which allowed me to complete this thesis in the best working conditions.

Furthermore, a special thanks goes to my co-supervisor, Mr. Oriol Ramos Terrades, for his scientific advice as well as his availability despite the distance. The discussions with him allowed me to better understand the research and to gain more confidence.

I would like to express my deepest appreciation to Mr. Christian Viard-Gaudin and Mr. Rolf Ingold for accepting to review my thesis. I am also grateful to Mr. Jean-Marc Ogier and Mr. Laurent Wendling for accepting to be part of the jury.

I would like to thank all the members of the QGAR team for their friendship and their help during my study within the team; I am especially grateful to Philippe Dosch for his outstanding technical support. Thanks to all my colleagues with whom I shared the office for the good atmosphere.

I am also very grateful to all my friends who, closely or remotely, helped and encouraged me at the convenient moments. I thank them for all the precious time that we spent together.

In addition, I would also like to thank my husband and my daughter for their love, kindness, and support during the past three years of my study.
Abstract

In this thesis, we focus on how sparse representations can help to increase the performance of noise removal, text region extraction, pattern recognition, and symbol spotting in graphical documents. The main goal is to provide new algorithms and applications of sparse representation in redundant dictionaries for graphical images, by addressing the problems from various perspectives.

To do that, first of all, we give a survey of sparse representations and their applications in image processing. Then, we present the motivation for building a learned dictionary and efficient algorithms for constructing one. The techniques used to solve the sparsity problem over a learned dictionary are also presented.

After describing the general idea of sparse representations and learned dictionaries, we present some contributions in the field of symbol recognition and document processing that achieve better performance compared to the state of the art. These contributions begin by finding answers to the following questions.
- The first question is how we can remove the noise from a document when we have no assumptions about the model of noise found in these images. To address this first question, we believe that there is a link between the model of noise and the reconstruction error of the signal in the learned dictionary. Therefore, we propose to compute the noise model automatically from the database, based on the normalized correlation between pairs of noisy and non-noisy images, and then to use this value as the reconstruction error in the basis pursuit denoising algorithm with a learned dictionary. The efficiency of the proposed method has also been validated experimentally on different datasets, for different resolutions and different kinds of noise. All experimental results show that the proposed method outperforms existing ones in most of the cases.
- The second question is how sparse representations over learned dictionaries can separate the text and graphic parts in a graphical document. In fact, we have been strongly motivated by the good performance of the morphological component analysis (MCA) method when applied to separating textures from cartoons and to text detection in scenic images. In the MCA method, one signal is discriminated from another based on the comparison between their sparse representations in two dictionaries. However, when working with graphical images, text characters of different sizes are included in the same document, and they touch either themselves or the graphics; therefore prior methods such as MCA cannot be used efficiently in this case. As a result, we have extended the assumption of MCA by proposing a strategy using multiple learned dictionaries, instead of two dictionaries, for separating the text regions from the graphical part. The experimental results show that the proposed method can be a good choice for the segmentation problem with complex graphical documents, while it overcomes the restrictions of the existing methods, which apply to some documents only.
- This result encourages us to continue with the challenge of symbol recognition. Once again, we desire to answer the question of how we can apply sparse representation to symbol recognition. Now, the difficulty arises because there seems to be no connection between symbol recognition and sparsity, and to the best of our knowledge, there are no previous works in the graphics community using sparse representations to describe graphical symbols. Fortunately, after tireless research, we found the bridge between the literature of sparse representation and visual vocabulary construction. More specifically, we apply the learned dictionary algorithm to learn a visual vocabulary based on local descriptors of symbols. Then, we construct a vector model for each symbol from its sparse representation in this vocabulary, which can help to improve the retrieval performance. We hope that this work will open a new range of applications for symbol recognition, since in our method other kinds of local descriptors can be used.
We complete this thesis by proposing an approach for spotting symbols that uses sparse representations for the coding of a visual vocabulary. This approach also uses learning techniques to adapt such a visual vocabulary to the intrinsic properties of the document datasets. It allows achieving a representation that is sparser than the one obtained by using a pre-fixed basis instead. The contribution made in this work focuses on the symbol retrieval process. The proposed approach follows a two-step architecture including a recall step and a refining step. The main goals of this architecture are to speed up the retrieval process using sparse representation and indexing techniques, and to reserve the more computationally expensive matching methods only for those regions in which the queried symbol may appear. The first experiments on the SESYD dataset for a symbol spotting application seem to agree, and the obtained results are promising.
Key words: sparse representations, learned dictionary, learning algorithms, noise removal, text/graphic separation, symbol recognition, symbol spotting.
List of Figures
List of Tables

1 Introduction

Related Works

2 Sparsity and Learning Dictionary
2.1 Sparse Representation
2.2 Pursuit Algorithms
2.2.1 Greedy Matching Pursuits
2.2.2 Basis Pursuit
2.2.3 l1 Lagrangian Pursuit
2.3 Learning Dictionaries
2.3.1 Core Idea for Learning Dictionary
2.3.2 The K-SVD Algorithm
2.3.3 The MOD Algorithm
2.3.4 The Online Learning Dictionary Algorithm
2.3.5 The RLS-DLA Algorithm
2.3.6 Numerical Demonstration of Learning Algorithms

Contributions on Document Analysis

3 Denoising Graphical Documents
3.1 Introduction
3.2 Problem Statement
3.3 Document Degradation Models
3.4 Proposed Approach
3.4.1 Learning Dictionary for Document Patches
3.4.2 Energy Noise Model
3.5 Experimental Validation
3.6 Conclusion

4 Text/Graphic Separation
4.1 Introduction
4.2 Problem Statement
4.3 Proposed Approach
4.3.1 Learned Dictionaries for Text and Graphic Parts
4.3.2 Text Regions Extraction by Learned Dictionaries
4.4 Experimental Validation
4.5 Conclusions

5 Symbol Recognition
5.1 Introduction
5.2 Shape Context
5.3 Interest Points
5.4 Shape Context of Interest Points
5.5 Proposed Approach
5.5.1 Learned Dictionary of SCIPs
5.5.2 Visual Vector Model
5.5.3 Symbol Retrieval
5.6 Experimental Validation
5.6.1 Datasets and Performance Evaluation
5.6.2 Study of Parameters
5.6.3 Invariance and Robustness
5.6.4 Unseen Symbols

6 Symbol Spotting
6.1 Introduction
6.2 Extension of Shape Context of Interest Points for Documents
6.3 Learned Dictionary of ESCIPs
6.4 Document Indexing by Sparsity over Learned Dictionary
6.5 Locating Symbols in the Graphical Documents
6.5.1 Symbol Recall
6.5.2 Symbol Refining
6.6 Experimental Validation
6.7 Conclusion

7 Conclusions

Bibliography
List of Figures

1.1 Illustration of the image decomposition problem with sparse representation (extracted from [161])
2.1 Left: minimum l1 solution of Ax = h for M = 2. Right: geometry of l1 minimization for M = 3
2.2 Performance of the greedy matching pursuit algorithms
2.3 Performance of IRLS and BP (using Matlab's linear programming) algorithms
2.4 The quality of the obtained solutions
2.5 Average representation errors obtained at each iteration
3.1 Classification of image denoising methods (extracted from [124])
3.1 (a): original binary symbol; (b) to (g): examples of six levels of Kanungo noise from the GREC2005 dataset
3.2 Learned dictionary obtained by using the K-SVD algorithm on patches of size 8 × 8 extracted from DIBCO images after 50 iterations (used in the experiments)
3.3 Illustration of the point spread function
3.4 Normalized cross-correlation between two images
3.5 Results of denoising the noisy images with the Kanungo model at levels 1 and 2 of degradation. Columns 2 and 3 are the denoised images for each method. Columns 4 and 5 are the binarized denoised images of columns 2 and 3, respectively. For the median and OC methods, images are already binarized in columns 2 and 3
3.6 (a), (c): zoom of images denoised by curvelets and our method, respectively. (b), (d): denoised binary versions of (a) and (c), respectively
3.7 One of the scanned documents
3.8 Learned dictionary obtained by using the K-SVD algorithm on patches of size 8 × 8 pixels extracted from real scanned documents after 50 iterations
3.9 The denoised version of the document in Figure 3.7, obtained by our approach
3.10 (a) noisy documents in the DIBCO dataset used in Table 3.6, (b) the denoised documents obtained by our approach before binarization, and (c) after binarization using Otsu's method
4.1 Examples of graphic-rich documents
4.2 Examples when using MCA with undecimated wavelets and curvelets as two overcomplete dictionaries A_t, A_g to extract the text/graphic parts: (a) original document, (b) the text component, and (c) the graphic component (extracted from [72])
4.3 Discriminative dictionary training samples: (a) text training samples; (b) graphic training samples
4.4 A zoom of the trained dictionaries with (a) √s_k = 8, (b) √s_k = 16; graphic (left) and text (right)
4.5 Example of decomposing an input image into K sets of non-overlapping patches by using K sliding windows: (a) input image, (b) K sets of non-overlapping patches for the first four patches in each set
4.6 The optimal value of k_0 as a function of the patch size for the graphic dictionaries (left) and text dictionaries (right), in terms of average representation error
4.7 Examples using two sequences of overcomplete learned dictionaries to separate the text/graphic parts: (a) original document, (b) final graphic layer, and (c) final text layer
4.8 Behaviour of the sparsity of the text and noise components in the text dictionaries (left) and graphic dictionaries (right)
4.9 Example illustrating how to further filter out noise components; the value of the threshold is 6
4.10 Documents used in the evaluation of Table 4.2: original images (left), text layers (middle), text extraction (right)
5.1 Illustration of how to compute the shape context
5.2 Example of a finite multi-model for W = 10 and θ = π/4 (above). The corresponding Laplacian of Gaussian at two scales, σ = 1 (left) and σ = 5 (right)
5.3 Process to create the DoG images
5.4 Maxima and minima of the DoG images are detected by comparing the pixel (marked in red) with its neighbors in the current and adjacent images (marked in black)
5.5 The relative log-polar coordinates of c_j with regard to p_i
5.6 The 24 atoms (columns) of the learned dictionary built from SCIP descriptors of symbols in the GREC2003 database
5.7 Examples of the symbols in different datasets
5.8 Symbol retrieval on the CVC dataset when the request (first column) is rotated and scaled
5.9 Examples of query symbols achieving the best and worst retrieval results in terms of the AUC-PR value. Values correspond to the cosine distance
5.10 Some retrieval examples in the CVC dataset: the query symbol is in the first column; the other columns are the nearest matches ranked from left to right
5.11 Some retrieval examples in the GREC dataset: the query symbol is in the first column; the other columns are the nearest matches ranked from left to right
6.1 Example of the graphic documents
6.2 One of the scanned documents
6.3 An example of the ESCIP descriptor at interest point p_i^j of a document
6.4 The 36 atoms (columns) of the learned dictionary obtained by using ESCIP descriptors as the training dataset
6.5 Illustration of a solution to the optimization problem over a learned dictionary
6.6 The inverted file structure
6.7 Example of how to locate an interest region in the document (right) corresponding to the request symbol (left)
6.8 (a): request symbol, (b): corresponding interest regions in the documents
6.9 Example of a situation where contour points do not belong to the interest region
List of Tables

1.1 Classification accuracy for the digit and texture data [%]. Only testing samples are randomly transformed for the digits (such transformations are natural in the texture dataset) (extracted from [6])
2.1 Time ×10⁻² (seconds) performance of different methods corresponding to the cardinality of the true solution
3.1 The value of r̄ at six levels of noise
3.2 Summary of the denoising results on GREC 2005. (a), (b) are level 1 and level 2 of noise, respectively
3.3 Average value gained by Jaccard's similarity measure
3.4 Average values gained by MSE
3.5 The obtained results with scanned documents using the SSIM measure
3.6 The obtained results with the DIBCO2009 dataset using the SSIM measure
4.1 The sizes of the training databases
4.2 Performance evaluation (see Figure 4.10) with T_0 set as in Figure 4.6 and T_k = 16; 32 for √s_k = 8; 16
5.1 Average AUC-PR values for the rotation and scale dataset
5.2 Average AUC-PR values for the CVC dataset
5.3 Best values for the datasets
5.4 Retrieval effectiveness with AUC-PR values on different datasets
5.5 Average of the AUC-PR values considering several percentages of training set size
6.1 Spotting results for queries in column 1

Introduction
The exponential development of the storage capacity of computers over the last few decades has allowed digital resources to multiply, and so has facilitated numerous tasks. However, this increase raises a significant challenge: finding the relevant information within a huge amount of data. The need to develop fast and efficient methods is thus compulsory.

In the field of document analysis, the process of retrieving information has been based on the analysis of the structural information contained in the document. In general, the structural information in digital documents is partitioned into a hierarchy of physical components, such as pages, columns, paragraphs, text lines, words, tables, figures, etc.; a hierarchy of logical components, for example titles, authors, affiliations, abstracts, sections, etc.; or both. Depending on each kind of structural information, and the forms used to represent the document layout, there exist different layout analysis algorithms. For example, with physical layout analysis, algorithms can be categorized into three classes: top-down approaches [127, 5], bottom-up approaches [131, 88, 176, 59, 75], and hybrid approaches [138]. With logical layout representations and analyses, some typical algorithms can be mentioned, including model-based systems [181, 89], rule-based systems [58], or frame-based systems [32], etc. More details about document structure analysis algorithms can be found in the surveys of Haralick [68], Nagy [126], Jain et al. [76], and Mao [111].
Our work in this domain focuses on technical documents. In particular, we develop pre-processing techniques to upgrade the quality of the document image and to reduce the noise generated by the input devices; we study a new method for text/graphic segmentation and for describing graphical symbols; and finally a new approach for the spotting problem is also proposed. However, instead of extending the methods mentioned above, we approach these problems from a different direction, one that has been developed widely in the computer vision community for scene images, but not really for technical documents: approaches using sparse representation over a learned dictionary. In general, in the domain of sparse representation, elementary atoms chosen from a family of functions called a dictionary represent the relevant information. As a result, searching for this information means finding these atoms. However, how to obtain an ideal dictionary adapted to all images in a large database, and how to find good sparse representations over this dictionary, are fundamental questions with a successful history. One of the first successes, considered the key that opened the door to a huge jungle, is the discovery of wavelet orthogonal bases and local time-frequency dictionaries. The successes in searching for good sparse representations over redundant dictionaries have helped to improve the performance not only of image denoising, but also of source/image separation.
Image Denoising

A successful denoising method using sparsity was first introduced by Elad et al. [43, 42]. In that approach, the authors used the assumption that a patch of a clean image can be approximated by a sparse linear combination of elements from a dictionary A. Other patch-based approaches can be found in the works of Buades et al. [16] and Roth et al. [144]. In general, denoising a patch h ∈ R^L with a dictionary A ∈ R^{L×M} corresponds to solving the sparse decomposition problem [169, 20], where h is a patch of the noisy image and A is either a pre-defined dictionary such as wavelets, ridgelets, curvelets, etc., or a learned dictionary adapted to the patches of the images. Following [43, 41, 107, 106, 105], the energy of the noise ε can be chosen according to the (supposedly known) standard deviation σ of the noise. The value of ε is proportional to both the noise variance and the image size [41, 161, 109]. Mairal et al. [107, 106] used the cumulative distribution function F_m of the χ²_m distribution and chose ε = σ² F_m^{-1}(τ), where F_m^{-1}(τ) is the inverse of F_m.
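As a rough illustration of this patch-level sparse decomposition, the sketch below denoises a toy "patch" with a hand-rolled orthogonal matching pursuit on a random dictionary. It is an invented minimal example, not the method of this thesis (which relies on basis pursuit denoising with a K-SVD-learned dictionary); all names, sizes, and thresholds are assumptions for the demonstration. The tolerance eps plays the role of the noise energy ε above.

```python
import numpy as np

def omp(A, h, eps):
    """Greedy orthogonal matching pursuit: find a sparse x with ||h - Ax||_2 <= eps."""
    L, M = A.shape
    residual = h.copy()
    support = []
    x = np.zeros(M)
    while np.linalg.norm(residual) > eps and len(support) < L:
        # pick the atom most correlated with the current residual
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j in support:
            break
        support.append(j)
        # least-squares fit of h on the selected atoms
        coef, *_ = np.linalg.lstsq(A[:, support], h, rcond=None)
        x[:] = 0.0
        x[support] = coef
        residual = h - A @ x
    return x

# Toy setup: a random overcomplete dictionary with unit-norm atoms
rng = np.random.default_rng(0)
A = rng.standard_normal((16, 64))
A /= np.linalg.norm(A, axis=0)

# A "clean" patch that is exactly 3-sparse in A, plus additive noise
x_true = np.zeros(64)
x_true[[3, 17, 42]] = [1.0, -0.7, 0.5]
h = A @ x_true + 0.01 * rng.standard_normal(16)

x_hat = omp(A, h, eps=0.05)   # eps stands in for the noise energy
denoised = A @ x_hat          # reconstructed (denoised) patch
```

The point of the sketch is the stopping rule: the pursuit adds atoms only until the residual drops below the noise energy, which is exactly why a good estimate of ε matters when the noise model is unknown.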
However, in the specific applications of noise reduction on images, we usually do not know precisely the model of the noise found in the images; in other words, the noise variance is unknown. In addition, the noise in documents is different from the noise in natural images, which are generated by devices like digital cameras or similar. Thus, we cannot use the noise variance to decide the value of ε. In Chapter 3, we propose an energy noise model which allows us to more easily set the threshold required for noise removal even if the noise model is unknown.

Image Separation
The image separation problem is an extension of the source separation problem, which is fundamental in the processing of acoustic signals. In general, suppose that the observed signal h is a superposition of two different sub-signals h_1 and h_2, that is, h = h_1 + h_2, where h_1 is sparsely generated using the model with dictionary A_1 and h_2 is sparsely generated using another model with dictionary A_2. If the optimal solutions (x̄_1, x̄_2) are obtained by solving Equation (1.1),

    (x̄_1, x̄_2) = argmin_{x_1, x_2} ||x_1||_0 + ||x_2||_0  subject to  ||h − A_1 x_1 − A_2 x_2||_2 ≤ ε_1 + ε_2    (1.1)

then the solutions to the separation problem are calculated as h̄_1 = A_1 x̄_1 and h̄_2 = A_2 x̄_2. This is the basic idea of the morphological component analysis (MCA) algorithm. The success of the MCA separation algorithm rests on the role of the dictionaries A_i in discriminating between content types, preferring the component h_i over the other parts. Figure (1.1) presents one example of how sparse representation is used to solve the image decomposition problem.

Clearly, in the MCA algorithm, one of the important questions is what the proper dictionaries for these kinds of contents are. To identify such dictionaries, we need to know the contents of the images. In fact, in [41] the separation of the content of images has been done based on the assumption that images are linear combinations of cartoon and texture parts, and the pre-defined dictionaries are chosen from the experience of the authors. Following the works in [160, 44, 14, 159], candidate pre-defined dictionaries for the texture part can be the (local) discrete cosine transform or the Gabor transform, while the candidates for the cartoon part include the bi-orthogonal wavelet transform, the isotropic à trous algorithm, the local ridgelet transform, or the curvelet transform. Because the choice of one transform over another is usually made by experience, it is not adapted to various kinds of images. Alternatively, the dictionaries can be learned by applying the K-SVD algorithm on the corrupted patches of the image itself. The separation is then very successful, and another remarkable point is that these results are achieved without the need to pre-specify the dictionaries for the two parts. Learned dictionaries combined with sparsity are also found in the work of Pan et al. [135] and Zhao et al. [192].

However, the performance of the above methods depends strongly on the size of the patch. In fact, if the size of the patch is too large, its sparse representation vector is large, which means the computing cost will increase. If this size is too small, the patch may not contain enough information for discrimination. Therefore, to overcome this shortcoming, we propose in Chapter 4 a method using multi-resolution learned dictionaries for separating text parts from graphical ones. To the best of our knowledge, this is the first time multi-resolution learned dictionaries have been used for the separation task. In general, the proposed method is a patch-based approach using the assumption that the representations of text candidate patches in text learned dictionaries are sparse, but they are not sparse enough in graphic learned dictionaries.
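The two-dictionary decomposition at the heart of MCA can be illustrated with a toy 1-D example. The sketch below is not the multi-dictionary method of this thesis: it separates a signal into a spike part and a smooth part using two fixed orthonormal dictionaries (identity and DCT) and a crude alternating hard-thresholding scheme; all sizes and thresholds are invented for the illustration.

```python
import numpy as np

def dct_dictionary(L):
    """Orthonormal DCT-II basis as columns (smooth, 'texture-like' atoms)."""
    n = np.arange(L)
    D = np.cos(np.pi * (n[:, None] + 0.5) * n[None, :] / L)
    return D / np.linalg.norm(D, axis=0)

def mca_separate(h, A1, A2, n_iter=50, thresh=1.0):
    """Crude MCA: alternately sparse-code the residual in each dictionary by
    hard thresholding of the analysis coefficients (both bases orthonormal)."""
    h1 = np.zeros_like(h)
    h2 = np.zeros_like(h)
    for _ in range(n_iter):
        # update component 1: code (h - h2) in A1, keep only large coefficients
        c1 = A1.T @ (h - h2)
        c1[np.abs(c1) < thresh] = 0.0
        h1 = A1 @ c1
        # update component 2 symmetrically
        c2 = A2.T @ (h - h1)
        c2[np.abs(c2) < thresh] = 0.0
        h2 = A2 @ c2
    return h1, h2

L = 64
A1 = np.eye(L)            # spikes are sparse in the identity basis
A2 = dct_dictionary(L)    # smooth oscillations are sparse in the DCT basis

spikes = np.zeros(L)
spikes[[10, 40]] = [4.0, -3.0]
smooth = A2[:, 2] * 5.0   # one low-frequency DCT atom
h = spikes + smooth

h1, h2 = mca_separate(h, A1, A2, thresh=1.0)
```

Each dictionary gives a very sparse code to "its" component and a dense, small-coefficient code to the other, which is exactly the discriminating role of the A_i described above; the text/graphic separation of Chapter 4 replaces these two fixed bases with several learned, multi-resolution dictionaries.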
Classification

There are some previous works that use sparsity over a learned dictionary in classification tasks to improve the classification performance. To the best of our knowledge, the first approach is the work of Raina et al. with the self-taught learning (STL) method [141]. In [141], the dictionary is learned from an unlabeled dataset; then the sparse coding coefficients obtained when coding the elements of the labeled dataset serve as features, which are fed into an SVM classifier (see Algorithm (1)). The STL algorithm is very efficient in the case where the training and testing sets are aligned.
Algorithm 1 STL algorithm
1: (A, X) ← TrainDictionary(H)
2: Learn a classifier C by a linear SVM with input X
3: X_test ← Lars(H_test, A)    % Lars is a function used to find a sparse solution (see Chapter 2)
4: Classify the set X_test by C
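The pipeline of Algorithm 1 can be sketched end to end. The snippet below is a loose illustration with invented stand-ins, not an actual STL implementation: normalized random unlabeled samples play the role of the trained dictionary, a top-k thresholding coder replaces Lars, and a nearest-class-mean rule replaces the linear SVM.

```python
import numpy as np

rng = np.random.default_rng(1)

def sparse_code(A, H, k=3):
    """Keep the k largest analysis coefficients per sample (a crude stand-in
    for Lars/OMP; see Chapter 2 for the real pursuit algorithms)."""
    C = A.T @ H                           # (M, n_samples) coefficients
    for c in C.T:                         # each row of C.T is one sample's code
        small = np.argsort(np.abs(c))[:-k]
        c[small] = 0.0                    # in-place: zero all but the top k
    return C

# Step 1: "learn" a dictionary from unlabeled data (normalized random samples
# stand in for a properly trained K-SVD dictionary)
H_unlabeled = rng.standard_normal((20, 40))
A = H_unlabeled / np.linalg.norm(H_unlabeled, axis=0)

# Labeled data: two toy classes with well-separated means
H_train = np.hstack([rng.standard_normal((20, 30)) + 2.0,
                     rng.standard_normal((20, 30)) - 2.0])
y_train = np.array([0] * 30 + [1] * 30)

# Step 2: sparse codes of the labeled data become classifier features
# (a nearest-class-mean rule stands in for the linear SVM of Algorithm 1)
X = sparse_code(A, H_train)
means = np.stack([X[:, y_train == c].mean(axis=1) for c in (0, 1)])

# Steps 3-4: code the test set in the same dictionary, then classify
H_test = np.hstack([rng.standard_normal((20, 5)) + 2.0,
                    rng.standard_normal((20, 5)) - 2.0])
X_test = sparse_code(A, H_test)
y_pred = np.argmin(
    np.linalg.norm(X_test.T[:, None, :] - means[None, :, :], axis=2), axis=1)
```

The essential STL idea survives the simplifications: the dictionary is fixed from unlabeled data, and both the labeled training set and the test set are coded in that same dictionary before classification.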
    Dataset     θ [°]   Scale   STL    HIA
1   10 digits   ±50     1       59.1   86.0
2   10 digits   ±50     ±0.2    55.6   83.6
3   Textures    −       −       76.4   94.7
4   Textures    −       −       75.6   91.2

Table 1.1: Classification accuracy for the digit and texture data [%]. Only testing samples are randomly transformed for the digits (such transformations are natural in the texture dataset) (extracted from [6]).
A hierarchical approach is presented in [6]. This approach integrates the ideas of sparse representation theory and hierarchical structures. In fact, the authors used a log-polar mapping to convert rotated and scaled patterns into shifted patterns in the new space, where they operate for learning the dictionary and adding hierarchy. This approach extends the self-taught learning (STL) method [141] to a hierarchical architecture of dictionary learning. The authors showed that this hierarchical approach performs better than the layered one, and that using a learned dictionary instead of a pre-defined one improves the results for classifying handwritten digits and texture images (see Table (1.1)).
There are other state-of-the-art algorithms, based on discriminative dictionary learning models, that give good performance in classification tasks [114, 115, 112]. The authors in [114, 115, 112] proposed a formulation for learning a dictionary tuned for a classification task, which is called supervised dictionary learning. It is a discriminative approach that effectively exploits the corresponding sparse signal decompositions in image classification tasks. This work afforded an effective method for learning a shared dictionary and multiple (linear or bilinear) decision functions. The experiments done on the MNIST [91] and USPS [73] handwritten digit datasets show promising results for this approach. In fact, this approach achieved state-of-the-art results on MNIST with a 0.54 percent error rate, which was similar to the 0.60 percent error rate of [142], and a 2.84 percent error rate on USPS, which was slightly behind the 2.4 percent error rate of [65].

The authors in [182] presented a method that computes a spatial-pyramid image representation based on the sparse representation of local features for image classification. This method uses a selective sparse representation instead of traditional vector quantization to extract the salient properties of local descriptors. In addition, sparse representation is used to operate local max pooling on multiple spatial scales, to incorporate translation and scale invariance. An encouraging aspect of this paper is the use of local features, instead of an image or patches of an image, in the sparsity framework. This method has been validated and works well when combined with simple linear SVMs, improving the scalability of training, the speed of testing, and the classification accuracy.
The methods mentioned above usually work on images, on patches extracted from images, or are even developed to adapt to local features of images as in [182]. However, to the best of our knowledge, these methods offer no characterization of the invariance of sparse representations: if the image/patch is changed under some transformations such as scale, rotation, degradation, etc., then its corresponding sparse representation is not similar (or almost similar) to the sparse representation of the original image in the same dictionary. In addition, in the context of symbol recognition, the description of the symbol must satisfy invariance criteria under affine transformations. Thus, in Chapter (5), our working hypothesis is that 'if a learned dictionary is optimized by taking into account the data properties derived from a descriptor, then it is not only specifically adapted to the descriptor but also provides the optimal approximation'. Experiments show that this approach not only keeps the invariance criteria under affine transformations but also improves the performance of the symbol retrieval system.
Contributions and Organization of the thesis
The contributions presented in this thesis follow the lines of how good sparse representations over redundant dictionaries can help to increase the performance of noise reduction, text/graphics separation, pattern recognition, and localization of elements in graphical documents such as architectural or electrical plans. Before presenting our contributions in the fields of symbol recognition and document processing, and the good performances achieved by our algorithms compared to the state-of-the-art, we review the motivation for constructing a learned dictionary instead of using a pre-defined dictionary, together with efficient algorithmic tools for building a learned dictionary. The methods used to solve the sparsity problem over a learned dictionary are also presented.
Our work in this thesis is organized as follows:
•
The first part includes one chapter, Chapter (2), which describes the background of sparse representation and related work on sparsity. In this chapter, the definitions of sparse representation and learned dictionary are presented. Detailed reviews of state-of-the-art algorithms for finding the sparse representation and for building the dictionary are also given and discussed carefully. Numerical experiments are performed with the purpose of evaluating the complexity and finding out the advantages as well as disadvantages of these algorithms for each particular problem.
•
The second part is a low-level processing stage for document filtering and text/graphics separation:
- We propose in Chapter (3) an algorithm for de-noising document images using sparse representations. From a training set, this algorithm learns not only the main document characteristics, but also the noise included in the documents. In this perspective, we propose to model the noise energy based on the normalized cross-correlation between pairs of noisy and non-noisy documents.
- A new approach to extract text regions from graphical documents is presented in Chapter (4). In the proposed method, we first empirically construct two sequences of learned dictionaries for the text and graphical parts respectively. Then, we compute the sparse representations of all different-sized and non-overlapping document patches in these learned dictionaries. Based on these representations, each patch can be classified into the text or graphics category by comparing its reconstruction errors. Same-sized patches in one category are then merged together to define the corresponding text or graphics layers, which are combined to create a final text/graphics layer. Finally, in a post-processing step, text regions are further filtered out using some learned thresholds.
•
The third part is our contribution to symbol recognition and symbol spotting:
- In Chapter (5), we propose a new approach for symbol description based on the combination of the shape context of interest points (SCIP) descriptor with sparse representation over a learned dictionary. More specifically, a dictionary is learned from a training set of SCIP descriptors, and each column of this dictionary is considered as a visual word. Next, a vector model for the symbol is constructed based on the sparse representations of its SCIPs in the learned dictionary and the tf-idf approach adapted to sparsity.
- We propose an approach to deal with the problem of symbol spotting in graphical documents using sparse representations in Chapter (6). In the proposed method, a dictionary is learned from a training database of local descriptors defined over the documents. Following their sparse representations, interest points sharing similar properties are used to define interest regions. Using an original adaptation of information retrieval techniques, a vector model for interest regions and for a query symbol is built based on its sparsity in a visual vocabulary where the visual words are columns of the learned dictionary. The matching process is then performed by comparing the similarity between vector models.
Publications
The results presented in this thesis are reported in the following publications:
1. T-H Do, S. Tabbone and O. Ramos-Terrades, Noise suppression over bi-level graphical documents using a sparse representation, CIFED 2012, Bordeaux, France.
2. T-H Do, S. Tabbone and O. Ramos-Terrades, Text/graphic separation using a sparse representation with multi-learned dictionaries, ICPR 2012, Tsukuba, Japan.
3. T-H Do, S. Tabbone and O. Ramos-Terrades, New Approach for Symbol Recognition Combining Shape Context of Interest Points with Sparse Representation, ICDAR 2013, Washington DC, USA.
4. T-H Do, S. Tabbone and O. Ramos-Terrades, Document Noise Removal using Sparse Representations over Learned Dictionary, DocEng 2013, Florence, Italy.
5. T-H Do, S. Tabbone and O. Ramos-Terrades, Spotting Symbol using Sparsity over Learned Dictionary of Local Descriptors, the 11th IAPR International Workshop on Document Analysis Systems, 2014, Tours, France.
6. T-H Do, S. Tabbone and O. Ramos-Terrades, Sparse Representation over Learned
Sparsity and Learning Dictionary
Contents
2.1 Sparse Representation
2.2 Pursuit Algorithms
2.2.1 Greedy Matching Pursuits
2.2.2 Basis Pursuit
2.2.3 l_1 Lagrangian Pursuit
2.3 Learning Dictionaries
2.3.1 Core Idea for Learning Dictionary
2.3.2 The K-SVD Algorithm
2.3.3 The MOD Algorithm
2.3.4 The Online Learning Dictionary Algorithm
2.3.5 The RLS-DLA Algorithm
2.3.6 Numerical Demonstration of Learning Algorithms
2.4 Conclusion

This chapter describes the background of sparse representation and related work on sparsity. In particular, the definitions of sparse representation and learned dictionary are presented. Detailed reviews of state-of-the-art algorithms for finding the sparse representation and for building the dictionary are also given and discussed carefully. Numerical experiments are performed with the purpose of evaluating the complexity, finding out the advantages as well as disadvantages of these algorithms for each particular problem, and guiding the choices made in our work.
2.1 Sparse Representation
A signal h ∈ R^L is strictly or exactly sparse if most of its entries are equal to zero, i.e., the cardinality of the support of the signal, #{1 ≤ i ≤ L | h_i ≠ 0}, is much less than L. A k-sparse signal is a signal that has exactly k nonzero entries. We can represent a signal as a linear combination of k columns (or atoms) of a given over-complete dictionary A, such as

h = Ax = Σ_{i=1}^{k} a_i x_i
(2.1)

Mathematically, if A = {a_1, a_2, ..., a_M} ∈ R^{L×M}, with M ≫ L, is a full-rank matrix, then the underdetermined linear system of equations (Equation (2.1)) will have infinitely many different sets of values for the x's (solutions) that satisfy it simultaneously. For example, consider the following system:

x_1 + 3x_2 − 2x_3 = 5
3x_1 + 5x_2 + 6x_3 = 7.
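Before examining the solution set of this small system, Equation (2.1) itself can be illustrated with a short synthetic example (a hypothetical random dictionary; the sizes L, M and the sparsity k are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
L, M, k = 8, 20, 3
A = rng.normal(size=(L, M))
A /= np.linalg.norm(A, axis=0)          # unit-norm atoms, M >> L

x = np.zeros(M)
support = rng.choice(M, size=k, replace=False)
x[support] = rng.normal(size=k)         # k-sparse coefficient vector

h = A @ x                               # h = sum_i a_i x_i, Eq. (2.1)
print(np.count_nonzero(x))              # 3
```

The synthesized h lives in R^L while its representation x lives in R^M, with only k of its M entries active.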
The solution set x = {x_1, x_2, x_3} of this system can be described by the equations x_1 = −7x_3 − 1 and x_2 = 3x_3 + 2, with x_3 the free variable while x_1 and x_2 depend on x_3. Different choices of the free variable may lead to different descriptions of the same solution set: if x_1 is the free variable, and x_2 and x_3 are dependent, then we have x_2 = −(3/7)x_1 + 11/7 and x_3 = −(1/7)x_1 − 1/7. Thus, each free variable gives a description of the solution set.

In the above example, the set of such x can be described using mathematical language. However, from the application point of view, one of the main tasks in dealing with the above system of equations is to find the proper x that describes h better than the others. In fact, this task amounts to computing and optimizing a signal approximation by choosing the best subset of dictionary columns. As a result, to obtain a well-defined solution, a function f(x) is added to assess the desirability of a would-be solution x, with smaller values being preferred:

(P_f) : min_x f(x) subject to Ax = h
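The two points above, that every choice of the free variable solves the system and that a penalty f selects one particular solution, can be checked numerically. Here the minimum l_2-norm penalty, computed via the pseudo-inverse, serves as an illustrative stand-in for f:

```python
import numpy as np

A = np.array([[1.0, 3.0, -2.0],
              [3.0, 5.0,  6.0]])
h = np.array([5.0, 7.0])

# Every value of the free variable x3 yields a valid solution.
for x3 in (-1.0, 0.0, 2.5):
    x = np.array([-7 * x3 - 1, 3 * x3 + 2, x3])
    assert np.allclose(A @ x, h)

# A penalty f selects one solution; with f(x) = ||x||_2 the minimizer
# is given by the pseudo-inverse.
x_min = np.linalg.pinv(A) @ h
assert np.allclose(A @ x_min, h)
print(np.round(x_min, 3))
```

The pseudo-inverse solution also satisfies the parameterization x_1 = −7x_3 − 1, x_2 = 3x_3 + 2, since it lies in the same solution set.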
If f(x) is the l_0 pseudo-norm ||x||_0 (the number of nonzero elements in the vector x), then the problem (P_f) becomes finding the sparse representation x of h satisfying:

(P_0) : min_x ||x||_0 subject to Ax = h    (2.2)

In general, solving Equation (2.2) is difficult (an NP-hard problem) and, therefore, computationally intractable. It is necessary to find sufficiently fast algorithms computing sub-optimal, but 'good enough', solutions. One choice is to use greedy algorithms, such as Matching Pursuit (MP) [110], Orthogonal-MP (OMP) [137], Weak-MP [167], and the Thresholding algorithm. In general, a greedy algorithm solves the problem by the heuristic of making locally optimal single-term updates with the hope of finding a global optimum. In this case, the set of active columns, starting from the empty set, is maintained and expanded by one additional column of A at each iteration. The chosen column is the one that maximally reduces the residual l_2 error in approximating h from the currently active columns. The residual l_2 error is evaluated after constructing an approximant including the new column; if it falls below a specified threshold, the algorithm terminates.

The other choice is to relax the
l_0-norm by replacing it with l_p-norms for some p ∈ (0, 1], or by smooth functions such as Σ_i log(1 + αx_i^2), Σ_i x_i^2/(α + x_i^2), or Σ_i (1 − exp(−αx_i^2)). A well-known algorithm of this family is the FOCal Underdetermined System Solver (FOCUSS) by Gorodnitsky and Rao [64]. In this algorithm, the l_p-norm (for some fixed p ∈ (0, 1]) is represented as a weighted l_2-norm by using the Iterative-Reweighed-Least-Squares (IRLS) method.

Another popular strategy is to replace the
l_0-norm by the l_1-norm, as proposed by Donoho et al. [35]:

(P_1) : min_x ||W^{-1} x||_1 subject to Ax = h    (2.3)

The matrix W is a diagonal positive-definite matrix. A natural choice for the entries of W is w(i, i) = 1/||a_i||_2. Let x̃ = W^{-1} x; then Equation (2.3) is re-formulated as
(P_1) : min_{x̃} ||x̃||_1 subject to Ãx̃ = h    (2.4)

Figure 2.1: Left: minimum l_1-norm solution of Ax = h for M = 2. Right: geometry of l_1 minimization for M = 3.

in which Ã = AW is the normalized version of A. Equation (2.4) is the classic basis pursuit format, and the solution x can be found by de-normalizing x̃. Thus, (P_1) is usually used with a normalized matrix. Because the purpose is to find a sparse representation, when relaxing the
l_0-norm to the l_1-norm, basis pursuit has to ensure that it still recovers sparse solutions. This can be explained using a geometric interpretation of basis pursuit as follows. Let P be the affine subspace of R^M of coefficient vectors that recover h ∈ R^L:

P = {x ∈ R^M : Ax = h}

Basis pursuit finds in P an element x̄ of minimum l_1-norm. It can be found by inflating the l_1-norm ball B_τ, increasing τ until B_τ intersects P, where

B_τ = {x ∈ R^M : ||x||_1 ≤ τ} ⊂ R^M

This geometric configuration is depicted for M = 2 and M = 3 in Figure (2.1). Because the l_1 ball first touches the affine subspace at one of its vertices or edges, the optimal solution x̄ is likely to have zero or close-to-zero coefficients when it is computed by minimizing an l_1-norm.

The solution of the (P_1) problem can be found by several existing numerical algorithms, such as Basis Pursuit by Linear Programming [20] and IRLS (Iterative Reweighed Least Squares) (for p = 1) [28]. Following [109], basis pursuit is computationally more intense than matching pursuit, but it is a more global optimization that yields representations that can be more sparse.

If there exist appropriate conditions on
A and x, such as

||x||_0 ≤ (1/2) (1 + 1/μ(A)),  where μ(A) = max_{i≠j} |a_i^T a_j| / (||a_i||_2 ||a_j||_2)

is the mutual coherence of A, then Basis Pursuit and OMP give the unique solution of (2.4), and it is also the unique solution of (P_0).
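The coherence bound can be evaluated for a random unit-norm dictionary (the sizes below are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 100))
A /= np.linalg.norm(A, axis=0)      # unit-norm atoms

G = np.abs(A.T @ A)                 # |a_i^T a_j| for unit-norm atoms
np.fill_diagonal(G, 0.0)
mu = G.max()                        # mutual coherence of A
bound = 0.5 * (1.0 + 1.0 / mu)      # sparsity level guaranteeing uniqueness
print(mu, bound)
```

The less correlated the atoms (smaller μ(A)), the larger the sparsity level for which uniqueness is guaranteed.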
Sometimes, the exact constraint h = Ax is replaced by a relaxed one using the quadratic penalty function Q(x) = ||Ax − h||_2^2 ≤ ε, with ε ≥ 0 the error tolerance. Thus, an error-tolerant version of (P_0) is defined by:

(P_0^ε) : min_x ||x||_0 subject to ||Ax − h||_2 ≤ ε

In (P_0^ε), the l_2-norm used for evaluating the error Ax − h can be replaced by other options, such as l_1 or l_∞. One of the advantages of this change can be seen through noise removal. Assume that the signal
h contains noise e with finite energy ||e||_2^2 ≤ ε^2, i.e., h = Ax + e. Solving (P_0^ε) helps us to find the solution x̄, from which the unknown denoised signal h̄ can be computed as h̄ = A x̄.
Similarly, when relaxing the l_0-norm to an l_1-norm, we get (P_1^ε), known in the literature as basis pursuit denoising (BPDN):

(P_1^ε) : min_x ||x||_1 subject to ||Ax − h||_2 ≤ ε
(2.5)

The solution to (P_1^ε) is precisely the solution to the following unconstrained optimization problem:

(Q_1^λ) : min_x λ||x||_1 + (1/2)||Ax − h||_2^2    (2.6)

where the Lagrange multiplier λ is a function of A, h and ε. In the statistical machine learning community, the problem (Q_1^λ) is used for regression over a comprehensive set of features, the columns of A. Its goal is to find a simple linear combination of a few features that could explain the output vector h of a complex system. Thus, solving (Q_1^λ) not only provides a way to obtain such a regression, but it also selects a small subset of features. (Q_1^λ) is well known under the name LASSO (Least Absolute Shrinkage and Selection Operator), conceived by Friedman, Hastie and Tibshirani [170]. The LASSO team and Efron also proposed an effective algorithm, called LARS (Least Angle Regression Stagewise) [40], that guarantees the solution path of (Q_1^λ)
is the global optimizer. The generalized version of (Q_1^λ) has the form in Equation (2.7), where ρ(.) is any 'sparsity-promoting' function; for example, when ρ(x) = |x|, 1^T ρ(x) = ||x||_1 and we recover Equation (2.6):

min_x λ 1^T ρ(x) + (1/2) ||Ax − h||_2^2    (2.7)

Minimization of Equation (2.7) can be treated using various classic iterative optimization algorithms, such as Steepest-Descent, Conjugate-Gradient, or interior-point algorithms. However, for high-dimensional problems, these methods perform very poorly [41]. Thus, a new family of numerical algorithms has been developed, called Iterative-Shrinkage algorithms. Some algorithms in this family include the Stage-wise Orthogonal-Matching-Pursuit (StOMP) algorithm [26], EM and Bound-Optimization approaches [57, 55], the IRLS-based shrinkage algorithm, and the Parallel-Coordinate-Descent (PCD) algorithm [41].
2.2 Pursuit Algorithms
In this section, we describe in more depth the algorithms mentioned in Section (2.1). Following the problems we want to deal with, (P_0) and (P_0^ε); (P_1) and (P_1^ε); or (Q_1^λ), we divide the algorithms into three classes: the greedy matching pursuits, the basis pursuit, and the l_1 lagrangian pursuit, respectively.
2.2.1 Greedy Matching Pursuits
The matching pursuit algorithm, introduced by Mallat and Zhang [110], computes signal approximations from the over-complete dictionary. In this method, the columns (or atoms) of the dictionary are selected iteratively, one by one.

Let A = {a_1, a_2, ..., a_M} ∈ R^{L×M} be an over-complete dictionary with unit-norm columns. The matching pursuit begins by projecting the signal h = {h_1, h_2, ..., h_L} on a column a_i ∈ A and computing the residue r_1, with r_0 = h:

h = (Σ_{j=1}^{L} h_j a_i[j]) a_i + r_1 = ⟨h, a_i⟩ a_i + r_1

By construction of this orthogonal projection, r_1 is orthogonal to a_i, which means

||h||^2 = |⟨h, a_i⟩|^2 + ||r_1||^2

To minimize the residue ||r_1||^2, we need to choose the column a_i such that |⟨h, a_i⟩| is maximal. This procedure is iterated by sub-decomposing the residue r_1. For instance, assuming the m-th order residual r_m is already calculated for m > 0, then in the next iteration we need to choose the column a_{i_m} that maximizes |⟨r_m, a_{i_m}⟩|.
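The first projection step and the energy identity above can be verified directly (a hypothetical random dictionary and signal):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(10, 25))
A /= np.linalg.norm(A, axis=0)      # unit-norm atoms
h = rng.normal(size=10)

i = int(np.argmax(np.abs(A.T @ h)))  # atom maximizing |<h, a_i>|
a = A[:, i]
r1 = h - (h @ a) * a                 # first residue r_1

assert np.isclose(r1 @ a, 0.0)       # r_1 is orthogonal to a_i
# ||h||^2 = |<h, a_i>|^2 + ||r_1||^2
assert np.isclose(h @ h, (h @ a) ** 2 + r1 @ r1)
print("energy identity verified")
```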
Weak-Matching-Pursuit is matching pursuit, except that rather than searching for the largest inner-product value |⟨r_m, a_{i_m}⟩|, the selected column a_{i_m} is a column that satisfies Equation (2.9), where α ∈ (0, 1]:

|⟨r_m, a_{i_m}⟩| > α sup_{j=1,...,M} |⟨r_m, a_j⟩|
(2.9)

Orthogonal Matching Pursuit improves the matching pursuit approximations by orthogonalizing the directions of projection. The selected column a_{i_m} now has to be orthogonal to the previously selected columns {a_{i_k}}_{k=1,...,m}. This can be done by projecting the residues on an orthogonal family {u_k}_{1≤k<m} computed from {a_{i_k}}_{1≤k<m} using the Gram-Schmidt algorithm. Let u_0 = a_{i_0}; then the Gram-Schmidt algorithm computes u_k by Equation (2.10):

u_k = a_{i_k} − Σ_{l=0}^{k−1} (⟨u_l, a_{i_k}⟩ / ||u_l||^2) u_l    (2.10)

and the residue r_k is projected on u_k instead of a_{i_k}:

r_k = (⟨r_k, u_k⟩ / ||u_k||^2) u_k + r_{k+1}    (2.11)

Summing Equation (2.11) for 0 ≤ k < m, with r_0 = h, yields:

h = Σ_{k=0}^{m−1} (⟨r_k, u_k⟩ / ||u_k||^2) u_k + r_m

Let P_{V_m} be the orthogonal projector on the space V_m generated by {u_k}_{0≤k<m}; then for any m ≥ 0 the residual r_m is the component of h that is orthogonal to V_m:

r_m = h − Σ_{k=0}^{m−1} (⟨r_k, u_k⟩ / ||u_k||^2) u_k    (2.12)

The orthogonal matching pursuit selects a_{i_m} maximizing |⟨r_m, a_{i_m}⟩|, with r_m calculated using Equation (2.12).
Figure 2.2: Performance of the greedy matching pursuit algorithms (OMP, MP, and Weak-MP with α = 0.5): average l_2-error and average support distance (S̄, S) as a function of the cardinality of the true solution.
Numerical Demonstration of Greedy Matching Pursuits We conclude the discussion on the greedy matching pursuit algorithms by comparing their behaviors on a simple case. To do that, a random dictionary of size 50 × 100, with entries drawn from the normal distribution, and sparse vectors x with independent and identically distributed (iid) random supports of cardinalities in the range [1; 15] are created. Using the l_2-norm to normalize the columns of the random dictionary, we get the normalized dictionary A ∈ R^{50×100}, and a vector h = Ax is computed once A and x are generated. Now, we consider A and h as the input for the MP, OMP or Weak-MP algorithm and seek the solution x̄ that is closest to the original x.

To assess the performance of the greedy matching pursuit algorithms, two measures, the l_2-error and the support recovery, are used. The l_2-error is computed as the ratio ||x̄ − x||_2 / ||x||_2, which indicates the l_2-proximity between the approximate solution x̄ and the ideal solution x. The distance between the supports of the two solutions is calculated as in Equation (2.13), where S̄ and S are the supports of x̄ and x, respectively (we recall that the support of a sparse solution is the set of indices of its non-zero entries):

distance(S̄, S) = (max{|S̄|, |S|} − |S̄ ∩ S|) / max{|S̄|, |S|}    (2.13)

If the two supports are the same, the distance is zero, and if the distance is close to 1, the two supports are almost entirely different.
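The two quality measures can be written directly from their definitions:

```python
import numpy as np

def l2_error(x_hat, x):
    """Relative l_2-error ||x_hat - x||_2 / ||x||_2."""
    return np.linalg.norm(x_hat - x) / np.linalg.norm(x)

def support_distance(S_hat, S):
    """Eq. (2.13): 0 for identical supports, close to 1 for disjoint ones."""
    S_hat, S = set(S_hat), set(S)
    m = max(len(S_hat), len(S))
    return (m - len(S_hat & S)) / m

assert support_distance([1, 2, 3], [1, 2, 3]) == 0.0
print(support_distance([1, 2, 3], [3, 4]))   # 2/3: one shared index
```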
We ran this experiment 200 times and calculated the average values. Figure (2.2) presents the results. Overall, all algorithms perform well for low cardinalities, and the best performing method is OMP. The difference between MP and Weak-MP is quite slight.
2.2.2 Basis Pursuit
Basis Pursuit As we know from the previous section, a matching pursuit performs a local optimization, whereas the basis pursuit performs a more global one. To do that, the convex optimization problem of Equation (2.4) is rewritten as a linear programming problem. Before showing how to recast the convex optimization as linear programming, we first recall the formulation of [63].
Figure 2.3: Performance of the IRLS and BP (using Matlab's Linear-Programming) algorithms, compared with OMP: average l_2-error and average support distance (S̄, S) as a function of the cardinality of the true solution.
Following [63], a linear programming problem is a constrained optimization over positive vectors d = {d_1, d_2, ..., d_L} ∈ (R_+)^L. Let b = {b_1, b_2, ..., b_K} ∈ R^K with K < L, let c = {c_1, ..., c_L} ∈ R^L be a non-zero vector, and let Φ ∈ R^{K×L} be a matrix. Linear programming finds d ∈ (R_+)^L such that d is the solution of the minimization problem

min_{d ∈ (R_+)^L} Σ_i d_i c_i subject to Φd = b
Now, coming back to the basis pursuit optimization and, without loss of generality, letting A be a normalized matrix, we have

min_x ||x||_1 subject to Ax = h    (2.14)

If x ∈ R^M is decomposed into 2 non-negative slack variables t, v ∈ (R_+)^M such that x = t − v, and we define Φ = (A, −A) ∈ R^{L×2M}, c = 1, d = (t, v) ∈ R^{2M}, and b = h, then Equation (2.14) is rewritten as a linear program, since

Σ_i d_i c_i = ||x||_1 and Φd = At − Av = h
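The reformulation can be checked numerically on a small instance (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
L, M = 6, 12
A = rng.normal(size=(L, M))
x = np.zeros(M)
x[[1, 7]] = [2.0, -3.0]
h = A @ x

# Split x into non-negative slack variables t, v with x = t - v.
t, v = np.maximum(x, 0), np.maximum(-x, 0)
Phi = np.hstack([A, -A])               # Phi = (A, -A) in R^{L x 2M}
d = np.concatenate([t, v])             # d = (t, v) in (R_+)^{2M}
c = np.ones(2 * M)

assert np.isclose(c @ d, np.linalg.norm(x, 1))   # sum_i d_i c_i = ||x||_1
assert np.allclose(Phi @ d, h)                    # Phi d = At - Av = h
print("LP reformulation consistent")
```

At the LP optimum, t and v are the positive and negative parts of x (t_i v_i = 0), so the linear objective equals the l_1-norm of x.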
Once the basis pursuit is reformulated as a linear program, it can be solved using modern interior-point methods or simplex methods, which differ from the greedy algorithms in that they obtain the global solution of a well-defined optimization problem.
Numerical Demonstration of Basis Pursuit To evaluate the solution of the Basis Pursuit, two relaxation-based algorithms are run and compared using the same example and the measures mentioned in Section (2.2.1). The first one is the IRLS for p = 1, and the second one is linear programming by Matlab.

Figure (2.3) shows that the IRLS and the BP solved by Matlab's linear programming provide a better approximation to the problem in Equation (2.4), in both evaluated measures of quality, than OMP. However, Table (2.1) shows that computing the solution of Equation (2.4) by BP and IRLS takes much more time than by OMP (the best of the greedy matching algorithms).
            1      2      3      4      5      6      7      8      9      10     11     12     13     14     15     Average
OMP        0.249  2.883  0.139  0.087  0.105  3.162  0.722  0.206  0.205  0.210  0.661  3.243  0.611  5.742  1.415  2.955
IRLS      13.923 12.781 22.167 13.629 13.324 14.383 14.063 13.640 13.993 13.744 17.193 15.290 15.305 18.444 18.901 15.385
BP by LP  36.592  5.373  4.210  4.516  4.639  3.790  4.396  4.330  4.338  4.491  4.861  4.221  4.415  4.506  4.243  6.595

Table 2.1: Time ×10^{-2} (seconds) performance of the different methods, corresponding to the cardinality of the true solution.
2.2.3 l_1 Lagrangian Pursuit
Equation (2.6), hereafter called the l_1 lagrangian pursuit or lagrangian basis pursuit, is often preferred because of its close connection to convex quadratic programming, for which many algorithms are available. These include the gradient pursuits [12], the Gradient Projection for Sparse Reconstruction (GPSR) [56], and the Stage-wise Orthogonal Matching Pursuit (StOMP) method [37]. In [12], a greedy element selection is done similarly to MP and OMP; however, the costly orthogonal projection is accomplished by applying directional optimization schemes based on the gradient, the conjugate gradient, or an approximation to the conjugate gradient. Figueiredo et al. in [56] deal with the l_1 lagrangian pursuit problem by reformulating (Q_1^λ) as a bound-constrained quadratic program, and then solve it using a Barzilai-Borwein gradient projection algorithm originally developed in the context of unconstrained minimization of a smooth nonlinear function. StOMP [37] finds an approximate sparse solution of underdetermined systems with the property that either the dictionary A is random, or the non-zeros in x are randomly located, or both. Another approach, widely used by many researchers to deal with (Q_1^λ), is an iterative procedure based on shrinkage (also called soft thresholding).

Soft thresholding is an iterative algorithm that solves (2.6) using a soft threshold to decrease the l_1-norm of the coefficients x, and a gradient descent step to decrease the value of ||h − Ax||.
1. Initialization: choose x_0 = 0 and let k = 0.
2. Gradient step: update

x̄_k = x_k + γ (A^T h − A^T A x_k)    (2.15)

where γ ≤ 2 ||A^T A||^{-1}.
3. Soft thresholding: compute the components x_{k+1}[p] from x̄_k:

x_{k+1}[p] = ρ_{γλ}(x̄_k[p]),  where  ρ_{γλ}(a) = a max(1 − γλ/|a|, 0)    (2.16)

4. Stop: if ||x_k − x_{k+1}|| ≤ ε, stop the iterations; otherwise set k = k + 1 and go back to the Gradient step.

The condition γ ≤ 2 ||A^T A||^{-1} ensures convergence.
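The scheme above, the gradient step (2.15) followed by the shrinkage (2.16), can be sketched as follows; the test problem and the parameter values are illustrative only:

```python
import numpy as np

def ista(A, h, lam, n_iter=500):
    """Iterative soft thresholding for (Q_1^lambda), Eq. (2.6)."""
    gamma = 1.0 / np.linalg.norm(A.T @ A, 2)   # satisfies the step bound
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        xbar = x + gamma * (A.T @ h - A.T @ A @ x)        # gradient (2.15)
        # rho_{gamma*lam}(a) = sign(a) * max(|a| - gamma*lam, 0), Eq. (2.16)
        x = np.sign(xbar) * np.maximum(np.abs(xbar) - gamma * lam, 0.0)
    return x

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 60))
A /= np.linalg.norm(A, axis=0)
x_true = np.zeros(60)
x_true[[5, 20]] = [3.0, -2.0]
h = A @ x_true

x_hat = ista(A, h, lam=0.05)
print(np.flatnonzero(np.abs(x_hat) > 0.5))
```

Note that the multiplicative form a·max(1 − γλ/|a|, 0) of Eq. (2.16) is the same operator as the sign/max form used in the code.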
Backprojection If Equation (2.16) is replaced by an orthogonal projection on the support of x, named Λ̃, as

x_{k+1}[p] = x̄_k[p] if p ∈ Λ̃, and x_{k+1}[p] = 0 if p ∉ Λ̃,

then the approximation error is reduced by a backprojection that recovers