HAL Id: tel-00497797
https://tel.archives-ouvertes.fr/tel-00497797
Submitted on 6 Jul 2010
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
scientific research documents, whether they are
published or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
Vision cognitive : apprentissage supervisé pour la segmentation d’images et de videos
Vincent Martin
To cite this version:
Vincent Martin. Vision cognitive : apprentissage supervisé pour la segmentation d’images et de videos.
Interface homme-machine [cs.HC]. Université Nice Sophia Antipolis, 2007. Français. ⟨tel-00497797⟩
Doctoral School of Information and Communication Sciences and Technologies
Thesis
submitted for the degree of
Doctor of Science
of the Université de Nice - Sophia Antipolis
Specialty: Computer Science
presented and defended by
Vincent Martin
Cognitive Vision: Supervised Learning for
Image and Video Segmentation
Thesis supervised by Monique Thonnat
Host team: ORION, INRIA Sophia-Antipolis
Publicly defended on 19 December 2007
before a jury composed of:
Mr. Michel Barlaud, Pr, UNSA, France, President
Ms. Rita Cucchiara, Pr, University of Modena, Italy, Reviewer
Mr. Markus Vincze, Pr, University of Vienna, Austria, Reviewer
Mr. Régis Clouard, MCF, Université de Caen, France, Examiner
Mr. Benoit Georis, PhD, Keeneo SAS, France, Examiner
I would like to thank Pr. Rita Cucchiara and Pr. Markus Vincze for accepting to review my PhD thesis. I want to thank them for their very pertinent advice and remarks.
Thanks to Régis Clouard for serving on the jury. I also extend my thanks to Benoit Georis for serving on this thesis jury. Thanks to Michel Barlaud for agreeing to chair this jury.
Thanks to Monique Thonnat for accepting me into her project and for supervising my thesis so well. I hope to make good use of everything she has passed on to me.
Many thanks to Catherine Martin for her availability and her invaluable help with all administrative matters.
Thanks to Sabine Moisan for her good humor, her help, and her pertinent advice.
Thanks to François Brémond for his kindness and his constant interest in my research work.
Thanks to Jean-Paul Rigault for his attention, his availability, and the stories and anecdotes that enlivened our discussions.
Thanks to Paul Boissard for his enthusiasm and for always believing in our joint work.
I particularly want to thank Bernard Boulay, my office mate during all these years spent at INRIA. He played an important part in the success of this thesis. It was a pleasure to work alongside him and to benefit from his talents, notably in computing.
My warm thanks go to Nadia and Valery Valentin for their kindness and their boundless generosity.
More generally, I thank all the members of the ORION team: Annie, Anh-Tuan, Etienne, Lan, Luis, Marcos, Mohammed, Ruiha, Suresh, as well as all the former members I came across: Alberto, Benoit, Céline, Christophe, Julien, Florence, Florent, Gabriele, Magalie, and Nicolas. They maintained a studious and relaxed atmosphere that contributed greatly to my fulfillment within the team.
Thanks to the staff of INRIA, and in particular of SEMIR, for their competence and their services.
Finally, I deeply thank my family for having always supported me in
In this thesis we address the problem of image and video segmentation with a cognitive vision approach. More precisely, we study two major issues of the segmentation task in vision systems: the selection of an algorithm and the tuning of its free parameters according to the image contents and the application needs. We propose a learning-based methodology to easily set up and continuously adapt the segmentation task.
Our first contribution is a generic optimization procedure to automatically extract optimal algorithm parameters. The evaluation of the segmentation quality is done with regard to reference segmentations. In this way, the user task is reduced to providing reference data for training images, in the form of manual segmentations.
A second contribution is a twofold strategy for the algorithm selection issue. This strategy relies on a training image set representative of the problem. The first part uses the results of the optimization stage to perform a global ranking of algorithm performance values. The second part consists in identifying different situations from the training image set and then associating a tuned segmentation algorithm with each situation.
A third contribution is a semantic approach to image segmentation. In this approach, we combine the result of the previously (bottom-up) optimized segmentations with a region labelling process. Region labels are given by region classifiers which are trained from annotated samples.
A fourth contribution is the implementation of the approach and the development of a graphical tool currently able to carry out the learning of segmentation knowledge (automatic parameter optimization, region annotations, region classifier training, and algorithm selection) and to use this knowledge to perform adaptive segmentation.
We have tested our approach on two real-world applications: a biological application (detection and counting of pests on rose leaves) for the static segmentation part, and video surveillance applications for the video figure-ground segmentation part. Results, quantitative evaluations, and comparisons with non-adaptive segmentations are presented to show the potential of our approach.
For the segmentation task in the biological application, the proposed adaptive segmentation approach outperforms a non-adaptive segmentation in terms of segmentation quality and thus allows the vision system to count the pests with a [...] background model selection based on context analysis, my approach enlarges the scope of surveillance applications to highly variable environments.
The main limitation of my approach is its lack of adaptation to unforeseen situations. An improvement could be to use continuous learning techniques to adapt the segmentation to new situations.
keywords: Image segmentation, video segmentation, cognitive vision, machine
In this thesis, we address the problem of image segmentation within the framework of cognitive vision. More precisely, we study two major problems in vision systems: the selection of a segmentation algorithm and the tuning of its parameters according to the image content and the application needs. We propose a methodology relying on learning techniques to ease the configuration of the algorithms and to continuously adapt the segmentation task.
Our first contribution is a generic optimization procedure for the automatic extraction of optimal algorithm parameters. The segmentation quality is evaluated against a reference segmentation. In this way, the user's task is reduced to providing reference data for training images, such as manual segmentations.
A second contribution is a strategy for the algorithm selection problem. This strategy relies on a training image set representative of the problem. The first part uses the result of the optimization stage to rank the algorithms according to their performance values for each image. The second part consists in identifying different situations from the training image set (context modelling) and associating a tuned algorithm with each identified situation.
A third contribution is a semantic approach to image segmentation. In this approach, we combine the result of the optimized segmentations with a region labelling process. Region labels are given by region classifiers, themselves trained from examples annotated by the user. A fourth contribution is the implementation of the approach and the development of a graphical tool dedicated to the extraction, learning, and use of knowledge for segmentation (context modelling and learning for the dynamic selection of a segmentation algorithm, automatic parameter optimization, region annotations, and classifier training).
We have tested our approach on two real applications: a biological application (counting insects on rose leaves) and a video surveillance application. For the first application, the insect segmentation obtained with our approach is of better quality than a segmentation [...] making it possible to adapt the choice of a background model according to the spatio-temporal characteristics of the image. Our approach thus allows video surveillance applications to extend their scope to highly variable environments such as very long outdoor sequences (several hours).
To demonstrate the potential and the limits of our approach, we present the results, a quantitative evaluation, and a comparison with non-adaptive segmentations.
keywords: Image segmentation, video segmentation, cognitive vision, learning techniques, segmentation evaluation, techniques
Abstract
Table of Contents
List of Tables
List of Figures
1 Introduction
1.1 Motivations
1.2 Objectives
1.3 Context of the Study
1.4 Contributions
1.5 Outline
2 State of the Art
2.1 The place of the Image Segmentation Task in Vision Systems
2.1.1 Knowledge-Based Approaches
2.1.2 Learning Approaches
2.1.3 Towards Cognitive Vision
2.1.4 Discussion
2.2 Segmentation Approaches
2.2.1 Definition of Image Segmentation
2.2.2 Static Image Segmentation
2.2.2.1 Feature-Space Based Approaches
2.2.2.2 Image-Domain Based Approaches
2.2.2.3 Object Based Approaches
2.2.2.4 Summary
2.2.3 Image Sequence Segmentation
2.2.4 Discussion
2.3 Segmentation Performance Evaluation
2.3.1 Unsupervised Methods
2.3.1.1 Empirical Methods
2.3.1.2 Summary
2.3.2.3 Multi-objective Methods
2.3.2.4 Summary
2.3.3 Discussion
2.4 Segmentation Optimization
2.4.1 Background on Optimization Techniques
2.4.2 Algorithm Parameter Optimization
2.4.3 Algorithm Selection
2.4.4 Discussion
2.5 Conclusion
3 Approach Overview
3.1 Introduction
3.2 The Proposed Approach
3.2.1 Hypotheses
3.2.2 A Framework for Adaptive Image Segmentation
3.2.2.1 Learning for Segmentation Parameter Tuning
3.2.2.2 Learning to Select a Segmentation Algorithm
3.2.2.3 Learning for Semantic Image Segmentation
3.2.2.4 Adaptive Image Segmentation
3.2.3 Adaptive Video Segmentation
3.3 Conclusion
4 A Framework for Adaptive Image Segmentation
4.1 Introduction
4.2 Learning for Segmentation Parameter Tuning
4.2.1 Formalization of the Optimization Problem
4.2.2 Definition of the Segmentation Performance Evaluation Metric
4.2.3 Choice of the Optimization Algorithm
4.2.4 Discussion
4.3 Learning to Select a Segmentation Algorithm
4.3.1 A Selection Strategy Based on Algorithm Ranking
4.3.2 An Algorithm Selection Approach Based on Image-Content Analysis
4.3.3 Summary
4.4 Learning for Semantic Image Segmentation
4.4.1 Class Knowledge Acquisition by Region Annotations
4.4.2 Segmentation Knowledge Modelling
4.5 Adaptive Image Segmentation
5.1.1 A Major Challenge for Integrated Pest Management
5.1.2 Context of the Experiment
5.1.3 Choosing a crop and a bioaggressor as a model study
5.2 Experimental Protocol
5.2.1 Greenhouse experiment
5.2.2 Sampling strategy
5.3 The Cognitive Vision System for Pest Detection and Counting
5.3.1 System Overview
5.3.2 Learning Stage
5.3.2.1 Learning Visual Concepts
5.3.2.2 Learning Image Processing Parameters
5.3.2.3 Learning Issues
5.3.3 Classification System
5.3.4 Image Processing Supervision System
5.4 Approach Assessment
5.4.1 Segmentation Algorithms
5.4.2 Parameter Optimization Assessment
5.4.3 Algorithm Selection
5.4.4 Region-Classifier Performance Assessment
5.4.5 Final Segmentation Quality Assessment
5.4.6 Overall System Assessment
5.5 Evaluation on a Public Image Database
5.6 Conclusion
6 Adaptive Figure-Ground Segmentation in Video Surveillance Applications
6.1 Introduction
6.2 Meta-Learning for video segmentation algorithms
6.2.1 Targeted Applications
6.2.2 Targeted Algorithms
6.2.3 Hypothesis
6.2.4 Experiment
6.3 Context Analysis by Image Sequence Clustering
6.4 Real-Time Adaptive Figure-ground Segmentation
6.5 Experimental Results
6.5.1 Model Selection Effect
6.5.2 Temporal Filtering Effects
6.5.3 Borderline and Bad Results
6.5.4 Comparison with Mixture of Gaussian
7.1.1 A Generic Optimization Procedure
7.1.2 A Strategy for the Algorithm Selection
7.1.3 A Semantic Approach to Image Segmentation
7.1.4 A Software Implementation of the Methodology
7.1.5 Contributions for the Cognitive Vision Platform
7.1.6 Contributions for the Biological Application
7.1.7 Contributions for Video Surveillance Applications
7.2 Future Work
7.2.1 Short-Term Perspectives
7.2.2 Long-Term Perspectives
A Publications of the Author
B Implementation
B.1 A Library for Adaptive Image and Video Segmentation
B.1.1 Main Class Descriptions
B.1.1.1 Segmentation Algorithms
B.1.1.2 Learning Algorithms
B.1.1.3 Optimization Algorithms
B.1.1.4 Data manipulation
B.2 A Graphical Tool for Adaptive Image and Video Segmentation
C French Introduction
C.1 Motivations
C.2 Objectives
C.3 Context of the Study
C.4 Contributions
C.5 Outline
D French Conclusion
D.1 Assessment of the proposed approach and of its contributions
D.1.1 A generic optimization procedure
D.1.2 A strategy for algorithm selection
D.1.3 A semantic approach to image segmentation
D.1.4 A software implementation of the methodology
D.1.5 Contributions for the cognitive vision platform
D.1.6 Contributions for the biological application
D.1.7 Contributions for the video application
D.2 Perspectives
D.2.1 Short-term perspectives
D.2.2 Long-term perspectives
2.1 Comparison between different image segmentation techniques.
2.2 Comparison between different image sequence segmentation techniques.
4.1 Optimization algorithm parameters.
5.1 Components of the segmentation algorithm bank, their names, parameters to tune with range and author's default values.
5.2 Setup of the optimization algorithms.
5.3 Statistics on the optimization performances for the training image set using the Simplex algorithm.
5.4 Statistics on the optimization performances for the training image set using the genetic algorithm.
5.5 Statistics on the optimization performances for the training image set using the systematic search.
5.6 Computational cost of each optimization method.
5.7 Setup of the segmentation, the feature extractors, and the classifiers.
5.8 Statistics on the segmentation performances for the test set using different segmentation strategies.
5.9 False Negative Rate (FNR) and False Positive Rate (FPR) for test images with no white flies (class $C_1$), at least one white fly (class $C_2$) and for the whole test set.
5.10 Statistics on the optimization performances using the Simplex algorithm.
5.11 Statistics on the optimization performances using the genetic algorithm.
5.12 Statistics on the optimization performances using the systematic search.
1.1 An example of the segmentation of an image with two different algorithms. The first algorithm forms regions according to a multi-scale color criterion while the second uses a local color homogeneity criterion.
1.2 Illustration of the problem of algorithm parameter tuning. An image is segmented with the same algorithm (based on color homogeneity) tuned with two different parameter sets.
1.3 Illustration of the problem of context variations in a video surveillance application.
2.1 The three stages of visual processing usually found in vision systems.
2.2 Ideal segmentation results at different levels of Marr's vision computational model. From left to right: original image, image-based level, surface-based level, and object-based level.
2.3 Segmentation evaluation diagram starting from an input image and returning a segmentation assessment value.
2.4 Simple unconstrained optimization
2.5 The basic Direct-search logic
2.6 Four basic operations in the Simplex method
2.7 The Simplex algorithm, with its four operations of reflection, contraction, expansion, and shrinkage.
2.8 The standard reinforcement learning model.
2.9 The segmentation parameter optimization framework.
3.1 The learning module schema of the proposed framework for adaptive image segmentation.
3.2 Proposed segmentation parameter optimization framework. Input and output are in bold font.
3.3 Training image set clustering based on image-content analysis. Input and output are in bold font.
3.4 Learning schema for algorithm selection. Input and output are in bold font.
3.5 Proposed region classifier training schema. Input and output are in bold font.
3.7 The learning module in video segmentation task.
3.8 Adaptive figure-ground segmentation schema based on context identification and background model selection.
4.1 Limitation of the segmentation evaluation metric when weighting terms ($w^B_m$ and $w^B_f$) are not used.
4.2 An example of a distance map from a binary contour segmentation.
4.3 Algorithm selection in a toy problem with five images and three segmentation algorithms. The values of the table correspond to the segmentation quality $\hat{E}^I_A$.
4.4 Consequence of parameter averaging in different evaluation profile cases.
4.5 An example of a parameter optimization loop. The final result (d) is not perfect since some regions are over-segmented with respect to the ground truth (b).
4.6 Region annotations with the developed graphical tool.
4.7 Example of the mapping between labelled ground truth regions and segmented regions.
4.8 Feature selection schema based on tuning of the feature extractor parameters.
4.9 Model selection schema based on tuning of the predictor parameters.
5.1 Greenhouse map showing two chapels of 128 m² each.
5.2 Example of a scanned rose leaf infested by white flies.
5.3 Cognitive vision system. The top part corresponds to the initial learning module and the bottom part to the automatic system for routine execution.
5.4 High-level description of a domain class (white fly). Visual concepts are in Small Caps. Learnt fuzzy ranges are shown on the right. They are composed of four numbers, corresponding respectively to the minimum admissible value, the minimum and maximum most probable values, and the maximum admissible value.
5.5 An example from the program supervision knowledge base. A composite operator describes an alternative decomposition (denoted by a |) into two sub-operators: region or edge-based segmentation, and a rule selects the first one if the concept to recognize (as indicated by the classification KBS) is Shape.
segmentationsusedingure5.7 to gure5.10.. . . 83
5.7 EvaluationprolesoftheCSC algorithmappliedonthefour
train-ing images presented in Figure 5.6.
E
A
I
= 0
orresponds to the optimum. . . 84optimum. . . 85
5.9 DierentevaluationprolesoftheEGBISalgorithmappliedonthe
four training imagespresentedin Figure5.6.
E
A
I
= 0
orresponds to theoptimum.t
andσ
are thetwo freeparameters. . . 86 5.10 Dierent evaluation proles of the hysteresis thresholdingalgo-rithmapplied on thefour trainingimages presented inFigure5.6.
E
I
A
= 0
orrespondsto the optimum.T
low
andT
high
are the two freeparameters. . . 875.11 Example of optimization results for the img026 ompared to the
groundtruthwiththeir performan es ores (0= nodieren e). . . 88
5.12 Convergen e a ura y of the Simplex algorithm by varying the
maxCalls
parameter. . . 90 5.13 Convergen e a ura yof the GAbyvaryingtheinitial populationsize. . . 91
5.14 Examples of images for the two identified clusters. Left = cluster 1 (front side of the leaves), right = cluster 2 (back side of the leaves).
5.15 Performances of the region classifiers trained on the whole training set and different color features.
5.16 Performances of the region classifiers trained with the ten images of the cluster 1 (light green rose leaves) and different color features.
5.17 Performances of the region classifiers trained with the ten images of the cluster 2 (dark green rose leaves) and different color features.
5.18 Performances of the region classifiers trained with the whole training set and texture features.
5.19 Example of an initial over-segmented image used in method 6.
5.20 Examples of results on a test image for different segmentation configurations (1).
5.21 Examples of results on a test image for different segmentation configurations (2).
5.22 Examples of results on a test image for different segmentation configurations (3).
5.23 Examples of results on a test image for different segmentation configurations (4).
5.24 Example of an ambiguous image sample for ground truth estimation. The two white flies on the top have moved during the scanning. This leads to color flickering which does not correspond to the normal white fly color.
5.25 Evaluation of mature white fly counting results in early detection cases (i.e. between 0 and 5 flies per leaf). The upper graph presents the results for the system configured with trained segmentation parameters, the lower one presents the results for the system
6.1 Six frames representative of the background modelling problem.
6.2 3-D histogram of the image sequence used during the experiment (see Figure 6.1 for samples).
6.3 Pie chart of the context class distribution for the image sequence used for the experiments.
6.4 Illustration of the segmentation improvement when a dynamic selection of a background model is applied (right column).
6.5 Illustration of the temporal filtering effect on the context analysis (1). Columns are, from left to right: without context adaptation, with context adaptation, with filtered context adaptation. Rows are frame at time $t$ and $t + 1.87$ s.
6.6 Illustration of the temporal filtering effect on the context analysis (2). Columns are, from left to right: without context adaptation, with context adaptation, with filtered context adaptation.
6.7 Illustration of the shadow removal problem when the background model is not trained for such situations.
6.8 Illustration of the noise sensitivity of a poorly trained background model.
6.9 Illustration of the limitation to quick adaptation of the context adaptation and temporal filtering. Columns are, from left to right: without context adaptation, with context adaptation, with filtered context adaptation. Rows are frame at time $t$, $t + 0.62$ s, $t + 3.12$ s, and $t + 7.5$ s.
6.10 Comparison between the proposed approach (left column) with the codebook model [Kim et al., 2005] and the MoG model [Stauffer and Grimson, 1999] (right column) (1).
6.11 Comparison between the proposed approach (left column) with the codebook model [Kim et al., 2005] and the MoG model [Stauffer and Grimson, 1999] (right column) (2).
6.12 Comparison between the proposed approach (left column) with the codebook model [Kim et al., 2005] and the MoG model [Stauffer and Grimson, 1999] (right column) on the sequence of Figure 6.9.
7.1 Example of a local tuning based on a priori knowledge of the scene. The tuning of the detection thresholds for pixels in $z_1$ should be less sensitive to variations than for $z_2$.
7.2 Illustration of a spatio-temporal segmentation (d) combining the results of a background subtraction algorithm (c) with a region-based algorithm (b).
[...] a multi-scale color criterion while the second uses a local color homogeneity criterion.
C.2 Illustration of the parameter tuning problem. An image is segmented with the same algorithm (based on a color homogeneity criterion) tuned with two different parameter sets.
C.3 Illustration of the problem of context variations for a video surveillance application.
D.1 Example of local parameter tuning based on a priori knowledge of the filmed scene. The value of the detection threshold for pixels in $z_1$ should be lower than that for pixels in $z_2$.
D.2 Illustration of a spatio-temporal segmentation (d) combining the results of a background subtraction algorithm (c) and of a
Introduction
1.1 Motivations
This thesis deals with image segmentation in vision systems. Image segmentation consists in grouping pixels sharing some common characteristics. In vision systems, the segmentation layer typically precedes the semantic analysis of an image. Thus, to be useful for higher-level tasks, segmentation must be adapted to the goal, i.e. able to effectively segment objects of interest. The very first problem is that a unique general method still does not exist: depending on the application, algorithm performances vary. This is illustrated in Figure 1.1 where two different algorithms are applied on the same image. The first one seems visually more efficient at separating the ladybird from the leaf. The second one produces too many regions that are not very meaningful.
Figure 1.1: An example of the segmentation of an image with two different algorithms. The first algorithm forms regions according to a multi-scale color criterion while the second uses a local color homogeneity criterion.
Basically, two popular approaches exist to set up the image segmentation task in a vision system. A first approach is to develop a new segmentation algorithm dedicated to the application task. A second approach is to empirically choose an existing algorithm, for instance by a trial-and-error procedure. The first approach leads to developing an ad hoc algorithm, from scratch, for each new application. The second approach does not guarantee adapted results and robustness. So, a [...] choose the one best suited to a segmentation goal.
When designing a segmentation algorithm, internal parameters (e.g., thresholds or minimal sizes of regions) are set with default values by the algorithm authors. In practice, it is often up to an image processing expert to supervise the tuning of these free parameters to get meaningful results. As seen in Figure 1.2, it is not clear how to choose the best parameter set regarding the segmented images: the first one is quite good but several parts of the insect are missing; the second one is also good, since the insect is well outlined, but too many meaningless regions are also present. Complex interactions between free parameters make the behavior of the algorithm nearly impossible to predict. Moreover, this awkward task is tedious and time-consuming. Thus, algorithm parameter tuning is a real challenge. To solve this issue, optimization methods should be investigated in order to automatically extract optimal parameters.
Figure 1.2: Illustration of the problem of algorithm parameter tuning. An image is segmented with the same algorithm (based on color homogeneity) tuned with two different parameter sets.
In real world applications, when the context changes, so does the appearance of the images. This is particularly true for video applications where lighting conditions are continuously varying. It can be due to local changes (e.g., shadows, reflections) and/or global illumination changes (due to meteorological conditions), as illustrated in Figure 1.3 where images are extracted from the same scene at different hours of the day. The consequences on segmentation results can be dramatic. This context adaptation issue emphasizes the need for automatic adaptation capabilities.
1.2 Objectives
My objective is to propose a cognitive vision approach to the image segmentation problem. More precisely, we aim at introducing learning and adaptability capacities into the segmentation task. Traditionally, explicit knowledge is used to set up this task in vision systems. This knowledge is mainly composed of image processing programs (e.g., specialized segmentation algorithms and post-processings) and of program usage knowledge to control segmentation (e.g., algorithm selection and algorithm parameter settings). To this end, three main issues of the image segmentation task in vision systems should be solved:
• The first issue is to extract optimal parameters of segmentation algorithms in order to obtain a segmentation adapted to the segmentation task, i.e. a goal-oriented segmentation. The tuning of segmentation algorithm parameters is known to be a tricky task and often requires image processing skills. So, our objective is threefold. First, we want to automate this task in order to alleviate users' effort and prevent subjective results. Second, the fitness function used to assess segmentation quality should be generic (i.e. not application dependent). Third, no a priori knowledge of segmentation algorithm behaviors should be required; only ground truth data should be provided by users.
• Once all the algorithms have been optimized, a second issue is to select the best one. The selection strategy should be based on a quantitative evaluation of each algorithm's performance. However, when images of the application domain are highly variable, it remains nearly impossible to achieve a good segmentation with only one tuned algorithm. In this case, a selection strategy depending on image content analysis should be preferred.
• In many computer vision systems, at the detection layer, the goal is to separate the object(s) of interest from the image background. When objects of interest and/or image background are complex (e.g. composed of several sub-parts), a low-level algorithm cannot achieve a semantic segmentation, even if optimized. For this reason, a third issue is to refine the (optimized) segmentation to provide a semantically meaningful segmentation to higher vision modules.
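The first two issues can be illustrated with a minimal Python sketch. It is not the implementation developed in this thesis: the two toy algorithms (`threshold_segment`, `band_segment`), the pixel-wise `error` function, the parameter grids, and the tiny training set are hypothetical stand-ins chosen for illustration, and an exhaustive grid search is used as the simplest possible optimizer. Each algorithm is tuned against manual reference segmentations only, then the tuned algorithms are ranked by their mean error.

```python
from itertools import product


def threshold_segment(image, t):
    """Toy algorithm 1 (hypothetical): a pixel is 'object' if its value exceeds t."""
    return [[1 if v > t else 0 for v in row] for row in image]


def band_segment(image, lo, hi):
    """Toy algorithm 2 (hypothetical): a pixel is 'object' if lo <= value <= hi."""
    return [[1 if lo <= v <= hi else 0 for v in row] for row in image]


def error(seg, ref):
    """Placeholder metric: fraction of pixels that differ from the reference (0 = perfect)."""
    pairs = [(s, g) for rs, rg in zip(seg, ref) for s, g in zip(rs, rg)]
    return sum(s != g for s, g in pairs) / len(pairs)


def tune(algorithm, grid, training_set):
    """Exhaustive search over the grid; returns (best_params, mean_error)."""
    best = None
    for params in product(*grid):
        mean_err = sum(error(algorithm(img, *params), ref)
                       for img, ref in training_set) / len(training_set)
        if best is None or mean_err < best[1]:
            best = (params, mean_err)
    return best


def rank(algorithms, training_set):
    """Tune every algorithm, then rank them by increasing mean error."""
    tuned = {name: tune(fn, grid, training_set)
             for name, (fn, grid) in algorithms.items()}
    ranking = sorted(tuned, key=lambda name: tuned[name][1])
    return ranking, tuned


# Training set: (image, manual reference segmentation) pairs, invented for the example.
training_set = [
    ([[10, 12, 200], [11, 210, 205], [9, 10, 12]],
     [[0, 0, 1], [0, 1, 1], [0, 0, 0]]),
    ([[20, 180, 190], [22, 21, 185], [19, 20, 23]],
     [[0, 1, 1], [0, 0, 1], [0, 0, 0]]),
]
# Each entry: (algorithm, one range of candidate values per free parameter).
algorithms = {
    "threshold": (threshold_segment, [range(0, 256, 32)]),
    "band": (band_segment, [range(0, 256, 64), range(63, 200, 64)]),
}

ranking, tuned = rank(algorithms, training_set)
print(ranking)
print(tuned)
```

In this sketch the user supplies nothing but the reference segmentations; the metric and the optimizer never look inside the algorithms, so either could be swapped for another without changing the rest of the loop.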
Our final objective is to show the potential of our approach through two different segmentation tasks in real-world applications.
• The first segmentation task we focus on is image segmentation in a biological application related to early pest detection and counting. This implies robustly segmenting the objects of interest (mature white flies) from the complex background (rose leaves). Our goal is to demonstrate that the cognitive
• The second segmentation task we focus on is figure-ground segmentation in a video surveillance application. The goal is to detect moving objects (e.g., walking people) in the field of view of a fixed video camera. Detection is usually carried out by using background subtraction methods. However, illumination changes make the background modeling problem difficult. Our objective is to show that a dynamic selection of the background model makes it possible to enlarge the scope of surveillance applications to highly variable environments.
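The idea of dynamically selecting a background model can be sketched as follows. This is a toy illustration under strong assumptions, not the method evaluated in this thesis: the context descriptor (mean frame intensity), the two static background models named `day` and `night`, and the fixed detection threshold are all hypothetical.

```python
def mean_intensity(frame):
    """Hypothetical context descriptor: average gray level of a frame."""
    pixels = [v for row in frame for v in row]
    return sum(pixels) / len(pixels)


def select_background(frame, models):
    """Return the name of the background model closest to the frame's context."""
    target = mean_intensity(frame)
    return min(models, key=lambda name: abs(mean_intensity(models[name]) - target))


def subtract(frame, background, threshold):
    """Background subtraction: 1 where the frame departs from the model, else 0."""
    return [[1 if abs(f - b) > threshold else 0 for f, b in zip(rf, rb)]
            for rf, rb in zip(frame, background)]


# Two background models of the same scene, learnt under different conditions (invented values).
models = {
    "day": [[200, 200, 200], [200, 200, 200]],
    "night": [[40, 40, 40], [40, 40, 40]],
}
# A night-time frame containing one bright moving object.
frame = [[42, 39, 41], [40, 150, 38]]

context = select_background(frame, models)
adaptive_mask = subtract(frame, models[context], threshold=30)
fixed_mask = subtract(frame, models["day"], threshold=30)
print(context, adaptive_mask, fixed_mask)
```

With the model selected from the context, only the moving object is detected; with the single fixed `day` model, every pixel of the night frame is flagged as foreground.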
1.3 Context of the Study
This work takes place in the Orion project-team at INRIA Sophia Antipolis Méditerranée, France. Orion is a leading team in scene understanding at the frontier of computer vision, knowledge-based systems, and software engineering. Orion has a cognitive vision approach. It aims to achieve robust, resilient, adaptable computer vision functionalities by endowing them with a cognitive faculty. This means the ability to learn, to adapt, to weigh alternative solutions, and to develop new strategies for detection, recognition, and interpretation. Recently, Hudelot [Hudelot, 2005] proposed a cognitive vision platform for semantic image interpretation. This platform is based on the cooperation of three knowledge-based systems, one of which is dedicated to the intelligent management of image processing programs. Maillot [Maillot, 2005] has endowed this platform with learning facilities and ontology-based semantic knowledge representation and management for object recognition. Currently, the detection layer of the platform relies on ad hoc segmentation. This means that all the segmentation operators have been tuned deep in the code once and for all. In this context, my work aims to enrich this cognitive vision platform at the image segmentation level to enable adaptive segmentation.
1.4 Contributions
My main contribution is to propose a cognitive vision approach to image segmentation by solving the issues listed above:
• I propose a generic optimization procedure to automatically extract optimal algorithm parameters. This procedure is based on three independent components: a segmentation algorithm with one or several free parameters to tune, a performance evaluation metric, and an optimization algorithm. The evaluation of the segmentation quality is done with regard to a reference segmentation (e.g., a manual segmentation). The performance evaluation metric is generic, has a low computational cost, and can be used for a broad range of segmentation purposes. In this way, the user's task is reduced to providing reference data: manual segmentations of training images.
representative of the problem. The first one is based on a global ranking of algorithm performance values. The second strategy is to identify different situations, called contexts, from the training image set and to associate a tuned segmentation algorithm with each context.
• I also propose an approach to semantic image segmentation. In this approach, we consider the segmentation refinement problem as a region labelling problem. It is hence designed for region-based segmentation algorithms only. The goal is to assess the membership of each region to a predefined set of regions sharing the same label. The assessment relies on a preliminary supervised learning stage where region classifiers are trained with training samples. The role of the user is to label the regions of the ground truth segmentations. The originality of this approach is twofold. First, we use the optimized segmentations as input of the region classifiers. Second, the sub-tasks of the learning process, namely region feature extraction, region feature selection, and classifier training, are automatically optimized in a wrapper scheme to get the best classification performance.
In the scope of the two previously described segmentation tasks, my contributions are the following:
• For the segmentation task in the biological application, the proposed adaptive segmentation approach outperforms the ad hoc segmentation in terms of segmentation quality and thus allows the system to count the pests with better precision.
• For the figure-ground segmentation task, my main contribution takes place at the context modeling level. By achieving dynamic background model selection based on context analysis, my approach makes it possible to extend the scope of surveillance applications to highly variable environments.
Each step of the proposed approach is tested and evaluated on several image datasets. This helps us to show the strengths and the limitations of the approach in terms of performance, computational cost, and sensitivity to key parameters.
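As an illustration of the three-component optimization procedure listed among the contributions (a segmentation algorithm with free parameters, a performance evaluation metric, and an optimization algorithm), here is a minimal sketch. All names (`threshold_segmentation`, `disagreement`, `optimize`) are hypothetical, the toy thresholding algorithm and the pixel-disagreement metric are assumptions made for the example, and an exhaustive search stands in for whatever optimization algorithm is actually plugged in.

```python
def threshold_segmentation(image, t):
    """Toy segmentation algorithm with one free parameter t (assumption)."""
    return [[1 if v >= t else 0 for v in row] for row in image]


def disagreement(segmentation, reference):
    """Fraction of pixels labelled differently from the reference (lower is better)."""
    total = sum(len(row) for row in reference)
    wrong = sum(s != r
                for srow, rrow in zip(segmentation, reference)
                for s, r in zip(srow, rrow))
    return wrong / total


def optimize(algorithm, metric, candidates, training_set):
    """Exhaustive search: return the first candidate with the lowest mean error."""
    def mean_error(p):
        errors = [metric(algorithm(img, p), ref) for img, ref in training_set]
        return sum(errors) / len(errors)
    return min(candidates, key=mean_error)


# One training image with its manual reference segmentation.
image = [[10, 12, 200],
         [11, 180, 210],
         [9, 13, 190]]
reference = [[0, 0, 1],
             [0, 1, 1],
             [0, 0, 1]]

best_t = optimize(threshold_segmentation, disagreement,
                  range(0, 256, 5), [(image, reference)])
```

Because the three components only meet through function arguments, any of them can be swapped without touching the other two, which is the point of keeping them independent.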
1.5 Outline
This manuscript is structured as follows. Chapter 2 introduces the reader to image segmentation in the context of computer vision systems. We propose an overview of four topics closely related to our problem: image segmentation in computer vision systems, segmentation approaches, performance evaluation, and segmentation optimization. Chapter 3 introduces the proposed approach and gives our objectives and assumptions for the different segmentation issues. Chapter 4 details each step of our approach: algorithm parameter optimization, algorithm
the segmentation step of a cognitive vision system dedicated to the recognition of biological organisms. In Chapter 6, we present how our approach can be used for adaptive figure-ground segmentation in video surveillance applications.
State of the Art
2.1 The Place of the Image Segmentation Task in Vision Systems
In the beginning of the eighties, Marr [Marr, 1982] proposed a theory of human perceptual vision. This theory is the first complete methodology for the design of information systems. He suggested three levels of abstraction for the analysis of such complex systems:
The computational level: it describes what the goal of the system is. It has a more abstract nature than the next two levels and specifies all informational constraints necessary to map the input data into the desired output.
The algorithmic level: it states how the computational theory can be carried out in terms of methods. It is related to the specification of algorithms with their input and output representations.
The implementational level: it describes how an algorithm is embodied as a physical process. It has the lowest description level, e.g., the hardware implementation and the software code.
An important characteristic of this reconstructive approach to vision is the increasing number of solutions as the abstraction level decreases. For example, there are several algorithms to solve the computational task of edge detection, and there are many possible ways to implement each of them.
Inspired by Marr's theoretical framework, most existing artificial visual recognition systems, called vision systems, follow the paradigm depicted in Figure 2.1. An image is first pre-processed in order to highlight information which is important for the next stages. Classically, this often refers to the segmentation task. Then, the descriptor mapping module encodes the remaining low-level data into a symbolic form more appropriate for the recognition and analysis stage, which finally identifies the image content.
Figure 2.1: The three stages of visual processing usually found in vision systems.
of the whole system. Thus, great attention has been directed to the problem of segmentation. Hundreds of publications in this field appear every year, each trying to find an optimal solution for one specific application or for general purposes. However, a unified, generally accepted definition of image segmentation does not yet exist. Most authors agree on the following facts about segmentation:
• its task is to partition the image into several segments or regions (this point will be developed in Section 2.2.1);
• it is an early processing stage in computer vision systems. Within the computational model for computer vision (Figure 2.1), it belongs to the preprocessing module;
• it is one of the most critical tasks in automatic image analysis.
2.1.1 Knowledge-Based Approaches
Early approaches in vision systems use explicit knowledge to define the segmentation task. In [Nazif and Levine, 1984], an expert system for low-level image segmentation is proposed. The system is based on hundreds of production rules that manipulate combinations of regions and lines obtained from two basic segmentation algorithms. Another example can be found in the SIGMA system [Matsuyama and Hwang, 1990], which uses a low-level vision expert module dedicated to handling segmentation and feature extraction tasks for aerial image understanding. One weakness of these systems is their application dependency. The knowledge acquisition necessary to build the rules is also a big problem.
Then, researchers have tried to conceive more versatile systems by incorporating verification and knowledge acquisition components. In [Ossola, 1996], an approach based on the cooperation of two knowledge-based systems (KBS) is presented. Program supervision techniques [Moisan and Thonnat, 1995] are used to process images in an intelligent way, e.g., to dynamically set up the segmentation task with respect to variable conditions. A general program supervision architecture contains three main parts: a library of programs, a knowledge base, and a reasoning engine. The reasoning engine is in charge of selecting and scheduling the programs of the library which best satisfy a user query. The engine iterates the following loop composed of four steps, until a satisfactory solution is reached:
some parameters). The knowledge base contains a declarative representation (i.e., frames and production rules) of the programs, called operators. These operators are hierarchically organized in several levels of abstraction and can be primitive or composite (i.e., a combination of several primitives). We can cite the OCAPI environment in [Clément and Thonnat, 1993] as a general tool for the development of KBS dedicated to the supervision of programs. The strength of the program supervision architecture is the ability to reuse programs for various applications, as demonstrated in [Crubézy, 1999] for the supervision of medical imagery programs or in [Thonnat, 2002] for the recognition of complex objects.
A related approach for the automatic generation of image processing applications, called BORG, can be found in [Clouard et al., 1999]. In contrast to the program supervision approach, the system uses hierarchical and opportunistic behavior in order to construct a solution plan. A plan is represented by an action graph of five fixed levels: requests, tasks, functionalities, procedures, and operators. Each level corresponds to a more or less coarse version of the solution. The system dynamically constructs a parametrized plan from an initial user query. A drawback of this approach is that the action graph is constrained to a fixed number of levels supposed to cover the whole solution space, which limits the flexibility for modeling a problem.
One advantage of knowledge-based approaches is their semantic richness, which enables user-friendly interaction with the end-users. Nevertheless, one drawback is that they are application dependent and thus require strong expertise in the domain to build the knowledge bases: they are thus limited to a closed world.
2.1.2 Learning Approaches
This section deals with the use of decision theory as a basis for intelligent image processing. The main idea is to reduce as much as possible the role of human expertise in the building of vision systems by means of machine learning techniques. This principle was introduced by Draper [Draper et al., 1996], who argues that KBS are too ad hoc and too dependent on human expertise during their design. Indeed, the use of explicit knowledge is not really suited for modeling the variability, the changes, and the complexity of the world.
Case-Based Reasoning (CBR) is a problem solving approach which solves new problems by adapting previously successful solutions to similar problems. In particular, the case-based approach has been used for algorithm parameter learning. Some interesting works can be found in [Ficet-Cauchard et al., 1999] and [Frucci et al., 2007]. A case contains an image, contextual information (such as image acquisition information), and algorithm parameters. Finding the best segmentation for the current image is done by retrieving similar cases in the case base. Similarity is computed using non-image and image information. The evaluation is done by a measure of dissimilarity between the original image and the segmented image. If the evaluation is bad, the learning module is activated to
representation of cases is an application dependent problem.
In [Peng and Bahnu, 1998], an adaptive integrated image segmentation and object recognition system is proposed and applied to recognize cars in outdoor imagery. The authors stress the importance of adapting the segmentation to real-world changes in order to improve the interpretation process. They propose to use the model matching confidence degree as feedback to influence the segmentation process. A team of stochastic learning automata is used to represent both global and local image segmentation. Reinforcement learning is applied to close the loop between model matching and image segmentation. The main advantage of reinforcement learning is that it only requires knowledge of the goodness of the system performance rather than details of the algorithm. As a consequence, their method is independent of any segmentation algorithm but dependent on the recognition algorithm.
2.1.3 Towards Cognitive Vision
From the previously described approaches, two open problems still remain: first, the knowledge acquisition bottleneck when a large amount of knowledge is needed and, second, the lack of robustness when faced with varying conditions. Thus, classical vision systems are often brittle. To overcome this brittleness, a new discipline called cognitive vision has recently emerged; a research road-map can be found in [ECVISION, 2005]. A cognitive vision system is defined by its ability to reason from a priori knowledge, to learn from perceptual information, and to adapt its strategy to different problems. This new discipline thus involves several existing related ones (computer vision, pattern recognition, artificial intelligence, cognitive science, etc.). Some systems have started to implement cognitive vision ideas, mainly for human behavior recognition relying on different technologies. For example, in [Vincze et al., 2006] a cognitive system combining low-level image components and high-level activity reasoning processes has been developed to recognize human activities. This system integrates various techniques such as connectionism, Bayesian networks, component framework, and robotics. A cognitive vision platform has been proposed in [Hudelot and Thonnat, 2003] for the recognition of complex natural objects in images with reusable components. The authors propose an original distributed architecture based on three KBS for the interpretation, the anchoring, and the image processing levels. Concerning the image processing KBS, they propose an image processing ontology which is application independent but dependent on the data structures of a library of programs. Program supervision techniques are used to manage the knowledge of programs. Finally, in their conclusion, they stress the need of integrating machine learning
2.1.4 Discussion
We have presented the segmentation task through computer vision approaches. We have seen that segmentation is a crucial task and demands strong efforts from vision system designers in building complex and exhaustive knowledge bases. However, KBS are not unanimously approved by the computer vision research community. As Draper said [Draper et al., 1996], we must avoid building ad hoc systems based on closed-world assumptions. Even if program supervision techniques are worth using to enable control and reuse of vision algorithms, they still fail to adapt themselves to unknown situations. The cognitive vision approach has been recently introduced to achieve more robust, reusable, and adaptable computer vision systems. This approach aims at endowing vision systems mostly with learning and adaptability facilities. In this context, the segmentation task has several challenges to be tackled: starting from a generic solution (e.g., from a default parametrization), algorithms can be dynamically tuned by means of learning techniques to reach the specific goal defined by the user.
To fully understand the segmentation problem, a first and essential task is to draw up a state of the art of existing approaches. This is the role of the next section.
2.2 Segmentation Approaches
Many segmentation methods are based on two basic properties of the pixels in relation to their local neighborhood: discontinuity and similarity. Methods based on some discontinuity property of the pixels are called boundary-based methods, whereas methods based on some similarity property are called region-based methods. Before segmentation can be properly stated, some fundamental concepts have to be specified.
2.2.1 Definition of Image Segmentation
Image segmentation can be formalized through its region-based definition as follows:
Definition 1 (Image region) An image region $R$ is a non-empty subset of the image $I$, such that $R \subseteq I$, $R \neq \emptyset$.
A region does not need to be topologically connected. The existence of an unbroken path from one region element (i.e. a pixel) to another one inside the region is sufficient.
Definition 2 (Image partition) A partition of $I$ is a set of $n$ regions $R_i$, $i = 1, \ldots, n$, such that $\bigcup_{i=1}^{n} R_i = I$ and $R_i \cap R_j = \emptyset, \ \forall i \neq j$.
Definition 3 (Image segmentation) For a certain defined homogeneity predicate $H$, a segmentation $S$ of $I$ is a partition of $I$ which satisfies: $H(R_i) = 1, \ \forall i$ and $H(R_i \cup R_j) = 0$ for $R_i$ and $R_j$ adjacent, $i \neq j$.
The first condition states that each region has to be homogeneous with respect to the predicate $H$. The second condition states that two adjacent regions cannot be merged into a single region that satisfies the predicate $H$.
The nature of the predicate $H$ is the key element of the definition of segmentation. It can be based only on pixel values, or it can judge the high-level relevance of the partition. Since the solution is not unique, this makes segmentation an ill-posed problem in the sense of Hadamard. Then, to solve the problem, a solution consists in defining the segmentation, i.e. defining a predicate $H$, for each level of abstraction. Figure 2.2 depicts possible segmentation results at each level of Marr's computational model. At the image-based level, pixels are grouped according to their feature values (e.g., their gray value). The surface-based level detects surfaces, but not objects; for example the background keeps its patches. The object-based level detects a region per object.
Figure 2.2: Ideal segmentation results at different levels of Marr's vision computational model. From left to right: original image, image-based level, surface-based level, and object-based level.
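To make Definitions 2 and 3 concrete, the following sketch checks whether a candidate set of regions is a partition of the image and whether it satisfies both segmentation conditions for a given predicate. It is only an illustration under stated assumptions: regions are represented as sets of (row, column) pixel coordinates, adjacency is taken to be 4-connectivity, and `gray_range_predicate` is a hypothetical example of a low-level predicate $H$.

```python
from itertools import combinations


def is_partition(regions, pixels):
    """Definition 2: non-empty regions, pairwise disjoint, whose union is the image."""
    if any(len(r) == 0 for r in regions):
        return False
    if set().union(*regions) != pixels:
        return False
    return all(a.isdisjoint(b) for a, b in combinations(regions, 2))


def adjacent(a, b):
    """True if some pixel of a has a 4-neighbour in b (assumed adjacency)."""
    return any((y + dy, x + dx) in b
               for (y, x) in a
               for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)))


def is_segmentation(regions, image, H):
    """Definition 3: each region satisfies H, no union of adjacent regions does."""
    pixels = {(y, x) for y, row in enumerate(image) for x in range(len(row))}
    if not is_partition(regions, pixels):
        return False
    if not all(H(r, image) for r in regions):
        return False
    return all(not H(a | b, image)
               for a, b in combinations(regions, 2) if adjacent(a, b))


def gray_range_predicate(region, image, tol=10):
    """Hypothetical H: the gray values of the region span at most tol."""
    values = [image[y][x] for (y, x) in region]
    return max(values) - min(values) <= tol


image = [[10, 12, 200],
         [11, 13, 205]]
dark = {(0, 0), (0, 1), (1, 0), (1, 1)}
bright = {(0, 2), (1, 2)}
# Same image, but the dark area is needlessly split into two columns.
over_split = [{(0, 0), (1, 0)}, {(0, 1), (1, 1)}, bright]

valid = is_segmentation([dark, bright], image, gray_range_predicate)
invalid = is_segmentation(over_split, image, gray_range_predicate)
```

The over-split case fails only because of the second condition: each piece is homogeneous, but two adjacent pieces could be merged without violating $H$.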
2.2.2 Static Image Segmentation
Several surveys of segmentation techniques have been published. Three of them [Pal and Pal, 1993, Skarbek and Koschan, 1994, Lucchese and Mitra, 2001] review about 300 publications, giving a fair overview of the state of the art in segmentation at the image-based processing level. Pal and Pal [Pal and Pal, 1993] mainly evaluate algorithms for gray-valued images and introduce three of the first attempts to exploit color information.
Skarbek and Koschan [Skarbek and Koschan, 1994] concentrate their survey on color image segmentation. They classify the algorithms according to the underlying concepts of the homogeneity predicate $H$ and identify four categories: pixel-based, area-based, edge-based, and physics-based approaches. Pixel-based approaches consider a region as homogeneous if the features of its elements belong to the same cluster in the feature space. Area-based techniques define a
edge-based group, defines regions as those sets of pixels delimited by inhomogeneities or discontinuities. This is the complementary concept to area-based segmentation. Physics-based methods include knowledge about physical properties of the image formation process to improve the detection of regions corresponding to object surfaces. Physics-based methods are categorized in the current work as surface-based techniques. They do not belong to the image-based stage, since all additional knowledge about physical properties of object surfaces cannot be regarded as part of a low-level homogeneity predicate, but rather as external higher-level information about the analyzed scene.
Lucchese and Mitra [Lucchese and Mitra, 2001] also review exclusively color segmentation approaches and use a similar categorization: feature space based, image domain based, and physics based techniques. The combination of area-based and edge-based methods into one image domain class makes more sense nowadays, since many modern approaches try to satisfy both concepts simultaneously.
2.2.2.1 Feature-Space Based Approaches
Feature-space approaches generally neglect spatial relationships between image pixels and analyze exclusively the configuration of their feature values. Algorithms in this category delimit sections in the feature space and assign the same region label to all image pixels falling into the same section. Two principles are common. The first one finds sections by detecting peaks in unidimensional or multidimensional feature histograms. The second one uses traditional clustering algorithms.
Histogram thresholding
Historically, histogram thresholding is one of the first techniques used for segmenting images. Gray-level image histograms can commonly be decomposed into peaks and valleys which characterize objects and backgrounds. A good survey on these techniques can be found in [Sahoo et al., 1988]. Early methods for color segmentation work with several one-dimensional histograms, which implies that the correlation between different dimensions is ignored. More recent algorithms work in two or three dimensional color spaces and are characterized by different techniques to robustly detect peaks and their corresponding boundaries in the feature space. The choice of the color representation often plays a major part. An additional problem of this approach is the usually required smoothing of the feature space in order to keep the size of data structures tractable. Many algorithms search for peaks by approximating the histograms with a mixture of Gaussians, and fail if this assumption does not hold (which, in real images, is almost always the case).
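As a small worked example of histogram thresholding on a gray-level image, the sketch below picks the threshold that best separates two histogram modes. The text above does not single out a particular method; Otsu's between-class variance criterion is used here purely as one well-known choice, and the function name `otsu_threshold` and the toy image are assumptions of this example.

```python
def otsu_threshold(image, levels=256):
    """Return the gray level t maximizing the between-class variance
    of the two classes {v <= t} and {v > t} (Otsu's criterion)."""
    hist = [0] * levels
    for row in image:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of the gray values of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue        # lower class still empty
        if w1 == 0:
            break           # upper class empty from here on
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


image = [[10, 12, 11],
         [200, 210, 205]]
t = otsu_threshold(image)
# Binary segmentation: 0 for the dark mode, 1 for the bright mode.
labels = [[int(v > t) for v in row] for row in image]
```

Note that the result is a binary labelling that ignores where pixels are located, which is exactly the limitation of feature-space methods discussed in this section.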
Clustering techniques
Clustering approaches can be interpreted as unsupervised classification methods.
algo-
of the original clustering methods is that the number of clusters ($k$) must be known a priori. Several heuristics have been suggested to compute $k$ automatically based on some image statistics. A well-known clustering-based segmentation algorithm is the mean shift [Comaniciu and Meer, 2002] approach, which introduces a method to automatically detect different bandwidths from the data for each section of the feature space. The major drawback of this concept is its computational cost compared to simple $k$-means approaches. The generalization of the $k$-means algorithm for color images including spatial constraints is introduced in [Chang et al., 1994]. This algorithm considers the segmentation as a maximum a posteriori probability estimation problem. The algorithm starts with global estimates and progressively adapts the cluster centers to the local characteristics of each region.
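For reference, a plain $k$-means segmentation on gray values can be sketched as follows. This is the basic algorithm only, not the spatially constrained variant of [Chang et al., 1994] nor mean shift; the function names, the deterministic initialization of the centers, and the fixed iteration count are assumptions chosen to keep the example reproducible.

```python
def kmeans_1d(values, k, iterations=20):
    """Basic k-means on scalar features; k must be given a priori."""
    lo, hi = min(values), max(values)
    # Deterministic initialization: centers evenly spread over the value range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value goes to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c]))
                  for v in values]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels, centers


def segment_by_clustering(image, k):
    """Label every pixel with the index of its gray-value cluster."""
    flat = [v for row in image for v in row]
    labels, centers = kmeans_1d(flat, k)
    width = len(image[0])
    label_map = [labels[i:i + width] for i in range(0, len(flat), width)]
    return label_map, centers


image = [[10, 12, 11],
         [200, 210, 205]]
label_map, centers = segment_by_clustering(image, k=2)
```

The pixel coordinates never enter the computation, so two distant pixels with the same gray value always receive the same label.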
2.2.2.2 Image-Domain Based Approaches
Another way to cope with the image-based segmentation problem is to compare the feature values of each pixel in the image domain, i.e. pixels are compared within predefined spatial neighborhoods. Two major groups of algorithms can be identified: the first one defines regions through the feature similarity between their elements (area-based approaches). The second one identifies feature discontinuities as boundaries between homogeneous regions (edge-based approaches). Many modern segmentation strategies try to satisfy both concepts simultaneously [Munoz et al., 2003].
Region Growing techniques
Traditional area-based techniques utilize one of two principles: region growing or split-and-merge. Region growing methods assume the existence of some seed points, to which adjacent pixels will be added if they fulfill a homogeneity criterion. An extensive review is detailed in [Fan et al., 2005]. The main advantage of these methods is the creation of spatially connected and compact regions, which contrast with the usually noisy image partition obtained with pure feature-based segmentation approaches. They are frequently applied to separate one single homogeneous object (e.g., background) from the rest of the image, but using several seeds positioned at different objects it is also possible to perform more sophisticated segmentations. The required seed selection is a subtask of this approach, which can be solved by taking advantage of clustering methods or morphological operations, among others.
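A minimal sketch of region growing from a single seed is given below. The homogeneity criterion (gray value within `tol` of the seed value), the 4-connected neighbourhood, and the function name `region_grow` are assumptions of this example; practical methods typically use richer criteria and several seeds.

```python
from collections import deque


def region_grow(image, seed, tol):
    """Grow a 4-connected region from seed, adding neighbours whose gray
    value differs from the seed value by at most tol."""
    height, width = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            inside = 0 <= ny < height and 0 <= nx < width
            if (inside and (ny, nx) not in region
                    and abs(image[ny][nx] - seed_value) <= tol):
                region.add((ny, nx))
                queue.append((ny, nx))
    return region


image = [[10, 11, 200],
         [12, 10, 210],
         [205, 11, 12]]
region = region_grow(image, seed=(0, 0), tol=5)
```

By construction the result is spatially connected: the bright pixel at (2, 0) is excluded, and the dark pixel at (2, 2) is only included because a path of dark pixels links it to the seed.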
Split-and-Merge techniques
Split-and-merge algorithms proceed to successively divide an image into smaller non-overlapping regions while some similarity criterion is not met. A common data structure used to implement this procedure is the quadtree
triangula-
are also employed as an alternative technique to the rigid rectilinear nature of the quadtree structure. The end result of the splitting is an over-segmented image. A merging procedure is then applied to join neighboring regions under the same homogeneity predicate that was used for splitting. The comparison between adjacent regions can use simple statistics or can be based on more elaborate mathematical models, like Markov Random Fields (MRF), which also permit merging regions of similar texture [Panjwani and Healey, 1995].
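The splitting phase on a quadtree can be sketched as below. Only the split step is shown; the merge step that joins neighboring blocks under the same predicate is left out. The homogeneity test (gray-value range at most `tol`), the `min_size` stopping rule, and the function name `quadtree_split` are assumptions of this example.

```python
def quadtree_split(image, y, x, h, w, tol, min_size=1):
    """Recursively split the block (y, x, h, w) into four quadrants until each
    block is homogeneous (gray range <= tol) or too small to split further."""
    values = [image[r][c] for r in range(y, y + h) for c in range(x, x + w)]
    homogeneous = max(values) - min(values) <= tol
    if homogeneous or h <= min_size or w <= min_size:
        return [(y, x, h, w)]
    h2, w2 = h // 2, w // 2
    quadrants = ((y, x, h2, w2),
                 (y, x + w2, h2, w - w2),
                 (y + h2, x, h - h2, w2),
                 (y + h2, x + w2, h - h2, w - w2))
    blocks = []
    for qy, qx, qh, qw in quadrants:
        blocks += quadtree_split(image, qy, qx, qh, qw, tol, min_size)
    return blocks


image = [[10, 10, 10, 10],
         [10, 10, 10, 10],
         [10, 10, 200, 200],
         [10, 10, 200, 200]]
blocks = quadtree_split(image, 0, 0, 4, 4, tol=5)
```

On this toy image the three dark quadrants stay separate after splitting although they are mutually homogeneous, which illustrates why a merging pass is needed afterwards.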
Edge based techniques
Edges are discontinuities in the feature characteristics (e.g., intensity) of adjacent pixels. The magnitude of the gradient of a gray-valued image has been typically employed, since it is a relatively robust edgeness representation form. Its approximation for discrete digital images has been analyzed in detail in the past. Most methods involve the use of well-known convolution kernels, like the Roberts, Robinson, Prewitt, Kirsch, or Sobel operators. The detection of edge pixels is just the first stage of any edge-based segmentation approach. Further processing is necessary in order to provide a valid segmentation as stated by Definition 3. Since standard detectors like Canny's [Canny, 1986] or SUSAN [Smith and Brady, 1997] usually leave some gaps between object boundaries, some mechanisms are required to fill them appropriately. Recently, a new generation of edge detectors based on the Earth Mover's Distance has been proposed [Ruzon and Tomasi, 2001]. They show better performance due to their capability to detect junctions and corners. However, their computational cost is very high compared to traditional techniques. A classified and comparative study of edge detection algorithms can be found in [Sharifi et al., 2002].
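As an example of the kernel-based gradient approximation mentioned above, the following sketch computes the Sobel gradient magnitude of a gray-valued image. Leaving border pixels at zero and the function name `sobel_magnitude` are choices made for this example only; turning the resulting edgeness map into closed region boundaries is a separate step that is not shown.

```python
import math

SOBEL_X = ((-1, 0, 1),
           (-2, 0, 2),
           (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1),
           (0, 0, 0),
           (1, 2, 1))


def sobel_magnitude(image):
    """Gradient magnitude from the two Sobel kernels; border pixels stay 0.
    The kernels are applied without flipping, which leaves the magnitude
    unchanged."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = sum(SOBEL_X[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            out[y][x] = math.hypot(gx, gy)
    return out


# Vertical step edge between the second and third columns.
image = [[0, 0, 100, 100],
         [0, 0, 100, 100],
         [0, 0, 100, 100]]
edges = sobel_magnitude(image)
```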
Morphological watershed segmentations [Vincent and Soille, 1991] can also be categorized as an edge-based approach. They work on a topographical edgeness map, where the probability of a pixel being an edge is modeled by its altitude. A flooding step then fills the valleys with water. The watershed lines are detected where the waters of two different valleys meet. The principal advantage of the watershed segmentation scheme over other edge based techniques is that it generates closed boundaries. The regions defined by the closed boundaries represent an over-segmentation of the image, since the algorithm is sensitive to noise. If the gradients are computed at successively higher scales, the number of local minima (i.e. flood basins) in the gradient magnitude image will decrease. The available techniques work on gray-valued images obtained usually as the gradient of the intensity.
Active contour models, also known as snakes, are another family of edge-based algorithms [Kass et al., 1988, Ronfard, 1994]. An interesting and powerful property of an active contour model is its ability to find subjective contours and interpolate across gaps in edge chains. An active contour model represents an object boundary or some other salient one-dimensional image feature as a parametric energy minimization problem with the intention that it yields a local minimum of the associated energy functional. The original model incorporates two internal energy terms related to contour smoothness and regularity. Active contour models are well adapted for segmenting objects in noisy images but they require a priori knowledge of the object shapes. Good illustrations of such algorithms are frequently found in medical applications such as in [Jehan-Besson et al., 2004] and in optical flow segmentation as in [Herbulot et al., 2006].
Hybrid Approaches
All previous methods have intrinsic drawbacks that can be partially compensated by combining different techniques. For instance, clustering methods detect homogeneous regions in the feature space. However, since spatial relationships are ignored, the region boundaries in the image domain are highly irregular. In recent years, numerous techniques for integrating region and boundary information have been proposed. A detailed review of techniques to combine area-based and edge-based approaches can be found in [Munoz et al., 2003]. One of the main features of the hybrid approaches is the timing of the integration: embedded in the region detection, or after both processes are completed. The most common way to perform integration in the embedded strategy consists of incorporating
edge. Region growing and split-and-merge are the typical region-based segmentation algorithms [Zugaj and Lattuati, 1998]. Post-processing integration is based on fusing results from single segmentation methods, attempting to combine the map of regions (generally with thick and inaccurate boundaries) and the map of edge outputs (generally with fine and sharp lines, but dislocated) with the aim of providing an accurate and meaningful segmentation. Another example of a hybrid approach can be found in [Chen and Wang, 2004], which combines color and texture-based segmentations using border refinement.
2.2.2.3 Object Based Approaches
While the image-based approach has been dealt with relative success, the challenge of aggregating pixels into segments representing meaningful parts of objects is much more difficult. In fact, segmentation is also closely related to the problem of extracting objects from images. One of the oldest approaches to object-based segmentation is template matching. The idea of template matching is to create a model of an object of interest (the template, or kernel) and then to search over the image of interest for objects that match the template. The simplest methods, based on correlation or comparable matching operators, can only determine the position of the template. The main difficulty in this technique stems from the large variability in the shape and appearance of objects within a given class. Consequently, the segmentation may not accurately delineate the object's boundary.
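The simplest form of template matching, which only recovers the position of the template, can be sketched as an exhaustive search over image positions. The sum of squared differences is used here as one possible matching operator comparable to correlation; it is an assumption of this example, as is the function name `match_template`.

```python
def match_template(image, template):
    """Slide the template over the image and return the top-left position
    with the lowest sum of squared differences, together with that cost."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best_pos, best_cost = None, float("inf")
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            cost = sum((image[y + j][x + i] - template[j][i]) ** 2
                       for j in range(th) for i in range(tw))
            if cost < best_cost:
                best_pos, best_cost = (y, x), cost
    return best_pos, best_cost


image = [[0, 0, 0, 0],
         [0, 0, 9, 1],
         [0, 0, 1, 9],
         [0, 0, 0, 0]]
template = [[9, 1],
            [1, 9]]
position, cost = match_template(image, template)
```

The search covers translations only. Each additional unknown, such as orientation or scale, multiplies the number of candidate placements to evaluate.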
The proposed approach relies on learnt patches from training image samples and a bottom-up process used to derive a segmentation graph. Partial templates are used to detect object parts of a given class (horses in the experiment) by matching to the segmentation graph, even though the global appearance of the objects in the test images slightly differs from the learnt material. The methods become more complex and time consuming if further parameters like orientation or scale need to be estimated. Since the number of objects and their orientation in an image are unknown in the current application, the search space for matching approaches becomes intractable.
In [Schnitman et al., 2006], an approach inducing semantic segmentation from examples is described. They argue that determining whether an entity belongs to a particular semantic part is more easily done at the fragment level than on a pixel-by-pixel basis. Starting from an example, patch sets representing a collection of homogeneous fragments are built. Then, a test image is first over-segmented and the labelling of each fragment is induced from the minimization of a global labelling cost. They apply the graph-cuts multi-label optimization technique for finding the globally optimal labelling. Since this example-based approach allows the use of a non-parametric model of the object's parts, they assume that the learnt fragment-label pairs are representative of the possible image variations, i.e. illumination, resolution, and scale characteristics. Finally, they concede that their approach is only appropriate for images depicting closely similar scenes. A similar approach is described in [He et al., 2006], where a probabilistic model assigns labels to each region of an over-segmented image based on local, global, and pairwise features. As noted by the authors, the accuracy of their model is limited by its reliance on training data and by the amount of such data.
2.2.2.4 Summary
In this section, we have presented a didactic survey on image segmentation techniques. The goal of this review was to familiarize the reader with classical techniques rather than to give an extended review of all existing algorithms. To give an overall view, a summary is drawn up in Table 2.1, inspired by the one presented in [Alvarado Moya, 2004a].
Finally, we can conclude this study by making some important remarks, closely akin to the conclusions of [Skarbek and Koschan, 1994] in their survey:
1. General purpose algorithms are not robust and usually not algorithmically efficient.
2. All techniques are dependent on parameters, constants, and thresholds which are usually fixed on the basis of few experiments. Tuning and adapting parameters is rarely performed.
3. As a rule, authors neglect to compare their novel ideas with existing ones.
meth-
Feature Space
  + Detection of homogeneity in a global context.
  − Spatial relationship between pixels is ignored.
  Histogram
    + Multiple 1D histogram methods are computationally inexpensive.
    − Noise sensitive.
    − 1D approaches ignore correlation between different feature space dimensions.
    − Models used to fit histograms (e.g., multi-Gaussians) usually do not correctly match the real distributions.
    − Limited to binary segmentation problems.
  Clustering
    + Simultaneous consideration of all dimensions of the feature space.
    + Suitable for color and texture segmentation.
    + Relatively efficient algorithms exist.
    − Size or number of clusters must be known a priori.
Image Domain
  + Produce smoother and more accurate region boundaries than feature space-based approaches.
  − Edge detectors fall into the edge linking problem.
  Area-based
    + Creation of connected compact regions.
    + Fast algorithms available.
    − Key-parameter tuning can be a tricky task.
    Region growing
      + Suitable for segmentation of complex objects having a homogeneous background.
      − Prior information on the optimal number and position of seeds may be needed.
      − Result depends on the order in which pixels are examined.
    Split & Merge
      + Fast and flexible implementation.
      − Traditional tessellation mechanisms produce too coarse spatial quantization artifacts.
  Edge-based
    Edge detectors
      + Accurate local discontinuity detection.
      − Sensitive to noise and parameter changes.
    Watershed
      + Detection of closed contours.
      − Image is often over-segmented.
    Snakes
      + Robust to noise.
      − Difficult automatic initialization of the contour.
Hybrid
  + Combination of several methods can be appropriately adapted to the needs of each application.
  − High computational cost.
Object-based