Vision cognitive : apprentissage supervisé pour la segmentation d'images et de videos (Cognitive Vision: Supervised Learning for Image and Video Segmentation)


HAL Id: tel-00497797

https://tel.archives-ouvertes.fr/tel-00497797

Submitted on 6 Jul 2010

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Vincent Martin

To cite this version:

Vincent Martin. Vision cognitive : apprentissage supervisé pour la segmentation d’images et de videos. Human-Computer Interaction [cs.HC]. Université Nice Sophia Antipolis, 2007. French. ⟨tel-00497797⟩


École Doctorale Sciences et Technologies de l'Information et de la Communication

Thesis

submitted for the degree of

Doctor of Science

of the Université de Nice-Sophia Antipolis

Specialty: Computer Science

presented and defended by

Vincent Martin

Cognitive Vision: Supervised Learning for

Image and Video Segmentation

Thesis supervised by Monique Thonnat

Host team: ORION, INRIA Sophia-Antipolis

Publicly defended on 19 December 2007

before a jury composed of:

Mr. Michel Barlaud, Professor, UNSA, France (President)

Ms. Rita Cucchiara, Professor, University of Modena, Italy (Reviewer)

Mr. Markus Vincze, Professor, University of Vienna, Austria (Reviewer)

Mr. Régis Clouard, Associate Professor (MCF), Université de Caen, France (Examiner)

Mr. Benoit Georis, PhD, Keeneo SAS, France (Examiner)


I would like to thank Prof. Rita Cucchiara and Prof. Markus Vincze for agreeing to review my PhD thesis, and for their very pertinent advice and remarks.

Thanks to Régis Clouard for his participation in the jury. I also extend all my thanks to Benoit Georis for taking part in this thesis jury. Thanks to Michel Barlaud for agreeing to chair the jury.

Thanks to Monique Thonnat for welcoming me into her project and for supervising my thesis so well. I hope to make the most of everything she has taught me.

Many thanks to Catherine Martin for her availability and her invaluable help with all administrative questions.

Thanks to Sabine Moisan for her good humor, her help, and her relevant advice.

Thanks to François Brémond for his kindness and his constant interest in my research.

Thanks to Jean-Paul Rigault for his attention, his availability, and the stories and other anecdotes that livened up our discussions.

Thanks to Paul Boissard for his enthusiasm and for always believing in our joint work.

I especially want to thank Bernard Boulay, my office mate throughout all these years at INRIA. He played an important role in the success of this thesis. It was a pleasure to work alongside him and to benefit from his talents, particularly in computer science.

My warmest thanks go to Nadia and Valery Valentin for their kindness and boundless generosity.

More generally, I thank all the members of the ORION team: Annie, Anh-Tuan, Etienne, Lan, Luis, Marcos, Mohammed, Ruiha, Suresh, as well as all the former members I met: Alberto, Benoit, Céline, Christophe, Julien, Florence, Florent, Gabriele, Magalie, and Nicolas. They created a studious and relaxed atmosphere that greatly contributed to my fulfilment in the team.

Thanks to the INRIA staff, and in particular to the SEMIR, for their competence and their services.


Finally, I am deeply grateful to my family for always having supported me in


In this thesis we address the problem of image and video segmentation with a cognitive vision approach. More precisely, we study two major issues of the segmentation task in vision systems: the selection of an algorithm and the tuning of its free parameters according to the image contents and the application needs. We propose a learning-based methodology to easily set up and continuously adapt the segmentation task.

Our first contribution is a generic optimization procedure to automatically extract optimal algorithm parameters. The segmentation quality is evaluated with regard to reference segmentations. In this way, the user's task is reduced to providing reference data for training images, such as manual segmentations.
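As an illustration of what evaluating a segmentation "with regard to reference segmentations" can look like, the sketch below compares the region boundaries of a candidate segmentation with those of a reference, using distance maps. It is a minimal stand-in, not the metric defined in the thesis (Section 4.2.2): the function names, the 4-neighbourhood city-block distance, and the two weights `w_missed` and `w_false` are assumptions made for this example. Segmentations are 2-D lists of region labels, and a score of 0 means that the boundaries coincide.

```python
from collections import deque

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def boundary_pixels(labels):
    """Pixels having at least one 4-neighbour with a different region label."""
    h, w = len(labels), len(labels[0])
    points = set()
    for r in range(h):
        for c in range(w):
            for dr, dc in NEIGHBOURS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and labels[rr][cc] != labels[r][c]:
                    points.add((r, c))
                    break
    return points


def distance_map(points, h, w):
    """City-block distance from every pixel to the nearest given point (multi-source BFS)."""
    dist = [[float("inf")] * w for _ in range(h)]
    queue = deque()
    for r, c in points:
        dist[r][c] = 0
        queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in NEIGHBOURS:
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and dist[rr][cc] > dist[r][c] + 1:
                dist[rr][cc] = dist[r][c] + 1
                queue.append((rr, cc))
    return dist


def boundary_discrepancy(seg, ref, w_missed=0.5, w_false=0.5):
    """Hypothetical boundary-based error: 0 when boundaries coincide, larger as they drift apart."""
    h, w = len(ref), len(ref[0])
    b_seg, b_ref = boundary_pixels(seg), boundary_pixels(ref)
    if not b_seg and not b_ref:
        return 0.0
    if not b_seg or not b_ref:
        return float("inf")
    d_seg = distance_map(b_seg, h, w)
    d_ref = distance_map(b_ref, h, w)
    # reference boundary pixels that lie far from any segmented boundary
    missed = sum(d_seg[r][c] for r, c in b_ref) / len(b_ref)
    # segmented boundary pixels that lie far from any reference boundary
    spurious = sum(d_ref[r][c] for r, c in b_seg) / len(b_seg)
    return w_missed * missed + w_false * spurious
```

Because such a score only needs a label map and a reference, any region-based algorithm can be placed in front of it, which is what keeps an optimization procedure of this kind independent of the application.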

A second contribution is a twofold strategy for the algorithm selection issue. This strategy relies on a training image set representative of the problem. The first part uses the results of the optimization stage to perform a global ranking of algorithm performance values. The second part consists in identifying different situations from the training image set and then associating a tuned segmentation algorithm with each situation.
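A minimal sketch of the first part, the global ranking, assuming that each candidate algorithm has already been tuned and that its residual error on every training image is known (lower is better). Ranking by mean per-image rank, with ties broken by mean error, is one plausible aggregation chosen for illustration; the ranking rule actually used in the thesis (Section 4.3.1) may differ, and the function name is hypothetical.

```python
from statistics import mean


def rank_algorithms(errors):
    """Globally rank tuned algorithms from their per-image errors (lower is better).

    errors: {algorithm name: [error on image 0, error on image 1, ...]}
    Returns the names sorted by mean per-image rank; ties on mean rank are
    broken by mean error. Equal errors on one image are not averaged here.
    """
    names = list(errors)
    n_images = len(errors[names[0]])
    ranks = {name: [] for name in names}
    for i in range(n_images):
        ordered = sorted(names, key=lambda name: errors[name][i])
        for position, name in enumerate(ordered, start=1):
            ranks[name].append(position)
    return sorted(names, key=lambda name: (mean(ranks[name]), mean(errors[name])))
```

For example, with three invented algorithms whose errors on three images are `A1: [0.2, 0.5, 0.3]`, `A2: [0.1, 0.4, 0.6]` and `A3: [0.3, 0.6, 0.2]`, `A2` is best on two images and the function returns `['A2', 'A1', 'A3']`.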

A third contribution is a semantic approach to image segmentation. In this approach, we combine the results of the previously optimized (bottom-up) segmentations with a region labelling process. Region labels are given by region classifiers, which are trained from annotated samples.
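The labelling step can be pictured as follows: each region produced by a bottom-up segmentation is described by a feature vector, and a classifier trained on user-annotated regions assigns it a semantic class. This sketch assumes mean RGB color as the only feature and uses a nearest-centroid classifier purely as a stand-in; the region classifiers and features of the thesis (Section 4.4) are not reproduced, and all names are hypothetical.

```python
import math


def region_features(image, labels, region_id):
    """Mean RGB color of the pixels that carry region_id."""
    pixels = [image[r][c]
              for r in range(len(labels))
              for c in range(len(labels[0]))
              if labels[r][c] == region_id]
    return tuple(sum(channel) / len(pixels) for channel in zip(*pixels))


class NearestCentroidClassifier:
    """Assigns a feature vector to the class whose mean training vector is closest."""

    def fit(self, features, classes):
        groups = {}
        for feature, cls in zip(features, classes):
            groups.setdefault(cls, []).append(feature)
        self.centroids = {
            cls: tuple(sum(col) / len(col) for col in zip(*vectors))
            for cls, vectors in groups.items()
        }
        return self

    def predict(self, feature):
        return min(self.centroids, key=lambda cls: math.dist(feature, self.centroids[cls]))


def label_regions(image, labels, classifier):
    """Map each region id of a (bottom-up) segmentation to a semantic class."""
    region_ids = sorted({rid for row in labels for rid in row})
    return {rid: classifier.predict(region_features(image, labels, rid)) for rid in region_ids}
```

Any classifier exposing `predict(feature)` could replace the nearest-centroid one; the segmentation itself is left untouched, and only its regions receive labels.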

A fourth contribution is the implementation of the approach and the development of a graphical tool currently able to carry out the learning of segmentation knowledge (automatic parameter optimization, region annotation, region classifier training, and algorithm selection) and to use this knowledge to perform adaptive segmentation.

We have tested our approach on two real-world applications: a biological application (detection and counting of pests on rose leaves) for the static segmentation part, and video surveillance applications for the video figure-ground segmentation part. Results, quantitative evaluations, and comparisons with non-adaptive segmentations are presented to show the potential of our approach.

For the segmentation task in the biological application, the proposed adaptive segmentation approach outperforms a non-adaptive segmentation in terms of segmentation quality and thus allows the vision system to count the pests with a


ground model selection based on context analysis, my approach makes it possible to extend the scope of surveillance applications to highly variable environments.

The main limitation of my approach is its lack of adaptation to unforeseen situations. An improvement could be to use continuous learning techniques to adapt the segmentation to new situations.

Keywords: image segmentation, video segmentation, cognitive vision, machine


In this thesis, we address the problem of image segmentation within the framework of cognitive vision. More precisely, we study two major problems in vision systems: the selection of a segmentation algorithm and the tuning of its parameters according to the image content and the application needs. We propose a methodology based on learning techniques to facilitate the configuration of the algorithms and to continuously adapt the segmentation task.

Our first contribution is a generic optimization procedure for the automatic extraction of optimal algorithm parameters. The segmentation quality is evaluated against a reference segmentation. In this way, the user's task is reduced to providing reference data for training images, such as manual segmentations.

A second contribution is a strategy for the algorithm selection problem. This strategy relies on a set of training images representative of the problem. The first part uses the result of the optimization stage to rank the algorithms according to their performance values for each image. The second part consists in identifying different situations from the training image set (context modelling) and associating a parameterized algorithm with each identified situation.

A third contribution is a semantic approach to image segmentation. In this approach, we combine the result of the optimized segmentations with a region labelling process. Region labels are given by region classifiers, themselves trained from examples annotated by the user. A fourth contribution is the implementation of the approach and the development of a graphical tool dedicated to the extraction, learning, and use of segmentation knowledge (context modelling and learning for the dynamic selection of a segmentation algorithm, automatic parameter optimization, region annotation, and classifier training).

We tested our approach on two real-world applications: a biological application (counting insects on rose leaves) and a video surveillance application. For the first application, the insect segmentation obtained with our approach is of better quality than a segmentation


making it possible to adapt the choice of a background model to the spatio-temporal characteristics of the image. Our approach thus enables video surveillance applications to extend their scope to highly variable environments, such as very long (several-hour) outdoor sequences.

To demonstrate the potential and the limits of our approach, we present the results, a quantitative evaluation, and a comparison with non-adaptive segmentations.

Keywords: image segmentation, video segmentation, cognitive vision, learning techniques, segmentation evaluation, techniques


Abstract

Table of Contents

List of Tables

List of Figures

1 Introduction

1.1 Motivations

1.2 Objectives

1.3 Context of the Study

1.4 Contributions

1.5 Outline

2 State of the Art

2.1 The Place of the Image Segmentation Task in Vision Systems

2.1.1 Knowledge-Based Approaches

2.1.2 Learning Approaches

2.1.3 Towards Cognitive Vision

2.1.4 Discussion

2.2 Segmentation Approaches

2.2.1 Definition of Image Segmentation

2.2.2 Static Image Segmentation

2.2.2.1 Feature-Space Based Approaches

2.2.2.2 Image-Domain Based Approaches

2.2.2.3 Object Based Approaches

2.2.2.4 Summary

2.2.3 Image Sequence Segmentation

2.2.4 Discussion

2.3 Segmentation Performance Evaluation

2.3.1 Unsupervised Methods

2.3.1.1 Empirical Methods

2.3.1.2 Summary


2.3.2.3 Multi-objective Methods

2.3.2.4 Summary

2.3.3 Discussion

2.4 Segmentation Optimization

2.4.1 Background on Optimization Techniques

2.4.2 Algorithm Parameter Optimization

2.4.3 Algorithm Selection

2.4.4 Discussion

2.5 Conclusion

3 Approach Overview

3.1 Introduction

3.2 The Proposed Approach

3.2.1 Hypotheses

3.2.2 A Framework for Adaptive Image Segmentation

3.2.2.1 Learning for Segmentation Parameter Tuning

3.2.2.2 Learning to Select a Segmentation Algorithm

3.2.2.3 Learning for Semantic Image Segmentation

3.2.2.4 Adaptive Image Segmentation

3.2.3 Adaptive Video Segmentation

3.3 Conclusion

4 A Framework for Adaptive Image Segmentation

4.1 Introduction

4.2 Learning for Segmentation Parameter Tuning

4.2.1 Formalization of the Optimization Problem

4.2.2 Definition of the Segmentation Performance Evaluation Metric

4.2.3 Choice of the Optimization Algorithm

4.2.4 Discussion

4.3 Learning to Select a Segmentation Algorithm

4.3.1 A Selection Strategy Based on Algorithm Ranking

4.3.2 An Algorithm Selection Approach Based on Image-Content Analysis

4.3.3 Summary

4.4 Learning for Semantic Image Segmentation

4.4.1 Class Knowledge Acquisition by Region Annotations

4.4.2 Segmentation Knowledge Modelling

4.5 Adaptive Image Segmentation


5.1.1 A Major Challenge for Integrated Pest Management

5.1.2 Context of the Experiment

5.1.3 Choosing a Crop and a Bioaggressor as a Model Study

5.2 Experimental Protocol

5.2.1 Greenhouse Experiment

5.2.2 Sampling Strategy

5.3 The Cognitive Vision System for Pest Detection and Counting

5.3.1 System Overview

5.3.2 Learning Stage

5.3.2.1 Learning Visual Concepts

5.3.2.2 Learning Image Processing Parameters

5.3.2.3 Learning Issues

5.3.3 Classification System

5.3.4 Image Processing Supervision System

5.4 Approach Assessment

5.4.1 Segmentation Algorithms

5.4.2 Parameter Optimization Assessment

5.4.3 Algorithm Selection

5.4.4 Region-Classifier Performance Assessment

5.4.5 Final Segmentation Quality Assessment

5.4.6 Overall System Assessment

5.5 Evaluation on a Public Image Database

5.6 Conclusion

6 Adaptive Figure-Ground Segmentation in Video Surveillance Applications

6.1 Introduction

6.2 Meta-Learning for Video Segmentation Algorithms

6.2.1 Targeted Applications

6.2.2 Targeted Algorithms

6.2.3 Hypothesis

6.2.4 Experiment

6.3 Context Analysis by Image Sequence Clustering

6.4 Real-Time Adaptive Figure-Ground Segmentation

6.5 Experimental Results

6.5.1 Model Selection Effect

6.5.2 Temporal Filtering Effects

6.5.3 Borderline and Bad Results

6.5.4 Comparison with Mixture of Gaussians


7.1.1 A Generic Optimization Procedure

7.1.2 A Strategy for the Algorithm Selection

7.1.3 A Semantic Approach to Image Segmentation

7.1.4 A Software Implementation of the Methodology

7.1.5 Contributions for the Cognitive Vision Platform

7.1.6 Contributions for the Biological Application

7.1.7 Contributions for Video Surveillance Applications

7.2 Future Work

7.2.1 Short-Term Perspectives

7.2.2 Long-Term Perspectives

A Publications of the Author

B Implementation

B.1 A Library for Adaptive Image and Video Segmentation

B.1.1 Main Class Descriptions

B.1.1.1 Segmentation Algorithms

B.1.1.2 Learning Algorithms

B.1.1.3 Optimization Algorithms

B.1.1.4 Data Manipulation

B.2 A Graphical Tool for Adaptive Image and Video Segmentation

C French Introduction

C.1 Motivations

C.2 Objectives

C.3 Context of the Study

C.4 Contributions

C.5 Outline

D French Conclusion

D.1 Assessment of the Proposed Approach and Its Contributions

D.1.1 A Generic Optimization Procedure

D.1.2 A Strategy for Algorithm Selection

D.1.3 A Semantic Approach to Image Segmentation

D.1.4 A Software Implementation of the Methodology

D.1.5 Contributions for the Cognitive Vision Platform

D.1.6 Contributions for the Biological Application

D.1.7 Contributions for the Video Application

D.2 Perspectives

D.2.1 Short-Term Perspectives

D.2.2 Long-Term Perspectives


2.1 Comparison between different image segmentation techniques.

2.2 Comparison between different image sequence segmentation techniques.

4.1 Optimization algorithm parameters.

5.1 Components of the segmentation algorithm bank, their names, parameters to tune with range, and authors' default values.

5.2 Setup of the optimization algorithms.

5.3 Statistics on the optimization performances for the training image set using the Simplex algorithm.

5.4 Statistics on the optimization performances for the training image set using the genetic algorithm.

5.5 Statistics on the optimization performances for the training image set using the systematic search.

5.6 Computational cost of each optimization method.

5.7 Setup of the segmentation, the feature extractors, and the classifiers.

5.8 Statistics on the segmentation performances for the test set using different segmentation strategies.

5.9 False Negative Rate (FNR) and False Positive Rate (FPR) for test images with no whiteflies (class $C_1$), with at least one whitefly (class $C_2$), and for the whole test set.

5.10 Statistics on the optimization performances using the Simplex algorithm.

5.11 Statistics on the optimization performances using the genetic algorithm.

5.12 Statistics on the optimization performances using the systematic search.


1.1 An example of the segmentation of an image with two different algorithms. The first algorithm forms regions according to a multi-scale color criterion while the second uses a local color homogeneity criterion.

1.2 Illustration of the problem of algorithm parameter tuning. An image is segmented with the same algorithm (based on color homogeneity) tuned with two different parameter sets.

1.3 Illustration of the problem of context variations in a video surveillance application.

2.1 The three stages of visual processing usually found in vision systems.

2.2 Ideal segmentation results at different levels of Marr's computational model of vision. From left to right: original image, image-based level, surface-based level, and object-based level.

2.3 Segmentation evaluation diagram starting from an input image and returning a segmentation assessment value.

2.4 Simple unconstrained optimization.

2.5 The basic direct-search logic.

2.6 Four basic operations in the Simplex method.

2.7 The Simplex algorithm, with its four operations of reflection, contraction, expansion, and shrinkage.

2.8 The standard reinforcement learning model.

2.9 The segmentation parameter optimization framework.

3.1 The learning module schema of the proposed framework for adaptive image segmentation.

3.2 Proposed segmentation parameter optimization framework. Input and output are in bold font.

3.3 Training image set clustering based on image-content analysis. Input and output are in bold font.

3.4 Learning schema for algorithm selection. Input and output are in bold font.

3.5 Proposed region classifier training schema. Input and output are in bold font.


3.7 The learning module in the video segmentation task.

3.8 Adaptive figure-ground segmentation schema based on context identification and background model selection.

4.1 Limitation of the segmentation evaluation metric when the weighting terms ($w_{B_m}$ and $w_{B_f}$) are not used.

4.2 An example of a distance map from a binary contour segmentation.

4.3 Algorithm selection in a toy problem with five images and three segmentation algorithms. The values of the table correspond to the segmentation quality $\hat{E}_I^A$.

4.4 Consequence of parameter averaging in different evaluation profile cases.

4.5 An example of a parameter optimization loop. The final result (d) is not perfect since some regions are over-segmented with respect to the ground truth (b).

4.6 Region annotations with the developed graphical tool.

4.7 Example of the mapping between labelled ground truth regions and segmented regions.

4.8 Feature selection schema based on tuning of the feature extractor parameters.

4.9 Model selection schema based on tuning of the predictor parameters.

5.1 Greenhouse map showing two chapels of 128 m² each.

5.2 Example of a scanned rose leaf infested by whiteflies.

5.3 Cognitive vision system. The top part corresponds to the initial learning module and the bottom part to the automatic system for routine execution.

5.4 High-level description of a domain class (whitefly). Visual concepts are in Small Caps. Learnt fuzzy ranges are shown on the right. They are composed of four numbers, corresponding respectively to the minimum admissible value, the minimum and maximum most probable values, and the maximum admissible value.

5.5 An example from the program supervision knowledge base. A composite operator describes an alternative decomposition (denoted by a |) into two sub-operators, region- or edge-based segmentation, and a rule selects the first one if the concept to recognize (as indicated by the classification KBS) is Shape.

5.6 Four representative training images and associated ground truth segmentations used in Figures 5.7 to 5.10.

5.7 Evaluation profiles of the CSC algorithm applied on the four training images presented in Figure 5.6. $E_I^A = 0$ corresponds to the optimum.


optimum.

5.9 Different evaluation profiles of the EGBIS algorithm applied on the four training images presented in Figure 5.6. $E_I^A = 0$ corresponds to the optimum. $t$ and $\sigma$ are the two free parameters.

5.10 Different evaluation profiles of the hysteresis thresholding algorithm applied on the four training images presented in Figure 5.6. $E_I^A = 0$ corresponds to the optimum. $T_{low}$ and $T_{high}$ are the two free parameters.

5.11 Example of optimization results for img026 compared to the ground truth, with their performance scores (0 = no difference).

5.12 Convergence accuracy of the Simplex algorithm when varying the maxCalls parameter.

5.13 Convergence accuracy of the GA when varying the initial population size.

5.14 Examples of images for the two identified clusters. Left: cluster 1 (front side of the leaves); right: cluster 2 (back side of the leaves).

5.15 Performances of the region classifiers trained on the whole training set and different color features.

5.16 Performances of the region classifiers trained with the ten images of cluster 1 (light green rose leaves) and different color features.

5.17 Performances of the region classifiers trained with the ten images of cluster 2 (dark green rose leaves) and different color features.

5.18 Performances of the region classifiers trained with the whole training set and texture features.

5.19 Example of an initial over-segmented image used in method 6.

5.20 Examples of results on a test image for different segmentation configurations (1).

5.24 Example of an ambiguous image sample for ground truth estimation. The two whiteflies on the top moved during the scanning. This leads to color flickering, which does not correspond to the normal whitefly color.

5.25 Evaluation of mature whitefly counting results in early detection cases (i.e., between 0 and 5 flies per leaf). The upper graph presents the results for the system configured with trained segmentation parameters, the lower one presents the results for the system


6.1 Six frames representative of the background modelling problem.

6.2 3-D histogram of the image sequence used during the experiment (see Figure 6.1 for samples).

6.3 Pie chart of the context class distribution for the image sequence used for the experiments.

6.4 Illustration of the segmentation improvement when a dynamic selection of a background model is applied (right column).

6.5 Illustration of the temporal filtering effect on the context analysis (1). Columns are, from left to right: without context adaptation, with context adaptation, with filtered context adaptation. Rows are frames at time $t$ and $t + 1.87$ s.

6.6 Illustration of the temporal filtering effect on the context analysis (2). Columns are, from left to right: without context adaptation, with context adaptation, with filtered context adaptation.

6.7 Illustration of the shadow removal problem when the background model is not trained for such situations.

6.8 Illustration of the noise sensitivity of a poorly trained background model.

6.9 Illustration of the limitation of context adaptation and temporal filtering for quick adaptation. Columns are, from left to right: without context adaptation, with context adaptation, with filtered context adaptation. Rows are frames at time $t$, $t + 0.62$ s, $t + 3.12$ s, and $t + 7.5$ s.

6.10 Comparison between the proposed approach (left column) and the codebook model [Kim et al., 2005] and the MoG model [Stauffer and Grimson, 1999] (right column) (1).

6.11 Comparison between the proposed approach (left column) and the codebook model [Kim et al., 2005] and the MoG model [Stauffer and Grimson, 1999] (right column) (2).

6.12 Comparison between the proposed approach (left column) and the codebook model [Kim et al., 2005] and the MoG model [Stauffer and Grimson, 1999] (right column) on the sequence of Figure 6.9.

7.1 Example of a local tuning based on a priori knowledge of the scene. The tuning of the detection thresholds for pixels in $z_1$ should be less sensitive to variations than for pixels in $z_2$.

7.2 Illustration of a spatio-temporal segmentation (d) combining the results of a background subtraction algorithm (c) with a region-based algorithm (b).


according to a multi-scale color criterion, while the second uses a local color homogeneity criterion.

C.2 Illustration of the parameter tuning problem. An image is segmented with the same algorithm (based on a color homogeneity criterion) tuned with two different parameter sets.

C.3 Illustration of the problem of context variations for a video surveillance application.

D.1 Example of local parameter tuning based on a priori knowledge of the filmed scene. The detection threshold value for pixels in $z_1$ should be lower than that for pixels in $z_2$.

D.2 Illustration of a spatio-temporal segmentation (d) combining the results of a background subtraction algorithm (c) and a


Introduction

1.1 Motivations

This thesis deals with image segmentation in vision systems. Image segmentation consists in grouping pixels that share some common characteristics. In vision systems, the segmentation layer typically precedes the semantic analysis of an image. Thus, to be useful for higher-level tasks, segmentation must be adapted to the goal, i.e., able to effectively segment the objects of interest. The very first problem is that a unique general method still does not exist: depending on the application, algorithm performances vary. This is illustrated in Figure 1.1, where two different algorithms are applied to the same image. The first one seems visually more effective at separating the ladybird from the leaf. The second one produces too many regions that are not very meaningful.

Figure 1.1: An example of the segmentation of an image with two different algorithms. The first algorithm forms regions according to a multi-scale color criterion while the second uses a local color homogeneity criterion.

Basically, two popular approaches exist to set up the image segmentation task in a vision system. The first approach is to develop a new segmentation algorithm dedicated to the application task. The second approach is to empirically choose an existing algorithm, for instance by a trial-and-error procedure. The first approach leads to developing an ad hoc algorithm from scratch for each new application. The second approach guarantees neither adapted results nor robustness. So, a


choose the one best suited to a segmentation goal.

When designing a segmentation algorithm, the authors set internal parameters (e.g., thresholds or minimal region sizes) to default values. In practice, it is often up to an image processing expert to supervise the tuning of these free parameters to get meaningful results. As seen in Figure 1.2, it is not clear how to choose the best parameter set from the segmented images: the first one is quite good but several parts of the insect are missing; the second one is also good, since the insect is well outlined, but too many meaningless regions are also present. Moreover, complex interactions between free parameters make the behavior of the algorithm practically impossible to predict, and this awkward task is tedious and time-consuming. Thus, algorithm parameter tuning is a real challenge. To solve this issue, optimization methods should be investigated in order to automatically extract optimal parameters.
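To make "automatically extract optimal parameters" concrete, the sketch below keeps three pieces separate: a toy segmentation algorithm with two free parameters (`band_segment`, a hypothetical two-threshold rule), a pixel-wise error against ground truth, and a derivative-free optimizer. The optimizer here is compass search, a basic member of the direct-search family; it is not one of the Simplex, genetic, or systematic-search optimizers compared in the thesis, and it only illustrates that any of them can sit behind the same `f(params) -> error` interface. All names are assumptions.

```python
def band_segment(image, low, high):
    """Toy segmentation algorithm with two free parameters: keep pixels in [low, high]."""
    return [[low <= v <= high for v in row] for row in image]


def pixel_error(seg, ref):
    """Fraction of pixels whose foreground/background label differs from the reference."""
    total = sum(len(row) for row in ref)
    wrong = sum(s != t for srow, trow in zip(seg, ref) for s, t in zip(srow, trow))
    return wrong / total


def training_error(params, images, truths):
    """Mean error of the tuned algorithm over the training images."""
    low, high = params
    return sum(pixel_error(band_segment(img, low, high), gt)
               for img, gt in zip(images, truths)) / len(images)


def compass_search(f, x0, step=32, bounds=(0, 255)):
    """Derivative-free pattern search on integer parameters.

    Probe +/- step along each axis and keep any strict improvement; when no
    probe improves, halve the step. Stops once the step falls below 1.
    """
    lo, hi = bounds
    x, fx = list(x0), f(x0)
    while step >= 1:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                y = list(x)
                y[i] = min(hi, max(lo, y[i] + delta))
                fy = f(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
        if not improved:
            step //= 2
    return x, fx
```

On a single 2×4 toy image whose reference foreground is the band of gray values between 120 and 130, starting from `(0, 255)` the search reaches zero training error after a few moves. Like any local method, compass search can stop in a local minimum of a non-convex evaluation profile, which is one reason to compare it with global strategies.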

Figure 1.2: Illustration of the problem of algorithm parameter tuning. An image is segmented with the same algorithm (based on color homogeneity) tuned with two different parameter sets.

In real-world applications, when the context changes, so does the appearance of the images. This is particularly true for video applications, where lighting conditions vary continuously. Variations can be due to local changes (e.g., shadows, reflections) and/or global illumination changes (due to meteorological conditions), as illustrated in Figure 1.3, where images are extracted from the same scene at different hours of the day. The consequences on segmentation results can be dramatic. This context adaptation issue emphasizes the need for automatic adaptation capabilities.


1.2 Objectives

My objective is to propose a cognitive vision approach to the image segmentation problem. More precisely, we aim at introducing learning and adaptation capabilities into the segmentation task. Traditionally, explicit knowledge is used to set up this task in vision systems. This knowledge is mainly composed of image processing programs (e.g., specialized segmentation algorithms and post-processing) and of program usage knowledge to control segmentation (e.g., algorithm selection and algorithm parameter settings). To this end, three main issues of the image segmentation task in vision systems should be solved:

• The first issue is to extract optimal parameters of segmentation algorithms in order to obtain a segmentation adapted to the segmentation task, i.e., a goal-oriented segmentation. The tuning of segmentation algorithm parameters is known to be a tricky task and often requires image processing skills. So, our objective is threefold: first, we want to automate this task in order to alleviate users' effort and prevent subjective results. Second, the fitness function used to assess segmentation quality should be generic (i.e., not application-dependent). Third, no a priori knowledge of segmentation algorithm behaviors should be required; users should only have to provide ground truth data.

• Once all the algorithms have been optimized, a second issue is to select the best one. The selection strategy should be based on a quantitative evaluation of each algorithm's performance. However, when images of the application domain are highly variable, it remains nearly impossible to achieve a good segmentation with only one tuned algorithm. In this case, a selection strategy depending on image content analysis should be preferred.

• In many computer vision systems, the goal at the detection layer is to separate the object(s) of interest from the image background. When objects of interest and/or the image background are complex (e.g., composed of several sub-parts), a low-level algorithm cannot achieve a semantic segmentation, even if optimized. For this reason, a third issue is to refine the (optimized) segmentation to provide a semantically meaningful segmentation to higher-level vision modules.
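Selection driven by image content can be sketched in the following way: describe each training image by a global feature, cluster the features into contexts, and keep for each context the tuned algorithm with the lowest mean error on its member images; a new image is then routed to the nearest context. Everything in this sketch is an assumption made for illustration: the (mean, standard deviation) intensity feature, k-means with a deterministic farthest-point initialization, and the names `learn_context_selection` and `select_algorithm`. The image-content analysis actually used in the thesis is described in Section 4.3.2.

```python
import math


def image_feature(image):
    """Global descriptor of a grayscale image: (mean intensity, standard deviation)."""
    values = [v for row in image for v in row]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return (mean, std)


def nearest(point, centroids):
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, iterations=20):
    # Deterministic farthest-point initialization, then Lloyd iterations.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iterations):
        assignment = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(col) / len(col) for col in zip(*members))
    return centroids, [nearest(p, centroids) for p in points]


def learn_context_selection(images, errors, k):
    """errors[alg][i] = error of the tuned algorithm alg on training image i."""
    centroids, assignment = kmeans([image_feature(img) for img in images], k)
    choice = {}
    for j in range(k):
        members = [i for i, a in enumerate(assignment) if a == j]
        if members:
            choice[j] = min(errors, key=lambda alg: sum(errors[alg][i] for i in members) / len(members))
    return centroids, choice


def select_algorithm(image, centroids, choice):
    """Route a new image to its nearest context and return that context's algorithm."""
    feature = image_feature(image)
    context = min(choice, key=lambda j: math.dist(feature, centroids[j]))
    return choice[context]
```

With two bright and two dark training images, where a hypothetical algorithm `A` does better on the bright ones and `B` on the dark ones, a new dark image is routed to `B`. The single-algorithm strategy is the special case `k = 1`.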

Our final objective is to show the potential of our approach through two different segmentation tasks in real-world applications.

• The first segmentation task we focus on is image segmentation in a biological application related to early pest detection and counting. This implies robustly segmenting the objects of interest (mature whiteflies) from the complex background (rose leaves). Our goal is to demonstrate that the cognitive


• The second segmentation task we focus on is figure-ground segmentation in a video surveillance application. The goal is to detect moving objects (e.g., walking people) in the field of view of a fixed video camera. Detection is usually carried out using background subtraction methods. However, illumination changes make the background modeling problem difficult. Our objective is to show that a dynamic selection of the background model makes it possible to extend the scope of surveillance applications to highly variable environments.
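The dynamic selection of a background model can be sketched with one background model per context, a context detector, and a short temporal filter, so that a single ambiguous frame does not trigger a model switch. This is a deliberately simplified stand-in that assumes grayscale frames as 2-D lists, a static background image per context, a context identified from mean frame brightness, and a majority vote over the last `window` decisions. The thesis's context analysis (Chapter 6) and its background models (it compares against codebook and mixture-of-Gaussians models) are not reproduced, and the class name is hypothetical.

```python
from collections import Counter, deque


class ContextualBackgroundSubtractor:
    """Figure-ground segmentation with one background model per context."""

    def __init__(self, models, prototypes, threshold=30, window=5):
        self.models = models          # context -> background image (list of rows)
        self.prototypes = prototypes  # context -> typical mean brightness
        self.threshold = threshold
        self.history = deque(maxlen=window)  # recent raw context decisions
        self.current = None

    def identify_context(self, frame):
        values = [v for row in frame for v in row]
        mean = sum(values) / len(values)
        return min(self.prototypes, key=lambda c: abs(mean - self.prototypes[c]))

    def segment(self, frame):
        self.history.append(self.identify_context(frame))
        counts = Counter(self.history)
        # Temporal filtering: switch models only when another context outvotes
        # the active one over the window (ties keep the active context).
        if counts[self.current] < max(counts.values()):
            self.current = max(counts, key=counts.get)
        background = self.models[self.current]
        mask = [[abs(v - b) > self.threshold for v, b in zip(row, bg_row)]
                for row, bg_row in zip(frame, background)]
        return self.current, mask
```

With a window of three decisions, a day-to-night transition is only adopted on the second consecutive night frame: the filter trades a short lag for stability, which is the same tension that appears when the scene requires quick adaptation.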

1.3 Context of the Study

This work takes place in the Orion project-team at INRIA Sophia Antipolis Méditerranée, France. Orion is a leading team in scene understanding at the frontier of computer vision, knowledge-based systems, and software engineering. Orion follows a cognitive vision approach. It aims to achieve robust, resilient, and adaptable computer vision functionalities by endowing them with a cognitive faculty: the ability to learn, to adapt, to weigh alternative solutions, and to develop new strategies for detection, recognition, and interpretation.

Re- ently, Hudelot [Hudelot, 2005 ℄ proposed a ognitive vision platform for

seman-ti image interpretation. This platform is based on the ooperation of three

knowledge-basedsystemsofwhi honeisdedi atedtotheintelligentmanagement

ofimage pro essing programs. Maillot[Maillot, 2005℄hasendowed this platform

withlearningfa ilitiesandontology-basedsemanti knowledgerepresentationand

managementforobje tre ognition. Currently,thedete tionlayeroftheplatform

relyonadho segmentation. Thismeansthatallthesegmentationoperatorshave

been tuneddeepin ode on eandforall. Inthis ontext, myworkaimstoenri h

this ognitive visionplatform attheimage segmentation levelto enableadaptive

segmentation.

1.4 Contributions

My main contribution is to propose a cognitive vision approach to image segmentation by solving the issues listed above:

•

I propose a generic optimization procedure to automatically extract optimal algorithm parameters. This procedure is based on three independent components: a segmentation algorithm with one or several free parameters to tune, a performance evaluation metric, and an optimization algorithm. The segmentation quality is evaluated with regard to a reference segmentation (e.g., a manual segmentation). The performance evaluation metric is generic, has a low computational cost, and can be used for a broad range of segmentation purposes. In this way, the user's task is reduced to providing reference data: manual segmentations of training images.


representative of the problem. The first one is based on a global ranking of algorithm performance values. The second strategy is to identify different situations, called contexts, from the training image set and to associate a tuned segmentation algorithm with each context.

•

I also propose an approach to semantic image segmentation. In this approach, we consider the segmentation refinement problem as a region labelling problem. It is hence designed for region-based segmentation algorithms only. The goal is to assess the membership of each region to a predefined set of regions sharing the same label. The assessment relies on a preliminary supervised learning stage in which region classifiers are trained with training samples. The role of the user is to label the regions of the ground truth segmentations. The originality of this approach is twofold. First, we use the optimized segmentations as input to the region classifiers. Second, the sub-tasks of the learning process, namely region feature extraction, region feature selection, and classifier training, are automatically optimized in a wrapper scheme to obtain the best classification performance.
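The generic optimization procedure of the first contribution keeps its three components independent, so that any of them can be exchanged. The sketch below shows that decomposition under purely illustrative assumptions: a toy thresholding function stands for the segmentation algorithm, the Dice coefficient stands for the performance evaluation metric, and an exhaustive search over candidate values stands for the optimization algorithm. None of these choices, nor the function names, are the ones adopted in this work.

```python
def threshold_segmentation(image, t):
    """Toy segmentation algorithm with a single free parameter t."""
    return [[1 if v > t else 0 for v in row] for row in image]


def dice(segmentation, reference):
    """Pixel-wise Dice coefficient between two binary masks (1.0 when both are empty)."""
    inter = sum(s & r for s_row, r_row in zip(segmentation, reference)
                for s, r in zip(s_row, r_row))
    total = sum(map(sum, segmentation)) + sum(map(sum, reference))
    return 1.0 if total == 0 else 2.0 * inter / total


def optimize_parameter(algorithm, metric, images, references, candidates):
    """Return the candidate maximizing the mean metric over the training set
    (exhaustive search; ties are resolved in favour of the first candidate)."""
    def mean_score(p):
        scores = [metric(algorithm(img, p), ref) for img, ref in zip(images, references)]
        return sum(scores) / len(scores)
    return max(candidates, key=mean_score)


# Usage: one training image with its manual (reference) segmentation.
images = [[[10, 200], [30, 180]]]
references = [[[0, 1], [0, 1]]]
best_t = optimize_parameter(threshold_segmentation, dice, images, references, range(0, 256, 10))
print(best_t)  # 30
```

Because the optimizer only sees a mapping from parameter values to a score, the same loop would accept another metric or a smarter search strategy without any change to the segmentation code.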

In the scope of the two previously described segmentation tasks, my contributions are the following:

•

For the segmentation task in the biological application, the proposed adaptive segmentation approach outperforms the ad hoc segmentation in terms of segmentation quality and thus allows the system to count the pests with better precision.

•

For the figure-ground segmentation task, my main contribution lies at the context modeling level. By achieving dynamic background model selection based on context analysis, my approach makes it possible to extend the scope of surveillance applications to highly variable environments.

Each step of the proposed approach is tested and evaluated on several image datasets. This helps us to show the strengths and the limitations of the approach in terms of performance, computational cost, and sensitivity to key parameters.

1.5 Outline

This manuscript is structured as follows. Chapter 2 introduces the reader to image segmentation in the context of computer vision systems. We propose an overview of four topics closely related to our problem: image segmentation in computer vision systems, segmentation approaches, performance evaluation, and segmentation optimization. Chapter 3 introduces the proposed approach and gives our objectives and assumptions for the different segmentation issues. Chapter 4 details each step of our approach: algorithm parameter optimization, algorithm


the segmentation step of a cognitive vision system dedicated to the recognition of biological organisms. In Chapter 6, we present how our approach can be used for adaptive figure-ground segmentation in video surveillance applications.


State of the Art

2.1 The Place of the Image Segmentation Task in Vision Systems

At the beginning of the eighties, Marr [Marr, 1982] proposed a theory of human visual perception. This theory is the first complete methodology for the design of information processing systems. He suggested three levels of abstraction for the analysis of such complex systems:

The computational level: it describes the goal of the system. It is more abstract than the next two levels and specifies all the informational constraints necessary to map the input data into the desired output.

The algorithmic level: it states how the computational theory can be carried out in terms of methods. It is related to the specification of algorithms with their input and output representations.

The implementational level: it describes how an algorithm is embodied as a physical process. It has the lowest description level, e.g., the hardware implementation and the software code.

An important characteristic of this reconstructive approach to vision is that the number of possible solutions increases as the abstraction level decreases. For example, there are several algorithms to solve the computational task of edge detection, and there are many possible ways to implement each of them.

Inspired by Marr's theoretical framework, most existing artificial visual recognition systems, called vision systems, follow the paradigm depicted in Figure 2.1. An image is first preprocessed in order to highlight the information that is important for the next stages. Classically, this stage refers to the segmentation task. Then, the descriptor mapping module encodes the remaining low-level data into a symbolic form more appropriate for the recognition and analysis stage, which finally identifies the image content.


Figure 2.1: The three stages of visual processing usually found in vision systems.

of the whole system. Thus, great attention has been directed to the problem of segmentation. Hundreds of publications in this field appear every year, each trying to find an optimal solution for one specific application or for general purposes. However, a unified, generally accepted definition of image segmentation does not yet exist. Most authors agree on the following facts about segmentation:

•

its task is to partition the image into several segments or regions (this point will be developed in Section 2.2.1);

•

it is an early processing stage in computer vision systems. Within the computational model for computer vision (Figure 2.1), it belongs to the preprocessing module;

•

it is one of the most critical tasks in automatic image analysis.

2.1.1 Knowledge-Based Approaches

Early approaches in vision systems use explicit knowledge to define the segmentation task. In [Nazif and Levine, 1984], an expert system for low-level image segmentation is proposed. The system is based on hundreds of production rules that manipulate combinations of regions and lines obtained from two basic segmentation algorithms. Another example can be found in the SIGMA system [Matsuyama and Hwang, 1990], which uses a low-level vision expert module dedicated to segmentation and feature extraction tasks for aerial image understanding. One weakness of these systems is their application dependency. The knowledge acquisition necessary to build the rules is also a major problem.

Researchers then tried to conceive more versatile systems by incorporating verification and knowledge acquisition components. In [Ossola, 1996], an approach based on the cooperation of two knowledge-based systems (KBS) is presented. Program supervision techniques [Moisan and Thonnat, 1995] are used to process images in an intelligent way, e.g., to dynamically set up the segmentation task with respect to variable conditions. A general program supervision architecture contains three main parts: a library of programs, a knowledge base, and a reasoning engine. The reasoning engine is in charge of selecting and scheduling the programs of the library that best satisfy a user query. The engine iterates the following loop, composed of four steps, until a satisfactory solution is reached:


some parameters). The knowledge base contains a declarative representation (i.e., frames and production rules) of the programs, called operators. These operators are hierarchically organized in several levels of abstraction and can be either primitive or composite (i.e., a combination of several primitives). We can cite the OCAPI environment [Clément and Thonnat, 1993] as a general tool for the development of KBS dedicated to the supervision of programs. The strength of the program supervision architecture is the ability to reuse programs for various applications, as demonstrated in [Crubézy, 1999] for the supervision of medical imagery programs or in [Thonnat, 2002] for the recognition of complex objects.

A related approach for the automatic generation of image processing applications, called BORG, can be found in [Clouard et al., 1999]. In contrast to the program supervision approach, the system uses hierarchical and opportunistic behavior in order to construct a solution plan. A plan is represented by an action graph of five fixed levels: requests, tasks, functionalities, procedures, and operators. Each level corresponds to a more or less coarse version of the solution. The system dynamically constructs a parametrized plan from an initial user query. A drawback of this approach is that the action graph is constrained to a fixed number of levels, which are supposed to cover the whole solution space; this limits the flexibility for modeling a problem.

One advantage of knowledge-based approaches is their semantic richness, which enables user-friendly interaction with end-users. Nevertheless, they are application dependent and thus require strong domain expertise to build the knowledge bases: they are therefore limited to a closed world.

2.1.2 Learning Approaches

This section deals with the use of decision theory as a basis for intelligent image processing. The main idea is to reduce as much as possible the role of human expertise in the building of vision systems by means of machine learning techniques. This principle was introduced by Draper [Draper et al., 1996], who argues that KBS are too ad hoc and too dependent on human expertise during their design. Indeed, explicit knowledge is not well suited for modeling the variability, the changes, and the complexity of the world.

Case-Based Reasoning (CBR) is a problem-solving approach that solves new problems by adapting previously successful solutions to similar problems. In particular, the case-based approach has been used for algorithm parameter learning. Some interesting works can be found in [Ficet-Cauchard et al., 1999] and [Frucci et al., 2007]. A case contains an image, contextual information (such as image acquisition information), and algorithm parameters. The best segmentation for the current image is found by retrieving similar cases in the case base. Similarity is computed using both non-image and image information. The evaluation is done by a measure of dissimilarity between the original image and the segmented image. If the evaluation is bad, the learning module is activated to


representation of cases is an application-dependent problem.

In [Peng and Bhanu, 1998], an adaptive integrated image segmentation and object recognition system is proposed and applied to recognize cars in outdoor imagery. The authors stress the importance of adapting the segmentation to real-world changes in order to improve the interpretation process. They propose to use the model matching confidence degree as feedback to influence the segmentation process. A team of stochastic learning automata is used to represent both global and local image segmentation. Reinforcement learning is applied to close the loop between model matching and image segmentation. The main advantage of reinforcement learning is that it only requires knowledge of the goodness of the system performance rather than details of the algorithm. As a consequence, their method is independent of any segmentation algorithm but dependent on the recognition algorithm.

2.1.3 Towards Cognitive Vision

From the previously described approaches, two open problems still remain: first, the knowledge acquisition bottleneck when a large amount of knowledge is needed and, second, the lack of robustness when faced with varying conditions. Thus, classical vision systems are often brittle. To overcome this brittleness, a new discipline called cognitive vision has recently emerged; a research roadmap can be found in [ECVISION, 2005]. A cognitive vision system is defined by its ability to reason from a priori knowledge, to learn from perceptual information, and to adapt its strategy to different problems. This new discipline thus involves several existing related ones (computer vision, pattern recognition, artificial intelligence, cognitive science, etc.). Some systems have started to implement cognitive vision ideas, mainly for human behavior recognition, relying on different technologies. For example, in [Vincze et al., 2006], a cognitive system combining low-level image components and high-level activity reasoning processes has been developed to recognize human activities. This system integrates various techniques such as connectionism, Bayesian networks, component frameworks, and robotics. A cognitive vision platform has been proposed in [Hudelot and Thonnat, 2003] for the recognition of complex natural objects in images with reusable components. The authors propose an original distributed architecture based on three KBS for the interpretation, the anchoring, and the image processing levels. Concerning the image processing KBS, they propose an image processing ontology which is application independent but dependent on the data structures of a library of programs. Program supervision techniques are used to manage the knowledge of programs. Finally, in their conclusion, they stress the need to integrate machine learning


2.1.4 Discussion

We have presented the segmentation task through computer vision approaches. We have seen that segmentation is a crucial task that demands strong efforts from vision system designers to build complex and exhaustive knowledge bases. However, KBS are not unanimously accepted by the computer vision research community. As Draper said [Draper et al., 1996], we must avoid building ad hoc systems based on closed-world assumptions. Even though program supervision techniques are increasingly used to enable the control and reuse of vision algorithms, they still fail to adapt themselves to unknown situations. The cognitive vision approach has recently been introduced to achieve more robust, reusable, and adaptable computer vision systems. This approach aims mainly at endowing vision systems with learning and adaptability facilities. In this context, the segmentation task raises several challenges: starting from a generic solution (e.g., from a default parametrization), algorithms can be dynamically tuned by means of learning techniques to reach the specific goal defined by the user.

To fully understand the segmentation problem, a first and essential task is to draw up a state of the art of existing approaches. This is the role of the next section.

2.2 Segmentation Approaches

Many segmentation methods are based on two basic properties of the pixels in relation to their local neighborhood: discontinuity and similarity. Methods based on some discontinuity property of the pixels are called boundary-based methods, whereas methods based on some similarity property are called region-based methods. Before these methods can be properly described, some fundamental concepts have to be specified.

2.2.1 Definition of Image Segmentation

Image segmentation can be formalized through its region-based definition as follows:

Definition 1 (Image region) An image region $R$ is a non-empty subset of the image $I$, such that $R \subseteq I,\ R \neq \emptyset$.

A region does not need to be topologically connected: the existence of an unbroken path inside the region from one region element (i.e., a pixel) to any other one is not required.

Definition 2 (Image partition) A partition of $I$ is a set of $n$ regions $R_i$, $i = 1, \dots, n$, such that $\bigcup_{i=1}^{n} R_i = I$ and $R_i \cap R_j = \emptyset,\ \forall i \neq j$.


Definition 3 (Image segmentation) For a certain defined homogeneity predicate $H$, a segmentation $S$ of $I$ is a partition of $I$ which satisfies $H(R_i) = 1,\ \forall i$, and $H(R_i \cup R_j) = 0$ for $R_i$ and $R_j$ adjacent, $i \neq j$.

The first condition states that each region has to be homogeneous with respect to the predicate $H$. The second condition states that two adjacent regions cannot be merged into a single region that satisfies the predicate $H$.
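Definitions 1 to 3 can be checked mechanically when a segmentation is stored as a label map, i.e., one region label per pixel. A label map is a partition by construction (each pixel belongs to exactly one region, and every label that occurs defines a non-empty region, connected or not), so only the two conditions of Definition 3 need testing. The sketch assumes a gray-level image, 4-adjacency between regions, and, as a hypothetical example of $H$, an intensity-range test with tolerance tol; Definition 3 itself leaves $H$ open, and all function names are illustrative.

```python
from itertools import product


def regions_of(labels):
    """Group pixel coordinates by label: each label defines one region R_i."""
    regions = {}
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            regions.setdefault(lab, set()).add((y, x))
    return regions


def homogeneous(image, region, tol):
    """Example predicate H: the intensity range inside the region is at most tol."""
    values = [image[y][x] for y, x in region]
    return max(values) - min(values) <= tol


def adjacent_pairs(labels):
    """Pairs of distinct labels that share a 4-connected border."""
    pairs = set()
    h, w = len(labels), len(labels[0])
    for y, x in product(range(h), range(w)):
        for dy, dx in ((0, 1), (1, 0)):
            ny, nx = y + dy, x + dx
            if ny < h and nx < w and labels[y][x] != labels[ny][nx]:
                pairs.add(tuple(sorted((labels[y][x], labels[ny][nx]))))
    return pairs


def is_segmentation(image, labels, tol):
    """Check both conditions of Definition 3 for a label map."""
    regions = regions_of(labels)
    if not all(homogeneous(image, r, tol) for r in regions.values()):
        return False  # first condition violated: some H(R_i) = 0
    # Second condition: H(R_i U R_j) = 0 for every pair of adjacent regions.
    return all(not homogeneous(image, regions[a] | regions[b], tol)
               for a, b in adjacent_pairs(labels))


image = [[10, 12, 200], [11, 13, 205]]
print(is_segmentation(image, [[0, 0, 1], [0, 0, 1]], tol=5))  # True
print(is_segmentation(image, [[0, 1, 2], [0, 1, 2]], tol=5))  # False: regions 0 and 1 could merge
```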

The nature of the predicate $H$ is the key element of the definition of segmentation. It can be based only on pixel values, or it can judge the high-level relevance of the partition. Since the solution is not unique, segmentation is an ill-posed problem in the sense of Hadamard. One way to address the problem is then to define the segmentation, i.e., to define a predicate $H$, for each level of abstraction. Figure 2.2 depicts possible segmentation results at each level of Marr's computational model. At the image-based level, pixels are grouped according to their feature values (e.g., their gray value). The surface-based level detects surfaces, but not objects; for example, the background keeps its patches. The object-based level detects one region per object.

Figure 2.2: Ideal segmentation results at different levels of Marr's vision computational model. From left to right: original image, image-based level, surface-based level, and object-based level.

2.2.2 Static Image Segmentation

Several surveys of segmentation techniques have been published. Three of them [Pal and Pal, 1993, Skarbek and Koschan, 1994, Lucchese and Mitra, 2001] review about 300 publications, giving a fair overview of the state of the art in segmentation at the image-based processing level. Pal and Pal [Pal and Pal, 1993] mainly evaluate algorithms for gray-valued images and introduce three of the first attempts to exploit color information.

Skarbek and Koschan [Skarbek and Koschan, 1994] concentrate their survey on color image segmentation. They classify the algorithms according to the underlying concepts of the homogeneity predicate $H$ and identify four categories: pixel-based, area-based, edge-based, and physics-based approaches. Pixel-based approaches consider a region as homogeneous if the features of its elements belong to the same cluster in the feature space. Area-based techniques define a


edge-based group defines regions as those sets of pixels delimited by inhomogeneities or discontinuities. This is the complementary concept to area-based segmentation. Physics-based methods include knowledge about the physical properties of the image formation process to improve the detection of regions corresponding to object surfaces. In the current work, physics-based methods are categorized as surface-based techniques. They do not belong to the image-based stage, since additional knowledge about the physical properties of object surfaces cannot be regarded as part of a low-level homogeneity predicate, but rather as external higher-level information about the analyzed scene.

Lucchese and Mitra [Lucchese and Mitra, 2001] also review exclusively color segmentation approaches and use a similar categorization: feature-space based, image-domain based, and physics-based techniques. Combining area-based and edge-based methods into one image-domain class makes more sense nowadays, since many modern approaches try to satisfy both concepts simultaneously.

2.2.2.1 Feature-Space Based Approaches

Feature-space approaches generally neglect spatial relationships between image pixels and exclusively analyze the configuration of their feature values. Algorithms in this category delimit sections in the feature space and assign the same region label to all image pixels falling into the same section. Two principles are common. The first one finds sections by detecting peaks in unidimensional or multidimensional feature histograms. The second one uses traditional clustering algorithms.

Histogram thresholding

Historically, histogram thresholding is one of the first techniques used for segmenting images. Gray-level image histograms can commonly be decomposed into peaks and valleys, which characterize objects and backgrounds. A good survey of these techniques can be found in [Sahoo et al., 1988]. Early methods for color segmentation work with several one-dimensional histograms, which implies that the correlation between different dimensions is ignored. More recent algorithms work in two- or three-dimensional color spaces and are characterized by different techniques to robustly detect peaks and their corresponding boundaries in the feature space. The choice of the color representation often plays a major part. An additional problem of this approach is the usually required smoothing of the feature space in order to keep the size of data structures tractable. Many algorithms search for peaks by approximating the histograms with a mixture of Gaussians, and fail if this assumption does not hold (which, in real images, is almost always the case).
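As an illustration of the thresholding principle on a gray-level histogram, the sketch below implements Otsu's method, a classical technique chosen here only as an example: it selects the threshold that maximizes the between-class variance of the two resulting pixel classes, which typically places it between the two peaks of a bimodal histogram. Integer gray levels in [0, levels) are assumed, and the function name is hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return t maximizing the between-class variance; class 0 holds values <= t."""
    hist = [0] * levels
    for v in pixels:
        hist[v] += 1
    total = len(pixels)
    sum_all = sum(i * count for i, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]              # number of pixels with value <= t
        sum0 += t * hist[t]
        w1 = total - w0            # number of pixels with value > t
        if w0 == 0 or w1 == 0:
            continue               # one class is empty: not a valid split
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance up to the constant factor 1 / total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


pixels = [10] * 50 + [200] * 50    # a perfectly bimodal toy histogram
t = otsu_threshold(pixels)
binary = [1 if v > t else 0 for v in pixels]
print(t)  # 10
```

A single threshold splits the pixels into exactly two classes, which is why such a scheme on its own yields a binary segmentation.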

Clustering techniques

Clustering approaches can be interpreted as unsupervised classification methods.


of the original clustering methods is that the number of clusters ($k$) must be known a priori. Several heuristics have been suggested to compute $k$ automatically based on some image statistics. A well-known clustering-based segmentation algorithm is the mean shift approach [Comaniciu and Meer, 2002], which introduces a method to automatically detect different bandwidths from the data for each section of the feature space. The major drawback of this concept is its computational cost compared to simple $k$-means approaches. A generalization of the $k$-means algorithm to color images that includes spatial constraints is introduced in [Chang et al., 1994]. This algorithm considers segmentation as a maximum a posteriori probability estimation problem. The algorithm starts with global estimates and progressively adapts the cluster centers to the local characteristics of each region.
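The clustering principle, and its main limitation, can be seen in a minimal $k$-means sketch on gray values: $k$ is an input that must be chosen beforehand, and pixels are grouped by feature value only, so their spatial arrangement plays no role. The deterministic quantile initialization, the fixed number of iterations, and the function names are assumptions made for this example.

```python
def kmeans_1d(values, k, iters=20):
    """Lloyd's k-means on scalar features; centers start at evenly spaced quantiles."""
    ordered = sorted(values)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers


def assign_labels(image, centers):
    """Label each pixel with the index of its nearest cluster center."""
    return [[min(range(len(centers)), key=lambda c: abs(v - centers[c])) for v in row]
            for row in image]


image = [[10, 12, 200], [11, 205, 198]]
centers = kmeans_1d([v for row in image for v in row], k=2)
print(centers)                        # [11.0, 201.0]
print(assign_labels(image, centers))  # [[0, 0, 1], [0, 1, 1]]
```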

2.2.2.2 Image-Domain Based Approaches

Another way to cope with the image-based segmentation problem is to compare the feature values of each pixel in the image domain, i.e., pixels are compared within predefined spatial neighborhoods. Two major groups of algorithms can be identified: the first one defines regions through the feature similarity between their elements (area-based approaches). The second one identifies feature discontinuities as boundaries between homogeneous regions (edge-based approaches). Many modern segmentation strategies try to satisfy both concepts simultaneously [Munoz et al., 2003].

Region Growing techniques

Traditional area-based techniques use one of two principles: region growing or split-and-merge. Region growing methods assume the existence of some seed points, to which adjacent pixels are added if they fulfill a homogeneity criterion. An extensive review is given in [Fan et al., 2005]. The main advantage of these methods is the creation of spatially connected and compact regions, which contrasts with the usually noisy image partition obtained with pure feature-based segmentation approaches. They are frequently applied to separate one single homogeneous object (e.g., the background) from the rest of the image, but by using several seeds positioned on different objects, it is also possible to perform more sophisticated segmentations. The required seed selection is a subtask of this approach, which can be solved by taking advantage of clustering methods or morphological operations, among others.
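A minimal sketch of seeded region growing, assuming a gray-level image stored as lists of rows, 4-connectivity, and a homogeneity criterion that compares each candidate pixel with the seed value within a tolerance tol; the criterion and the function name are illustrative choices.

```python
from collections import deque


def region_grow(image, seed, tol):
    """Grow a 4-connected region from seed, adding neighbours whose value differs
    from the seed value by at most tol (the homogeneity criterion)."""
    h, w = len(image), len(image[0])
    sy, sx = seed
    ref = image[sy][sx]
    mask = [[0] * w for _ in range(h)]
    mask[sy][sx] = 1
    queue = deque([seed])
    while queue:  # breadth-first traversal of the growing region
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < h and 0 <= nx < w and not mask[ny][nx]
                    and abs(image[ny][nx] - ref) <= tol):
                mask[ny][nx] = 1
                queue.append((ny, nx))
    return mask


image = [[10, 12, 200],
         [11, 13, 205],
         [90, 14, 12]]
print(region_grow(image, (0, 0), tol=5))  # [[1, 1, 0], [1, 1, 0], [0, 1, 1]]
```

Comparing against the fixed seed value keeps the result independent of the visiting order; criteria that compare with a running region mean are also common, but then the result depends on the order in which pixels are examined.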

Split-and-Merge techniques

Split-and-merge algorithms proceed by successively dividing an image into smaller non-overlapping regions as long as some similarity criterion is not met. A common data structure used to implement this procedure is the quadtree


triangulations are also employed as an alternative to the rigid rectilinear nature of the quadtree structure. The end result of the splitting is an over-segmented image. A merging procedure is then applied to join neighboring regions under the same homogeneity predicate that was used for splitting. The comparison between adjacent regions can use simple statistics or can be based on more elaborate mathematical models, like Markov Random Fields (MRF), which also permit merging regions of similar texture [Panjwani and Healey, 1995].
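The split phase can be sketched as a recursive quadtree decomposition, assuming a square gray-level image whose side is a power of two and, as a hypothetical homogeneity predicate, an intensity range of at most tol inside a block. Only splitting is shown; the merging step would then join adjacent leaves whose union still satisfies the same predicate.

```python
def split(image, y, x, size, tol, leaves):
    """Recursively split a square block into four quadrants until it is homogeneous
    (intensity range <= tol) or reduced to a single pixel; collect the leaf blocks."""
    values = [image[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    if size == 1 or max(values) - min(values) <= tol:
        leaves.append((y, x, size))  # (top row, left column, side length)
        return
    half = size // 2
    for dy, dx in ((0, 0), (0, half), (half, 0), (half, half)):
        split(image, y + dy, x + dx, half, tol, leaves)


def quadtree_leaves(image, tol):
    """Split phase of split-and-merge on a square image with a power-of-two side."""
    leaves = []
    split(image, 0, 0, len(image), tol, leaves)
    return leaves


image = [[10, 10, 200, 200] for _ in range(4)]
print(quadtree_leaves(image, tol=5))  # [(0, 0, 2), (0, 2, 2), (2, 0, 2), (2, 2, 2)]
```

In this example the two left leaves (and the two right ones) are separate blocks only because of the rigid quadrant geometry; a merging step would reunite them, which is exactly the over-segmentation the text describes.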

Edge-based techniques

Edges are discontinuities in the feature characteristics (e.g., intensity) of adjacent pixels. The magnitude of the gradient of a gray-valued image has typically been employed, since it is a relatively robust form of edgeness representation. Its approximation for discrete digital images has been analyzed in detail in the past. Most methods involve the use of well-known convolution kernels, like the Roberts, Robinson, Prewitt, Kirsch, or Sobel operators. The detection of edge pixels is just the first stage of any edge-based segmentation approach. Further processing is necessary in order to provide a valid segmentation as stated by Definition 3. Since standard detectors like Canny's [Canny, 1986] or SUSAN [Smith and Brady, 1997] usually leave some gaps between object boundaries, some mechanisms are required to fill them appropriately. Recently, a new generation of edge detectors based on the Earth Mover's Distance has been proposed [Ruzon and Tomasi, 2001]. They show better performance due to their capability to detect junctions and corners. However, their computational cost is very high compared to traditional techniques. A classified and comparative study of edge detection algorithms can be found in [Sharifi et al., 2002].
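As an illustration of the first, edge-pixel detection stage, the sketch below computes the gradient magnitude with the 3x3 Sobel kernels and thresholds it. It assumes a gray-level image stored as lists of rows and leaves border pixels at zero; the threshold value and the function names are hypothetical. No edge linking is performed, so the output is a set of edge pixels, not a segmentation in the sense of Definition 3.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(image):
    """Gradient magnitude with 3x3 Sobel kernels; border pixels are left at 0.
    Kernels are applied as a correlation: flipping them (true convolution) only
    changes the sign of gx and gy, not the magnitude."""
    h, w = len(image), len(image[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * image[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            mag[y][x] = math.hypot(gx, gy)
    return mag


def edge_pixels(image, threshold):
    """Binary edge map: 1 where the gradient magnitude exceeds threshold."""
    return [[1 if m > threshold else 0 for m in row] for row in sobel_magnitude(image)]


image = [[0, 0, 0, 100, 100] for _ in range(5)]  # vertical step edge
print(edge_pixels(image, 100)[2])  # [0, 0, 1, 1, 0]
```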

Morphological watershed segmentations [Vincent and Soille, 1991] can also be categorized as an edge-based approach. They work on a topographical edgeness map, where the probability of a pixel being an edge is modeled by its altitude. A flooding step then fills the valleys with water. The watershed lines are detected where the waters of two different valleys meet. The principal advantage of the watershed segmentation scheme over other edge-based techniques is that it generates closed boundaries. The regions defined by the closed boundaries represent an over-segmentation of the image, since the algorithm is sensitive to noise. If the gradients are computed at successively higher scales, the number of local minima (i.e., flood basins) in the gradient magnitude image decreases. The available techniques work on gray-valued images, usually obtained as the gradient of the intensity.

Active contour models, also known as snakes, are another family of edge-based algorithms [Kass et al., 1988, Ronfard, 1994]. An interesting and powerful property of an active contour model is its ability to find subjective contours and to interpolate across gaps in edge chains. An active contour model represents an object boundary or some other salient one-dimensional image feature as a parametric


energy minimization problem with the intention that it yields a local minimum of the associated energy functional. The original model incorporates two internal energy terms related to contour smoothness and regularity. Active contour models are well adapted to segmenting objects in noisy images, but they require a priori knowledge of the object shapes. Good illustrations of such algorithms are frequently found in medical applications, such as in [Jehan-Besson et al., 2004], and in optical flow segmentation, as in [Herbulot et al., 2006].

Hybrid Approaches

All previous methods have intrinsic drawbacks that can be partially compensated by combining different techniques. For instance, clustering methods detect homogeneous regions in the feature space. However, since spatial relationships are ignored, the region boundaries in the image domain are highly irregular. In recent years, numerous techniques for integrating region and boundary information have been proposed. A detailed review of techniques to combine area-based and edge-based approaches can be found in [Munoz et al., 2003]. One of the main features of hybrid approaches is the timing of the integration: embedded in the region detection, or after both processes are completed. The most common way to perform integration in the embedded strategy consists of incorporating edge information. Region growing and split-and-merge are the typical region-based segmentation algorithms [Zugaj and Lattuati, 1998]. Post-processing integration is based on fusing results from single segmentation methods, attempting to combine the map of regions (generally with thick and inaccurate boundaries) and the map of edge outputs (generally with fine and sharp lines, but dislocated) with the aim of providing an accurate and meaningful segmentation. Another example of a hybrid approach can be found in [Chen and Wang, 2004], which combines color- and texture-based segmentations using border refinement.

2.2.2.3 Object-Based Approaches

While image-based segmentation has been addressed with relative success, the challenge of aggregating pixels into segments that represent meaningful parts of objects is much more difficult. In fact, segmentation is also closely related to the problem of extracting objects from images. One of the oldest approaches to object-based segmentation is template matching. The idea of template matching is to create a model of an object of interest (the template, or kernel) and then to search the image of interest for objects that match the template. The simplest methods, based on correlation or comparable matching operators, can only determine the position of the template. The main difficulty of this technique stems from the large variability in the shape and appearance of objects within a given class. Consequently, the segmentation may not accurately delineate the object's boundary.


The proposed approach relies on patches learnt from training image samples and on a bottom-up process used to derive a segmentation graph. Partial templates are used to detect object parts of a given class (horses in the experiment) by matching them to the segmentation graph, even though the global appearance of the objects in the test images differs slightly from the learnt material. The methods become more complex and time-consuming if further parameters like orientation or scale need to be estimated. Since the number of objects and their orientation in an image are unknown in the current application, the search space for matching approaches becomes intractable.
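The simplest form of template matching can be sketched as an exhaustive search that returns only a position, assuming gray-level images stored as lists of rows and the sum of squared differences as the matching operator (used here as a stand-in for correlation); the function name is hypothetical. Searching over orientation or scale as well would multiply the number of candidate windows, which is the growth of the search space discussed above.

```python
def match_template(image, template):
    """Exhaustive search: return (y, x) of the top-left corner minimizing the
    sum of squared differences between the template and the image window."""
    H, W = len(image), len(image[0])
    h, w = len(template), len(template[0])
    best, best_pos = None, None
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            ssd = sum((image[y + j][x + i] - template[j][i]) ** 2
                      for j in range(h) for i in range(w))
            if best is None or ssd < best:
                best, best_pos = ssd, (y, x)
    return best_pos


image = [[0] * 5 for _ in range(4)]
image[1][2], image[1][3], image[2][2], image[2][3] = 9, 8, 7, 6
print(match_template(image, [[9, 8], [7, 6]]))  # (1, 2)
```

The result is only a location: the search says where the template fits best, not where the object's boundary lies, which matches the limitation noted for the simplest matching methods.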

In [Schnitman et al., 2006], an approach inducing semantic segmentation from examples is described. The authors argue that determining whether an entity belongs to a particular semantic part is more easily done at the fragment level than on a pixel-by-pixel basis. Starting from an example, patch sets representing a collection of homogeneous fragments are built. Then, a test image is first over-segmented and the labelling of each fragment is induced from the minimization of a global labelling cost. They apply the graph-cuts multi-label optimization technique to find the globally optimal labelling. Since this example-based approach uses a non-parametric model of the object's parts, they assume that the learnt fragment-label pairs are representative of the possible image variations, i.e., illumination, resolution, and scale characteristics. Finally, they concede that their approach is only appropriate for images depicting closely similar scenes. A similar approach is described in [He et al., 2006], where a probabilistic model assigns labels to each region of an over-segmented image based on local, global, and pairwise features. As stated by the authors, the accuracy of their model is limited by its reliance on training data and by the amount of training data available.

2.2.2.4 Summary

In this section, we have presented a didactic survey of image segmentation techniques. The goal of this review was to familiarize the reader with classical techniques rather than to give an extended review of all existing algorithms. To give an overall view, a summary is drawn up in Table 2.1, inspired by the one presented in [Alvarado Moya, 2004a].

Finally, we can conclude this study with some important remarks, closely akin to the conclusions of [Skarbek and Koschan, 1994] in their survey:

1. General-purpose algorithms are not robust and are usually not algorithmically efficient.

2. All techniques depend on parameters, constants, and thresholds, which are usually fixed on the basis of few experiments. Tuning and adapting parameters is rarely performed.

3. As a rule, authors do not compare their novel ideas with existing ones.


Feature space
  + Detection of homogeneity in a global context.
  − Spatial relationships between pixels are ignored.
  Histogram
    + Multiple 1D histogram methods are computationally inexpensive.
    − Noise sensitive.
    − 1D approaches ignore the correlation between different feature-space dimensions.
    − Models used to fit histograms (e.g., multi-Gaussians) usually do not correctly match the real distributions.
    − Limited to binary segmentation problems.
  Clustering
    + Simultaneous consideration of all dimensions of the feature space.
    + Suitable for color and texture segmentation.
    + Relatively efficient algorithms exist.
    − Size or number of clusters must be known a priori.

Image domain
  + Produce smoother and more accurate region boundaries than feature-space based approaches.
  − Edge detectors fall into the edge-linking problem.
  Area-based
    + Creation of connected, compact regions.
    + Fast algorithms available.
    − Tuning key parameters can be a tricky task.
    Region growing
      + Suitable for segmentation of complex objects having a homogeneous background.
      − Prior information on the optimal number and position of seeds may be needed.
      − Result depends on the order in which pixels are examined.
    Split & merge
      + Fast and flexible implementation.
      − Traditional tessellation mechanisms produce overly coarse spatial quantization artifacts.
  Edge-based
    Edge detectors
      + Accurate local discontinuity detection.
      − Sensitive to noise and parameter changes.
    Watershed
      + Detection of closed contours.
      − Image is often over-segmented.
    Snakes
      + Robust to noise.
      − Difficult automatic initialization of the contour.

Hybrid
  + A combination of several methods can be appropriately adapted to the needs of each application.
  − High computational cost.

Object-based
  + Combine top-down and bottom-up approaches to achieve semantically meaningful segmentation.
  − Robustness relies on the learning capacities from examples or patches.
  − Applicability is restricted by appearance constraints on objects, such as shape and scale.