(1)

HAL Id: tel-00005685
https://pastel.archives-ouvertes.fr/tel-00005685
Submitted on 5 Apr 2004

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Contribution à la vérification multi-modale de l'identité en utilisant la fusion de décisions

Patrick Verlinde

To cite this version:
Patrick Verlinde. Contribution à la vérification multi-modale de l'identité en utilisant la fusion de décisions. Interface homme-machine [cs.HC]. Télécom ParisTech, 1999. Français. ⟨tel-00005685⟩


46, rue Barrault
75634 Paris Cedex 13
France

A CONTRIBUTION TO MULTI-MODAL IDENTITY VERIFICATION USING DECISION FUSION

by
Patrick Verlinde

Dissertation submitted to obtain the degree of
Docteur de l'École Nationale Supérieure des Télécommunications
Spécialité: Signal et Images

Composition of the thesis committee:
Jean-Paul Haton (LORIA) - President
Gérard Chollet (ENST) - Director
Marc Acheroy (RMA) - Reporter
Isabelle Bloch (ENST) - Reporter
Paul Delogne (UCL) - Examiner
Josef Kittler (UOS) - Examiner


my twin daughters


In the first place I wish to thank my thesis director dr. Gérard Chollet from CNRS/URA 820 (FR), for his driving force, for his critical and very useful advice, for having involved my research in every suitable project he could find, and for the huge amounts of information he provided me with.

I also would like to thank prof. dr. ir. Jean-Paul Haton from LORIA (FR) for the useful advice he gave me and for having honored me by accepting to be the president of my thesis committee.

Special thanks go to prof. dr. ir. Marc Acheroy, head of the electrical engineering department of the RMA and director of the Signal and Image Centre (SIC), for believing in me, for having supported and motivated me all the time, for his continuous flow of advice, and, last but not least, for having accepted to be a reporter for this thesis.

I also want to express my sincere gratitude towards prof. dr. ir. Isabelle Bloch from ENST/TSI (FR), not only for helping me in a very friendly way to effectively control my "uncertainties", but also for having accepted to be a reporter for my work.

I am very proud to have prof. dr. ir. Josef Kittler from the University of Surrey (UK) in my thesis committee, and I would like to thank him especially for his helpful comments with respect to the statistical aspects of this study.

Thank you prof. dr. ir. Paul Delogne from UCL/TELE (BE), for your many "personalized" comments which I'm sure have improved the contents as well as the readability of this work, and for having accepted to be a [...]

[...] Military Academy (RMA), for having granted me the time and the means to finish this thesis, and for having accepted to be a member of my thesis committee.

Thanks to ir. Charles Beumier from RMA/SIC (BE) for his help in the field of machine vision in general and in the framework of the M2VTS project in particular, and to dr. ir. Stéphane Pigeon from RMA/SIC (BE) for his critical remarks in the general field of fusion, for his help in the design of the experimental protocols, and for writing the software for the NIST99 evaluation and for the evaluation of the mixture of experts.

Thanks to dr. ir. Jan Černocký from VUT (CZ) for his hospitality during the NIST99 evaluations and for his help in generating the data for the experiments involving the mixture of experts.

Thanks to dr. ir. Gilbert Maître, formerly from the IDIAP vision group (CH), for his very positive help in the design of experimental protocols and in the field of machine vision; to dr. ir. Eddy Mayoraz from the IDIAP machine learning group (CH) for his friendly guidance and for his help in formalizing the paradigm of the multi-linear classifier; to ir. Frédéric Gobry from the IDIAP machine learning group (CH) for his help in writing the code for the multi-linear classifier; and to dr. ir. Dominique (Doms) Genoud from the IDIAP speech group (CH) for his help in the field of speaker verification.

Thanks also to ir. Guillaume (Guig) Gravier from ENST/TSI (FR) for his friendship and for all his help in the fields of speaker verification and information technology.

I also would like to thank Bruno, Chris, Dirk, Florence, Idrissa, Lionel, Marc, Michel, Monica, Nada, Pascal, Vincianne, Wim, Xavier, Yann, Youssef, and all my other colleagues from RMA/ELTE (BE) and from ENST/TSI (FR) for the wonderful working atmosphere they all have contributed to.

I am grateful for the contributions of the following students: Benny Tops and Dang Van Thuong.

Finally I would like to thank Renate for her love, support, and so much [...]

1 Introduction
1.1 Introduction
1.2 Subject of the thesis
1.3 Identity determination concepts
1.4 Structure of the thesis
1.5 Original contributions of this thesis

I General issues related to automatic biometric multi-modal identity verification systems

2 Biometric verification systems
2.1 Introduction
2.2 Requirements for biometrics
2.3 Classification of biometrics
2.4 General structure of a mono-modal biometric system
2.5 The need for multi-modal biometric systems
2.6 Characterization of a verification system
2.7 State of the art
2.7.1 General overview
2.7.2 Results obtained on the M2VTS database
2.8 Comments

3 Experimental setup
3.1 Introduction
3.2 The M2VTS audio-visual person database
3.3 Experimental protocol
3.3.1 General issues
[...]
3.4.2 Performances
3.4.3 Statistical analysis of the different experts
3.5 Comments

4 Data fusion concepts
4.1 Introduction
4.2 Taxonomy of data fusion levels
4.3 Decision fusion architectures
4.4 Parallel decision fusion as a particular classification problem
4.5 Comments

II Combining the different experts in automatic biometric multi-modal identity verification systems

5 Introduction to part two
5.1 Goal
5.2 Parametric or non-parametric methods?
5.3 Comments

6 Parametric methods
6.1 Introduction
6.2 A simple classifier: the multi-linear classifier
6.2.1 Decision fusion as a particular classification problem
6.2.2 Principle
6.2.3 Training
6.2.4 Testing
6.2.5 Results
6.2.6 Partial conclusions and future work
6.3 A statistical framework for decision fusion
6.3.1 Bayesian decision theory
6.3.2 Neyman-Pearson theory
6.3.3 Application of Bayesian decision theory to decision fusion
6.3.4 The naive Bayes classifier
6.3.5 Applications of the naive Bayes classifier
6.3.6 The issue of the a priori probabilities
[...]
6.4.2 Results
6.4.3 Mixture of Experts
6.5 Comments

7 Non-parametric methods
7.1 Introduction
7.2 Voting techniques
7.3 A classical k-NN classifier
7.4 A k-NN classifier using distance weighting
7.5 A k-NN classifier using vector quantization
7.6 A decision tree based classifier
7.7 Comments

8 Comparing the different methods
8.1 Introduction
8.2 Parametric versus non-parametric methods
8.3 Experimental comparison of classifiers
8.3.1 Test results
8.3.2 Validation results
8.3.3 Statistical significance
8.4 Visual interpretations
8.5 Comments

9 Multi-level strategy
9.1 Introduction
9.2 A multi-level decision fusion strategy
9.3 Mono-modal mono-expert fusion
9.3.1 Introduction
9.3.2 Results
9.4 Mono-modal multi-expert fusion
9.4.1 Introduction
9.4.2 Methods
9.4.3 Results
9.4.4 Combining the outputs of segmental vocal experts
9.4.5 Combining the outputs of global vocal experts
9.5 Multi-modal multi-expert fusion
[...]
10.2 Future work

Bibliography

A A monotone multi-linear classifier
B The iterative goal function
C The global goal function
D Proof of equivalence
E Expression of the conditional probabilities
F Visual interpretations


ATM Automatic Teller Machine
AVBPA Audio- and Video-based Biometric Person Authentication
BDT Binary Decision Tree
DET Detection Error Tradeoff
EER Equal Error Rate (FAR = FRR)
FA False Acceptance
FAR False Acceptance Rate
FE Frontal Expert
FR False Rejection
FRR False Rejection Rate
GMM Gaussian Mixture Model
HMM Hidden Markov Model
k-NN k-Nearest Neighbor
LC Linear Classifier
LDA Linear Discriminant Analysis
LR Naive Bayes classifier using a Logistic Regression model
M2VTS Multi Modal Verification for Teleservices and Security applications
MAJ Majority voting
MAP Maximum A posteriori Probability
MCP Maximum Conditional Probability
ML Maximum Likelihood
MLP Multi-Layer Perceptron
NBG Naive Bayes classifier using Gaussian distributions
NIST National Institute for Standards and Technology (USA)
NN Nearest Neighbor
NSA National Security Agency (USA)
PE Profile Expert
PIN Personal Identification Number
PLC Piece-wise Linear Classifier
QC Quadratic Classifier
ROC Receiver Operating Characteristic
TD Temporal Decomposition
TER Total Error Rate
VE Vocal Expert


Introduction

1.1 Introduction

The first chapter starts by introducing the subject of the thesis. To avoid confusion, this introduction is followed by an explanation of the differences and/or similarities between terms that are often encountered in the literature related to the field of automatic identity "determination": authentication, recognition, identification, and verification. These definitions are followed by a presentation of the structure of the thesis, and this chapter is ended by clearly stating the original contributions of this thesis.

1.2 Subject of the thesis

This thesis deals with the automatic verification of the identity of a cooperative person under test, by combining the results of analyses of his or her face, profile, and voice. This specific application, which is used throughout this work, has been defined in the framework of the M2VTS (Multi-Modal Verification for Tele-services and Security applications) project of the European Union ACTS program [1]. The exact definition of verification, and the differences with other often encountered terms such as identification, authentication, or recognition, will be explained hereafter. The key idea in this thesis is to analyze the possibilities of using data fusion techniques to combine the results obtained by different biometric (face, profile, and voice) experts that each have analyzed the identity claim of the person under test. In this work we are explicitly avoiding issues such as ethics, responsibility, or privacy. The interested reader can find an introduction to these delicate [...]

The automatic verification of a person is more and more becoming an important tool in several applications, such as controlled access to restricted (physical and virtual) environments. Just think about secure tele-shopping, accessing the safe room of your bank, tele-banking, accessing the services of interactive dialogue systems [175], or withdrawing money from automatic teller machines (ATM).

A number of different readily available techniques, such as passwords, magnetic stripe cards, and Personal Identification Numbers (PIN), are already widely used in this context, but the only thing they really verify is, in the best case, a combination of a certain possession (for instance the possession of the correct magnetic stripe card) and of a certain knowledge, through the correct restitution of a character and/or digit combination. As is well known, these intrinsically simple (access) control mechanisms can very easily lead to abuses, induced for instance by the loss or theft of the magnetic stripe card and the corresponding PIN. Therefore a new kind of method is emerging, based on so-called biometric characteristics or measures, such as voice, face (including profile), eye (iris pattern, retina scan), fingerprint, palm-print, hand shape, or some other (preferably) unique and measurable physiological or behavioral characteristic of the person to be verified.

In this work, a biometric measure will also be called a modality. This means that an identity verification system which uses several biometric measures or modalities (for instance a visual and a vocal biometric modality) is a multi-modal identity verification system.

Another term which will be used very often in this work is expert. In this thesis, an expert is any algorithm or method using characteristic features coming from a particular modality to verify the identity of a person under test. In this sense, one single biometric measure or modality can lead to the use of more than one expert (the visual modality can for instance lead to the use of two experts: a profile and a frontal face expert). This means that a mono-modal identity verification system can still be a multi-expert system.

Biometric measures in general, and non-invasive, user-friendly (vocal, visual) biometric measures in particular, are very attractive because they have the huge advantage that one cannot lose or forget them, and they are really personal (one cannot pass them to someone else), since they are based on a physical appearance measure. We can start using these [...]

[...] applications use a classical technique (password, or magnetic stripe card) to claim a certain identity, which is then verified using one or more biometric measures.

If one uses only a single (user-friendly) biometric measure, the results obtained may be found to be not good enough. This is due to the fact that these user-friendly biometric measures tend to vary with time for one and the same person and, to make it even worse, the importance of this variation is itself very variable from one person to another. This is especially true for the vocal (speech) modality, which shows an important intra-speaker variability. One possible solution to try to cope with the problem of this intra-person variability is to use more than one biometric measure. In this new multi-modal context, it is thus becoming important to be able to combine (or fuse) the outcomes of different modalities or experts. There is currently a significant international interest in this topic. The organization of already two international conferences on the specific subject of Audio- and Video-based Biometric Person Authentication (AVBPA) is probably the best proof of this [16, 38].

Combining the outcomes of different experts can be done by using classical data fusion techniques [2, 46, 70, 71, 101, 170, 172, 181], but the major drawback of the bulk of all these methods is their rather high degree of complexity, which is expressed, amongst else, by the fact that these methods tend to incorporate a lot of parameters that have to be estimated. If this estimation is not done using enough training data (i.e. if the estimation is not done properly), this places a serious constraint on the ability of the system to correctly generalize [9, 121]. But actually a major difficulty of this particular estimation problem is the scarcity of multi-modal training data. Indeed, to keep the automatic verification system user-friendly, the enrollment of a (new) client should not take too much time, and as a direct consequence of this, the amount of client training data tends to be limited. To try to deal with this lack of training data, one possibility is to develop simple classifiers (i.e. for instance classifiers that use only few parameters), so that their parameters can be estimated using only limited amounts of training data. The price to be paid when using simple methods [...]

1.3 Identity determination concepts

Automatic systems for recognizing a person or for authenticating his identity (which is equivalent) all have a database of N so-called authorized persons or clients. Authentication or recognition is the general term, which covers on the one hand identification and on the other hand verification. These two processes are quite different, as the following more detailed description will show.

Identification in the strict sense of the word supposes a closed-world context. This means that we are sure that the person under test is a client. The only thing we need to find out is which client of the database of authorized persons matches "the best" the person under test. There is no criterion (such as a threshold, for instance) to define how good the match has to be in order to be acceptable. Identification is thus a 1-out-of-N matching process, and it is clear that the performance decreases with N.

Verification in the strict sense of the word operates in an open-world context. This means that we are no longer sure that the person under test is a client. In this case, the person under test claims a certain identity, which of course has to be the identity of an authorized person. If the person under test is no member of the database of authorized persons, he is a so-called impostor. Verification is thus a 1-out-of-1 matching process, where it is important that the mismatch between the reference model from the database and the measured characteristics of the person under test stays below a certain threshold. The verification performance is independent of N.

Sometimes people do refer to identification in the large sense of the word as the (sequential) process of identification followed by a verification of the identified identity. Sometimes this double process is also called identification in an open-world context.

In this thesis we will only consider verification problems. This means that the decision problem we are confronted with is a typical binary hypothesis test. Indeed, the decision we have to take is either to accept or to reject the identity claim of the person under test.
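The two matching processes can be contrasted in a short sketch (a minimal illustration with hypothetical data, assuming each client is represented by a single reference feature vector and that matching is a plain Euclidean distance; the actual experts used in this work are introduced in chapter 3):

```python
import numpy as np

def identify(probe, references):
    """Closed-world identification: 1-out-of-N matching.
    Return the client whose reference matches the probe best;
    no threshold is involved."""
    names = list(references)
    distances = [np.linalg.norm(probe - references[n]) for n in names]
    return names[int(np.argmin(distances))]

def verify(probe, claimed_id, references, threshold):
    """Open-world verification: 1-out-of-1 matching against the claimed
    identity; accept only if the mismatch stays below a threshold."""
    return np.linalg.norm(probe - references[claimed_id]) < threshold

references = {"alice": np.array([0.0, 1.0]), "bob": np.array([1.0, 0.0])}
probe = np.array([0.1, 0.9])
print(identify(probe, references))                    # → alice
print(bool(verify(probe, "alice", references, 0.5)))  # genuine claim accepted
print(bool(verify(probe, "bob", references, 0.5)))    # impostor claim rejected
```

Note that identification gets harder as N grows (more candidates to confuse), whereas the verification decision involves only the claimed reference and its threshold, independently of N.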

1.4 Structure of the thesis

This thesis has been divided into two parts. In the first part, general issues related to automatic multi-modal identity verification systems, such as a [...] set-up (including the presentation and the analysis of our experts) and a general overview of data fusion related concepts, are treated. In the second part, the fusion of the different experts in a multi-modal identity verification system is implemented on the decision level, using parametric and non-parametric methods. These different methods are then compared with each other, and a structured hierarchical approach for gradually upgrading the performance of automatic biometric verification systems is presented. At the end of these two parts, we conclude this thesis by summarizing our contributions to the field and by looking at possible extensions of the work done.

To be more specific, the first part is organized as follows. In chapter 2 we deal with biometric modalities, and we start by listing some theoretical and practical requirements that biometrics in general should conform to. This is followed by a section which presents a tentative classification of the most commonly found biometrics into two classes: the so-called physiological and behavioral biometrics. In the following section the general structure of an automatic mono-modal biometric verification system is presented, while in the next section some general arguments for using multi-modal biometric verification systems are developed. The following section is meant to introduce and define the classical performance characteristics used in the field of automatic identity verification, and the final section gives an overview of the state of the art in multi-modal biometric identity verification systems. Chapter 3 gives details about the experimental set-up. It starts by presenting the M2VTS databases used in this work. After this, the experimental protocol is described. Finally, the three different biometric experts we have been using throughout this work are briefly introduced, and their individual performances are highlighted and statistically analyzed. Chapter 4 introduces some elementary data fusion concepts, such as the different data fusion levels and architectures, and shows how it is possible, by making some well-founded choices, to transform a general data fusion problem into a particular classification problem.

The second part of this work deals more particularly with the parallel combination or fusion of the partial (soft) decisions of the different experts. Chapter 5 explains why we have chosen to experiment with parametric as well as with non-parametric methods. Chapter 6 deals with parametric techniques, but to show the usefulness of these parametric methods, first of all a trivial but original method is presented: the monotone multi-linear [...]

[...] information with respect to the probability density functions of the different populations is thrown away. Therefore, in a fairly early stage of this work, it has been decided to stop developing this simple method and to fall back instead on the less original, but more fundamental, statistical decision theory, by using so-called parametric techniques. In this parametric class, classifiers based on the general Bayesian decision theory (Maximum A-posteriori Probability and Maximum Likelihood) and on a simplified version of it (the naive Bayes classifier, which has been applied in the case of simple Gaussians and in the case of a logistic regression model) have been studied. Furthermore, experiments have also been done using Linear and Quadratic classifiers. Neural networks form a special case of the parametric family, since the number of parameters to be estimated can be very large. Therefore neural networks are sometimes classified as semi-parametric classifiers. Still, we will present neural networks in the chapter on parametric techniques, by means of their most popular representative: the Multi-Layer Perceptron. Chapter 7 deals with non-parametric techniques. This chapter starts by presenting a very simple family of non-parametric techniques. These voting techniques are sometimes referred to as k-out-of-n voting techniques, where k relates to the number of experts that have to decide that the person under test is a client before the global voting classifier accepts the person under test as a client. After the voting methods, another simple but very popular technique, the k-Nearest Neighbor (k-NN) technique, is presented with a number of variants. These variants include a distance-weighted and a vector-quantized version of the classical k-NN rule. This chapter ends by presenting the category of (binary) decision trees, by means of an implementation of the C4.5 algorithm, which is probably the most popular method of its kind. Chapter 8 deals with the comparison between the different parametric and non-parametric methods that have been presented in the second part of the thesis. Chapter 9 presents a multi-level decision fusion strategy that allows one to gradually improve the performance of an automatic biometric identity verification system while limiting the initial investments.
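The k-out-of-n voting rule mentioned above is simple enough to sketch in a few lines (a minimal illustration assuming each expert already delivers a hard accept/reject decision; with n = 3 and k = 2 this reduces to the majority voting rule listed as MAJ in the glossary):

```python
def k_out_of_n_vote(decisions, k):
    """Accept the identity claim iff at least k of the n experts accept it.
    `decisions` holds one boolean (hard) decision per expert."""
    return sum(decisions) >= k

# Three experts (e.g. frontal face, profile, and voice) judge one claim:
decisions = [True, False, True]
print(k_out_of_n_vote(decisions, k=2))  # → True (majority voting)
print(k_out_of_n_vote(decisions, k=3))  # → False (unanimity required)
```

Raising k trades False Acceptances for False Rejections: a strict rule (k = n) makes it hard for an impostor to fool every expert, but also rejects clients whenever a single expert fails.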

Chapter 10 finally concludes this thesis and formulates some recommendations for developing automatic multi-modal biometric identity verification [...]

1.5 Original contributions of this thesis

The original contributions of this thesis are the following ones:

1. the formulation (in the framework of a multi-modal biometric identity verification system) of the fusion of the partial (soft) decisions of d experts in parallel as a particular classification problem in the d-dimensional space [179];

2. the systematic and detailed statistical analysis of the different experts that have been used;

3. the development of a simple decision fusion method, based on a monotone multi-linear classifier [179, 180];

4. the analysis of the applicability, the characteristics, and the performance of the logistic regression method in a Bayesian framework [177];

5. the development of a Vector Quantization version of the classical k-Nearest Neighbor algorithm [173];

6. the systematic comparison of a large number of parametric as well as non-parametric techniques to solve the particular classification problem [174];

7. the introduction of either the non-parametric Cochran's Q test for binary responses, or the non-parametric Page test for ordered alternatives, to measure the statistical significance of the differences in performance of several (i.e. more than two) fusion modules at the same time;

8. the formulation of a multi-level fusion strategy which allows one to gradually improve the performance of an automatic (biometric) identity verification system [176, 178];

9. the formulation of the mixture of experts paradigm in the framework of mono-modal multi-expert data fusion, applied to a segmental approach to text-independent speaker verification [171];

10. the introduction of the use of multi-modal identity verification in [...]

I General issues related to automatic biometric multi-modal identity verification systems

Biometric verification systems

2.1 Introduction

This chapter starts by defining the ideal theoretical and practical requirements for any biometric. This is followed by a section which presents a tentative classification (according to [120]) of the most commonly found biometrics into two classes: the so-called physiological and behavioral biometrics. In the following section the general structure of an automatic mono-modal biometric verification system is presented, while in the next section some general arguments for using multi-modal biometric verification systems are developed. The following section then presents the main characteristics of identity verification systems. In the final section, an overview of the state of the art of multi-modal biometric person verification systems is given.

2.2 Requirements for biometrics

Automatic biometric systems have to identify an individual or to verify his or her identity using measurements of the (living) human body. (As already mentioned in chapter 1, we will consider in this work only verification problems.) According to [88, 89], in theory any human characteristic can be used to make an identity verification, as long as it satisfies the following desirable (ideal) requirements:

universality: this means that every person should have the characteristic;

uniqueness: this indicates that no two persons should be the same in terms of the characteristic;

permanence: this means that the characteristic does not vary with time;

collectability: this indicates that the characteristic can be measured quantitatively.

In practice, there are some other important requirements:

performance: this specifies not only the achievable verification accuracy, but also the resource requirements to achieve an acceptable verification accuracy;

robustness: this refers to the influence of the working or environmental factors (channel, noise, distortions, ...) that affect the verification accuracy;

acceptability: this indicates to what extent people are willing to accept the biometric verification system;

circumvention: this refers to how easy it is to fool the system by fraudulent techniques (make sure that the individual owns the data and is not transforming it; this could also include a so-called liveliness test).

As mentioned before, these requirements should be regarded as ideal. In other words, the better a biometric satisfies these requirements, the better it will perform. In practice, however, there is no single biometric which fulfills all these ideal requirements perfectly. This observation is one of the main reasons why combining several biometric modalities in multi-modal systems is gaining ground.

2.3 Classification of biometrics

A range of mono-modal biometric systems is in development or on the market, because no one biometric meets all the needs. The tradeoffs in developing these systems involve cost, reliability, discomfort in using a device, and the amount of data needed. Fingerprints, for instance, have a long track [...] the amount of data that needs to be stored to describe a fingerprint (the template) tended to be rather large. In contrast, the hardware for capturing the voice is cheap (relying on low-cost microphones or on an already existing telephone), but the voice varies when emotions and states of health change.

According to [120], biometrics encompasses both physiological and behavioral characteristics. This is illustrated for a number of frequently used biometrics in Figure 2.1.

[Figure 2.1 groups automated biometrics into physiological characteristics (face, fingerprint, hand, eye) and behavioral characteristics (signature, voice, keystroke).]

Figure 2.1: Classification of a number of biometrics in physiological and behavioral characteristics.

A physiological characteristic is a relatively stable physical feature such as a fingerprint [89, 130, 153], hand geometry [190], palm-print [188], infrared facial and hand vein thermograms [141], iris pattern [184], retina pattern [74], or facial feature [11, 12, 34, 39, 102, 116, 183, 189]. Indeed, all these characteristics are basically unalterable without trauma to the individual. A behavioral trait, on the other hand, has some physiological basis, but also reflects a person's psychological (emotional) condition. The most common behavioral trait used in automated biometric verification systems is the human voice [3, 10, 20, 22, 31, 35, 36, 52, 60, 62, 63, 64, 65, 66, 69, 72, 73, 76, 81, 80, 105, 111, 112, 131, 132, 133, 134, 151, 154, 160]. Other behavioral traits are gait [126], keystroke dynamics [127], and (dynamic) [...].

Systems that rely on behavioral characteristics should ideally update their enrolled reference template(s) on a regular basis. This could be done either in an automatic manner, each time a reference is used successfully (i.e. the system decides that an access claim is an authentic client claim), or in a supervised manner, by re-enrolling each client periodically. The former method has the advantage of being user-friendly, but has the drawback that one updates the client references with a template from an impostor in the case that the system commits a False Acceptance. The latter approach has the advantage of always updating the client references with client templates, but has the drawback that it is not very user-friendly, since the clients need to do additional training sessions.
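The automatic update policy, and its False Acceptance risk, can be sketched as follows; this is only an illustrative scheme using a hypothetical exponential-moving-average update, not an algorithm prescribed in this work:

```python
def update_template(template, accepted_sample, alpha=0.1):
    """Move the enrolled reference template slightly toward the sample
    behind a successful access claim. If that claim was in fact a False
    Acceptance, the template drifts toward the impostor instead."""
    return [(1.0 - alpha) * t + alpha * s
            for t, s in zip(template, accepted_sample)]

template = [0.0, 1.0]                              # enrolled client reference
template = update_template(template, [0.2, 0.8])   # sample behind an accepted claim
print(template)  # drifts slightly toward the accepted sample
```

The supervised policy would instead replace the template with freshly enrolled client samples at each re-enrollment session, avoiding impostor contamination at the cost of user convenience.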

The differences between physiological and behavioral methods are important. On the one hand, the degree of intra-person variability is smaller in a physiological than in a behavioral characteristic. On the other hand, machines that measure physiological characteristics tend to be larger and more expensive, and may seem more threatening or invasive to users (this is for instance the case for retina scanners). Because of these differences, no one biometric will serve all needs.

2.4 General structure of a mono-modal biometric system

Automated mono-modal biometric verification systems usually work according to the following principles. In a typical functional system a sensor, adapted to the specific biometric, generates measurement data. From these data, features that may be used for verification are extracted, using image and/or signal processing techniques. In general, each biometric has its own feature set. Pattern matching techniques compare the features coming from the person under test with those stored in the database under the claimed identity, to provide likely matches. Last but not least, decision theory including statistics provides a mechanism for answering the question "Is the person under test who he or she claims to be?" and for evaluating biometric technology [77, 78, 158]. Automatic mono-modal biometric verification systems are usually built by arranging two main modules in series: (1) a module which compares the measured features from the person under test with a reference client model and gives a scalar number as output, followed by (2) a decision module realized by a thresholding operation. This threshold can be a function of the claimed identity.

The architecture of an automatic mono-modal biometric verification system is represented in figure 2.2.

[Figure 2.2 shows the biometric signal passing through feature extraction and matching (against a model selected via the identification key) to produce a score, followed by decision forming, which outputs the decision.]

Figure 2.2: Typical mono-modal biometric verification system architecture.
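The series arrangement of figure 2.2 can be sketched as follows (a minimal illustration; the matching function, the model store, and the client-dependent thresholds are hypothetical placeholders rather than the actual modules of this work):

```python
import numpy as np

def matching_module(features, client_model):
    """Module (1): compare the measured features with the reference
    client model and output a scalar score (a negated Euclidean
    distance here, so that higher means a better match)."""
    return -float(np.linalg.norm(features - client_model))

def decision_module(score, claimed_id, thresholds):
    """Module (2): a thresholding operation; the threshold may be a
    function of the claimed identity."""
    return score >= thresholds[claimed_id]

models = {"alice": np.array([0.0, 1.0])}   # selected via the identification key
thresholds = {"alice": -0.5}               # client-dependent threshold
features = np.array([0.1, 0.9])            # extracted from the biometric signal
score = matching_module(features, models["alice"])
print(bool(decision_module(score, "alice", thresholds)))  # → True
```

Keeping the score and the decision in separate modules is what makes decision-level fusion possible later on: several such score outputs can be combined before, or instead of, a single thresholding step.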

2.5 The need for multi-modal biometric systems

There can be several reasons why one would prefer multi-modal biometric verification systems over mono-modal ones. Generally, the criterion to choose between mono- and multi-modal systems will be system performance. The end-user typically desires a guarantee that the classification errors (FAR and FRR) will be limited by maximal values that will depend on the application. And although there exist mono-modal biometric verification techniques that do offer very small classification errors, the main problem with this category of biometrics is that they are either too expensive to be used in a general purpose context (for instance identity verification in the case of credit card payments over the Internet using a PC) or perceived by the user as too invasive. So very often one is confronted with the obligation of using inexpensive hardware and non-invasive user-friendly biometrics. Two of the most popular biometrics that can conform

to these constraints are faces and voices. However, the drawback of using inexpensive hardware (cheap black and white CCD-cameras and low-cost microphones) to obtain the raw data measurements of these biometrics, has as a direct consequence that the measurements generally will be corrupted. Other problems with these two biometrics are that the visual modality is rather sensitive to lighting conditions and that the vocal modality tends to vary with time (since it is a behavioral biometric). This makes the use of a mono-modal biometric verification system based solely either on the facial or on the vocal modality a very big challenge, especially since it is usually not possible to update the database references of the authorized users on a regular basis.

One possible solution to cope with this problem is to use not one single mono-modal biometric system, but to use several of them in parallel to form a so-called multi-modal biometric verification system. It can be felt intuitively that such a strategy can be helpful, if one considers complementary biometrics. This complementarity can be achieved with respect to the different requirements as they were presented in section 2.2. A possible example of complementary biometrics with respect to the permanence requirement would be the combined use of a physiological (face: more invariant in time) and a behavioral (voice: less invariant) biometric. The main and very general idea of using multi-modal biometric verification systems instead of mono-modal ones is thus the ability to use more (complementary) information with respect to the person under test in the former approach than in the latter approach. In chapter 9, a more detailed step-by-step analysis of a multi-level strategy to gradually improve the performances of an automated biometric system is presented.

A possible and straightforward way of building a multi-modal verification system from d such mono-modal systems is to input the d scores provided in parallel into a fusion module, which combines the d scores and passes the fused score on to the decision forming module. This module then has to take the decision accept or reject, based on a threshold. Just as in the case of the mono-modal system, this threshold can be a function of the claimed identity. However, two alternatives remain for the fusion module: a global (i.e. the same for all persons) or a personal (i.e. tailored to the specific characteristics of each authorized person) approach. For the sake of simplicity and because the personal approach needs more training data (since in this case the fusion module needs to be optimized for each client), we have opted in this work for a global fusion module.
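A global fusion module of this kind can be sketched as follows; the weighted average and its uniform default weights are illustrative assumptions, not the fusion methods studied later in this work.

```python
def fuse_scores(scores, weights=None):
    """Global fusion module: combine the d expert scores into one fused
    score.  Here a weighted average with uniform default weights; the
    same weights serve all clients (global, not personal, fusion)."""
    if weights is None:
        weights = [1.0 / len(scores)] * len(scores)
    return sum(w * s for w, s in zip(weights, scores))

def decide(fused_score, threshold):
    """Decision forming module: threshold the fused score."""
    return "accept" if fused_score >= threshold else "reject"
```

Being global, the same weights and threshold serve all clients; a personal variant would fit them per client, at the cost of the extra training data mentioned above.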

Figure 2.3 shows the typical architecture of a general multi-expert verification system, including the possible use of personalized fusion or decision modules.

[Figure 2.3 block diagram: the physical appearance and the identification key feed the different experts (expert i, expert j, ...); their scores enter the fusion module, and the fused score drives the decision forming module, which outputs the decision.]

Figure 2.3: Multi-expert architecture.

2.6 Characterization of a verification system

In this work, we will consider the verification of the identity of a person as a typical two-class problem: either the person is the one (in this case he is called a client), or is not the one (in that case he is called an impostor) he claims to be. This means that we are going to work with a binary {accept, reject} decision scheme.

When dealing with binary hypothesis testing, it is trivial to understand that the decision module can make two kinds of errors. Applied to this problem of the verification of the identity of a person, these two errors are called:

- False Rejection (FR): i.e. when an actual client is rejected as being an impostor;

- False Acceptance (FA): i.e. when an actual impostor is accepted as being a client.

The performances of a speaker verification system are usually given in terms of the global error rates computed during tests: the False Rejection Rate (FRR) and the False Acceptance Rate (FAR) [18]. These error rates are defined as follows:

FRR = number of FR / number of client accesses    (2.1)

FAR = number of FA / number of impostor accesses    (2.2)

A perfect identity verification (FAR = 0 and FRR = 0) is in practice unachievable. However, as shown by the study of binary hypothesis testing [167], any of the two FAR, FRR can be reduced to an arbitrary small value by changing the decision threshold, with the drawback of increasing the other one. A unique measure can be obtained by combining these two errors into the Total Error Rate (TER) or its complement, the Total Success Rate (TSR):

TER = (number of FA + number of FR) / total number of accesses    (2.3)

TSR = 1 - TER    (2.4)
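The four rates of equations (2.1) to (2.4) follow directly from the raw error counts; the sketch below is a direct transcription of the definitions (the counts used with it are hypothetical).

```python
def error_rates(n_fr, n_client_accesses, n_fa, n_impostor_accesses):
    """FRR, FAR, TER and TSR as defined in equations (2.1)-(2.4)."""
    frr = n_fr / n_client_accesses
    far = n_fa / n_impostor_accesses
    ter = (n_fr + n_fa) / (n_client_accesses + n_impostor_accesses)
    tsr = 1.0 - ter
    return frr, far, ter, tsr
```

Because the TER weights the two error types by their numbers of accesses, feeding it many more impostor than client accesses pulls it towards the FAR.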

However, care should be taken when using one of these two unique measures. Indeed, from the definition just given it follows directly that these two unique numbers could be heavily biased by one or the other type of error (FAR or FRR), depending solely on the number of accesses that have been used in obtaining these respective errors. As a matter of fact, due to the proportional weighting as specified in the definition, the TER will always be closer to that type of error (FAR or FRR) which has been obtained using the largest number of accesses.

The overall performance of an identity verification system is however better characterized by its so-called Receiver Operating Characteristic (ROC), which represents the FAR as a function of the FRR [167]. The Detection Error Tradeoff (DET) curve is a convenient non-linear transformation of the ROC curve, which has become the standard method for comparing performances of speaker verification methods used in the annual NIST evaluation campaigns [142]. In a DET curve, the horizontal axis shows the normal deviate of the False Alarm probability (in %), which is a non-linear transformation of the horizontal False Acceptance axis of the classical ROC curve. The vertical axis of the DET curve represents the normal deviate of the Miss probability (in %), which is a non-linear transformation of the False Rejection axis of the classical ROC curve. The use of the normal deviate scale moves the curves away from the lower left when performance is high, making comparisons between different systems easier. It can also be noted that DET curves tend to be close to linear over a large portion of their range. Further details of this non-linear transformation are presented in [115]. Figures 2.4 and 2.5 give respectively an example of a typical ROC and a typical DET curve.
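The normal deviate transformation that turns a ROC into a DET curve is the inverse of the standard Normal cumulative distribution function; a minimal sketch using only the Python standard library:

```python
from statistics import NormalDist

def normal_deviate(p):
    """Inverse standard Normal CDF (probit) of an error probability p,
    0 < p < 1.  Probabilities below 50% map to negative deviates, so
    good systems sit in the lower-left of the DET plane."""
    return NormalDist().inv_cdf(p)

def det_point(far, frr):
    """One DET-curve point for a given decision threshold: the normal
    deviates of the False Alarm and Miss probabilities."""
    return normal_deviate(far), normal_deviate(frr)
```

Sweeping the decision threshold and plotting the resulting points gives the full DET curve.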

[Figure 2.4 plots false rejection (%) against false acceptance (%), both axes running from 0 to 50, for the curves labeled "NIST EPFL MALES" and "GMM EPFL", with the EER point marked.]

Figure 2.4: Typical example of a ROC curve.

Each point on a ROC or a DET characteristic corresponds with a particular decision threshold. The Equal Error Rate (EER: i.e. when FAR = FRR) is often used as the only performance measure of an identity verification method, although this measure gives just one point of the ROC and comparing different systems solely based on this single number can be very misleading [129].

High security access applications are concerned about break-ins and hence operate at a point on the ROC with small FAR. Forensic applications desire to catch a criminal even at the expense of examining a large number of false accepts and hence operate at small FRR/high FAR. Civilian applications attempt to operate at the operating points with both low FRR and low FAR. These concepts are shown in Figure 2.6, which was found in [88].

Unfortunately in practice, as will be shown further in the study of the fusion modules presented in this thesis, it is not always possible to explicitly identify a continuous decision threshold in a certain fusion module, which means that in that case it will a fortiori not be possible to vary the decision threshold.

Figure 2.5: Typical example of a DET curve.

[Figure 2.6 sketches the False Acceptance Rate versus the False Rejection Rate, marking the Equal Error Rate point, the operating region of high security access applications, and that of forensic applications.]

Figure 2.6: Typical examples of different operating points for different applications.

Reporting results at a single fixed operating point is, however, also the only correct way of determining the performance of an operational system, since in such systems the decision threshold has been fixed.

All verification results in this thesis will be given in terms of FRR, FAR, and TER. For each error the 95% level confidence interval will be given between square brackets. The concept of confidence intervals refers to the inherent uncertainty in test results owing to small sample size. These intervals are a posteriori estimates of the uncertainty in the results on the test population. They do not include the uncertainties caused by errors (mislabeled data, for example) in the test process. The confidence intervals do not represent a priori estimates of performance in different applications or with different populations [182].

These confidence levels will be calculated assuming that the probability distribution for the number of errors is binomial. But since the binomial law cannot be easily handled analytically, the calculation of confidence intervals can not be done directly in an analytical way. Therefore we have used the Normal law as an approximation of the binomial law. This large sample approach is already statistically justified starting from 30 samples. Using this approximation, the 95% confidence interval of an error E based on N tests, is defined by the following lower (given by the minus sign) and upper (given by the plus sign) bounds:

E ± 1.96 √( E (1 − E) / N )

More detailed information about the calculation of confidence intervals can be found in [41, 44, 155].
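This interval translates directly into code; clipping the bounds to [0, 1] is an assumption of this sketch, since the raw formula can leave the unit interval for small N.

```python
from math import sqrt

def confidence_interval_95(error_rate, n_tests):
    """95% confidence interval of an error rate E measured on N tests,
    using the Normal approximation of the binomial law (N >= 30)."""
    half_width = 1.96 * sqrt(error_rate * (1.0 - error_rate) / n_tests)
    return (max(0.0, error_rate - half_width),
            min(1.0, error_rate + half_width))
```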

2.7 State of the art

2.7.1 General overview

Some work on multi-modal biometric identity verification systems has already been reported in the literature. Hereafter, an overview is given of the most important contributions, with a brief description of the work performed.

1. As early as 1993, Chibelushi et Al. have proposed in [40] to integrate acoustic and visual speech (motion of visible articulators) for speaker recognition. The combination scheme used is a simple linear one.

2. In 1995, Brunelli and Falavigna have proposed in [33] a person identification system based on acoustic and visual features. The voice modality is based on a text-independent vector quantization and it uses two types of information: static and dynamic acoustic features. The face modality implements a template matching technique on three distinct areas of the face (eyes, nose, and mouth). They use a database containing up to three sessions of 87 persons. One session was used for training, the others for testing, which did lead to a total number of 155 tests. The most performing fusion module is a neural network. The best results obtained on this particular database are: FAR = 0.5% and FRR = 1.5%.

3. In 1997, Dieckmann et Al. have proposed in [50] a decision level fusion scheme, based on a 2-out-of-3 majority voting. This approach integrates two biometric modalities (face and voice), which are analyzed by three different experts: (static) face, (dynamic) lip motion, and (dynamic) voice. The authors have tested their approach on a specific database of 15 persons, where the best verification results obtained were FAR = 0.3% and FRR = 0.2%.

4. In 1997, Duc et Al. did propose in [55] a simple averaging technique and compared it with the Bayesian integration scheme presented by Bigun et Al. in [13]. In this multi-modal system the authors use a frontal face identification expert based on Elastic Graph Matching, and a text-dependent speech expert based on person-dependent Hidden Markov Models (HMMs) for isolated digits. All experiments are performed on the M2VTS database, and the best results are obtained for the Bayesian fusion module: FAR = 0.54% and FRR = 0.00%.

5. In 1997, Jourlin et Al. have proposed in [93] an acoustic-labial speaker verification method. Their approach uses two classifiers. One is based on a lip tracker using visual features, and the other one is based on a text-dependent person-dependent HMM modeling of isolated digits using acoustic features. The fused score is computed as the weighted sum of the scores generated by the two experts. All experiments are performed on the M2VTS database, and the best results obtained for the weighted fusion module are: FAR = 0.5% and FRR = 2.8%.

6. In 1998, Kittler et Al. have proposed in [98] a multi-modal person verification approach combining frontal face, profile and vocal experts. The profile expert is using a chamfer matching algorithm, and the voice expert is based on the use of text-dependent person-dependent HMM models for isolated digits. All these experts give their soft decisions (scores between zero and one) to the fusion module. All experiments are performed on the M2VTS database, and the best combination results are obtained for a simple sum rule: EER = 0.7%.

7. In 1998, Hong and Jain have proposed in [82] a multi-modal personal identification system which integrates two different biometrics (face and fingerprints) that complement each other. The face verification is done using the eigenfaces approach, and the fingerprint expert is based on a so-called elastic matching algorithm. The fusion algorithm operates at the expert decision level, where it combines the scores from the different experts (under the statistical independence hypothesis), by simply multiplying them. The {accept, reject} decision is then taken by comparing the fused score to a threshold. The databases used in this work are the Michigan State University fingerprint database containing 1500 images from 150 persons, and a face database coming from the Olivetti Research Lab, the University of Bern, and the MIT Media Lab, which contains 1132 images from 86 persons. The results obtained for the fusion approach on this database are: FAR = 1.0% and FRR = 1.8%.

8. In 1998, Ben-Yacoub did propose in [7] a multi-modal data fusion approach for person authentication, based on Support Vector Machines (SVM). In his multi-modal system he uses the same experts and the same database as Duc et Al. in the work presented above. The best results which he obtained for the SVM fusion module are FAR = 0.07% and FRR = 0.00%.

9. In 1999, Pigeon did propose in [135] a multi-modal person authentication approach based on simple fusion algorithms. In this multi-modal system the author uses a face identification expert based on template matching, a profile identification expert based on a chamfer matching algorithm, and a text-dependent speech expert based on person-dependent HMM models for isolated digits. All experiments are performed on the M2VTS database, and the best results are obtained for a fusion module based on a linear discriminant function: FRR = 0.78% and FAR = 0.07%.

10. Finally, a person recognition system using unconstrained audio and video has been proposed. The system does not need fully frontal face images or clean speech as input. The face expert is based on the eigenfaces approach, and the audio expert uses a text-independent HMM using Gaussian Mixture Models (GMMs). The combination of these two experts is performed using a Bayes net. The system was tested on a specific database containing 26 persons and the best results obtained using the best images and audio clips from an entire session are: FAR = 0.00% and FRR = 0.00%.
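Two of the simplest combination rules reviewed above, the 2-out-of-3 majority voting of Dieckmann et Al. and the score product used by Hong and Jain under the independence hypothesis, can be sketched as follows (a minimal illustration, not the authors' implementations):

```python
def majority_vote(decisions):
    """Decision-level fusion: accept when a strict majority of the
    binary expert decisions (True = accept) is positive, e.g. 2 out
    of 3 as in Dieckmann et Al."""
    return sum(decisions) > len(decisions) / 2

def product_fusion(scores):
    """Score-level fusion by simple multiplication, the rule obtained
    under the hypothesis that the experts are statistically
    independent; the fused score is then thresholded."""
    fused = 1.0
    for score in scores:
        fused *= score
    return fused
```

Majority voting discards the score magnitudes, while the product rule keeps them but is pulled strongly towards rejection by any single low score.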

2.7.2 Results obtained on the M2VTS database

To facilitate the comparison with the work presented in this thesis, we have isolated from the previous state of the art the results which have been obtained on the same M2VTS database as the one we have been working on. These results are presented in Table 2.1 hereafter. Where available, the confidence interval is indicated between square brackets. Care should be taken however when comparing these results, since the experts used are not necessarily the same for all methods. The last line in this Table represents the best results obtained in this thesis, using a logistic regression model.

Table 2.1: State of the art of the verification results obtained on the M2VTS database.

Author(s)       Experts                  FRR (%)      FAR (%)
Duc et Al.      frontal, vocal           0.00         0.54
Jourlin et Al.  lips, vocal              2.80         0.50
Kittler et Al.  frontal, profile, vocal  0.70 (EER)   0.70 (EER)
Ben-Yacoub      frontal, vocal           0.00         0.07
Pigeon          frontal, profile, vocal  0.78         0.07
Verlinde        frontal, profile, vocal  0.00         0.00

2.8 Comments

No single biometric can meet the limits of all applications. We have seen that voice is one of the most popular biometrics, thanks to its high acceptability and its user-friendliness [88]. Since voice is a behavioral biometric modality and since in a multi-modal approach it is wise to complement a behavioral modality with a physiological one, we wanted to add a physiological modality which also was highly acceptable. These considerations have led to choose the visual modality. In the framework of the M2VTS application, another important criterion for choosing the different biometrics was the availability of the hardware. With respect to the tele-services, the idea was to use so-called multi-media PC's, which are equipped with low-cost microphones and CCD-cameras. These considerations reinforce each other and they explain why in the multi-modal system presented in this work, voice and vision were used as the two (complementary) biometric modalities. Analyzing the state of the art in automatic biometric multi-modal identity verification systems, it has been shown that on the M2VTS database, the best method presented in this thesis (the logistic regression model of Table 2.1) outperforms the previously reported results.

Experimental setup

3.1 Introduction

This chapter starts by presenting the M2VTS database used in this work. After this, the experimental protocol used for testing the individual experts and the fusion modules is described. Finally, the three different biometric experts (a frontal, a profile and a vocal one) we have been using throughout this work are briefly introduced and their individual performances are highlighted. This is followed by a thorough statistical analysis of the results given by these three different experts for both client and impostor accesses. In this analysis it is shown that the distribution of the scores per expert and per type of access (the so-called conditional probability density functions) do not satisfy the Normality hypothesis. Furthermore it is shown that the chosen experts do have good discriminatory power, and are complementary. The potential gain obtained by combining the results of these three different experts is shown by means of a simple linear classifier.

3.2 The M2VTS audio-visual person database

The M2VTS [1] multi-modal database comprises 37 different persons and provides 5 shots for each person. These shots were taken at intervals of at least one week. During each shot, people were asked (1) to count from "0" to "9" in French (which was the native language for most of the people) and (2) to rotate their head from 0 to -90 degrees, back to 0 and further to +90 degrees, and finally back again to 0 degrees. The most difficult shot to recognize is the 5th shot. This shot mainly differs from the others by face variations (presence of a hat/scarf, ...), voice variations or shot imperfections (poor focus, different zoom factor, poor voice signal to noise ratio, ...). More details with respect to this database can be found in [136, 137, 135].

Taking into account the specificity of our problem (i.e. combining outputs of several experts) we are not going to use this 5th shot, since we are not interested in developing individual powerful experts that work well even under these extreme conditions as presented by shot number 5.

To show the quality of the pictures contained in the small M2VTS database, Figures 3.1, 3.2, and 3.3 show respectively the frontal views of some persons, the rotation sequence and the 5 different shots for one and the same person [135].

3.3 Experimental protocol

3.3.1 General issues

In the most general (but rich) case, three different data sets are needed for training, fine-tuning and testing the individual experts. The first data set is called the training set and is used by each expert to model the different persons. The second data set is called the development or validation set and is used to fine-tune the different experts, for instance by calculating the decision thresholds. The third data set is called the test set and it is used to test the performances of the obtained experts. For the fusion module, we can define in the most general case exactly the same data sets as in the case of the individual experts. This general concept of the use of the different data sets is illustrated in Figure 3.4. This does not necessarily mean that one always will need six completely separated data sets, since the fact that the test set for the individual experts is completely dissociated from the development of the experts, makes it suitable to be reused for the fusion module. Furthermore, not all types of experts, nor all fusion modules do include the modeling of the persons. This means that in the particular case of experts and fusion modules which do not use data to model persons and in the obvious case in which we do reuse the expert test set as a data set for the fusion module, one only needs three different data sets instead of six in the most general case. This is illustrated in Figure 3.5. In the intermediate case, where the experts do need separate training and development data, but the fusion module does not need any development data, one needs four different data sets, as illustrated in Figure 3.6.

Figure 3.2: M2VTS database: views taken from a rotation sequence.

Figure 3.3: M2VTS database: frontal views of one person coming from the 5 different shots.

Figure 3.4: The most general case where 6 different data sets are used.

[Figure 3.5 block diagram: the expert is trained on data set 1 and tested on data set 2; the fusion module is trained on data set 2 and tested on data set 3.]

Figure 3.5: The case where only three different data sets are needed.

Figure 3.6: The intermediate case where four different data sets are needed.

- If the test data is the same as the training data, performances will be overestimated. This is true for both the individual experts and the fusion module. This is of course due to the fact that the experts and the fusion module will generate the best results for the same data they have been trained on.

- If the training data for the experts is the same as for the fusion module, the fusion module will be underperforming. The reason for this is that the fusion module doesn't get enough information. Indeed, in the extreme case of experts that perform perfectly on their training data, the outcome of such an expert would be either 0 or 1, which leaves the fusion module with the arbitrary choice of setting the threshold somewhere in between.

3.3.2 Experimental protocol

For our experiments, we have opted for a very simple experimental protocol. In this protocol we use only the first four sessions of the M2VTS database:

1. The first enrollment session has been used to train the individual experts. This means that each access has been used to model the respective client, yielding 37 different client models.

2. Then the accesses from each person in the second enrollment session have been used to generate validation data in two different manners. Once to derive one single client access by matching the shot of a specific person with its own reference model, and once to generate 36 impostor accesses by matching it to the 36 models of the other persons of the database. This simple strategy thus leads to 37 client and 36 × 37 = 1,332 impostor accesses, which have been used for validating the performance of the individual experts and for calculating thresholds.

3. The third enrollment session has been used to test these experts, using the thresholds calculated on the validation data set. This same data set has also been used to train the fusion modules, which again leads to 37 client and 1,332 impostor reference points.

4. Finally, the fourth enrollment session has been used to test the fusion modules, yielding once more the same number of client and impostor claims.
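The client and impostor access counts produced by this protocol follow directly from the number of enrolled persons, as this small sketch shows:

```python
def protocol_accesses(n_persons):
    """Client and impostor access counts per session under the protocol
    above: each person is matched once against his own model (client
    access) and once against each of the other persons' models
    (impostor accesses)."""
    n_client = n_persons
    n_impostor = n_persons * (n_persons - 1)
    return n_client, n_impostor
```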

The drawback of this simple protocol is that the impostors are known at the expert and supervisor training time. In section 8.3.2, validation results will be presented using a protocol that does not suffer from the same drawback. This validation protocol is implemented using a so-called leave-one-out method [49].

3.4 Identity verification experts

3.4.1 Short presentation

All the experiments in this thesis have been performed using three different identity verification experts. Each one of these experts will be described briefly hereafter.

Profile image expert

The profile image verification expert is described in detail in [138], which has inspired the description hereafter. The profile of the person under test is matched against the stored reference profile corresponding to the claimed identity. The candidate image profile is extracted from the profile images by means of color-based segmentation. The similarity of the two profiles is measured using the Chamfer distance computed sequentially [28]. The efficiency of the verification process is aided by pre-computing a distance map for each reference profile. The map stores the distance of each pixel in the profile image to the nearest point on the reference profile. As the candidate profile can be subject to translation, rotation and scaling, the objective of the matching stage is to compensate for such geometric transformations. The parameters of the compensating transformation are determined by minimizing the chamfer distance between the template and the transformed candidate profile. The optimization is carried out using a simplex algorithm which requires only the distance function evaluation and no derivatives. The convergence of the simplex algorithm to a local minimum is prevented by a careful initialization of the transformation parameters. The translation parameters are estimated by comparing the position of the nose tip in the two matched profiles. The scale factor is derived from the comparison of the profile heights and the rotation is initially set to zero. Once the optimal set of transformation parameters is determined, the user is accepted or rejected depending on the relationship of the minimal chamfer distance to a pre-specified threshold. The system can be trained very easily. It is sufficient to store one profile per client in the training set.
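The core of this matching can be illustrated with a toy sketch. Its assumptions: profiles are given as small point sets, the distance map is computed by brute force instead of the sequential algorithm of [28], and the simplex search over translation, rotation and scale is left out.

```python
from math import hypot

def distance_map(reference_profile, width, height):
    """Pre-computed map: for every pixel, the Euclidean distance to the
    nearest point of the reference profile (brute force for clarity)."""
    return {(x, y): min(hypot(x - rx, y - ry) for rx, ry in reference_profile)
            for x in range(width) for y in range(height)}

def chamfer_distance(candidate_profile, dmap):
    """Average distance from the candidate profile points to the
    reference profile, read off the pre-computed map."""
    return sum(dmap[p] for p in candidate_profile) / len(candidate_profile)
```

A real implementation would evaluate `chamfer_distance` inside the simplex loop, once per candidate transformation.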

Frontal image expert

The frontal image verification expert is described in detail in [116] and the description hereafter was based on the presentation of this expert in [98]. This frontal image expert is based on robust correlation of a frontal face image of the person under test and the stored face template corresponding to the claimed identity. A search for the optimum correlation is performed in the space of all valid geometric and photometric transformations of the input image to obtain the best possible match with respect to the template. The geometric transformation includes translation, rotation and scaling, whereas the photometric transformation corrects for a change of the mean level of illumination. The search technique for the optimal transformation parameters is based on random exponential distributions. Accordingly, at each stage the transformation between the test and reference images is perturbed by a random vector drawn from an exponential distribution. The matching criterion reflects both the agreement between the transformed face image and the template, and the similarity of the intensity distributions of the two images. The degree of similarity is measured with a robust kernel. This ensures that gross errors due to, for instance, hair style changes do not swamp the cumulative error between the matched images. In other words, the matching is benevolent, aiming to find as large areas of the face as possible, supporting a close agreement between the respective gray-level histograms of the two images. The gross errors will be reflected in a reduced overlap between the two images, which is taken into account in the overall matching criterion. The system is trained very easily by means of storing one template for each client. Each reference image is segmented to create a face mask which excludes the background and the torso as these are likely to change over time.

Vocal expert

The vocal identity verification expert is presented in detail in [22]. This text-independent speaker verification expert is based on a similarity measure between speakers, calculated on second order statistics [21].

In this algorithm a first covariance matrix X is generated from a reference sequence, consisting of M m-dimensional acoustical vectors, and pronounced by the person whose identity is claimed:

X = (1/M) Σ_{i=1}^{M} X_i X_i^T ,  where X_i^T is X_i transposed.

A second covariance matrix Y is then generated in the same way from a sequence, consisting of M m-dimensional acoustical vectors, and pronounced by the person under test.

Then a similarity measure between these two speakers is performed, based on the sphericity measure μ_AH(X, Y):

μ_AH(X, Y) = log(A / H),

A(λ_1, λ_2, ..., λ_m) = (1/m) Σ_{i=1}^{m} λ_i = m^{-1} tr(Y X^{-1}),

H(λ_1, λ_2, ..., λ_m) = m ( Σ_{i=1}^{m} 1/λ_i )^{-1} = m [ tr(X Y^{-1}) ]^{-1},

where λ_1, ..., λ_m are the eigenvalues of Y X^{-1}.

It can be shown that this sphericity measure is always non-negative and it is equal to zero only in the case that the two covariance matrices X and Y are the same. The verification process consists then of comparing the obtained sphericity measure with a decision threshold, calculated on a validation database.

One of the great advantages of this algorithm is that no explicit extraction of the m eigenvalues λ_i is necessary, since the sphericity measure only needs the calculation of the trace tr(·) of the matrix product Y X^{-1} or X Y^{-1}.
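As a toy numerical illustration of this trace-only computation (an assumption of the sketch is to restrict the matrices to 2×2 nested lists so that the inversion stays explicit; in practice m is the dimension of the acoustic vectors and a linear algebra library would be used):

```python
from math import log

def _trace_product_inverse(a, b):
    """tr(A B^-1) for 2x2 matrices given as nested lists."""
    det = b[0][0] * b[1][1] - b[0][1] * b[1][0]
    b_inv = [[ b[1][1] / det, -b[0][1] / det],
             [-b[1][0] / det,  b[0][0] / det]]
    # trace of the product A . B^-1
    return sum(a[i][0] * b_inv[0][i] + a[i][1] * b_inv[1][i] for i in range(2))

def sphericity(x, y, m=2):
    """Arithmetic-harmonic sphericity mu_AH(X, Y) = log(A / H), computed
    from the two traces only, without extracting the eigenvalues."""
    arithmetic = _trace_product_inverse(y, x) / m   # A = tr(Y X^-1) / m
    harmonic = m / _trace_product_inverse(x, y)     # H = m / tr(X Y^-1)
    return log(arithmetic / harmonic)
```

When X and Y coincide, both traces equal m, so A = H = 1 and the measure is zero; any spread between the eigenvalues makes A exceed H and the measure positive.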

3.4.2 Performances

The performances achieved by the three mono-modal identity verification systems which have been used in these experiments are given in Table 3.1. The results have been obtained by adjusting the threshold at the EER on the validation set and applying this threshold as an a priori threshold on the test set. Observing the results for the profile and the frontal experts it can be seen that, although the optimization has been done according to the EER criterion, the FRR and the FAR are very different. This indicates that for these two experts, the training and validation sets are not very representative of the test set.

Table 3.1: Verification results for individual experts.

Expert   FRR (%)             FAR (%)            TER (%)
         (37 tests)          (1,332 tests)      (1,369 tests)
Profile  21.6 [11.4, 37.2]   8.5 [7.1, 10.1]    8.9 [7.5, 10.5]
Frontal  21.6 [11.4, 37.2]   8.3 [6.9, 9.9]     8.7 [7.3, 10.3]
Vocal     5.4 [1.5, 17.7]    3.6 [2.7, 4.7]     3.7 [2.8, 4.8]

3.4.3 Statistical analysis of the different experts

Introduction

A statistical analysis of the individual experts is important to get an idea on one hand of their individual discriminatory power, and of their complementarity on the other hand.

The power of an expert to discriminate between clients and impostors will increase (for given variances) with the difference between the mean value of the scores obtained for client accesses and the mean value of the scores obtained for impostor accesses. The typical statistical test to see if there exist significant differences between the means (or more generally between the statistical moments of first order) of several populations is the so-called analysis of variance (ANOVA). In the general case, this analysis is implemented using an F-test. In the specific case of two populations, this ANOVA could also be performed using an independent samples t-test [123]. Another important characteristic of an expert is its variance (or more generally the statistical moment of second order). The equality of variances can be tested by a Levene test, which is also implemented using an F-test [114]. It is advantageous that the variance of an expert is the same for clients and for impostors, because this leads to simpler methods to combine the different experts (see chapter 6). Obviously we will need to perform t- and F-tests to analyze the means and the variances of the different experts. However, the t- and F-tests give only exact results if the populations have a Normal distribution. So before we can use t- or F-tests, we need to verify the Normality of the different populations. Thus this is the first statistical analysis that we need to perform. Since the ANOVA is only valid if the variances of the different populations per expert are equal, we have to check the equality of variances before performing the ANOVA. These remarks explain the forced order of the first three analyses that are presented below.

We can get an idea of the independence of the different experts (and thus of the amount of extra information that each expert brings in), by analyzing their correlation. And a linear discriminant analysis gives us a first idea of the combined discriminatory power of the experts.

Last but not least, the analysis of the extreme values gives us insight into the possible use of personalized approaches.
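The first of these comparisons can be illustrated with a pooled independent-samples t statistic computed by hand (a minimal sketch: any score lists fed to it are illustrative, and the Levene, F- and Normality tests are not sketched here).

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(client_scores, impostor_scores):
    """Pooled (equal-variance) independent-samples t statistic for the
    difference between the mean client and the mean impostor scores of
    one expert; large absolute values indicate good discrimination."""
    na, nb = len(client_scores), len(impostor_scores)
    pooled_var = ((na - 1) * variance(client_scores)
                  + (nb - 1) * variance(impostor_scores)) / (na + nb - 2)
    return ((mean(client_scores) - mean(impostor_scores))
            / sqrt(pooled_var * (1.0 / na + 1.0 / nb)))
```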

Analysis of Normality

The purpose of a Normality analysis is to check whether the observed data do or do not support the hypothesis H0 that the underlying probability density function is Normal. There exist two types of tests to perform this analysis: objective (numerical) and subjective (graphical) tests. An important remark related to the verification of H0 is that the assumption of Normality is much more difficult to verify when using small sample sizes.
