HAL Id: tel-00005685
https://pastel.archives-ouvertes.fr/tel-00005685
Submitted on 5 Apr 2004
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
scientific research documents, whether they are
published or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
Contribution à la vérification multi-modale de l’identité en utilisant la fusion de décisions
Patrick Verlinde
To cite this version:
Patrick Verlinde. Contribution à la vérification multi-modale de l’identité en utilisant la fusion de
décisions. Interface homme-machine [cs.HC]. Télécom ParisTech, 1999. Français. ⟨tel-00005685⟩
46, rue Barrault
75634 Paris Cedex 13
France
A CONTRIBUTION TO MULTI-MODAL
IDENTITY VERIFICATION USING DECISION
FUSION
by
Patrick Verlinde
Dissertation submitted to obtain the degree of
Docteur de l'École Nationale Supérieure des
Télécommunications
Spécialité: Signal et Images
Composition of the thesis committee:
Jean-Paul Haton (LORIA) - President
Gérard Chollet (ENST) - Director
Marc Acheroy (RMA) - Reporter
Isabelle Bloch (ENST) - Reporter
Paul Delogne (UCL) - Examiner
Josef Kittler (UOS) - Examiner
my twin daughters
In the first place I wish to thank my thesis director dr. Gérard Chollet
from CNRS/URA 820 (FR), for his driving force, for his critical and very
useful advice, for having involved my research in every suitable project he
could find, and for the huge amounts of information he provided me with.
I also would like to thank prof. dr. ir. Jean-Paul Haton from LORIA (FR)
for the useful advice he gave me and for having honored me by accepting
to be the president of my thesis committee.
Special thanks go to prof. dr. ir. Marc Acheroy, head of the electrical
engineering department of the RMA and director of the Signal and Image
Centre (SIC), for believing in me, for having supported and motivated me
all the time, for his continuous flow of advice, and, last but not least, for
having accepted to be a reporter for this thesis.
I also want to express my sincere gratitude towards prof. dr. ir. Isabelle
Bloch from ENST/TSI (FR), not only for helping me in a very friendly way
to effectively control my "uncertainties", but also for having accepted to
be a reporter for my work.
I am very proud to have prof. dr. ir. Josef Kittler from the University of Surrey
(UK) in my thesis committee, and I would like to thank him especially for
his helpful comments with respect to the statistical aspects of this study.
Thank you prof. dr. ir. Paul Delogne from UCL/TELE (BE), for your
many "personalized" comments which I'm sure have improved the contents
as well as the readability of this work, and for having accepted to be a
[...] Military Academy (RMA), for having granted me the time and the means to
finish this thesis, and for having accepted to be a member of my thesis
committee.
Thanks to ir. Charles Beumier from RMA/SIC (BE) for his help in the field
of machine vision in general and in the framework of the M2VTS project
in particular, to dr. ir. Stéphane Pigeon from RMA/SIC (BE) for his
critical remarks in the general field of fusion and for his help in the design
of the experimental protocols and for writing the software for the NIST99
evaluation and for the evaluation of the mixture of experts.
Thanks to dr. ir. Jan Černocký from VUT (CZ) for his hospitality during
the NIST99 evaluations and for his help in generating the data for the
experiments involving the mixture of experts.
Thanks to dr. ir. Gilbert Maître, formerly from IDIAP/vision group (CH),
for his very positive help in the design of experimental protocols and in
the field of machine vision, to dr. ir. Eddy Mayoraz from IDIAP/machine
learning group (CH) for his friendly guidance and for his help in
formalizing the paradigm of the multi-linear classifier, to ir. Frédéric Gobry from
IDIAP/machine learning group (CH) for his help in writing the code for
the multi-linear classifier, and to dr. ir. Dominique (Doms) Genoud from
IDIAP/speech group (CH) for his help in the field of speaker verification.
Thanks also to ir. Guillaume (Guig) Gravier from ENST/TSI (FR) for
his friendship and for all his help in the fields of speaker verification and
information technology.
I also would like to thank Bruno, Chris, Dirk, Florence, Idrissa, Lionel,
Marc, Michel, Monica, Nada, Pascal, Vincianne, Wim, Xavier, Yann,
Youssef, and all my other colleagues from RMA/ELTE (BE) and from
ENST/TSI (FR) for the wonderful working atmosphere they all have
contributed to.
I am grateful for the contributions of the following students: Benny Tops
and Dang Van Thuong.
Finally I would like to thank Renate for her love, support, and so much
1 Introduction
  1.1 Introduction
  1.2 Subject of the thesis
  1.3 Identity determination concepts
  1.4 Structure of the thesis
  1.5 Original contributions of this thesis
I General issues related to automatic biometric multi-modal identity verification systems
2 Biometric verification systems
  2.1 Introduction
  2.2 Requirements for biometrics
  2.3 Classification of biometrics
  2.4 General structure of a mono-modal biometric system
  2.5 The need for multi-modal biometric systems
  2.6 Characterization of a verification system
  2.7 State of the art
    2.7.1 General overview
    2.7.2 Results obtained on the M2VTS database
  2.8 Comments
3 Experimental setup
  3.1 Introduction
  3.2 The M2VTS audio-visual person database
  3.3 Experimental protocol
    3.3.1 General issues
    3.4.2 Performances
    3.4.3 Statistical analysis of the different experts
  3.5 Comments
4 Data fusion concepts
  4.1 Introduction
  4.2 Taxonomy of data fusion levels
  4.3 Decision fusion architectures
  4.4 Parallel decision fusion as a particular classification problem
  4.5 Comments
II Combining the different experts in automatic biometric multi-modal identity verification systems
5 Introduction to part two
  5.1 Goal
  5.2 Parametric or non-parametric methods?
  5.3 Comments
6 Parametric methods
  6.1 Introduction
  6.2 A simple classifier: the multi-linear classifier
    6.2.1 Decision fusion as a particular classification problem
    6.2.2 Principle
    6.2.3 Training
    6.2.4 Testing
    6.2.5 Results
    6.2.6 Partial conclusions and future work
  6.3 A statistical framework for decision fusion
    6.3.1 Bayesian decision theory
    6.3.2 Neyman-Pearson theory
    6.3.3 Application of Bayesian decision theory to decision fusion
    6.3.4 The naive Bayes classifier
    6.3.5 Applications of the naive Bayes classifier
    6.3.6 The issue of the a priori probabilities
    6.4.2 Results
    6.4.3 Mixture of Experts
  6.5 Comments
7 Non-parametric methods
  7.1 Introduction
  7.2 Voting techniques
  7.3 A classical k-NN classifier
  7.4 A k-NN classifier using distance weighting
  7.5 A k-NN classifier using vector quantization
  7.6 A decision tree based classifier
  7.7 Comments
8 Comparing the different methods
  8.1 Introduction
  8.2 Parametric versus non-parametric methods
  8.3 Experimental comparison of classifiers
    8.3.1 Test results
    8.3.2 Validation results
    8.3.3 Statistical significance
  8.4 Visual interpretations
  8.5 Comments
9 Multi-level strategy
  9.1 Introduction
  9.2 A multi-level decision fusion strategy
  9.3 Mono-modal mono-expert fusion
    9.3.1 Introduction
    9.3.2 Results
  9.4 Mono-modal multi-expert fusion
    9.4.1 Introduction
    9.4.2 Methods
    9.4.3 Results
    9.4.4 Combining the outputs of segmental vocal experts
    9.4.5 Combining the outputs of global vocal experts
  9.5 Multi-modal multi-expert fusion
  10.2 Future work
Bibliography
A A monotone multi-linear classifier
B The iterative goal function
C The global goal function
D Proof of equivalence
E Expression of the conditional probabilities
F Visual interpretations
ATM Automatic Teller Machine
AVBPA Audio- and Video-based Biometric Person Authentication
BDT Binary Decision Tree
DET Detection Error Tradeoff
EER Equal Error Rate (FAR = FRR)
FA False Acceptance
FAR False Acceptance Rate
FE Frontal Expert
FR False Rejection
FRR False Rejection Rate
GMM Gaussian Mixture Model
HMM Hidden Markov Model
k-NN k-Nearest Neighbor
LC Linear Classifier
LDA Linear Discriminant Analysis
LR Naive Bayes classifier using a Logistic Regression model
M2VTS Multi-Modal Verification for Teleservices and Security applications
MAJ Majority voting
MAP Maximum A posteriori Probability
MCP Maximum Conditional Probability
ML Maximum Likelihood
MLP Multi-Layer Perceptron
NBG Naive Bayes classifier using Gaussian distributions
NIST National Institute of Standards and Technology (USA)
NN Nearest Neighbor
NSA National Security Agency (USA)
PE Profile Expert
PIN Personal Identification Number
PLC Piece-wise Linear Classifier
QC Quadratic Classifier
ROC Receiver Operating Characteristic
TD Temporal Decomposition
TER Total Error Rate
VE Vocal Expert
Introduction
1.1 Introduction
The first chapter starts by introducing the subject of the thesis. To avoid
confusion, this introduction is followed by an explanation of the differences
and/or similarities between terms that are often encountered in the
literature related to the field of automatic identity "determination", namely
authentication, recognition, identification, and verification. These
definitions are followed by a presentation of the structure of the thesis, and the
chapter ends by clearly stating the original contributions of this thesis.
1.2 Subject of the thesis
This thesis deals with the automatic verification of the identity of a
cooperative person under test, by combining the results of analyses of his or her
face, profile and voice. This specific application, which is used throughout
this work, has been defined in the framework of the M2VTS (Multi-Modal
Verification for Tele-services and Security applications) project of the
European Union ACTS program [1]. The exact definition of verification and
the differences with other, often encountered terms, such as identification,
authentication or recognition, will be explained hereafter. The key idea in
this thesis is to analyze the possibilities of using data fusion techniques to
combine the results obtained by different biometric (face, profile and voice)
experts that each have analyzed the identity claim of the person under test.
In this work we are explicitly avoiding issues such as ethics, responsibility
or privacy. The interested reader can find an introduction to these delicate
The automatic verification of a person's identity is becoming an increasingly
important tool in several applications, such as controlled access to restricted
(physical and virtual) environments. Just think about secure tele-shopping,
accessing the safe room of your bank, tele-banking, accessing the services of
interactive dialogue systems [175], or withdrawing money from automatic
teller machines (ATM).
A number of different readily available techniques, such as passwords,
magnetic stripe cards and Personal Identification Numbers (PIN), are already
widely used in this context, but the only thing they really verify is, in the
best case, a combination of a certain possession (for instance the possession
of the correct magnetic stripe card) and of a certain knowledge, through
the correct restitution of a character and/or digit combination. As is well
known, these intrinsically simple (access) control mechanisms can very
easily lead to abuses, induced for instance by the loss or theft of the magnetic
stripe card and the corresponding PIN. Therefore a new kind of method
is emerging, based on so-called biometric characteristics or measures, such
as voice, face (including profile), eye (iris pattern, retina scan), fingerprint,
palm-print, hand shape or some other (preferably) unique and measurable
physiological or behavioral characteristic of the person to be
verified.
In this work, a biometric measure will also be called a modality. This means
that an identity verification system which uses several biometric measures
or modalities (for instance a visual and a vocal biometric modality) is a
multi-modal identity verification system.
Another term which will be used very often in this work is expert. In this
thesis, an expert is any algorithm or method using characteristic features
coming from a particular modality to verify the identity of a person under
test. In this sense, one single biometric measure or modality can lead to
the use of more than one expert (the visual modality can for instance lead
to the use of two experts: a profile and a frontal face expert). This means
that a mono-modal identity verification system can still be a multi-expert
system.
Biometric measures in general, and non-invasive/user-friendly (vocal,
visual) biometric measures in particular, are very attractive because they
have the huge advantage that one cannot lose or forget them, and they
are really personal (one cannot pass them to someone else), since they are
based on a physical appearance measure. We can start using these
[...] applications use a classical technique (password, or magnetic stripe card) to
claim a certain identity, which is then verified using one or more biometric
measures.
If one uses only a single (user-friendly) biometric measure, the results
obtained may turn out not to be good enough. This is due to the fact that
these user-friendly biometric measures tend to vary with time for one and
the same person and, to make it even worse, the magnitude of this variation
is itself very variable from one person to another. This is especially true
for the vocal (speech) modality, which shows an important intra-speaker
variability. One possible solution to try to cope with the problem of this
intra-person variability is to use more than one biometric measure. In this
new multi-modal context, it is thus becoming important to be able to
combine (or fuse) the outcomes of different modalities or experts. There is
currently a significant international interest in this topic. The organization
of already two international conferences on the specific subject of
Audio- and Video-based Biometric Person Authentication (AVBPA) is probably
the best proof of this [16, 38].
Combining the outcomes of different experts can be done by using
classical data fusion techniques [2, 46, 70, 71, 101, 170, 172, 181], but the major
drawback of the bulk of these methods is their rather high degree of
complexity, which is expressed, amongst other things, by the fact that these methods
tend to incorporate a lot of parameters that have to be estimated. If this
estimation is not done using enough training data (i.e. if the estimation
is not done properly), this places a serious constraint on the ability of the
system to correctly generalize [9, 121]. But actually a major difficulty of
this particular estimation problem is the scarcity of multi-modal training
data. Indeed, to keep the automatic verification system user-friendly, the
enrollment of a (new) client should not take too much time, and as a
direct consequence of this, the amount of client training data tends to be
limited. To try to deal with this lack of training data, one possibility is
to develop simple classifiers (i.e. for instance classifiers that use only few
parameters), so that their parameters can be estimated using only limited
amounts of training data. The price to be paid when using simple methods
1.3 Identity determination concepts
Automatic systems for recognizing a person or for authenticating his identity
(which is equivalent) all have a database of N so-called authorized persons
or clients. Authentication or recognition is the general term, which covers
on the one hand identification and on the other hand verification. These two
processes are quite different, as the following more detailed description will
show.
Identification in the strict sense of the word supposes a closed world context.
This means that we are sure that the person under test is a client. The
only thing we need to find out is which client of the database of authorized
persons matches the person under test "the best". There is no criterion
(such as a threshold for instance) to define how good the match has to be
to be acceptable. Identification is thus a 1-out-of-N matching process, and
it is clear that the performance decreases with N.
Verification in the strict sense of the word operates in an open world context.
This means that we are no longer sure that the person under test is a client.
In this case, the person under test claims a certain identity, which of course
has to be the identity of an authorized person. If the person under test is not a
member of the database of authorized persons, he is a so-called impostor.
Verification is thus a 1-out-of-1 matching process, where it is important
that the mismatch between the reference model from the database and
the measured characteristics of the person under test stays below a certain
threshold. The verification performance is independent of N.
Sometimes people refer to identification in the large sense of the word
as the (sequential) process of identification followed by a verification of the
identified identity. Sometimes this double process is also called
identification in an open world context.
In this thesis we will only consider verification problems. This means that
the decision problem we are confronted with is a typical binary hypothesis
test. Indeed, the decision we have to take is either to accept or to reject the
identity claim of the person under test.
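The difference between the two processes described in this section can be made concrete with a minimal sketch. It is only an illustration under stated assumptions: the Euclidean distance used as mismatch measure, the function names identify and verify, and the two-dimensional toy reference models are all hypothetical and are not taken from the systems studied in this thesis.

```python
import math

def mismatch(features, reference):
    """Euclidean distance, used here only as a stand-in mismatch measure."""
    return math.dist(features, reference)

def identify(features, client_models):
    """Closed world, 1-out-of-N: return the best matching client.

    No threshold is involved: some client is always returned.
    """
    return min(client_models, key=lambda name: mismatch(features, client_models[name]))

def verify(features, claimed_identity, client_models, threshold):
    """Open world, 1-out-of-1: accept the claim only if the mismatch with the
    reference model of the claimed identity stays below the threshold."""
    reference = client_models.get(claimed_identity)
    if reference is None:
        # The claimed identity is not an authorized person: reject.
        return False
    return mismatch(features, reference) < threshold

# Hypothetical toy reference models (two features per client).
client_models = {"client_a": (0.0, 0.0), "client_b": (3.0, 4.0)}
probe = (2.5, 3.5)

print(identify(probe, client_models))
print(verify(probe, "client_b", client_models, threshold=1.0))
print(verify(probe, "client_a", client_models, threshold=1.0))
```

Note that identify compares the probe with all N reference models, whereas verify compares it with a single model, which is consistent with the remark that the verification performance does not depend on N.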
1.4 Structure of the thesis
This thesis has been divided into two parts. In the first part, general issues
related to automatic multi-modal identity verification systems, such as a
[...] set-up (including the presentation and the analysis of our experts) and a
general overview of data fusion related concepts, are treated. In the second
part, the fusion of the different experts in a multi-modal identity
verification system is implemented on the decision level, using parametric and
non-parametric methods. These different methods are then compared with
each other and a structured hierarchical approach for gradually upgrading
the performance of automatic biometric verification systems is presented.
At the end of these two parts, we conclude this thesis by summarizing
our contributions to the field and by looking at possible extensions of the
work done.
To be more specific, the first part is organized as follows. In chapter 2 we
deal with biometric modalities and we start by listing some theoretical and
practical requirements that biometrics in general should conform to. This
is followed by a section which presents a tentative classification of the most
commonly found biometrics into two classes: the so-called physiological and
behavioral biometrics. In the following section the general structure of an
automatic mono-modal biometric verification system is presented, while in
the next section some general arguments for using multi-modal biometric
verification systems are developed. The following section is meant to
introduce and define the classical performance characteristics used in the field of
automatic identity verification, and the final section gives an overview
of the state of the art in multi-modal biometric identity verification
systems. Chapter 3 gives details about the experimental set-up. It starts by
presenting the M2VTS databases used in this work. After this, the
experimental protocol is described. Finally, the three different biometric experts
we have been using throughout this work are briefly introduced and their
individual performances are highlighted and statistically analyzed.
Chapter 4 introduces some elementary data fusion concepts such as the different
data fusion levels and architectures, and shows how it is possible, by
making some well-founded choices, to transform a general data fusion problem
into a particular classification problem.
The second part of this work deals more particularly with the parallel
combination or fusion of the partial (soft) decisions of the different experts.
Chapter 5 explains why we have chosen to experiment with parametric as
well as with non-parametric methods. Chapter 6 deals with parametric
techniques, but to show the usefulness of these parametric methods, first of
all a trivial but original method is presented: the monotone multi-linear
[...] information with respect to the probability density functions of the different
populations is thrown away. Therefore at a fairly early stage of this work
it was decided to stop developing this simple method and to fall back
instead on the less original, but more fundamental statistical decision theory,
by using so-called parametric techniques. In this parametric class,
classifiers based on the general Bayesian decision theory (Maximum A-posteriori
Probability and Maximum Likelihood) and on a simplified version of it
(the Naive Bayesian classifier, which has been applied in the case of
simple Gaussians and in the case of a logistic regression model), have been
studied. Furthermore experiments have also been done using Linear and
Quadratic classifiers. Neural networks form a special case of the parametric
family, since the number of parameters to be estimated can be very large.
Therefore neural networks are sometimes classified as semi-parametric
classifiers. Still we will present neural networks in the chapter on parametric
techniques, by means of their most popular representative: the Multi-Layer
Perceptron. Chapter 7 deals with non-parametric techniques. This
chapter starts by presenting a very simple family of non-parametric techniques.
These voting techniques are sometimes referred to as k-out-of-n voting
techniques, where k is the number of experts that have to decide that
the person under test is a client before the global voting classifier accepts
the person under test as a client. After the voting methods, another simple
but very popular technique, the k Nearest Neighbor (k-NN) technique, is
presented with a number of variants. These variants include a distance
weighted and a vector quantized version of the classical k-NN rule. This
chapter ends by presenting the category of (binary) decision trees, by means
of an implementation of the C4.5 algorithm, which is probably the most
popular method of its kind. Chapter 8 deals with the comparison between
the different parametric and non-parametric methods that have been
presented in the second part of the thesis. Chapter 9 presents a multi-level
decision fusion strategy that allows one to gradually improve the performance
of an automatic biometric identity verification system, while limiting the
initial investments.
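As an illustration of the k-out-of-n voting rule mentioned above, the following minimal sketch accepts an identity claim when at least k of the n experts accept it. The function name k_out_of_n_vote and the example votes are hypothetical; the sketch only restates the rule as described in the text and is not the implementation used in chapter 7.

```python
def k_out_of_n_vote(expert_accepts, k):
    """Accept the claim if at least k of the n binary expert decisions accept it."""
    n = len(expert_accepts)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    return sum(1 for accept in expert_accepts if accept) >= k

# Hypothetical hard decisions of three experts (e.g. frontal, profile, vocal).
votes = [True, False, True]

print(k_out_of_n_vote(votes, k=1))  # OR rule: one accepting expert suffices
print(k_out_of_n_vote(votes, k=2))  # majority rule for n = 3
print(k_out_of_n_vote(votes, k=3))  # AND rule: all experts must accept
```

Choosing k trades the two error types against each other: a small k makes acceptance easier (fewer false rejections, more false acceptances), while a large k does the opposite.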
Chapter 10 finally concludes this thesis, formulates some recommendations
for developing automatic multi-modal biometric identity verification
1.5 Original contributions of this thesis
The original contributions of this thesis are the following:
1. the formulation (in the framework of a multi-modal biometric identity
verification system) of the fusion of the partial (soft) decisions of d
experts in parallel as a particular classification problem in the
d-dimensional space [179];
2. the systematic and detailed statistical analysis of the different experts
that have been used;
3. the development of a simple decision fusion method, based on a
monotone multi-linear classifier [179, 180];
4. the analysis of the applicability, the characteristics and the
performance of the logistic regression method in a Bayesian framework [177];
5. the development of a Vector Quantization version of the classical
k-Nearest Neighbor algorithm [173];
6. the systematic comparison of a large number of parametric as well as
non-parametric techniques to solve the particular classification
problem [174];
7. the introduction of either the non-parametric Cochran's Q test for
binary responses, or the non-parametric Page test for ordered
alternatives, to measure the statistical significance of the differences in
performance of several (i.e. more than two) fusion modules at the
same time;
8. the formulation of a multi-level fusion strategy which allows one to
gradually improve the performance of an automatic (biometric) identity
verification system [176, 178];
9. the formulation of the mixture of experts paradigm in the framework
of mono-modal multi-expert data fusion, applied to a segmental
approach to text-independent speaker verification [171];
10. the introduction of the use of multi-modal identity verification in
General issues related to
automatic biometric
multi-modal identity
Biometric verification
systems
2.1 Introduction
This chapter starts by defining the ideal theoretical and practical
requirements for any biometric. This is followed by a section which presents a
tentative classification (according to [120]) of the most commonly found
biometrics into two classes: the so-called physiological and behavioral
biometrics. In the following section the general structure of an automatic
mono-modal biometric verification system is presented, while in the next
section some general arguments for using multi-modal biometric verification
systems are developed. The following section then presents the main
characteristics of identity verification systems. In the final section, an overview
of the state of the art of multi-modal biometric person verification systems
is given.
2.2 Requirements for biometrics
Automatic biometric systems have to identify an individual or to verify his
or her identity (1) using measurements of the (living) human body. According
to [88, 89], in theory any human characteristic can be used to make an
identity verification, as long as it satisfies the following desirable (ideal)
requirements:
universality: this means that every person should have the characteristic;
uniqueness: this indicates that no two persons should be the same in terms
of the characteristic;
permanence: this means that the characteristic does not vary with time;
collectability: this indicates that the characteristic can be measured
quantitatively.
(1) As already mentioned in chapter 1, we will consider in this work only verification
In practice, there are some other important requirements:
performance: this specifies not only the achievable verification accuracy,
but also the resource requirements to achieve an acceptable
verification accuracy;
robustness: this refers to the influence of the working or environmental
factors (channel, noise, distortions, ...) that affect the verification
accuracy;
acceptability: this indicates to what extent people are willing to accept
the biometric verification system;
circumvention: this refers to how easy it is to fool the system by
fraudulent techniques (make sure that the individual owns the data, and
that he is not transforming it; this could also include a so-called
liveness test).
As mentioned before, these requirements should be regarded as ideal. In
other words, the better a biometric satisfies these requirements, the better
it will perform. In practice however, there is no single biometric which
fulfills all these ideal requirements perfectly. This observation is one of the
main reasons why combining several biometric modalities in multi-modal
systems is gaining ground.
2.3 Classification of biometrics
A range of mono-modal biometric systems is in development or on the
market, because no one biometric meets all the needs. The tradeoffs in
developing these systems involve cost, reliability, discomfort in using a device, and
the amount of data needed. Fingerprints, for instance, have a long track
[...] amount of data that needs to be stored to describe a fingerprint (the
template) tended to be rather large. In contrast, the hardware for capturing
the voice is cheap (relying on low-cost microphones or on an already
existing telephone), but the voice varies when emotions and states of health change.
According to [120], biometrics encompasses both physiological and
behavioral characteristics. This is illustrated for a number of frequently used
biometrics in Figure 2.1.
[Figure 2.1: tree diagram. Automated biometrics split into physiological (face, fingerprint, hand, eye) and behavioral (signature, voice, keystroke) characteristics.]
Figure 2.1: Classification of a number of biometrics in physiological and
behavioral characteristics.
A physiological characteristic is a relatively stable physical feature such
as a fingerprint [89, 130, 153], hand geometry [190], palm-print [188],
infrared facial and hand vein thermograms [141], iris pattern [184], retina
pattern [74], or facial feature [11, 12, 34, 39, 102, 116, 183, 189]. Indeed, all
these characteristics are basically unalterable without trauma to the
individual. A behavioral trait on the other hand has some physiological basis,
but also reflects a person's psychological (emotional) condition. The most
common behavioral trait used in automated biometric verification systems
is the human voice [3, 10, 20, 22, 31, 35, 36, 52, 60, 62, 63, 64, 65, 66, 69,
72, 73, 76, 81, 80, 105, 111, 112, 131, 132, 133, 134, 151, 154, 160]. Other
behavioral traits are gait [126], keystroke dynamics [127], and (dynamic)
[...] systems that rely on behavioral characteristics should ideally update their
enrolled reference template(s) on a regular basis. This could be done either
in an automatic manner, each time a reference is used successfully (i.e.
the system decides that an access claim is an authentic client claim), or in
a supervised manner, by re-enrolling each client periodically. The former
method has the advantage of being user-friendly, but has the drawback that
one updates the client references with a template from an impostor in the
case that the system commits a False Acceptance. The latter approach has
the advantage of always updating the client references with client templates,
but has the drawback that it is not very user-friendly, since the clients need
to do additional training sessions.
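The automatic update strategy and its drawback can be illustrated with a small sketch. The exponential moving average used as update rule, the update rate, and the function names are assumptions made for this example only; the text does not prescribe a particular update rule.

```python
def update_template(template, sample, rate=0.1):
    """Move the stored template a fraction `rate` towards the new sample
    (hypothetical moving-average update rule)."""
    return tuple((1.0 - rate) * t + rate * s for t, s in zip(template, sample))

def automatic_update(template, sample, accepted, rate=0.1):
    """Automatic strategy: update only when the system accepted the claim.

    If the acceptance was a False Acceptance, the impostor's sample is mixed
    into the client template, which is the drawback noted in the text.
    """
    return update_template(template, sample, rate) if accepted else template

client_template = (0.0, 0.0)
impostor_sample = (1.0, 1.0)

# A false acceptance pulls the client template towards the impostor.
print(automatic_update(client_template, impostor_sample, accepted=True, rate=0.5))
# A rejected claim leaves the template untouched.
print(automatic_update(client_template, impostor_sample, accepted=False, rate=0.5))
```

In the supervised strategy, update_template would only be called with samples collected during a controlled re-enrollment session, so that the origin of each sample is known.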
The differences between physiological and behavioral methods are
important. On one hand, the degree of intra-person variability is smaller in a
physiological than in a behavioral characteristic. On the other hand,
machines that measure physiological characteristics tend to be larger and more
expensive, and may seem more threatening or invasive to users (this is for
instance the case for retina scanners). Because of these differences, no one
biometric will serve all needs.
2.4 General structure of a mono-modal biometric
system
Automated mono-modal biometric verification systems usually work
according to the following principles. In a typical functional system a sensor,
adapted to the specific biometric, generates measurement data. From these
data, features that may be used for verification are extracted, using image
and/or signal processing techniques. In general, each biometric has its own
feature set. Pattern matching techniques compare the features coming from
the person under test with those stored in the database under the claimed
identity, to provide likely matches. Last but not least, decision theory
including statistics provides a mechanism for answering the question "Is the
person under test who he or she claims to be?" and for evaluating
biometric technology [77, 78, 158]. Automatic mono-modal biometric verification
systems are usually built by arranging two main modules in series: (1) a
module which compares the measured features from the person under test with
a reference client model and gives a scalar number as output, followed by
(2) a decision module realized by a thresholding operation. This threshold
can be a function of the claimed identity.
isrepresentedingure 2.2.
model
selection
matching
feature
extraction
decision
decision
forming
identification key
score
biometric signal
Figure2.2: Typicalmono-modalbiometricvericationsystemarchitecture.
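The two modules in series described in this section can be sketched as follows. This is only an illustration under stated assumptions: the similarity score derived from a Euclidean distance, the function names, and the per-client threshold table are hypothetical choices and do not describe the experts actually used in this work.

```python
import math

def matching_score(features, reference_model):
    """Module 1: compare measured features with the reference client model and
    return a scalar score in (0, 1], where 1 means identical (assumed measure)."""
    return 1.0 / (1.0 + math.dist(features, reference_model))

def decide(score, claimed_identity, thresholds, default_threshold=0.5):
    """Module 2: thresholding; the threshold may depend on the claimed identity."""
    return score >= thresholds.get(claimed_identity, default_threshold)

def verify_claim(features, claimed_identity, reference_models, thresholds):
    """Select the model of the claimed identity, compute the score, then decide."""
    score = matching_score(features, reference_models[claimed_identity])
    return decide(score, claimed_identity, thresholds)

# Hypothetical reference model and client-specific threshold.
reference_models = {"client_a": (1.0, 2.0)}
thresholds = {"client_a": 0.6}

print(verify_claim((1.0, 2.5), "client_a", reference_models, thresholds))
print(verify_claim((3.0, 2.0), "client_a", reference_models, thresholds))
```

Keeping the score computation and the thresholding in separate functions mirrors the series arrangement in the text, and it leaves the scalar score available for the fusion step discussed in the next section.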
2.5 The need for multi-modal biometric systems
There can be several reasons why one would prefer multi-modal
biometric verification systems over mono-modal ones. Generally, the criterion
to choose between mono- and multi-modal systems will be system
performance. The end-user typically desires a guarantee that the classification
errors (FAR and FRR) will be limited by maximal values that will
depend on the application. And although there exist mono-modal biometric
verification techniques that do offer very small classification errors, the
main problem with this category of biometrics is that they are either too
expensive to be used in a general purpose context (for instance identity
verification in the case of credit card payments over the Internet using a PC)
or perceived by the user as too invasive. So very often one is confronted
with the obligation of using inexpensive hardware and non-invasive
user-friendly biometrics. Two of the most popular biometrics that can conform
to these constraints are faces and voices. However, using
inexpensive hardware (cheap black and white CCD-cameras and low-cost
microphones) to obtain the raw data measurements of these biometrics has
as a direct consequence that the measurements generally will be corrupted
[...] biometrics are that the visual modality is rather sensitive to lighting conditions
and that the vocal modality tends to vary with time (since it is a behavioral
biometric). This makes the use of a mono-modal biometric verification
system based solely either on the facial or on the vocal modality a very big
challenge, especially since it is usually not possible to update the database
references of the authorized users on a regular basis.
One possible solution to cope with this problem is to use not one single
mono-modal biometric system, but several of them in parallel, to form a
so-called multi-modal biometric verification system. Intuitively, such a
strategy can be helpful if one considers complementary biometrics. This
complementarity can be achieved with respect to the different
requirements presented in section 2.2. A possible example of
complementary biometrics with respect to the permanence requirement would
be the combined use of a physiological (face: more invariant in time) and
a behavioral (voice: less invariant) biometric. The main and very general
idea of using multi-modal biometric verification systems instead of
mono-modal ones is thus the ability of the former to use more
(complementary) information about the person under test than the latter.
In chapter 9, a more detailed step-by-step analysis of a multi-level
strategy to gradually improve the performance of an automated biometric
system is presented.
A possible and straightforward way of building a multi-modal verification
system from d such mono-modal systems is to input the d scores provided
in parallel into a fusion module, which combines the d scores and passes
the fused score on to the decision forming module. This module then has
to take the decision accept or reject, based on a threshold. Just as in
the case of the mono-modal system, this threshold can be a function of
the claimed identity. However, two alternatives remain for the fusion
module: a global approach (i.e. the same for all persons) or a personal
approach (i.e. tailored to the specific characteristics of each
authorized person). For the sake of simplicity, and because the personal
approach needs more training data (since in this case the fusion module
needs to be optimized for each client), we have opted in this work for a
global fusion module.
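The chain of d expert scores, a global fusion module and a thresholding
decision module can be sketched as follows. This is only an illustration:
the weighted sum used as fusion rule, the function names, the score values
and the threshold are assumptions, not the fusion modules studied in this
thesis.

```python
from typing import Sequence


def fuse_scores(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Combine d expert scores into one fused score (weighted sum, an assumed rule)."""
    if len(scores) != len(weights):
        raise ValueError("one weight per expert score is required")
    return sum(w * s for w, s in zip(weights, scores))


def decide(fused_score: float, threshold: float) -> str:
    """Decision forming module: accept when the fused score reaches the threshold."""
    return "accept" if fused_score >= threshold else "reject"


# Hypothetical scores of d = 3 experts for one access claim (higher = more client-like).
scores = [0.9, 0.7, 0.8]
# Global fusion: the same weights are used whatever the claimed identity.
weights = [1 / 3, 1 / 3, 1 / 3]

fused = fuse_scores(scores, weights)
decision = decide(fused, threshold=0.5)
```

A personal approach would replace the single `weights` list by one list per
client, which is why it needs more training data.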
Figure 2.3 shows the typical architecture of a general multi-expert
verification system, including the possible use of personalized fusion or
decision [...]
Figure 2.3: Multi-expert architecture. (Labels recovered from the figure:
physical appearance, identification key, expert i, expert j, scores,
fusion, fused, decision forming, decision.)
2.6 Characterization of a verification system
In this work, we will consider the verification of the identity of a
person as a typical two-class problem: either the person is the one he
claims to be (in which case he is called a client), or he is not (in
which case he is called an impostor). This means that we are going to
work with a binary {accept, reject} decision scheme.
When dealing with binary hypothesis testing, it is easy to see that the
decision module can make two kinds of errors. Applied to this problem of
the verification of the identity of a person, these two errors are
called:
- False Rejection (FR): i.e. when an actual client is rejected as being
  an impostor;
- False Acceptance (FA): i.e. when an actual impostor is accepted as
  being a client.
The performances of a speaker verification system are usually given in
terms of the global error rates computed during tests: the False
Rejection Rate (FRR) and the False Acceptance Rate (FAR) [18]. These
error rates are defined as follows:
\[ \mathrm{FRR} = \frac{\text{number of FR}}{\text{number of client accesses}} \tag{2.1} \]
\[ \mathrm{FAR} = \frac{\text{number of FA}}{\text{number of impostor accesses}} \tag{2.2} \]
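As a minimal sketch of these two definitions, the error rates can be
counted directly from lists of scores. The convention that a higher score
means "more likely a client", the acceptance rule (score >= threshold) and
all score values below are assumptions made for the illustration.

```python
def error_rates(client_scores, impostor_scores, threshold):
    """Return (FRR, FAR) for a given decision threshold.

    Assumes a claim is accepted when its score is >= threshold.
    """
    n_fr = sum(1 for s in client_scores if s < threshold)      # clients rejected
    n_fa = sum(1 for s in impostor_scores if s >= threshold)   # impostors accepted
    frr = n_fr / len(client_scores)      # FR divided by number of client accesses
    far = n_fa / len(impostor_scores)    # FA divided by number of impostor accesses
    return frr, far


client_scores = [0.9, 0.8, 0.4, 0.7]          # hypothetical client accesses
impostor_scores = [0.1, 0.3, 0.6, 0.2, 0.05]  # hypothetical impostor accesses
frr, far = error_rates(client_scores, impostor_scores, threshold=0.5)
```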
A perfect identity verification (FAR = 0 and FRR = 0) is in practice
unachievable. However, as shown by the study of binary hypothesis testing
[167], either of the two rates FAR and FRR can be reduced to an
arbitrarily small value by changing the decision threshold, with the
drawback of increasing the other one. A unique measure can be obtained by
combining these two errors into the Total Error Rate (TER) or its
complement, the Total Success Rate (TSR):
\[ \mathrm{TER} = \frac{\text{number of FA} + \text{number of FR}}{\text{total number of accesses}} \tag{2.3} \]
\[ \mathrm{TSR} = 1 - \mathrm{TER} \tag{2.4} \]
However, care should be taken when using either of these two unique
measures. Indeed, from the definition just given it follows directly that
these two numbers can be heavily biased by one or the other type of error
(FAR or FRR), depending solely on the number of accesses that have been
used in obtaining these respective errors. As a matter of fact, due to
the proportional weighting specified in the definition, the TER will
always be closer to the type of error (FAR or FRR) which has been
obtained using the largest number of accesses.
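The bias can be made concrete with a small numerical sketch. The access
counts (37 client and 1332 impostor accesses) are chosen to be strongly
unbalanced and the error counts are purely hypothetical.

```python
def total_error_rate(n_fr, n_client, n_fa, n_impostor):
    """TER: all errors divided by all accesses."""
    return (n_fr + n_fa) / (n_client + n_impostor)


n_client, n_impostor = 37, 1332   # strongly unbalanced access counts
n_fr, n_fa = 8, 40                # hypothetical error counts

frr = n_fr / n_client             # about 0.216
far = n_fa / n_impostor           # about 0.030
ter = total_error_rate(n_fr, n_client, n_fa, n_impostor)  # about 0.035

# The TER is the access-weighted mean of FRR and FAR, so here it sits
# close to the FAR, which was measured on far more accesses.
weighted = (n_client * frr + n_impostor * far) / (n_client + n_impostor)
```

With these numbers a false rejection rate above 20% is almost invisible in
the TER, which is why FRR and FAR should be reported alongside it.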
The overall performance of an identity verification system is however
better characterized by its so-called Receiver Operating Characteristic
(ROC), which represents the FAR as a function of the FRR [167]. The
Detection Error Tradeoff (DET) curve is a convenient non-linear
transformation of the ROC curve, which has become the standard method for
comparing performances of speaker verification methods used in the annual
NIST evaluation campaigns [142]. In a DET curve, the horizontal axis
shows the normal deviate of the False Alarm probability (in %), which is
a non-linear transformation of the horizontal False Acceptance axis of
the classical ROC curve. The vertical axis of the DET curve represents
the normal deviate of the Miss probability (in %), which is a non-linear
transformation of the False Rejection axis of the classical ROC curve.
The use of the normal deviate scale moves the curves away from the lower
left when performance is high, making comparisons between different
systems easier. It can also be [...] portion of their range. Further
details of this non-linear transformation are presented in [115]. Figures
2.4 and 2.5 give respectively an example of a typical ROC and a typical
DET curve.
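The normal deviate transformation behind the DET axes is the inverse of the
standard Normal cumulative distribution function. A minimal sketch, using
only the Python standard library and hypothetical operating points, could
look as follows.

```python
from statistics import NormalDist


def normal_deviate(probability):
    """Map an error probability in (0, 1) to the normal deviate (probit) scale."""
    return NormalDist().inv_cdf(probability)


# Hypothetical ROC operating points as (FAR, FRR) pairs, expressed as fractions.
roc_points = [(0.01, 0.20), (0.05, 0.10), (0.10, 0.05), (0.20, 0.01)]

# The same points on DET axes: both coordinates are warped by the same function.
det_points = [(normal_deviate(far), normal_deviate(frr)) for far, frr in roc_points]
```

The transformation stretches the region of small error probabilities,
which is where high-performance systems operate.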
Figure 2.4: Typical example of a ROC curve. (Axes: false acceptance (%)
and false rejection (%), each from 0 to 50; plot labels: NIST EPFL MALES,
GMM EPFL, EER.)
Each point on a ROC or a DET characteristic corresponds to a particular
decision threshold. The Equal Error Rate (EER: i.e. when FAR = FRR) is
often used as the only performance measure of an identity verification
method, although this measure gives just one point of the ROC, and
comparing different systems solely on the basis of this single number can
be very misleading [129].
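One simple way of locating the EER operating point on a finite set of
scores is to scan the candidate thresholds and keep the one where FAR and
FRR are closest. The sketch below assumes that a claim is accepted when
its score is >= threshold; the scores are hypothetical.

```python
def error_rates_at(client_scores, impostor_scores, threshold):
    """(FRR, FAR) when claims with score >= threshold are accepted."""
    frr = sum(s < threshold for s in client_scores) / len(client_scores)
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    return frr, far


def approximate_eer(client_scores, impostor_scores):
    """Scan candidate thresholds and keep the one where |FAR - FRR| is smallest.

    Returns (threshold, frr, far). With finite data FAR and FRR rarely
    coincide exactly, so this is only an approximation of the EER point.
    """
    candidates = sorted(set(client_scores) | set(impostor_scores))
    best = None
    for t in candidates:
        frr, far = error_rates_at(client_scores, impostor_scores, t)
        if best is None or abs(far - frr) < abs(best[2] - best[1]):
            best = (t, frr, far)
    return best


clients = [0.9, 0.8, 0.6, 0.4]     # hypothetical client scores
impostors = [0.1, 0.2, 0.3, 0.5]   # hypothetical impostor scores
threshold, frr, far = approximate_eer(clients, impostors)
```

Each threshold visited in the loop is one point of the ROC; the function
returns only one of them, which illustrates how much information a single
EER value discards.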
High security access applications are concerned about break-ins and hence
operate at a point on the ROC with small FAR. Forensic applications
desire to catch a criminal even at the expense of examining a large
number of false accepts and hence operate at small FRR/high FAR. Civilian
applications attempt to operate at operating points with both low FRR and
low FAR. These concepts are shown in Figure 2.6, which was found in [88].
Unfortunately in practice, as will be shown further in the study of the
fusion modules presented in this thesis, it is not always possible to
explicitly identify a continuous decision threshold in a certain fusion
module, which means that in that case it will a fortiori not be possible
to vary the decision [...]
Figure 2.5: Typical example of a DET curve.
Figure 2.6: Typical examples of different operating points for different
[...] (Axes: False Acceptance Rate, False Rejection Rate; marked: Equal
Error Rate, High Security Access Applications, Forensic Applications.)
[...] also the only correct way of determining the performance of an
operational system, since in such systems the decision threshold has been
fixed.
All verification results in this thesis will be given in terms of FRR,
FAR, and TER. For each error the 95% confidence interval will be given
between square brackets. The concept of confidence intervals refers to
the inherent uncertainty in test results owing to small sample size.
These intervals are a posteriori estimates of the uncertainty in the
results on the test population. They do not include the uncertainties
caused by errors (mislabeled data, for example) in the test process. The
confidence intervals do not represent a priori estimates of performance
in different applications or with different populations [182].
These confidence intervals will be calculated assuming that the
probability distribution for the number of errors is binomial. But since
the binomial law cannot easily be handled analytically, the confidence
intervals cannot be calculated directly in an analytical way. Therefore
we have used the Normal law as an approximation of the binomial law. This
large sample approach is already statistically justified starting from 30
samples. Using this approximation, the 95% confidence interval of an
error E based on N tests is defined by the following lower (given by the
minus sign) and upper (given by the plus sign) bounds:
\[ E \pm 1.96 \sqrt{\frac{E(1-E)}{N}} . \]
More detailed information about the calculation of confidence intervals
can be found in [41, 44, 155].
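The Normal approximation of the confidence interval can be sketched in a
few lines. The function name and the example values (E = 5%, N = 1000) are
hypothetical, and clipping the bounds to [0, 1] is an added assumption
that is not part of the formula above.

```python
import math


def normal_confidence_interval(error_rate, n_tests, z=1.96):
    """Interval E +/- z * sqrt(E(1-E)/N); z = 1.96 gives the 95% level.

    The bounds are clipped to [0, 1] (an assumption of this sketch).
    """
    half_width = z * math.sqrt(error_rate * (1.0 - error_rate) / n_tests)
    return max(0.0, error_rate - half_width), min(1.0, error_rate + half_width)


low, high = normal_confidence_interval(0.05, 1000)   # hypothetical E and N
```

Because the half-width shrinks with the square root of N, an error rate
measured on a few dozen accesses carries a much wider interval than the
same rate measured on more than a thousand.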
2.7 State of the art
2.7.1 General overview
Some work on multi-modal biometric identity verification systems has
already been reported in the literature. Hereafter, an overview is given
of the most important contributions, with a brief description of the work
performed.
1. As early as 1993, Chibelushi et al. proposed in [40] to integrate
acoustic and visual speech (motion of visible articulators) for speaker
recognition. The combination scheme used is a simple linear one.
2. In 1995, Brunelli and Falavigna proposed in [33] a person
identification system based on acoustic and visual features. The voice
modality is based on a text-independent vector quantization and it uses
two types of information: static and dynamic acoustic features. The face
modality implements a template matching technique on three distinct areas
of the face (eyes, nose, and mouth). They use a database containing up to
three sessions of 87 persons. One session was used for training, the
others for testing, which led to a total of 155 tests. The
best-performing fusion module is a neural network. The best results
obtained on this particular database are: FAR = 0.5% and FRR = 1.5%.
3. In 1997, Dieckmann et al. proposed in [50] a decision level fusion
scheme, based on a 2-out-of-3 majority voting. This approach integrates
two biometric modalities (face and voice), which are analyzed by three
different experts: (static) face, (dynamic) lip motion, and (dynamic)
voice. The authors tested their approach on a specific database of 15
persons, where the best verification results obtained were FAR = 0.3% and
FRR = 0.2%.
4. In 1997, Duc et al. proposed in [55] a simple averaging technique and
compared it with the Bayesian integration scheme presented by Bigun et
al. in [13]. In this multi-modal system the authors use a frontal face
identification expert based on Elastic Graph Matching, and a
text-dependent speech expert based on person-dependent Hidden Markov
Models (HMMs) for isolated digits. All experiments are performed on the
M2VTS database, and the best results are obtained for the Bayesian fusion
module: FAR = 0.54% and FRR = 0.00%.
5. In 1997, Jourlin et al. proposed in [93] an acoustic-labial speaker
verification method. Their approach uses two classifiers. One is based on
a lip tracker using visual features, and the other one is based on a
text-dependent person-dependent HMM modeling of isolated digits using
acoustic features. The fused score is computed as the weighted sum of the
scores generated by the two experts. All experiments are performed on the
M2VTS database, and the best results obtained for the weighted fusion
module are: FAR = 0.5% and FRR = 2.8%.
6. In 1998, Kittler et al. proposed in [98] a multi-modal person [...]
profile expert uses a chamfer matching algorithm, and the voice expert is
based on text-dependent person-dependent HMM models for isolated digits.
All these experts give their soft decisions (scores between zero and one)
to the fusion module. All experiments are performed on the M2VTS
database, and the best combination results are obtained for a simple sum
rule: EER = 0.7%.
7. In 1998, Hong and Jain proposed in [82] a multi-modal personal
identification system which integrates two different biometrics (face and
fingerprints) that complement each other. The face verification is done
using the eigenfaces approach, and the fingerprint expert is based on a
so-called elastic matching algorithm. The fusion algorithm operates at
the expert decision level, where it combines the scores from the
different experts (under the statistical independence hypothesis) by
simply multiplying them. The {accept, reject} decision is then taken by
comparing the fused score to a threshold. The databases used in this work
are the Michigan State University fingerprint database containing 1500
images from 150 persons, and a face database coming from the Olivetti
Research Lab, the University of Bern, and the MIT Media Lab, which
contains 1132 images from 86 persons. The results obtained for the fusion
approach on this database are: FAR = 1.0% and FRR = 1.8%.
8. In 1998, Ben-Yacoub proposed in [7] a multi-modal data fusion approach
for person authentication, based on Support Vector Machines (SVM). In his
multi-modal system he uses the same experts and the same database as Duc
et al. in the work presented above. The best results which he obtained
for the SVM fusion module are FAR = 0.07% and FRR = 0.00%.
9. In 1999, Pigeon proposed in [135] a multi-modal person authentication
approach based on simple fusion algorithms. In this multi-modal system
the author uses a face identification expert based on template matching,
a profile identification expert based on a chamfer matching algorithm,
and a text-dependent speech expert based on person-dependent HMM models
for isolated digits. All experiments are performed on the M2VTS database,
and the best results are obtained for a fusion module based on a linear
discriminant function: [...]
[...] recognition system using unconstrained audio and video. The system
does not need fully frontal face images or clean speech as input. The
face expert is based on the eigenfaces approach, and the audio expert
uses a text-independent HMM using Gaussian Mixture Models (GMMs). The
combination of these two experts is performed using a Bayes net. The
system was tested on a specific database containing 26 persons and the
best results obtained using the best images and audio clips from an
entire session are: FAR = 0.00% and FRR = 0.00%.
2.7.2 Results obtained on the M2VTS database
To facilitate the comparison with the work presented in this thesis, we
have isolated from the previous state of the art the results which have
been obtained on the same M2VTS database as the one we have been working
on. These results are presented in Table 2.1 hereafter. Where available,
the confidence interval is indicated between square brackets. Care should
be taken however when comparing these results, since the experts used are
not necessarily the same for all methods. The last line in this Table
represents the best results obtained in this thesis, using a logistic
regression model.

Table 2.1: State of the art of the verification results obtained on the
M2VTS database.

Author(s)        Experts                   FRR (%)       FAR (%)
Duc et al.       frontal, vocal            0.00          0.54
Jourlin et al.   lips, vocal               2.80          0.50
Kittler et al.   frontal, profile, vocal   0.70 (EER)    0.70 (EER)
Ben-Yacoub       frontal, vocal            0.00          0.07
Pigeon           frontal, profile, vocal   0.78          0.07
Verlinde         frontal, profile, vocal   0.00          0.00
2.8 Comments
[...] of all applications. We have seen that voice is one of the most
popular biometrics, thanks to its high acceptability and its
user-friendliness [88]. Since voice is a behavioral biometric modality
and since in a multi-modal approach it is wise to complement a behavioral
modality with a physiological one, we wanted to add a physiological
modality which was also highly acceptable. These considerations have led
us to choose the visual modality. In the framework of the M2VTS
application, another important criterion for choosing the different
biometrics was the availability of the hardware. With respect to the
tele-services, the idea was to use so-called multi-media PC's, which are
equipped with low-cost microphones and CCD-cameras. These considerations
reinforce each other and they explain why in the multi-modal system
presented in this work, voice and vision were used as the two
(complementary) biometric modalities. Analyzing the state of the art in
automatic biometric multi-modal identity verification systems, it has
been shown that on the M2VTS database, the best method presented in this
[...]
Experimental setup
3.1 Introduction
This chapter starts by presenting the M2VTS database used in this work.
After this, the experimental protocol used for testing the individual
experts and the fusion modules is described. Finally, the three different
biometric experts (a frontal, a profile and a vocal one) we have been
using throughout this work are briefly introduced and their individual
performances are highlighted. This is followed by a thorough statistical
analysis of the results given by these three different experts for both
client and impostor accesses. In this analysis it is shown that the
distributions of the scores per expert and per type of access (the
so-called conditional probability density functions) do not satisfy the
Normality hypothesis. Furthermore it is shown that the chosen experts do
have good discriminatory power, and are complementary. The potential gain
obtained by combining the results of these three different experts is
shown by means of a simple linear classifier.
3.2 The M2VTS audio-visual person database
The M2VTS [1] multi-modal database comprises 37 different persons and
provides 5 shots for each person. These shots were taken at intervals of
at least one week. During each shot, people were asked (1) to count from
"0" to "9" in French (which was the native language for most of the
people) and (2) to rotate their head from 0 to -90 degrees, back to 0 and
further to +90 degrees, and finally back again to 0 degrees. The most
difficult shot to recognize is the 5th shot. This shot mainly differs
from the others [...] presence of a hat/scarf, ...), voice variations or
shot imperfections (poor focus, different zoom factor, poor voice signal
to noise ratio, ...). More details with respect to this database can be
found in [136, 137, 135].
Taking into account the specificity of our problem (i.e. combining
outputs of several experts) we are not going to use this 5th shot, since
we are not interested in developing individual powerful experts that work
well even under the extreme conditions presented by shot number 5.
To show the quality of the pictures contained in the small M2VTS
database, Figures 3.1, 3.2, and 3.3 show respectively the frontal views
of some persons, the rotation sequence and the 5 different shots for one
and the same person [135].
3.3 Experimental protocol
3.3.1 General issues
In the most general (but rich) case, three different data sets are needed
for training, fine-tuning and testing the individual experts. The first
data set is called the training set and is used by each expert to model
the different persons. The second data set is called the development or
validation set and is used to fine-tune the different experts, for
instance by calculating the decision thresholds. The third data set is
called the test set and it is used to test the performances of the
obtained experts. For the fusion module, we can define in the most
general case exactly the same data sets as in the case of the individual
experts. This general concept of the use of the different data sets is
illustrated in Figure 3.4. This does not necessarily mean that one will
always need six completely separate data sets: since the test set for the
individual experts is completely dissociated from the development of the
experts, it is suitable for reuse by the fusion module. Furthermore, not
all types of experts, nor all fusion modules, include the modeling of the
persons. This means that in the particular case of experts and fusion
modules which do not use data to model persons, and in the obvious case
in which we do reuse the expert test set as a data set for the fusion
module, one only needs three different data sets instead of six in the
most general case. This is illustrated in Figure 3.5. In the intermediate
case, where the experts do need separate training and development data,
but the fusion module does not need any development data, one needs four
different data sets, as illustrated in Figure 3.6.
Figure 3.2: M2VTS database: views taken from a rotation sequence.
Figure 3.3: M2VTS database: frontal views of one person coming from the
[...]
Figure 3.4: The most general case where 6 different data sets are used.
(Labels: Expert and Fusion module, each with Training, Development and
Testing; Dataset 1 to Dataset 6.)
Figure 3.5: [...] (Labels: Expert with Training and Testing; Fusion
module with Training and Testing; Dataset 1, Dataset 2, Dataset 2,
Dataset 3.)
Figure 3.6: The intermediate case where four different data sets are
needed. (Labels: Expert with Training, Development and Testing; Fusion
module with Training and Testing; Dataset 1, Dataset 2, Dataset 3,
Dataset 3, Dataset 4.)
- if the test data is the same as the training data, performances will be
  overestimated. This is true for both the individual experts and the
  fusion module. This is of course due to the fact that the experts and
  the fusion module will generate the best results for the same data they
  have been trained on.
- if the training data for the experts is the same as for the fusion
  module, the fusion module will underperform. The reason for this is
  that the fusion module does not get enough information. Indeed, in the
  extreme case of experts that perform perfectly on their training data,
  the outcome of such an expert would be either 0 or 1, which leaves the
  fusion module with the arbitrary choice of setting the threshold
  somewhere in between.
3.3.2 Experimental protocol
For our experiments, we have opted for a very simple experimental
protocol. In this protocol we use only the first four sessions of the
M2VTS database [...]
1. [...] experts. This means that each access has been used to model the
respective client, yielding 37 different client models.
2. Then the accesses from each person in the second enrollment session
have been used to generate validation data in two different manners: once
to derive one single client access by matching the shot of a specific
person with its own reference model, and once to generate 36 impostor
accesses by matching it to the 36 models of the other persons of the
database. This simple strategy thus leads to 37 client and 36 × 37 = 1332
impostor accesses, which have been used for validating the performance of
the individual experts and for calculating thresholds.
3. The third enrollment session has been used to test these experts,
using the thresholds calculated on the validation data set. This same
data set has also been used to train the fusion modules, which again
leads to 37 client and 1332 impostor reference points.
4. Finally, the fourth enrollment session has been used to test the
fusion modules, yielding once more the same number of client and impostor
claims.
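The way one session yields 37 client and 1332 impostor accesses can be
checked with a short sketch. The function name and the use of integer
identifiers for the persons are assumptions made for the illustration.

```python
def build_claims(person_ids):
    """Pair the shot of every person with every client model.

    Returns (client_claims, impostor_claims) as lists of
    (shot_owner, claimed_identity) tuples.
    """
    client_claims = [(p, p) for p in person_ids]
    impostor_claims = [(p, q) for p in person_ids for q in person_ids if p != q]
    return client_claims, impostor_claims


persons = range(37)   # the 37 persons of the database, as integer identifiers
clients, impostors = build_claims(persons)
```

Every person therefore appears once as a client and 36 times as an
impostor within the same session.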
The drawback of this simple protocol is that the impostors are known at
the expert and supervisor training time. In section 8.3.2, validation
results will be presented using a protocol that does not suffer from the
same drawback. This validation protocol is implemented using a so-called
leave-one-out method [49].
3.4 Identity verification experts
3.4.1 Short presentation
All the experiments in this thesis have been performed using three
different identity verification experts. Each one of these experts will
be described briefly hereafter.
Profile image expert
The profile image verification expert is described in detail in [138] and
its description hereafter has been inspired by the presentation of this
[...] corresponding to the claimed identity. The candidate image profile
is extracted from the profile images by means of color-based
segmentation. The similarity of the two profiles is measured using the
Chamfer distance computed sequentially [28]. The efficiency of the
verification process is aided by pre-computing a distance map for each
reference profile. The map stores the distance of each pixel in the
profile image to the nearest point on the reference profile. As the
candidate profile can be subject to translation, rotation and scaling,
the objective of the matching stage is to compensate for such geometric
transformations. The parameters of the compensating transformation are
determined by minimizing the chamfer distance between the template and
the transformed candidate profile. The optimization is carried out using
a simplex algorithm which requires only the distance function evaluation
and no derivatives. The convergence of the simplex algorithm to a local
minimum is prevented by a careful initialization of the transformation
parameters. The translation parameters are estimated by comparing the
position of the nose tip in the two matched profiles. The scale factor is
derived from the comparison of the profile heights and the rotation is
initially set to zero. Once the optimal set of transformation parameters
is determined, the user is accepted or rejected depending on the
relationship of the minimal chamfer distance to a pre-specified
threshold. The system can be trained very easily. It is sufficient to
store one profile per client in the training set.
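The role of the pre-computed distance map can be illustrated with a toy
sketch. It is not the expert of [138]: the brute-force Euclidean map, the
use of the mean as chamfer score, the grid size and the two profiles are
all assumptions, and the search over translation, rotation and scaling is
left out.

```python
import math


def distance_map(reference_points, height, width):
    """For every pixel, the distance to the nearest reference profile point.

    Brute force, for illustration only; it is computed once per reference.
    """
    return [
        [min(math.hypot(r - rr, c - rc) for rr, rc in reference_points)
         for c in range(width)]
        for r in range(height)
    ]


def chamfer_distance(dmap, candidate_points):
    """Mean distance-map value sampled at the candidate profile points."""
    return sum(dmap[r][c] for r, c in candidate_points) / len(candidate_points)


# Hypothetical profiles on an 8x8 grid, given as (row, column) coordinates.
reference = [(r, 3) for r in range(8)]          # a vertical line at column 3
candidate_shifted = [(r, 4) for r in range(8)]  # the same line, one pixel to the right

dmap = distance_map(reference, 8, 8)
d_same = chamfer_distance(dmap, reference)            # identical profiles
d_shifted = chamfer_distance(dmap, candidate_shifted)
```

Once the map exists, scoring a transformed candidate only costs one table
lookup per profile point, which is what makes the repeated evaluations of
the simplex search affordable.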
Frontal image expert
The frontal image verification expert is described in detail in [116] and
the description hereafter was based on the presentation of this expert in
[98]. This frontal image expert is based on robust correlation of a
frontal face image of the person under test and the stored face template
corresponding to the claimed identity. A search for the optimum
correlation is performed in the space of all valid geometric and
photometric transformations of the input image to obtain the best
possible match with respect to the template. The geometric transformation
includes translation, rotation and scaling, whereas the photometric
transformation corrects for a change of the mean level of illumination.
The search technique for the optimal transformation parameters is based
on random exponential distributions. Accordingly, at each stage the
transformation between the test and reference images is perturbed by a
random vector drawn from an exponential distribution [...] transformed
face image and the template, and the similarity of the intensity
distributions of the two images. The degree of similarity is measured
with a robust kernel. This ensures that gross errors due to, for
instance, hair style changes do not swamp the cumulative error between
the matched images. In other words, the matching is benevolent, aiming to
find as large areas of the face as possible, supporting a close agreement
between the respective gray-level histograms of the two images. The gross
errors will be reflected in a reduced overlap between the two images,
which is taken into account in the overall matching criterion. The system
is trained very easily by means of storing one template for each client.
Each reference image is segmented to create a face mask which excludes
the background and the torso as these are likely to change over time.
Vocal expert
The vocal identity verification expert is presented in detail in [22].
This text-independent speaker verification expert is based on a
similarity measure between speakers, calculated on second order
statistics [21].
In this algorithm a first covariance matrix X is generated from a
reference sequence, consisting of M m-dimensional acoustical vectors, and
pronounced by the person whose identity is claimed:
\[ X = \frac{1}{M} \sum_{i=1}^{M} X_i X_i^T , \]
where $X_i^T$ is $X_i$ transposed.
A second covariance matrix Y is then generated in the same way from a
sequence, consisting of M m-dimensional acoustical vectors, and
pronounced by the person under test.
Then a similarity measure between these two speakers is computed, based
on the sphericity measure $\mu_{AH}(X,Y)$:
\[ \mu_{AH}(X,Y) = \log \frac{A}{H} , \]
\[ A(\lambda_1, \lambda_2, \ldots, \lambda_m) = \frac{1}{m} \sum_{i=1}^{m} \lambda_i = m^{-1} \, \mathrm{tr}\left( Y X^{-1} \right) , \]
\[ H(\lambda_1, \lambda_2, \ldots, \lambda_m) = m \left( \sum_{i=1}^{m} \frac{1}{\lambda_i} \right)^{-1} = m \left( \mathrm{tr}\left( X Y^{-1} \right) \right)^{-1} . \]
It can be shown that this sphericity measure is always non-negative and
that it is equal to zero only when the two covariance matrices X and Y
are the same up to a scale factor. The verification process then consists
of comparing the obtained sphericity measure with a decision threshold,
calculated on a validation database.
One of the great advantages of this algorithm is that no explicit
extraction of the m eigenvalues $\lambda_i$ is necessary, since the
sphericity measure only needs the calculation of the trace tr(·) of the
matrix products $Y X^{-1}$ and $X Y^{-1}$.
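The trace-only computation of the sphericity measure can be sketched with
NumPy. The function names and the synthetic Gaussian vectors standing in
for acoustical vectors are assumptions; the sketch reads A and H as the
arithmetic and harmonic means of the eigenvalues of the product of Y with
the inverse of X, and is not the implementation of [22].

```python
import numpy as np


def second_order_matrix(vectors):
    """(1/M) * sum_i x_i x_i^T for an (M, m) array of vectors."""
    v = np.asarray(vectors, dtype=float)
    return v.T @ v / v.shape[0]


def sphericity(X, Y):
    """log(A / H) computed from two traces, without any eigenvalue extraction."""
    m = X.shape[0]
    a = np.trace(np.linalg.solve(X, Y)) / m   # tr(X^-1 Y) equals tr(Y X^-1)
    h = m / np.trace(np.linalg.solve(Y, X))   # tr(Y^-1 X) equals tr(X Y^-1)
    return float(np.log(a / h))


rng = np.random.default_rng(0)                # synthetic data, not real speech
reference = rng.normal(size=(200, 4))         # M = 200 vectors of dimension m = 4
other = rng.normal(size=(200, 4)) * np.array([1.0, 2.0, 0.5, 3.0])

X = second_order_matrix(reference)
same = sphericity(X, X)                             # identical statistics
different = sphericity(X, second_order_matrix(other))
```

Using `np.linalg.solve` avoids forming the inverse matrices explicitly;
the traces are unchanged because the trace is invariant under cyclic
permutation of a matrix product.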
3.4.2 Performances
The performances achieved by the three mono-modal identity verification
systems which have been used in these experiments are given in Table 3.1.
The results have been obtained by adjusting the threshold at the EER on
the validation set and applying this threshold as an a priori threshold
on the test set. Observing the results for the profile and the frontal
experts it can be seen that, although the optimization has been done
according to the EER criterion, the FRR and the FAR are very different.
This indicates that for these two experts, the training and validation
sets are not very representative of the test set.

Table 3.1: Verification results for individual experts.

Expert    FRR (%)             FAR (%)            TER (%)
          (37 tests)          (1332 tests)       (1369 tests)
Profile   21.6 [11.4, 37.2]   8.5 [7.1, 10.1]    8.9 [7.5, 10.5]
Frontal   21.6 [11.4, 37.2]   8.3 [6.9, 9.9]     8.7 [7.3, 10.3]
Vocal      5.4 [1.5, 17.7]    3.6 [2.7, 4.7]     3.7 [2.8, 4.8]
3.4.3 Statistical analysis of the different experts
Introduction
A statistical analysis of the individual experts is important to get an
idea, on the one hand, of their individual discriminatory power and, on
the other hand, of their complementarity.
The power of an expert to discriminate between clients and impostors will
increase (for given variances) with the difference between the mean value
of the scores obtained for client accesses and the mean value of the
scores obtained for impostor accesses. The typical statistical test to
see if there exist significant differences between the means (or more
generally between the statistical moments of first order) of several
populations is the so-called analysis of variance (ANOVA). In the general
case, this analysis is implemented using an F-test. In the specific case
of two populations, this ANOVA could also be performed using an
independent samples t-test [123]. Another important characteristic of an
expert is its variance (or more generally the statistical moment of
second order). The equality of variances can be tested by a Levene test,
which is also implemented using an F-test [114]. It is advantageous that
the variance of an expert is the same for clients and for impostors,
because this leads to simpler methods to combine the different experts
(see chapter 6). Obviously we will need to perform t- and F-tests to
analyze the means and the variances of the different experts. However,
the t- and F-tests give exact results only if the populations have a
Normal distribution. So before we can use t- or F-tests, we need to
verify the Normality of the different populations. Thus this is the first
statistical analysis that we need to perform. Since the ANOVA is only
valid if the variances of the different populations per expert are equal,
we have to check the equality of variances before performing the ANOVA.
These remarks explain the forced order of the first three analyses that
are presented below.
We can get an idea of the independence of the different experts (and thus
of the amount of extra information that each expert brings in) by
analyzing their correlation. And a linear discriminant analysis gives us
a first idea of the combined discriminatory power of the experts.
Last but not least, the analysis of the extreme values gives us insight
into the possible use of personalized approaches.
Analysis of Normality
The purpose of a Normality analysis is to check whether the observed data
do or do not support the hypothesis H_0 that the underlying probability
density function is Normal. There exist two types of tests to perform
this analysis: objective (numerical) and subjective (graphical) tests. An
important remark related to the verification of H_0 is that the
assumption of Normality is much more difficult to verify when using small
sample sizes. In