UNIVERSITÉ MOHAMMED V – AGDAL
FACULTÉ DES SCIENCES
Rabat
Faculté des Sciences, 4 Avenue Ibn Battouta B.P. 1014 RP, Rabat – Maroc Tel +212 (0) 5 37 77 18 34/35/38, Fax : +212 (0) 5 37 77 42 61, http://www.fsr.ac.ma
Order No. 2456
DOCTORAL THESIS
Presented by:
Mr. Mohamed EL AROUSSI
Discipline: Engineering Sciences
Specialty: Computer Science and Telecommunications
Information Fusion towards a Robust
Face Recognition System
Defended on 11 July 2009
Before the jury
Chair:
Prof. Driss ABOUTAJDINE, PES (Faculté des Sciences de Rabat)
Examiners:
Prof. M. BAKRIM, PES (F.S.T. de Marrakech)
Prof. A. HAMMOUCH, PES (ENSET, Rabat)
Prof. R. MESSOUSSI, PES (FS, Kénitra)
Prof. L. MASMOUDI, PES (FS, Rabat)
Prof. A. TAMTAOUI, PES (FS, INPT)
Prof. M. RZIZA, PA (FS, Rabat)
RÉSUMÉ
The security of people, property and information is one of the major concerns of today's societies. Face recognition is one of the most commonly used solutions for the automatic identification of people. However, an automatic face recognition process must take into account several factors that add to the complexity of the task, namely occlusion and changes in illumination, pose, expression and structural components (hair, beard, glasses, etc.). Several techniques have been proposed in the past to solve the face recognition problem. Each has its strengths and weaknesses which, in most cases, depend on the conditions under which the information is acquired. It is possible to combine classifiers based on varied techniques in order to unite the strengths of each and thereby compensate for their weaknesses. It is from this perspective that we propose a face recognition method based on information fusion. Indeed, the Human Visual System (HVS) is known to process information in a multi-resolution fashion. Consequently, a first contribution of this research focuses on the adoption of a new multi-resolution approach based on the Steerable Pyramid (SP) transform. A second contribution of this thesis is to find and combine different features that are relevant, discriminative and resistant to all types of variation, in order to improve recognition performance. We thus propose to fuse two feature extraction methods: global DCT (Discrete Cosine Transform) and local LBP (Local Binary Pattern). Another contribution, carried out within an R&D project in collaboration with Maroc Telecom, is the complete integration of a person identification system designed to operate in an access control context.
ABSTRACT
A hot research topic since the eighties, face recognition still seems to be a difficult and largely unsolved problem. Distortions caused by variations in illumination, expression and pose are the main challenges facing researchers in this field. Efficient recognition algorithms that are robust against such distortions are the main motivation of this research.
Given its background and wide range of applications, the Gabor wavelet, a powerful and biologically motivated mathematical tool, has been widely adopted in the literature as a feature extraction method. Gabor features contain important local frequency information and have been proven to be robust against commonly encountered distortions. However, the use of Gabor filters dramatically increases the computational cost of a face recognition method. As a solution, new transforms (Contourlet, Curvelet and Steerable Pyramid) have emerged, offering improved directional elements and other promising abilities compared to the traditional wavelet transform. In this thesis we introduce the Steerable Pyramid as a new feature extraction tool for face representation and recognition. Each image is described by a subset of band-filtered images (channels) containing steerable pyramid coefficients that characterize the face textures.
To investigate the usefulness and effectiveness of this technique against other multi-resolution techniques such as the Discrete Wavelet Transform (DWT), Gabor wavelets, Contourlet and Curvelet, we devise two different frameworks: a global-features based scheme and a local-features based scheme. These methods have been evaluated on well-known databases such as the ORL, Yale and FERET face databases. The local-features based Steerable Pyramid scheme has been shown to have the potential to supersede other multi-resolution approaches. Although local observations drawn from multiple channels can appear somewhat redundant, and individual channels may carry limited information, a judicious combination of the best-performing channels can prove superior for classification. For face recognition based on multiple-channel fusion, we have empirically chosen the best SP sub-bands for the fusion stage.
We also propose a face recognition method based on multiple-feature fusion that combines two different representations of the face image. The underlying algorithm uses the uniform LBP for local representation and the holistic DCT for global information of the face image. We examine the impact of information fusion at both the feature level and the score level.
Finally, an automatic user identification system, consisting of detection, recognition and user management modules, has been developed. The system can effectively detect faces in real video streams, identify them and retrieve the corresponding user information from the application database. Different detection and recognition algorithms can also be easily integrated into the framework.
Foreword
The work of this thesis was carried out at the Laboratoire de Recherche en Informatique et Télécommunications (LRIT) of the Faculté des Sciences de Rabat, Université Mohammed V Agdal, Morocco.
First and foremost, I wish to express my deep gratitude to Mr. Driss ABOUTAJDINE, professor at the Faculté des Sciences de Rabat and head of LRIT, who supervised this thesis with constant interest and great competence and who, moreover, did me the honor of chairing the jury of this thesis. I warmly thank him for his availability, his support, his valuable advice, and the encouragement that allowed me to complete this work.
I express my deep gratitude to Mr. Mohammed RZIZA, assistant professor at the Faculté des Sciences de Rabat, for his valuable help, the efforts he devoted to the completion of this work, and the quality of the supervision he provided during my 5 years of doctoral study, as well as for doing me the honor of serving on the defense jury.
I am very honored by the presence of Mr. Mohamed BAKRIM, professor at the Faculté des Sciences et Techniques, Marrakech, and Mr. Mohammed HAMMOUCH, professor at the École Normale Supérieure de l'Enseignement Technique, who agreed to act as reviewers of this thesis. I offer them my warmest thanks for the speed with which they read my manuscript and the interest they took in my work. Their many remarks and suggestions also helped improve the quality of this dissertation, and I am very grateful to them.
My sincere thanks also go to Mr. Lahssen MASMOUDI, professor at the Faculté des Sciences de Rabat, Mr. Ahmed TAMTAOUI, professor at the Institut National des Postes et Télécommunications in Rabat, Mr. Rochdi MESSOUSSI, professor at the Faculté des Sciences de Kénitra, and Mr. Mohammed El HASSOUNI, assistant professor at the Faculté des Lettres de Rabat, for their participation in the jury of this thesis.
I also thank the Centre National pour la Recherche Scientifique et Technique for awarding me the excellence scholarship.
My thanks also go to all my colleagues at LRIT. In particular, many thanks to Ms. Sanaa GHOUZALI, Mr. Zytoune OUADOUDI, Youssef FAKHRI, Aouatif AMINE and Mrs. Sanaa El FKIHI for their help and advice.
I wish to thank my brothers Kacem and Mustapha for their help during the writing of this thesis.
Finally, I warmly thank my parents, my family and my very dear friends for their support as well as for their congratulations on my successes.
CONTENTS
1 INTRODUCTION
1.1 Application contexts
1.2 Face Identification and Verification
1.3 Motivation and Solutions
1.4 Thesis Outline
1.5 List of Publications
2 FACE RECOGNITION: A STATE OF THE ART
2.1 Introduction
2.2 Face Recognition Methods
2.2.1 Global Methods
2.2.2 Local Methods
2.3 Multi-resolution approaches
2.3.1 Multi-resolution feature extraction schemes
2.3.2 Gabor filters and wavelets
2.3.3 Curvelet transform
2.3.4 Contourlet transform
2.3.5 Comparative works
2.4 Performance Evaluation
2.4.1 Identification System
2.4.2 Verification System
2.4.3 Distance Measure and Classification
2.4.4 Databases
2.5 Summary
3 FACE RECOGNITION BASED MULTI-RESOLUTION ANALYSIS
3.1 Introduction
3.2 Steerable Pyramid
3.2.1 Mathematical Background
3.2.2 Face Feature extraction
3.3 Global features based Multi-resolution approaches
3.3.1 Proposed Global Scheme
3.3.2 Experimental Results
3.4 Local features based Multi-resolution approaches
3.4.1 Proposed Local Scheme
3.4.2 Experimental Results
3.5 Chapter summary
4 FACE RECOGNITION BASED INFORMATION FUSION
4.1 Introduction
4.2 Face Recognizers Combination
4.2.1 Pre-Classification Fusion
4.2.2 Post-Classification Fusion
4.2.3 Information Normalization
4.2.4 Related Work
4.3 Face Recognition based Multiple Features Fusion
4.3.1 Local and Global Features Extraction
4.3.2 Combined Face Recognition system
4.3.3 Experiments
4.4 Face Recognition based Multiple Channels Fusion
4.4.1 S-P sub-bands Selection
4.4.2 Best performing sub-bands fusion
4.5 Conclusion
5 THE DEVELOPED USER IDENTIFICATION SYSTEM
5.1 Introduction
5.2 System Architecture
5.2.1 Registration
5.2.2 Identification
5.3 System Modules
5.3.1 Face detection
5.3.2 Recognition
5.3.3 User Management
5.4 R&D Demo Settings & Description
5.5 Conclusions
6 CONCLUSIONS AND FUTURE WORKS
6.1 Summary of Works
6.1.1 An Overview of M-R transform: Background and Applications
6.1.2 Global and Local features based Multi-resolution approaches
6.1.3 Information Fusion
6.1.4 User Identification System
6.2 Future Works
6.2.1 SP Feature Selection with Larger Search Space
6.2.2 Pose Invariant Face Recognition
6.2.3 Face Recognition in Wireless Communication System
A Appendix A: Basic Theory of Support Vector Machine
B Appendix B: Symmetry Transform
B.1 The Generalized Symmetry Transform
B.2 The Radial Symmetry Measure
C Appendix C: Local Binary Pattern LBP
C.1 Local Binary Patterns
D Appendix D: Discrete Cosine Transform DCT
D.1 DCT Theoretical Background
D.2 DCT properties
LIST OF FIGURES
2.1 Principle of an identification process with face recognition.
2.2 Different bases of linear projections: LDA, PCA + LDA and PCA bases are shown on the first, second and third row respectively [ZKC+98].
2.3 The diagram for a RBF based face recognition system [EWLT02].
2.4 Binary SVM tree [GLC01].
2.5 Geometric features used for face recognition [BP93].
2.6 Face images represented by graphs [LVB+93].
2.7 2D embedded HMM structure [NH99].
2.8 Example 2D Gabor wavelets in the spatial and the frequency domain (a) f = 0.4, θ = 0, γ = 4, ν = 2, (b) f = 0.2, θ = π/4, γ = 2, ν = 2.
2.9 Edge representation by Wavelet and Curvelet Transform [CDDY06].
2.10 Curvelets in Fourier frequency (left) and spatial domain (right) [CDDY06].
2.11 Example of Curvelet coefficients.
2.12 Contourlet tiling in the frequency domain.
2.13 Example of Contourlet coefficients.
2.14 Faces from the ORL Face Database.
2.15 Faces from the YALE Face Database.
2.16 Faces from the Essex Grimace database.
3.1 Block diagram of the steerable pyramid transform for k = 2. The decomposition/reconstruction filter bank is iterated at node 4. The down- and up-arrows indicate binary down- and up-sampling, respectively.
3.2 The transfer functions of the pre- and post-processing filters used in Figure 3.1.
3.3 The transfer functions of the iterated filters used in Figure 3.1.
3.4 Tree representation of one-level 2D steerable pyramid transform.
3.5 Three-stage & four orientation steerable pyramid transform.
3.6 Recognition process block diagram.
3.7 Diagram of the Block-based M-R features extraction process.
3.8 Faces from ORL, YALE and FERET Databases.
4.1 Example of DCT feature extraction (a) original image (b) DCT matrix (c) image reconstructed using only 4% of the low-frequency DCT coefficients.
4.2 Illustration of the procedure of LBP extraction.
4.3 Illustration of the combined classifier at the feature level.
4.4 Illustration of the combined classifier at the score level.
4.5 Impact of varying the number of DCT coefficients on recognition accuracy.
4.6 Recognition accuracy of the combined classifier at the feature-level.
4.7 Impact of the value of α on the performance of the combined classifier.
4.8 Performances comparison of different fusion rules.
4.9 Multi-resolution comparison on ORL (upper), YALE (middle) and FERET (bottom).
5.1 Registration flow chart.
5.2 Identification flow chart.
5.3 A sample image with detected face.
5.4 The system output at different stages. (a) the input image; (b) the radial symmetry map; (c) the filtered symmetry map; (d) the thresholded binary symmetry map.
5.5 A sample face image and the located eye center.
5.6 Recognition module diagram.
5.8 Hardware architecture.
5.9 A snapshot of the user registration.
5.10 LRIT-GSCM database.
5.11 A snapshot of the user identification system.
B.1 The contribution of points Pi and Pj to the symmetry measure [RWY95].
C.1 The original LBP operator [Jul07].
C.2 Circular neighbor sets for three different values of P and R [Jul07].
C.3 Different texture primitives detected by the LBP^{u2}_{8,2} operator [Jul07].
D.1 (a) Two-dimensional DCT basis functions (N = 8). (b) Zigzag scanning. (c) Frequency components.
ABBREVIATIONS
AAM Active Appearance Model
ATM Automated Teller Machine
BLDA Boosted Linear Discriminant Analysis
CIS Class Individual Score
CMU Carnegie Mellon University
DCT Discrete Cosine Transform
DFB Directional Filter Bank
DFT Discrete Fourier Transform
DLA Dynamic Link Architecture
DWT Discrete Wavelet Transform
EBGM Elastic Bunch Graph Matching
EFM Enhanced Fisher Linear Discriminant Model
EGM Elastic Graph Matching
FAR False Acceptance Rate
FDCT Fast Discrete Curvelet Transform
FERET Face Recognition Technology
FFT Fast Fourier Transform
FRGC Face Recognition Grand Challenge
FRR False Rejection Rate
GDA Generalized Discriminant Analysis
GSF Graph Similarity Function
HMM Hidden Markov Models
ICA Independent Component Analysis
kAM kernel Associative Memory
KPCA Kernel Principal Component Analysis
LBP Local Binary Pattern
LBPS Local Binary Pattern Sequence
LDA Linear Discriminant Analysis
MIT Massachusetts Institute of Technology
NFL Nearest Feature Line
ORL Olivetti Research Laboratory
PCA Principal Component Analysis
PIN Personal Identification Number
RBF Radial Basis Function
ROC Receiver Operating Characteristic
SP Steerable Pyramid
SSS Small Sample Size
SVM Support Vector Machines
UMIST University of Manchester Institute of Science and Technology
USFFT Unequally Spaced Fast Fourier Transform
Chapter 1
INTRODUCTION
1.1 Application contexts
With the advent of electronic banking, e-commerce and smartcards, and an increased emphasis on the privacy and security of information stored in various databases, automatic personal identification has become a very important topic. Accurate automatic personal identification is now needed in a wide range of civilian applications involving the use of passports, cellular phones, automatic teller machines and driver licenses. Traditional knowledge-based (password or Personal Identification Number (PIN)) and token-based (passport, driver license, and ID card) identification are prone to fraud because PINs may be forgotten or guessed by an impostor and tokens may be lost or stolen. Therefore, approaches that rely only on knowledge or tokens are unable to satisfy the security requirements of our electronically interconnected information society. To overcome such weaknesses, biometric solutions provide a means to achieve more secure and accurate authentication.
A perfect identity authentication system will need a biometric component. A biometric is a representation of a unique part or characteristic of an individual that can potentially distinguish an authorized person from an impostor. Since biometric characteristics are distinctive, cannot be forgotten or lost, and require the person being authenticated to be physically present at the point of authentication, biometrics are inherently more reliable and more capable than traditional knowledge-based and token-based techniques. Currently there are many biometric technologies used for personal authentication: face, fingerprint, hand geometry, iris, retina, signature, voice, etc. Although other methods of authentication (such as fingerprint or iris scans) can be more accurate, face recognition has always remained a major focus of research because of its non-invasive nature and because it is humans' primary method of identification. Face recognition technology can be widely applied in security surveillance, authentication, access control and human-computer interfaces. Since the late eighties there has been an explosive growth in research on face recognition because of the practical importance of the topic and the theoretical interest from cognitive scientists as well as computer vision and pattern recognition researchers.
1.2 Face Identification and Verification
A face recognition system can operate in two modes: verification and identification. In verification mode, the system either accepts or rejects a user's claimed identity, whereas in identification mode, without any claim of identity, the system establishes whether a biometric record of the user exists. Face identification is a more complex, costly and time-consuming process than face verification, because it requires matching against a potentially huge database of biometric templates. There are a number of potential civilian applications for a system working in verification mode. For example, an ATM (Automated Teller Machine) system that verifies a user's identity with a face biometric would perform, upon each transaction, only one match of the current face image (acquired at the point of transaction) against a single template stored on the ATM card. A typical face verification system can be divided into two modules: Enrolment and Verification. The Enrolment module scans the face of a person through a sensing device and then stores a representation (template) of the face in the database. The Verification module is invoked during the operation phase. The same representation used in the enrolment phase is extracted from the input face and matched against the template of the claimed identity to give a "yes/no" answer. An identification system, on the other hand, matches the input face against a large number of faces in the database; as a result, algorithm efficiency is a critical issue in an identification system.
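The difference between the two modes can be illustrated with a minimal sketch. It assumes, purely for illustration, that each face has already been reduced to a short feature vector, that templates are compared with the Euclidean distance, and that a single acceptance threshold is used; the names `verify`, `identify`, `gallery` and the toy vectors are hypothetical and do not come from the system described in this thesis.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def verify(probe, claimed_template, threshold):
    """Verification (1:1): one comparison against the claimed identity's template."""
    return euclidean(probe, claimed_template) <= threshold

def identify(probe, gallery, threshold):
    """Identification (1:N): compare against every template in the gallery.

    Returns the closest identity, or None if even the best match is too far.
    """
    best_id, best_dist = None, float("inf")
    for identity, template in gallery.items():
        dist = euclidean(probe, template)
        if dist < best_dist:
            best_id, best_dist = identity, dist
    return best_id if best_dist <= threshold else None

# Toy gallery of enrolled templates (assumed values, for illustration only).
gallery = {"alice": [0.0, 0.0], "bob": [3.0, 4.0]}
probe = [0.3, 0.4]

accepted_as_alice = verify(probe, gallery["alice"], threshold=1.0)
accepted_as_bob = verify(probe, gallery["bob"], threshold=1.0)
who = identify(probe, gallery, threshold=1.0)
```

Verification costs a single distance computation, whereas identification scans the whole gallery, which is why efficiency matters far more in the second mode.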
1.3 Motivation and Solutions
Face recognition has been a hot research topic since the eighties, and a large number of face recognition algorithms have been proposed in the literature. Though quite a tough task for a computer, face recognition seems to be much easier for human beings. The ability to recognize faces and understand the emotions they convey is one of the most important human abilities, and it is common for a person to instantly recognize thousands of people. As with many perceptual abilities, the ease with which humans recognize faces disguises the complexity of the task, especially considering the many potential variations in such a dynamic real-world object.
The main objective of this research is to develop an accurate frontal-view face recognition algorithm that is robust against variations in illumination, expression and limited out-of-plane face rotation. Research on vision systems has shown that more than half of the brain cortex becomes more active during visual processing [HGY+99]. The visual cortex thus plays a very important role in face recognition. Simple cells in the visual cortex are known to be selective for four coordinates, each cell having an x, y location in visual space, a preferred orientation and a preferred spatial frequency [Dau85]. Based on this observation, a number of studies have shown that the various 2D receptive-field profiles encountered in populations of simple cells are well described by a family of 2D Gabor wavelets, which were first proposed by Gabor (1946) for simultaneous time and frequency analysis. In addition to this biological motivation, it is also widely believed that local texture features in face images, extracted by a spatial-frequency wavelet analysis, are more robust against distortions caused by variations in illumination, expression and pose [ZCPR03]. In particular, among the various wavelet bases with good space-frequency localization, the Gabor function provides the optimal resolution in both the spatial and frequency domains [Gab46, Dau85]. Accordingly, this research focuses on a new multi-resolution approach based on the Steerable Pyramid transform to extract features for face recognition. This multi-resolution analysis allows us to extract the most appropriate features at different scales and angles (sub-bands). Although at first sight these sub-bands can appear somewhat redundant and may individually contain less information, their judicious combination can often prove superior for classification.

Another key challenge for improving face recognition performance is finding and combining efficient and discriminative information about face patterns that is resistant to all kinds of variation. Many studies in the literature have shown that both global and local features are crucial for face perception [ZCPR03]. Global features describe the characteristics of the whole face and are often used as a coarse representation, whereas local features capture more detailed variations within local regions and provide a finer representation. In this thesis we propose a face recognition system that fuses two feature extraction methods: global DCT (Discrete Cosine Transform) and local LBP (Local Binary Pattern).
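To give a concrete idea of what a local descriptor looks like, the sketch below computes the basic 3x3 LBP code of a pixel and the histogram of codes over an image region. It is an illustration under stated assumptions only: it uses the basic 8-neighbor operator rather than the uniform variant adopted later in this thesis, the bit ordering (clockwise from the top-left neighbor) is an arbitrary convention, and the function names and the toy `patch` are hypothetical.

```python
def lbp_code(image, r, c):
    """Basic 3x3 LBP code of the pixel at (r, c).

    Each of the 8 neighbors is thresholded against the center pixel
    (1 if neighbor >= center, else 0) and the bits are packed into one byte.
    """
    center = image[r][c]
    # Clockwise from the top-left neighbor; the ordering is a convention (assumption).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels of a region."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist

# Toy 3x3 patch (assumed values); only the center pixel has a full neighborhood.
patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
code = lbp_code(patch, 1, 1)

# Adding a constant to every pixel leaves the code unchanged.
shifted = [[p + 50 for p in row] for row in patch]
shifted_code = lbp_code(shifted, 1, 1)
```

Because the code depends only on the sign of neighbor-minus-center differences, a uniform brightness offset does not change it, which is one reason such local descriptors are attractive under illumination changes.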
A further goal of this thesis is the development of an automatic face recognition system within an R&D project with Morocco's incumbent telecommunications operator (IAM). This project aims to develop an access control system based on face recognition that is efficient and applicable to real-time applications. At the same time, the system has to take computation and memory costs into consideration. Once the recognition algorithm has been comprehensively tested against a number of different databases and its performance maximized, it will be implemented as a component of a fully automatic face recognition system, complete with a face detection module.
1.4 Thesis Outline
The thesis is organized as follows:
FACE RECOGNITION: LITERATURE REVIEW
Chapter two reviews state-of-the-art face recognition algorithms in the literature and presents performance evaluation techniques and the most commonly used databases. In particular, the major concern of the thesis, i.e., multi-resolution based methods, is explored in detail.
FACE RECOGNITION based on MULTI-RESOLUTION ANALYSIS
Chapter three introduces multi-resolution techniques for face recognition, looking particularly at the problem of facial feature extraction. A novel Steerable Pyramid (SP) feature extraction method for face recognition is described using two representation schemes: global and local.
FACE RECOGNITION based on INFORMATION FUSION
Chapter four gives an overview of face recognizer combination. To obtain further improvement, we investigate two information-fusion approaches to face recognition. In the first approach, we propose to fuse LBP and DCT as local and global features to obtain rich facial information for identification. The second approach aims to optimize the local SP features by fusing the best-performing sub-bands.
THE DEVELOPED USER IDENTIFICATION SYSTEM
In Chapter five, the automatic User Identification System developed in this research is presented. Both the system design and the function modules are explained.
Finally, Chapter six gives the author's conclusions on face recognition and discusses further work that could be done to improve and extend the capabilities of the implemented algorithms.
1.5 List of Publications
The research contributions of this PhD Thesis are as follows:
International Journals:
1. M. El Aroussi, S. Ghouzali, M. El Hassouni, M. Rziza, D. Aboutajdine, "Local Steerable Pyramid Binary Pattern Sequence LSPBPS for face recognition method", International Journal on Signal Processing (IJSP), 5(4): 281-284, 2009.
2. M. El Aroussi, S. Ghouzali, M. El Hassouni, M. Rziza, D. Aboutajdine, "Multi-Stage Fusion of Local and Global Features Based Classification for Face Recognition", International Journal on Tomography and Statistics (IJTS), Special Issue on Image Processing, 14(S10), 95-108, 2010.
3. M. El Aroussi, S. Ghouzali, M. El Hassouni, M. Rziza, D. Aboutajdine, "Local Steerable Pyramid Binary Pattern Sequence LSPBPS: A Novel illumination robust face recognition method", IJCS, Accepted.
4. M. El Aroussi, S. Ghouzali, M. El Hassouni, M. Rziza, D. Aboutajdine, "Steerable Pyramids Feature Based Classification Using Fisher Linear Discriminant for Face Recognition", Infocomp Journal of Computer Science, 2009, Accepted.
5. M. El Aroussi, S. Ghouzali, M. El Hassouni, M. Rziza, D. Aboutajdine, "Face Recognition using enhanced Fisher Linear Discriminant", Submitted to IAJIT.
International Conferences:
1. M. El Aroussi, S. Ghouzali, M. El Hassouni, M. Rziza, D. Aboutajdine, "Novel face recognition approach based Steerable Pyramid Feature Extraction", ICIP'09, 7-11 November 2009, Cairo, Egypt. Accepted.
2. M. El Aroussi, S. Ghouzali, M. El Hassouni, M. Rziza, D. Aboutajdine, "Block based Curvelet Feature Extraction for Face Recognition", ICMC'09, 1-5 April 2009, Ouarzazate, Morocco.
3. M. El Aroussi, S. Ghouzali, M. El Hassouni, M. Rziza, D. Aboutajdine, "Curvelet-Based Feature Extraction with B-LDA for Face Recognition", AICCSA'09, May 2009, Rabat, Morocco, pp. 444-448.
4. M. El Aroussi, A. Amine, S. Ghouzali, M. El Hassouni, M. Rziza, D. Aboutajdine, "Combining DCT and LBP Feature Sets For Efficient Face Recognition", 3rd International Conference on Information and Communication Technologies: From Theory to Applications (ICTTA), 2008.
5. S. Ghouzali, M. El Aroussi, A. Amine, M. El Hassouni, M. Rziza, D. Aboutajdine, "Combined face recognition classifiers based DCT and LBP feature sets", 9th African Conference on Research in Computer Science CARI'08, 153-160, 2008.
6. M. El Aroussi, S. Ghouzali, M. El Hassouni, M. Rziza, D. Aboutajdine, "Local Steerable Pyramid Binary Pattern Sequence LSPBPS for face recognition method", accepted at the international conference on signal processing, Oslo, Norway, July 29-31, 2009.
Other contributions
International Journals:
1. R. Saadane, M. El Aroussi, A. Hayar, D. Aboutajdine, "UWB Propagation Channel Modeling in Indoor Based on Large and Small Scales Analysis", IJCS, 2009, Accepted.
2. Z. Ouadoudi, M. El Aroussi, M. Rziza, D. Aboutajdine, "Stochastic Low Energy Adaptive Clustering Hierarchy", ICGST International Journal on Computer Network and Internet Research (CNIR), Vol. 08, Issue (I), 2008, pp. 47-51.
3. Z. Ouadoudi, M. El Aroussi, D. Aboutajdine, "Web based Home Automation System", IJCSNS International Journal of Computer Science and Network Security, Vol. 6, No. 3, August 2007.
Conferences:
1. Rachid Saadane, Mohammed Wahbi, Aawatif Hayar, Moulay Ahmed Faqihi, Mohamed El Aroussi, Driss Aboutajdine, "On the Analysis of Propagation Channel based on Sub Band Approach", AICCSA'09, May 2009, Rabat, Morocco.
2. Z. Ouadoudi, M. El Aroussi, M. Rziza, D. Aboutajdine, "Stochastic Low Energy Adaptive Clustering Hierarchy", IEEE International Conference on Sensor Networks, Ubiquitous, and Trustworthy Computing (SUTC2008), Accepted.
3. M. El Aroussi, A. Berrazzouk, D. Aboutajdine, "Réalisation d'un système de gestion et de suivi en temps réel pour un chantier à engins mobiles", JAI'2005, Oujda, Morocco, 25-26 May 2005.
4. Z. Ouadoudi, M. El Aroussi, "Conception et réalisation d'un système de supervision des postes électriques", WOTIC'07, Rabat, 24-25 June 2007.
1. "Reconnaissance faciale pour l'accès sécurisé: Manuel d'utilisation", MAROC TELECOM 'R&D', September 2008.
2. "Reconnaissance faciale pour l'accès sécurisé: Phase III", MAROC TELECOM 'R&D', March 2008.
3. "Reconnaissance faciale pour l'accès sécurisé: Rapport de Synthèse", MAROC TELECOM 'R&D', February 2008.
4. "Reconnaissance faciale pour l'accès sécurisé: Phase II", MAROC TELECOM 'R&D', July 2007.
5. "Reconnaissance faciale pour l'accès sécurisé: Phase I", MAROC TELECOM 'R&D', March 2006.
Chapter 2
FACE RECOGNITION: A STATE OF THE ART
Contents
2.1 Introduction
2.2 Face Recognition Methods
2.2.1 Global Methods
2.2.2 Local Methods
2.3 Multi-resolution approaches
2.3.1 Multi-resolution feature extraction schemes
2.3.2 Gabor filters and wavelets
2.3.3 Curvelet transform
2.3.4 Contourlet transform
2.3.5 Comparative works
2.4 Performance Evaluation
2.4.1 Identification System
2.4.2 Verification System
2.4.3 Distance Measure and Classification
2.4.4 Databases
2.1 Introduction
Face recognition owes its popularity to its potential application areas, ranging from human-computer interaction to authentication and surveillance, and it has become one of the most relevant research areas in pattern recognition. The process of person identification by face recognition can be divided into three main phases (see Figure 2.1): registration and normalization, feature extraction, and classification. In the registration and normalization phase, the image is transformed (scaled and rotated) until it has the same 'position' as the images in the database, meaning that the eyes are at the same positions. Problem factors such as illumination differences are also reduced in this phase. In the feature extraction phase, the most useful and distinctive features (properties) of the face image are extracted. With these features, the face image can be compared with the images from the database, which is usually done in the classification phase.
The output of the classification part is the identity of a face image from the database with the highest matching score, thus with the smallest differences compared to the input face image. Also a threshold value can be used to determine if the differences are small enough. After all, it could be that a certain face is not in the database at all.
Figure 2.1: Principle of an identification process with face recognition.
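The classification phase described above can be illustrated with a minimal sketch. It assumes that each face has already been reduced to a feature vector, uses the Euclidean distance as the matching score, and rejects the probe when even the best match exceeds a threshold. The function name `identify`, the toy gallery and the threshold value are hypothetical and only serve the illustration.

```python
import math

def identify(probe, gallery, threshold):
    """Return (identity, distance) of the closest gallery template.

    identity is None when the smallest distance exceeds the threshold,
    i.e. when the probe face is assumed not to be in the database.
    """
    best_id, best_dist = None, math.inf
    for identity, template in gallery.items():
        dist = math.dist(probe, template)  # Euclidean distance between feature vectors
        if dist < best_dist:
            best_id, best_dist = identity, dist
    if best_dist > threshold:
        return None, best_dist
    return best_id, best_dist

# Toy gallery with two enrolled subjects (hypothetical 2D features).
gallery = {"alice": [0.0, 0.0], "bob": [3.0, 4.0]}
match, distance = identify([0.1, 0.0], gallery, threshold=1.0)
```

The threshold trades false acceptances against false rejections, so in practice it would be tuned on validation data rather than fixed by hand.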
In most cases, an image is represented in the spatial domain by pixels, but there are alternate representations, the most popular being the frequency domain representation obtained by the Fourier Transform. However, the Fourier Transform of an image is not very informative from the perspective of object recognition. Other transforms, such as Wavelets, Curvelets and Contourlets, provide image representations other than pixels or frequency. These transforms depict images in such a way that recognition is facilitated.
In this chapter we present a survey of the literature on face recognition from still intensity images. An exhaustive review is out of the scope of this dissertation due to the large body of existing work. We will focus on those approaches based on multi-resolution transforms that we see as the most significant ones. Literature surveys on face recognition approaches can be found in [CWS95].
2.2 Face Recognition Methods
Various approaches to face recognition have been proposed in the literature; they can be split into two groups: global approaches and local approaches. While the latter consider local features or components detected from the face, global approaches make use of the information derived from the whole face pattern. Local approaches are generally more robust to variability in face appearance such as rotation of the head, occlusion and gross variations due, for instance, to the presence or absence of facial hair, but they also generally require significantly more computation than the global ones. By combining both local and global features, hybrid methods attempt to produce a more complete representation of facial images.
2.2.1 Global Methods
Global face recognition approaches, which work well for frontal view face images, have attracted more attention, but they are sensitive to translation, rotation and pose changes. Normalization is therefore usually an important and unavoidable step for these methods. A small number of prominent points in the face, such as the eyes, the nostrils or the center of the mouth, are required to resize and rotate the input face image. After normalization, the input face image can be aligned with the model face and recognition can be performed.
In this section we review those approaches to face recognition that consider the face pattern as a whole. We first present approaches based on subspace analysis: the popular Eigenfaces (Principal Component Analysis, PCA) [TP91], Fisherfaces (Linear Discriminant Analysis, LDA) [BHK97] and a blind source separation technique called Independent Component Analysis (ICA) [BMS02]. Then we briefly present the Boosted LDA technique, which has been used to address the "small sample size" problem. Finally, the most useful model-based approaches, using neural networks and support vector machines (SVM), are briefly discussed.
Subspace Analysis approaches
Based on PCA, Kirby and Sirovich [KS90] first developed the well-known Eigenface method for both face representation and recognition. In this method, the whole face pattern is transformed into a feature vector and a set of training samples is used to compute the Eigenfaces [TP91]. PCA achieves the optimal representation in the sense of maximizing the overall data variance. However, the difference between faces of the same person due to illumination and pose (within-class scatter) often appears to be larger than that due to facial identity (between-class scatter). Based on this observation, LDA is applied in the Fisherfaces method [BHK97].
LDA defines a projection that makes the within-class scatter small and the between-class scatter large. This projection has been shown to improve classification performance over PCA. However, it requires a large training sample set for good generalization, which is usually not available in face recognition applications. To address such Small Sample Size (SSS) problems, Zhao et al. [ZKC+98] perform PCA to reduce the feature dimension before the LDA projection; see Figure 2.2 for the different bases of the LDA, PCA + LDA, and PCA projections.
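The PCA step shared by the Eigenface method and by the PCA + LDA scheme can be sketched as follows. This is a minimal illustration that assumes each face image has been flattened into one row of a data matrix; the function names `fit_pca` and `project`, the random stand-in data and the number of retained components are assumptions made for the example, not part of the cited methods.

```python
import numpy as np

def fit_pca(X, n_components):
    """X has shape (n_samples, n_pixels); returns (mean, basis).

    The rows of the returned basis are orthonormal principal directions
    (the "Eigenfaces"), sorted by decreasing variance.
    """
    mean = X.mean(axis=0)
    # SVD of the centred data avoids forming the large pixel covariance matrix.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(X, mean, basis):
    """Project (centred) samples onto the retained principal directions."""
    return (X - mean) @ basis.T

rng = np.random.default_rng(0)
faces = rng.normal(size=(20, 64))   # stand-in for 20 vectorised 8x8 face images
mean, eigenfaces = fit_pca(faces, n_components=5)
features = project(faces, mean, eigenfaces)   # shape (20, 5)
```

In a PCA + LDA pipeline, the low-dimensional `features` would be passed to LDA in place of the raw pixels, which keeps the within-class scatter matrix well conditioned.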
Using higher-order statistical analysis, ICA was first adopted for face recognition by Bartlett et al. [BMS02], whose work showed that ICA outperformed PCA. However, other researchers [DBBB03] observed that, when the right distance metric is used, PCA significantly outperforms ICA on the FERET database. The main disadvantage of PCA, LDA and Fisherfaces is their linearity.
Figure 2.2: Different bases of linear projections: LDA, PCA + LDA and PCA bases are shown on the first, second and third row respectively [ZKC+98].
Recently, kernel methods have been successfully applied to solve pattern recognition problems because of their capacity to handle nonlinear data. By mapping sample data to a higher-dimensional feature space, a nonlinear problem defined in the original image space is effectively turned into a linear problem in the feature space [SMB+99]. PCA or LDA can subsequently be performed in the feature space; the resulting methods are called Kernel Principal Component Analysis (KPCA) and Generalized Discriminant Analysis (GDA) [BA00]. Experiments show that KPCA and GDA are able to extract nonlinear features and thus provide better recognition rates in applications such as face recognition [KJJ02, YFY04, SB4b].
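To make the idea of performing PCA in the feature space concrete, the following sketch implements a basic kernel PCA on the training samples. The choice of a Gaussian (RBF) kernel, the parameter `gamma` and the function names are assumptions made for illustration; the cited works may use other kernels and normalizations.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian kernel matrix k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kernel_pca(X, n_components, gamma=1.0):
    """Return the projections of the training samples onto the leading
    kernel principal components (one row per sample)."""
    n = X.shape[0]
    K = rbf_kernel(X, X, gamma)
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one     # centre the data in feature space
    eigvals, eigvecs = np.linalg.eigh(Kc)           # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # With unit-norm components in feature space, the projection of training
    # sample i onto component k equals sqrt(lambda_k) * v_k[i].
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))
```

Only the kernel matrix is needed, never the explicit high-dimensional mapping, which is what makes the approach tractable.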
Boosted LDA
LDA searches for the directions of maximum class discrimination in addition to performing dimensionality reduction. Given training data, discriminant vectors are computed using LDA. The function of the discriminant vectors is two-fold. First, they are used as a transform matrix, and LDA features are extracted by projecting data onto them. Second, they are used to select discriminant features, the number of which is much smaller than that of the whole data. LDA finds a transform $W_{LDA}$ such that

\[ W_{LDA} = \arg\max_{C} \frac{\left| C^{T} S_B C \right|}{\left| C^{T} S_W C \right|} \tag{2.1} \]

where $C$ is a candidate projection matrix, $S_B$ the between-class scatter matrix and $S_W$ the within-class scatter matrix, defined as

\[ S_B = \sum_{i=1}^{c} N_i (\mu_i - \mu)(\mu_i - \mu)^{T} \tag{2.2} \]

\[ S_W = \sum_{i=1}^{c} \sum_{x_k \in X_i} (x_k - \mu_i)(x_k - \mu_i)^{T} \tag{2.3} \]

In the above expressions, $N_i$ is the number of training samples in class $i$, $c$ is the number of distinct classes, $\mu_i$ is the mean vector of samples belonging to class $i$, $\mu$ is the mean vector of all training samples and $X_i$ represents the set of samples belonging to class $i$.
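The scatter matrices and the resulting discriminant directions can be computed as in the sketch below. It assumes that the optimum of the criterion in (2.1) is obtained from the leading eigenvectors of $S_W^{-1} S_B$, which is the usual route; the use of a pseudo-inverse to guard against a singular $S_W$ and the function names are choices made for this illustration.

```python
import numpy as np

def scatter_matrices(X, y):
    """Between-class and within-class scatter matrices of samples X (rows) with labels y."""
    mu = X.mean(axis=0)                      # overall mean
    d = X.shape[1]
    S_B = np.zeros((d, d))
    S_W = np.zeros((d, d))
    for label in np.unique(y):
        X_i = X[y == label]
        mu_i = X_i.mean(axis=0)              # class mean
        diff = (mu_i - mu)[:, None]
        S_B += len(X_i) * (diff @ diff.T)
        centred = X_i - mu_i
        S_W += centred.T @ centred
    return S_B, S_W

def fit_lda(X, y, n_components):
    """Columns of the result are the leading discriminant directions."""
    S_B, S_W = scatter_matrices(X, y)
    # Eigenvectors of pinv(S_W) S_B; pinv is a safeguard when S_W is singular.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvecs[:, order].real
```

Since $S_B$ has rank at most $c - 1$, at most $c - 1$ useful discriminant directions exist, whatever the value of `n_components`.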
Among the appearance-based face recognition methods, LDA has shown promising results. However, this method often suffers from the so-called "small-sample-size" problem, encountered in high-dimensional pattern recognition tasks where the number of training samples available for each subject is smaller than the dimensionality of the samples. In addition, the performance of LDA, as a linear appearance-based method, often deteriorates rapidly when face patterns are subject to large variations in viewpoint, illumination or facial expression. To overcome these limitations, Lu et al. [LPVL06] proposed a novel ensemble-based approach to boost the performance of traditional LDA-based methods. The approach is based on the technique known as "boosting". The idea behind boosting is to sequentially apply a base classifier (also called a learner) to a weighted version of the training sample set in order to build a very accurate, strong classifier. The learners in this case are LDA-based ones, which differ from the traditional learners used in boosting. It has been shown, both theoretically and experimentally, that boosting is particularly robust in preventing overfitting and reducing the generalization error by increasing the margins of the training examples. In the B-LDA method, each round of boosting produces a new LDA subspace specifically targeting the examples from the hard-to-separate pairs of classes, so that the separability between these classes is enhanced in the new LDA subspace. The final result is an ensemble of multiple relatively weak but very specific LDA solutions. The ensemble-based solution is thus able to take advantage of both boosting and LDA.
Model-based approaches
Figure 2.3: The diagram for a RBF based face recognition system [EWLT02].
While the popular back-propagation (BP) neural network may be trained to recognize face images, the direct application of this principle is often impossible due to the size of the input features as it would lead to a complex network which would be difficult to train.
Therefore, before classification is performed, a dimension reduction technique should be applied [FC90, EWLT02, Liu04]. While face images were treated as 1D signals and wavelet analysis was used for feature extraction in [Liu04], the Radial Basis Function (RBF) network was applied to the projection of face images onto Fisherfaces for classification in [EWLT02]. The diagram of Er's method [EWLT02] is shown in Figure 2.3. PCA + LDA were first used to reduce the feature dimension of the face patterns, and sample information was then used to determine the structure and initial parameters of the RBF network.
Figure 2.4: Binary SVM tree [GLC01].
As hyperplane classifiers, Support Vector Machines (SVMs) have also been successfully applied to face recognition [HHP01, HH03, GLC01]. Readers can refer to Appendix A for more details on SVMs. In [HHP01], a set of SVM classifiers is applied to extract different facial components, and the grey values of each component are then combined into a single feature vector. The component-based method was compared with a global method based on SVM classification, and the results show its robustness against variations in pose and illumination. In their later work [HH03], Huang and Heisele used a 3D morphable model to generate synthesized images with different illumination and pose for training. A binary tree system was adopted by Guo et al. [GLC01] to use SVMs for the multi-class face recognition problem. In [JKLM02], each person is associated with an SVM trained to discriminate the face images of that person from those of others. Both PCA and LDA were used for feature extraction and tested on a verification application. With different illumination normalization techniques applied, the results show that SVMs are robust and relatively insensitive to the feature space and pre-processing methods. However, when the representation already captures and emphasizes the discriminatory information, e.g., features extracted using LDA, SVMs lose their superiority in comparison with the simplest Euclidean distance + nearest neighbor classifier.
2.2.2 Local Methods
In this section, we consider those approaches to face recognition which view the face as a collection of features. We first very briefly review the geometric feature-based approaches. We then consider the graph-based approaches such as the elastic graph matching (EGM) and the elastic bunch graph matching (EBGM). Next we present approaches based on the hidden Markov model (HMM). Finally, we present the local feature analysis.
The main advantage of local approaches is to allow a flexible deformation at the key feature points so that pose changes can be compensated.
Geometrical Feature
In [BP93], Brunelli and Poggio implement and compare geometrical and template feature based analytic methods. For the geometrical feature based methods, the eyes, mouth and nose are first detected; the nose width and length, mouth position and chin shape features are then input to a Bayes classifier for identification. Figure 2.5 shows how these geometric features are measured; for example, the chin shape is represented by the distance between the edge of the chin and the center of the mouth. For the template based method, facial regions are matched with templates of the eyes, nose and mouth respectively, and the similarity scores of the facial features are simply added into a global score for face recognition.
Component-based approaches
Many global approaches to face recognition were extended to consider different components of the face, such as the eyes, the mouth or the nose, instead of the whole face pattern. The design of a component-based approach thus relies mainly on the choice of the components, the measure of similarity between two components and the way the outputs of the different matchers are fused. Brunelli and Poggio [BP93] considered four components: the eyes, the nose, the mouth and the face (from the eyebrows downward). Each feature is matched separately and the combination is done through a HyperBF network. The experimental analysis shows that the most discriminating features are the eyes, followed by the nose and the mouth, and that the whole face is the least discriminating one. Pentland et al. [PMS94] proposed an extension of the Eigenfaces, called modular Eigenfaces, using different face components. The notion of Eigenface is thus extended to Eigeneyes, Eigennose and Eigenmouth. For a small number of eigenvectors, the Eigenfeatures approach outperformed the Eigenface approach, and the combination of Eigenfaces and Eigenfeatures outperformed each algorithm taken separately.
Elastic Bunch Graph Matching
Lades et al. [LVB+93] propose a graph structure, called Dynamic Link Architecture (DLA), to represent face images. In this framework, an elastic graph matching (EGM) process is used to learn the representing graph of face images. Once faces are represented by appropriate graphs, Gabor features extracted at the graph nodes, named Gabor jets, are used for face recognition. Figure 2.6 shows two example face images overlaid with the representative graph [LVB+93]. Later on, Wiskott et al. [WFKM97] extended DLA to Elastic Bunch Graph Matching (EBGM), where graph nodes are located at a number of selected facial landmarks. EBGM has shown very competitive performance and was ranked as the top method in the FERET evaluation [PMRR00]. Details of Gabor wavelet based methods are presented in Section 2.3.2.
Hidden Markov Models
Figure 2.6: Face images represented by graphs [LVB+93].
Hidden Markov Models (HMMs) have been used in speech recognition for more than a decade ([RJ86], [HAJ90]) and have also been promoted for gesture recognition in recent years ([SP95]); however, only little work has been done on applying HMMs to face recognition.
HMMs are normally trained from examples that are represented by a sequence of observations. The parameters of the HMM are first initialized and then adjusted to maximize the probability of observing the given training samples. The observations of test samples can then be input to the trained HMMs for classification according to the output probabilities given by the different HMMs. Samaria and Young [SY94] first proposed an HMM architecture for face recognition. A face pattern is divided into several regions such as the forehead, eyes, nose, mouth and chin. These regions occur in a natural order from top to bottom and are used to form the hidden states of 1D or pseudo 2D HMMs. To train an HMM, each face image is represented by a sequence of observation vectors, which are constructed from the pixels of a sub-window. Nefian and Hayes [NH99] proposed the embedded 2D HMM, which consists of a set of super states, each associated with a set of embedded states. Super states represent primary facial regions, while the embedded states within each super state describe the facial regions in more detail. As shown in Figure 2.7, transitions between embedded states in different super states are not allowed. However, HMM based systems require many images for training and are only capable of operating on small databases. The performance drops dramatically as the size of the database is scaled up.
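The construction of an observation sequence from a face image can be sketched as follows. This illustration assumes a top-to-bottom scan with horizontal strips spanning the full image width and uses the raw pixels of each strip as the observation vector; the strip height, the overlap and the function name are hypothetical parameters chosen for the example.

```python
import numpy as np

def strip_observations(image, strip_height, overlap):
    """Scan the image from top to bottom and return one observation vector per strip.

    Each strip covers `strip_height` rows and consecutive strips share `overlap` rows.
    The result has shape (n_strips, strip_height * image_width).
    """
    step = strip_height - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than strip_height")
    rows = image.shape[0]
    return np.stack([
        image[top:top + strip_height].ravel()
        for top in range(0, rows - strip_height + 1, step)
    ])

# An 8x4 toy "image": strips of 4 rows overlapping by 2 rows give 3 observations.
toy = np.arange(32).reshape(8, 4)
sequence = strip_observations(toy, strip_height=4, overlap=2)
```

The resulting sequence preserves the natural top-to-bottom order of the facial regions, which is what the left-to-right state structure of the 1D HMM relies on.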
Figure 2.7: 2D embedded HMM structure [NH99].
2.3 Multi-resolution approaches
All face recognition algorithms, however, suffer a performance drop whenever face appearance is subject to variations caused by factors such as illumination, expression, occlusion, accessories, pose and aging. In fact, these factors often produce an intra-individual variability of face images that can be larger than the inter-individual variability. Multi-resolution techniques are applied in order to reduce the loss of classification performance due to changes in facial appearance. The underlying idea of multi-resolution analysis is to obtain multiple pieces of evidence from the same face and to search for those components that are less sensitive to intrinsic deformations due to expression or to extrinsic factors such as illumination. Also, multi-resolution analysis [SED04] allows an image to be preserved at certain levels of resolution or blurring. Broadly speaking, multi-resolution analysis allows for zooming in and out on the underlying texture structure. Therefore, the texture extraction is not affected by the size of the pixel neighborhood. The most popular multi-resolution analysis technique is wavelets, which have been useful in many applications, from image compression to image de-noising and edge detection [SDS95]. In recent years, the success of wavelets in other branches of computer vision has inspired face recognition researchers to apply wavelet based multi-resolution techniques to face recognition [ZLL08, CW02].
Over the past two decades, following wavelets, other multi-resolution tools such as Contourlets [BB08], the Steerable Pyramid [SF95] and Curvelets [Can03], to name a few, were developed. These tools have better directional decomposition capabilities than wavelets.
2.3.1 Multi-resolution feature extraction schemes
The Multi-resolution transforms have been used for feature extraction in face recognition. These are used in two ways:
• Direct use of Multi-resolution transform coefficients. The simplest application of the Multi-resolution transform to face recognition uses the coefficients directly as features. The Multi-resolution transform can locally detect the multiscale edges of facial images: the distinctive edge information exists in the lowest spatial-frequency subband, while finer edge information is present in the higher spatial-frequency subbands.
• From combination of Multi-resolution transform coefficients. The direct use of Multi-resolution transform coefficients may not extract the most discriminative features for two reasons: (1) the coefficients contain much redundant or irrelevant information, and (2) the raw coefficients do not reveal new underlying features with more discriminative power. To overcome this insufficiency, it is possible to construct features from combinations of these coefficients so as to produce a low dimensional manifold with minimum loss of information, in which the relationships and structure in the data can be identified. This can be done in two ways:
– Use statistical measures of the Multi-resolution transform coefficients (e.g., mean, variance, entropy) in each spatial-frequency subband as discriminative features.
– Employ traditional transforms (e.g., PCA, LDA, ICA, AM, Neural Networks) to enhance and extract discriminative features in one or several specific spatial-frequency subbands.
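The first of the two options above, statistical measures per subband, can be sketched as follows. A single level of an averaging 2D Haar decomposition is used here as a stand-in for whichever multi-resolution transform is actually chosen; the band names, the histogram-based entropy estimate and the number of bins are assumptions made for the illustration.

```python
import numpy as np

def haar_level(image):
    """One level of an averaging 2D Haar decomposition (image sides must be even)."""
    a = image[0::2, 0::2]
    b = image[0::2, 1::2]
    c = image[1::2, 0::2]
    d = image[1::2, 1::2]
    return {
        "approx": (a + b + c + d) / 4.0,       # low-pass subband
        "rows":   (a + b - c - d) / 4.0,       # top minus bottom: responds to horizontal edges
        "cols":   (a - b + c - d) / 4.0,       # left minus right: responds to vertical edges
        "diag":   (a - b - c + d) / 4.0,       # diagonal detail
    }

def subband_statistics(subband, bins=16):
    """Mean, variance and histogram entropy (in bits) of one subband."""
    hist, _ = np.histogram(subband, bins=bins)
    p = hist[hist > 0] / hist.sum()
    entropy = -np.sum(p * np.log2(p))
    return np.array([subband.mean(), subband.var(), entropy])

def feature_vector(image):
    """Concatenate the statistics of all four subbands (12 values)."""
    bands = haar_level(np.asarray(image, dtype=float))
    return np.concatenate(
        [subband_statistics(bands[name]) for name in ("approx", "rows", "cols", "diag")]
    )
```

The resulting vector has only three values per subband, which illustrates how strongly this option compresses the coefficients compared with using them directly.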
2.3.2 Gabor filters and wavelets
The Gabor filters represent a powerful tool both in image processing and image coding, with the capability to capture important visual features, such as spatial localization, spatial frequency and orientation selectivity.
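A 2D Gabor wavelet can be defined as in Lades et al. [LVB+93]:

\[ \phi(x, y) = \frac{f^{2}}{\pi\gamma\eta} \exp\left(-(\alpha^{2} x_r^{2} + \beta^{2} y_r^{2})\right) \exp(j 2\pi f x_r), \tag{2.4} \]

\[ x_r = x\cos\theta + y\sin\theta, \qquad y_r = -x\sin\theta + y\cos\theta, \]

where $f$ is the frequency of the modulating sinusoidal plane wave and $\theta$ is the orientation of the major axis of the elliptical Gaussian. The 2D Gabor wavelet as defined in Equation 2.4 has the Fourier transform:

\[ \Phi(u, v) = \exp\left(-\pi^{2}\left(\frac{1}{\alpha^{2}}(u_r - f)^{2} + \frac{1}{\beta^{2}} v_r^{2}\right)\right), \tag{2.5} \]

\[ u_r = u\cos\theta + v\sin\theta, \qquad v_r = -u\sin\theta + v\cos\theta. \]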
The plots for two Gabor wavelets in the spatial and frequency domains are shown in Figure 2.8.
Figure 2.8: Example 2D Gabor wavelets in the spatial and the frequency domain (a) f = 0.4, θ = 0, γ = 4, ν = 2, (b)f = 0.2, θ = π/4, γ = 2, ν = 2.
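Equation 2.4 can be sampled on a pixel grid as in the following sketch. The text does not state how $\alpha$ and $\beta$ relate to the other parameters, so the code assumes $\alpha = f/\gamma$ and $\beta = f/\eta$, one choice that is consistent with the unit peak of the spectrum in Equation 2.5. The kernel size, the maximum frequency, the $\sqrt{2}$ spacing between scales and the default of five scales and eight orientations in the bank are likewise assumptions made for the illustration.

```python
import numpy as np

def gabor_kernel(f, theta, gamma, eta, size=31):
    """Sample the complex 2D Gabor wavelet of Equation 2.4 on a size x size grid."""
    alpha, beta = f / gamma, f / eta          # assumed envelope sharpness (see lead-in)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(alpha ** 2 * x_r ** 2 + beta ** 2 * y_r ** 2))
    carrier = np.exp(1j * 2.0 * np.pi * f * x_r)
    return (f ** 2 / (np.pi * gamma * eta)) * envelope * carrier

def gabor_bank(n_scales=5, n_orientations=8, f_max=0.25, gamma=2.0, eta=2.0, size=31):
    """Kernels indexed by (scale, orientation); frequencies decrease by sqrt(2) per scale."""
    return {
        (s, o): gabor_kernel(f_max / np.sqrt(2.0) ** s,
                             o * np.pi / n_orientations, gamma, eta, size)
        for s in range(n_scales)
        for o in range(n_orientations)
    }
```

Convolving a face image with every kernel of such a bank and keeping the magnitude of the responses gives the kind of multi-scale, multi-orientation description on which Gabor jets are built.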
Generally, Gabor filters are exploited to extract the main features from face images. Lades et al. [LVB+93] applied them to specific areas of the face region, corresponding to the nodes of a rigid grid. At each node of the grid, the Gabor coefficients are extracted and combined in jets. The nodes are linked to form a Dynamic Link Architecture (DLA), so that comparisons among different subjects can be made by means of a graph matching strategy. Wiskott et al. [WFKM97] further expanded on DLA and developed a Gabor wavelet based EBGM method to label and recognize human faces. Comparisons are made in two consecutive steps: a rigid alignment of the grid first accounts for global transformations only, such as translation and scale; the local misplacement of the grid nodes is then evaluated by means of a Graph Similarity Function (GSF).
By and large, the dynamic link architecture is superior to other face recognition techniques in terms of rotation invariance; however, the matching process is computationally expensive. Perronnin et al. [PDR04] considered a further deformable model, whose philosophy is similar to that of EBGM. They introduced a novel probabilistic deformable model of face mapping, based on a bi-dimensional extension of the 1D-HMM. Given a template face $F_T$, a query face $F_Q$ and a deformable model $M$, the proposed method tries to maximize the likelihood $P(F_T \mid F_Q, M)$. There are two main differences between this method and the original EBGM. First, the HMM is extended to the 2D case to estimate $P(F_T \mid F_Q, M)$, automatically training all the parameters of $M$ and thus taking into account the elastic properties of the different parts of the face. Second, the model $M$ is shared among all faces, so the approach also works well when little enrolment data is available.
A quite different approach has been proposed by Liu [Liu04]. A mother wavelet is defined and forty Gabor filters are derived, considering five scales and eight orientations. Each of these filters is convolved with the input image, resulting in forty filtered copies of the face image. To incorporate all the features produced by the different Gabor kernels, the resulting Gabor wavelet features are concatenated to derive an augmented Gabor feature vector. Then, in order to reduce the dimensionality of the feature vector, both PCA and the Enhanced Fisher Linear Discriminant Model (EFM) are investigated. The use of Gabor filters renders this method very robust to changes in expression and illumination. However, they dramatically increase the computational cost of the method.
A faster wavelet based approach has been proposed by Garcia et al. [GZT98b], who presented a novel method for the recognition of frontal views of faces under roughly constant illumination. It is based on the analysis of a wavelet packet decomposition of the face images, because very fast implementations of this procedure are available in hardware. Each face image is first located and then described by a subset of band filtered images containing wavelet coefficients. From these wavelet coefficients, which characterize the face texture, they build compact and meaningful feature vectors using simple statistical measures. They then show how an efficient and reliable probabilistic metric derived from the Bhattacharyya distance can be used to classify the face feature vectors into person classes, so that even very simple statistical features can provide a good basis for face classification.
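As an illustration of a metric derived from the Bhattacharyya distance, the sketch below uses its closed form for two univariate Gaussian distributions and sums it over subbands, each subband being summarized by a (mean, variance) pair. Summing per-subband distances and the function names are assumptions made for this example; the exact metric of [GZT98b] may differ.

```python
import math

def bhattacharyya_gaussian(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two univariate Gaussians (variances must be > 0)."""
    spread = 0.25 * math.log(0.25 * (var1 / var2 + var2 / var1 + 2.0))
    shift = 0.25 * (mean1 - mean2) ** 2 / (var1 + var2)
    return spread + shift

def face_distance(features_a, features_b):
    """Sum the per-subband distances; each feature list holds (mean, variance) pairs."""
    return sum(
        bhattacharyya_gaussian(m1, v1, m2, v2)
        for (m1, v1), (m2, v2) in zip(features_a, features_b)
    )

# Two faces described by two subbands each (hypothetical values).
face_a = [(0.0, 1.0), (2.0, 0.5)]
face_b = [(2.0, 1.0), (2.0, 0.5)]
d = face_distance(face_a, face_b)   # only the first subband differs
```

A probe would then be assigned to the person class whose stored feature vector gives the smallest distance.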
2.3.3 Curvelet transform
Since the Curvelet transform is the pioneer of the non-adaptive transforms available today, we discuss it in more detail. Once the reader understands the basics of Curvelets, it will be easier to understand Contourlets.
The Curvelet transform was developed by Candes and Donoho in 1999, motivated by the needs of image analysis [CD00]. Compared with traditional multiscale transforms such as the wavelet transform, it has improved directional capability and a better ability to represent edges and other singularities along curves. In the past few years the Curvelet construction has been redesigned to make it simpler to understand and use. Figure 2.9 shows the edge representation capability of the wavelet (left) and Curvelet (right) transforms. Because wavelets have a square support at each scale, more wavelets are required to represent an edge than Curvelets, which have an elongated needle shape.
Figure 2.9: Edge representation by Wavelet and Curvelet Transform [CDDY06].
The second generation Curvelet transform [CDDY06], introduced in 2006, is not only simpler but also faster and less redundant than its first generation versions [CD01].
The Curvelet transform is multiscale and multidirectional. Curvelets exhibit a highly anisotropic shape obeying a parabolic scaling relationship (they take the shape of elongated needles at finer scales). To implement the Curvelet transform, a 2D FFT of the image is first taken. The 2D Fourier frequency plane is then divided into 'parabolic' wedges. Finally, an inverse FFT of each wedge is taken to find the Curvelet coefficients at each scale j and angle l. Figure 2.10 (left) shows the division of the Fourier frequency plane into wedges. The wedges result from partitioning the Fourier plane into radial (concentric circles) and angular divisions. The concentric circles are responsible for decomposing the image into multiple scales (used for bandpassing the image), and the angular divisions correspond to different angles or orientations. Hence, to address a particular wedge, the scale and angle must first be defined. In the spatial domain, each wedge corresponds to a particular Curvelet at that given scale and angle. Figure 2.10 (right) represents Curvelets on the spatial Cartesian grid associated with a given scale and angle [CDDY06].
Figure 2.10: Curvelets in Fourier frequency (left) and spatial domain (right) [CDDY06]
There are two different digital implementations of Fast Digital Curvelet Transform (FDCT) [CDDY06]:
• Curvelets via USFFT (Unequally Spaced Fast Fourier Transform).
• Curvelets via Wrapping.
These transforms are linear and take as input a Cartesian array $f[t_1, t_2]$, $0 \le t_1, t_2 < n$. Although both implementations use the same digital coronization, they differ in the choice of spatial grid used to translate Curvelets at each scale and angle. Both FDCTs run in $O(n^{2} \log n)$ flops for $n \times n$ Cartesian arrays [CDDY06]. In the case of wrapping, a rectangular grid is assumed. All the experimental work presented in this thesis uses the numerically tight FDCT via Wrapping, as this is the fastest Curvelet transform algorithm currently available [CDDY06]. The algorithm is described below. Let $\hat{f}[n_1, n_2]$ denote the 2D discrete Fourier transform of $f[t_1, t_2]$. Let $U_j(\omega)$ be a localizing window, and let $\tilde{U}_j[n_1, n_2]$ be supported on some rectangle of length $L_{1,j}$ and width $L_{2,j}$ [CDDY06]:

\[ P_j = \{ (n_1, n_2) : n_{1,0} \le n_1 < n_{1,0} + L_{1,j},\; n_{2,0} \le n_2 < n_{2,0} + L_{2,j} \} \tag{2.6} \]
Implementation steps of FDCT via Wrapping [CDDY06]:
1. Apply the 2D FFT and obtain the Fourier samples $\hat{f}[n_1, n_2]$, $-n/2 \le n_1, n_2 < n/2$.
2. For each scale $j$ and angle $l$, form the product $\tilde{U}_{j,l}[n_1, n_2]\,\hat{f}[n_1, n_2]$.
3. Wrap this product around the origin and obtain $\hat{f}_{j,l}[n_1, n_2] = W(\tilde{U}_{j,l}\hat{f})[n_1, n_2]$, where the range of $n_1$ and $n_2$ is now $0 \le n_1 < L_{1,j}$ and $0 \le n_2 < L_{2,j}$.
4. Apply the inverse 2D FFT to each $\hat{f}_{j,l}$, hence collecting the discrete coefficients.
In the first two steps, the Fourier frequency plane of the image is divided into radial and angular wedges, owing to the parabolic relationship between a Curvelet's length and width, as shown in Figure 2.10. Each wedge corresponds to the Curvelet coefficients at a particular scale and angle. Step 3 essentially re-indexes the data around the origin. Finally, using the inverse FFT, the discrete Curvelet coefficients are collected in the spatial domain. It is not possible to delve into the mathematical details of the Curvelet transform within the scope of this thesis; the interested reader may refer to the works of Candes and Donoho [CD00, CDDY06, CD01].
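The flow of steps 1, 2 and 4 can be illustrated with a deliberately simplified frequency-wedge decomposition. This sketch is not the FDCT of [CDDY06]: it assumes hard indicator windows instead of the smooth windows $\tilde{U}_{j,l}$, circular rings instead of Cartesian coronae, and it skips the wrapping of step 3, so every coefficient array keeps the full image size. The ring radii, the number of angular sectors and the function names are hypothetical.

```python
import numpy as np

def wedge_coefficients(image, r_min, r_max, theta_min, theta_max):
    """Spatial-domain coefficients of one hard frequency wedge (complex array)."""
    n1, n2 = image.shape
    F = np.fft.fftshift(np.fft.fft2(image))                    # step 1: centred 2D FFT
    w1 = np.fft.fftshift(np.fft.fftfreq(n1))[:, None]
    w2 = np.fft.fftshift(np.fft.fftfreq(n2))[None, :]
    radius = np.hypot(w1, w2)
    angle = np.mod(np.arctan2(w1, w2), np.pi)                  # opposite directions share a wedge
    window = ((radius >= r_min) & (radius < r_max) &
              (angle >= theta_min) & (angle < theta_max)).astype(float)
    return np.fft.ifft2(np.fft.ifftshift(window * F))          # steps 2 and 4, without wrapping

def decompose(image, radii, n_angles):
    """Coefficients for every (scale j, angle l) pair; the FFT is recomputed per wedge for clarity."""
    angles = np.linspace(0.0, np.pi, n_angles + 1)
    return {
        (j, l): wedge_coefficients(image, radii[j], radii[j + 1], angles[l], angles[l + 1])
        for j in range(len(radii) - 1)
        for l in range(n_angles)
    }
```

Because the hard windows tile the frequency plane without overlap, summing all coefficient arrays gives back the original image; the real transform obtains a comparable tight-frame property with smooth, overlapping windows, and the wrapping step is what reduces each array to an $L_{1,j} \times L_{2,j}$ rectangle.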
Figure 2.11 shows the Curvelet coefficients of a face image from the FERET database [PMRR00].
1. The low frequency (coarse scale) coefficients are stored at the center of the display.
2. The Cartesian concentric coronae show the coefficients at different scales; the outer coronae correspond to higher frequencies.
3. There are four strips associated with each corona, corresponding to the four cardinal points; these are further subdivided into angular panels.
4. Each panel represents the coefficients at a specified scale and along the orientation suggested by the position of the panel.
Figure 2.11: Example of Curvelet coefficients
2.3.4 Contourlet transform
So far, our discussion of Curvelets has been entirely for continuous signals. The Curvelet literature defines its concepts in this domain and digitizes them during implementation. The algorithms for digital implementation are quite involved. To overcome this, and also to combat the redundancy issues, Do and Vetterli conceived the Contourlet Transform (CT) [DV05] directly in the digital domain. The Contourlet keeps all the desirable properties of Curvelets, including directionality and anisotropy, and at the same time reduces the redundancy of the Curvelets. However, the Contourlet transform does not follow the nice properties of operator theory and hence cannot be easily analyzed. The Contourlet representation of an image is constructed by applying two successive decomposition stages. The first stage transforms the original image into a Laplacian pyramid (LP) having L + 1 scale levels. The second stage decomposes each LP scale level into D subbands through a directional filter bank (DFB) structure using quincunx filters. This two-stage decomposition works as follows: the Laplacian pyramid is first used to capture the point discontinuities, and is then followed by a directional filter bank that links point discontinuities into linear structures. The overall result is an image expansion using basic elements like contour segments, which are therefore named Contourlets. In particular, Contourlets have elongated supports at various scales, directions and aspect ratios. This allows Contourlets to efficiently approximate a smooth contour at multiple resolutions. In the frequency domain, the Contourlet transform provides a multiscale and directional decomposition. For more details, refer to [DV05].
As shown in Figure 2.13, the CT produces $2^{l_j}$ high frequency images at each level $j$ of the LP decomposition, where $l_j$ is the number of DFB decomposition levels applied at LP level $j$, so that the DFB yields $2^{l_j}$ directional subbands. For instance, $l_1 = 2$, $l_2 = 3$ and $l_3 = 3$ will produce 4, 8 and 8 high frequency images on the finer level, middle level and coarser level, respectively.
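The first stage of the Contourlet construction and the subband count above can be sketched as follows. The pyramid uses 2x2 block averaging and nearest-neighbour upsampling as stand-ins for the actual LP filters, and the directional filter bank itself is not implemented; only the number of subbands it would produce is computed. All function names are hypothetical.

```python
import numpy as np

def upsample2(a):
    """Nearest-neighbour upsampling by a factor of two in both directions."""
    return np.repeat(np.repeat(a, 2, axis=0), 2, axis=1)

def laplacian_pyramid(image, levels):
    """Return [band_1, ..., band_levels, coarse]; sides must be divisible by 2**levels."""
    bands, current = [], np.asarray(image, dtype=float)
    for _ in range(levels):
        h, w = current.shape
        coarse = current.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))   # 2x2 block average
        bands.append(current - upsample2(coarse))                          # band-pass residual
        current = coarse
    return bands + [current]

def reconstruct(pyramid):
    """Invert laplacian_pyramid by adding the residuals back, coarse to fine."""
    current = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        current = upsample2(current) + band
    return current

def directional_subbands(dfb_levels):
    """Number of directional subbands 2**l_j produced at each LP level."""
    return [2 ** l for l in dfb_levels]

counts = directional_subbands([2, 3, 3])   # the example from the text
```

Each band-pass residual of the pyramid is what a directional filter bank would subsequently split into its $2^{l_j}$ directional subbands.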
Figure 2.12: Contourlet tiling in the frequency domain
2.3.5 Comparative works
In [Maj07b], the author compared wavelets and Curvelets as feature sets for face recognition. It was found empirically that Curvelets were better than wavelets for face recognition. A similar comparative empirical study between Contourlets and wavelets [MM07] showed that Contourlets were not as good as wavelets for recognition.
A comprehensive empirical study comparing Curvelets, Contourlets and wavelets [MB] decomposed the face images into several resolutions, from fine to coarse, using the three transforms. At each resolution, the recognition accuracy of the three transforms was compared. An interesting observation was that the recognition accuracy of wavelets decreased as the scale became coarser, whereas the accuracy of Curvelets and Contourlets increased.
The studies [Maj07b, MM07] are not in conflict with [MB]. In [Maj07b, MM07] the Curvelet, wavelet and Contourlet transforms were taken at a satisfactory resolution. These studies showed that Curvelet coefficients are the best for face recognition, followed by wavelet coefficients and lastly Contourlet coefficients. However, [MB] revealed that at coarser resolutions both Curvelets and Contourlets show better recognition accuracy than wavelets.
Applications of Contourlets to face recognition have also been investigated by Boukabou et al. [BB08], who propose to employ Contourlets with PCA in order to extract discriminant features and to obtain higher recognition rates. They evaluated the proposed method on two different databases (the Yale and FERET face databases). The authors state that the Contourlet Transform outperforms the original PCA method. More experiments on large databases, as well as comparisons against well-established multi-resolution techniques, are needed to assess this conclusion.
Mandal et al. [MMW07] propose a Curvelet-based face recognition system that fuses the results of multiple SVM classifiers trained with Curvelet coefficients from images having different gray-scale resolutions (2, 4 and 8 bits). However, this algorithm is computationally expensive, since it requires taking the Curvelet Transform of the original image and of each of its quantized representations.
Mandal et al. [MWY09] introduce the application of the Digital Curvelet Transform in conjunction with different dimensionality reduction tools. This technique appears to be robust to changes in facial expression, as it shows good results on the Essex and ORL face databases. However, it still does not perform well on the Yale face database, which contains images with large variations in illumination and facial expression.
Based on the previous review of local approaches, which have proven to be more robust to most face recognition challenges than global approaches, we have proposed a new face recognition method based on a local representation of the Curvelet transform [AGH+09a]. First, the Curvelet transform is applied to the face image, and each of the resulting sub-bands is partitioned into a set of equally sized, non-overlapping blocks. Then statistical measures such as the mean, variance and entropy of the energy distribution of the Curvelet coefficients are computed for each sub-band at each decomposition level and used to characterize the texture. The feature vector of a face is then constructed by concatenating the block measures into one large feature vector. Finally, the City Block metric is used to classify the face feature vectors into person classes. The proposed method has been evaluated on the Yale, ORL and FERET databases and has shown better recognition accuracy.
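The block-based pipeline just described can be sketched as follows. This is an illustrative approximation under stated assumptions, not the implementation of [AGH+09a]: the sub-bands are taken as already computed (random arrays stand in for Curvelet coefficients here), the energy of a coefficient is assumed to be its squared magnitude, the entropy is assumed to be the Shannon entropy of the normalized energy within a block, and all function names and the block size are hypothetical.

```python
import numpy as np

def block_features(subband, block_size):
    """Mean, variance and entropy of the coefficient energy in each block."""
    h, w = subband.shape
    feats = []
    for r in range(0, h - block_size + 1, block_size):
        for c in range(0, w - block_size + 1, block_size):
            block = subband[r:r + block_size, c:c + block_size]
            energy = np.abs(block) ** 2          # assumed energy definition
            total = energy.sum()
            if total > 0:
                p = energy / total               # energy distribution in the block
                nz = p[p > 0]
                entropy = float(-np.sum(nz * np.log2(nz)))
            else:
                entropy = 0.0
            feats.extend([energy.mean(), energy.var(), entropy])
    return np.array(feats)

def face_descriptor(subbands, block_size=4):
    """Concatenate the block measures of every sub-band into one vector."""
    return np.concatenate([block_features(sb, block_size) for sb in subbands])

def classify(probe, gallery, labels):
    """Nearest neighbour under the city-block (L1) distance."""
    dists = np.abs(gallery - probe).sum(axis=1)
    return labels[int(np.argmin(dists))]

rng = np.random.default_rng(0)
# Stand-in "sub-bands" for two people; a real system would use Curvelet coefficients.
person_a = [rng.normal(0, 1, (16, 16)), rng.normal(0, 1, (8, 8))]
person_b = [rng.normal(0, 5, (16, 16)), rng.normal(0, 5, (8, 8))]
gallery = np.stack([face_descriptor(person_a), face_descriptor(person_b)])
labels = ["A", "B"]
# A slightly perturbed copy of person B plays the role of an unseen test image.
probe = face_descriptor([sb + rng.normal(0, 0.01, sb.shape) for sb in person_b])
print(classify(probe, gallery, labels))
```

Each 4x4 block contributes three measures, so the two stand-in sub-bands (16 and 4 blocks) give a descriptor of length 60.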
2.4
Performance Evaluation
2.4.1
Identification System
The performance of a face identification system is usually evaluated by its recognition rate, which is calculated by matching a set of test face images against those in the database. In the testing protocol, an algorithm is given two sets of images: the training set and the test set. The training set is given to the algorithm as the set of known facial images. The images in the test set are the unknown facial images to be identified. For each image $tr_i$ in the training set $TR$, the algorithm reports a similarity $s_i(k)$ between $tr_i$ and each image $ts_k$ in the test set $TS$.
The identification model allows one to ask how good an algorithm is at identifying a probe image; the question is not always "is the top match correct?" but "is the correct answer in the top n matches?". This lets one know how many images have to be examined to reach a desired level of performance. The performance statistics are reported as cumulative match scores, which are plotted on a graph. The horizontal axis of the graph is the rank and the vertical axis is the probability of identification (PI), that is, the percentage of correct matches.
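As an illustration of how cumulative match scores can be obtained from a similarity matrix, consider the following minimal sketch. It assumes that a higher similarity means a better match and that every image carries an identity label; the function name, the matrix layout (one row per probe, one column per known image) and the toy data are all hypothetical.

```python
import numpy as np

def cumulative_match_scores(similarity, probe_ids, gallery_ids):
    """Fraction of probes whose correct identity is within the top n matches.

    similarity[i, k] is the score between probe i and known image k
    (higher means more similar). Entry n-1 of the result is the
    probability of identification at rank n.
    """
    similarity = np.asarray(similarity, dtype=float)
    probe_ids = np.asarray(probe_ids)
    gallery_ids = np.asarray(gallery_ids)
    order = np.argsort(-similarity, axis=1)      # best match first
    ranked_ids = gallery_ids[order]              # identities in ranked order
    hits = ranked_ids == probe_ids[:, None]
    found = hits.any(axis=1)                     # is the identity known at all?
    first_hit = hits.argmax(axis=1)              # 0-based rank of first correct match
    n_known = similarity.shape[1]
    return np.array([np.mean(found & (first_hit < n))
                     for n in range(1, n_known + 1)])

similarity = [[0.9, 0.2, 0.1],   # probe "a": correct match ranked first
              [0.3, 0.4, 0.8],   # probe "b": correct match ranked second
              [0.6, 0.5, 0.1]]   # probe "c": correct match ranked third
cms = cumulative_match_scores(similarity, ["a", "b", "c"], ["a", "b", "c"])
print(cms)
```

Plotting `cms` against the ranks 1, 2, 3 gives the cumulative match curve; in this toy case one third of the probes are identified at rank 1, two thirds within rank 2, and all within rank 3.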