

4.4 Gabor Energy Filters

Compared with classical Gabor filters, the Gabor energy filter produces a smoother response to an edge or to a line of a given width, with a local maximum exactly at the middle of the line [76, 142] (see Figure 4.4). This filter is obtained by a “superposition” of phases. The most widely used variant combines, through the L2 norm, the two convolutions corresponding to the two phases (ϕ0 = 0) and (ϕ1 = π/2).
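As an illustration, this quadrature-pair combination can be sketched in a few lines of Python with OpenCV. The kernel parameters below (size, orientation, wavelength, bandwidth) are illustrative assumptions, not the values used in this work:

```python
import cv2
import numpy as np

def gabor_energy(image, ksize=21, sigma=4.0, theta=0.0, lambd=10.0, gamma=0.5):
    """Gabor energy: L2 combination of the quadrature pair (phases 0 and pi/2)."""
    # Even-symmetric kernel (phase 0) and odd-symmetric kernel (phase pi/2)
    k_even = cv2.getGaborKernel((ksize, ksize), sigma, theta, lambd, gamma, psi=0.0)
    k_odd = cv2.getGaborKernel((ksize, ksize), sigma, theta, lambd, gamma, psi=np.pi / 2)
    r_even = cv2.filter2D(image.astype(np.float32), cv2.CV_32F, k_even)
    r_odd = cv2.filter2D(image.astype(np.float32), cv2.CV_32F, k_odd)
    # L2 norm of the two phase responses: a smooth map with a local maximum
    # at the centre of a line or edge of matching width
    return np.sqrt(r_even ** 2 + r_odd ** 2)
```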

For more details about this type of filter, see the article presented in Appendix G.

(ARTICLE) ITERATIVE GABOR PHASE–BASED DISPARITY ESTIMATION FOR “PERSONALIZED” FACIAL ACTION RECOGNITION

This chapter presents an extension of two published articles (see Appendices D and E), as indicated by the bibliographic references [38, 42]:

M. Dahmane and J. Meunier. Enhanced phase–based displacement estimation – An application to facial feature extraction and tracking. In Proc. of Int. Conference on Computer Vision Theory and Applications, pages 427–433, 2008.

M. Dahmane and J. Meunier. Individual feature–appearance for facial action recognition. In Proc. of Int. Conference on Image Analysis and Recognition, pages 233–242, 2011.

The extended version was submitted for publication to the scientific journal Signal Processing: Image Communication by Mohamed Dahmane, Jean Meunier and Sylvie Cossette.

Abstract

Within the affective computing research field, researchers still face a major challenge in building automated systems that recognize human emotions from video sequences. Performance depends strongly on facial feature localization and tracking.

In this paper, we present a method based on a coarse–to–fine paradigm to characterize a set of facial fiducial points using a bank of Gabor filters. When the first face image is captured, the coarsest level is used to estimate a rough position for each facial feature. Afterward, a coarse–to–fine displacement refinement is performed on an image pyramid. The positions are then tracked over the subsequent frames using a modification of a fast Gabor phase–based technique, which redefines the confidence measure and introduces an iterative conditional disparity estimation procedure.
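As a rough illustration of the coarse–to–fine idea, the refinement loop can be sketched as follows. Here `estimate_displacement` is a hypothetical placeholder for the Gabor phase–based estimator, and the pyramid depth is an arbitrary choice; this is a generic sketch of the paradigm, not the paper's exact procedure:

```python
import cv2
import numpy as np

def refine_position(prev_image, image, p0, levels=3):
    """Coarse-to-fine refinement of a feature position on Gaussian pyramids."""
    pyr_prev, pyr_curr = [prev_image], [image]
    for _ in range(levels - 1):
        pyr_prev.append(cv2.pyrDown(pyr_prev[-1]))
        pyr_curr.append(cv2.pyrDown(pyr_curr[-1]))

    d = np.zeros(2, dtype=np.float32)      # accumulated displacement
    for lvl in range(levels - 1, -1, -1):  # coarsest -> finest
        p = p0 / (2.0 ** lvl) + d          # current position estimate at this level
        # hypothetical placeholder for the phase-based displacement estimator
        d = d + estimate_displacement(pyr_prev[lvl], pyr_curr[lvl], p)
        if lvl > 0:
            d = d * 2.0                    # rescale displacement to the finer level
    return p0 + d                          # refined full-resolution position
```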

We used the proposed tracking process to implement a “personalized” feature–based facial action recognition framework, motivated by the fact that the same facial expression may be displayed differently by different individuals.

Experimental results show that the facial feature points can be localized with high accuracy and tracked with sufficient precision, leading to better facial action recognition performance.

5.1 Introduction

The computer vision community is interested in developing techniques, such as automated facial expression analysis (AFEA), to decode the main element of facial human communication, in particular for human–computer interaction (HCI) applications or, with additional complexity, for meeting video analysis and, more recently, clinical research.

AFEA is highly sensitive to face tracking performance, a task made difficult mainly by environment changes, appearance variability under different head orientations, and the non–rigidity of the face. To meet these challenges, various techniques have been developed and applied. Prior work has focused on both still images and video sequences, and different approaches have been investigated, including feature–based and appearance–based techniques [57, 139]. Most of these techniques show satisfactory results on databases that were collected under non–realistic conditions [194]. An advanced emotion recognition system needs to deal with more natural behaviors in large volumes of un–segmented, un–prototypical, natural data [156]. Moreover, these methods still provide inaccurate results because facial expression varies across different people, and even for the same individual, since facial expression is context–dependent. Note that the notion of “universality”, as opposed to “personalization”, has been fashionable in the area of facial expression recognition [97].

However, it is advantageous to design a personalized model for facial emotion recognition, since the facial physiognomy that characterizes each person leads to a personal facial action display [56] (see Figure 5.1). This would explain why facial action units may be considered “facial behavior signatures” for recognizing individuals [26]. A great number of results on facial expression recognition have been reported in the literature over the last few decades. Nevertheless, the approach we report here resembles only a few of them.

Figure 5.1. Two sets of facial displays (surprise, sadness, anger, fear, disgust, happiness) from two different persons (subjects S037 and S035).

In [113], the authors employed, for each individual in the datasets, a person–dependent active appearance model (AAM) created for Action Unit (AU) recognition, using similarity–normalized shape and similarity–normalized appearance. By integrating user identification, the person–dependent method proposed in [22] performs better than conventional expression recognition systems, with a recognition rate reaching 98.9%. In reference [56], the authors obtained the best facial expression recognition results by fusing facial expression and face identity recognition. Their personalized facial expression recognition setup combines the outputs of convolutional neural networks trained either for facial expression recognition or for face identity recognition. In [80], the authors report recognition rates reaching 85% for familiar faces; in contrast, for unfamiliar faces, the performance does not exceed 65%. Their system uses elastic graph matching and a personalized gallery to recognize the expression of an identified face. More recently, the authors in [97] designed a “personalized” classifier using a neuro–fuzzy approach for “personalized” facial expression recognition. They reported a recognition rate of 91.8% on the Cohn–Kanade database (described below).

For both person–independent and person–dependent approaches, the facial expression recognition (FER) rate depends heavily on facial tracking performance. This task is made difficult by environment changes, appearance variability under different head orientations, and the non–rigidity of the face. Several approaches have been suggested to alleviate these problems; they can be divided into knowledge–, feature–, template–, and appearance–based approaches. Feature–based techniques demonstrate high concurrent validity with manual FACS (Facial Action Coding System) coding [27, 32]. Furthermore, they have some common advantages, such as an explicit face structure, practical implementation, and collaborative feature–wide error elimination [87]. However, tracking performance depends on the precise configuration of the local facial features, which poses a significant challenge to geometric–based facial expression analysis, since subtle changes in facial expression could be missed due to errors in facial point localization [137]. Although an effective facial feature point tracking scheme can compensate for this drawback, it may not be sufficient: feature–based approaches that include only shape information may be rather irrelevant [113]. Figure 5.2 shows an example of two different facial expressions (fear vs. happy) whose appearances are significantly different, while the two shapes have a high degree of similarity. Therefore, including appearance matching should improve the recognition rate, which can be done by including the local appearance around each facial feature [102].
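As a concrete example of such local appearance matching, a standard similarity measure in the Gabor–jet (elastic graph matching) literature is the normalized dot product of jet magnitudes. The sketch below is that generic measure, not necessarily the exact one used in this work:

```python
import numpy as np

def jet_similarity(jet_a, jet_b):
    """Magnitude-based similarity between two Gabor jets.

    A jet is the vector of complex responses of a Gabor filter bank
    sampled at a single image location.
    """
    mag_a, mag_b = np.abs(jet_a), np.abs(jet_b)
    # Normalized dot product: close to 1.0 for similar local appearance
    return float(mag_a @ mag_b /
                 (np.linalg.norm(mag_a) * np.linalg.norm(mag_b) + 1e-12))
```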

Figure 5.2. Facial point position may not be sufficient to achieve reliable FER (e.g. fear vs. happy).

In this paper, we propose a modified Gabor phase–based tracking approach used to track a set of facial key points with an iterative conditional displacement estimation algorithm (for simplicity, referred to throughout this work as ICDE). These points are then used to build a feature–based “personalized” facial action recognition system based on prestored facial action graphs. In the first frame of the video, four facial key points are automatically found and then tracked over time. At each frame, the most similar graph among the prestored ones is selected based on these key points. The selection uses a Procrustes transformation and a set of Gabor jets stored at each node of the graph.
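For background, the classical single–filter phase–based estimate, which an iterative scheme such as ICDE builds on, recovers a 1–D displacement as the wrapped phase difference between two Gabor responses divided by the filter's centre frequency. A minimal sketch of that textbook building block, with the sign convention left as an assumption:

```python
import numpy as np

def phase_shift_1d(resp_prev, resp_curr, omega):
    """Classical 1-D phase-based displacement estimate.

    resp_prev, resp_curr: complex Gabor responses at the same position in
    two consecutive frames; omega: the filter's angular centre frequency.
    """
    # Wrapped phase difference in (-pi, pi]
    dphi = np.angle(resp_curr * np.conj(resp_prev))
    # Displacement in pixels (sign depends on the filter convention)
    return dphi / omega
```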

In what follows, we describe the modified Gabor phase–based tracking approach and give details of the proposed ICDE algorithm in Section 5.2. In Section 5.3, we describe the facial action recognition process. Performance evaluation is presented in Section 5.4. Finally, in Section 5.5 we draw some conclusions.
