Learning an Object Detection Task | Human Face Detection

2. Presence or Absence of Common Structural Features. Face detection is also made difficult because certain common but significant features, such as glasses or a moustache, can either be present or totally absent from a face. Furthermore, these features, when present, can obscure other basic facial features (e.g., the glare in one's glasses may de-emphasize the darkness of one's eyes) and have a variable appearance themselves (e.g., glasses come in many different designs). All this adds more variability to the range of permissible face patterns that a comprehensive face detection system must handle.

Figure 2-1: (a): A "canonical" face pattern. (b): A 19 × 19 mask for eliminating near-boundary pixels of canonical face patterns. (c): The resulting "canonical" face pattern after applying the mask.

3. External Imaging Factors. Face detection can be further complicated by unpredictable imaging conditions in an unconstrained environment. Because faces are essentially 3-dimensional structures, a change in light source distribution, for instance, can cast or remove significant shadows from a particular face, hence bringing about even more variability to 2D images of face patterns.

Clearly, one of the most critical issues in face detection is to devise a reliable scheme that can accurately account for the wide range of permissible variations in face patterns.

2.2 Approach and System Overview

In this section, we outline our approach for detecting faces in images. Faces are a highly structured class of image patterns that can be detected by examining only local image information from within a spatially well-defined boundary. We present an overview of a face detection system, based on a technique that represents and detects image patterns using only local image measurements.

The detection paradigm works by testing candidate image locations for local patterns that appear like faces. At the heart of the paradigm is a classification procedure that determines whether or not a given local image pattern is a face. Our approach formulates the classification problem as learning to identify faces from annotated examples of face and non-face patterns. Here, we use learning methods to help capture complex non-face pattern variations that may otherwise be difficult to parameterize by classical programming techniques.

2.2.1 Detecting Faces by Local Pattern Matching

Figure 2-2: The system's task at each scale. The image is divided into many possibly overlapping windows. Each window pattern gets classified as either "a face" or "not a face", based on a set of local image measurements.

Human faces are a highly structured class of objects, with the same key features geometrically arranged in roughly the same fashion. One can treat human faces as a target class of spatially well-defined patterns with very stable boundaries in the image domain.

To detect faces, one can therefore define a fixed-shape, semantically stable "canonical" face notion in the image domain, and use a "template-like" matching paradigm to search for similar face-like patterns in an image. Figure 2-1(a) shows the canonical face structure used by our approach. It corresponds to a square portion of the human face whose upper boundary lies just above the eyes and whose lower boundary falls just below the mouth. The face detection task thus becomes one of appropriately representing the class of all such "face-like" image patches, and finding instances of these patterns in a scene.

The overall search paradigm for faces works as follows: We exhaustively scan an image for these "face-like" window patterns at all image locations over a range of scales. Figure 2-2 depicts the system's task at one fixed scale. The image is divided into multiple, possibly overlapping sub-images of the current window size. At each window, the system attempts to classify the enclosed image pattern as being either "a face" or "not a face". Each time a "matching" window pattern is found, the system reports a face at the window location, and also returns the scale as given by the current window size. We handle multiple scales by testing window patterns of different sizes for these "face-like" properties. The actual search procedure works by matching a pyramidal representation of the image with a fixed size "template". To detect faces at a larger scale than the "template" size, our system first resizes the input image by sub-sampling, so that the desired scale corresponds to the fixed "template" dimensions before searching through the image for matches.

Figure 2-3: The key components of the face pattern identification procedure in greater detail. The face pattern identification procedure classifies new patterns as "faces" or "non-faces". The algorithm uses a distribution-based model to represent the space of all possible canonical face patterns. For each new pattern to be classified, it computes a set of "difference" measurements between the new pattern and the canonical face model. A trained classifier identifies the new pattern as being either "a face" or "not a face", based on the set of "difference" measurements.
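The multi-scale search of Section 2.2.1 can be illustrated with a minimal Python sketch. It is not the system's actual implementation: the function names, the nearest-neighbour sub-sampling, the `scale_step` and `stride` parameters, and the `is_face` callback (a stand-in for the classification procedure) are all assumptions made for illustration. Only the fixed-size window and the idea of sub-sampling the image to reach larger scales come from the text.

```python
import numpy as np

TEMPLATE = 19  # side of the fixed-size window, assumed to match the 19 x 19 canonical pattern


def subsample(image, factor):
    """Shrink a 2-D grayscale image by `factor` (>= 1) with nearest-neighbour sampling."""
    h, w = image.shape
    new_h, new_w = int(h / factor), int(w / factor)
    rows = np.minimum((np.arange(new_h) * factor).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) * factor).astype(int), w - 1)
    return image[np.ix_(rows, cols)]


def scan_one_scale(image, is_face, stride=1):
    """Yield the top-left (row, col) of every window that `is_face` accepts."""
    h, w = image.shape
    for r in range(0, h - TEMPLATE + 1, stride):
        for c in range(0, w - TEMPLATE + 1, stride):
            if is_face(image[r:r + TEMPLATE, c:c + TEMPLATE]):
                yield r, c


def detect_faces(image, is_face, scale_step=1.2, stride=1):
    """Return (row, col, size) detections in original-image coordinates."""
    detections = []
    factor = 1.0
    level = image
    # Keep shrinking the image until the fixed template no longer fits.
    while min(level.shape) >= TEMPLATE:
        for r, c in scan_one_scale(level, is_face, stride):
            # Map the match back to the original image; the reported size
            # is the template size multiplied by the current shrink factor.
            detections.append(
                (int(r * factor), int(c * factor), int(round(TEMPLATE * factor)))
            )
        factor *= scale_step
        level = subsample(image, factor)
    return detections
```

Because the template size never changes, the classifier only ever sees windows of one size; a face twice as large as the template is found by scanning the image after it has been shrunk by a factor of two.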

2.2.2 The Face Classification Procedure

Clearly, the most critical and difficult part of our approach is the algorithm for identifying window patterns as "faces" or "non-faces". A good identification procedure must not only correctly label all valid face patterns as faces, but must also reject all background window patterns as non-faces. The task becomes especially complex if one has to deal with a wide variety of both faces and background patterns.
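The two requirements above can be made concrete as two separate rates measured on labeled window patterns. The sketch below is illustrative only; the function name and the boolean encoding (True for "a face") are assumptions, not part of the system described here.

```python
def detection_and_false_alarm_rates(predictions, labels):
    """Return (detection rate, false-alarm rate) for boolean sequences.

    predictions[i] is the label assigned to window i (True = "a face"),
    labels[i] is the annotated ground truth for the same window.
    """
    on_faces = [p for p, y in zip(predictions, labels) if y]
    on_background = [p for p, y in zip(predictions, labels) if not y]
    # Fraction of true face windows that were labeled as faces.
    detection_rate = sum(on_faces) / len(on_faces) if on_faces else float("nan")
    # Fraction of background windows that were wrongly labeled as faces.
    false_alarm_rate = (
        sum(on_background) / len(on_background) if on_background else float("nan")
    )
    return detection_rate, false_alarm_rate
```

A procedure that labels every window as "a face" achieves a perfect detection rate but also the worst possible false-alarm rate, which is why both quantities have to be considered together.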

The rest of this chapter focuses on the identification procedure, which makes up the crux of our detection scheme for spatially well-defined objects and pattern classes. Figure 2-3 shows the key components of the procedure. Basically, the approach is one of appropriately modeling the distribution of canonical face patterns in a reasonably chosen image feature space, and learning a functional mapping of input feature measurements to output classes from a representative set of "face" and "non-face" window patterns. More specifically, our approach works as follows: