
Seridi Hamid
Department of Computer Science, Université 8 mai 1945 Guelma, B. P. 401, Algeria
LabSTIC, Université 8 mai 1945 Guelma, B. P. 401, Guelma 24000, Algeria
rouabhia.h@gmail.com

Farou Brahim
Department of Computer Science, Université 8 mai 1945 Guelma, B. P. 401, Algeria
LabSTIC, Université 8 mai 1945 Guelma, B. P. 401, Guelma 24000, Algeria
farou@ymail.com

Seridi Hamid
Department of Computer Science, Université 8 mai 1945 Guelma, B. P. 401, Algeria
LabSTIC, Université 8 mai 1945 Guelma, B. P. 401, Guelma 24000, Algeria
seridihamid@yahoo.fr

Abstract— Despite the large number of promising existing approaches, obtaining good results in people detection remains difficult, especially in terms of processing time. In this paper, we present a real-time, accurate human detector for video surveillance systems that combines a background subtraction technique to extract candidate regions, the CENTRIST descriptor to encode contour information, and a linear SVM to decide whether a selected region contains a person or not. The method runs at 25 to 30 fps while achieving state-of-the-art detection accuracy.

Index Terms— People detection, CENTRIST descriptor, background subtraction.

I. INTRODUCTION

Detecting people in images and videos is one of the main tasks in many computer vision applications, including human-robot interaction, image and video content management, driver-assistance systems (which warn the driver when a person is on the road), and video surveillance, which is the topic of interest here.

One of the main goals of a video surveillance system is to monitor and identify people in public places such as airports and supermarkets, or in sensitive places such as banks and ministries, for security purposes. Detecting and identifying people is therefore a very important phase of such systems. At the same time, it is one of the most challenging problems, owing to variations in people's appearance and pose; these variations increase when combined with motion and clothing. The surrounding environment also plays an important role in appearance (changes in illumination and cluttered backgrounds), and in some cases the position and orientation of the camera cause considerable difficulties. When several people interact in the scene, occlusions may occur, which further increases the difficulty of detecting people.

The task of detecting people can be cast as a classification problem that consists of three basic phases. The first is the selection of candidate regions. The sliding window method [1] is the simplest method in the state of the art; to reduce the computational cost, a background subtraction technique is used instead when the input is video from a fixed camera [2].

The second phase is feature extraction. Many features have been used in the literature; the best known are wavelets [3], edgelets [4], shape context [5], shapelets [6], LBP [7], and HOG [8], the latter being characterized by its efficiency and robustness in representing the human body. An extended version of HOG using variable-size blocks is proposed in [9], together with a rejection cascade that addresses the computational cost of HOG. In [10], a combination of HOG and LBP is used.

Classification is the last phase of people detection: a decision is made as to whether the feature vector extracted from the candidate region represents a person or not. Most works in this field use variants of boosting [11][12] or SVMs [13][14] as the machine learning method.

Despite the large amount of research in this area, current algorithms either achieve good detection quality at a high computational cost, or the contrary.

In this paper, we build a real-time people detection system for video surveillance. The system combines quality and speed, since it uses the fast and discriminative CENTRIST descriptor [15] together with a background subtraction technique that reduces the computational cost of the candidate selection phase.

II. OVERVIEW OF THE METHOD

As illustrated in Fig. 1, the method is divided into three main processes: a background subtraction technique to select regions of interest (ROIs); the CENTRIST descriptor to describe each selected region; and an SVM to decide whether the selected region contains a person or not.

Fig. 1. Overview of the method: background subtraction, CENTRIST descriptor, SVM.
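Under the assumption that each stage is a callable component, one frame of this pipeline can be sketched as follows (a minimal sketch; the three callables are placeholders for the components detailed in the next sections):

```python
import numpy as np

def detect_people(frame, select_rois, describe, classify):
    """One frame of the three-stage pipeline of Fig. 1 (sketch).

    select_rois : frame -> list of (x, y, w, h) candidate boxes
    describe    : image patch -> feature vector (e.g. CENTRIST)
    classify    : feature vector -> True if the patch contains a person
    """
    detections = []
    for (x, y, w, h) in select_rois(frame):      # stage 1: background subtraction
        patch = frame[y:y + h, x:x + w]
        features = describe(patch)               # stage 2: CENTRIST descriptor
        if classify(features):                   # stage 3: linear SVM decision
            detections.append((x, y, w, h))
    return detections
```

Because only the candidate regions are described and classified, the per-frame cost grows with the number of foreground objects rather than with the number of sliding-window positions.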

III. USING BACKGROUND SUBTRACTION TO SELECT CANDIDATE REGIONS

The first step of our system, shown in Fig. 1, is the background subtraction technique, whose purpose is to localize regions of interest (ROIs) that may contain a person. We apply the algorithm from [16] to detect moving objects and produce the foreground image; then, for every detected object, a bounding box is extracted, as illustrated in Fig. 2a.

The bounding boxes extracted from the foreground image are then used to extract the candidate regions from the current frame, as illustrated in Fig. 2b.

The reason for using the algorithm in [16] is that it uses recursive equations to constantly update the parameters of a Gaussian mixture model [17] and automatically selects the needed number of components per pixel. These improvements make it faster and more accurate in segmentation than the original GMM [17], which is itself one of the best background subtraction algorithms.

Fig. 2. (a) Bounding box extraction, where a: frame from a video, b: foreground image, c: object extracted from the foreground, d: bounding box. (b) Candidate regions extracted from the current frame.

IV. CENTRIST DESCRIPTOR

HOG is the most popular and one of the best descriptors in the field of people detection, but its high computational cost prevents its use in video surveillance.

The CENTRIST descriptor [15] uses contour information, as HOG does, and encodes this information with the signs of comparisons among neighboring pixels. In addition, it does not need any image preprocessing or feature vector normalization, which makes it better than HOG in computational cost with almost the same accuracy [18].

CENTRIST stands for CENsus TRansform hISTogram. The census transform (CT) was developed in [19] for establishing correspondences between local patches. The CT value of a pixel is obtained by comparing its intensity with those of its neighbors using Eq. 1:

CT(P_i) = \sum_{n=0}^{7} S(P_i, P_{N_n}) \, 2^{n}, \qquad
S(P_i, P_{N_n}) = \begin{cases} 1, & P_i \ge P_{N_n} \\ 0, & \text{otherwise} \end{cases} \qquad (1)

where P_{N_n} (n = 0, 1, ..., 7) is the n-th neighbor of the pixel P_i, and P_i is the center pixel of a 3 x 3 scanning window.
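Eq. 1 can be sketched in NumPy as follows (border pixels are left at 0 here; the bit ordering of the neighbors and the border policy are illustrative choices, as the equation does not mandate them):

```python
import numpy as np

def census_transform(img):
    """Census transform (Eq. 1): each pixel is replaced by an 8-bit code
    built from comparisons with its 8 neighbors in a 3x3 window.
    Bit n is set to 1 when the center intensity is >= the n-th neighbor."""
    img = img.astype(np.int32)
    h, w = img.shape
    ct = np.zeros((h, w), dtype=np.uint8)
    # neighbor offsets, scanned top-left to bottom-right around the center
    offsets = [(-1, -1), (-1, 0), (-1, 1),
               (0, -1),           (0, 1),
               (1, -1),  (1, 0),  (1, 1)]
    centre = img[1:-1, 1:-1]
    for n, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        ct[1:-1, 1:-1] |= (centre >= neigh).astype(np.uint8) << n
    return ct
```

Because only comparisons and bit shifts are involved, the transform is very cheap to compute, which is the source of CENTRIST's speed advantage over HOG.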

The CENTRIST visual descriptor is a histogram of census transform values. An image is described as a histogram sequence by the following procedure:

 Compute the well-known Sobel gradient of the input image.

 Compute the census transform image from the Sobel result.

 Divide the census transform (CT) image into N x M rectangular regions of a given size h x w, and compute a histogram of CT values for each 2 x 2 group of adjacent regions.

 Concatenate the CT histograms of all regions to form the final feature vector used for training the SVM.

 In the testing phase we do not need to extract the feature vector explicitly, because feature extraction is seamlessly embedded into the classifier evaluation [18]; a patch is classified as containing a person if

\sum_{i=1}^{N-1} \sum_{j=1}^{M-1} \sum_{x=0}^{2h} \sum_{y=0}^{2w} W_{i,j}\!\left[ CT\big((i-1)h + x,\; (j-1)w + y\big) \right] > \theta \qquad (2)

where W encodes the linear SVM classifier weights and θ is a threshold.
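The histogram-building steps above can be sketched as follows (a simplified sketch: the Sobel preprocessing and the 2 x 2 super-block grouping of [18] are omitted, and the 4 x 3 block grid is an illustrative choice):

```python
import numpy as np

def centrist_descriptor(ct_img, n_blocks=(4, 3), bins=256):
    """Split a census transform image into n_blocks[0] x n_blocks[1]
    rectangular regions, histogram the 8-bit CT codes in each region,
    and concatenate the histograms into one feature vector."""
    h, w = ct_img.shape
    bh, bw = h // n_blocks[0], w // n_blocks[1]
    feats = []
    for i in range(n_blocks[0]):
        for j in range(n_blocks[1]):
            block = ct_img[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            hist, _ = np.histogram(block, bins=bins, range=(0, bins))
            feats.append(hist)
    return np.concatenate(feats).astype(np.float32)
```

The resulting vector is what the linear SVM weights W score in Eq. 2; evaluating W directly on CT codes, as in [18], gives the same decision without materializing the histograms.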

V. EXPERIMENTS

To evaluate our algorithm, we use the INRIA database, which contains 614 positive images and 1218 negative images, for training a linear SVM. We extract the persons from the positive images and randomly choose sample patches from the negative images. Both positive and negative patches are resized before being used, because the SVM classifier cannot handle patches of varying size.
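This training step can be sketched with scikit-learn's LinearSVC; the random vectors below are synthetic stand-ins for real CENTRIST vectors extracted from resized INRIA patches, so the data here is purely illustrative:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Stand-in training setup: X holds one feature vector per resized patch,
# y is 1 for person patches and 0 for background patches.
rng = np.random.default_rng(0)
X_pos = rng.normal(1.0, 0.5, size=(200, 64))    # placeholder positive features
X_neg = rng.normal(-1.0, 0.5, size=(200, 64))   # placeholder negative features
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 200 + [0] * 200)

clf = LinearSVC(C=1.0)
clf.fit(X, y)
# After fitting, clf.coef_ plays the role of the weight vector W in Eq. 2.
```

Since all patches are resized to a fixed shape, every feature vector has the same length, which is what the fixed-dimension SVM input requires.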

In the testing phase we use the CAVIAR database. The evaluation is based on two aspects: speed and detection accuracy.

Fig. 3. Time spent by the background subtraction and CENTRIST versus the number of objects per frame.

Figure 3 shows curves of the time spent by the CENTRIST descriptor and by the background subtraction technique as a function of the number of found objects (ROIs). The figure shows that the time of the background subtraction technique is not affected by the number of found objects, unlike the time of CENTRIST, which increases with the number of objects; however, the increase is only about 1 millisecond per object. Since the time consumed by CENTRIST is small compared to the background subtraction time, the total time is not much affected by the increasing number of objects, which allows the system to achieve 25 to 30 fps using only one processing core at 2.0 GHz.

To evaluate the detection accuracy, we use the precision-recall curve:

 3

Recall

umans

AnnotatedH

ves

TruePositi

 4

Precesion

ives

FalsePosit

ves

TruePositi

ves

TruePositi

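Eqs. 3 and 4 amount to the following computation (a trivial sketch; the counts themselves come from the bounding-box matching rule described next):

```python
def precision_recall(tp, fp, annotated):
    """Eqs. 3-4: recall is measured over all annotated humans,
    precision over all boxes the detector declared as human."""
    recall = tp / annotated if annotated else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return precision, recall
```

Sweeping the SVM threshold θ of Eq. 2 and recomputing these two values at each setting traces out the precision-recall curve of Fig. 4.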

A bounding box (BB) is counted as a true positive when it is classified as human by the algorithm and matches the ground truth. A BB that is classified as human but does not match any annotated human is counted as a false positive. A match is declared when the following inequality is satisfied [20]:

 5

BB

BB

BB

BB

dt gt dt gt

area

area

where BB_{gt} is the ground-truth bounding box and BB_{dt} is the detected bounding box.
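The matching test of Eq. 5 can be sketched as follows, with boxes given as (x, y, w, h) tuples; the 0.5 threshold follows the benchmark protocol of [20]:

```python
def iou(bb_gt, bb_dt):
    """Eq. 5 score: intersection-over-union of two axis-aligned boxes,
    each given as (x, y, w, h)."""
    x1 = max(bb_gt[0], bb_dt[0])
    y1 = max(bb_gt[1], bb_dt[1])
    x2 = min(bb_gt[0] + bb_gt[2], bb_dt[0] + bb_dt[2])
    y2 = min(bb_gt[1] + bb_gt[3], bb_dt[1] + bb_dt[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    union = bb_gt[2] * bb_gt[3] + bb_dt[2] * bb_dt[3] - inter
    return inter / union if union else 0.0

def matches(bb_gt, bb_dt, thresh=0.5):
    """True when the detection counts as a true positive under Eq. 5."""
    return iou(bb_gt, bb_dt) > thresh
```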

Fig. 4. Precision-recall curve.

From Figure 4, we see that recall is proportional to precision; the interpretation of this relationship is that the method does not produce many false positive detections, which leads to good accuracy. On the other hand, the method is limited to detecting moving people, because the background subtraction technique detects only moving objects, which produces false negative detections (people not detected).

VI. CONCLUSION

The main work in this paper consists of experimentally verifying the effectiveness and efficiency of the CENTRIST descriptor for people detection on the CAVIAR database, and of using the background subtraction technique to reduce the computational cost and improve the detection accuracy. In future work, we will use another method to select candidate regions, in order to handle people who are standing still.

REFERENCES

[1] C. Papageorgiou and T. Poggio, "A Trainable System for Object Detection," Int. J. Comput. Vis., vol. 38, no. 1, pp. 15-33, June 2000.

[2] O. Javed and M. Shah, "Tracking and Object Classification for Automated Surveillance," in Computer Vision — ECCV 2002, A. Heyden, G. Sparr, M. Nielsen, and P. Johansen, Eds. Springer Berlin Heidelberg, 2002, pp. 343-357.

[3] P. Viola, M. J. Jones, and D. Snow, "Detecting Pedestrians Using Patterns of Motion and Appearance," Int. J. Comput. Vis., vol. 63, no. 2, pp. 153-161, July 2005.

[4] B. Wu and R. Nevatia, "Detection of multiple, partially occluded humans in a single image by Bayesian combination of edgelet part detectors," in Tenth IEEE International Conference on Computer Vision (ICCV 2005), 2005, vol. 1, pp. 90-97.

[5] E. Seemann, B. Leibe, K. Mikolajczyk, and B. Schiele, "An evaluation of local shape-based features for pedestrian detection," in Proc. BMVC, 2005.

[6] P. Sabzmeydani and G. Mori, "Detecting Pedestrians by Learning Shapelet Features," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2007), 2007, pp. 1-8.

[7] Y. Mu, S. Yan, Y. Liu, T. Huang, and B. Zhou, "Discriminative local binary patterns for human detection in personal album," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008, pp. 1-8.

[8] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), 2005, vol. 1, pp. 886-893.

[9] Q. Zhu, M.-C. Yeh, K.-T. Cheng, and S. Avidan, "Fast Human Detection Using a Cascade of Histograms of Oriented Gradients," in 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2006, vol. 2, pp. 1491-1498.

[10] X. Wang, T. X. Han, and S. Yan, "An HOG-LBP human detector with partial occlusion handling," in 2009 IEEE 12th International Conference on Computer Vision, 2009, pp. 32-39.

[11] H.-X. Jia and Y.-J. Zhang, "Fast Human Detection by Boosting Histograms of Oriented Gradients," in Fourth International Conference on Image and Graphics (ICIG 2007), 2007, pp. 683-688.

[12] M. J. Jones and D. Snow, "Pedestrian detection using boosted features over many frames," in 19th International Conference on Pattern Recognition (ICPR 2008), 2008, pp. 1-4.

[13] S. Zhou, Q. Liu, J. Guo, and Y. Jiang, "ROI-HOG and LBP Based Human Detection via Shape Part-Templates Matching," in Neural Information Processing, T. Huang, Z. Zeng, C. Li, and C. S. Leung, Eds. Springer Berlin Heidelberg, 2012, pp. 109-115.

[14] H. Bin and Z. Chunxia, "An HOG-LGBPHS human detector with dimensionality reduction by PLS," Pattern Recognit. Image Anal., vol. 24, no. 1, pp. 36-40, January 2014.

[15] J. Wu and J. M. Rehg, "CENTRIST: A Visual Descriptor for Scene Categorization," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 8, pp. 1489-1501, August 2011.

[16] Z. Zivkovic and F. van der Heijden, "Efficient adaptive density estimation per image pixel for the task of background subtraction," Pattern Recognit. Lett., vol. 27, no. 7, pp. 773-780, May 2006.

[17] C. Stauffer and W. E. L. Grimson, "Adaptive background mixture models for real-time tracking," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 1999), 1999, vol. 2, pp. 246-252.

[18] J. Wu, C. Geyer, and J. M. Rehg, "Real-time human detection using contour cues," in 2011 IEEE International Conference on Robotics and Automation (ICRA), 2011, pp. 860-867.

[19] R. Zabih and J. Woodfill, "Non-parametric local transforms for computing visual correspondence," in Computer Vision — ECCV '94, J.-O. Eklundh, Ed. Springer Berlin Heidelberg, 1994, pp. 151-158.

[20] P. Dollar, C. Wojek, B. Schiele, and P. Perona, "Pedestrian detection: A benchmark," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2009), 2009, pp. 304-311.