
An Efficient Multi-view Based Activity Recognition System for Video Surveillance

Using Random Forest

J. Arunnehru and M.K. Geetha

Abstract Vision-based human activity recognition is an emerging field that has been actively studied in computer vision and artificial intelligence. However, human activity recognition in a multi-view environment is a challenging problem, because the appearance of a human activity varies dynamically with the camera viewpoint. This paper presents a novel and proficient framework for multi-view activity recognition based on the Maximum Intensity Block Code (MIBC) of successive frame differences. The experiments are carried out on the West Virginia University (WVU) multi-view activity dataset, and the extracted MIBC features are used to train a Random Forest classifier. The experimental results demonstrate the accuracy and effectiveness of the proposed method for multi-view human activity recognition, overcoming viewpoint dependency. The main contribution of this paper is the application of the Random Forest classifier to the problem of multi-view activity recognition in surveillance videos, based only on human motion.

Keywords Video surveillance · Human activity recognition · Frame difference · Motion analysis · Random forest

1 Introduction

In recent decades, the video surveillance system is an essential technology to ensure safety and security in both public and private areas like railway stations, airports, bus stands, banks, ATM, petrol/gas stations and commercial buildings due to

J. Arunnehru (✉) · M.K. Geetha

Speech and Vision Lab, Department of Computer Science and Engineering, Faculty of Engineering and Technology, Annamalai University, Annamalai Nagar, Chidambaram, Tamil Nadu, India

e-mail: arunnehru.aucse@gmail.com

M.K. Geetha

e-mail: geesiv@gmail.com

© Springer India 2015

L.C. Jain et al. (eds.),Computational Intelligence in Data Mining - Volume 2, Smart Innovation, Systems and Technologies 32, DOI 10.1007/978-81-322-2208-8_12


terrorist activity and other social problems. Human activity recognition has been studied extensively in computer vision and pattern recognition [1], where the aim is to automatically segment, track and recognize human activities in real time and, where possible, anticipate the ongoing activities.

Recognizing user activities means inferring a user's current activities from available data in the form of videos, which are a rich source of information. The automatic recognition of human activities opens up a number of interesting application areas, such as healthcare, gesture recognition, automatic visual surveillance, human-computer interaction (HCI), content-based retrieval and indexing, and robotics, using abstract information extracted from videos [2].

Actions performed by humans depend on many factors, such as posture, clothing, illumination changes, occlusion, shadows, changing backgrounds and camera movement. Different people carry out the same action in different ways, and even the same person performs it differently at different times. Owing to this variability in viewpoint, human body size, appearance and shape, and to the sophistication of human actions, the task of automatically recognizing activity is very challenging [3].

1.1 Outline of the Work

This paper addresses multi-view activity recognition, which aims to distinguish human activities in video sequences. The proposed method is evaluated on the West Virginia University (WVU) multi-view activity dataset [4], considering the actions clapping, waving one hand, waving two hands, punching, jogging, jumping jack, kicking, picking, throwing and bowling. A difference image is obtained by subtracting successive frames, and the motion information it contains defines the Region of Interest (ROI). The ROI is separated into two blocks, B1 and B2, and the Maximum Intensity Block Code (MIBC) is extracted as a feature from the block showing maximum motion. The extracted feature is fed to a Random Forest (RF) classifier.

The rest of the paper is structured as follows. Section 2 reviews related work. Section 3 illustrates feature extraction and related discussions. Section 4 explains the workflow of the proposed approach. Experimental results and the WVU multi-view activity dataset are described in Sect. 5, and finally Sect. 6 concludes the paper.

2 Related Work

Human action recognition in view-invariant scenes is still an open problem. Several excellent surveys have illustrated various applications related to human action analysis, representation, segmentation and recognition [1–3, 5, 6]. Motion is an important cue that has been widely applied in fields such as visual surveillance and artificial intelligence [7]. In [8], a space-time shape representation based on 2D human silhouettes is presented, in which spatial information about the pose and shape descriptors is used to recognize human actions. The temporal templates called Motion History Image (MHI) and Motion Energy Image (MEI) were used to represent actions [9], and classification was performed with an SVM classifier [10]. In [11], automatic activity recognition is carried out based on motion patterns extracted from the temporal difference image over a Region of Interest (ROI), with experiments on the KTH and Weizmann datasets. In [12], a Motion Intensity Code (MIC) is proposed to recognize human actions from motion information; PCA and an SVM were used to learn and classify actions from the KTH and Weizmann datasets, yielding good performance. In [4], Local Binary Patterns (LBP) on Three Orthogonal Planes (TOP) are used to represent human movements with dynamic texture features, and temporal templates are used to classify the observed movement. In [13], a combined approach of spatio-temporal and optical-flow motion features is proposed, which uses an improved Dynamic Time Warping (DTW) method to enhance the temporal consistency of motion information, then applies coarse-to-fine DTW on motion-feature pyramids to distinguish human actions and speed up detection. In [14], a 2D binary silhouette feature descriptor is combined with a Random Forest (RF) classifier, a voting strategy is used to recognize each activity, and multi-camera fusion is evaluated by concatenating the feature descriptors for multi-view human activity recognition. In [15], a combination of optical-flow and edge features is used to model each action, and a boosting method is adopted for efficient action recognition. In [16], Maximum Spatio-Temporal Dissimilarity Embedding (MSTDE) based on human silhouette sequences is proposed, with recognition results obtained by similarity matching across action classes.

3 Feature Extraction

The extraction of discriminative features is the most fundamental and important problem in activity recognition; the features capture the significant information that is vital for further analysis. The subsequent sections describe the feature extraction method used in this work.

3.1 Frame Difference

To detect moving objects across a sequence of image frames, the current frame is subtracted from either the previous or the successive frame of the image sequence, a method known as temporal differencing. The difference images are thresholded to eliminate pixel changes due to camera noise, changes in lighting conditions, etc. This method is highly adaptive for detecting movable regions corresponding to moving objects in dynamic scenes and is well suited to extracting significant feature pixels. The difference image is obtained by subtracting the frame at time $t$ from the frame at time $t+1$ on a pixel-by-pixel basis.

$$D_s(x, y) = \lvert P_s(x, y) - P_{s+1}(x, y) \rvert, \quad 1 \le x \le w,\ 1 \le y \le h \qquad (1)$$

The extracted motion pattern is considered as the Region of Interest (ROI). Figure 1a, b illustrates two successive frames of the WVU multi-view activity dataset; the resulting difference image is shown in Fig. 1c. Here $D_s(x, y)$ is the difference image, $P_s(x, y)$ is the pixel intensity at $(x, y)$ in the $s$th frame, and $h$ and $w$ are the height and width of the image, respectively. The motion information $T_s$, the binarized difference image, is computed using

$$T_s(x, y) = \begin{cases} 1, & \text{if } D_s(x, y) > t \\ 0, & \text{otherwise,} \end{cases} \qquad (2)$$

where $t$ is the threshold.
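As a minimal sketch, Eqs. (1)–(2) can be implemented as follows; the frames are assumed to be grayscale NumPy arrays, and the threshold value `t=30` is an illustrative assumption, since the paper does not specify a numeric value:

```python
import numpy as np

def frame_difference(prev_frame, curr_frame, t=30):
    """Temporal differencing per Eqs. (1)-(2): absolute difference of
    successive grayscale frames, thresholded into a binary motion mask."""
    # Eq. (1): D_s(x, y) = |P_s(x, y) - P_{s+1}(x, y)|
    # (cast to a signed type so the subtraction cannot wrap around)
    diff = np.abs(prev_frame.astype(np.int16) - curr_frame.astype(np.int16))
    # Eq. (2): T_s(x, y) = 1 if D_s(x, y) > t, else 0
    return (diff > t).astype(np.uint8)

# Toy example: a bright 4x4 patch "moves" by two pixels between frames
f1 = np.zeros((64, 64), dtype=np.uint8)
f2 = np.zeros((64, 64), dtype=np.uint8)
f1[10:14, 10:14] = 200
f2[12:16, 12:16] = 200
mask = frame_difference(f1, f2, t=30)
print(mask.sum())  # number of changed pixels (the symmetric difference)
```

Only the pixels covered by exactly one of the two patch positions survive the threshold; the overlap region cancels out in the difference image.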

3.2 Maximum Intensity Block Code (MIBC)

To recognize the activity performed in a multi-view setting, motion is a significant cue normally extracted from video. Accordingly, the Maximum Intensity Block Code (MIBC) is extracted from the motion information of the activity video sequences. The method for extracting this feature is explained in this section.

Figure 2 shows the motion information extracted for the kicking, jogging and bowling activities from the WVU multi-view activity dataset. Initially the ROI is identified and taken as the motion region, as seen in Fig. 3a. The extracted ROIs are not of fixed size. The ROI is separated into two blocks, B1 and B2, comprising the upper and lower parts of the body, shown in Fig. 3b for the kicking activity captured by the View 3 (V3) camera. In order to reduce the amount of computation, only the block exhibiting the maximum motion is considered for further analysis. In Fig. 3b, block B2 shows the maximum motion, so B2 alone is considered for MIBC extraction, as shown in Fig. 3c. The block under consideration is equally divided into 3 × 6 sub-blocks. The average intensity of each of the 3 × 6 regions is computed, yielding an 18-dimensional feature vector extracted from B2, which is fed to the Random Forest (RF) classifier for further processing.

Fig. 1 (a, b) Two successive frames. (c) Difference image of (a) and (b) from the WVU multi-view activity dataset

Fig. 2 Sample sequences of extracted ROIs from the WVU activity dataset. (a) Kicking. (b) Jogging. (c) Bowling

Fig. 3 (a) ROI of the motion pattern. (b) ROI divided into B1 and B2 blocks. (c) B2 divided into 3 × 6 sub-blocks
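The MIBC extraction described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the ROI cropping via a bounding box, the midpoint split into B1/B2, the tie-breaking rule, and the random toy data standing in for the WVU sequences are all assumptions made here for a runnable example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def extract_roi(motion_mask):
    """Crop the binary motion mask to the bounding box of its
    non-zero pixels (the Region of Interest)."""
    ys, xs = np.nonzero(motion_mask)
    return motion_mask[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

def mibc_features(roi, grid=(3, 6)):
    """Maximum Intensity Block Code: split the ROI into upper (B1) and
    lower (B2) halves, keep the half containing more motion, divide it
    into a 3x6 grid, and return the 18 per-cell average intensities."""
    h = roi.shape[0] // 2
    b1, b2 = roi[:h], roi[h:]
    block = b1 if b1.sum() >= b2.sum() else b2   # max-motion block
    rows, cols = grid
    row_groups = np.array_split(np.arange(block.shape[0]), rows)
    col_groups = np.array_split(np.arange(block.shape[1]), cols)
    feats = [block[np.ix_(r, c)].mean()
             for r in row_groups for c in col_groups]
    return np.array(feats)  # 18-dimensional feature vector

# Toy usage: random binary motion masks stand in for real WVU data,
# with two dummy activity labels
rng = np.random.default_rng(0)
X = np.array([mibc_features(extract_roi(rng.integers(0, 2, (60, 40))))
              for _ in range(40)])
y = rng.integers(0, 2, 40)
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(X.shape)  # (40, 18)
```

The 18-dimensional vectors produced by `mibc_features` play the role of the training features fed to the RF classifier in the paper; with real data, `extract_roi` would operate on the thresholded difference images of Eq. (2).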