

6.2 Automatic Video Monitoring Goals

In this work, we are interested in the accurate recognition of both simple and complex video activities. For simple activities, we aim to:

• Determine whether one or several individuals are present in the environment.

• Determine the location of each person (e.g. a person close to the TV).


• Recognize body postures such as standing, bending and sitting.

For complex activities, we aim to:

• Recognize how a person interacts with the environment (e.g. using the kettle, sitting on a chair).

• Recognize activities of daily living (ADLs) such as eating, dressing and managing medication, in order to monitor people with dementia and in particular Alzheimer's patients.

6.3 Implementation

The algorithms proposed in this thesis are implemented in SUP1 (Scene Understanding Platform), developed by the STARS team. The SUP platform is written in C++. It provides algorithms for video interpretation systems (e.g. image acquisition, image segmentation, object tracking). All experiments presented in this chapter have been performed on a machine with an Intel(R) Xeon(R) CPU E5430 @ 2.66GHz (4 cores) and 4GB of RAM. In the following, we detail the video analysis algorithms used during our experimentation.

6.3.1 Video Analysis Algorithms

Video analysis aims at segmenting, detecting and tracking people moving in the scene. To achieve this task, we have used a set of vision algorithms from the video interpretation platform SUP (Figure 6.1).

The first task is the segmentation task, which is applied to each incoming image to detect motion in the scene, obtaining a set of moving regions represented by the bounding boxes enclosing them (called blobs). The algorithm segments moving pixels in the video into a binary image by subtracting the current image from the reference image. The reference image is updated over time to take into account changes in the scene (e.g. light, object displacement, shadows).
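
To illustrate, a minimal sketch of this segmentation step is given below. It assumes OpenCV and keeps the reference image as a running average; the threshold, update rate and minimum blob area are illustrative values, and this is not the actual SUP segmentation code.

    // Minimal background-subtraction sketch (not the SUP implementation).
    // The reference image is kept as a running average and updated over time.
    #include <opencv2/opencv.hpp>
    #include <vector>

    std::vector<cv::Rect> detectBlobs(const cv::Mat& frame, cv::Mat& reference,
                                      double diffThreshold = 30.0,
                                      double updateRate = 0.01) {
        cv::Mat gray, refGray, diff, mask;
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);

        if (reference.empty())                      // first frame: bootstrap the reference image
            gray.convertTo(reference, CV_32F);

        reference.convertTo(refGray, CV_8U);
        cv::absdiff(gray, refGray, diff);           // |current image - reference image|
        cv::threshold(diff, mask, diffThreshold, 255, cv::THRESH_BINARY);

        // Update the reference image over time (lighting changes, displaced objects).
        cv::accumulateWeighted(gray, reference, updateRate);

        // Group foreground pixels into moving regions and return their bounding boxes (blobs).
        std::vector<std::vector<cv::Point>> contours;
        cv::findContours(mask, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

        std::vector<cv::Rect> blobs;
        for (const auto& c : contours)
            if (cv::contourArea(c) > 50.0)          // discard tiny noise regions
                blobs.push_back(cv::boundingRect(c));
        return blobs;
    }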

A calibration task is used to compute the transformation from a 2D image reference point to a 3D scene reference point (Figure 6.2). The Keeneo2 calibration tool is used to obtain the calibration matrix, which is then used to compute the 3D position of mobile objects.

The 3D position of the mobile object is computed from the detected blob and the calibration matrix associated with the video camera, by considering that the bottom of the 3D mobile object lies at floor level. When the blob representing the person is not completely visible (i.e. occluded by a specified contextual object), the person is assumed to be just behind the object.
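
This computation can be sketched as follows, under the assumption that the calibration step yields a 3x3 homography H mapping image coordinates to floor-plane coordinates; the function name and the Z = 0 convention are illustrative and not those of the Keeneo tool.

    // Illustrative sketch: 3D ground position of a detected blob, assuming a
    // 3x3 homography H (CV_64F) from image coordinates to floor-plane coordinates.
    #include <opencv2/opencv.hpp>

    cv::Point3d groundPosition(const cv::Rect& blob, const cv::Mat& H) {
        // The bottom centre of the bounding box is taken as the contact point with the floor.
        double u = blob.x + blob.width / 2.0;
        double v = blob.y + blob.height;

        cv::Mat p = (cv::Mat_<double>(3, 1) << u, v, 1.0);
        cv::Mat q = H * p;                          // homogeneous floor-plane point
        double X = q.at<double>(0) / q.at<double>(2);
        double Y = q.at<double>(1) / q.at<double>(2);
        return cv::Point3d(X, Y, 0.0);              // Z = 0: the point lies on the floor
    }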

1. https://team.inria.fr/stars/software/platform

2. http://www.genetec.com/Publications/technicalnotes/Pages/keeneo-analytics-en.aspx

Figure 6.1: Vision algorithms of the SUP platform.

A blob merging task assembles small 2D blobs into larger ones in order to improve the classification task.

The classification task uses the obtained 2D blobs, the calibration matrix of the camera and predefined 3D models to classify the mobile blobs. It adds a class label to each moving blob (e.g. person, vehicle). A set of 3D features, such as 3D position, width and height, is computed for each blob. A unique identifier is associated with each newly classified blob by the frame-to-frame tracker. A long-term tracker maintains this identifier throughout the whole video.
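
A simplified view of this classification step is sketched below; the 3D feature thresholds are purely illustrative and do not correspond to the actual 3D models used in SUP.

    // Sketch of the classification idea: 3D features computed for each blob are
    // matched against predefined 3D model ranges (threshold values are illustrative).
    #include <string>

    struct Blob3D {
        double width3d;    // metres, computed from the blob and the calibration matrix
        double height3d;   // metres
    };

    std::string classifyBlob(const Blob3D& b) {
        if (b.height3d > 1.2 && b.height3d < 2.2 && b.width3d < 1.0)
            return "person";
        if (b.width3d > 1.5 && b.height3d < 2.0)
            return "vehicle";
        return "unknown";
    }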

6.3.1.1 Background Subtraction Algorithm

The algorithm first constructs a background representation for each pixel as described in [NGHIEM, 2010]. Then, to detect foreground pixels in the current frame, the algorithm compares the current frame with its background representation. The pixels in the current frame which differ from the background representation are classified as foreground pixels. The background representation is updated regularly after each incoming frame.
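
Seen at the pixel level, the idea can be sketched as follows, using a single running mean per pixel; this is a deliberate simplification of the richer background representation of [NGHIEM, 2010].

    // Per-pixel sketch of background representation, comparison and update.
    #include <cmath>
    #include <cstdint>

    struct PixelModel { float mean = 0.f; };       // background representation of one pixel

    // Classify one pixel of the current frame and update its background representation.
    bool isForeground(PixelModel& m, std::uint8_t value, float threshold, float alpha) {
        bool foreground = std::fabs(value - m.mean) > threshold;
        if (!foreground)                           // update only with background observations
            m.mean = (1.f - alpha) * m.mean + alpha * value;
        return foreground;
    }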

When the scene conditions change, the background subtraction algorithm has to adapt itself to the new conditions. However, working only at the pixel level, it is difficult for the background subtraction algorithm to achieve this adaptation on its own. In this algorithm, a controller is therefore used to help the background subtraction algorithm adapt to the current scene conditions. To do this, the controller uses the feedback from the classification task together with information about the background subtraction algorithm and the scene. With this controller, the object detection framework works as follows:

- The framework takes as input a video sequence from a single, fixed camera. From this video sequence, for each frame, the background subtraction algorithm produces a list of potential foreground pixels.

Figure 6.2: 3D geometrical information (empty scene, calibration matrix) computed from the Keeneo calibration tool.

- The shadow and highlight removal algorithm receives this list and removes from it the pixels corresponding to shadows or highlights. The results are sent to the blob construction task and to the controller.

- The blob construction task constructs the blobs from the foreground pixels, then sends the blobs to the blob classification task.

- The blob classification task classifies these blobs and sends the list of blobs, together with their types, to the higher-level tasks and to the controller.
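
The wiring of these steps and of the controller feedback loop can be summarised by the structural sketch below; the class and function names are illustrative placeholders with stub bodies, not the actual SUP interfaces.

    // Structural sketch of the detection framework with its controller feedback loop.
    #include <cstdint>
    #include <string>
    #include <vector>

    struct Pixel { int x, y; };
    struct Blob { int x, y, w, h; };
    struct ClassifiedBlob { Blob blob; std::string type; };

    struct BackgroundSubtractor {
        double threshold = 30.0;                    // parameter retuned by the controller
        std::vector<Pixel> detect(const std::vector<std::uint8_t>& frame) { return {}; }
    };

    std::vector<Pixel> removeShadowsAndHighlights(const std::vector<Pixel>& fg) { return fg; }
    std::vector<Blob> buildBlobs(const std::vector<Pixel>& fg) { return {}; }
    std::vector<ClassifiedBlob> classify(const std::vector<Blob>& blobs) { return {}; }

    struct Controller {
        // Uses classification feedback plus knowledge of the scene and of the
        // background subtraction algorithm to adapt its parameters.
        void observe(const std::vector<Pixel>& fg,
                     const std::vector<ClassifiedBlob>& result,
                     BackgroundSubtractor& bg) { /* e.g. adjust bg.threshold */ }
    };

    void processFrame(const std::vector<std::uint8_t>& frame,
                      BackgroundSubtractor& bg, Controller& controller) {
        auto foreground = bg.detect(frame);                        // potential foreground pixels
        auto filtered = removeShadowsAndHighlights(foreground);    // drop shadow/highlight pixels
        auto blobs = buildBlobs(filtered);                         // blob construction
        auto labelled = classify(blobs);                           // blob classification
        controller.observe(filtered, labelled, bg);                // feedback to the controller
    }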

6.3.1.2 Tracking Algorithm

The tracking algorithm tracks mobile objects based on their trajectory properties, as described in [Chau, 2012]. The tracker includes two stages: tracking and global tracking.

The tracking stage follows the steps of a Kalman filter, including estimation, measurement and correction. First, for each tracked object, its state, including position and 2D bounding box, is estimated by a Kalman filter. Second, in the measurement step, this tracked object searches for the best matching object based on four descriptors: 2D position, 2D area, 2D shape ratio and color histogram. Third, the best matching object and its estimated state are combined to update the position and 2D bounding box size of the tracked object. However, the mobile object trajectories are usually fragmented because of occlusions and misdetections. Therefore, the global tracking stage aims at fusing the fragmented trajectories belonging to the same mobile object and removing the noisy trajectories. The advantages of this algorithm [Chau, 2012] over the existing state-of-the-art ones are: (1) no prior knowledge is required (e.g. no calibration and no contextual models are needed), (2) the tracker is effective in different scene conditions: single/several mobile objects, weak/strong illumination, indoor/outdoor scenes, (3) a global tracking stage is defined to improve the object tracking performance.
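
The structure of one step of this tracking stage can be sketched as follows, assuming OpenCV's cv::KalmanFilter with a constant-velocity model; the similarity score combining the four descriptors is a simple illustrative heuristic, not the exact weighting of [Chau, 2012].

    // Sketch of one tracking step: estimation, measurement (descriptor matching), correction.
    #include <opencv2/opencv.hpp>
    #include <algorithm>
    #include <cmath>
    #include <vector>

    struct Detection {
        cv::Rect2f box;                 // 2D bounding box detected in the current frame
        cv::Mat histogram;              // color histogram descriptor
    };

    struct TrackedObject {
        cv::KalmanFilter kf{4, 2, 0};   // state (x, y, vx, vy), measurement (x, y)
        cv::Rect2f box;
        cv::Mat histogram;
    };

    void initKalman(TrackedObject& t) {
        // Constant-velocity model: x' = x + vx, y' = y + vy.
        t.kf.transitionMatrix = (cv::Mat_<float>(4, 4) <<
            1, 0, 1, 0,
            0, 1, 0, 1,
            0, 0, 1, 0,
            0, 0, 0, 1);
        cv::setIdentity(t.kf.measurementMatrix);                       // we observe (x, y)
        cv::setIdentity(t.kf.processNoiseCov, cv::Scalar::all(1e-2));
        cv::setIdentity(t.kf.measurementNoiseCov, cv::Scalar::all(1e-1));
    }

    // Illustrative similarity combining 2D position, 2D area, 2D shape ratio and color histogram.
    double similarity(const TrackedObject& t, const Detection& d) {
        double dPos = std::hypot(t.box.x - d.box.x, t.box.y - d.box.y);
        double dArea = std::fabs(t.box.area() - d.box.area())
                       / std::max<double>(t.box.area(), d.box.area());
        double dShape = std::fabs(t.box.width / t.box.height - d.box.width / d.box.height);
        double dColor = cv::compareHist(t.histogram, d.histogram, cv::HISTCMP_BHATTACHARYYA);
        return 1.0 / (1.0 + dPos) * (1.0 - dArea) * (1.0 - dShape) * (1.0 - dColor);
    }

    void trackOneStep(TrackedObject& t, const std::vector<Detection>& detections) {
        t.kf.predict();                                                // estimation step
        const Detection* best = nullptr;
        double bestScore = 0.0;
        for (const auto& d : detections) {                             // measurement step
            double s = similarity(t, d);
            if (s > bestScore) { bestScore = s; best = &d; }
        }
        if (best) {                                                    // correction step
            cv::Mat meas = (cv::Mat_<float>(2, 1) << best->box.x, best->box.y);
            t.kf.correct(meas);
            t.box = best->box;
            t.histogram = best->histogram;
        }
    }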

6.3.1.3 Posture Recognition

We have used the posture recognition algorithm proposed in [Boulay et al., 2006] in order to recognize, in real time, a set of human postures from any position of the camera and for any orientation of the person.

The goal of the human posture recognition approach is to provide accurate information about the people observed in the scene. Human posture recognition is a difficult and challenging problem due to the huge number of possible cases. The number of postures depends on the degrees of freedom of the human body. Moreover, the morphology of the person influences the appearance of the same posture. Finally, clothes can give different appearances to the same posture. The posture recognition task combines 2D techniques and 3D posture models to generate silhouettes. The generated silhouettes are then compared with the silhouettes of the detected objects moving in the scene, based on 2D techniques, to determine the posture. The algorithm takes as input the contextual base to generate a virtual camera (using camera calibration information) and uses the information given by the detection task to generate silhouettes (Figure 6.3).
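
A possible silhouette comparison step is sketched below; the overlap measure and the structure names are illustrative only, whereas [Boulay et al., 2006] relies on several dedicated 2D comparison techniques.

    // Illustrative silhouette comparison: each candidate posture silhouette,
    // generated from a 3D posture model through the virtual camera, is compared
    // with the detected silhouette using a simple overlap ratio.
    #include <opencv2/opencv.hpp>
    #include <string>
    #include <vector>

    struct PostureModel {
        std::string name;       // e.g. "standing", "sitting", "bending"
        cv::Mat silhouette;     // binary mask projected through the virtual camera
    };

    double silhouetteOverlap(const cv::Mat& a, const cv::Mat& b) {
        cv::Mat inter, uni;
        cv::bitwise_and(a, b, inter);
        cv::bitwise_or(a, b, uni);
        double u = cv::countNonZero(uni);
        return u > 0.0 ? cv::countNonZero(inter) / u : 0.0;
    }

    std::string recognizePosture(const cv::Mat& detectedSilhouette,
                                 const std::vector<PostureModel>& candidates) {
        std::string best = "unknown";
        double bestScore = 0.0;
        for (const auto& c : candidates) {
            double s = silhouetteOverlap(detectedSilhouette, c.silhouette);
            if (s > bestScore) { bestScore = s; best = c.name; }
        }
        return best;
    }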

Figure 6.3: Posture recognition approach [Boulay et al., 2006].
