• Aucun résultat trouvé

Rejection-Cascade of Gaussians: Real-time adaptive background subtraction framework

N/A
N/A
Protected

Academic year: 2021

Partager "Rejection-Cascade of Gaussians: Real-time adaptive background subtraction framework"

Copied!
2
0
0

Texte intégral

(1)

HAL Id: hal-02568713

https://hal.archives-ouvertes.fr/hal-02568713

Submitted on 10 May 2020

HAL is a multi-disciplinary open access

archive for the deposit and dissemination of sci-entific research documents, whether they are pub-lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Rejection-Cascade of Gaussians: Real-time adaptive background subtraction framework

B Kiran, Aravind Das, Senthil Yogamani

To cite this version:

(2)

Rejection-Cascade of Gaussians: Real-time

adaptive background subtraction framework

B Ravi Kiran

1

, Arindam Das

2

, Senthil Yogamani

3

ravi.kiran@navya.tech, {arindam.das,senthil.yogamani}@valeo.com

Navya, Paris, France

1

, Detection Vision Systems, Valeo India

2

, Valeo Vision Systems, Galway, Ireland

3

x

Problem Definition & Model

I Background Subtraction :

. Inputs : Video stream containing static and dynamic backgrounds

. Output : Binary classification problem per pixel b/w

foreground/background classes.

. Model : Gaussian Mixture Models (GMM) are parametric models used to

estimate the background class at each pixel of the input image.

. Contribution : Decomposition of GMM into Adaptive Rejection Cascade

of binary classifiers using strong prior information. The classifiers are ordered by the negative class rejection rate following the Viola-Jones rejection cascade, as well as increasing computational complexity.

I Cascade of Gaussians (CoG) Framework

. CoG constitutes of k+1 binary classifiers :

I Consistent Hypothesis Propagation (CHP) classifier : Propagates

previous time’s class (FG/BG) if the value has not changed.

I 1st dominant Gaussian ω0.η(µ0, σ0)

I 2nd dominant Gaussian ω1.η(µ1, σ1)

I kth dominant Gaussian ωk.η(µk, σk)

Figure 1:Right : Elements of CoG : CHP, first and second modes of gaussians and spatio-temporal window of a Cascade of Gaussians. Left : Different dynamics of a pixel : Dynamic Pixel Vs Oscillation Vs Pixel Drift.

CoG : Online parameter update

1. Get N frames & estimate pixel-wise µ(t), σ(t), ω(t)

2. Form matrix whose rows are adapted variance and ranked weight

observations, while columns are variables V and R,

V (tk, i ) = I (tk), k = 1 : N

3. Obtain covariance matrices Rcov = Cov (R), Vcov = Cov (V )

4. Perform K-means clustering with K=3 (for temporal pixel residue due to

dynamic, oscillating, or drifting BG).

5. Threshold for pixels within 0.7 − 0.5σ

6. Calculate the KDE of given cluster & the joint occurrence distribution and

associated weight ω1, µ1 and σ1

Computational analysis of Rejection-Cascade of Gaussians

I Average Speedup : Over single Image I with N pixels

N P

i sini

(1)

I ni refers the ratio of background pixels labeled mean or mean with variance

w.r.t the total number of background pixels in the image,

I si is the normalized ratio of the time it takes for cascade level i BG

classifier model to evaluate and label a pixel as background.

I The values of n and s were profiled over various videos for different

durations.

Figure 2:Left: Pixels in CHP(red), Mode 1(green), Mode 2(blue), Mode 3(violet) and Fore-ground(white). Right: Normalized pixel count over elements of Cascade of Gaussians CHP, first and Second modes of Gaussians.

Deep Rejection Cascade for VAEs

I Variational Autoencoder (VAE) : are generative models that

approximate the data distribution P(X ) of a high dimensional input X , an image or video.

Figure 3:A VAE represents a variational approximation of the latent space with an autoen-coder architecture, with a probabilistic enautoen-coder qφ(x |z) that produces Gaussian distribution in the latent space z (represented by mean and standard deviation vectors), and a probabilis-tic decoder pθ(z|x ), which given a code produces distribution over the input space. Loss function : Reconstruction error + KL Divergence between training data latent space vector distribution & standard normal.

I Deep Rejection Cascade over VAEs : A Rejection cascade

decomposition of the VAE can be achieved. The pixel-level tests in CoG are now performed by the VAE in the latent space.

Encoder Conv2D+BN+ReLU W,H,16 Conv2D+BN+ReLU W/2,H/2,32 Zmean~W/4*H/4*K,8 Zstd~W/4*H/4*K,8 Decoder Conv2D.T W/4,H/4,32 Conv2D.T W/2,H/2,16 Conv2D.T W,H,3 Latent space CHP Latent space Zmean Latent space Gaussian Convolutional VAE VAE-CoG

Figure 4:VAE-CoG on the latent space representation of a VAE. Filters are all 3x3. A convolutional VAE with latent space of 16 dimensions was trained on the CDW-2014 datasets.

I Invariance to positions, orientations, pixel level perturbations, and

deformations due to convolutional architecture.

Figure 5:The input-output pairs and absolute value of residue between input-output pairs from a Convolutional VAE : top half without foreground bottom half with foreground. We remark that the dynamic background such as the snow has been removed. The right column demonstrates the 2d-Histogram over the latent space z of the CVAE (top) and the histogram over the temporal residue over z for the same test sequence.

I Experiments and Analysis : VAE-COG

. The VAE is trained on frames with dynamic and static background to

estimate the normal and standard deviation vectors.

. The test samples are reconstructed and residue w.r.t input scaled by

training error standard deviation is used as the output for background subtraction.

. The Rejection Cascade elements : CHP and 1st level Gaussian frequency

are measured over the latent space for the current video in CDW-2014 dataset. The plot demonstrates many images with dynamic BG are

compressed and mapped to the same latent space vector for the CHP case.

Conclusions

I The CoG was evaluated on the wallflower dataset. We observed a speedup

of 4-5x, over the baseline GMM, with an average improvement of 17% in the mis-classification rate.

I The VAE-CoG was evaluated on the CDW-2014 datasets, providing a first

estimate in the speedup: CHP requires memory to store previous encoded latent space vector and output FG/BG image, while providing speedup by avoiding the VAE-COG’s Decoding into output domain. A speedup can be achieved with the Gaussian test though this is not trivial.

Références

Documents relatifs

Abstract This paper tackles the challenge presented by small- data to the task of Bayesian inference. A novel method- ology, based on manifold learning and manifold sampling,

Although such systems are more frequently used lately, their performances for an actual real-time application would be questionable if the delivered service is not predictable

2.3. Aggregation of color features by Choquet Integral Foreground objects can be made by the fusion of different features with a Choquet integral. For each pixel, similar- ity

method to group loal motion measurements from several moving objets based on

known to be controlled by serotonin, could potentially result from microglia sensitivity to serotonin. Using an in vivo model of synaptic refinement during early

A traditional method for real-time segmentation of moving objet in video based system is “background subtraction”, or computing the error between a background model and

More pre- cisely, we investigate the use of the well-known Mahalanobis distance [9], the combination of background subtraction results obtained on each spectral band of

We consider the fundamental problem of calibrating an ODE model with respect to a temporal logic specification of the behavior of the system, observed in experiments.. We are