• Aucun résultat trouvé

Multi-scale streaming anomalies detection for time series

N/A
N/A
Protected

Academic year: 2021

Partager "Multi-scale streaming anomalies detection for time series"

Copied!
2
0
0

Texte intégral

(1)

HAL Id: hal-02568715

https://hal.archives-ouvertes.fr/hal-02568715

Submitted on 10 May 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire HAL, est destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Multi-scale streaming anomalies detection for time series

Ravi Kiran

To cite this version:

Ravi Kiran. Multi-scale streaming anomalies detection for time series. CAp 2017, Jun 2017, Grenoble, France. �hal-02568715�

(2)

Multi-scale streaming anomalies detection for time series

B Ravi Kiran Université Lille 3, CRIStAL, Lille, France

Streaming anomaly detection

I x (t ) are observations over time t where new data arrives over time t .

I Problem : Window x (t ) , [ t : t − p + 1 ] which “deviates” from normal behavior.

I Motivation : How to become invariant to the window-size/scale of pseudo-periodic structure in x (t ) ?

Streaming multi-scale Lag-matrix construction

X

tp

= [ x

t

, x

t1

, . . . , x

tp+1

]

T

∈ R

p

Streaming PCA

I Dimensionality reduction for time series lag embedding I Recursive update for principal subspace

I Linear Principal Component Analysis criterion : Given X

p

∈ R

T×p

, w

p

is defined as 1-D projection capturing most of the energy of samples :

w

p

= arg min

kwk=1

P

T

t=1

k X

tp

− (ww

T

) X

tp

k

2

Multiscale streaming PCA

Algorithm 1 Streaming PCA

Initialization: w

j

← 0, σ

j2

← ϵ with ϵ 1

for t = 1, . . . , T do for j = 1 , . . . , J do

Z

tj

← H

T

2j

X

tj

y

tj

← w

Tj

Z

tj

σ

j2

← σ

j2

+ (y

tj

)

2

e

jt

← Z

tj

− y

tj

w

j

w

j

← w

j

+ σ

j2

y

tj

e

jt

Z H

tj

← w

Tj

Z

tj

w

j

α

tj

← k Z H

tj

− Z

tj

k

2

end for

end for

return ααα ∈ R

T×J

Given x (t ) ∈ R for t = 1 : T I We evaluate the lag-matrix

X ∈ R

T×p

where p = 2

j

.

I For each X

t

∈ R

p

we perform a change of basis Z

t

: = Φ

T

X

t

I H refers to a unitary tx. that can . localize a deviation from the

local mean and variations.

. Preserve the variance.

I Haar transform Φ = H : H

2N

=

12

"

H

N

⊗ [ 1 , 1 ] I

N

⊗ [ 1, − 1 ]

#

Hierarchical (Approximation) PCA

2j1 2j1 2j1 2j1 Future

wTj1· wTj1·wTj1· wTj1· wTj · wTj ·

wTj+1·

For scale p

j+1

= 2p

j

, a reduced lag matrix Z

tj+1

is obtained by projection of each component of size p

j

on the principal direction obtained at this scale, i.e. Z

tj+1

= [ w

Tj

Z

tj

, w

Tj

Z

j

t−2j

]

T

with Z

t1

= X

t1

. The principal direc- tion is updated after this step.

Global flow

Input x

t

X

t2j=1

X

t2j=2

X

t2j=J

Update w

1

Update w

2

Update w

J

α

t1

= k X

t1

− X f

t1

k

α

t2

= k X

t2

− X f

t2

k

α

tJ

= k X

tJ

− X H

tJ

k

Norm k ααα

t

k

2

2nd iteration k ααα H

t

− ααα

t

k

2

Least corr. scale arg min

j

P

i

( ααα

T

ααα )

ji

... ... ... ...

AUC performances across different aggregation methods

Performance Evaluation : Area under the receiver operators characteristics curve (AUC) (FPR vs TPR), 0 (worst) & 1 (perfect).

Effect of 2nd iteration of Streaming PCA

Least correlated scale Vs 2nd iteration of Streaming PCA :

I Least correlated scale and 2nd iteration of streaming PCA decorrelates the reconstruction error across scales.

I The least correlated scale performs better when there is a single scale

structure across time series. Second iteration performs better when there are multiple scales.

Future work

I Understand bounds on reconstruction error ααα (t ) for Streaming PCA.

I Better base-line by comparing with streaming covariance estimation.

I Probabilistic anomaly score using gaussian neg-log score.

I Recursively calculable multi-scale time series representation H

T

X

t

to model long range dependencies.

References

I Papadimitriou et al. Optimal multi-scale patterns in time series streams, 2006 ACM SIGMOD

I Bin Yang, Projection approximation subspace tracking. Trans. Sigal Processing, Jan. 1995

I Evaluating real-time anomaly detection algorithms, Numenta benchmark, ICMLA 2015

https://beedotkiran.github.io/anomaly.html beedotkiran@gmail.com

Références

Documents relatifs

Instead of using axioms to define the program operation of parallel composition in the language of PDL enlarged with propositional quantifiers, we add an unorthodox rule of proof

trée dans l'établissement. Comme si elle n'était pas autorisée à sortir de nuit... Mais en admettant que l'heure de fermeture, par exemple, soit retardée à 24 heures,

Collectively, these studies and ours reveal a role for NK cells as mediators of an ‘‘innate cytokine balance’’ for the optimal generation of the immune response, whereby NK cells

Mobilized peripheral blood stem cells compared with bone marrow as the stem cell source for unrelated donor allogeneic transplantation with reduced-intensity conditioning in

They could be impacted even by shallow small-volume partial dome collapse (our dolomieu and south scenarios) if water is incorporated in the debris avalanche, increasing

Jointly esti- mating the positions and correlations of sources is a com- putational challenge, both in memory and time complexity since the entire covariance matrix of the sources

As hinted in my editorial I would see the practical role of NUMAT for cal- culating Embedded Value rather than for calculating Premiums.. As the classical initial net reserve is nil

Toutefois, certains facteurs (score de Glasgow, signes pupillaires, délai d’intervention, délai d’aggravation, âge, localisation et nature des lésions) semblent