OATAO is an open access repository that collects the work of Toulouse researchers and makes it freely available over the web where possible.
Any correspondence concerning this service should be sent to the repository administrator:
tech-oatao@listes-diff.inp-toulouse.fr
This is an author’s version published in:
http://oatao.univ-toulouse.fr/25360
To cite this version:
Pilastre, Barbara; Boussouf, Loïc; d'Escrivan, Stéphane; Tourneret, Jean-Yves.
Anomaly detection in mixed telemetry data using a sparse representation and dictionary learning. (2020) Signal Processing, 168, 1-10. ISSN 0165-1684
Anomaly detection in mixed telemetry data using a sparse
representation and dictionary learning
Barbara Pilastre a,*, Loïc Boussouf b, Stéphane D'Escrivan d, Jean-Yves Tourneret c
a TéSA laboratory, 7 boulevard de la Gare, Toulouse 31500, France
b Airbus Defense and Space, 31 rue des Cosmonautes, Toulouse 31400, France
c INP-ENSEEIHT/IRIT, 2 rue Charles Camichel, Toulouse 31071, France
d Centre National d'Études Spatiales (CNES), 18 av Edouard Belin, 31401 Toulouse CEDEX 9, France
ARTICLE INFO
Keywords: Spacecraft health monitoring; Anomaly detection; Sparse representation; Dictionary learning; Shift-invariance

ABSTRACT
Spacecraft health monitoring and failure prevention are major issues in space operations. In recent years, machine learning techniques have received increasing interest in many fields and have been applied to housekeeping telemetry data via semi-supervised learning. The idea is to use past telemetry describing normal spacecraft behaviour in order to learn a reference model, to which the most recent data can be compared in order to detect potential anomalies. This paper introduces a new machine learning method for anomaly detection in telemetry time series based on a sparse representation and dictionary learning. The main advantage of the proposed method is the possibility to handle multivariate telemetry time series described by mixed continuous and discrete parameters, taking into account the potential correlations between these parameters. The proposed method is evaluated on a representative anomaly dataset obtained from real satellite telemetry with an available ground-truth and compared to state-of-the-art algorithms.
1. Introduction
Spacecraft health monitoring and failure prevention are major issues in space operations. By monitoring housekeeping telemetry data, an anomaly affecting an equipment, a system or a sub-system can be detected from the abnormal behaviour of one or several telemetry parameters. A simple method for detecting anomalies in telemetry is the well-known out-of-limits (OOL) checking, which consists of defining an upper and a lower bound for each parameter and checking whether the values of this parameter exceed these bounds. This method is very simple and useful but also has some limits. Indeed, the determination of bounds for each parameter can be difficult and costly given the number of spacecraft sensors. Moreover, not all anomalies are detected by OOL checking, e.g., when the parameter affected by an anomaly does not exceed the predefined bounds. An example of an anomaly not detected by OOL checking is displayed in Fig. 1 (box #2).
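The OOL check described above reduces to a per-sample bound test, which is why contextual anomalies that stay inside the band slip through. A minimal numpy sketch (bounds and values are illustrative, not from the paper):

```python
import numpy as np

def out_of_limits(signal, lower, upper):
    """Classical OOL check: flag every sample leaving the [lower, upper] band."""
    signal = np.asarray(signal, dtype=float)
    return (signal < lower) | (signal > upper)

# A wrong-level segment (2.5 repeated) stays inside the band and is not flagged;
# only the excessive sample 5.0 is caught.
telemetry = np.array([0.1, 0.2, 5.0, 0.2, 2.5, 2.5, 2.5, 0.1])
flags = out_of_limits(telemetry, lower=-1.0, upper=3.0)
```

Only `flags[2]` is raised here, illustrating why the contextual anomaly of Fig. 1 (box #2) needs a different detector.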
Anomaly detection (AD) is a huge area of research given its diverse applications. Recent years have witnessed a growing interest for data-driven or machine learning (ML) techniques that have been used as effective tools for AD [1-5]. Motivated by this success, some ML methods have been applied to housekeeping telemetry after an appropriate preprocessing step [6-10]. These methods usually consider a semi-supervised learning that can be outlined in two steps: 1) learning from past telemetry describing only nominal spacecraft events and 2) detecting abnormal behaviour in the different parameters by an appropriate comparison to the model learned in step 1).

* Corresponding author.
E-mail address: barbara.pilastre@tesa.prd.fr (B. Pilastre).
https://doi.org/10.1016/j.sigpro.2019.107320
ML-based algorithms for AD in telemetry can be divided in two categories depending on their application to univariate or multivariate data. Univariate AD strategies process the different telemetry parameters independently, which is the most widely used approach. Popular ML methods that have been investigated in this framework include the one-class support vector machine [7], nearest neighbour techniques [8-10] or neural networks [11,12]. These solutions showed competitive results and significantly improved spacecraft health monitoring. However, in order to improve AD in telemetry, it is important to formulate the problem in a multivariate framework and take into account possible correlations between the different parameters, allowing contextual anomalies to be detected. An example of contextual anomaly is shown in Fig. 1 (box #7). The detection of this kind of abnormal behaviour requires a multivariate detection rule. Some recent multivariate AD methods are based on feature extraction and dimensionality reduction [13] or on a probabilistic model for mixed discrete and continuous telemetry parameters [14].
This paper studies a new AD method based on a sparse data
representation for spacecraft housekeeping telemetry. This method
Fig. 1. Examples of univariate and multivariate anomalies (highlighted in red boxes) that are considered in this work. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
is inspired by the work conducted in [15]. However, it has the advantage of handling mixed continuous and discrete data, and of taking into account possible correlations between the different telemetry parameters thanks to an appropriate multivariate framework. The proposed algorithm requires building a dictionary of normal patterns. New telemetry signals can then be decomposed into this dictionary using a sparse representation, allowing potential anomalies to be detected by analyzing the residuals resulting from this sparse decomposition.
The paper is organized as follows. Section 2 introduces the context of AD in mixed telemetry data considered in this work. Section 3 briefly summarizes the theory of sparse representations and dictionary learning. Section 4 introduces the proposed AD method adapted to mixed continuous and discrete telemetry parameters. Section 5 evaluates the performance of the proposed method using a heterogeneous anomaly dataset with a controlled ground-truth. A comparison to other state-of-the-art techniques shows the potential of using a sparse representation on a dictionary of normal patterns for detecting abnormal behaviour in telemetry. Conclusions and future work are reported in Section 6.
2. Anomaly detection for telemetry

2.1. Characteristics of spacecraft telemetry
Spacecraft telemetry consists of hundreds to thousands of housekeeping parameters. All these parameters are quantized and take their values in a discrete set. However, it makes sense to make a distinction between parameters taking few distinct values (such as equipment operating modes or status), which can be considered as observations of discrete random variables, and parameters that can be considered as observations of continuous random variables (such as temperature, voltage, pressure, etc.). Detecting anomalies in telemetry requires considering these discrete and continuous random variables jointly, leading to what we will call mixed vectors, in the sense that they contain discrete and continuous random variables.
In order to take into account relationships between the different parameters, it is necessary to learn their behaviour jointly, which requires considering vectors belonging to a possibly high-dimensional subspace. Considering high-dimensional data leads to major issues such as the well-known curse of dimensionality [16,17]. Note also that this high dimensionality has been considered in many recent works such as those devoted to big data [18,19].
Another important characteristic of telemetry data is that they are generally subjected to several preprocessing steps. As an example, it is quite classical to remove trivial outliers caused by errors in the data conversion and transmission using simple outlier detection methods [14]. Some telemetry parameters can have been resampled to account for the fact that the data have been acquired at different sampling frequencies. In addition, some reconstruction methods may have been applied to compensate for missing data [14,20]. Finally, it is interesting to note that additional preprocessing is necessary for the learning phase to select telemetry which describes only usual normal behavior of the spacecraft. Indeed, behaviors representing rare operations, e.g., destocking or equipment calibration operations (abnormal in another context), are not selected for learning.
2.2. Anomalies in telemetry

Anomalies that occur in housekeeping telemetry data can be divided in two categories that can be referred to as univariate and multivariate anomalies. Univariate anomalies correspond to an unusual individual behaviour (never seen before) affecting one specific parameter. Univariate anomalies can be classified in three main categories [1] summarized below:
• Collective anomalies: a collection of consecutive data instances or time series considered as anomalous with respect to the entire signal. Two examples of collective anomalies are displayed in Fig. 1 (boxes #1 and #4).
• Point anomalies: an individual data instance considered as anomalous with respect to the rest of the data. A point anomaly is the easiest to detect because it corresponds to an excessive value of individual samples. It is not necessary to observe a collection of time samples to detect this kind of anomaly. Point anomalies can generally be detected by simple thresholding, e.g., using the OOL AD method. Two examples of consecutive point anomalies are displayed in Fig. 1 (boxes #3 and #5).
• Univariate contextual anomalies: an individual data instance or a time series considered as anomalous in a specific context, but not otherwise. Fig. 1 displays examples of contextual anomalies for consecutive data instances (box #2) or time series (box #6).

Note that collective anomalies and some individual contextual anomalies may not be detected if data instances are processed independently. The detection of these anomalies requires considering collections of data instances or time series.
A multivariate or contextual anomaly results from a parameter whose behaviour has never been observed jointly with the behaviour of one or several other parameters recorded at the same time. Fig. 1 (boxes #4 and #7) shows examples of contextual anomalies. Note that the anomaly of Fig. 1 in box #7 is a multivariate contextual anomaly that affects a set of two related discrete and continuous parameters. Note also that the top signal is supposed to evolve differently depending on the status of an equipment (that can be ON or OFF) associated with the binary bottom parameter. In this example, the expected behaviour of the continuous parameter is not observed in the red box, which corresponds to a multivariate contextual anomaly. The detection of this kind of anomaly requires working in a multivariate framework in order to learn the behaviour of multiple parameters. The objective of this work is to propose a flexible AD method able to detect univariate as well as multivariate anomalies affecting telemetry.
3. Sparse representations and dictionary learning
Sparse representations have received an increasing attention in many signal and image processing applications. These applications include denoising [21-23], classification [24-26] or pattern recognition [27-29]. The use of sparse representations for AD is more original and has been considered in fewer applications such as hyperspectral imaging [30], detection of abnormal motions in videos [31], irregular heartbeat detection in electrocardiograms (ECG) or specular reflectance and shadow removal in natural images [15]. The next part of this section recalls some basic elements about sparse representations and dictionary learning.
3.1. Sparse representations
Building a sparse representation (also referred to as sparse coding) consists in approximating a signal y ∈ R^N as y ≈ Φx, where Φ ∈ R^(N×L) is an overcomplete dictionary composed of L columns called atoms, and x ∈ R^L is a sparse coefficient vector. In other words, the signal y is expressed as a sparse linear combination of a few atoms of the dictionary Φ. Once the dictionary Φ has been determined, the sparse representation problem reduces to estimating the sparse coefficient vector x by solving the following problem

x̂ = argmin_x ||y − Φx||_2^2  s.t.  ||x||_0 ≤ T    (1)

where ||·||_0 is the ℓ0 pseudo-norm which counts the number of non-zero entries of x, ||·||_2 is the ℓ2 norm, T is the allowed number of non-zero entries of x and "s.t." means "subject to".

Problem (1) is NP-hard and can be solved by greedy algorithms such as matching pursuit (MP) [32], orthogonal matching pursuit (OMP) [33], or by convex relaxation. Convex relaxation replaces the ℓ0 pseudo-norm by the ℓ1 norm defined by ||x||_1 = Σ_{l=1}^{L} |x_l|, leading to a convex problem whose solution can be computed using algorithms such as the least absolute shrinkage and selection operator (LASSO) [34].
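Problem (1) and its greedy solvers can be illustrated compactly. Below is a minimal numpy sketch of OMP (not the authors' implementation); the dictionary and signal are toy examples:

```python
import numpy as np

def omp(Phi, y, T):
    """Orthogonal matching pursuit: greedily select at most T atoms of Phi,
    refitting the coefficients of the selected atoms by least squares."""
    residual = y.astype(float).copy()
    support = []
    x = np.zeros(Phi.shape[1])
    for _ in range(T):
        # atom most correlated with the current residual
        l = int(np.argmax(np.abs(Phi.T @ residual)))
        if l not in support:
            support.append(l)
        # joint least-squares refit on the whole support
        coeffs, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coeffs
    x[support] = coeffs
    return x

# Toy dictionary with unit-norm atoms; y is a 2-sparse combination of atoms 1 and 3.
rng = np.random.default_rng(0)
Phi = rng.normal(size=(16, 8))
Phi /= np.linalg.norm(Phi, axis=0)
y = 2.0 * Phi[:, 1] - 1.5 * Phi[:, 3]
x_hat = omp(Phi, y, T=2)
```

The returned `x_hat` has at most T non-zero entries, matching the constraint of (1).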
3.2. Dictionary learning
The quality of the approximation y ≈ Φx strongly relies on the choice of the dictionary Φ. Dictionaries can be divided in two main classes corresponding to parametric and data-driven dictionaries. Parametric dictionaries are composed of fixed atoms such as wavelets, curvelets, contourlets or short-time Fourier transforms. Data-driven methods learn the dictionary from the data, which has shown to be interesting in many practical applications [35]. This paper focuses on a semi-supervised framework in which the dictionary is learned from clean data which do not contain any anomaly. A classic way to learn a dictionary from the data is to use data analysis methods such as the well-known principal component analysis (PCA). However, more efficient data-driven methods for dictionary learning, often referred to as DL methods, have attracted much attention in recent years. These methods learn dictionaries tailored for sparse representations by solving the following problem

(X̂, Φ̂) = argmin_{X,Φ} ||Y − ΦX||_F^2  s.t.  ||x_i||_0 ≤ T, ∀i    (2)

where Y gathers the training signals as columns and x_i denotes the ith column of X. Classical DL algorithms alternate between the estimation of X (in a first step of sparse coding) and the estimation of Φ (in a second step of dictionary update). Many efficient DL algorithms have been proposed in the literature including K-SVD [36] or online DL (ODL) [37]. In K-SVD, the sparse coding step is done using a greedy algorithm. In the second step, the dictionary and the sparse vectors are estimated using a singular value decomposition (SVD), allowing the columns of Φ as well as the associated coefficients of X to be updated. The ODL algorithm has been designed to learn dictionaries from large and dynamic datasets, using a sparse coding step performed by the LARS-LASSO algorithm [38,39] and a dictionary update using block-coordinate descent with warm restarts.
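The alternation behind problem (2) can be illustrated with a deliberately simplified sketch: it uses a T = 1 coding step and a MOD-style least-squares dictionary update instead of K-SVD's SVD-based update, so it only mimics the structure of the alternation, not either cited algorithm:

```python
import numpy as np

def nearest_atom_code(Phi, Y):
    """Sparse-coding step with T = 1: each training signal (column of Y)
    is represented by its single most-correlated atom."""
    corr = Phi.T @ Y
    idx = np.argmax(np.abs(corr), axis=0)
    X = np.zeros((Phi.shape[1], Y.shape[1]))
    cols = np.arange(Y.shape[1])
    X[idx, cols] = corr[idx, cols]
    return X

def learn_dictionary(Y, L, n_iter=20, seed=0):
    """MOD-style alternation: sparse coding, then the least-squares
    dictionary update Phi = Y X^+ followed by column renormalization."""
    rng = np.random.default_rng(seed)
    Phi = rng.normal(size=(Y.shape[0], L))
    Phi /= np.linalg.norm(Phi, axis=0)
    for _ in range(n_iter):
        X = nearest_atom_code(Phi, Y)
        Phi = Y @ np.linalg.pinv(X)
        norms = np.linalg.norm(Phi, axis=0)
        Phi = Phi / np.where(norms > 0, norms, 1.0)   # skip unused atoms
    return Phi, nearest_atom_code(Phi, Y)

# Toy data: random rescalings of 3 hidden unit-norm prototypes.
rng = np.random.default_rng(1)
proto = rng.normal(size=(8, 3))
proto /= np.linalg.norm(proto, axis=0)
Y = proto[:, rng.integers(0, 3, size=40)] * rng.uniform(0.5, 2.0, size=40)
Phi_hat, X_hat = learn_dictionary(Y, L=3)
```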
Unfortunately, K-SVD and ODL algorithms have not been designed for mixed discrete and continuous data and thus cannot be used for telemetry data. Learning a dictionary with discrete and continuous atoms is an interesting and challenging problem. However, the dictionary can also be built from representative training signals that are not affected by anomalies. In this work, the dictionary has been built from "normal" vectors (that are not affected by anomalies) belonging to a training database. Extending standard DL methods (such as K-SVD and ODL) to mixed data will be considered in future work.
4. Proposed anomaly detection method for mixed telemetry

This section describes the proposed Anomaly Detection using DICTionary (ADDICT) algorithm, which is an AD method for mixed data. We focus here on the detection step and assume that the dictionary has been learned in a previous step using telemetry signals associated with a normal spacecraft behaviour.
4.1. Preprocessing
Telemetry time series acquired at the same time instant and considered as part of the same context are first segmented into overlapping windows of fixed size w with a shift δ (with an overlapping area equal to w − δ) as illustrated in Fig. 2. The resulting matrices are then concatenated into vectors, yielding mixed vectors whose components are discrete or continuous depending on the considered parameter. One concatenated vector thus represents a specific context containing information from both continuous and discrete signals on a duration w. Given this preprocessing, input data for AD are mixed signals composed of telemetry time series formed by the different parameters, i.e., y = [y_1^T, ..., y_K^T]^T with y_k ∈ R^w, k = 1, ..., K, where K is the number of telemetry parameters and w is the size of the time window. To simplify notations, the N_D first components of the mixed signal y ∈ R^N are composed of the discrete parameters whereas the last N_C components are associated with the continuous parameters.
Fig. 2. Segmentation of telemetry into overlapping windows.
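The windowing and concatenation just described can be sketched as follows (toy sizes; the paper uses w = 50 and δ = 5 on real telemetry):

```python
import numpy as np

def segment(series, w, delta):
    """Cut a 1-D time series into overlapping windows of length w,
    shifted by delta samples (overlap = w - delta)."""
    starts = range(0, len(series) - w + 1, delta)
    return np.stack([series[s:s + w] for s in starts])

def mixed_vectors(parameters, w, delta):
    """Segment each of the K parameters, then concatenate the windows
    taken at the same positions into mixed vectors y = [y_1^T ... y_K^T]^T."""
    windows = [segment(p, w, delta) for p in parameters]   # K arrays of shape (n, w)
    return np.hstack(windows)                              # shape (n, K * w)

# K = 3 toy parameters: two continuous, one binary (discrete).
t = np.arange(10, dtype=float)
params = [t, np.sin(t), (t % 2 == 0).astype(float)]
Ymix = mixed_vectors(params, w=4, delta=2)
```

Each row of `Ymix` is one mixed vector y covering the same time window across all parameters.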
In other words, the mixed signal is partitioned into discrete and continuous counterparts denoted as y_D = [y^(1), ..., y^(N_D)]^T and y_C = [y^(N_D+1), ..., y^(N_D+N_C)]^T such that y = (y_D^T, y_C^T)^T.

4.2. Anomaly detection using a sparse representation
A mixed dictionary Φ ∈ R^(N×2L) composed of discrete and continuous atoms is defined as

Φ = [ Φ_D   0  ]
    [  0   Φ_C ]

where Φ_D and Φ_C contain the discrete and continuous dictionary atoms, respectively. The two dictionaries Φ_D ∈ R^(N_D×L) and Φ_C ∈ R^(N_C×L) have been extracted from a dictionary composed of L mixed atoms taking into account possible correlations between the different parameters, especially between discrete and continuous ones. In other words, the lth discrete atom of the discrete dictionary Φ_D and the lth continuous atom of the continuous dictionary Φ_C are composed of discrete and continuous behaviours observed in the same mixed atom, which are potentially correlated. The proposed AD strategy decomposes the mixed signal as follows

y = Φx + e + b    (3)

where Φx is the nominal part of y, e = [e_D^T, e_C^T]^T is a possible anomaly signal (e = 0 in the absence of anomaly), x = [x_D^T, x_C^T]^T is a sparse vector and b ∈ R^N is an additive noise. The proposed algorithm applies two distinct strategies to estimate the nominal part of y, processing x_D and x_C differently. The anomaly signal e is estimated by analyzing residuals resulting from this sparse decomposition. The proposed detector assumes that the anomalies affecting telemetry data are additive, which is generally the case. Note that the proposed model (3) provides a specific structure of the residue e, which allows its nonzero values to be identified. These non-zero values correspond to the parameters affected by the anomalies.
The nominal component of y is approximated by linear combinations of atoms describing only nominal behaviours of the different parameters, which can be written as

Φx = [ Φ_D x_D ]
     [ Φ_C x_C ]

with x_D ∈ R^L and x_C ∈ R^L. The discrete and continuous counterparts of the test signals will be approximated by two distinct strategies. However, it is important to preserve existing relationships between the signal parameters to allow for the detection of contextual anomalies. To this end, we propose to estimate the discrete approximation Φ_D x_D and the anomaly signal e_D in a first step (leading to estimators denoted as ê_D and x̂_D) and the continuous approximation Φ_C x_C and the anomaly signal e_C in a second step based on ê_D and x̂_D. Given the proposed preprocessing, the signals e_D and e_C are divided into K_D and K_C discrete and continuous parameters, i.e., e_D = [e_{D,1}^T, ..., e_{D,K_D}^T]^T and e_C = [e_{C,1}^T, ..., e_{C,K_C}^T]^T.
4.2.1. Sparse coding for discrete atoms
In order to solve the sparse coding for discrete atoms, we propose to solve the following problem

argmin_{x_D ∈ B, e_D ∈ R^(N_D)} ||y_D − Φ_D x_D − e_D||_2^2 + b_D Σ_{k=1}^{K_D} ||e_{D,k}||_2    (4)

where ||e_{D,k}||_2, k = 1, ..., K_D is the Euclidean norm, e_{D,k} corresponds to the kth time series of e_D associated with the kth parameter and b_D is a regularization parameter that controls the level of sparsity of e_D. The sparsity constraint for the anomaly signal reflects the fact that anomalies are rare and affect few parameters at the same time. Note that the discrete vector x_D is constrained to belong to B, where B is the canonical or natural basis of R^L, i.e., B = {1_l, l = 1, ..., L}, where 1_l is a vector whose lth component equals 1 and whose other components equal 0. In other words, only one atom of the discrete dictionary Φ_D is chosen to represent the discrete signal; this amounts to looking for the nearest neighbour of y_D in the dictionary. This strategy has proved to be an effective method to reconstruct discrete signals (compared to a representation using a linear combination of atoms), which explains this choice. Since x_D belongs to a finite set, its estimation is combinatorial and can be solved for each atom φ_{D,l} (where φ_{D,l} is the lth column of Φ_D) as follows

ê_{D,l} = argmin_{e_{D,l}} ||y_D − φ_{D,l} − e_{D,l}||_2^2 + b_D Σ_{k=1}^{K_D} ||e_{D,k}||_2.    (5)

The solution of the optimization problem (5) is classically obtained using a shrinkage operator ê_{D,l} = T_{b_D}(h), with h = y_D − φ_{D,l} and

[T_{b_D}(h)]_k = ((||h_k||_2 − b_D) / ||h_k||_2) h_k   if ||h_k||_2 > b_D
                 0                                      otherwise    (6)

where h_k is the kth part of h associated with the kth parameter for k = 1, ..., K_D. All the atoms φ_{D,l} yielding an anomaly signal equal to zero are selected and the corresponding values of l are stored in a subset M defined as

M = { l ∈ {1, ..., L} | ||ê_{D,l}||_2 = 0 }.    (7)

Note that M contains the values of l associated with the discrete atoms of Φ_D that are the closest to y_D. The regularization parameter b_D plays an important role in the atom selection step because it fixes the level of authorized deviation between a discrete parameter and an atom of the dictionary. The lower the value of b_D, the fewer the atoms selected in M, with the risk of keeping too few representative atoms to estimate the continuous nominal signal y_C. Conversely, the higher the value of b_D, the better the nominal estimation of y_C, with a higher risk to break the links between discrete and continuous parameters by selecting non-representative atoms and miss multivariate contextual anomalies.
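The shrinkage operator (6) acts blockwise on the per-parameter residuals: a block is kept (and rescaled) only if its Euclidean norm exceeds b_D. A minimal numpy sketch with illustrative block sizes:

```python
import numpy as np

def block_shrink(h, blocks, b):
    """Group shrinkage T_b of Eq. (6): each per-parameter block h_k is
    zeroed if ||h_k||_2 <= b, otherwise rescaled by (||h_k||_2 - b)/||h_k||_2."""
    e = np.zeros_like(h, dtype=float)
    for sl in blocks:
        hk = h[sl]
        nrm = np.linalg.norm(hk)
        if nrm > b:
            e[sl] = (nrm - b) / nrm * hk
    return e

# Two parameters of 3 samples each; only the second deviates strongly.
h = np.array([0.1, -0.1, 0.0, 3.0, 4.0, 0.0])
blocks = [slice(0, 3), slice(3, 6)]
e = block_shrink(h, blocks, b=1.0)
```

The first block (norm ≈ 0.14) is shrunk to zero, so the corresponding atom would enter the set M; the second block (norm 5) survives with scale 0.8.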
4.2.2. Sparse coding for continuous atoms
The nominal continuous signal is approximated using a sparse linear combination of atoms contained in a dictionary denoted as Φ_M, composed of the continuous atoms φ_{C,l} whose discrete parts φ_{D,l} have been selected in the discrete atom selection (i.e., l ∈ M). More precisely, when M = ∅, a discrete anomaly is detected and no sparse coding is performed for continuous atoms. When M ≠ ∅, the continuous atoms corresponding to the elements of M are selected and a continuous sparse decomposition is performed using the resulting representative continuous atoms (in view of the discrete test signal y_D). These continuous atoms are used to detect anomalies in multivariate correlated mixed data by preserving the relationships between discrete and continuous parameters. As a consequence, the sparse representation model used for the continuous parameters is defined as

min_{x_C, e_C} (1/2) ||y_C − Φ_M x_C − e_C||_2^2 + a_C ||x_C||_1 + b_C Σ_{k=1}^{K_C} ||e_{C,k}||_2    (8)

where ||x||_1 = Σ_n |x_n| is the ℓ1 norm of x, e_{C,k} corresponds to the kth time series of e_C associated with the kth parameter with k = 1, ..., K_C, and a_C and b_C are regularization parameters that control the level of sparsity of the coefficient vector x_C and the anomaly signal e_C, respectively. Note that (8) considers two distinct sparsity constraints for the coefficient vector x_C and the anomaly signal e_C. This formulation reflects the fact that a nominal continuous signal can be well approximated by a linear combination of few atoms of the dictionary (sparsity of x_C) and that anomalies are rare and affect few parameters at the same time (sparsity of e_C).
Problem (8) can be solved with the alternating direction method of multipliers (ADMM) [40] by adding an auxiliary variable z

min_{x_C, e_C, z} (1/2) ||y_C − Φ_M x_C − e_C||_2^2 + a_C ||z||_1 + b_C Σ_{k=1}^{K_C} ||e_{C,k}||_2    (9)

and the constraint z = x_C. Note that, contrary to Problem (8), the first and second terms of (9) are decoupled, which allows an easier estimation of the vector x_C. The ADMM algorithm associated with (9) minimizes the following augmented Lagrangian

L_A(x_C, z, e_C, m_C, μ_C) = (1/2) ||y_C − Φ_M x_C − e_C||_2^2 + a_C ||z||_1 + b_C Σ_{k=1}^{K_C} ||e_{C,k}||_2 + m_C^T (z − x_C) + (μ_C/2) ||z − x_C||_2^2    (10)

where m_C is a Lagrange multiplier vector and μ_C is a regularization parameter controlling the level of deviation between z and x_C. The ADMM algorithm is iterative and alternately estimates x_C, z, e_C and m_C. More details about the update equations of the different variables at the kth iteration are provided below.

Updating x_C: x_C is classically updated as follows

x_C^{k+1} = argmin_{x_C} (1/2) ||y_C − Φ_M x_C − e_C^k||_2^2 + (m_C^k)^T (z^k − x_C) + (μ_C^k/2) ||z^k − x_C||_2^2.    (11)

Simple algebra leads to

x_C^{k+1} = (Φ_M^T Φ_M + μ_C^k I)^{−1} (Φ_M^T r_C^k + m_C^k + μ_C^k z^k)    (12)

where r_C^k = y_C − e_C^k.

Updating z: the update of z is defined as

z^{k+1} = argmin_z a_C ||z||_1 + (m_C^k)^T (z − x_C^{k+1}) + (μ_C^k/2) ||z − x_C^{k+1}||_2^2.    (13)

The solution of (13) is given by the element-wise soft thresholding operator z^{k+1} = S_{γ^k}( x_C^{k+1} − (1/μ_C^k) m_C^k ) with γ^k = a_C / μ_C^k, where the thresholding operator S_γ(u) is defined by

[S_γ(u)](n) = u(n) − γ   if u(n) > γ
              0          if |u(n)| ≤ γ
              u(n) + γ   if u(n) < −γ    (14)

where u(n) is the nth component of u.

Updating e_C: the error vector e_C is also updated using the shrinkage operator already defined in (6) for the sparse coding of discrete atoms, i.e., as ê_C = T_{b_C}[y_C − Φ_M x_C]. The ADMM resolution of (9) is detailed in [15] and summarized in Algorithm 1 (theoretical convergence properties are detailed in [41]).
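The updates (12)-(14) can be assembled into a short numerical sketch of Algorithm 1. Parameter values below (a, b, μ, ρ, iteration count, toy dictionary) are illustrative, not those used in the paper, and the block structure assumes K equal-length parameters:

```python
import numpy as np

def soft(u, g):
    """Element-wise soft thresholding S_g of Eq. (14)."""
    return np.sign(u) * np.maximum(np.abs(u) - g, 0.0)

def block_shrink(h, K, b):
    """Shrinkage T_b of Eq. (6), here with K equal-length parameter blocks."""
    H = h.reshape(K, -1)
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    scale = np.where(norms > b, (norms - b) / np.maximum(norms, 1e-12), 0.0)
    return (scale * H).reshape(-1)

def admm(y, Phi, a, b, K, mu=1.0, rho=1.01, n_iter=300):
    """ADMM iterations of Algorithm 1 for problem (9)."""
    N, L = Phi.shape
    x, z, m = np.zeros(L), np.zeros(L), np.zeros(L)
    e = np.zeros(N)
    G = Phi.T @ Phi
    for _ in range(n_iter):
        # Eq. (12): x-update with r = y - e
        x = np.linalg.solve(G + mu * np.eye(L), Phi.T @ (y - e) + m + mu * z)
        # Eq. (13)-(14): z-update by soft thresholding
        z = soft(x - m / mu, a / mu)
        # e-update by the group shrinkage T_b
        e = block_shrink(y - Phi @ x, K, b)
        # multiplier and penalty updates
        m = m + mu * (z - x)
        mu *= rho
    return x, e

# Toy problem: y lies in the span of the dictionary, so no anomaly is expected.
rng = np.random.default_rng(2)
Phi = rng.normal(size=(6, 4))
Phi /= np.linalg.norm(Phi, axis=0)
y = Phi @ np.array([1.5, 0.0, -1.0, 0.0])
x_hat, e_hat = admm(y, Phi, a=1e-3, b=10.0, K=2)
```

With a large b the estimated anomaly signal stays at zero and the residual shrinks, as expected for a nominal signal.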
Algorithm 1: (x, e, z, m) = ADMM(y, Φ, μ, ρ, a, b)
Initialisation: k = 1, z^0, e^0, m^0, μ^0, ρ, Φ, a, b
repeat
  x^{k+1} = (Φ^T Φ + μ^k I)^{−1} [Φ^T (y − e^k) + m^k + μ^k z^k]
  z^{k+1} = S_γ( x^{k+1} − (1/μ^k) m^k ),  γ = a/μ^k
  e^{k+1} = T_b( y − Φ x^{k+1} )
  m^{k+1} = m^k + μ^k (z^{k+1} − x^{k+1})
  μ^{k+1} = ρ μ^k
  k = k + 1
until stop criteria

4.2.3. Proposed anomaly detection strategy
The estimated anomaly signal ê associated with the test signal y is built by the concatenation of its discrete and continuous counterparts ê = (ê_D^T, ê_C^T)^T. The proposed anomaly detection rule is described below:

• a discrete anomaly is detected if ||ê_D||_2 > 0 (i.e., M = ∅);
• a continuous or contextual anomaly is detected when ||ê_C||_2 > S_PFA,

where S_PFA is a threshold depending on the probability of false alarm of the detector. This threshold can be adjusted by the user or determined using receiver operating characteristic (ROC) curves if a ground-truth is available (which will be the case in this paper). Note that the set M is used to detect anomalies in discrete data when it reduces to the empty set, i.e., when M = ∅, and to extract the continuous atoms corresponding to the elements of M when M ≠ ∅. These continuous atoms are used to detect anomalies in multivariate correlated mixed data using the decision rule (8), which preserves the relationships between the discrete and continuous parameters. More precisely, the first step of the algorithm (detailed in Section 4.2.1) considers the discrete part of y, namely y_D, and detects potential anomalies affecting the discrete parameters. In the second part of the algorithm (detailed in Section 4.2.2), (8) is used to detect univariate continuous anomalies and contextual discrete/continuous anomalies using the continuous part of
Algorithm 2: Anomaly detection rule in mixed telemetry using a sparse representation (y, Φ, τ_max)
Discrete Model and Atom Selection
  for l = 1 to L do
    ê_{D,l} = T_{b_D}( y_D − φ_{D,l} )
  end for
  M = { l ∈ {1, ..., L} | ||ê_{D,l}||_2 = 0 }
Discrete Anomaly: if M = ∅, a discrete anomaly is declared
  If M ≠ ∅, the algorithm considers a continuous model
Continuous Model
  Φ_M = { φ_{C,l} | l ∈ M }
Anomaly Detection
  ê = (ê_D^T, ê_C^T)^T
Joint Anomaly: if M ≠ ∅ and ||ê_C||_2 > S_PFA, a joint discrete/continuous anomaly is detected
the atoms selected in the first step (denoted as Φ_M). The two steps of the algorithm are summarized below and in Algorithm 2.
• First step: the discrete anomaly detection looks for the anomaly vectors ê_{D,l}, l = 1, ..., L resulting from the discrete sparse decomposition that are equal to 0 and builds a subset M defined as

M = { l ∈ {1, ..., L} | ||ê_{D,l}||_2 = 0 }.    (15)

When the set M is empty, a discrete anomaly is detected.
• Second step: the set M is used to build a dictionary of continuous atoms (denoted as Φ_M = {φ_{C,l}, l ∈ M}) associated with the discrete atoms selected in the first step, i.e., {φ_{D,l}, l ∈ M}. This atom selection allows the continuous sparse decomposition to be performed using only representative continuous atoms (in view of the discrete test signal y_D). As a consequence, contextual anomalies between discrete and continuous parameters can be detected.
4.3. Shift-invariant option

The proposed method has a shift-invariance (SI) optional step that can be activated for the discrete model and for continuous atom selection. This SI option allows a possible shift between the data of interest and the atoms of the discrete dictionary to be mitigated. Note that this option could also be applied to continuous data. However, since it increases the computational complexity significantly, it has only been considered for discrete data in this work. The SI option consists of building an overcomplete discrete dictionary by applying shifts to all the discrete atoms of the dictionary. In other words, each discrete atom φ_{D,l} is shifted of τ lags to create a new discrete atom φ_{D,l−τ}, with τ ∈ {−τ_max, −(τ_max − 1), ..., −1, 0, 1, ..., τ_max − 1, τ_max}. Note that the maximum shift τ_max has to be fixed by the user. By activating the SI option, the size of the discrete dictionary increases from L to 2Lτ_max + L atoms. This option is potentially interesting since it allows more representative atoms to be considered for the estimation of the nominal signal and for atom selection.
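The SI augmentation amounts to stacking shifted copies of each discrete atom. The sketch below uses circular shifts (np.roll) as an assumption, since the paper does not specify how atom boundaries are handled:

```python
import numpy as np

def shift_augment(Phi_D, tau_max):
    """Build the shift-invariant discrete dictionary: every atom is shifted
    by tau in [-tau_max, tau_max], growing the dictionary from L to
    (2 * tau_max + 1) * L atoms (the paper's 2 * L * tau_max + L)."""
    atoms = [np.roll(Phi_D[:, l], tau)
             for tau in range(-tau_max, tau_max + 1)
             for l in range(Phi_D.shape[1])]
    return np.stack(atoms, axis=1)

# L = 2 toy discrete atoms of length 3, tau_max = 1 -> 6 atoms.
Phi_D = np.array([[1., 0.],
                  [0., 1.],
                  [0., 0.]])
Phi_aug = shift_augment(Phi_D, tau_max=1)
```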
5. Experimental results

5.1. Overview

The first experiment considers a simple dataset composed of K_D = 3 discrete and K_C = 7 continuous parameters with an available ground-truth. The dictionary was constructed using two months of nominal telemetry (without anomalies), which represents approximately 30000 mixed training signals obtained after applying the preprocessing described in Section 4 with the parameters δ = 5 and w = 50 (i.e., the signal length is N = 500). As explained before, the existing dictionary learning methods such as K-SVD or ODL have not been designed for mixed discrete and continuous data. In this work, we built the dictionary of mixed discrete and continuous parameters as follows: 1) the dictionary is initialized with L = 2000 training signals selected randomly in the training database (the choice of L will be discussed later), 2) the proposed sparse coding algorithm is applied to the training data to determine the sparse representation Φx̂ and select the L training signals having the highest residuals ||y − Φx̂||. This process is repeated 100 times and the L signals most often selected among the iterations are selected as the columns of the mixed dictionary.
The performance of the different AD methods is evaluated using a test database associated with 18 days of telemetry, i.e., composed of 1000 signals including 90 affected by anomalies. Note that the 90 anomaly signals of the dataset are divided in 7 anomaly periods with various durations displayed in Fig. 1. Note also that a specific attention was devoted to the construction of a heterogeneous test database containing all kinds of anomalies, i.e., univariate discrete and continuous anomalies and two multivariate contextual anomalies. Finally, it is important to note that the majority of these anomalies are actual anomalies observed in operated satellites. This work investigates four AD methods whose principles are summarized below.
• The one-class support vector machine (OC-SVM) method [42]: the OC-SVM algorithm was investigated in a multivariate framework by using input vectors composed of mixed continuous and discrete parameters. The input vectors were obtained using the preprocessing step described in Section 4.1. Denote as y ∈ R^N one of these input vectors obtained by concatenating time series of the different telemetry parameters. The strategy adopted by OC-SVM is to map the training data in a higher-dimensional subspace H using a transformation φ, and to find a linear separator in this subspace, separating the training data (considered as mostly nominal) from the origin with the maximum margin. The separator is found by solving the following problem

min_{w, ρ, E_i} (1/2) ||w||^2 + (1/(νN)) Σ_{i=1}^{N} E_i − ρ
s.t. ⟨w, φ(y_i)⟩_H ≥ ρ − E_i,  E_i ≥ 0    (16)

where w is the normal vector to the linear separator, ⟨·,·⟩_H is the Hilbert inner product for H where H is equipped with a reproducing kernel, ρ is the so-called bias, E_i, i = 1, ..., N are slack variables (which are equal to 0 when y_i satisfies the constraint and are strictly positive when the constraint is not satisfied) and ν is a relaxation factor that can be interpreted as the fraction of training data allowed to be outside of the nominal class. Note that the parameter ν has to be fixed by the user. The kernel used in this work is the Gaussian kernel defined as

k(y, y′) = exp(−γ ||y − y′||^2)    (17)

where γ is a parameter (also adjusted by the user) controlling the regularity of the separator. Once the separator has been found, determining whether a new data vector is nominal or abnormal is an easy task as it only consists of testing whether this vector falls inside or outside the separating curve. In other terms, the decision rule can be formulated as follows

f(y) = sign[ k(w, y) − ρ ]    (18)

where sign is the function defined by sign(y) = −1 if y < 0, 0 if y = 0, 1 if y > 0. Note that an anomaly score can be defined as the distance between the test vector y and the separator, with a positive score if f(y) < 0 and a score equal to zero if f(y) > 0. This anomaly score in the first case is defined by

a(y) = (ρ − k(w, y)) / ||w||.    (20)

Finally, we would like to mention that the parameters ν and γ have been tuned by cross validation in this study.
• Mixture of probabilistic principal component analyzers and categorical distributions (MPPCAD) [14]: this is a multivariate AD method based on probabilistic clustering and dimensionality reduction. The input data vector y is divided into two parts associated with continuous and discrete vectors denoted as y_C ∈
RKC (containing one data instanceof each continuous
param-eter) and y D∈RKD (containing one data instanceof each
dis-crete parameter) acquiredat thesame time instant. Each dis-creteparameter y D
(
j)(
j=1,. . .,KD)
takesits valuesintheset{
1,. . .,Mj}
containing Mj different values. MPPCAD assumesthatthevectorofcontinuousvariablesisdistributedaccording to a mixture of Gaussian distributions and that the vector of discrete variablesis distributedaccordingto amixture of cat-egorical distributions. This assumption leads to the following probabilitydensitydistributionforthecontinuousdata y C p
(
yC|
C
)
=G
g=1
π
gN(
yC|
μ
g,Cg)
(21)where the gthGaussian distribution hasmean vector
μ
g andcovariancematrix C g= W gW Tg+
σ
g2I KC with W gthefactor load-ingmatrix,σ
2g thenoisevariance, I KC istheidentitymatrixof
sizeKC× KCand
π
gthepriorprobabilityofthegthcluster.Thedistributionofthediscretedatabasedonamixtureof categor-icaldistributionsisdefinedas
p
(
yD|
D
)
= G g=1π
g KD j=1 Cat(
yD(
j)
|
θ
g, j)
(22)whereCat(.)isthecategoricaldistribution, y D(j)isthejth
com-ponentof y D and
θ
g, j=[θ
g, j,1,...,θ
g, j,Mj]denotes theparame-tervectorsofthecategoricaldistributions,i.e.,P
(
y D(
j)
=l|
g)
=θ
g, j,l.Finallythejointdistributionofthemixeddataisobtainedassumingindependencebetween y Cand y D
$$p(\boldsymbol{y}_C, \boldsymbol{y}_D \mid \Theta) = p(\boldsymbol{y}_C \mid \Theta_C)\, p(\boldsymbol{y}_D \mid \Theta_D) \qquad (23)$$

where $\Theta = \{\Theta_C^{T}, \Theta_D^{T}\}^{T}$, $\Theta_C = \{\pi_g, \boldsymbol{\mu}_g, \boldsymbol{W}_g, \sigma_g^2,\; g=1,\dots,G\}$ and $\Theta_D = \{\boldsymbol{\theta}_{g,j},\; g=1,\dots,G,\; j=1,\dots,K_D\}$. The unknown parameter vector $\Theta$ can be classically estimated using the Expectation-Maximization (EM) algorithm [43], yielding an estimator denoted as $\hat{\Theta}$
. The EM algorithm was initialized using k-means clustering following Ding's method [44]. The authors of [14] proposed to estimate the number of clusters $G$ and the dimensionality $L$ of the continuous latent space using heuristic rules. More precisely, the value of $L$ was tuned using the so-called "elbow law" after applying a principal component analysis to the continuous data. The number of clusters $G$ was manually estimated based on the scatter plot of the principal component scores. Finally, it is interesting to note that an anomaly score $a(\boldsymbol{y}_C, \boldsymbol{y}_D \mid \hat{\Theta})$ can be defined as the minus log-likelihood of the mixed data

$$a(\boldsymbol{y}_C, \boldsymbol{y}_D \mid \hat{\Theta}) = -\ln p(\boldsymbol{y}_C, \boldsymbol{y}_D \mid \hat{\Theta}). \qquad (24)$$

• New Operational SofTwaRe for Automatic Detection of Anomalies based on Machine-learning and Unsupervised feature Selection (NOSTRADAMUS): this is a univariate method developed by the French space agency CNES, based on the OC-SVM method applied to each telemetry parameter individually [7]. The input
data are vectors of features (mean, median, minimum, maximum, standard deviation, ...) computed on time windows resulting from a segmentation of these parameters on a fixed period of time. Different features are computed depending on the discrete or continuous nature of the parameter. The OC-SVM method requires to define an appropriate kernel, which was chosen as the Gaussian kernel in [7]. An anomaly score was also defined in order to quantify the "degree of abnormality" of any test vector. This degree of abnormality corresponds to the distance between this vector and the separator, normalized to [0, 1] in order to provide a probability of anomaly. Given the univariate framework of NOSTRADAMUS, a score is assigned to each parameter and is denoted as $a(\boldsymbol{y}_k)$ for the $k$th parameter. In order to compare with the multivariate AD methods studied in this work, we define a multivariate score for NOSTRADAMUS corresponding to the sum of the univariate scores

$$a(\boldsymbol{y}) = \sum_{k=1}^{K} a(\boldsymbol{y}_k) \qquad (25)$$

where $K$ is the number of parameters.
• ADDICT: the proposed strategy is a multivariate AD method based on a sparse decomposition of any test vector $\boldsymbol{y}$ on a DICTionary (ADDICT) of normal patterns. The dictionary is learned from mixed training signals associated with a period of time where no anomaly was detected. The input data of this method are mixed vectors composed of telemetry parameters acquired during the same period of time. The preprocessing applied to the vector $\boldsymbol{y}$ and the AD algorithm were detailed in Section 4. An anomaly score can also be defined for this method

$$a(\boldsymbol{y}) = \begin{cases} -1 & \text{if } \|\boldsymbol{e}_D\|^2 > 0 \;(\text{i.e., } \mathcal{M} = \emptyset) \\ \|\boldsymbol{e}_C\|^2 & \text{otherwise.} \end{cases} \qquad (26)$$

All the regularization parameters $(a, b_C, b_D)$ were determined by cross validation in this study. At this point, it is worth mentioning that it might be interesting to consider other approaches such as Bayesian inference [45] to estimate these regularization parameters.
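To make the MPPCAD score of Eq. (24) concrete, the following sketch evaluates the minus log-likelihood of a mixed continuous/discrete sample under a toy mixture model combining Eqs. (21)–(23). All parameter values below are invented for illustration; they are not the estimates produced by the EM algorithm of [14].

```python
import numpy as np

def gaussian_logpdf(y, mu, cov):
    """Log-density of a multivariate Gaussian N(mu, cov) evaluated at y."""
    k = len(mu)
    diff = y - mu
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (k * np.log(2.0 * np.pi) + logdet + quad)

def anomaly_score(y_c, y_d, pi, mus, covs, thetas):
    """a(y_C, y_D) = -ln p(y_C | Theta_C) - ln p(y_D | Theta_D),
    assuming independence between the continuous and discrete parts."""
    # p(y_C): mixture of Gaussians, Eq. (21)
    p_c = sum(pi[g] * np.exp(gaussian_logpdf(y_c, mus[g], covs[g]))
              for g in range(len(pi)))
    # p(y_D): mixture of categorical distributions, Eq. (22)
    p_d = sum(pi[g] * np.prod([thetas[g][j][y_d[j]]
                               for j in range(len(y_d))])
              for g in range(len(pi)))
    return -np.log(p_c) - np.log(p_d)

# Toy model: G = 2 clusters, K_C = 2 continuous parameters and K_D = 1
# discrete parameter taking values in {0, 1, 2}.
pi = [0.5, 0.5]
mus = [np.zeros(2), 3.0 * np.ones(2)]
covs = [np.eye(2), np.eye(2)]
thetas = [[[0.8, 0.1, 0.1]], [[0.1, 0.1, 0.8]]]

nominal = anomaly_score(np.array([0.1, -0.2]), [0], pi, mus, covs, thetas)
outlier = anomaly_score(np.array([8.0, 8.0]), [1], pi, mus, covs, thetas)
# The sample far from both clusters receives a much larger score.
```

The score is high when either the continuous part is unlikely under every Gaussian component or the discrete part is unlikely under every categorical component, which is exactly how MPPCAD flags mixed anomalies.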
5.2. Performance evaluation
This section compares detection results obtained with the AD methods summarized in the previous section when they are applied to an anomaly dataset with available ground-truth. Fig. 3 shows the different anomaly scores, with ground-truth marked by red backgrounds, for OC-SVM (a), MPPCAD (b), NOSTRADAMUS (c) and the proposed method ADDICT (d). The higher the score, the higher the probability of anomaly. For each method, the detection rule compares the anomaly score to a threshold and detects an anomaly if this score exceeds an appropriate threshold. In an operational context, the threshold can be set in order to obtain an acceptable probability of detection by constraining the probability of false alarm to be upper-bounded, since detecting too many false alarms is a problem for operational missions. In this paper, we determined the threshold associated with the pair (probability of false alarm PFA, probability of detection PD) located closest to the ideal point (0, 1). Fig. 3 shows that the point anomalies located in boxes #3 and #5 of Fig. 1 are well detected by all the methods. Indeed, the scores returned by all the methods during this anomaly period are significantly higher than the average score. The second anomaly (box #2 in Fig. 1, indicated as 2 in Fig. 3) is a univariate anomaly that is also relatively well detected by all the methods. The first collective anomaly (box #1 in Fig. 1), which corresponds to an abnormal duration of a discrete parameter, is only detected by NOSTRADAMUS. This non-detection by MPPCAD can be partly explained by the fact that this approach processes time windows of length w = 1, whereas this kind of anomaly clearly requires longer time windows. The fourth anomaly (box #4 in Fig. 1) can be classified as a collective anomaly if we consider its abnormal duration, or as a multivariate contextual anomaly if we consider the abnormal joint behaviour of the two discrete parameters. This anomaly is only detected by NOSTRADAMUS. This result is due to the fact that anomalies affecting discrete data are poorly managed by the multivariate AD methods. Note that in this first experiment, the SI option of the proposed method, which aims at solving this problem, was not active. On the other hand, the sixth anomaly (box #6 in Fig. 1, referred to as 6 in Fig. 3), corresponding to a univariate contextual anomaly that occurs on continuous data, is detected by the proposed method but not by the others. The non-detection of this anomaly by MPPCAD can be explained by the same arguments used for the collective anomaly. Moreover, this anomaly is not detected by NOSTRADAMUS since it does not significantly affect the features that form the input vector of this algorithm. Finally, the last anomaly, corresponding to a multivariate contextual anomaly for a continuous parameter (labelled 7 in Fig. 3 and located in box #7 of Fig. 1), is perfectly detected by OC-SVM and ADDICT. However, it is less significant for MPPCAD and is not detected by NOSTRADAMUS, which is not able to handle anomalies due to correlations between the different parameters.

Fig. 3. Anomaly scores for the dataset with ground-truth marked by red background. OC-SVM (a), MPPCAD (b), NOSTRADAMUS (c) and ADDICT (d). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 4. ROC curves of OC-SVM, NOSTRADAMUS, MPPCAD and ADDICT for the anomaly dataset.

Table 1
Values of PD and PFA for OC-SVM, MPPCAD, NOSTRADAMUS and ADDICT.

Method              | Threshold | PD     | PFA
OC-SVM              | 0.0016    | 89%    | 12.3%
MPPCAD              | 12        | 67%    | 25%
NOSTRADAMUS         | 29        | 77.26% | 6%
ADDICT (τmax = 0)   | 3.8       | 84.6%  | 9.8%
ADDICT (τmax = 5)   | 3.7       | 89%    | 10.2%
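The threshold-selection rule used throughout this section (the operating point closest to the ideal ROC point (0, 1)) can be sketched as follows; the scores and labels below are synthetic, not the telemetry dataset used in the paper.

```python
import numpy as np

def select_threshold(scores, labels):
    """Return the threshold minimizing the Euclidean distance between
    the operating point (PFA, PD) and the ideal ROC point (0, 1)."""
    best_t, best_d = None, np.inf
    for t in np.unique(scores):
        detect = scores >= t
        pd = np.mean(detect[labels == 1])    # probability of detection
        pfa = np.mean(detect[labels == 0])   # probability of false alarm
        d = np.hypot(pfa - 0.0, pd - 1.0)
        if d < best_d:
            best_t, best_d = t, d
    return best_t

# Synthetic scores: 200 nominal samples, 20 anomalies with larger scores.
rng = np.random.default_rng(0)
labels = np.r_[np.zeros(200), np.ones(20)].astype(int)
scores = np.r_[rng.normal(0.0, 1.0, 200), rng.normal(4.0, 1.0, 20)]
t = select_threshold(scores, labels)
```

The chosen threshold trades a high PD against a low PFA; in an operational context one would instead upper-bound PFA and maximize PD under that constraint, as discussed above.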
Quantitative results in terms of probability of detection and probability of false alarm are given in Fig. 4, which displays the ROC curves of the four methods for the anomaly dataset. ROC curves were built using the ground-truth. The performance corresponding to the pair (probability of false alarm PFA, probability of detection PD) located closest to the ideal point (0, 1) is reported in Table 1. Our comments are summarized below:
• Few false alarms are generated by the MPPCAD algorithm, but an important proportion of anomalies from the database is not detected.
Fig. 5. Anomaly scores obtained with ADDICT for the dataset with ground-truth marked by red background and the shift-invariance option enabled with τmax = 5 . (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
• The OC-SVM method detects the most serious anomalies affecting continuous parameters but is not able to detect anomalies associated with discrete parameters.
• NOSTRADAMUS is able to detect the majority of the univariate anomalies but fails for the multivariate ones. It would be interesting to adapt NOSTRADAMUS to detect multivariate anomalies in mixed data.
• The results obtained with ADDICT are very encouraging, with a high probability of detection PD = 0.846 and a small probability of false alarm PFA = 0.098. These results are improved with the SI option, leading to PD = 0.89 and PFA = 0.102, which corresponds to the best performance for this dataset.

5.3. Shift-invariant option
This section investigates the usefulness of the SI option for the proposed method. Fig. 5 displays the anomaly scores of the proposed method with a maximum allowed shift τmax = 5. By comparing these results with those in Fig. 3(d) obtained without using the SI option (i.e., with τmax = 0), we observe that the SI option allows anomalies affecting discrete parameters to be detected (anomalies #1 and #4 in Fig. 1). In addition, the activation of the SI option decreases the scores of nominal signals, which is an interesting property. More precisely, 70% of nominal signals yield a lower anomaly score when the SI option is enabled. Reducing this score allows the number of false alarms to be decreased, improving the performance of the proposed method.
The maximum allowed shift τmax was determined using ROCs that express the probability of detection PD as a function of the probability of false alarm PFA. Fig. 6 shows ROCs for different values of τmax, showing that τmax = 5 leads to a good compromise in terms of performance and computational complexity (the higher τmax, the higher the execution time). These results confirm the importance of the SI option for the proposed method.

5.4. Selecting the number of atoms in the dictionary
This section explains how the proposed method ADDICT selects the number of atoms in the dictionary. Intuitively, the more atoms in the dictionary, the better the sparse representation of nominal signals and the lower the probability of false alarm. Our experiments have shown that the anomalies are also better approximated when the number of atoms in the dictionary increases.

Fig. 7 shows the values of PD and PFA returned by the proposed method ADDICT versus the number of atoms in the dictionary. The performance starts by improving when the number of atoms increases. For instance, moving from 100 to 2000 atoms allows PD to increase from 77.78% to 88.89% and PFA to decrease from 26% to 5.72%, which is a significant improvement. Beyond 2000 atoms, the detection performance does not improve, which explains the choice L = 2000 in our experiments. This analysis emphasizes that choosing the number of atoms in the dictionary is important for AD using ADDICT.

Fig. 6. ROC curves for ADDICT with different values of τmax ∈ {0, 1, 3, 5, 8}.

Fig. 7. Values of PD (top) and PFA (bottom) versus the number of dictionary atoms L.
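The effect of the dictionary size on the approximation error can be reproduced in a small synthetic experiment: a signal is sparsely coded, here with a basic orthogonal matching pursuit used as a stand-in for the solver of Section 4, on unit-norm random dictionaries of increasing size. All data and sizes below are illustrative only.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Basic OMP: greedily select atoms, refit the coefficients by least
    squares at each step, and return the final squared residual."""
    residual, support = y.copy(), []
    for _ in range(n_nonzero):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        x, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ x
    return np.sum(residual ** 2)

rng = np.random.default_rng(1)
y = rng.normal(size=64)                      # a synthetic test signal
errors = []
for n_atoms in (100, 500, 2000):
    D = rng.normal(size=(64, n_atoms))
    D /= np.linalg.norm(D, axis=0)           # unit-norm atoms
    errors.append(omp(D, y, n_nonzero=10))
# errors tends to decrease: larger dictionaries give sparser
# representations with smaller residuals, i.e. fewer false alarms.
```

The same mechanism explains why anomalies are also better approximated by very large dictionaries, which is why the detection performance saturates beyond L = 2000 rather than improving indefinitely.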
6. Conclusion
This paper investigated a new data-driven method for anomaly detection in mixed housekeeping telemetry data, based on a sparse representation and dictionary learning. The proposed method can handle mixed discrete and continuous parameters that are processed jointly, allowing possible correlations between these parameters to be captured. The approach was evaluated on a heterogeneous anomaly dataset with available ground-truth. The first results demonstrated the competitiveness of this approach with respect to the state-of-the-art. Our experiments showed the usefulness of a shift-invariant option for the detection of anomalies affecting discrete parameters, leading to a significant reduction in the probability of false alarm.
For future work, different issues might be investigated. The most challenging task is the dictionary learning step, which should be adapted to mixed discrete and continuous data. Another research prospect is the potential use of sparse codes or anomaly signals to identify the causes of the anomaly. Finally, we think that integrating the feedback of users into the algorithm might improve the future detection of anomalies, e.g., by reducing the number of false alarms. This opens the way for many works related to online or sequential anomaly detection, which could be useful for spacecraft health monitoring.
Declaration of Competing Interest

None.
Acknowledgements
The authors would like to thank Pierre-Baptiste Lambert from CNES and Clémentine Barreyre from Airbus Defence and Space for fruitful discussions about anomaly detection in spacecraft telemetry. This work was supported by CNES and Airbus Defence and Space.
References
[1] V. Chandola , A. Banerjee , V. Kumar , Anomaly detection: a survey, ACM Comput. Surv. 43 (3) (2009) .
[2] V. Chandola , A. Banerjee , V. Kumar , Anomaly detection for discrete sequences: a survey, ACM Comput. Surv. 24 (5) (2012) .
[3] M. Pimentel , D. Clifton , L. Clifton , L. Tarassenko , A review of novelty detection, Signal Process. 99 (2014) 215–249 .
[4] M. Markou , S. Singh , Novelty detection: a review - part 2: neural network based approaches, Signal Process. 83 (12) (2003) 2499–2521 .
[5] M. Markou , S. Singh , Novelty detection: a review - part 1: statistical ap- proaches, Signal Process. 83 (12) (2003) 2481–2497 .
[6] C. Barreyre , B. Laurent , J.-M. Loubes , B. Cabon , L. Boussouf , Statistical methods for outlier detection in space telemetries, in: Proc. Int. Conf. Space Operations (SpaceOps’2018), Marseille, France, 2018 .
[7] S. Fuertes, G. Picard, J.-Y. Tourneret, L. Chaari, A. Ferrari, C. Richard, Improving spacecraft health monitoring with automatic anomaly detection techniques, in: Proc. Int. Conf. Space Operations (SpaceOps'2016), Daejeon, South Korea, 2016.
[8] J.-A. Martínez-Heras, A. Donati, M. Kirksch, F. Schmidt, New telemetry monitoring paradigm with novelty detection, in: Proc. Int. Conf. Space Operations (SpaceOps'2012), Stockholm, Sweden, 2012.
[9] I. Verzola , A. Donati , J.-A. M. Heras , M. Schubert , L. Somodi , Project sybil: a novelty detection system for human spaceflight operations, in: Proc. Int. Conf. Space Operations (SpaceOps’2016), Daejeon, South Korea, 2016 .
[10] C. O’Meara , L. Schlag , L. Faltenbacher , M. Wickler , ATHMoS: automated teleme- try health monitoring system at GSOC using outlier detection and supervised machine learning, in: Proc. Int. Conf. Space Operations (SpaceOps’2016), Dae- jeon, South Korea, 2016 .
[11]K. Hundman , V. Constantinou , C. Laporte , I. Colwell , T. Soderstrom , Detecting spacecraft anomalies using LSTMs and nonparametric dynamic thresholding, in: Proc. Int. Conf. Knowledge Data Mining (KDD’2018), London, United King- dom, 2018, pp. 387–395 .
[12] C. O’Meara , L. Schlag , M. Wickler , Applications of deep learning neural net- works to satellite telemetry monitoring, in: Proc. Int. Conf. Space Operations (SpaceOps’2018), Marseille, France, 2018 .
[13] N. Takeishi, T. Yairi, Anomaly detection from multivariate time-series with sparse representation, in: Proc. IEEE Int. Conf. Syst. Man and Cybernetics, San Diego, CA, USA, 2014.
[14] T. Yairi, N. Takeishi, T. Oda, Y. Nakajima, N. Nishimura, N. Takata, A data-driven health monitoring method for satellite housekeeping data based on probabilistic clustering and dimensionality reduction, IEEE Trans. Aerosp. Electron. Syst. 53 (3) (2017) 1384–1401.
[15]A. Adler , M. Elad , Y. Hel-Or , E. Rivlin , Sparse coding with anomaly detection, J. Signal Process. Syst. 79 (2) (2015) 179–188 .
[16] R.O. Duda, Pattern Classification and Scene Analysis, Wiley, New York, 1973.
[17] D. Donoho, High-dimensional data analysis: the curses and blessings of dimensionality, AMS Math Challenges Lecture, 2000.
[18] A. Tajer, V. Veeravalli, H.V. Poor, Outlying sequence detection in large data sets: a data-driven approach, IEEE Signal Process. Mag. 31 (5) (2014) 44–56.
[19] A. Gilbert, P. Indyk, M. Iwen, L. Schmidt, Recent developments in the sparse Fourier transform: a compressed Fourier transform for big data, IEEE Signal Process. Mag. 31 (5) (2014) 91–100.
[20] N. Li, Y. Yang, Robust fault detection with missing data via sparse decomposition, in: Proc. Int. Fed. Automatic Control (IFAC'2013), Shanghai, China, 2013.
[21] M. Elad, M. Aharon, Image denoising via sparse and redundant representation over learned dictionaries, IEEE Trans. Image Process. 14 (12) (2006) 3736–3745.
[22] W. Dong, B. Li, Sparsity-based image denoising via dictionary learning and structural clustering, in: Proc. IEEE Conf. Comput. Vis. Pattern Recogn. (CVPR'2010), San Francisco, CA, USA, 2010.
[23]S. Xu , X. Yang , S. Jiang , A fast nonlocally centralized sparse representation al- gorithm for image denoising, Signal Process. 131 (2017) 99–112 .
[24]K. Huang , S. Aviyente , Sparse representation for signal classification, in: Proc. Neur. Inf. Process. Syst. (NIPS’2006), Whistler, B C, Canada, 2006 .
[25] X. Mei, H. Ling, Robust visual tracking and vehicle classification via sparse representation, IEEE Trans. Patt. Anal. Mach. Intell. 33 (11) (2011) 2259–2272.
[26] Y. Oktar, M. Turkan, A review of sparsity-based clustering methods, Signal Process. 148 (2018) 20–30.
[27] J. Wright, A.Y. Yang, A. Ganesh, S.S. Sastry, Y. Ma, Robust face recognition via sparse representation, IEEE Trans. Patt. Anal. Mach. Intell. 31 (3) (2009) 1–18.
[28] Q. Zhang, B. Li, Discriminative K-SVD for dictionary learning in face recognition, in: Proc. IEEE Conf. Comput. Vis. Pattern Recogn., San Francisco, CA, USA, 2010.
[29]H. Cheng , Z. Liu , L. Yang , X. Chen , Sparse representation and learning in vi- sual recognition: theory and applications, Signal Process. 93 (6) (2013) 1408– 1425 .
[30] Y. Xu, Z. Wu, J. Li, A. Plaza, Z. Wei, Anomaly detection in hyperspectral images based on low-rank and sparse representation, IEEE Trans. Geosci. Remote Sens. 54 (4) (2016) 1990–2000.
[31]S. Biswas , R. Venkatesh , Sparse representation based anomaly detection with enhanced local dictionaries, in: Proc. IEEE Int. Conf. Image Process., Paris, France, 2014 .
[32] S.G. Mallat, Z. Zhang, Matching pursuits with time-frequency dictionaries, IEEE Trans. Signal Process. 41 (12) (1993) 3397–3415.
[33] Y.C. Pati, R. Rezaiifar, P.S. Krishnaprasad, Orthogonal matching pursuit: recursive function approximation with application to wavelet decomposition, in: Proc. Asilomar Conf. Signals, Syst. Comput. (ACSSC'1993), Pacific Grove, CA, USA, 1993.
[34] R. Tibshirani, Regression shrinkage and selection via the LASSO, J. Roy. Statist. Soc. Ser. B (Methodological) 58 (1) (1996) 267–288.
[35]I. Tosic , P. Frossard , Dictionary learning, IEEE Signal Process. Mag. 28 (2011) 27–38 .
[36]M. Aharon , M. Elad , A. Bruckstein , K-SVD: an algorithm for designing overcom- plete dictionaries for sparse decomposition, IEEE Trans. Signal. Process. 54 (11) (2006) 4311–4322 .
[37]J. Mairal , F. Bach , J. Ponce , G. Sapiro , Online dictionary learning for sparse cod- ing, in: Proc. Int. Conf. Mach. Learn. (ICML’2009), Montreal, QC, Canada, 2009, pp. 689–696 .
[38]B. Efron , T. Hastie , I. Johnstone , R. Tibshirani , Least angle regression, Ann. Statist. 32 (2) (2004) 407–499 .
[39] M. Osborne, B. Presnell, B. Turlach, A new approach to variable selection in least squares problems, IMA J. Num. Anal. 20 (3) (2000) 389–403.
[40]S. Boyd , N. Parikh , E. Chu , B. Peleato , J. Eckstein , Distributed optimization and statistical learning via alternating direction method of multipliers, Found. Trends Mach. Learn. 3 (1) (2010) 1–222 .
[41]J. Eckstein , D. Bertsekas , On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators, Math. Program. (Ser. A B) 55 (3) (1992) 293–318 .
[42] B. Schölkopf, J.C. Platt, J. Shawe-Taylor, A.J. Smola, R.C. Williamson, Estimating the support of a high-dimensional distribution, Neural Comput. 13 (7) (2001) 1443–1471.
[43] A.P. Dempster, N.M. Laird, D.B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, J. Roy. Statist. Soc. Ser. B (Methodological) 39 (1) (1977) 1–38.
[44]C. Ding , X. He , K-means clustering via principal component analysis, in: Int. Conf. Machine Learning (ICML’2004), Banff, Alberta, Canada, 2004 .
[45]M.E. Tipping , Bayesian Inference: an Introduction to Principles and Practice in Machine Learning, Springer, 2004 .