A framework for reconciliating data clusters from a fleet of nuclear power plants turbines for fault diagnosis


HAL Id: hal-01988977

https://hal.archives-ouvertes.fr/hal-01988977

Submitted on 8 Feb 2019


To cite this version:

Sameer Al-Dahidi, Francesco Di Maio, Piero Baraldi, Enrico Zio, Redouane Seraoui. A framework for reconciliating data clusters from a fleet of nuclear power plants turbines for fault diagnosis. Applied Soft Computing, Elsevier, 2018, 69, pp.213-231. ⟨10.1016/j.asoc.2018.04.044⟩. ⟨hal-01988977⟩


Sameer Al-Dahidi^a, Francesco Di Maio^a,∗, Piero Baraldi^a, Enrico Zio^a,b, Redouane Seraoui^c

^a Energy Department, Politecnico di Milano, Milan, Italy

^b Chair on Systems Science and the Energetic Challenge, Fondation EDF, CentraleSupélec, Paris, France

^c EDF – R&D STEP Simulation et Traitement de l'information pour l'exploitation des systèmes de production, Chatou, France

Article info

Article history:

Received 31 July 2015

Received in revised form 21 February 2018

Accepted 24 April 2018

Available online 28 April 2018

Keywords:

Fault diagnosis

Unsupervised ensemble clustering

Incremental learning

Cluster reconciliation

Fleet of nuclear power plants (NPPs) turbines shut-down

Abstract

When a fleet of similar Systems, Structures and Components (SSCs) is available, the use of all the available information collected on the different SSCs is expected to be beneficial for the diagnosis purpose. Although different SSCs experience different behaviours in different environmental and operational conditions, they may be informative for the other (even if different) SSCs. In the present work, the objective is to build a fault diagnostic tool aimed at capitalizing the available data (vibration, environmental and operational conditions) and knowledge of a heterogeneous fleet of P Nuclear Power Plants (NPPs) turbines. To this aim, a framework for incrementally learning different clusterings independently obtained for the individual turbines is here proposed. The basic idea is to reconciliate the most similar clusters across the different plants. The data of shut-down transients acquired from the past operation of the P NPPs turbines are summarized into a final, reconciliated consensus clustering of the turbines behaviors under different environmental and operational conditions. Eventually, one can distinguish, among the groups, those of anomalous behavior and relate them to specific root causes. The proposed framework is applied on the shut-down transients of two different NPPs. Three alternative approaches for learning data are applied to the case study and their results are compared to those obtained by the proposed framework: results show that the proposed approach is superior to the other approaches with respect to the goodness of the final consensus clustering, computational demand, data requirements, and fault diagnosis effectiveness.

1. Introduction

In safety-relevant industries such as nuclear, oil and gas, automotive and chemical, fault diagnosis of Systems, Structures and Components (SSCs) is considered a critical task [1–3]. In particular, efficient fault diagnosis can help in deciding on proper maintenance and, hence, increase production availability and system safety, while reducing overall corrective maintenance costs [4,5]. For these reasons, there is an increasing demand from industry for fault diagnosis techniques [6–9].

Generally, fault diagnosis techniques can be categorized into physics-based and data-driven [10,11]. Physics-based techniques use explicit physical models to describe the relationships between the causes that determine the SSCs behavior and the signal evolutions [11–13]. Several methods have been proposed and used for fault diagnosis in nuclear industry, such as observer-based methods, parity space methods, Kalman filters and parameter identification-based methods [14–16]. However, the complexity of the phenomena involved and the highly non-linear relationships between the causes and the signal evolutions may pose limitations on their practical deployment [11,13].

∗ Corresponding author.

E-mail address: [email protected] (F. Di Maio).

On the other hand, data-driven techniques are empirically built to fit measured process data [17–19]. For example, Artificial Neural Networks (ANNs), expert systems and fuzzy and neuro-fuzzy approaches have been successfully applied for fault diagnosis in the nuclear industry [20–22]. In this work, we focus on the development of a data-driven technique for fault diagnosis.

Nomenclature

RUL                     Remaining useful life
FKNN                    Fuzzy K-nearest neighbours algorithm
ADASYN                  ADAptive SYNthetic sampling approach
TOPSIS                  Technique for order preference by similarity to an ideal solution
H                       Number of base clusterings
j                       Index of base clustering
M                       True number of clusters in the final consensus clustering
C^j_opt                 Optimum number of clusters of the j-th base clustering
P                       Number of the NPP turbines of the fleet
p                       Index of the generic NPP turbine, p = 1,...,P
N_p                     Number of shut-down transients of the p-th NPP turbine, p = 1,...,P
i                       Index of a transient, i = 1,...,N_p
Z                       Number of signals of each i-th transient
z                       Index of the generic signal, z = 1,...,Z
T                       Time horizon of the generic signal z
P^∗_p                   Optimum number of clusters in the final consensus clustering of the p-th NPP turbine
C_min                   Minimum number of clusters in the final consensus clustering P^∗
C_max                   Maximum number of clusters in the final consensus clustering P^∗
C_Candidate             Possible number of clusters in the final consensus clustering P^∗, C_Candidate ∈ [C_min, C_max]
P^∗_final               The final reconciliated consensus clustering of the P NPPs turbines
DB                      Davies-Bouldin validity index
N_FF1/EE1               Number of shut-down transients of FF1/EE1 NPPs turbines
N^aggregated_FF1,EE1    Aggregated set of transients of FF1 and EE1 NPPs turbines
P^∗aggregated_FF1,EE1   Optimum number of clusters in the final consensus clustering of the aggregated set of transients of FF1 and EE1 NPPs turbines
m                       Index of the generic consensus cluster of FF1, m = 1,...,P^∗_FF1
Y_e/f                   Vibrational measurements dataset of the e/f-th transient of EE1/FF1
e/f                     Index of the generic shut-down transient of EE1/FF1, e = 1,...,N_EE1, f = 1,...,N_FF1
[...]^m_ef              The similarity between the Y_e and Y_f transients of the m-th consensus cluster of FF1 NPP turbine
δ^m_ef                  The pointwise difference between the Y_e and Y_f transients of the m-th consensus cluster of FF1 NPP turbine
y^e/f_zt                t-th vibrational measurement of the z-th vibrational signal of matrix Y_e/Y_f
C^∗                     Optimum number of clusters of the final consensus clustering and for the mean similarity values of each EE1 transient to FF1 consensus clusters
X_FF1                   FF1 training dataset matrix of FF1 NPP turbine
X^∗_FF1                 Updated FF1 training dataset by ADASYN approach
K_min                   Minimum number of K-th nearest neighbors transients for the FKNN classifier
K_max                   Maximum number of K-th nearest neighbors transients for the FKNN classifier
K_Candidate             Possible number of K-th nearest neighbors transients for the FKNN classifier, K_Candidate ∈ [K_min, K_max]
K^∗                     Optimum number of K-th nearest neighbors transients used in FKNN classifier
CV                      Cross validation analysis
a^i                     Average distance of the i-th datum from the other data belonging to the same cluster
b^i                     Minimum average distance of the i-th datum from the data belonging to a different cluster
S^i                     Silhouette value of the i-th datum
C_m                     m-th cluster in the final consensus clustering
S_m                     Mean Silhouette value for the m-th cluster
n_m                     Total number of data in the m-th cluster in the final consensus clustering
SV_C_Candidate          Silhouette validity value at C_Candidate, C_Candidate ∈ [C_min, C_max]
A                       Adjacency binary similarity matrix
[...]                   Pairwise binary similarity value
S                       Co-association (Similarity) matrix
S_ij                    Pairwise similarity value between the i-th and j-th similarity values
d_i                     i-th entry of the diagonal matrix D
D                       Diagonal matrix with diagonal entries d_1, d_2,...,d_N
I                       Identity matrix of size N x N
L_rs                    Normalized Laplacian matrix
[...]                   Eigenvalue of L_rs
S^m_e                   Mean similarity value of transient e of EE1 to the whole transients of m-th consensus cluster of FF1
U                       Eigenvectors of L_rs
ū_C_Candidate           The C_Candidate-th eigenvector of L_rs

One attractive way forward for building effective diagnosis models is to consider the knowledge coming from the fleet of similar SSCs [3,23]. In the industrial context, the term fleet refers to a set of P systems that can share some technical features, environmental and operational conditions and usage characteristics. On this basis, three types of fleet can be envisaged: identical, homogeneous and heterogeneous. Table 1 summarizes the types of fleet, their characteristics and a selection of the most relevant research work performed in the past, making an effective use of fleet data:

In an identical fleet, the systems might have identical technical features and usage, and work in the same environmental and operational conditions: knowledge derived from such a fleet has been used for defining thresholds for anomaly detection [5], Remaining Useful Life (RUL) estimation [24] and technical solution capitalization [25,26] for any system identical to the fleet members.

In a homogeneous fleet, the systems might share some identical technical features that are influenced by similar environmental and operational conditions, but with few differences either in their features or in their usage: knowledge derived from this type of fleet has been used for developing diagnostics approaches for enhancing maintenance planning [27]. However, in a context where customized systems are common, these approaches may give poor results [3].

In a heterogeneous fleet, the systems might have different and/or similar technical features, but with different usage under different [...] improve the efficiency of the fault diagnosis task [2,3,23].

Most of the existing fleet-wide approaches for fault diagnosis treat only the information gathered from identical and/or homogeneous fleets, rather than from heterogeneous ones [23]. In fact, the benefit of utilizing the information of a heterogeneous fleet for fault diagnosis has rarely been investigated in the literature [23].

In this regard, the objective of the present work is to develop a framework for incrementally learning different turbine behaviours of a heterogeneous fleet of P Nuclear Power Plants (NPPs) turbines. The final goal is to summarize the data and knowledge acquired from the past experience of the fleet turbines operations into a final, reconciliated consensus clustering of the different turbines behaviors under different environmental and operational conditions (namely normal condition, degraded condition, abnormal condition and outliers).

In the context of fault diagnosis of an individual NPP turbine, the objective is to partition the N_p shut-down transients of the p-th plant, p = 1,...,P, into M dissimilar groups (whose number is “a priori” unknown) such that transients belonging to the same group are more similar than those belonging to other groups. In particular, one can distinguish, among the groups, anomalous behaviors of the equipment and relate them to specific root causes [28–31].

The problem of grouping the operational transients of the turbine can be formulated as an unsupervised clustering problem aimed at partitioning the transient data into homogeneous “a priori” unknown clusters for which the true classes are unknown [30,32].

To this aim, an unsupervised clustering approach (sketched in Fig. 1) has been proposed by some of the authors for combining in an ensemble the clustering results of i) data representative of the turbine behavior, i.e., seven signals of the turbine shaft vibrations (j = 1 base clustering), and ii) data representative of the environmental and operational conditions that can influence the turbine behavior, i.e., nominal values of turbine shaft speed, vacuum and temperature signals (j = 2 base clustering) [32]. In brief, the approach is based on the combination of: 1) a Cluster-based Similarity Partitioning Algorithm (CSPA) to quantify the co-association matrix that describes the similarity between the two base clusterings (refer to Appendix A for more details); 2) Spectral Clustering embedding an unsupervised K-Means algorithm to find the final consensus clustering based on the available co-association matrix (refer to Appendix B for more details); 3) the Silhouette index to quantify the goodness of the obtained clusters by choosing the optimum number of clusters in the final consensus clustering as that with the maximum Silhouette value, i.e., such that clusters are well separated and compact (refer to Appendix C for more details).
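As an illustration only, the co-association step of item 1) can be sketched in Python as follows. The sketch assumes the standard CSPA construction, in which the co-association value of two transients is the fraction of base clusterings that place them in the same cluster; Appendix A is not reproduced here, so this may differ in detail from the formulation in [32]. The function name `co_association` and the toy labels are hypothetical.

```python
def co_association(base_clusterings):
    """Return the N x N co-association matrix for H base clusterings.

    base_clusterings: list of H label lists, each of length N.
    Entry [i][j] is the fraction of base clusterings that place
    transients i and j in the same cluster (assumed CSPA definition).
    """
    n_clusterings = len(base_clusterings)
    n = len(base_clusterings[0])
    if any(len(labels) != n for labels in base_clusterings):
        raise ValueError("all base clusterings must label the same transients")
    matrix = [[0.0] * n for _ in range(n)]
    for labels in base_clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += 1.0 / n_clusterings
    return matrix


# Hypothetical toy data: 5 transients, H = 2 base clusterings.
vibration_labels = [0, 0, 1, 1, 2]    # j = 1: vibration signals
condition_labels = [0, 0, 0, 1, 1]    # j = 2: environmental/operational conditions
S = co_association([vibration_labels, condition_labels])
```

In this toy case transients 0 and 1 agree in both base clusterings, whereas transients 0 and 2 agree only in the second one. The resulting matrix is what the spectral clustering step of item 2) would take as input.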

In this regard, the final ensemble clustering of the generic p-th NPP turbine comprises P^∗_p clusters of shut-down transients, representative of different behaviors of the turbine that are influenced and explained by different environmental and operational conditions, among them some anomalous behaviors of the turbines [...] shut-down transients, respectively [32,33].

Due to the fact that the P plants of the fleet are highly standardized, some clusters representative of turbines operations and independently obtained for the individual plants might be similar (hereafter called the best matching clusters) and could be reconciliated into a unique cluster that would gather more information collected from multiple plants and, thus, is expected to be more reliable and robust.

More specifically, when a new dataset of N_{p+1} shut-down transients from the generic (p+1)-th NPP turbine becomes available, the previously obtained ensemble clustering is updated based on the clusters identified independently for the transients of the (p+1)-th NPP turbine.

The scope of this work is to propose a framework for identifying the best matching clusters among the plants: these will be reconciliated into a unique consensus cluster composed of the transients of the clusters independently obtained for the plants.

The proposed framework is validated on the two previously mentioned NPP turbines FF1 and EE1. The application of the framework yields a final, reconciliated consensus clustering P^∗_final of 7 and 13 clusters representative of unique turbines operations of the FF1 and EE1 plants, respectively, and 3 consensus clusters representative of similar turbines operations of the plants (best matching clusters). The performance of the final reconciliated consensus clustering P^∗_final is quantified in terms of clusters separation and compactness, by resorting to the Silhouette validity index ([34]; see Appendix C), C-index [35] and Davies-Bouldin (DB) index [36]. The exploited knowledge of the turbines can, then, be retrieved for the purpose of, for instance, life tracking, health state estimation and fault diagnosis of a new NPP turbine.

For comparison, three other approaches are used to reconciliate the consensus clusters of the FF1 NPP turbine on the basis of the received information from the EE1 NPP turbine: 1) clustering of the aggregated shut-down transients of FF1 and EE1 NPPs turbines by the unsupervised ensemble clustering approach, 2) the inclusion of the EE1 transients into the FF1 ensemble clustering by resorting to a fuzzy similarity measure [37–39] and 3) the classification of EE1 transients by a supervised classifier, such as a Fuzzy K-Nearest Neighbours algorithm (FKNN) [40–42] trained on FF1 clustering. Results are discussed and compared with those obtained with the proposed approach: it is concluded that the proposed approach is able to update effectively the clusters of the FF1 NPP turbine on the basis of the received information from the EE1 NPP turbine, and that it is superior to the other approaches with respect to the goodness of the final consensus clustering, computational demand, data requirements, and fault diagnosis effectiveness.
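To give a concrete idea of the third alternative, a minimal Python sketch of a fuzzy K-nearest neighbours classifier is reported below. It assumes crisp training labels, Euclidean distance and inverse-distance weights with fuzzifier m, which is a common FKNN formulation; the exact variant, features and parameters used in [40–42] and in the case study are not specified in this section. The function name `fknn_memberships` and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def fknn_memberships(train_points, train_labels, query, k=3, m=2.0):
    """Fuzzy membership of `query` to each class among its k nearest neighbours.

    Each neighbour votes for its own (crisp) class with weight
    1 / distance**(2 / (m - 1)); memberships are normalized to sum to 1.
    """
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    weights = defaultdict(float)
    for point, label in ranked:
        distance = max(math.dist(point, query), 1e-12)  # guard against zero distance
        weights[label] += 1.0 / distance ** (2.0 / (m - 1.0))
    total = sum(weights.values())
    return {label: weight / total for label, weight in weights.items()}


# Hypothetical two-feature summaries of already clustered transients.
train_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
train_labels = ["normal", "normal", "abnormal", "abnormal"]
memberships = fknn_memberships(train_points, train_labels, (0.5, 0.5), k=3)
```

The output is a degree of membership to each class rather than a single hard label, which is what distinguishes FKNN from a plain K-nearest neighbours vote.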

Thus, the original contribution in this work is the development of a framework for incrementally learning the information brought by a heterogeneous fleet of different NPPs turbines based on the combination of:

1) the unsupervised ensemble clustering approach [32], which overcomes a limitation of the existing clustering techniques by automatically determining the optimum number of clusters of the shut-down transients of each individual NPP turbine (which, in most industrial applications, is not known “a priori”); the clusters that result are well separated and compact (as measured by the Silhouette index [34]);

2) a reconciliation procedure for identifying the best matching clusters among the plants. The goodness of the final reconciliated clustering is quantified in terms of clusters separation and compactness.

Fig. 1. The unsupervised ensemble clustering approach [32].

It is worth mentioning that the dimensionality and required completeness of the datasets (which need signals representative of both environmental and operational conditions (i.e., turbine shaft speed, vacuum and temperature) and component behaviours (i.e., vibrations)) make it difficult, in this work, to show the application of the framework to additional datasets from other industries, because of the confidentiality constraints on such datasets.

The remainder of this paper is organized as follows. Section 2 illustrates the proposed framework for reconciliating the clusters of a fleet of industrial components for fault diagnosis. Section 3 and Section 4 describe how the proposed approach and three other alternative approaches, respectively, are used for learning new data coming from a fleet of NPP turbines and updating the clustering results obtained by ensemble-clustering the transients coming from NPPs. Along with the description of the procedures, their application to the shut-down transients collected from a fleet of NPPs is shown. Finally, conclusions and perspectives are drawn in Section 5.

2. The framework for reconciliating the clusters of a fleet of industrial components

In this section, the framework for reconciliating the clusters of a heterogeneous fleet of P industrial components is proposed. The framework entails two steps and is sketched in Fig. 2:

Step 1: Clustering the transients of a generic p-th component by the unsupervised ensemble clustering approach. For the generic p-th component, the objective is to partition the N_p shut-down transients into dissimilar groups of transients representative of different component behaviors influenced by different environmental and operational conditions. To this aim, the unsupervised ensemble clustering approach of Fig. 1 (see Appendix A) has been set forth to build a consensus clustering P^∗ from the base clusterings:

1) j = 1: Clustering of data representative of the component behaviour (such as vibrations): the outcome of this is groups of transients representing different behaviours of the component, e.g., normal condition, degraded condition, abnormal condition and outliers,

2) j = 2: Clustering of data representative of the environmental and operational conditions that can influence the component behaviour (such as rotating speed, vacuum values, temperatures, pressures, etc.): the outcome of this is groups of transients representing different environmental and operational conditions experienced by the component, e.g., a group might be characterized by high temperature values and low vacuum values.

The optimum number of clusters is selected among several candidates C_Candidate ∈ [C_min, C_max] based on the Silhouette validity index that measures the similarity of the data belonging to the same cluster and the dissimilarity to those in the other clusters (a large Silhouette value indicates that the obtained clusters are well separated and compact ([34]; see Appendix C)).
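The selection rule of Step 1 can be illustrated with a short Python sketch. It assumes Euclidean distance, the usual Silhouette definition S^i = (b^i - a^i) / max(a^i, b^i), and a validity value obtained by averaging S^i over all data; Appendix C is not reproduced here and may instead average per cluster first. The candidate labelings are taken as given (in the framework they would come from the spectral clustering step), and all names and toy data are hypothetical.

```python
import math


def silhouette_values(points, labels):
    """Silhouette value S^i = (b^i - a^i) / max(a^i, b^i) for every datum."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    values = []
    for i, label in enumerate(labels):
        own = [k for k in clusters[label] if k != i]
        if not own:  # singleton cluster: conventional value 0
            values.append(0.0)
            continue
        a = sum(math.dist(points[i], points[k]) for k in own) / len(own)
        b = min(
            sum(math.dist(points[i], points[k]) for k in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denominator = max(a, b)
        values.append((b - a) / denominator if denominator > 0 else 0.0)
    return values


def best_number_of_clusters(points, candidate_labelings):
    """Return the candidate C with the largest mean Silhouette, plus all scores."""
    scores = {
        c: sum(silhouette_values(points, labels)) / len(points)
        for c, labels in candidate_labelings.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical two-feature summaries of six transients and two candidate partitions.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
candidates = {2: [0, 0, 1, 1, 1, 1], 3: [0, 0, 1, 1, 2, 2]}
best_c, scores = best_number_of_clusters(points, candidates)
```

For these toy points the three-cluster partition separates the three visible groups and therefore obtains the larger mean Silhouette value.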

Step 2: Reconciliating the most similar consensus clusters obtained individually for each of the different plants. To capitalize on the added information of a newly available component (i.e., the (p+1)-th component) and, hence, to update the previously obtained consensus clustering P^∗_p of the p-th component transients data, a reconciliation procedure is here proposed. The underlying approach is that of learning the novel information content of the new N_{p+1} transients without forgetting the previously acquired knowledge that is summarized in the P^∗_p consensus clustering (as we shall see in Section 4). Firstly, the N_{p+1} transients have to be partitioned into groups representative of the (p+1)-th component behavior under varying environmental and operational conditions of the new component, as done in Step 1 for the p-th component. Once the consensus clusterings P^∗_p and P^∗_{p+1} of the two components are available, the clusters composed of transients with similar behaviors are identified and reconciliated into unique clusters within the final ensemble clustering of the two plants P^∗_{p,p+1}. The remaining clusters are left disjoint as they are representative of unique operational conditions of each component.

The incremental learning process and the enveloping reconciliation approach are repeated for all the components available in the fleet to get the final clustering P^∗_final that summarizes the characteristic behaviours of all the possible (available) components operating in as large a variety of environmental and operational conditions as possible.
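The incremental logic of Step 2 can be sketched as follows. This is only a schematic illustration: the criterion used here to declare two clusters "best matching" (Euclidean distance between cluster centroids below a threshold, with greedy one-to-one matching) is an assumption made for the example and is not the similarity criterion of the proposed framework, which is detailed in the following sections. The names `reconcile`, `reconcile_fleet` and `max_distance`, and the toy clusters, are hypothetical.

```python
import math


def centroid(transients):
    """Mean feature vector of a cluster of transients (tuples of equal length)."""
    return tuple(sum(column) / len(transients) for column in zip(*transients))


def reconcile(clusters_p, clusters_new, max_distance):
    """Merge best matching clusters of two plants; leave the others disjoint.

    clusters_p, clusters_new: dict mapping a unique cluster name to a list of
    feature tuples. max_distance: assumed threshold on centroid distance.
    """
    candidates = sorted(
        (math.dist(centroid(a), centroid(b)), name_a, name_b)
        for name_a, a in clusters_p.items()
        for name_b, b in clusters_new.items()
    )
    merged, used_p, used_new = {}, set(), set()
    for distance, name_a, name_b in candidates:
        if distance > max_distance:
            break  # remaining pairs are even less similar
        if name_a in used_p or name_b in used_new:
            continue  # each cluster is reconciled at most once
        merged[f"{name_a}+{name_b}"] = clusters_p[name_a] + clusters_new[name_b]
        used_p.add(name_a)
        used_new.add(name_b)
    for name, members in clusters_p.items():
        if name not in used_p:
            merged[name] = list(members)
    for name, members in clusters_new.items():
        if name not in used_new:
            merged[name] = list(members)
    return merged


def reconcile_fleet(per_plant_clusters, max_distance):
    """Fold the plants one at a time into a single reconciled clustering."""
    final = dict(per_plant_clusters[0])
    for clusters in per_plant_clusters[1:]:
        final = reconcile(final, clusters, max_distance)
    return final


# Hypothetical consensus clusters of two plants (cluster names must be unique).
ff1 = {"FF1_a": [(0.0, 0.0), (0.0, 1.0)], "FF1_b": [(5.0, 5.0), (5.0, 6.0)]}
ee1 = {"EE1_a": [(0.2, 0.4), (0.0, 0.6)], "EE1_b": [(20.0, 20.0), (21.0, 20.0)]}
final_clustering = reconcile_fleet([ff1, ee1], max_distance=1.0)
```

In the toy example one pair of clusters is reconciled and the other two remain plant-specific, while no transient is lost or duplicated in the final clustering.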

Once the final clustering P^∗_final is obtained, the goodness of the final clusters identified is quantified in terms of their separation and compactness, as measured by internal validity indexes. These indexes evaluate the clustering results based on information intrinsic to the data itself, without resorting to any external information like true clustering results, which are not known “a priori” in most industrial applications [43]. In particular, we resort to the following three internal indexes:

• the Silhouette index ([34]; see Appendix C): it measures the similarity of the data belonging to the same cluster and the dissimilarity to those in the other clusters. The Silhouette index varies in the interval [-1, 1] and should be maximized;

• the C-index [35]: it defines the ratio between the sum of within-cluster distances and the distances considering all the pairs of the instances. The C-index ranges in the interval [0, 1] and should be minimized;

• the Davies-Bouldin (DB) index [36]: it is based on the ratio of within-cluster and between-cluster distances. The DB index ranges in the interval [0, ∞) and should be minimized.

Large Silhouette and small C-index and DB values indicate that the obtained clusters are well separated and compact.

Fig. 2. The proposed framework for reconciliating the consensus clusterings for a fleet of P components.
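For illustration, the C-index and the DB index can be computed with the following Python sketch. It assumes Euclidean distance and the commonly used normalized forms of the two indexes, i.e., C = (S_w - S_min) / (S_max - S_min), where S_w is the sum of the n_w within-cluster pairwise distances and S_min and S_max are the sums of the n_w smallest and largest pairwise distances overall, and DB as the average over clusters of the worst ratio between summed within-cluster scatters and centroid distance. These forms are assumed to correspond to [35,36]. The function names and toy data are hypothetical.

```python
import math
from itertools import combinations


def c_index(points, labels):
    """C-index in [0, 1]; smaller values indicate more compact clusters."""
    pairs = list(combinations(range(len(points)), 2))
    all_distances = sorted(math.dist(points[i], points[j]) for i, j in pairs)
    within = [
        math.dist(points[i], points[j]) for i, j in pairs if labels[i] == labels[j]
    ]
    n_w = len(within)
    s_w = sum(within)
    s_min = sum(all_distances[:n_w])
    s_max = sum(all_distances[len(all_distances) - n_w:])
    return 0.0 if s_max == s_min else (s_w - s_min) / (s_max - s_min)


def davies_bouldin(points, labels):
    """DB index in [0, inf); needs at least two clusters with distinct centroids."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    centroids = {
        label: tuple(sum(column) / len(members) for column in zip(*members))
        for label, members in clusters.items()
    }
    scatter = {
        label: sum(math.dist(p, centroids[label]) for p in members) / len(members)
        for label, members in clusters.items()
    }
    worst_ratios = [
        max(
            (scatter[a] + scatter[b]) / math.dist(centroids[a], centroids[b])
            for b in clusters
            if b != a
        )
        for a in clusters
    ]
    return sum(worst_ratios) / len(clusters)


# Hypothetical toy data: a well separated labeling versus a scrambled one.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
good_labels = [0, 0, 1, 1, 2, 2]
bad_labels = [0, 1, 2, 0, 1, 2]
```

On these toy points both indexes are smaller for `good_labels` than for `bad_labels`, in line with the rule that they should be minimized.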

It is important to point out that there exist other clustering validity indexes, the so-called external validity indexes, that evaluate the goodness of the obtained clusters with respect to a pre-specified structure (assumed to be known “a priori”), such as false-positive and false-negative rates and classification error [43]. However, the calculation of these indexes is not feasible in this work due to the unavailability of the true clustering results.