HAL Id: hal-01988977
https://hal.archives-ouvertes.fr/hal-01988977
Submitted on 8 Feb 2019
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
To cite this version:
Sameer Al-Dahidi, Francesco Di Maio, Piero Baraldi, Enrico Zio, Redouane Seraoui. A framework for reconciliating data clusters from a fleet of nuclear power plants turbines for fault diagnosis. Applied Soft Computing, Elsevier, 2018, 69, pp.213-231. 10.1016/j.asoc.2018.04.044. hal-01988977.
A framework for reconciliating data clusters from a fleet of nuclear power plants turbines for fault diagnosis
Sameer Al-Dahidi a, Francesco Di Maio a,∗, Piero Baraldi a, Enrico Zio a,b, Redouane Seraoui c
a Energy Department, Politecnico di Milano, Milan, Italy
b Chair on Systems Science and the Energetic Challenge, Fondation EDF, CentraleSupélec, Paris, France
c EDF – R&D STEP Simulation et Traitement de l'information pour l'exploitation des systèmes de production, Chatou, France
Article info
Article history:
Received 31 July 2015
Received in revised form 21 February 2018
Accepted 24 April 2018
Available online 28 April 2018
Keywords:
Fault diagnosis
Unsupervised ensemble clustering
Incremental learning
Cluster reconciliation
Fleet of nuclear power plants (NPPs) turbines shut-down
Abstract
When a fleet of similar Systems, Structures and Components (SSCs) is available, the use of all the available information collected on the different SSCs is expected to be beneficial for the diagnosis purpose. Although different SSCs experience different behaviours in different environmental and operational conditions, they may be informative for the other (even if different) SSCs. In the present work, the objective is to build a fault diagnostic tool aimed at capitalizing on the available data (vibration, environmental and operational conditions) and knowledge of a heterogeneous fleet of P Nuclear Power Plants (NPPs) turbines. To this aim, a framework for incrementally learning different clusterings independently obtained for the individual turbines is here proposed. The basic idea is to reconciliate the most similar clusters across the different plants. The data of shut-down transients acquired from the past operation of the P NPPs turbines are summarized into a final, reconciliated consensus clustering of the turbines behaviors under different environmental and operational conditions. Eventually, one can distinguish, among the groups, those of anomalous behavior and relate them to specific root causes. The proposed framework is applied to the shut-down transients of two different NPPs. Three alternative approaches for learning data are applied to the case study and their results are compared to those obtained by the proposed framework: results show that the proposed approach is superior to the other approaches with respect to the goodness of the final consensus clustering, computational demand, data requirements, and fault diagnosis effectiveness.
© 2018 Elsevier B.V. All rights reserved.
1. Introduction
In safety-relevant industries such as nuclear, oil and gas, automotive and chemical, fault diagnosis of Systems, Structures and Components (SSCs) is considered a critical task [1–3]. In particular, efficient fault diagnosis can help in deciding on proper maintenance and, hence, increase production availability and system safety, while reducing overall corrective maintenance costs [4,5]. For these reasons, there is an increasing demand from industry for fault diagnosis techniques [6–9].
Generally, fault diagnosis techniques can be categorized into physics-based and data-driven [10,11]. Physics-based techniques use explicit physical models to describe the relationships between the causes that determine the SSCs behavior and the signal evolutions [11–13]. Several methods have been proposed and used for fault diagnosis in the nuclear industry, such as observer-based methods, parity space methods, Kalman filters and parameter identification-based methods [14–16]. However, the complexity of the phenomena involved and the highly non-linear relationships between the causes and the signal evolutions may pose limitations on their practical deployment [11,13].
On the other hand, data-driven techniques are empirically built to fit measured process data [17–19]. For example, Artificial Neural Networks (ANNs), expert systems and fuzzy and neuro-fuzzy approaches have been successfully applied for fault diagnosis in the nuclear industry [20–22]. In this work, we focus on the development of a data-driven technique for fault diagnosis.
∗ Corresponding author.
E-mail address: francesco.dimaio@polimi.it (F. Di Maio).
Nomenclature

RUL    Remaining useful life
FKNN    Fuzzy K-nearest neighbours algorithm
ADASYN    ADAptive SYNthetic sampling approach
TOPSIS    Technique for order preference by similarity to an ideal solution
H    Number of base clusterings
j    Index of base clustering
M    True number of clusters in the final consensus clustering
$C_j^{opt}$    Optimum number of clusters of the j-th base clustering
P    Number of the NPP turbines of the fleet
p    Index of the generic NPP turbine, p = 1, ..., P
$N_p$    Number of shut-down transients of the p-th NPP turbine, p = 1, ..., P
i    Index of a transient, i = 1, ..., $N_p$
Z    Number of signals of each i-th transient
z    Index of the generic signal, z = 1, ..., Z
T    Time horizon of the generic signal z
$P^*_p$    Optimum number of clusters in the final consensus clustering of the p-th NPP turbine
$C_{min}$    Minimum number of clusters in the final consensus clustering $P^*$
$C_{max}$    Maximum number of clusters in the final consensus clustering $P^*$
$C_{Candidate}$    Possible number of clusters in the final consensus clustering $P^*$, $C_{Candidate} \in [C_{min}, C_{max}]$
$P^*_{final}$    The final reconciliated consensus clustering of the P NPPs turbines
DB    Davies-Bouldin validity index
$N_{FF1/EE1}$    Number of shut-down transients of FF1/EE1 NPPs turbines
$N_{aggregated}^{FF1,EE1}$    Aggregated set of transients of FF1 and EE1 NPPs turbines
$P^{*\,FF1,EE1}_{aggregated}$    Optimum number of clusters in the final consensus clustering of the aggregated set of transients of FF1 and EE1 NPPs turbines
m    Index of the generic consensus cluster of FF1, m = 1, ..., $P^*_{FF1}$
$\underline{\underline{Y}}_{e/f}$    Vibrational measurements dataset of the e/f-th transient of EE1/FF1
e/f    Index of the generic shut-down transient of EE1/FF1, e = 1, ..., $N_{EE1}$, f = 1, ..., $N_{FF1}$
$\mu_{mef}$    The similarity between the $\underline{\underline{Y}}_e$ and $\underline{\underline{Y}}_f$ transients of the m-th consensus cluster of FF1 NPP turbine
$\delta_{mef}$    The pointwise difference between the $\underline{\underline{Y}}_e$ and $\underline{\underline{Y}}_f$ transients of the m-th consensus cluster of FF1 NPP turbine
$y^{e/f}_{zt}$    t-th vibrational measurement of the z-th vibrational signal of matrix $\underline{\underline{Y}}_e$ / $\underline{\underline{Y}}_f$
$C^*$    Optimum number of clusters of the final consensus clustering and for the mean similarity values of each EE1 transient to FF1 consensus clusters
$\underline{\underline{X}}_{FF1}$    FF1 training dataset matrix of FF1 NPP turbine
$\underline{\underline{X}}^*_{FF1}$    Updated FF1 training dataset by ADASYN approach
$K_{min}$    Minimum number of K-th nearest neighbors transients for the FKNN classifier
$K_{max}$    Maximum number of K-th nearest neighbors transients for the FKNN classifier
$K_{Candidate}$    Possible number of K-th nearest neighbors transients for the FKNN classifier, $K_{Candidate} \in [K_{min}, K_{max}]$
$K^*$    Optimum number of K-th nearest neighbors transients used in FKNN classifier
CV    Cross validation analysis
$a_i$    Average distance of the i-th datum from the other data belonging to the same cluster
$b_i$    Minimum average distance of the i-th datum from the data belonging to a different cluster
$S_i$    Silhouette value of the i-th datum
$C_m$    m-th cluster in the final consensus clustering
$S_m$    Mean Silhouette value for the m-th cluster
$n_m$    Total number of data in the m-th cluster in the final consensus clustering
$SV_{C_{Candidate}}$    Silhouette validity value at $C_{Candidate}$, $C_{Candidate} \in [C_{min}, C_{max}]$
$\underline{\underline{A}}$    Adjacency binary similarity matrix
$A_{ij}$    Pairwise binary similarity value
$\underline{\underline{S}}$    Co-association (Similarity) matrix
$S_{ij}$    Pairwise similarity value between the i-th and j-th similarity values
$d_i$    i-th entry of the diagonal matrix $\underline{\underline{D}}$
$\underline{\underline{D}}$    Diagonal matrix with diagonal entries $d_1, d_2, ..., d_N$
$\underline{\underline{I}}$    Identity matrix of size N x N
$\underline{\underline{L}}_{rs}$    Normalized Laplacian matrix
$\lambda$    Eigenvalue of $\underline{\underline{L}}_{rs}$
$S_{me}$    Mean similarity value of transient e of EE1 to the whole transients of m-th consensus cluster of FF1
$\underline{\underline{U}}$    Eigenvectors of $\underline{\underline{L}}_{rs}$
$\bar{u}_{C_{Candidate}}$    The $C_{Candidate}$-th eigenvector of $\underline{\underline{L}}_{rs}$

One attractive way forward for building effective diagnosis models is to consider the knowledge coming from the fleet of similar SSCs [3,23]. In the industrial context, the term fleet refers to a set of P systems that can share some technical features, environmental and operational conditions and usage characteristics. On this basis, three types of fleet can be envisaged: identical, homogeneous and heterogeneous. Table 1 summarizes the types of fleet, their characteristics and a selection of the most relevant research work performed in the past, making an effective use of fleet data:
In an identical fleet, the systems might have identical technical features and usage, and work in the same environmental and operational conditions: knowledge derived from such a fleet has been used for defining thresholds for anomaly detection [5], Remaining Useful Life (RUL) estimation [24] and technical solution capitalization [25,26] for any system identical to the fleet members.
In a homogeneous fleet, the systems might share some identical technical features that are influenced by similar environmental and operational conditions, but with few differences either in their features or in their usage: knowledge derived from this type of fleet has been used for developing diagnostics approaches for enhancing maintenance planning [27]. However, in a context where customized systems are common, these approaches may give poor results [3].
In a heterogeneous fleet, the systems might have different and/or similar technical features, but with different usage under different
improve the efficiency of the fault diagnosis task [2,3,23].
Most of the existing fleet-wide approaches for fault diagnosis treat only the information gathered from identical and/or homogeneous fleets, rather than from heterogeneous ones [23]. In fact, the benefit of utilizing the information of a heterogeneous fleet for fault diagnosis has rarely been addressed in the literature [23].
In this regard, the objective of the present work is to develop a framework for incrementally learning different turbine behaviours of a heterogeneous fleet of P Nuclear Power Plants (NPPs) turbines. The final goal is to summarize the data and knowledge acquired from the past experience of the fleet turbines operations into a final, reconciliated consensus clustering of the different turbines behaviors under different environmental and operational conditions (namely normal condition, degraded condition, abnormal condition and outliers).
In the context of fault diagnosis of an individual NPP turbine, the objective is to partition the $N_p$ shut-down transients of the p-th plant, p = 1, ..., P, into M dissimilar groups (whose number is “a priori” unknown) such that transients belonging to the same group are more similar than those belonging to other groups. In particular, one can distinguish, among the groups, anomalous behaviors of the equipment and relate them to specific root causes [28–31].
The problem of grouping the operational transients of the turbine can be formulated as an unsupervised clustering problem aimed at partitioning the transient data into homogeneous “a priori” unknown clusters for which the true classes are unknown [30,32].
To this aim, an unsupervised clustering approach (sketched in Fig. 1) has been proposed by some of the authors for combining in an ensemble the clustering results of (i) data representative of the turbine behavior, i.e., seven signals of the turbine shaft vibrations (j = 1 base clustering), and (ii) data representative of the environmental and operational conditions that can influence the turbine behavior, i.e., nominal values of turbine shaft speed, vacuum and temperature signals (j = 2 base clustering) [32]. In brief, the approach is based on the combination of: 1) a Cluster-based Similarity Partitioning Algorithm (CSPA) to quantify the co-association matrix that describes the similarity among the two base clusterings (refer to Appendix A for more details); 2) Spectral Clustering embedding an unsupervised K-Means algorithm to find the final consensus clustering based on the available co-association matrix (refer to Appendix B for more details); 3) the Silhouette index to quantify the goodness of the obtained clusters by choosing the optimum number of clusters in the final consensus clustering as that with the maximum Silhouette value, i.e., such that clusters are well separated and compact (refer to Appendix C for more details).
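As an illustration of the first ingredient, the co-association matrix of a set of base clusterings can be sketched in a few lines of Python. This is a minimal sketch, assuming the usual CSPA definition in which the similarity of two transients is the fraction of base clusterings that place them in the same cluster; the function name `co_association` and the label vectors are hypothetical and are not taken from the paper or from plant data.

```python
from itertools import combinations


def co_association(base_clusterings):
    """Return the N x N co-association matrix of H base clusterings.

    Entry [i][j] is the fraction of base clusterings in which
    transients i and j received the same cluster label.
    """
    n = len(base_clusterings[0])
    if any(len(labels) != n for labels in base_clusterings):
        raise ValueError("all base clusterings must label the same transients")
    h = len(base_clusterings)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        matrix[i][i] = 1.0  # a transient always co-occurs with itself
    for i, j in combinations(range(n), 2):
        agree = sum(labels[i] == labels[j] for labels in base_clusterings)
        matrix[i][j] = matrix[j][i] = agree / h
    return matrix


# Hypothetical labels for five transients (illustrative only).
vibration_labels = [0, 0, 1, 1, 2]    # j = 1 base clustering
conditions_labels = [0, 0, 0, 1, 1]   # j = 2 base clustering
S = co_association([vibration_labels, conditions_labels])
```

With H = 2 base clusterings, each off-diagonal entry can only take the values 0, 0.5 or 1; the spectral clustering step then operates on this matrix rather than on the raw signals.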
In this regard, the final ensemble clustering of the generic p-th NPP turbine comprises $P^*_p$ clusters of shut-down transients, representative of different behaviors of the turbine that are influenced and explained by different environmental and operational conditions, among them some anomalous behaviors of the turbines
shut-down transients, respectively [32,33].
Due to the fact that the P plants of the fleet are highly standardized, some clusters representative of turbines operations and independently obtained for the individual plants might be similar (hereafter called the best matching clusters) and could be reconciliated into a unique cluster that would gather more information collected from multiple plants and, thus, is expected to be more reliable and robust.
More specifically, when a new dataset of $N_{p+1}$ shut-down transients from the generic (p+1)-th NPP turbine becomes available, the previously obtained ensemble clustering is updated based on the clusters identified independently for the transients of the (p+1)-th NPP turbine.
The scope of this work is to propose a framework for identifying the best matching clusters among the plants: these will be reconciliated into a unique consensus cluster composed of the transients of the clusters independently obtained for the plants.
The proposed framework is validated on the two previously mentioned NPP turbines FF1 and EE1. The application of the framework yields a final, reconciliated consensus clustering $P^*_{final}$ of 7 and 13 clusters representative of unique turbines operations of the FF1 and EE1 plants, respectively, and 3 consensus clusters representative of similar turbines operations of the plants (best matching clusters). The performance of the final reconciliated consensus clustering $P^*_{final}$ is quantified in terms of clusters separation and compactness, by resorting to the Silhouette validity index ([34]; see Appendix C), C-index [35] and Davies-Bouldin (DB) index [36]. The exploited knowledge of the turbines can, then, be retrieved for the purpose of, for instance, life tracking, health state estimation and fault diagnosis of a new NPP turbine.
For comparison, three other approaches are used to reconciliate the consensus clusters of the FF1 NPP turbine on the basis of the received information from the EE1 NPP turbine: 1) clustering of the aggregated shut-down transients of FF1 and EE1 NPPs turbines by the unsupervised ensemble clustering approach, 2) the inclusion of the EE1 transients into the FF1 ensemble clustering by resorting to a Fuzzy similarity measure [37–39] and 3) the classification of EE1 transients by a supervised classifier, such as a Fuzzy K-Nearest Neighbours algorithm (FKNN) [40–42] trained on FF1 clustering. Results are discussed and compared with those obtained with the proposed approach: it is concluded that the proposed approach is able to update effectively the clusters of the FF1 NPP turbine on the basis of the received information from the EE1 NPP turbine, and that it is superior to the other approaches with respect to the goodness of the final consensus clustering, computational demand, data requirements, and fault diagnosis effectiveness.
Thus, the original contribution in this work is the development of a framework for incrementally learning the information brought by a heterogeneous fleet of different NPPs turbines based on the combination of:
1) the unsupervised ensemble clustering approach [32], which overcomes the challenge to the existing clustering techniques by determining automatically the optimum number of clusters of the shut-down transients of each individual NPP turbine (which, in most industrial applications, is not known “a priori”); the clusters that result are well separated and compact (as measured by the Silhouette index [34]);
2) a reconciliation procedure for identifying the best matching clusters among the plants. The goodness of the final reconciliated clustering is quantified in terms of clusters separation and compactness.
Fig. 1. The unsupervised ensemble clustering approach [32].
It is worth mentioning that the dimensionality and required completeness of the datasets (which need signals representative of both environmental and operational conditions (i.e., turbine shaft speed, vacuum and temperature) and component behaviours (i.e., vibrations)) make it difficult, in this work, to show the application of the framework to additional datasets from other industries, because of the confidentiality constraints of such datasets.
The remainder of this paper is organized as follows. Section 2 illustrates the proposed framework for reconciliating the clusters of a fleet of industrial components for fault diagnosis. Section 3 and Section 4 describe how the proposed approach and three other alternative approaches, respectively, are used for learning new data coming from a fleet of NPP turbines and updating the clustering results obtained by ensemble-clustering the transients coming from NPPs. Along with the description of the procedures, their application to the shut-down transients collected from a fleet of NPPs is shown. Finally, conclusions and perspectives are drawn in Section 5.
2. The framework for reconciliating the clusters of a fleet of industrial components
In this section, the framework for reconciliating the clusters of a heterogeneous fleet of P industrial components is proposed. The framework entails two steps and is sketched in Fig. 2:
Step 1: Clustering the transients of a generic p-th component by the unsupervised ensemble clustering approach. For the generic p-th component, the objective is to partition the $N_p$ shut-down transients into dissimilar groups of transients representative of different component behaviors influenced by different environmental and operational conditions. To this aim, the unsupervised ensemble clustering approach of Fig. 1 (see Appendix A) has been set forth to build a consensus clustering $P^*$ from the base clusterings:
1) j = 1: Clustering of data representative of the component behaviour (such as vibrations): the outcome of this is groups of transients representing different behaviours of the component, e.g., normal condition, degraded condition, abnormal condition and outliers,
2) j = 2: Clustering of data representative of the environmental and operational conditions that can influence the component behaviour (such as rotating speed, vacuum values, temperatures, pressures, etc.): the outcome of this is groups of transients representing different environmental and operational conditions experienced by the component, e.g., a group might be characterized by high temperature values and low vacuum values.
The optimum number of clusters is selected among several candidates $C_{Candidate} \in [C_{min}, C_{max}]$ based on the Silhouette validity index, which measures the similarity of the data belonging to the same cluster and the dissimilarity to those in the other clusters (a large Silhouette value indicates that the obtained clusters are well separated and compact ([34]; see Appendix C)).
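The selection rule can be sketched as follows, using the quantities $a_i$, $b_i$ and $S_i$ of the nomenclature with $S_i = (b_i - a_i)/\max(a_i, b_i)$. This is a minimal sketch under stated assumptions: the candidate labelings are taken as given (in the framework they would come from the spectral clustering step), the score of a candidate is assumed to be the plain average of $S_i$ over all data, singleton clusters are given $S_i = 0$ by convention, and the function names and toy distances are hypothetical.

```python
from statistics import mean


def silhouette_values(distances, labels):
    """Silhouette value S_i of every datum from a pairwise distance matrix."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the Silhouette index needs at least two clusters")
    values = []
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            values.append(0.0)  # convention for singleton clusters
            continue
        # a_i: average distance to the other data of the same cluster
        a_i = sum(distances[i][k] for k in own if k != i) / (len(own) - 1)
        # b_i: smallest average distance to the data of another cluster
        b_i = min(
            sum(distances[i][k] for k in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a_i, b_i)
        values.append((b_i - a_i) / denom if denom > 0 else 0.0)
    return values


def select_number_of_clusters(distances, candidate_labelings):
    """Pick the candidate number of clusters with the largest mean Silhouette."""
    scores = {
        c: mean(silhouette_values(distances, labels))
        for c, labels in candidate_labelings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical one-dimensional data forming two tight groups.
positions = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
distances = [[abs(a - b) for b in positions] for a in positions]
candidates = {2: [0, 0, 0, 1, 1, 1], 3: [0, 0, 1, 2, 2, 2]}
best, scores = select_number_of_clusters(distances, candidates)
```

On this toy data the two-cluster candidate scores higher than the three-cluster one, because splitting the first group produces data whose own cluster is no closer than the neighbouring one.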
Step 2: Reconciliating the most similar consensus clusters obtained individually for each of the different plants. To capitalize on the added information of a newly arriving component (i.e., the (p+1)-th component) and, hence, to update the previously obtained consensus clustering $P^*_p$ of the p-th component transients data, a reconciliation procedure is here proposed. The underlying approach is that of learning the novel information content of the new $N_{p+1}$ transients without forgetting the previously acquired knowledge that is summarized in the $P^*_p$ consensus clustering (as we shall see in Section 4). Firstly, the $N_{p+1}$ transients have to be partitioned into groups representative of the (p+1)-th component behavior under varying environmental and operational conditions of the new component, as done in Step 1 for the p-th component. Once the consensus clusterings $P^*_p$ and $P^*_{p+1}$ of the two components are available, the clusters composed of transients with similar behaviors are identified and reconciliated into unique clusters within the final ensemble clustering of the two plants $P^*_{p,p+1}$. The remaining clusters are left disjoint as they are representative of unique operational conditions of each component.
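The bookkeeping of Step 2 (merge the matching clusters, leave the others disjoint) can be illustrated with a short Python sketch. The matching criterion used here is an assumption made purely for illustration and is not the criterion of the framework, which this section does not specify: each cluster is summarized by a hypothetical feature centroid, pairs are matched greedily one-to-one by increasing centroid distance, and a pair is reconciliated only if its distance does not exceed a hypothetical `threshold`. All names and data are illustrative.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a list of equally sized feature vectors."""
    return [sum(column) / len(points) for column in zip(*points)]


def reconcile(clusters_p, clusters_q, threshold):
    """Merge the closest cluster pairs of two components (assumed criterion).

    clusters_p, clusters_q: dict mapping a cluster id to the list of
    feature vectors (one per transient) of that consensus cluster.
    Returns (merged, unique_p, unique_q).
    """
    cents_p = {k: centroid(v) for k, v in clusters_p.items()}
    cents_q = {k: centroid(v) for k, v in clusters_q.items()}
    # All cross-component pairs, closest first.
    pairs = sorted(
        (dist(cp, cq), kp, kq)
        for kp, cp in cents_p.items()
        for kq, cq in cents_q.items()
    )
    matched_p, matched_q, merged = set(), set(), []
    for distance, kp, kq in pairs:
        if distance > threshold:
            break  # every remaining pair is even farther apart
        if kp in matched_p or kq in matched_q:
            continue  # each cluster is reconciliated at most once
        matched_p.add(kp)
        matched_q.add(kq)
        merged.append((kp, kq, clusters_p[kp] + clusters_q[kq]))
    unique_p = {k: v for k, v in clusters_p.items() if k not in matched_p}
    unique_q = {k: v for k, v in clusters_q.items() if k not in matched_q}
    return merged, unique_p, unique_q


# Hypothetical two-feature summaries of transients (not plant data).
plant_a = {"A1": [[0.0, 0.0], [0.2, 0.1]], "A2": [[5.0, 5.0], [5.2, 4.9]]}
plant_b = {"B1": [[0.1, 0.1], [0.0, 0.2]], "B2": [[10.0, 10.0], [9.8, 10.1]]}
merged, unique_a, unique_b = reconcile(plant_a, plant_b, threshold=1.0)
```

In this toy case A1 and B1 are reconciliated into one cluster of four transients, while A2 and B2 remain disjoint as behaviours unique to their own component.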
The incremental learning process and the enveloping reconciliation approach are repeated for all the components available in the fleet to get the final clustering $P^*_{final}$ that summarizes the characteristic behaviours of all the possible (available) components operating in as large a variety as possible of environmental and operational conditions.
Once the final clustering $P^*_{final}$ is obtained, the goodness of the final clusters identified is quantified in terms of their separation and compactness, as measured by internal validity indexes. These indexes evaluate the clustering results based on information intrinsic to the data itself, without resorting to any external information like true clustering results, which are not known “a priori” in most industrial applications [43]. In particular, we resort to the following three internal indexes:
• the Silhouette index ([34]; see Appendix C): it measures the similarity of the data belonging to the same cluster and the dissimilarity to those in the other clusters. The Silhouette index varies in the interval [–1, 1] and should be maximized;
• the C-index [35]: it defines the ratio between the sum of within-cluster distances and the distances considering all the pairs of the instances. The C-index ranges in the interval [0, 1] and should be minimized;
• the Davies-Bouldin (DB) index [36]: it is based on the ratio of within-cluster and between-cluster distances. The DB index ranges in the interval [0, ∞) and should be minimized.
Fig. 2. The proposed framework for reconciliating the consensus clusterings for a fleet of P components.
Large Silhouette and small C-index and DB values indicate that the obtained clusters are well separated and compact.
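The two indexes to be minimized can be sketched in Python as follows. This is a minimal sketch assuming the textbook definitions: the DB index as the average, over clusters, of the worst ratio between summed within-cluster scatters and centroid distance, and the C-index as $(S_w - S_{min})/(S_{max} - S_{min})$, where $S_w$ is the sum of the within-cluster pairwise distances and $S_{min}$, $S_{max}$ are the sums of the same number of smallest and largest pairwise distances. Euclidean distances, distinct cluster centroids, the function names and the toy points are all assumptions of the example.

```python
from itertools import combinations
from math import dist


def davies_bouldin(points, labels):
    """Davies-Bouldin index (lower is better); needs at least two clusters."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    cents = {k: [sum(c) / len(v) for c in zip(*v)] for k, v in groups.items()}
    # Scatter: mean distance of the cluster members to their centroid.
    scatter = {
        k: sum(dist(p, cents[k]) for p in v) / len(v) for k, v in groups.items()
    }
    worst = [
        max(
            (scatter[a] + scatter[b]) / dist(cents[a], cents[b])
            for b in groups
            if b != a
        )
        for a in groups
    ]
    return sum(worst) / len(worst)


def c_index(points, labels):
    """C-index in [0, 1] (lower is better)."""
    pairs = list(combinations(range(len(points)), 2))
    all_d = sorted(dist(points[i], points[j]) for i, j in pairs)
    within = [dist(points[i], points[j]) for i, j in pairs if labels[i] == labels[j]]
    n_w = len(within)
    if n_w == 0:
        return 0.0  # no within-cluster pair: nothing to compare
    s_min, s_max = sum(all_d[:n_w]), sum(all_d[-n_w:])
    if s_max == s_min:
        return 0.0  # all distances equal: index is undefined, report 0
    return (sum(within) - s_min) / (s_max - s_min)


# Hypothetical points: two tight groups far apart.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
good = [0, 0, 1, 1]   # labels following the two groups
bad = [0, 1, 0, 1]    # labels cutting across the groups
db_good, db_bad = davies_bouldin(points, good), davies_bouldin(points, bad)
c_good, c_bad = c_index(points, good), c_index(points, bad)
```

For the labeling that follows the two groups, both indexes are close to their minimum, whereas the labeling that cuts across the groups yields much larger values.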
It is important to point out that there exist other clustering validity indexes, the so-called external validity indexes, that evaluate the goodness of the obtained clusters with respect to a pre-specified structure (assumed to be known “a priori”), like false-positive, false-negative and classification error, etc. [43]. However, the calculation of these indexes is not feasible in this work due to the unavailability of the true clustering results.