Contents lists available at ScienceDirect
Journal of Quantitative Spectroscopy & Radiative Transfer
journal homepage: www.elsevier.com/locate/jqsrt
Machine learning for automatic identification of new minor species
Frédéric Schmidt (a,∗), Guillaume Cruz Mermy (a), Justin Erwin (b), Séverine Robert (b), Lori Neary (b), Ian R. Thomas (b), Frank Daerden (b), Bojan Ristic (b), Manish R. Patel (c), Giancarlo Bellucci (d), Jose-Juan Lopez-Moreno (e), Ann-Carine Vandaele (b)
(a) Université Paris-Saclay, CNRS, GEOPS, Orsay, 91405, France
(b) Belgian Institute for Space Aeronomy (BIRA-IASB), Avenue Circulaire 3, B-1180 Brussels, Belgium
(c) School of Physical Sciences, The Open University, Milton Keynes, MK7 6AA, U.K.
(d) INAF-Istituto di Astrofisica e Planetologia Spaziali, Rome, Italy
(e) Instituto de Astrofísica de Andalucía CSIC, Spain
Article info
Article history: Received 4 May 2020; Revised 14 August 2020; Accepted 28 September 2020; Available online 29 September 2020
Keywords: Spectroscopy; Atmosphere; Data mining; Machine learning; Unsupervised; Source separation; Non-negative matrix factorization
Abstract
One of the main difficulties in analyzing modern spectroscopic datasets is the large amount of data. For example, in atmospheric transmittance spectroscopy, the solar occultation channel (SO) of the NOMAD instrument onboard the ESA ExoMars 2016 satellite called Trace Gas Orbiter (TGO) produced ~10 million spectra in ~20000 acquisition sequences between the beginning of the mission in April 2018 and 15 January 2020. Other datasets are even larger, with ~billions of spectra for OMEGA onboard Mars Express or CRISM onboard Mars Reconnaissance Orbiter. Usually, new lines are discovered after a long iterative process of model fitting and manual residual analysis. Here we propose a new method based on unsupervised machine learning to automatically detect new minor species. Although precise quantification is out of scope, this tool can also be used to quickly summarize the dataset, by giving a few endmembers ("sources") and their abundances.
The methodology is the following: we propose a way to approximate the dataset non-linearity by a linear mixture of abundances and source spectra (endmembers). We use unsupervised source separation in the form of non-negative matrix factorization to estimate those quantities. Several methods are tested on synthetic and simulation data. Our approach is dedicated to detecting minor species spectra rather than precisely quantifying them. On a synthetic example, this approach is able to detect chemical compounds present in the form of 100 hidden spectra out of 10^4, at 1.5 times the noise level. Results on simulated spectra of NOMAD-SO targeting CH4 show that detection limits are in the range of 100–500 ppt in favorable conditions. Results on real martian data from NOMAD-SO show that CO2 and H2O are present, as expected, but CH4 is absent. Nevertheless, we confirm a set of new unexpected lines in the database, attributed by the ACS instrument team to the CO2 magnetic dipole.
© 2020 Elsevier Ltd. All rights reserved.
1. Introduction
In modern exploration science, one has to face a major challenge: how to learn something new from analyzing a large dataset collection while taking into account what we already know. If the current knowledge overrides the analysis, the discovery of new elements may be difficult. Usually, in the field of spectroscopy, one can compare laboratory spectra, model and observation spectra. Going back and forth leads to discovery of new lines by identifying unexpected residuals in the observation data (not expected by the model). Sometimes, initial identification of lines can be wrong. As
∗Corresponding author.
E-mail address: [email protected] (F. Schmidt).
an example, spectroscopic evidence of atmospheric CO2 ice cloud was reported after the discovery of an emission spike at a wavelength of 4.3 μm from Mariner 6 and 7 infrared probings of the bright martian limb [16], but this spectral feature was mistaken for a resonant scattering band of CO2 fluorescence [22].
For one single spectrum, one can use a simulation algorithm (see for instance [9]). For large datasets, the simplest ideas would be to scrutinize average spectra, or potential band depth distribution. Unfortunately, in the case of low signal-to-noise ratio (SNR, defined as signal / standard deviation of noise), such methods fail (as will be illustrated in the toy example). Analyzing residuals after modeling is a good method but it requires a lot of work.
Several statistical tools with various approaches have been proposed, such as the Principal Component Analysis (PCA) [12,28], or
https://doi.org/10.1016/j.jqsrt.2020.107361 0022-4073/© 2020 Elsevier Ltd. All rights reserved.
F. Schmidt, G.C. Mermy, J. Erwin et al. Journal of Quantitative Spectroscopy & Radiative Transfer 259 (2021) 107361
Independent Component Analysis (ICA) [8,30], but most of them require a human operator to pick endmembers and trends since those methods are nothing more than a change of representation. Furthermore, none of these methods guarantees positivity of the components (which are sometimes also called sources), which can be problematic during the interpretation. Recently, advanced machine learning methods based on non-negative matrix factorization have been proposed [6,13,17,20,25,29]. This approach is completely different from PCA/ICA: each source is positive and represents an endmember / a trend. A source is not one spectrum extracted from the dataset but a statistical reconstruction. By using this approach, the human operator doesn't have to identify endmembers / trends anymore, since they are automatically picked by the algorithm in the form of sources. Furthermore, when there are statistical / spectral correlations between sources, PCA/ICA fails because it assumes orthogonality / independence, which is not the case for non-negative matrix factorization.
Based on this new approach, we propose a tool:
• to give an overview and quickly summarize a large and complex spectroscopic dataset with simple variables
• to detect potential new spectroscopic features (unexpected minor species, new absorption lines, ...)
• to be performed in a fully blind way (without prior information on either the spectra or the abundances).
The target observation type of this study is solar occultation. This measurement principle was proposed as early as 1900; an interesting review was published by Smith and Hunten [31]. Several recent instruments used this technique to investigate the composition of the Earth's (SCIAMACHY/ENVISAT, Bovensmann et al. [4]), Mars' (SPICAM, Bertaux et al. [2]) or Venus' atmospheres (SPICAV, Bertaux et al. [3]). Here we will focus on the recent NOMAD instrument [33], and especially the SO channel, designed to study the Martian atmosphere and its trace gases, such as methane. Indeed the presence of CH4 on Mars is a very hot topic for the planetary science community [14,19,23]. In the present article, we propose to apply the tool for potential CH4 detection. Nevertheless, the approach can be extended to other types of spectroscopic measurements.
2. Dataset
We propose here to focus on the Nadir and Occultation for MArs Discovery (NOMAD) instrument onboard ESA's ExoMars Trace Gas Orbiter and especially the Solar Occultation (SO) channel [33]. NOMAD is a compact, high-resolution, dual channel IR spectrometer (SO and LNO) coupled with a highly miniaturized UV-visible spectrometer (UVIS), capable of operating in different observation modes: solar occultation, nadir and limb.
The SO channel operates at wavenumbers from 2320 cm−1 to 4550 cm−1 (wavelength 2.2 to 4.3 μm), using an echelle grating with a groove density of 4 lines/mm in a Littrow configuration in combination with an Acousto-Optic Tunable Filter (AOTF) for spectral order selection. The width of the selected spectral ranges is recorded by 320 spectels (spectral elements) and varies from 20 to 35 cm−1 depending on the selected diffraction order. The detector is an actively cooled HgCdTe Focal Plane Array. SO achieves an instrument line profile resolution of 0.15 cm−1, corresponding to a resolving power λ/Δλ of approximately 25000. All details of the instrument are available in [27] and [34]. The orders with the maximum sensitivity to CH4 are: 119, 134 and 136. We will use the data from the beginning of the mission in April 2018 until 15 January 2020, in calibration version 1p0a. Due to temperature change, the spectral registration varies, producing a shift up to ~10 spectels.
We corrected it by aligning the full dataset to a reference spectrum (arbitrarily chosen as the one with the maximum band depth of water) by cross-correlation. No sub-spectel resampling has been performed, only a simple shift. When the calibration is improved, this step will most probably be replaced by a routine correction. The data are available on the ESA/Planetary Science Archive after a 6-month embargo period.
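The integer-shift alignment described above can be illustrated with a short sketch. This is not the authors' code: the function name `align_to_reference`, the ±10 spectel search window (taken from the shift amplitude quoted in the text), the circular shift via `np.roll` and the synthetic spectra are all assumptions made for illustration.

```python
import numpy as np

def align_to_reference(spectrum, reference, max_shift=10):
    """Align a spectrum to a reference by an integer spectel shift.

    The shift maximizing the cross-correlation is searched within
    +/- max_shift spectels; no sub-spectel resampling is done.
    np.roll wraps around the edges, which is a simplification.
    """
    shifts = np.arange(-max_shift, max_shift + 1)
    ref = reference - reference.mean()
    spec = spectrum - spectrum.mean()
    scores = [np.dot(ref, np.roll(spec, s)) for s in shifts]
    best = int(shifts[int(np.argmax(scores))])
    return np.roll(spectrum, best), best

# Hypothetical example: a reference with two absorption lines,
# and an observation shifted by 4 spectels with a little noise.
rng = np.random.default_rng(0)
k = np.arange(320)
reference = (1.0
             - 0.3 * np.exp(-0.5 * ((k - 80) / 2.0) ** 2)
             - 0.2 * np.exp(-0.5 * ((k - 150) / 2.0) ** 2))
observed = np.roll(reference, 4) + rng.normal(0.0, 0.001, k.size)
aligned, shift = align_to_reference(observed, reference)
```

A circular shift is acceptable here only because the hypothetical lines sit far from the edges; on real spectra the wrapped spectels would have to be masked or discarded.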
3. Method
In this section, we first describe the data pretreatment required for non-negative matrix factorization purposes, followed by the data mining method.
3.1. Data pretreatment
After calibration, the NOMAD SO spectra are in transmittance $T = I/I_0$, depending on wavenumber $\nu$, with $I$ the observed light intensity through the atmosphere and $I_0$ the solar spectrum measured outside the atmosphere.
Assuming that the atmosphere is homogeneous, and that multiple scattering and refraction are negligible [4,31], the optical depth $\tau$ is a linear combination of $E(\nu)$, the total extinction, and $\varsigma$, the slant column density, for each chemical species $i$:
$$\tau(\nu) = -\log T(\nu) \approx \sum_{i=1}^{N_S} E_i(\nu) . \varsigma_i + M_C(\nu) \qquad (1)$$
with $N_S$ the total number of species and $M_C(\nu)$ a modeled continuum described below.
The slant column density $\varsigma$ is directly related to the total number of particles $N(s)$ along the line of sight $s$:
$$\varsigma = \int N(s)\, ds \qquad (2)$$
While the extinction by gas is usually highly structured, absorption by particles, scattering by molecules and particles, and also reflection at the surface are broadband features. Such large features are modeled by a continuum $M_C(\nu)$, often taken as a polynomial, that is filtered out.
The problem with this continuum removal rationale is that when the optical depth is large, the SNR is decreased and the noise effect on continuum removal is amplified (see Sup. Mat.).
Instead of using this rationale, we propose to first correct for the continuum $C(\nu)$ in the transmittance space:
$$T^{*}(\nu) = T(\nu) - C(\nu) \qquad (3)$$
Then convert the spectra into absorbance:
$$X(\nu) = 1 - T^{*}(\nu) \qquad (4)$$
The final step is the linear mixture:
$$X(\nu) \approx \sum_{i=1}^{N_S} S_i(\nu) . A_i \qquad (5)$$
with $S_i(\nu)$ the source spectra and $A_i$ the spectral abundance. In this description, the physical meaning of $S_i(\nu)$ and $A_i$ is lost but the apparent SNR is dramatically increased, which is much more important for our analysis. Nevertheless, the assumptions required in Eq. (1) are usually not met. The radiative transfer model used for precise quantification is highly non-linear.
One has to consider that this unsupervised linear unmixing problem is already very difficult for machine learning. Solving a non-linear model in an unsupervised way is a research area that is clearly not solved yet. In addition, we would like to focus on spectral detection, rather than quantification. Thus, we will focus on $S(\nu)$ much more than $A$. We will show that for linear, but also non-linear simulation and real data, meaningful $S(\nu)$ can be retrieved. Due to non-linearity, $A$ may differ significantly from truth, but the main trends should be respected. After the quick-look analysis, estimating $S_i$ and $A$, one must go back to the real data. The most trivial strategy is to pick the spectra $X$ out of the collection with the highest abundance of a selected source $S_i$.
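The continuum $C(\nu)$ of Eq. (3) is estimated in this work by asymmetric least squares [7]. The sketch below is a generic dense-matrix version of that technique, not the authors' implementation: the function name `als_continuum` and the synthetic transmittance are assumptions, and the defaults simply reuse the values quoted in this section (smoothness $10^3$, $p = 1 - 10^{-2}$, 10 iterations).

```python
import numpy as np

def als_continuum(t, smooth=1e3, p=0.99, n_iter=10):
    """Smooth continuum of a 1-D spectrum by asymmetric least squares."""
    n = t.size
    # Second-order difference operator, shape (n - 2, n).
    D = np.diff(np.eye(n), n=2, axis=0)
    penalty = smooth * (D.T @ D)
    w = np.ones(n)
    for _ in range(n_iter):
        # Solve the weighted, smoothness-penalized least-squares system.
        c = np.linalg.solve(np.diag(w) + penalty, w * t)
        # Points above the current estimate get weight p, the others 1 - p,
        # so with p close to 1 the curve follows the upper envelope.
        w = np.where(t > c, p, 1.0 - p)
    return c

# Hypothetical transmittance: sloped continuum, two absorption lines, noise.
rng = np.random.default_rng(0)
k = np.arange(320)
true_continuum = 0.96 + 1e-4 * k
lines = 0.05 * (np.exp(-0.5 * ((k - 100) / 1.5) ** 2)
                + np.exp(-0.5 * ((k - 220) / 1.5) ** 2))
T = true_continuum - lines + rng.normal(0.0, 0.001, k.size)
C = als_continuum(T)
```

A dense solve is used because a spectrum has only 320 spectels; for much longer spectra a sparse banded solver would be the usual choice.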
In the following, we will use the continuum estimation $C(\nu)$ using asymmetric least squares [7], with parameters: $\nu_{smooth} = 10^{3}$ and $p = 1 - 10^{-2}$, with 10 iterations.
3.2. Nonnegative matrix factorization
For a collection of spectra, Eq. (5) can be written in matrix form $X_{kj} \approx S_{ki} . A_{ij}$, with $i$ the source index (from 1 to $N_S$), $j$ the observation index (from 1 to $N_O$) and $k$ the wavenumber index (from 1 to $N_\nu$). Thus, one has to estimate $S$ and $A$ by minimizing the objective function:
$$F = \| X - S.A \|^{2} \qquad (6)$$
with $\|.\|$ the Frobenius norm (usual L2 norm).
Several algorithms have been proposed to solve this problem, subject to positivity (both $S$ and $A$ are non-negative). Such a problem is called Nonnegative Matrix Factorization (NMF). This constraint is important to keep the physical meaning, but also to promote sparsity of $S$ (a signal is sparse when most of the values are close to zero except several non-zero values). Let $\dot{S}$ and $\dot{A}$ be the estimation of those quantities.
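To make the minimization of Eq. (6) under positivity concrete, here is a minimal sketch of the basic multiplicative updates of Lee and Seung. It does not include the acceleration of Gillis and Glineur and is not the implementation used in this work; the function name `nmf_mu`, the random initialization, the fixed iteration count, the small constant `eps` and the synthetic data are assumptions.

```python
import numpy as np

def nmf_mu(X, n_sources, n_iter=500, eps=1e-12, seed=0):
    """Factorize a non-negative X (N_nu x N_O) as S @ A with S, A >= 0."""
    rng = np.random.default_rng(seed)
    n_nu, n_obs = X.shape
    S = rng.random((n_nu, n_sources)) + eps
    A = rng.random((n_sources, n_obs)) + eps
    history = []
    for _ in range(n_iter):
        # Multiplicative updates keep every entry non-negative.
        A *= (S.T @ X) / (S.T @ S @ A + eps)
        S *= (X @ A.T) / (S @ (A @ A.T) + eps)
        # Objective of Eq. (6): squared Frobenius norm of the residual.
        history.append(np.linalg.norm(X - S @ A) ** 2)
    return S, A, np.array(history)

# Hypothetical noise-free data: two Gaussian-shaped sources, random abundances.
rng = np.random.default_rng(1)
k = np.arange(60)
S_true = np.column_stack([np.exp(-0.5 * ((k - 20) / 3.0) ** 2),
                          np.exp(-0.5 * ((k - 40) / 3.0) ** 2)])
A_true = rng.random((2, 200))
X = S_true @ A_true
S_est, A_est, hist = nmf_mu(X, n_sources=2)
```

The result depends on the random initialization, which is the sensitivity the text attributes to MU.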
MU. We propose to use the Multiplicative Updates (MU) of Lee and Seung [20] accelerated by Gillis and Glineur [13]. We used the convergence parameter $\alpha_{MU} = 1$. Other alternative algorithms are possible but give equivalent results since they minimize the same cost function. This algorithm has the advantage of very fast computation time but the result may depend on initialization.
BPSS2. We propose to test another kind of algorithm: the Bayesian Prior Source Separation [6,24], which has been optimized [29], hereafter called BPSS2. The main advantage of this algorithm is that it accounts for an extra constraint: the sum-to-one or sum-lower-than-one on the abundances ($\sum_i A_{ij} = 1$), which also promotes sparsity of $S$. This algorithm, based on a Monte Carlo approach, is much more time consuming. One approach to reduce the computation time is to select only relevant spectra out of the dataset [25], but then the statistics may be biased [29]. Thanks to the advances of computer capabilities, we propose to treat the full dataset. This kind of algorithm is very slow but since the formulation is Bayesian, it converges toward a unique solution.
psNMF. In order to regularize the problem of Eq. (6), one can add an extra penalization term to enforce sparsity on $A$ (only few nonzero elements in $A$) [18]:
$$F = \| X - S.A \|^{2} + \lambda \| A \|_{1} \qquad (7)$$
with $\|.\|_{1}$ the L1 norm. The first term is called the data attachment term (the usual squared difference). The second is called the regularization term. The problem with this approach is that the hyperparameter $\lambda$ is not known and has to be tuned manually. A recent approach has been proposed to solve this problem in the Bayesian framework [17]. The main idea is to encompass all variables and hyperparameters in a unique problem that is estimated with a variational update principle. We will refer to this algorithm as probability sparse NMF (psNMF). This algorithm has the advantage of a reduced computation time and no hyperparameter tuning. It also has a regularization term to avoid a strong dependence of the final solution on the initialization.
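In order to estimate the precision of the reconstruction, we used the Root Mean Square Difference RMSD:
$$\mathrm{RMSD} = \frac{\sqrt{\left\langle \left( X - \dot{S}.\dot{A} \right)^{2} \right\rangle}}{\langle X \rangle} \qquad (8)$$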
.,themean.Oncethesourcesareestimated,wequantifytheirrelevancefor the global dataset. From the total reconstruction X˙k j=S˙ki.A˙i j, for alli,wecan estimatethecontributionofsourcei,that istosay:
X˙ik j=S˙ki.A˙ij.Thus,therelevanceofsourceiisdefinedas:
Ri=
|
X˙i−X˙|
X˙ (9)ThisdefinitionisconvenientsincethesumofallRiisone(this property isonly presentwhen sources andabundances are posi- tive)andwecaneasilyestimatethe%contributionofeachsource inthe final reconstruction. One hasto note that relevance is not ameasureofpresenceornotofaminorspecie(forinstanceCH4) but a measure of how important is the source over the dataset.
Major species,should always havea larger relevance than minor species.Inthefollowing,weplotallsourcesresultsbydecreasing orderofrelevance.
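As an illustration of these two diagnostics, the sketch below computes a relative RMSD and a per-source relevance for a given factorization. Both normalizations are assumptions about how Eqs. (8) and (9) are evaluated: the RMSD is taken as the root mean square residual divided by the mean of the data, and the relevance as the share of each source in the mean reconstruction, which is a quantity that sums to one for non-negative factors as the text requires. The function names and the random test matrices are hypothetical.

```python
import numpy as np

def rmsd_relative(X, S, A):
    """Root mean square residual of X - S @ A, relative to the mean of X."""
    return np.sqrt(np.mean((X - S @ A) ** 2)) / np.mean(X)

def relevance(S, A):
    """Share of each source in the mean of the total reconstruction.

    Assumed reading of the relevance: mean(X_i) / mean(X), where X_i is
    the contribution of source i. Sums to one when S and A are >= 0.
    """
    total = np.mean(S @ A)
    return np.array([np.mean(np.outer(S[:, i], A[i, :])) / total
                     for i in range(S.shape[1])])

# Hypothetical non-negative factors: 3 sources, 50 spectels, 40 observations.
rng = np.random.default_rng(0)
S = rng.random((50, 3))
A = rng.random((3, 40)) * np.array([[1.0], [0.1], [0.01]])
X = S @ A
R = relevance(S, A)
order = np.argsort(R)[::-1]  # sources by decreasing relevance
```

Sorting with `order` reproduces the convention of plotting sources by decreasing relevance.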
3.3. Band depth (BD)
We used the following band depth definition, the difference between the geometric mean of two reference wavenumbers in the continuum and the band:
$$BD = X(\nu_l)^{\frac{\nu_c - \nu_l}{\nu_r - \nu_l}} . X(\nu_r)^{\frac{\nu_r - \nu_c}{\nu_r - \nu_l}} - X(\nu_c) \qquad (10)$$
with $X$ the observed spectra in transmittance, $\nu_c$ the wavenumber of the center of the band, $\nu_l$ the wavenumber of the reference level on the left (smaller wavenumber), and $\nu_r$ the wavenumber of the reference level on the right (larger wavenumber).
4. Synthetic tests
We simulated several synthetic observations in different conditions, to mimic the case of NOMAD-SO. The first section describes a simple toy model example and the second one presents extensive tests of this toy model with various cases. By hidden spectra, hidden compounds and hidden CH4, we always refer to a spectral dataset with a dominant major component (here water) and a minor species (here CH4). The goal of the proposed approach is to pick up a source containing CH4 only.
4.1. Toy example
4.1.1. Synthetic dataset
In order to demonstrate the usefulness of our method, we propose here a toy example in a very difficult case. We will see that usual methods fail at detection but our method is able to detect the hidden compounds.
For this toy example, we simulate a linear mixture of $N_O = 10^{4}$ observations spanned over $N_\nu = 320$ spectels (see Fig. 1), similar to order 136 of NOMAD-SO. Each spectrum is a mixture of a spectrum of water vapor $S_{H_2O}$ (coming from one actual source estimated from real data using psNMF) and theoretical methane $S_{CH_4}$ from Villanueva et al. [35], with corresponding abundances $A_{H_2O}$, $A_{CH_4}$:
$$X = S_{H_2O}.A_{H_2O} + S_{CH_4}.A_{CH_4} + n \qquad (11)$$
The noise $n$ is assumed to be a Gaussian process with a standard deviation of $\sigma = 0.001$ and no bias: $n = G(0, \sigma)$. All spectra contain pure water vapor with a coefficient following $A_{H_2O} = 5/6.\beta(1,10) + 1/6.U(0,1)$, a mixture of a beta ($\beta$) distribution for 5/6 of the sample and a uniform ($U$) distribution for 1/6 of the sample. This process mimics well the water vapor band depth
Fig. 1. Synthetic dataset containing $10^{4}$ spectra with various abundances of H2O and 100 containing CH4 at 3-σ level of the noise. In blue the reference spectra $S_{H_2O}$ of H2O (coming from actual data analysis). In red the reference spectra $S_{CH_4}$ of CH4 (from theoretical data).
Fig. 2. Water vapor Band Depth distribution (left) in the real observation (right) modeled by the toy example.
distribution (BD, see definition in Section 3.3) of the real dataset (see Fig. 2). As the baseline of $S_{H_2O}$ is not zero, we also mimic baseline correction errors. In addition, 100 spectra out of 10,000 contain methane with $A_{CH_4} = 1$, such that the band depth of $S_{CH_4}$ is at 3-$\sigma$ level. Please note that the model to generate the data is not fulfilling the sum-to-one constraint, but fully fulfilling the positivity constraint. Given the defined noise and signal level, the RMSD expected for a perfect reconstruction of the signal (and not the noise) is 0.16.
The final synthetic dataset is represented in Fig. 1.
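A dataset with the same statistical structure as Eq. (11) can be generated in a few lines. The sketch below is only illustrative: the two source spectra are hypothetical Gaussian lines standing in for the real water source and the theoretical methane spectrum, and the 5/6 versus 1/6 mixture is drawn at random per observation rather than as an exact split.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_nu, sigma = 10_000, 320, 0.001
k = np.arange(n_nu)

def line(center, width, depth):
    """Hypothetical Gaussian absorption line (placeholder for a real source)."""
    return depth * np.exp(-0.5 * ((k - center) / width) ** 2)

# Placeholder sources: water with a non-zero baseline, methane at 3 sigma.
S_h2o = line(60, 2.0, 0.05) + line(200, 2.0, 0.03) + 0.002
S_ch4 = line(145, 1.5, 3 * sigma)

# Water abundance: beta(1, 10) with probability 5/6, uniform(0, 1) otherwise.
use_beta = rng.random(n_obs) < 5 / 6
A_h2o = np.where(use_beta, rng.beta(1, 10, n_obs), rng.uniform(0, 1, n_obs))

# Methane hidden in 100 spectra out of 10,000, with unit abundance.
A_ch4 = np.zeros(n_obs)
A_ch4[rng.choice(n_obs, size=100, replace=False)] = 1.0

# Linear mixture plus unbiased Gaussian noise, stored as (N_nu, N_O).
X = (np.outer(S_h2o, A_h2o) + np.outer(S_ch4, A_ch4)
     + rng.normal(0.0, sigma, (n_nu, n_obs)))
```

The matrix is stored with wavenumber along the rows and observations along the columns, matching the index convention $X_{kj}$ of Section 3.2.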
In order to check the quality of the estimation, we simply compute the correlation coefficient between $S_{CH_4}$ and the estimated $N_S$ sources $\dot{S}$, using:
$$Q = \mathrm{corr}\left( S_{CH_4}, \dot{S}_{:i} \right) \qquad (12)$$
The $i$th source with the maximum correlation is identified as the CH4 contribution. The value of the maximum correlation is used as a metric to assess the quality of the retrieval.
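The quality metric of Eq. (12) amounts to a column-wise Pearson correlation followed by an argmax. A minimal sketch follows; the function name `best_source_quality` and the synthetic sources are hypothetical.

```python
import numpy as np

def best_source_quality(S_ref, S_est):
    """Return (index, Q) of the estimated source best correlated with S_ref.

    S_ref has shape (N_nu,), S_est has shape (N_nu, N_S).
    """
    q = np.array([np.corrcoef(S_ref, S_est[:, i])[0, 1]
                  for i in range(S_est.shape[1])])
    i_best = int(np.argmax(q))
    return i_best, float(q[i_best])

# Hypothetical estimated sources: a water-like line, a noisy scaled copy
# of the reference, and a pure-noise source.
rng = np.random.default_rng(0)
k = np.arange(320)
S_ref = np.exp(-0.5 * ((k - 160) / 2.0) ** 2)
S_est = np.column_stack([
    np.exp(-0.5 * ((k - 60) / 2.0) ** 2) + 0.01 * rng.normal(size=k.size),
    0.5 * S_ref + 0.01 * rng.normal(size=k.size),
    0.01 * rng.normal(size=k.size),
])
i_best, q_best = best_source_quality(S_ref, S_est)
```

Because the correlation coefficient is scale invariant, a source that recovers the right line shape scores well even when its amplitude is off, which fits the focus on detection rather than quantification.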
Fig. 3. (left) Histogram of Band Depth at 3067.2 cm−1 from the dataset containing 100 CH4 at 3-σ level out of $10^{4}$ spectra. (right) 100 spectra with the maximum Band Depth at 3067.2 cm−1 specific of CH4. Signal is dominated by water and by noise. No specific signature of CH4 is visible.
4.1.2. Results
By plotting the 10,000 samples of the dataset, one is able to easily identify the H2O bands. Nevertheless, we cannot observe the target CH4 in the average spectrum, even at 3-σ level, because it is lost in the baseline changes.
The second simple tool for detection would be the analysis of the band depth. Fig. 3 (left) shows the histogram of the main CH4 band, which exhibits no sign of the presence of CH4 (no asymmetry in the positive part). Fig. 3 (right) represents the 100 spectra with the maximum CH4 BD at 3067.2 cm−1. Again, no particular elements can be used to argue for detection.
Fig. 4 represents the results from the non-negative matrix factorization using the psNMF algorithm. One can clearly identify both H2O and CH4 sources. Since those 2 chemical compounds are not correlated in abundance ($A_{H_2O}$ and $A_{CH_4}$ are independent), two different source spectra are identified. Please note that the relevance of source 4 is very low (0.4%), meaning that only 0.4% of the variability in the dataset is due to CH4, a very low value, as expected for minor species.
In this case, the correlation coefficient between estimated abundances $\dot{A}_{4:}$ and true ones $A_{CH_4}$ is 0.73. Since the quantification of abundance is a more difficult problem, we will not pay excessive attention to this parameter.
4.1.3. Convergence and computation time
We set the MU algorithm convergence to a relative difference of the cost function $< 10^{-8}$ and a maximum running time of 1000 seconds. For psNMF, we set the relative difference of the cost function to $< 10^{-7}$ and the maximum number of iterations to 2000. For BPSS2, we compute a minimum burn-in of 1000 iterations and after that, when the long term statistics (1000 last iterations) of the Markov Chain is close to the short term statistics (100 last iterations), convergence is considered to be reached. Then another 1000 iterations are computed to estimate the final solution statistics.
We run the 3 identified tools 10 times on the same dataset with different noise realizations, and compute the mean and standard deviation from these 10 experiments. Results are presented in Table 1. One can clearly see that even if the convergence is set, there is a high variability in MU results, due to the lack of regularization. On this particular example, the best is clearly the psNMF algorithm.
Table 1
Results (mean and standard deviation) from 10 realizations of a toy synthetic example with $N_S = 5$ (in agreement with next section on synthetic tests), $N_O = 10000$, $N_\nu = 320$ and 300 CH4 spectra hidden at a level of 1 std of the noise. Quality is computed as a correlation coefficient (see Eq. (12)). RMSD is computed from Eq. (8). Computation time is expressed in seconds.
                     | MU               | psNMF            | BPSS2
Quality Q            | 0.35 ± 0.12      | 0.822 ± 0.005    | 0.41 ± 0.06
RMSD relative error  | 0.1455 ± 2·10^−6 | 0.1461 ± 5·10^−6 | 0.1468 ± 3·10^−4
Computation time (s) | 13 ± 8           | 46 ± 9           | 413 ± 21
The RMSD is computed for all cases and shown in Table 1. We can observe that the value is almost equivalent, around 0.146, for all methods but MU is slightly better, due to the fact that its cost function has no other term. The MU algorithm is just minimizing the reconstruction. As a comparison, the RMSD expected for a perfect reconstruction of the signal (and not the noise) of this toy example is 0.16. With 5 sources (significantly more than the 3 sources defined in this toy example), noise is also encompassed within the approximated linear model, as expected.
The quality Q is the only parameter to assess the ability of the algorithm to detect minor species (here CH4). In this particular toy example, psNMF seems to be the best algorithm, providing a source correlated with ground truth CH4 with a correlation coefficient up to 0.8. We will extensively test this performance in the next section.
We also estimate the computation time on a 2.9 GHz Intel Core i7 with 16 GB DDR3 RAM as an example. All algorithms are implemented in Matlab using parallelized matrix computation. Results, presented in Table 1, demonstrate that MU is faster than psNMF but both are clearly less resource consuming than BPSS2. Given its computation time and efficiency, we excluded BPSS2 from the next tests.
4.2. Extended synthetic tests
For the first set of tests, we used the same toy model described in Section 4.1, except with 100 CH4 spectra hidden at a level of 2 and 3 standard deviations of the noise (this number is called “factor above noise level”). In order to have robust results, we made 10 realizations and averaged the results.
Fig. 4. Results of the psNMF algorithm for $N_S = 4$. Sources 1 and 3 are identified to the level with significant noise contribution, source 2 is identified to H2O (correlation coef. with ground truth 0.99), and source 4 is CH4 (correlation coef. with ground truth 0.98). Relevance is computed from Eq. (9).
Fig. 5 represents the results as a function of the number of sources $N_S$. It presents two quality indicators of the results: the average correlation coefficient Q (see Eq. (12)) and the fraction of realizations with acceptable results (with Q > 0.5). We can observe that psNMF is always better than MU on average, at the cost of a higher variability (higher standard deviation). Adding sources seems to always improve the detection until reaching a plateau around $N_S = 5$. Adding more sources will not drastically change the source estimation. Nevertheless, a larger number of sources requires more computation time (approximately x2 between 3 and 9 sources, but the computation time always stays below 200 seconds).
For the second set of tests, we used the same toy model, except with 50 and 100 CH4 spectra hidden at a level of 0.7, 1, 1.2, 1.5, 2.0, 2.5 and 3 standard deviations of the noise (this number is called “factor above noise level”). In order to have robust results, we made 10 realizations and averaged the results. Results are always with RMSD < 0.18 with an average ~0.16. RMSD from the noise level is 0.16 whatever the experiment (the CH4 is low enough so that its contribution to RMSD is negligible), so the reconstruction is on average as expected.
Fig. 6 presents two quality indicators of the results: the average correlation coefficient Q (see Eq. (12)) and the fraction of realizations with acceptable results (with Q > 0.5). Both indicators show that psNMF clearly outperforms MU at high factor above noise level. From our visual inspection of the results, we define the detection limit as the level at which at least 50% of the results have Q > 0.5 (correlation coefficient > 0.5). This definition is debatable but there is no absolute way of defining it. Fig. 6 shows that the detection limit is at a factor above noise level of 1.5 for the 100 hidden spectra case, and around 2 for 50 hidden spectra. Below this limit, none of the methods is able to detect the CH4 spectra from the noise. For 20 hidden spectra, even at a factor above noise level of 3, none of the methods is able to detect the CH4 spectra. One can also note that psNMF is less stable since the standard deviation is much larger.
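The detection-limit criterion stated above (at least 50% of realizations with Q > 0.5) can be written as a small helper. The function name `detection_limit` and the Q values in the example are hypothetical; they are not results from this study.

```python
def detection_limit(q_by_level, q_min=0.5, min_fraction=0.5):
    """Lowest factor above noise level at which detection is declared.

    q_by_level maps each factor above noise level to the list of quality
    values Q obtained over the realizations. A level counts as detected
    when at least min_fraction of its realizations have Q > q_min.
    Returns None when no level qualifies.
    """
    for level in sorted(q_by_level):
        qs = q_by_level[level]
        fraction = sum(q > q_min for q in qs) / len(qs)
        if fraction >= min_fraction:
            return level
    return None

# Hypothetical Q values for three noise factors and four realizations each.
q_example = {
    1.0: [0.1, 0.2, 0.6, 0.3],
    1.5: [0.7, 0.6, 0.2, 0.4],
    2.0: [0.9, 0.8, 0.7, 0.3],
}
limit = detection_limit(q_example)
```

Both thresholds are left as parameters because, as the text notes, the definition is debatable.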
5. Simulation of NOMAD-SO
5.1. Simulation dataset
This second dataset has been generated with the most precise direct model, taking into account the full non-linear radiative transfer and instrumental effects to produce synthetic transmittance, highly comparable with actual observations. Synthetic transmittances were made for real NOMAD-SO observation files using the relevant geometry and instrument parameters to attempt to include the variability inherent in the true measurements.
Model atmospheres for each occultation were developed from the GEM-Mars general circulation model [5,26]. The output of the model was provided for 1 Martian day every 10° of solar longitude, and 48 time steps per Martian day. Atmospheric profiles were developed for each occultation by interpolating the model temperature and pressure to the solar longitude, local solar time, latitude, longitude, and tangent altitude relative to the areoid.
To construct the simulated transmittance spectra, the high resolution irradiances were computed for each occultation assuming spherical symmetry and the tangent atmosphere developed from GEM-Mars for several different abundances of methane and water, which were simulated as constant volume mixing ratios. The spectroscopic data for methane and water were taken from HITRAN 2016 using CO2 broadening [10,11,15]. The instrument forward model was then applied to each simulation by considering the AOTF bandpass, Instrument Line Shape (ILS), blaze function, spectel to wavenumber calibration, and the contribution of light coming from the main order and nearby orders [27,33,34]. The final synthetic transmittance spectrum is the ratio of this low-resolution irradiance to the top-of-atmosphere low-resolution irradiance.
Fig. 5. Results of the MU and psNMF algorithm for $N_S = 3$ to 9, $N_O = 10000$, $N_\nu = 320$, as a function of the number of sources. The average Q of 10 realizations of the best estimated source (thick lines and standard deviation in thin lines) and the fraction of acceptable results (with Q > 0.5). (left) with a factor above noise level of 2 (right) with a factor above noise level of 3.
Fig. 6. Results of the MU and psNMF algorithm for $N_S = 5$, $N_O = 10000$, $N_\nu = 320$, as a function of the factor above noise level. The average Q of 10 realizations of the best estimated source (thick lines and standard deviation in thin lines) and the fraction of acceptable results (with Q > 0.5). (left) with 100 hidden CH4 spectra (right) with 50 hidden CH4 spectra.
The AOTF/echelle instrument was modeled using the latest available calibration [1,21], considering order addition from +/−2 nearby orders (5 total). The spectral calibration of NOMAD-SO varies because it is affected by the instrument temperature, and is provided for each individual NOMAD spectrum. The 320 spectels cover the range 3056.1 cm−1 to 3080.4 cm−1 with a wavenumber step of 0.0763 cm−1.
No simulation of dust has been performed. Due to the limited spectral range on a single order, about 25 cm−1, the major effect of dust and other aerosols is a relatively flat baseline, which we remove at the pre-treatment of the spectra. When dust is optically thick, non-linearities may appear that are out of the scope of this simulation.
The simulation dataset consists of 12,486 spectra, simulating observations of order 136 in the same configuration as the 106 solar occultations actually observed from May to December 2018.
Fig. 7. Results of the psNMF algorithm for $N_S = 5$ on simulation dataset, averaged over 10 noise realizations, for different noise levels (0.001 and 0.0001) and different fractions of hidden CH4 (1%, 5%, 10%, 100%). Hidden CH4 are taken within the same orbital sequences. The left panels represent results for 10 ppm of water vapor and the right ones for 100 ppm of H2O. From top to bottom, we show: a) Fraction of the 4 main CH4 peaks detected in the best source; b) Mean distance to the expected center in spectel and c) Abundance of CH4 in the source $\alpha_{CH_4}$. Please note that the absence of plotted data means that no source was successfully detected.
Table 2
Simulation parameters. Fraction of CH4 is the fraction of spectra containing methane hidden in the simulation dataset.
Parameter | CH4 [ppt]         | H2O [ppm]  | fraction of CH4 [%] | noise level
Value     | 0; 100; 500; 1000 | 0; 10; 100 | 1; 5; 10; 50; 100   | 0.001; 0.0001
We add to the dataset a random noise with standard deviation of 0.001 and 0.0001 in order to simulate the instrumental noise (corresponding to SNR of 100 and 1000 approximately).
We hide spectra containing CH4 in a fraction of the total number of spectra from 1% to 100% in a random manner. In real observations, CH4 may be spatially / temporally coherent but the number of scenarios is infinite. We feel that the random case is interesting enough to be tested. One has to note that, contrary to the previous toy model of Section 4, here abundances are quantitative abundances in the atmosphere.
The simulation parameters are summed up in Table 2.
5.2. Detection limits
We applied the psNMF method with $N_S = 5$, which is the most promising one from the previous analysis. We compute the analysis 10 times for 10 different random noise realizations and average the results in order to present robust conclusions. We select a pure CH4 and a pure H2O spectrum (noted $P_{CH_4}$ and $P_{H_2O}$) from the simulation as reference spectra.
5.2.1. Methods to analyze the results
The main difference with the toy model of Section 4 is that H2O and CH4 may be highly mixed in the sources. A simple correlation coefficient to pick the best source is thus not efficient enough. We propose here another approach to estimate the best source.
For each estimated source $\dot{S}_{:i}$, we analyze it as a linear mixture of $P_{H_2O}$ and $P_{CH_4}$:
$$\dot{S}_{:i} = P_{H_2O}.\alpha_{H_2O,i} + P_{CH_4}.\alpha_{CH_4,i} \qquad (13)$$
This problem is called a supervised detection algorithm since $P_{H_2O}$ and $P_{CH_4}$ are known, contrary to the general one, presented in Eq. (5), where source spectra are not known. The source $i^{*}$ with the maximum $\alpha_{CH_4,i^{*}}$ is selected as the best target CH4 source, called best source hereafter.
We then propose to use three indicators of good detection:
• Fraction of the 4 main CH4 peaks detected (at 3057.7, 3063.4, 3067.2 and 3076.6 cm−1). This is computed using the peak detection algorithm from Matlab on both the simulation and the best source, with a tolerance of 2 spectels, i.e. detected peaks can be 2 spectels off the expected one. To be considered significant, a peak must have a maximum amplitude larger than 1/1000 of the maximum of the best source S_{i∗}. Note that even though there are only 5 possible fractions (0, 0.25, 0.5, 0.75 and 1), any value can appear since we average over 10 realizations.
• Mean distance to the expected center. Mean distance, in spectels, between the CH4 peaks detected in the best source and the reference ones.
• Abundance of CH4 in the source. α_CH4 (from Eq. (13)), which describes the amplitude of the CH4 peaks in the best source.
Fig. 8. Results of the psNMF algorithm for the diffraction order 119 for N_S = 5. Sources 1, 3 and 5 are identified as CO2 (shift of 0.01 for clarity). Source 2 is identified as the background level (continuum misestimation). Source 4 is identified as H2O. No source seems to be related to CH4.
F. Schmidt, G.C. Mermy, J. Erwin et al. Journal of Quantitative Spectroscopy & Radiative Transfer 259 (2021) 107361
Fig. 9. Results of the psNMF algorithm for the diffraction order 134 for N_S = 5. Source 1 is identified as the background level (continuum misestimation); sources 3, 4 and 5 are identified as H2O. Source 2 presents unmodeled lines that are not present in the spectroscopic database. These lines were first detected in the ACS instrument data and attributed to a CO2 magnetic dipole transition [32]. No source seems to be related to CH4.
Fig. 10. Results of the psNMF algorithm for the order 136 for N_S = 5. Source 1 is identified as the background level (continuum misestimation); sources 2, 3, 4 and 5 are identified as H2O, either directly or from the adjacent orders. No source seems to be related to CH4.
5.2.2. Analysis of the results
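The first two indicators defined above lend themselves to a short sketch. The snippet below is only an illustration under stated assumptions and is not the Matlab peak detection routine used for this study: a source spectrum is stored as a plain list, the expected peak positions are given as spectel indices, and a peak is any local maximum above the 1/1000 amplitude threshold. The names `find_peaks` and `detection_indicators` and the toy spectrum are hypothetical.

```python
def find_peaks(spectrum, min_height):
    """Indices of local maxima whose value exceeds min_height."""
    return [
        i
        for i in range(1, len(spectrum) - 1)
        if spectrum[i] > spectrum[i - 1]
        and spectrum[i] >= spectrum[i + 1]
        and spectrum[i] > min_height
    ]


def detection_indicators(source, expected_idx, tol=2, rel_height=1e-3):
    """Return (fraction of expected peaks found, mean offset in spectels)."""
    # Significance threshold: 1/1000 of the maximum of the source.
    peaks = find_peaks(source, rel_height * max(source))
    offsets = []
    for expected in expected_idx:
        # Keep only detected peaks within the tolerance of this expected line.
        near = [abs(p - expected) for p in peaks if abs(p - expected) <= tol]
        if near:
            offsets.append(min(near))
    fraction = len(offsets) / len(expected_idx)
    mean_offset = sum(offsets) / len(offsets) if offsets else float("nan")
    return fraction, mean_offset


# Toy source: 4 expected lines, one exact, one shifted by 1 spectel,
# one missing, and one shifted by 3 spectels (outside the tolerance).
source = [0.0] * 100
source[20], source[41], source[83] = 1.0, 0.5, 0.8
expected = [20, 40, 60, 80]
fraction, mean_offset = detection_indicators(source, expected)
```

In this toy case two of the four expected lines are matched, so the fraction is 0.5 for a single realization; averaging such fractions over several noise realizations is what produces intermediate values.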
Fig. 7 summarizes all the results. The fraction of the 4 main CH4 peaks detected in the most relevant source always has a standard deviation < 0.43 and a mean value of 0.06 over the 10 realizations. The mean distance to the expected center always has a standard deviation < 0.40 and a mean value of 0.07 over the 10 realizations. The abundance of CH4 in the source always has a standard deviation < 0.05 and a mean value of 0.005 over the 10 realizations.
This figure shows that the detection limits clearly depend on the CH4 density, but also on the fraction of hidden CH4 and the noise level, as expected. The maximum abundance of CH4 in the source, α_CH4, is 25%, meaning that in all cases H2O dominates the best source, so both CH4 and H2O are present in each best source. This is because CH4 is a minor species (as expected from the conditions of our simulation) whose absorption band generally follows the air mass, as H2O does. So there is no particular source for CH4 only. When more than two lines are detected, we can consider it a detection. This limit is reached for CH4 ≥ 500 ppt for 10 and 100 ppm of H2O. Nevertheless, the detection limit lies between 100 and 500 ppt in the case of 10 ppm of H2O vapor, since perfect detection (100% of the 4 main CH4 peaks detected) occurs for a fraction of CH4 of 5 to 50%. Interestingly, the optimum detection is not reached when 100% of the spectra contain CH4, but rather between 5 and 50%. This behavior is due to the statistics, which are richer when CH4 is also lacking in certain spectra. When 100% of the spectra contain CH4, the statistical variability of the dataset is mainly due to air mass (the atmosphere is assumed to be well mixed). Both CH4 and H2O then vary together and there are fewer statistics on which to base the detection.
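A toy calculation may help to see why a partial presence of CH4 is more favorable. The sketch below is not taken from the simulation of this paper: it assumes that the column abundance of each species is strictly proportional to the air mass, that air mass values are drawn uniformly between 1 and 10, and that CH4 is simply switched off in the spectra that do not contain it. The function names are hypothetical.

```python
import random


def pearson(u, v):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


def abundance_correlation(fraction_with_ch4, n_spectra=2000, seed=1):
    """Correlation between H2O and CH4 abundances for a given CH4 fraction."""
    rng = random.Random(seed)
    airmass = [rng.uniform(1.0, 10.0) for _ in range(n_spectra)]
    h2o = list(airmass)  # well mixed: abundance follows the air mass
    # CH4 also follows the air mass, but only in a fraction of the spectra.
    ch4 = [a if rng.random() < fraction_with_ch4 else 0.0 for a in airmass]
    return pearson(h2o, ch4)


corr_all = abundance_correlation(1.0)   # CH4 in every spectrum
corr_some = abundance_correlation(0.2)  # CH4 in about 20% of the spectra
```

When CH4 is present everywhere the two abundance vectors are perfectly correlated, so a blind factorization has no variability with which to separate them; with CH4 in only a fraction of the spectra the correlation drops and the two species become distinguishable.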
The noise level does not at first affect the fraction of the 4 main CH4 peaks detected, but it increases the spectral shift of the band center. In addition, it clearly affects the abundance and thus the band depth.
In conclusion, from this simulation analysis, one could expect detection limits of CH4 in the range 100–500 ppt when operating in favorable conditions.
6. Real data analysis
In this section, we report the results on actual NOMAD data, focusing on diffraction orders with potential CH4 lines: 119, 134 and 136, shown respectively in Figs. 8, 9 and 10. We used the 821 ingress and egress transit orbits for order 119, 2358 orbits for order 134 and 703 for order 136. We keep only spectra with SNR > 100. Results are compared with NOMAD simulations [35] using the calibration pipeline. This process adds ghost lines from adjacent orders, as in real data. Table 3 summarizes the relative error and the number of spectra. The approach here is to compute the analysis with psNMF using N_S = 5, in agreement with the previous section.
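For readers unfamiliar with the factorization step, the following minimal sketch shows a non-negative matrix factorization X ≈ A S with the classic multiplicative-update rules for the Frobenius norm. It is a stand-in chosen for brevity and is not the psNMF algorithm applied in this section. The relative error is assumed here to be the Frobenius norm of the residual divided by that of the data, which may differ from the definition of Eq. (8). The data are synthetic and all function names are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf_mu(x, n_sources, n_iter=500, seed=0, eps=1e-12):
    """Factorize x (n_obs x n_chan) as a @ s with non-negative a and s."""
    rng = random.Random(seed)
    n_obs, n_chan = len(x), len(x[0])
    a = [[rng.random() + 0.1 for _ in range(n_sources)] for _ in range(n_obs)]
    s = [[rng.random() + 0.1 for _ in range(n_chan)] for _ in range(n_sources)]
    for _ in range(n_iter):
        # Update the sources: S <- S * (A^T X) / (A^T A S)
        at = transpose(a)
        num, den = matmul(at, x), matmul(matmul(at, a), s)
        s = [[s[k][j] * num[k][j] / (den[k][j] + eps) for j in range(n_chan)]
             for k in range(n_sources)]
        # Update the abundances: A <- A * (X S^T) / (A S S^T)
        st = transpose(s)
        num, den = matmul(x, st), matmul(a, matmul(s, st))
        a = [[a[i][k] * num[i][k] / (den[i][k] + eps) for k in range(n_sources)]
             for i in range(n_obs)]
    return a, s


def relative_error(x, a, s):
    """Assumed metric: ||X - A S||_F / ||X||_F."""
    model = matmul(a, s)
    res = sum((xv - mv) ** 2 for xr, mr in zip(x, model) for xv, mv in zip(xr, mr))
    tot = sum(xv ** 2 for xr in x for xv in xr)
    return (res / tot) ** 0.5


# Synthetic test: two Gaussian-shaped sources mixed with random abundances.
rng = random.Random(42)
s_true = [[math.exp(-0.5 * (j - c) ** 2) for j in range(15)] for c in (4, 10)]
a_true = [[rng.random(), rng.random()] for _ in range(20)]
x = matmul(a_true, s_true)
a_few, s_few = nmf_mu(x, n_sources=2, n_iter=5)
a_fit, s_fit = nmf_mu(x, n_sources=2, n_iter=500)
err_few = relative_error(x, a_few, s_few)
err_fit = relative_error(x, a_fit, s_fit)
```

Running the same factorization for several values of the number of sources and tabulating the relative error gives a table of the same kind as Table 3, although the values will depend on the algorithm and on the error definition.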
Please note that our approach is fully blind: no spectral information has been included in the analysis (nothing about H2O, CO2 or CH4).
For all orders, sources of H2O are estimated, as expected. A source presenting a residual of the continuum is also always present.
Table 3
Number of spectra N_O and RMSD relative errors for N_S = 4 to 10 sources, resulting from the analysis of all observations of NOMAD data up to 15 January 2020 using the psNMF algorithm. RMSD is computed from Eq. (8).
Order        119        134        136
N_O          134,045    365,985    140,064
N_S = 4      0.476      0.575      0.634
N_S = 5      0.456      0.553      0.609
N_S = 6      0.442      0.553      0.585
N_S = 10     0.410      0.484      0.544
Due to non-linearities of the radiative transfer, the acquisition process (temperature dependence) and the wavenumber shift, a molecular species sometimes appears in several different sources.
Order 136 gives 1 source related to the background and 4 sources related to H2O. All 4 water sources show the same peaks, but with different relative intensities and wavenumber shifts.
For order 119, both CO2 and H2O lines are identified (see Fig. 8). Since those two components are uncorrelated, separate sources are found by the algorithm.
Interestingly, order 134 presents a source with unexpected lines. The main lines are at positions 3016.70, 3017.07, 3018.12, 3019.54, 3020.90, 3022.25, 3023.60, 3024.96, and 3027.29 cm−1. These lines have also been detected in the ACS instrument data and attributed to a CO2 magnetic dipole transition [32]. Further analysis shall be done to compare both NOMAD and ACS data.
Solar lines never appear in the sources. They are self-corrected by the calibration, since we do not use a reference solar spectrum but the solar observation acquired during the transit, when the tangent altitude is so high that there is no martian atmosphere along the line of sight (typically > 200 km).
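The self-correction of solar lines can be illustrated with a small numerical sketch. It is not the NOMAD calibration pipeline: it simply assumes that the reference is the average of the spectra acquired above a threshold tangent altitude and that the transmittance is the ratio of each spectrum to that reference. The function name, the 3-channel spectra and the altitudes are hypothetical.

```python
def transmittance(spectra, altitudes_km, ref_altitude_km=200.0):
    """Divide each spectrum by the mean of the spectra taken above ref_altitude_km."""
    ref_rows = [s for s, z in zip(spectra, altitudes_km) if z > ref_altitude_km]
    if not ref_rows:
        raise ValueError("no spectrum above the reference altitude")
    # Channel-wise average of the atmosphere-free solar observations.
    reference = [sum(col) / len(ref_rows) for col in zip(*ref_rows)]
    return [[v / r for v, r in zip(s, reference)] for s in spectra]


# Toy example: a solar line in channel 1, an atmospheric line in channel 2.
solar = [1.0, 0.5, 1.0]
atmosphere = [1.0, 1.0, 0.8]
spectra = [solar, [s * t for s, t in zip(solar, atmosphere)]]
altitudes = [250.0, 30.0]
trans = transmittance(spectra, altitudes)
```

Because the solar line is present in both the reference and the low-altitude spectrum, it cancels in the ratio and only the atmospheric absorption remains.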
None of the analyzed orders presents sources related to CH4.
7. Discussions and conclusion
We implemented a new strategy to analyze spectroscopic datasets. This strategy is fully unsupervised, so that any kind of absorption band can be discovered. The amount of prior information required is thus very low. The computation can be done on regular hardware for the most common databases (~100,000 spectra) and within a reasonable amount of time.
We illustrate the approach for typical atmospheric spectroscopy. We first put forward a synthetic test, based on simple linear mixing, to give a toy example and to identify the most promising algorithm. The psNMF algorithm clearly outperformed MU and BPSS2.
Then we proposed a simulation, based on realistic radiative transfer and instrumental effects, applied to NOMAD-SO spectra. The detection limit goes below 500 ppt in favorable conditions, with reduced H2O and low noise level. The same range of detection limits is reached with the usual approach of model fitting, at a much higher computational cost and analysis effort. Given its simplicity of use, this tool may be relevant for a first look at large and complex datasets. As a perspective, analysis of the residuals after the non-linear retrieval of the data may lower the detection limits. One can then test whether the residuals are simply Gaussian noise, or whether they may contain interesting features.
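One possible way to test whether residuals are compatible with Gaussian noise is sketched below with the Jarque-Bera statistic, which combines the sample skewness and excess kurtosis and approximately follows a chi-squared law with 2 degrees of freedom for Gaussian data. This test is our choice for the illustration and is not used in this paper; the residuals are synthetic and the variable names are hypothetical.

```python
import random


def jarque_bera(values):
    """Jarque-Bera statistic: n/6 * (skewness^2 + excess_kurtosis^2 / 4)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)


# 95% quantile of the chi-squared distribution with 2 degrees of freedom.
CHI2_2DOF_95 = 5.991

rng = random.Random(0)
# Pure Gaussian residuals.
noise = [rng.gauss(0.0, 1.0) for _ in range(5000)]
# Same residuals with a few unmodeled absorption features (8 sigma deep).
with_lines = [v - (8.0 if i % 250 == 0 else 0.0) for i, v in enumerate(noise)]

jb_noise = jarque_bera(noise)
jb_lines = jarque_bera(with_lines)
lines_flagged = jb_lines > CHI2_2DOF_95
```

A statistic far above the threshold, as obtained when a few deep features are left in the residuals, would indicate that something other than Gaussian noise remains and deserves a closer look.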
Interestingly, a molecular species that is not well mixed in the atmosphere is the most easily detected with our approach.
The last section presented the results of the application to real NOMAD-SO data, using orders 119, 134 and 136, selected as they are representative of the baseline measurement strategy of NOMAD, allowing characterization of H2O and potential detection of CH4. The outcome is that no CH4 has been identified, but H2O and CO2 are detected. Interestingly, a set of unexpected spectral lines has been found in the NOMAD data. These lines were first detected in the ACS instrument data and attributed to a CO2 magnetic dipole transition [32]. We thus confirm their presence with our current analysis.
One way to go back to the data is to pick the real spectra with the highest source contribution A. Our quicklook analysis is thus only a starting point for a more complete scientific analysis. This second step will require much more prior information (chemical compounds, fundamental spectroscopic constants, radiative transfer model, ...).
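Going back to the data can be as simple as ranking the observations by the abundance of the source of interest. The sketch below assumes that the abundances are stored as one row per observation and one column per source; this layout, the function name and the toy values are hypothetical.

```python
def top_contributing_spectra(abundances, source_index, k=5):
    """Indices of the k observations where the given source contributes most."""
    column = [row[source_index] for row in abundances]
    # Sort observation indices by decreasing abundance of the chosen source.
    order = sorted(range(len(column)), key=lambda i: column[i], reverse=True)
    return order[:k]


# Toy abundance matrix: 4 observations, 2 sources.
abundances = [[0.1, 0.0], [0.7, 0.2], [0.3, 0.9], [0.5, 0.1]]
best_for_source_0 = top_contributing_spectra(abundances, source_index=0, k=2)
best_for_source_1 = top_contributing_spectra(abundances, source_index=1, k=1)
```

The returned indices point to the original spectra that are the best candidates for a dedicated retrieval with a full radiative transfer model.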
Future work should apply the proposed approach to other datasets, such as other NOMAD-SO orders, or other spectroscopic datasets (including hyperspectral images) from laboratory measurements, ground-based telescopes or space-borne spectrometers. The approach is generic enough to treat any dataset that can be approximated, to first order, by a linear mixture.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
CRediT authorship contribution statement
Frédéric Schmidt: Conceptualization, Software, Methodology, Writing - original draft. Guillaume Cruz Mermy: Software, Writing - review & editing. Justin Erwin: Writing - review & editing. Séverine Robert: Writing - review & editing. Lori Neary: Writing - review & editing. Ian R. Thomas: Writing - review & editing. Frank Daerden: Writing - review & editing. Bojan Ristic: Writing - review & editing. Manish R. Patel: Writing - review & editing. Giancarlo Bellucci: Writing - review & editing. Jose-Juan Lopez-Moreno: Writing - review & editing. Ann-Carine Vandaele: Writing - review & editing.
Acknowledgments
We acknowledge support from the "Institut National des Sciences de l’Univers" (INSU), the "Centre National de la Recherche Scientifique" (CNRS) and "Centre National d’Etudes Spatiales" (CNES) through the "Programme National de Planétologie" and the ExoMars TGO programs. The NOMAD experiment is led by the Royal Belgian Institute for Space Aeronomy (BIRA-IASB), assisted by Co-PI teams from Spain (IAA-CSIC), Italy (INAF-IAPS), and the United Kingdom (Open University). This project acknowledges funding by the Belgian Science Policy Office (BELSPO), with the financial and contractual coordination by the ESA Prodex Office (PEA 4000103401, 4000121493), by Spanish Ministry of Science and Innovation (MCIU) and by European funds under grants PGC2018-101836-B-I00 and ESP2017-87143-R (MINECO/FEDER), as well as by UK Space Agency through grants ST/R005761/1, ST/P001262/1, ST/R001405/1 and ST/R001405/1 and Italian Space Agency through grant 2018-2-HH.0. This work was supported by the Belgian Fonds de la Recherche Scientifique - FNRS under grant number 30442502 (ET-HOME). The IAA/CSIC team acknowledges financial support from the State Agency for Research of the Spanish MCIU through the Center of Excellence Severo Ochoa award for the Instituto de Astrofísica de Andalucía (SEV-2017-0709). US investigators were supported by the National Aeronautics and Space Administration. Canadian investigators were supported by the Canadian Space Agency.
Supplementary material
Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.jqsrt.2020.107361.
References
[1] Aoki S, Vandaele AC, Daerden F, Villanueva GL, Liuzzi G, Thomas IR, et al., the NOMAD team. Water vapor vertical profiles on mars in dust storms observed by tgo/nomad. J Geophys Res 2019;124(12):3482–97.
[2] Bertaux J-L, Fonteyn D, Korablev O, Chassefière E, Dimarellis E, Dubois J, et al. The study of the martian atmosphere from top to bottom with SPICAM light on mars express. Planet Space Sci 2000;48(12–14):1303–20.
[3] Bertaux J-L, Nevejans D, Korablev O, Villard E, Quémerais E, Neefs E, et al. SPICAV on venus express: three spectrometers to study the global structure and composition of the venus atmosphere. Planet Space Sci 2007;55(12):1673–700.
[4] Bovensmann H, Burrows JP, Buchwitz M, Frerick J, Noël S, Rozanov VV, et al. SCIAMACHY: Mission objectives and measurement modes. J Atmos Sci 1999;56(2):127–50.
[5] Daerden F, Neary L, Viscardy S, Muñoz AG, Clancy R, Smith M, Encrenaz T, Fedorova A. Mars atmospheric chemistry simulations with the gem-mars general circulation model. Icarus 2019;326:197–224.
[6] Dobigeon N, Moussaoui S, Tourneret J-Y, Carteret C. Bayesian separation of spectral sources under non-negativity and full additivity constraints. Signal Processing 2009;89(12):2657–69. http://www.sciencedirect.com/science/article/B6V18-4W9XDSW-2/2/f3d4b6f457b91e5ccfcce8ffcf41bb18.
[7] Eilers PH, Boelens HF. Baseline correction with asymmetric least squares smoothing. Leiden University Medical Centre report; 2005.
[8] Erard S, Drossart P, Piccioni G. Multivariate analysis of visible and infrared thermal imaging spectrometer (virtis) venus express nightside and limb observations. J Geophys Res 2009;114. doi:10.1029/2008JE003116.
[9] Faisal M, Windholz L, Kröger S. Systematic investigations of the hyperfine structure constants of niobium i levels. part i: constants of upper odd parity energy levels between 16,672 and 31,025 cm-1 and discovery of a new level. J Quant Spectrosc Radiat Transfer 2020;245:106873.
[10] Fissiaux L, Delière Q, Blanquet G, Robert S, Vandaele AC, Lepère M. CO2-broadening coefficients in the ν4 fundamental band of methane at room temperature and application to CO2-rich planetary atmospheres. J Mol Spectrosc 2014;297:35–40.
[11] Gamache RR, Farese M, Renaud CL. A spectral line list for water isotopologues in the 1100–4100 cm-1 region for application to CO2-rich planetary atmospheres. J Mol Spectrosc 2016;326:144–50.
[12] Geminale A, Grassi D, Altieri F, Serventi G, Carli C, Carrozzo F, Sgavetti M, Orosei R, D’Aversa E, Bellucci G, Frigeri A. Removal of atmospheric features in near infrared spectra by means of principal component analysis and target transformation on mars: I. method. Icarus 2015;253(0):51–65. http://www.sciencedirect.com/science/article/pii/S0019103515000640.
[13] Gillis N, Glineur F. Accelerated multiplicative updates and hierarchical ALS algorithms for nonnegative matrix factorization. Neural Comput 2012;24(4):1085–105.
[14] Giuranna M, Viscardy S, Daerden F, Neary L, Etiope G, Oehler D, et al. Independent confirmation of a methane spike on mars and a source region east of gale crater. Nat Geosci 2019;12(5):326–32.
[15] Gordon I, Rothman L, Hill C, Kochanov R, Tan Y, Bernath P, et al. The hitran2016 molecular spectroscopic database. J Quant Spectrosc Radiat Transfer 2017;203:3–69.
[16] Herr KC, Pimentel GC. Evidence for solid carbon dioxide in the upper atmosphere of mars. Science 1970;167:47–9.
[17] Hinrich JL, Mørup M. Probabilistic sparse non-negative matrix factorization. In: Latent variable analysis and signal separation. Springer International Publishing; 2018. p. 488–98.
[18] Kim H, Park H. Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis. Bioinformatics 2007;23(12):1495–502.
[19] Korablev O, Vandaele AC, Montmessin F, Fedorova AA, Trokhimovskiy A, Forget F, Lefèvre F, Daerden F, Thomas IR, Trompet L, Erwin JT, Aoki S, Robert S, Neary L, Viscardy S, Grigoriev AV, Ignatiev NI, Shakun A, Patrakeev A, Belyaev DA, Bertaux J-L, Olsen KS, Baggio L, Alday J, Ivanov YS, Ristic B, Mason J, Willame Y, Depiesse C, Hetey L, Berkenbosch S, Clairquin R, Queirolo C, Beeckman B, Neefs E, Patel MR, Bellucci G, López-Moreno J-J, Wilson CF, Etiope G, Zelenyi L, Svedhem H, Vago JL, The ACS and NOMAD team. No detection of methane on mars from early ExoMars trace gas orbiter observations. Nature 2019;568(7753):517–20.
[20] Lee DD, Seung HS. Learning the parts of objects by non-negative matrix factorization. Nature 1999;401(6755):788–91. doi:10.1038/44565.
[21] Liuzzi G, Villanueva G, Mumma M, Smith M, Daerden F, Ristic B, et al. Methane on mars: new insights into the sensitivity of CH4 with the NOMAD/ExoMars spectrometer through its first in-flight calibration. Icarus 2019;321:671–90. doi:10.1016/j.icarus.2018.09.021.
[22] López-Valverde M, López-Puertas M, López-Moreno J, Formisano V, Grassi D, Maturilli A, et al. Analysis of CO2 non-LTE emissions at 4.3 μm in the martian atmosphere as observed by PFS/mars express and SWS/ISO. Planet Space Sci 2005;53(10):1079–87.
[23] Moores JE, Gough RV, Martinez GM, Meslin P-Y, Smith CL, Atreya SK, et al. Methane seasonal cycle at gale crater on mars consistent with regolith adsorption and diffusion. Nat Geosci 2019;12(5):321–5.
[24] Moussaoui S, Brie D, Mohammad-Djafari A, Carteret C. Separation of non-negative mixture of non-negative sources using a bayesian approach and mcmc sampling. Signal Processing, IEEE Transactions on [see also Acoustics, Speech, and Signal Processing, IEEE Transactions on] 2006;54(11):4133–45.