HAL Id: hal-03118299
https://hal.inrae.fr/hal-03118299
Submitted on 22 Jan 2021
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of
sci-entific research documents, whether they are
pub-lished or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
Improved prediction of tablet properties with
near-infrared spectroscopy by a fusion of scatter
correction techniques
Puneet Mishra, Alison Nordon, Jean-Michel Roger
To cite this version:
ContentslistsavailableatScienceDirect
Journal
of
Pharmaceutical
and
Biomedical
Analysis
jou rn al h om e p a g e :w w w . e l s e v i e r . c o m / l o c a t e / j p b aImproved
prediction
of
tablet
properties
with
near-infrared
spectroscopy
by
a
fusion
of
scatter
correction
techniques
Puneet
Mishra
a,∗,
Alison
Nordon
b,
Jean-Michel
Roger
c,daWageningenFoodandBiobasedResearch,BornseWeilanden9,P.O.Box17,6700AA,Wageningen,TheNetherlands
bWestCHEM,DepartmentofPureandAppliedChemistryandCentreforProcessAnalyticsandControlTechnology,UniversityofStrathclyde,Glasgow,G1
1XL,UnitedKingdom
cITAP,INRAE,MontpellierSupAgro,UniversityMontpellier,Montpellier,France dChemHouseResearchGroup,Montpellier,France
a
r
t
i
c
l
e
i
n
f
o
Articlehistory: Received22August2020
Receivedinrevisedform6October2020 Accepted7October2020
Availableonline10October2020 Keywords: Multiblock Fusion Spectroscopy Pre-processing Multivariate
a
b
s
t
r
a
c
t
Near-infrared(NIR)spectraofpharmaceuticaltabletsgetaffectedbylightscatteringphenomena,which masktheunderlyingpeaksrelatedtochemicalcomponents.Oftenthebestperformingscattercorrection techniqueisselectedfromapoolofpre-selectedtechniques.However,thedatacorrectedwithdifferent techniquesmaycarrycomplementaryinformation,hence,useofasinglescattercorrectiontechniqueis sub-optimal.Inthisstudy,theaimistoprovethatNIRmodelsrelatedtopharmaceuticalscandirectly benefitfromthefusionofcomplementaryinformationextractedfrommultiplescattercorrection tech-niques.Toperformthefusion,sequentialandparallelpre-processingfusionapproacheswereused.Two differentopensourceNIRdatasetswereusedforthedemonstrationwheretheassayuniformityand activeingredient(AI)contentpredictionwastheaim.Asabaseline,thefusionapproachwascompared topartialleast-squaresregression(PLSR)performedonstandardnormalvariate(SNV)correcteddata, whichisacommonlyusedscattercorrectiontechnique.Theresultssuggestthatmultiplescatter cor-rectiontechniquesextractcomplementaryinformationandtheircomplementaryfusionisessentialto obtainhigh-performancepredictivemodels.Inthisstudy,thepredictionerrorandbiaswerereducedby upto15%and57%respectively,comparedtoPLSRperformedonSNVcorrecteddata.
©2020TheAuthor(s).PublishedbyElsevierB.V.ThisisanopenaccessarticleundertheCC BY-NC-NDlicense(http://creativecommons.org/licenses/by-nc-nd/4.0/).
1. Introduction
Near-infrared(NIR)spectroscopyiswidelyusedforrapidand non-destructive analysis of pharmaceutical tablets [1,2]. How-ever,themainchallengewithNIRspectroscopyisrelatedtodata modelling,whichcaneasilybecomesub-optimalifnoproper inves-tigationisperformedduringthepre-processingstage[3–5].The interactionofNIRlightwithsamplesresultsintwomajor opti-calphenomena,i.e.,absorptionand scattering[6].Absorption is theresultofspecificchemicalmoleculesabsorbingcertain wave-lengths of light, whereas scattering is a result of the complex interactionoflightwiththephysicalstructureofsamples[6,7].NIR dataarewidelyaffectedbylightscattering,whichcanbeidentified asadditiveandmultiplicativeeffectsinspectra[6,8].
TraditionaldataanalysisrelatedtoNIRdatamodellinginvolves explorationofdifferentscattercorrectiontechniquestoremovethe
∗ Correspondingauthor.
E-mailaddress:[email protected](P.Mishra).
scatteringeffectsfromthespectrasuchthattheabsorption charac-teristicscanbeusedforproperestimationofchemicalcomponents [9].However,recentlyMishraetal.,(2020)demonstratedthatthe NIRmodellingshouldnotaimtoselectasinglescattercorrection techniquebutshouldutilisetheinformationextractedbyseveral scattercorrectiontechniques[5,10,11].Thisisbecausedifferent scattercorrectiontechniquesremovetheeffectsoflightscattering differentlyandmayrevealcomplementaryinformation,whichif modelledcanleadtohigherperformancemodels[5]. Complemen-taryinformation fromdatapre-processed withdifferentscatter correctiontechniquescanbefusedusingtherecentlydeveloped sequentialandparallelpre-processingfusionapproaches[10,12].
Inthisstudy,theaimistoprovethatNIRmodelsof pharmaceuti-calproductscandirectlybenefitfromthefusionofcomplementary informationextractedfrommultiplescattercorrectiontechniques. Toperformthefusion,sequentialandparallelpre-processingfusion approaches wereused.TwodifferentopensourceNIRdatasets wereusedforthedemonstrationwherethepredictionof assay uniformityandactiveingredient(AI) contentwastheaim.Asa baseline,thefusionapproachwascomparedtopartialleast-squares
https://doi.org/10.1016/j.jpba.2020.113684
2 P.Mishra,A.NordonandJ.-M.Roger/JournalofPharmaceuticalandBiomedicalAnalysis192(2021)113684
regression(PLSR)performedonstandardnormalvariate(SNV) cor-recteddata,whichisacommonlyusedscattercorrectiontechnique.
2. Materialsandmethods
2.1. Dataset
Twoopensourcetabletdatasetswereusedforthisstudy.The firstwasthe‘NIRshootout2002’datasetpublishedbythe Inter-nationalCouncil for Near-infrared Spectroscopy.The datawere downloadedfromtheofficialwebsiteofEigenvectorResearchInc, USAinMATALABreadableformat(http://www.eigenvector.com/ data/tablets/index.html).Thedatainthedownloaded.matfile con-tainedspectrafrom twoinstruments.In this study,thespectra correspondingonlytoinstrument1wasused.Intotal,654tablets weremeasuredwithNIR(600–1898nm)andreferenceanalysis (assayuniformity).TheseconddatasetcomprisedNIRtransmission datarelatedtomeasurementof310tabletsandthe correspond-ingactiveingredient(AI)content(%).Thedatawereobtainedfrom http://www.models.life.ku.dk/Tabletsandwerethesameas pre-sentedintheoriginalwork[13]. Thespectral rangeofthisdata setwas7398–10507cm−1.Forbothdatasets,thesampleswere dividedintocalibration(60%)andtest(40%)setusingthe Kennard-Stonealgorithm[14].
2.2. Dataanalysis
2.2.1. Scattercorrectionmethods
Inthestudy,fourofthemostcommonlyusedscattercorrection techniqueswereselected[5].Thetechniqueswere2ndderivative [4], variablesorting for normalization (VSN) [8], standard nor-malvariate(SNV)[15]andmultiplicativescattercorrection(MSC) [7]. The second derivativeestimation wasperformed usingthe Savitzky-Golaymethodwitha2ndorderpolynomialand21-point smoothing filter. All the pre-processing methods were imple-mentedasdiscussedin[4]andinMATLAB,Natick,MA,USA. 2.2.2. Partialleast-squaresregression
PLSRisacommonregressionmethodusedforNIRdata mod-elling[16].BymaximisingthecovarianceoftheNIRdatawiththe response(s),PLSRidentifiesthesubspaceoflatentvariables(LVs) onwhichthehigh-dimensionaldatacanbetransformedtogive alow dimensioninformation concentratedspace.This
transfor-mationguaranteesthat thedataarerelevant forpredictingthe responsevariables.Inthestudy,PLSRwasperformedusing MAT-LAB’sbuilt-infunction‘plsregress’,witha10-foldcrossvalidation procedureapproachusedtoselecttheoptimalnumberoflatent variables(LVs).
2.2.3. Sequentialandparallelpre-processingthrough orthogonalisation
Thesequentialandparallelpre-processingthrough orthogonali-sation(SPORTandPORTO)approachesareinspiredfrommultiblock dataanalysisinchemometrics[12].Thesequentialapproachcalled SPORTisbasedonsequentialorthogonalizedpartialleast-squares regressionandthePORTOapproachonparallelorthogonalized par-tialleast-squaresregression.AschematicoftheSPORTandPORTO approachesispresentedinFig.1.InSPORT(seeFig.1A),initially, aPLSregressionmodelisfittedbetweenYandtheXblockafter applicationofthefirstpre-processingmethod,andthescoresfor thefirstblock(T1)areobtained.Then,theXblockafter applica-tionofthesecondpre-processingmethodisorthogonalizedwith respecttothescores(T1)ofthefirstregression.Thenthe residu-alsofYarefittedtotheorthogonalizedXblockafterapplication ofthesecondpre-processingmethodandthescores(T2)are esti-mated.Theprocedureiscontinuedforasmanyblocks(4inthis case)astherearepre-processingmethods.Inthiswork,theorder ofapplicationofthescattercorrectionmethodswas2ndderivative, VSN,SNVandMSC,makingatotalof4blocksofdata.Inthecase ofPORTO,acombinationofPLSregression,generalizedcanonical analysis(GCA)andmultipleorthogonalizationstepsareperformed withtheaimofextractingthecommonanddistinctinformation presentedindatapre-processedwithdifferentscattercorrection methods.TheconceptofPORTOistoidentifycommonand dis-tinctinformationasshowninFig.1B.Thethreecirclesrepresent datapre-processedwiththreedifferentscattercorrection meth-odsandthelettersDandCindicatethedistinctandthecommon information,respectively.
Optimisingthe number of LVs is highly important for both SPORTandPORTO.InthecaseofSPORT,allpossiblecombinations of LVswereexploredwith theoptimum numberchosen based onthelowestRMSECV.InPORTO,severallocalcross-validations (CVs)wereperformedinsequence,asdiscussedin[17].SPORTwas implementedwiththealgorithmpresentedin[12]andwithfreely availablemultiblockdataanalysistoolbox[18].PORTOwas imple-mentedusingthemulti-blockdataanalysiscodesfromNOFIMA
Fig.2.SummaryofPLSRappliedtoSNVpre-processeddata,SPORTandPORTOmodels.(A)PLSRpredictionforassay(mg),(B)PLSRpredictionforAPIcontent(%),(C)SPORT predictionforassay(mg),(D)SPORTpredictionforAPIcontent(%),(E)PORTOpredictionforassay(mg),and(F)PORTOpredictionforAPIcontent(%).
(https://nofima.no/en/)fortheimplementationofparallel orthog-onalisedpartialleast-squares(POPLS).Allanalysiswasperformed inMATLAB2017b(TheMathWorks,Natick,USA).
3. Results
3.1. PLSRmodellingvsSPORTvsPORTO
TheresultsfromPLSR withSNVpre-processed dataandthe pre-processingfusionapproaches(SPORTandPORTO)areshown inFig.2.Itcanbeseenthatfusionofinformationfromdifferent scattercorrectionmethodsimprovedthepredictiveperformance ofthemodels.Inthecaseofpredictionofassayuniformitywith NIRdata (Fig.2A,C,E),theSPORT approach decreased theerror and biasby 9% and 47 %,respectively, compared toPLSR with
SNVpre-processeddata.ThePORTOapproachfurtherdecreased theerrorandbiasby 15%and 57%,respectively,compared to PLSRwithSNVpre-processeddata.ForpredictionoftheAI con-tent with NIR data(Fig. 2B,D,F), the main improvements were observedinerrorreductionwitha6%and3%decreaseusingSPORT andPORTOapproach,respectively.Theimprovementwasattained withthesamenumberofLVsasforthePLSRmodelconstructed usingSNVpre-processeddata(3LVs).Further,inallcasesthebest modelswereobtainedusingLVsfromdatacorrectedusing mul-tiplescattercorrectionmethods,explainingthatcomplementary informationisbeingextractedandmodelledbythepre-processing fusionapproaches(PORTOandSPORT).
com-4 P.Mishra,A.NordonandJ.-M.Roger/JournalofPharmaceuticalandBiomedicalAnalysis192(2021)113684
Fig.3.LoadingscorrespondingtoPLSRappliedtoSNVpre-processeddata(A), SPORT(B)andPORTO(C)modellingapproaches.(A)Threeloadingswereextracted byPLSRforoptimalmodelling,(B)threeloadingswereextractedbySPORTapproach (2fromVSNpre-processeddataand1fromSNVpre-processeddata),and(C)three componentswereextractedbythePORTOapproach(1commoncomponentforall pre-processingmethodsand2distinctcomponentsfrom2ndderivative).
paredtotheloadingsofstandardPLSR,theloadingsofSPORTand PORTOarelessnoisy,andespeciallyinthecaseofPORTOthe load-ingweightsinthespectralregionaround10500cm−1arecloseto 0withtheregion<9000cm−1,whicharisesfromtheC≡N over-tonesfromtheactiveingredient,havingthehighestloadings[13]. Suchefficientmodellingcanbeunderstoodasthereasonforthe improvementinthemodelperformance.
4. Conclusions
This study has proved that NIR models of pharmaceuticals tabletscan beimprovedwithafusion ofcomplementary infor-mation present in data pre-processed using different scatter correctionmethods.Bothsequential and parallelapproaches to pre-processingcanbeusedforthefusion.Inthisstudy,the paral-lelapproachperformedslightlybettercomparedtothesequential approach.Theerrorandbiasdecreasedbyupto15%and57%, respectively.Basedontheresultsobtained,itisrecommendedthat fusion of multiplescattercorrection techniquesshouldbe
per-formedratherthanspendingtimeonexploringandidentifyingthe bestscattercorrectiontechnique.
CRediTauthorshipcontributionstatement
PuneetMishra:Conceptualization,Datacuration,Investigation.
AlisonNordon:Formalanalysis,Writing-review&editing. Jean-MichelRoger:Formalanalysis,Writing-review&editing.
DeclarationofCompetingInterest
Theauthorsdeclarethattheyhavenoknowncompeting finan-cialinterestsorpersonalrelationshipsthatcouldhaveappearedto influencetheworkreportedinthispaper.
References
[1]L.M.Kandpal,J.Tewari,N.Gopinathan,J.Stolee,R.Strong,P.Boulas,B.-K.Cho, QualityassessmentofpharmaceuticaltabletsamplesusingFouriertransform nearinfraredspectroscopyandmultivariateanalysis,InfraredPhys.Technol. 85(2017)300–306.
[2]M.Alcalà,J.León,J.Ropero,M.Blanco,R.J.Roma ˜nach,Analysisoflowcontent drugtabletsbytransmissionnearinfraredspectroscopy:selectionof calibrationrangesaccordingtomultivariatedetectionandquantitationlimits ofPLSmodels,J.Pharm.Sci.97(2008)5318–5327.
[3]Å.Rinnan,Fvd.Berg,S.B.Engelsen,Reviewofthemostcommon
pre-processingtechniquesfornear-infraredspectra,TracTrendsAnal.Chem. 28(2009)1201–1222.
[4]J.-M.Roger,J.-C.Boulet,M.Zeaiter,D.N.Rutledge,Pre-processingMethods, ReferenceModuleinChemistry,MolecularSciencesandChemical Engineering,Elsevier,2020.
[5]P.Mishra,A.Biancolillo,J.M.Roger,F.Marini,D.N.Rutledge,Newdata preprocessingtrendsbasedonensembleofmultiplepreprocessing techniques,TracTrendsAnal.Chem.(2020),116045.
[6]C.Pasquini,Nearinfraredspectroscopy:amatureanalyticaltechniquewith newperspectives–areview,Anal.Chim.Acta1026(2018)8–36.
[7]T.Isaksson,T.Næs,Theeffectofmultiplicativescattercorrection(MSC)and linearityimprovementinNIRspectroscopy,Appl.Spectrosc.42(1988) 1273–1284.
[8]G.Rabatel,F.Marini,B.Walczak,J.-M.Roger,VSN:Variablesortingfor normalization,J.Chemom.34(2020)e3164.
[9]J.Gerretzen,E.Szyma ´nska,J.J.Jansen,J.Bart,H.-J.vanManen,E.R.vanden Heuvel,L.M.C.Buydens,Simpleandeffectivewayfordatapreprocessing selectionbasedondesignofexperiments,Anal.Chem.87(2015) 12096–12103.
[10]P.Mishra,J.M.Roger,D.N.Rutledge,E.Woltering,SPORTpre-processingcan improvenear-infraredqualitypredictionmodelsforfreshfruitsand agro-materials,PostharvestBiol.Technol.168(2020),111271.
[11]P.Mishra,F.Marini,A.Biancolillo,J.-M.Roger,Improvedpredictionoffuel propertieswithnear-infraredspectroscopyusingacomplementary sequentialfusionofscattercorrectiontechniques,Talanta(2020),121693.
[12]J.-M.Roger,A.Biancolillo,F.Marini,Sequentialpreprocessingthrough ORThogonalization(SPORT)anditsapplicationtonearinfraredspectroscopy, Chemom.Intell.Lab.Syst.199(2020),103975.
[13]M.Dyrby,S.B.Engelsen,L.Nørgaard,M.Bruhn,L.Lundsberg-Nielsen, Chemometricquantitationoftheactivesubstance(ContainingC≡N)ina pharmaceuticaltabletusingnear-infrared(NIR)transmittanceandNIR FT-Ramanspectra,Appl.Spectrosc.56(2002)579–585.
[14]R.W.Kennard,L.A.Stone,Computeraideddesignofexperiments, Technometrics11(1969)137–148.
[15]R.J.Barnes,M.S.Dhanoa,S.J.Lister,Standardnormalvariatetransformation andde-trendingofnear-infrareddiffusereflectancespectra,Appl.Spectrosc. 43(1989)772–777.
[16]W.Saeys,N.N.DoTrong,R.VanBeers,B.M.Nicolai,Multivariatecalibrationof spectroscopicsensorsforpostharvestqualityevaluation:areview, PostharvestBiol.Technol.158(2019).
[17]I.Måge,E.Menichelli,T.Næs,PreferencemappingbyPO-PLS:separating commonanduniqueinformationinseveraldatablocks,FoodQual.Prefer.24 (2012)8–16.