Identification number: DOI : 10.1016/j.compchemeng.2014.09.009
Official URL:
http://dx.doi.org/10.1016/j.compchemeng.2014.09.009
This is an author-deposited version published in:
http://oatao.univ-toulouse.fr/
Eprints ID: 12221
To cite this version:
Heintz, Juliette and Belaud, Jean-Pierre and Pandya, Nishant and Teles dos
Santos, Moises and Gerbaud, Vincent Computer aided product design tool for
sustainable chemical product development. (2014) Computers & Chemical
Engineering, vol. 71 . pp. 362-376. ISSN 0098-1354
Open Archive Toulouse Archive Ouverte (OATAO)
OATAO is an open access repository that collects the work of Toulouse researchers and
makes it freely available over the web where possible.
Computer aided product design tool for sustainable product development

Juliette Heintz a,b,3, Jean-Pierre Belaud a,b, Nishant Pandya a,b,2, Moises Teles Dos Santos a,b,1, Vincent Gerbaud a,b,∗

a Université de Toulouse, INP, UPS, LGC (Laboratoire de Génie Chimique), 4 allée Emile Monso, F-31432 Toulouse Cedex 04, France
b CNRS, LGC (Laboratoire de Génie Chimique), F-31432 Toulouse Cedex 04, France
Keywords: Computer aided product design; Genetic algorithm; Molecular graph; Bio-based molecule; Sustainable product design
A computer aided product design (CAPD) tool is proposed that finds mixtures matching target properties. Genetic algorithm crossover and mutation operators are completed with insertion or deletion operators adapted for side branches. A new substitution operator is devised for cyclic molecules. The mixture fitness is evaluated by a weighted sum of property performances. Molecules are represented by molecular graphs. They are split into molecular fragments which are built from polyatomic groups. Molecules or molecular fragments can be fixed, constrained or left free for building a new molecule. Building blocks are chemical functional groups or bio-sourced synthons. A specific coding of hydrogen-suppressed atoms is devised that can be used with various property estimation models where atom connectivity information is required. Illustration is provided through three case studies to find levulinic, glycerol and bio-based derivatives as substitutes for chlorinated paraffin, methyl p-coumarate ester solvent and blanket wash solvent, respectively.
1. Introduction
The chemical industries are on the front line of sustainable development due to the potential impact on the environment, health and safety of their product and process activities. Regulations such as the European REACH (REACH, 2006) and VOC (VOC, 2004) directives or the keen interest of consumers for eco-labelled products push the chemical industries to reconsider the products which they use and produce.

In Europe, the cost of registering chemicals to comply with REACH could exceed €2.1 billion, based on about 30,000 substances (ECHA, 2012). Therefore there is a strong incentive to find substitute molecules and chemical products. New products need to obey environmental, health and safety constraints in addition to usual product and process requirements. Economists have argued that a doubly green chemistry perspective prevails among chemical industry engaged in green activities: one green for the reduction of their impacts on environment and one green for the use of renewable raw materials (Garnier et al., 2012). The first perspective is a direct transcript of the definition of sustainable growth in the founding Brundtland 1987 report. The second is the seventh principle of green chemistry (Anastas and Warner, 1998). As it should allow sustainable issues like toxicity or degradability to be met more easily, the use of bio-sourced molecules or synthons is a major stimulus when looking for a new product.

∗ Corresponding author at: CNRS, LGC (Laboratoire de Génie Chimique), F-31432 Toulouse Cedex 04, France. Tel.: +33 5 34 32 36 51.
E-mail address: Vincent.Gerbaud@ensiacet.fr (V. Gerbaud).
1 Current address: Department of Chemical Engineering, Polytechnic School of the University of São Paulo, Avenida Professor Lineu Prestes, 05088-900 São Paulo, Brazil.
2 Current address: Shroff S.R. Rotary Institute of Chemical Technology, Block No. 402, At & PO – Vataria, Bharuch, Gujarat 393001, India.
3 Current address: Prosim SA, 51 rue Ampère, Immeuble Stratège A, 31670 Labège, France.
For finding a substitution molecule, the usual ‘trial and error’ approach seems inefficient unless high throughput screening is used. Instead, reverse engineering approaches, like Computer Aided Molecular Design (CAMD), are fit to handle several properties and to propose molecular structures matching the target values of these properties. In some cases, the problem of substituting a molecule may result in proposing a mixture. This further brings forth the challenge of computing mixture properties which may not always obey a linear mixing rule.

This paper presents a Computer Aided Molecular Design tool and its tailoring for finding alternative bio-sourced molecules and mixtures, with the help of model driven engineering (MDE) concepts. The CAPD tool follows the general methodology of CAMD tools with several modifications. By using a genetic algorithm, this tool simultaneously optimizes the molecular structure of the components and their compositions in the mixture in order to best fit the desired properties at normal operating conditions set by the user.

After a section devoted to background information related to CAMD, we describe the data structures and methods. They concern molecular representation, atom coding, fragment builder along with specific genetic operators to build or delete side chemical branches and to enhance changes in aromatic rings while keeping their aromaticity. Their implementation into a three software-component tool is then presented using MDE concepts. Three case studies are presented to illustrate some of the features of the tool: mixture search (case 1), search of a molecule with predefined bio-sourced synthons (case 2), two level search (case 3).
2. Background
Computer Aided Molecular Design (CAMD) aims at finding molecules that satisfy a set of property targets defined in advance (Achenie et al., 2003). CAMD relies upon four main concepts, namely, a molecular representation model, a set of property calculation models, a solving method and a performance criterion. Candidate molecules can be searched in a database or built from chemical groups. Their fitness is evaluated thanks to property estimation models by comparing the values of estimated property and the target property. Then they are discriminated according to their performance and either modified, kept as is or rejected, with the help of the solving algorithm. During the problem setting, in addition to the initial definition of the property target values, chemical blocks are pre-selected to be used in the molecular construction.

The CAMD problem solving method has often been tailored to a specific representation model. The early “generate and test” method was developed for a set of chemical groups that were also used by the group-contribution property estimation method (Gani et al., 1991; Constantinou et al., 1996). A vector of groups and their occurrences described candidates. However, a single vector may correspond to several isomer molecules and in this case a final step is required to generate the true molecules. To overcome this, some representations describing explicitly the group interconnections have been used: a genetic algorithm with adapted operators was used to generate polymers with a symbol string encoding (Venkatasubramanian et al., 1994), a binary representation of atom connectivity in molecules was used with a MINLP method (Churi and Achenie, 1996), an adjacency matrix was used with simulated annealing (Ourique and Silva Telles, 1998), a graph representation was used with TABU search (Lin et al., 2005) and recently a graph-based representation issued from signature descriptors was used with a genetic algorithm (Herring and Eden, 2014). These explicit representations of molecules are fit for many kinds of property estimation methods once a routine for finding the groups or descriptors of the corresponding estimation method is provided.
Regarding the fitness of a candidate molecule, the differences between the predicted and target values of all properties are aggregated in a global objective function through either an arithmetic mean (Vaidyanathan and El-Halwagi, 1996) or a geometric mean (Del Castillo et al., 1996). The geometric mean severely penalizes the fitness when an individual property prediction is far from its target. In that way it is more discriminating than the arithmetic mean.
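The difference between the two aggregation schemes can be illustrated with a minimal Python sketch. The score lists below are invented for illustration only, assuming each property performance has already been scaled to the range 0 to 1; they are not taken from the cited works.

```python
import math

def arithmetic_mean(scores):
    """Plain average of the individual property scores."""
    return sum(scores) / len(scores)

def geometric_mean(scores):
    """n-th root of the product of the individual property scores."""
    return math.prod(scores) ** (1.0 / len(scores))

# Hypothetical candidates: one is uniformly decent, the other misses one target badly.
balanced = [0.8, 0.8, 0.8]
one_miss = [1.0, 1.0, 0.1]

for name, scores in (("balanced", balanced), ("one_miss", one_miss)):
    print(name, round(arithmetic_mean(scores), 3), round(geometric_mean(scores), 3))
```

With these illustrative numbers the candidate that misses one target keeps a fairly high arithmetic mean but a much lower geometric mean, which is the discriminating behaviour described above.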
The evaluation of the performance of each molecule relies upon the calculation of property values that have been classified as product-properties, process-related properties and usage-related properties (Costa et al., 2006). Product attributes found desirable or undesirable by consumers belong to the latter class. For the CAMD problem, product requirements have to be translated into target property values, which has been done by using problem templates (Mattei et al., 2014a,b). Most product and process properties are usually described by group contribution methods (Joback and Reid, 1987; Constantinou and Gani, 1994; Martin and Young, 2001; Marrero and Gani, 2001, 2002; Nannoolal et al., 2004, 2007; Hukkerikar et al., 2012) or QSAR/TI topological index/QSPR methods (Veith and Konasewich, 1975; Karelson et al., 1996; Gani et al., 2005; Chemmangattuvalappil and Eden, 2013). Some environmental, health and safety (EHS) properties like R-phrase or CMR classification are described by similarity methods, relying upon the finding of specific molecular patterns in molecules (Gallenos, 2006).

The problem of designing a mixture is referred to as Computer Aided Product Design (CAPD) where individual molecules within the mixture and their composition must be found. Some CAMD methods have been extended to CAPD with an additional composition search (Klein et al., 1992; Gani and Fredenslund, 1993; Vaidyanathan and El-Halwagi, 1996; Duvedi and Achenie, 1997; Churi and Achenie, 1997; Sinha and Achenie, 2003). Overall, CAPD raises new issues compared to CAMD: firstly, more properties have to be matched, including more usage-related product properties or the mixture stability. Secondly, several mixture property models, such as boiling point and flash point, exhibit non-linear mixing rules and need to be solved with built-in routines, which may increase the computation time. Thirdly, some usage-related properties may not be described by any suitable prediction model.

Several approaches have been taken to solve the CAPD problem: some have performed a sequential search of each mixture component individually, before checking mixture properties, stability and composition (Gani, 2004; Conte et al., 2011; Papadopoulos et al., 2013; Mattei et al., 2014a,b), some others have decomposed the problem into a set of subproblems (Karunanithi et al., 2005), while some have solved the problem globally for a given application, for example polymer blends (Vaidyanathan and El-Halwagi, 1996). As part of a methodology for the design of formulated products, Gani and co-workers (Conte and Gani, 2011; Conte et al., 2011; Mattei et al., 2014a,b) have conceived the Virtual Product-Process Design Laboratory. They propose to run sequentially a design scenario within a computer aided stage: select a problem template and translate product needs into properties (Mattei et al., 2014a,b), choose an active ingredient of the product from the database, then design the solvents with their MIXD algorithm either from a pre-defined list or generated with a CAMD tool (Conte, 2010) and then add additives from another list and finally end up with the optimization of composition. To escape the computer-aided stage, a verification scenario is run with more accurate models, possibly involving model developments. An ultimate experimental validation ends the design activity. For overcoming the problem of consumer attributes not described by models, Solvason et al. (2009) combined an enumerating CAMD technique and MDOE (mixture design of experiments) technique. Illustrated with the formulation of a refrigerant mixture, they first solve a reverse formulation problem to find property relations that match user-defined attributes. Those relations are then used as target of a reverse problem aiming at finding the suitable mixture.
3. Methods and data structures

We have developed a CAPD tool, named IBSS (Integrated Bio Sourced Search). It follows the general methodology of CAMD tools and is aimed at finding mixtures in which some molecules may bear bio-sourced fragments. The problem of finding a single molecule is handled as a mixture with one element. The methods and data structures developed to cope with that tailoring are now presented.

3.1. Optimization problem

The CAPD problem is multi-objective since several properties must be matched. It is transformed into a single-objective problem, aiming at maximizing a global performance, GloPerf, described by an objective function OF, subject to k equality constraints and l inequality constraints on property targets P. It can be modelled as follows:
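$$
\begin{aligned}
OF = {} & \max\bigl(GloPerf(MG_i, z_i, cond_j)\bigr) \\
\text{s.t. } & P_k(MG_i, z_i, cond_j) = P_{k,\mathrm{fixed}} \\
& P_{l,\mathrm{lower\ bound}} \le P_l(MG_i, z_i, cond_j) \le P_{l,\mathrm{upper\ bound}} \\
\text{s.t. } & \text{constraints on } MG_i,\ z_i,\ cond_j
\end{aligned}
\tag{1}
$$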
The optimization variables are the molecular graph structure $MG_i$ of the individual $i$ mixture components, the mixture composition $z_i$ and $j$ conditions $cond_j$. The conditions, $cond_j$, affect the performance calculation by imposing conditions under which the properties are calculated.

The optimization variables can be constrained to allow the user to tailor the problem: the composition of any molecule, $z_i$, and condition $cond_j$ can be fixed, bounded or free. For example, the user can impose the mole fraction of an ingredient, specify a physical state of the molecule or define the range of operating conditions. Any molecule $MG_i$ of the mixture can be fixed (e.g. an active ingredient), sourced from a list of molecules (e.g. a list of additives or solvents) or left free for optimization. In that latter case, one or more chemical fragments can be fixed or taken from a list of fragments to design the molecule (e.g. to impose a renewable material derivative fragment in the molecule).
The global performance, GloPerf, is formulated as the product of a penalty function and of a weighted sum of $np$ individual performances $PropPerf_p$ with weight $w_p$ with respect to each property target.

$$
GloPerf(MG_i, z_i, cond_j) = \min_{\delta_r = 1}\bigl(\delta_r \cdot (1 - Penal_r)\bigr) \cdot \frac{\sum_{p=1}^{np} w_p \cdot PropPerf_p(MG_i, z_i, cond_j)}{\sum_{p=1}^{np} w_p}
\tag{2}
$$

The penalty function $\min(\delta_r \cdot (1 - Penal_r))$ is related to user defined rules. Each rule $r$ contains data related to a molecular pattern described as an opened molecular graph and is assigned a penalty percentage $Penal_r$. $\delta_r$ is equal to 1 if the $r$th rule is violated, 0 otherwise. Typical rules describe unrealistic structures from the chemical synthesis point of view, or molecular patterns that are correlated with toxicity.
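Eq. (2) can be sketched in a few lines of Python. This is only an illustration, not the IBSS implementation: the function name, the argument layout and the convention that the penalty factor equals 1 when no rule is violated are assumptions, and penalties are expressed as fractions (1.0 for 100%).

```python
def global_performance(prop_perfs, weights, violated_penalties):
    """Sketch of Eq. (2).

    prop_perfs         -- PropPerf_p values, one per property
    weights            -- w_p values, same length as prop_perfs
    violated_penalties -- Penal_r (as fractions) of the violated rules only,
                          i.e. the rules for which delta_r = 1
    """
    if len(prop_perfs) != len(weights) or not weights:
        raise ValueError("prop_perfs and weights must be non-empty and of equal length")
    weighted = sum(w * p for w, p in zip(weights, prop_perfs)) / sum(weights)
    # Assumption: with no violated rule the penalty factor is 1 (no penalty).
    penalty_factor = min((1.0 - pen for pen in violated_penalties), default=1.0)
    return penalty_factor * weighted

# Hypothetical values: two equally weighted properties, one rule violated at 100%.
print(global_performance([1.0, 0.5], [1.0, 1.0], []))     # no rule violated
print(global_performance([1.0, 0.5], [1.0, 1.0], [1.0]))  # forbidden pattern present
```

A rule assigned a 100% penalty therefore drives the global performance of any candidate containing the pattern to zero, whatever its property performances.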
Each individual performance $PropPerf_p$ for the property $p$ compares the predicted value $x$ with the targeted value $P$. The user can select among mathematical functions $F(x)$ as shown in Table 1: Gaussian (Venkatasubramanian et al., 1994), desirability functions (Del Castillo et al., 1996) or straight functions.

CAMD solution robustness is undermined by the property model prediction uncertainty. Solutions have been proposed in the literature, like the use of fuzzy logic operators to define upper and lower bounded property ranges associated to degrees of satisfaction (Ng et al., 2014), as can be done here with the straight function representation. Alternatively, the knowledge of property model uncertainty for some group contribution methods (Hukkerikar et al., 2012) can be used to define the Tol parameter in the Gaussian function representation.
Table 1
Property performance functions.

3.2. The search algorithm

The search algorithm selected is a genetic algorithm with elitism policy as earlier proposed by Venkatasubramanian et al. (1994) in CAMD. Modification operators are added to alter the mixture composition, conditions and molecules and to perform a multilevel search. The population size, the elitism value, the number of levels and all the probabilities of operators are defined by the user.

The initial population of individuals is generated randomly within the predefined constraints on the optimization variables related to $MG_i$, $z_i$, $cond_j$. The method for building fragments from chemical building blocks is described later.

The CAPD search can be performed in several sequential levels (Harper et al., 1999; Korichi et al., 2008). At low level, simple and/or fast-computing property prediction models are used over a large population. Then as the level increments, more complex and/or time-consuming models are used over a smaller population originated from the fittest individuals of the previous level population. At the next level, the same set of building blocks and molecular structures is kept. In the meantime, the objective function can be modified according to the user’s initial choices: property estimation models can be dropped, added or substituted by more complex ones.
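The elitism policy and the sequential levels can be sketched as follows. This is a toy Python illustration under stated assumptions: individuals are plain numbers rather than mixtures, parents are drawn uniformly (the selection scheme is not specified in the text), and the names `next_generation`, `multilevel_search`, `toy_fitness` and `toy_modify` are hypothetical.

```python
import random

def next_generation(population, fitness, elitism, modify, rng):
    """One generation: the `elitism` fittest individuals are kept unchanged,
    the rest of the population is refilled with modified individuals."""
    ranked = sorted(population, key=fitness, reverse=True)
    elite = ranked[:elitism]
    # Assumption: parents are drawn uniformly from the current population.
    children = [modify(rng.choice(ranked), rng)
                for _ in range(len(population) - len(elite))]
    return elite + children

def multilevel_search(population, levels, elitism, modify, rng):
    """`levels` is a list of (fitness, generations, survivors) tuples,
    ordered from the cheapest to the most expensive objective function."""
    for fitness, generations, survivors in levels:
        for _ in range(generations):
            population = next_generation(population, fitness, elitism, modify, rng)
        # Only the fittest individuals seed the next, smaller level.
        population = sorted(population, key=fitness, reverse=True)[:survivors]
    return population

# Toy problem standing in for a real objective function.
def toy_fitness(x):
    return -abs(x - 3.0)

def toy_modify(x, rng):
    return x + rng.gauss(0.0, 0.5)

rng = random.Random(0)
initial = [rng.uniform(-10.0, 10.0) for _ in range(20)]
final = multilevel_search(initial, [(toy_fitness, 10, 8), (toy_fitness, 10, 4)],
                          elitism=2, modify=toy_modify, rng=rng)
print(final)
```

Because the elite is copied unchanged, the best performance can never decrease within a level; in a real search each level would use its own, increasingly costly, objective function instead of reusing `toy_fitness`.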
3.3. Mixture representation data

3.3.1. Mixture representation

The mixture structure is customisable as presented in Fig. 1. Each mixture is an assembly of items and conditions. Each item contains one molecule and one mole fraction value. Each molecule is further split into interconnected fragments. The fragments are further built from basic or complex functional groups.

Initially, the user defines the mixture structure: the number of molecules, their type (fixed, list or free) and composition constraints. For each free molecule, the user sets the number of fragments, fragment type (fixed, list or free) and fragment interconnections. For each free fragment, the user defines the building groups to be used and their maximum number. Different building block lists can be used for different fragments. A molecule may contain a single free fragment. In that case the fragment has no external connections.
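The hierarchy described above (mixture, items, molecules, fragments, groups) can be sketched with Python dataclasses. All class and field names are hypothetical and chosen for illustration; they do not reproduce the object model of the tool, and the example mixture is invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Fragment:
    kind: str                                        # "fixed", "list" or "free"
    groups: List[str] = field(default_factory=list)  # allowed building groups
    max_groups: int = 0                              # maximum number of groups (free fragments)

@dataclass
class Molecule:
    kind: str                                        # "fixed", "list" or "free"
    fragments: List[Fragment] = field(default_factory=list)

@dataclass
class Item:
    molecule: Molecule
    mole_fraction: float

@dataclass
class Mixture:
    items: List[Item]
    conditions: Dict[str, float] = field(default_factory=dict)

    def is_normalized(self, tol: float = 1e-9) -> bool:
        """Check that the mole fractions sum to one."""
        return abs(sum(item.mole_fraction for item in self.items) - 1.0) <= tol

# Hypothetical binary mixture: one fixed molecule and one free molecule
# made of a fixed fragment and a free fragment.
fixed_molecule = Molecule(kind="fixed")
free_molecule = Molecule(kind="free", fragments=[
    Fragment(kind="fixed"),
    Fragment(kind="free", groups=["CH3", "CH2", "OH"], max_groups=10),
])
mixture = Mixture(items=[Item(fixed_molecule, 0.7), Item(free_molecule, 0.3)],
                  conditions={"temperature_K": 298.15})
print(mixture.is_normalized())
```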
3.3.2. Molecular representation

We have selected the molecular graph for the molecular representation, which is described by an adjacency matrix (Achenie et al., 2003). The diagonal elements are either a hydrogen-suppressed atom basic group, or a complex group or a fragment. The off-diagonal elements are bond type connections. A matrix with a diagonal that contains basic groups exclusively is called an extended molecular graph hereafter.

Fig. 1. Overview of the mixture structure and its substructures.

A molecule graph is the aggregation of its fragment graphs. Fig. 2 describes the acetoin molecule (3-hydroxybutanone) built as a structure of three interconnected fragments. Fragment 1 is built from two basic groups and one complex group. Fragments 2 and 3 are just one basic group.
Basic groups (BGs) represent a hydrogen-suppressed atom, like O, OH, CH3, CH2, >CH2, NH3, N, etc. BGs are often similar to some first order groups in group contribution methods. They are assigned a “BG.ID” attribute. It is an ‘elementary group’ integer $EG = P_1P_2P_3P_4$ that is displayed in the extended molecular graph diagonal. $P_1$ refers to the atomic number preceded with a 1 (106 for C, 107 for N, 108 for O, 117 for Cl...), $P_2$ refers to the highest bond order, $P_3$ to the type of the atom attached to, including its occurrence in an aromatic or non-aromatic cycle, and $P_4$ to the number of implicit hydrogen atoms bonded to the atom. BGs also bear a “bond vector” attribute that represents the number of single, double and triple bonds, e.g. for BG.shortformula = “=C<”, BG.ID = “106200” and BG.bondvector = “[2;1;0]”.
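The BG.ID coding can be unpacked with a short Python sketch. The digit layout used here (three digits for $P_1$, then one digit each for $P_2$, $P_3$ and $P_4$) is an assumption inferred from the “106200” example; the function and field names are hypothetical, and the meaning of the individual $P_3$ codes is not given in the text, so it is returned as a raw integer.

```python
from typing import NamedTuple

class BasicGroupId(NamedTuple):
    atomic_number: int        # P1 without its leading 1
    highest_bond_order: int   # P2
    attachment_type: int      # P3, raw code
    implicit_hydrogens: int   # P4

def decode_bg_id(bg_id: str) -> BasicGroupId:
    """Split a six-digit BG.ID string into P1..P4 (assumed 3+1+1+1 layout)."""
    if len(bg_id) != 6 or not bg_id.isdigit() or bg_id[0] != "1":
        raise ValueError(f"unexpected BG.ID format: {bg_id!r}")
    return BasicGroupId(
        atomic_number=int(bg_id[1:3]),
        highest_bond_order=int(bg_id[3]),
        attachment_type=int(bg_id[4]),
        implicit_hydrogens=int(bg_id[5]),
    )

# The "=C<" group from the text: carbon, highest bond order 2, no implicit hydrogen.
print(decode_bg_id("106200"))
```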
Complex functional groups (CGs) are multi-atom groups. They are useful for a compact description of multi-atom chemical functions, like carboxyl groups R COOH, nitrite R O N O, nitro R NO2, peroxy ROOR′, ester RCOOR′, acetals RCH(OR′)(OR′′), sulfenyl RSOR′... A non-exhaustive list of BGs and CGs along with BG.ID is provided as Supplementary material. As they are kept intact when applying modification operators, complex groups are suitable to describe bio-sourced molecule derivatives or synthons and to keep them in the molecule candidates.

CGs inherit from the basic group attributes, but the “ID” attribute has no specific meaning and is assigned a unique incremental value 2xxxxx defined by the user. Additional CG attributes are a molecular graph “CG.graph” describing the complex group in terms of basic groups and a “CG.connectionVector” integer attribute that represents the number and location of the external connections of the group in the molecular graph (see Fig. 2).

Fragments “Fgt” are described as an adjacency matrix and an external connection vector. The adjacency matrix contains basic or complex functional group information (see Fig. 2). The vector represents the single, double or triple bond external connections, and their location on the functional groups. Similar bond type external connections on one group are distinguished by a letter.
3.4. Group vector

The classification of functional groups by Constantinou et al. (1996) is adapted into group vectors “GV” to provide the list of basic and complex functional groups authorized by the user to build a free fragment where $k$ groups are allowed. A group vector GV is represented in the following way:

$$
GV = \{N_1, N_2, \ldots, N_n\}
\tag{3}
$$

where $N_i$ is the number of groups in the fragment that have $i$ connections, from 1 to $n$. Basic functional groups have up to 4 connections. In some groups with sulphur and phosphorus atoms, the atom valence can be 6 and 5 respectively, though they are described with 4 connections (see a list of basic groups in Supplementary material). For complex groups, the number of connections $n$ can be higher than 4. The authors have used some groups with 6 external connections, especially those that are sourced from renewable synthons.
In addition, the following chemical feasibility rules coming from the octet rule hold:

$$
\sum_{j=1}^{n} N_j = k
\tag{4}
$$

$$
\sum_{j=1}^{n} N_j (2 - j) = 2m - \mathrm{extconnections}
\tag{5}
$$

where $m$ is a number equal to 1 minus the maximum number of cycles allowed by the user and extconnections is the number of external connections of the fragment.
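Eqs. (4) and (5) can be checked by brute-force enumeration. The Python sketch below is an illustration, not the routine used in the tool; the function name is hypothetical. The example call uses $k = 6$ and $n = 4$ together with $m = 1$ and one external connection, two values that are not stated in the text but that reproduce the five group vectors quoted in Section 3.5.

```python
from itertools import product

def feasible_group_vectors(k, n, m, ext_connections):
    """Enumerate all group vectors (N_1, ..., N_n) satisfying Eqs. (4) and (5)."""
    target = 2 * m - ext_connections
    vectors = []
    for counts in product(range(k + 1), repeat=n):
        if sum(counts) != k:                       # Eq. (4)
            continue
        balance = sum(n_j * (2 - j) for j, n_j in enumerate(counts, start=1))
        if balance == target:                      # Eq. (5)
            vectors.append(counts)
    return vectors

# Assumed setting: acyclic fragment (m = 1) with one external connection.
for gv in feasible_group_vectors(k=6, n=4, m=1, ext_connections=1):
    print(gv)
```

Exhaustive enumeration is affordable here because $k$ and $n$ are small; a larger fragment would call for a more direct construction of the solutions.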
3.5. Fragment creation

The method for building a free fragment from a preselected list of basic or complex functional groups is developed to ensure a diversity of structures, in particular heterocycles and aromatic cycles which may often disappear during modification by a genetic algorithm.

The procedure for constructing a fragment is described as an activity diagram in Fig. 3, which is explained below:

(1) Initialization of the user parameters: minimum $k_{min}$ and maximum $k_{max}$ number of BGs and/or CGs functional groups in a fragment, the maximum number of cycles $m_{max}$, the number of external connections for the fragment extconnections and the preselected BGs and CGs ordered into $N_j$ lists.
(2) Choice of the fragment parameters: a value of $k$ between $k_{min}$ and $k_{max}$ and a value of $m$ between 1 and $1 - m_{max}$ are randomly selected. Using those values, the possible group vectors are determined, e.g. assuming that groups with up to 4 external connections exist ($n = 4$) and $k = 6$, five group vectors are possible: $GV_1 = \{1,5,0,0\}$, $GV_2 = \{2,3,1,0\}$, $GV_3 = \{3,1,2,0\}$, $GV_4 = \{3,2,0,1\}$, $GV_5 = \{4,0,1,1\}$. A $GV_i$ is picked and used as a basis for the fragment construction. Each group vector has the same probability to be chosen to ensure a higher diversity of the generated structures.
(3) Addition of a new element to the fragment: At this point either an acyclic group or an entire cycle structure can be added. The decision is made randomly considering the number of cycles yet to be constructed and the remaining number of elements that can be inserted. If an acyclic group is selected, an $N_j$ is selected randomly from $GV_i$. Then one of the groups of the $N_j$ list is picked and is added. Step 3 is repeated until all $k$ groups have been added (go to step 6). If a cycle is selected go to step 4.
(4) Cycle building: The cycle size and its aromaticity are decided before the cycle construction starts. Then all the elements that form the cycle are inserted one after the other until the cycle is complete. Side branches to the cycle are inserted only once the cycle is closed, on the cycle groups that still bear unsaturated external connections. This way the elements of a cycle are consecutive in the adjacency matrix to make the graph easier to handle.
(5) Fused cycle building: At the end of step 4 it is possible to add a fused cycle. The decision is made randomly considering the number of cycles yet to be constructed and the remaining number of elements that can be inserted. Then size and aromaticity are decided. For a non-aromatic fused cycle, the attachment points of the fused cycle are searched on the last inserted cycle and its adjacent cycles. A couple of attachment points are randomly chosen and the shortest path between these two points is determined. This path is then considered as a part of the fused cycle to build. Then the number of elements still to be added is randomly chosen and the elements are added one by one. The process to construct an aromatic cycle is the same with supplementary constraints: the attachment points must be consecutive in the last cycle; they must be connected with a double bond and must concern aromatic groups. This process goes on until the fragment is complete (go to step 6).
(6) Fragment complete: All the groups having been added, the corresponding molecular consistency is tested. If the fragment is consistent, the possible complex groups are expanded into basic groups to create the extended molecular graph of the fragment.
3.6. Molecule modification operators

The operators used to alter the molecular graph adjacency matrix are summarized in Fig. 4: mutation, crossover, insertion, deletion and substitution.

Regarding the insertion and deletion operators, we have added a specific branch constructor or destructor. We have also developed a new substitution operator to improve the chances of structural changes in aromatic rings without breaking their aromaticity.

• The mutation operator consists in the random replacement of a single group by a group from the $N_i$ reference list that bears the same external connections, e.g. >NH by >CH2, both with two single bonds in Fig. 4.
• The crossover operator consists in randomly choosing two identical non-cyclic bond types (single, double or triple bond) in two extended molecular graphs. This is checked with the $P_3$ value in the ID attribute of a group. The semi-graphs are then switched and recombined to form two new molecules.
• The insertion operator consists in the random addition of a group in the graph. We have developed the possibility to insert a group that has more than two connections, like CH<, thus enabling the completion of the graph with a new branch (Fig. 4).
• The deletion operator consists in the random removal of a group with at least two connections of the same type in the graph. To be consistent with the insertion operator, it is possible to delete a group that has more than two external connections. This may possibly induce the deletion of a side branch of the molecule. In Fig. 4, the group C< is randomly chosen for deletion. The extra branches are deleted and the two remaining branches are directly reconnected. The branch NH is deleted because it is not consistent with the remaining connection type. Thus another suitable group F is reconnected instead.
• The new substitution operator combines the principles of mutation and insertion and consists in the replacement of a group by a group that has more connections. It was developed because the other operators performed poorly in modifying aromatic cycles without destroying their aromaticity: a mutation operator alone would be ineffective because of the limited number of aromatic groups, here only enabling to replace $C_{arom}H$ by $N_{arom}$ and conversely. The crossover operator cannot be applied on rings. The insertion and deletion operators would only allow adding or deleting the aromatic heteroatoms O and NH, to maintain the aromaticity.

Fig. 3. Activity diagram of the creation of a free fragment object.

Fig. 4. Molecule modification operators.
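The mutation operator described above can be sketched in Python. The sketch is deliberately simplified and rests on assumptions: a molecule is reduced to the list of groups on the diagonal of its adjacency matrix, the group pool and its bond vectors are hypothetical, and "same external connections" is interpreted as "same bond vector". Since the replacement group bears the same bonds, the off-diagonal part of the matrix would not need to change.

```python
import random

# Hypothetical pool: short formula -> bond vector (single, double, triple bonds).
GROUP_POOL = {
    ">NH": (2, 0, 0),
    ">CH2": (2, 0, 0),
    ">O": (2, 0, 0),
    "-CH3": (1, 0, 0),
    "-OH": (1, 0, 0),
    "=C<": (2, 1, 0),
}

def mutate(groups, pool, rng):
    """Return a copy of `groups` in which one randomly chosen group is replaced
    by a different group of the pool bearing the same bond vector."""
    def alternatives(group):
        return [other for other in pool
                if other != group and pool[other] == pool[group]]

    mutable = [i for i, group in enumerate(groups) if alternatives(group)]
    if not mutable:
        return list(groups)  # nothing can be mutated with this pool
    position = rng.choice(mutable)
    mutated = list(groups)
    mutated[position] = rng.choice(alternatives(groups[position]))
    return mutated

original = ["-CH3", ">NH", "-OH"]
mutated = mutate(original, GROUP_POOL, random.Random(1))
print(original, "->", mutated)
```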
4. Implementation

The methods described above have been implemented in the “IBSS” CAPD tool software prototype. The iterative process of the IBM-RUP software development method (Kroll and Kruchten, 2003) was applied. It is centred on the architecture and driven by the functional needs that the CAPD tool should cover. Those needs were expressed by the partners of the French ANR CDP2D 2009 project InBioSynSolv, aiming at designing biosolvents. Those needs highlighted that the sustainability of the candidate mixture may arise from the occurrence of bio-sourced fragments within the molecular structure and from the use of EHS property models to evaluate the performance of each candidate. To speed up the implementation process, Model Driven Engineering (MDE) principles were followed with the help of UML 2.0 (Unified Modelling Language) and BPMN (Business Process Modelling Notation) diagrams. It produced architectural, behavioural, functional and structural UML 2.0 views that are now briefly presented.

4.1. Architectural view

The CAPD tool is developed as component-based software. Each of the three components, ‘Man–Machine-Interface MMI’, ‘SEARCH’ and ‘P3’, is a software package that encapsulates a set of functions and data and communicates through interface with the other components (Fig. 5). Next to them, an XML-structured database contains the basic and complex groups.

In the spirit of MDE, effective coding was started after the CAPD tool architecture and needs definition were well advanced. The first MMI component is written in Java and aims at providing an input XML file to the search component. The second component “SEARCH” is built around an object-oriented architecture and is written in C# within the Visual Studio® environment. The third property calculation component “P3” is a Dynamic-Link Library written in VB.NET. It contains a library of property estimation models and automatic group finders or molecular descriptor routines used to translate the molecular graph information sent by the search component into suitable inputs for the property estimation models. As a standalone component, various interface methods enable the use of the P3 component with the search component or with other independent software. Currently, thirty properties can be estimated from twenty property estimation models which are listed in the Supplementary material.
4.2. Behavioural view

The behavioural view presents the different processes of the tool and highlights the components interoperability (Fig. 6).

The interoperability between the MMI and the Search components is asynchronous via an XML file as input and a text file for the results. The interoperability between the Search and the Property calculation components is synchronous and Windows-Library like. The XML file is generated through the MMI component and loaded in the search component interface. Then, the Resolution package launches the search algorithm, a genetic algorithm with a single level in Fig. 6. First the initial population is generated. Then properties are evaluated by the property calculation component to calculate the performance of each mixture. The population is modified and evaluated again until a stop criterion is satisfied, putting an end to the search. Results are then saved in a text format to the MMI package of the search component. The data in these files are afterwards displayed by the MMI component.
4.3. Functional view

The functional view highlights the potential users and the main functionalities of the software, like the function “launching a CAPD search” represented in Fig. 7. In compliance with the hierarchical decision making process for sustainable product design presented elsewhere (Heintz et al., 2014), we distinguish the expert user from the basic user. The former can access all parameters. The role of basic user is intended for people with moderate expertise in property estimation and chemistry, typically business or technical managers. The basic user has access to the following functionalities of the tool: to select pools of chemical building blocks classified by raw material sources, to choose product requirements to be considered, to define the rough structure of the product mixture and to set generic product constraints, like a fixed ingredient within the mixture.

Fig. 5. CAPD tool UML component diagram.

Fig. 6. BPMN diagram of the three IBSS components behaviour.

Fig. 7 describes the use case diagram of the ‘launch a search’ functionality. It gives access to the definition of the problem data through three data subsets and describes actions available to the user.

• The mixture data are relevant to the structure of the mixture and its components: building blocks and compositions. With these parameters, the user can customize the mixture by defining the possible fixed parts and the degrees of freedom of the different variable parts.
• The objective function data are related to the properties, their target values, their estimation models and the conditions used to calculate these properties.
• The genetic algorithm search parameters are data that can directly influence the speed and the effectiveness of the search: population size, elitism, modification operator probabilities, search level.
Table 2
Examples of product requirements and associated calculable properties.

Product requirement   Calculable property   Default pure compound calculation model
Fluidity              Viscosity             Conte et al. (2008)
                      Molecular weight      Atomic weight summation
Volatility            Boiling point         Marrero and Gani (2001)
                      Vapor pressure        Riedel (1954)
Toxicity              Log(Kow)              Marrero and Gani (2002)
                      −Log(LC50)            Martin and Young (2001)
                      Log(BCF)              Veith and Konasewich (1975)
The mixture data should preferably be set before the objective function data because some property calculation models must be chosen for each mixture component, so the number of components must be known beforehand. The algorithm data can be specified at any time. Any data set can be saved and retrieved independently. When the data are correctly defined, the application offers the user to run the search algorithm. The results are displayed as a list of product candidates which the user can choose to save.
4.4. Structural view

The structural view concerns the model abstractions, object classes and their relationships.

• The mixture data structure implements the hierarchy described earlier in Figs. 1 and 2: mixture > composition + molecules + conditions > fragments > building blocks; along with the methods like the fragment constructor displayed in Fig. 3. A routine is developed to use the basic and complex groups stored in the “Block” database.
• The search algorithm data structure contains the genetic algorithm parameters and the modification operator methods which are described earlier.
• The objective function structure contains all the data and methods related to property target values and performance function in addition to the property estimation model for pure compounds and mixtures. In addition, we allow the basic user logged in the MMI component to identify product requirements as needs (Yunus et al., 2014) rather than as precise property names and models. Requirements are edited by an expert user into a set of calculable properties, with specific targets and property estimation models as done by others (Mattei et al., 2014a,b). Table 2 displays an example.
5. Case studies

5.1. Case study 1: blanket wash mixture substitution

5.1.1. Problem setting

“Blanket wash” solvents are needed to clean ink residues from rubber blankets in the lithographic printing process. In replacement of petroleum sourced solvents, Sinha and Achenie (2003) used a CAPD approach on a pre-selection of seven water-soluble and low EHS-impact solvents and solved a MINLP problem to find the optimal composition of the aqueous blend. Heintz et al. (2014) revisited the whole decision process leading to substitute blanket wash. It led to the specification of a tree of requirements translated here into property target values for a CAPD search with the IBSS tool. We search for an aqueous mixture and we optimize simultaneously the organic molecule and its molar composition.
Fig. 8. Organic molecular structure and building blocks in the blanket wash mixture search.

A first requirement consists in solubilizing the phenolic resin ink, which is evaluated by the Relative Energy Difference (RED) property given by Eq. (6). RED is computed from the Hansen solubility parameters δD, δP and δH, as the Hansen distance divided by the ink solubility radius:

$$\mathrm{RED} = \frac{\sqrt{4(\delta_D - 19.7)^2 + (\delta_P - 11.6)^2 + (\delta_H - 11.6)^2}}{12.7} \qquad (6)$$

If RED < 1, the solvent dissolves the solute. For the best solubilization results, we set a target value of RED = 0.
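Eq. (6) is simple enough to sketch directly. The function below is a minimal illustration, not the IBSS code: the function names are hypothetical, and the volume-fraction averaging of each Hansen parameter is an assumption based on the "Volumic fraction" mixture model listed for RED in Table 3.

```python
import math

# Ink Hansen parameters (delta_D, delta_P, delta_H) and solubility radius,
# as they appear in Eq. (6).
INK_HSP = (19.7, 11.6, 11.6)
INK_RADIUS = 12.7


def red(delta_d, delta_p, delta_h, solute=INK_HSP, radius=INK_RADIUS):
    """Relative Energy Difference: Hansen distance divided by the solubility radius."""
    d0, p0, h0 = solute
    distance = math.sqrt(4.0 * (delta_d - d0) ** 2
                         + (delta_p - p0) ** 2
                         + (delta_h - h0) ** 2)
    return distance / radius


def mixture_hsp(components, volume_fractions):
    """Assumed mixing rule: volume-fraction average of each Hansen parameter."""
    return tuple(sum(phi * hsp[i] for hsp, phi in zip(components, volume_fractions))
                 for i in range(3))


# Usage with made-up solvent parameters (illustrative values only).
blend = mixture_hsp([(16.0, 8.0, 10.0), (20.0, 12.0, 14.0)], [0.5, 0.5])
dissolves = red(*blend) < 1.0
```

A solvent whose parameters coincide with those of the ink gives RED = 0, which is the target value set in this case study.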
Other requirements refer to the cleaning process conditions: the solvent should keep its fluidity over the rubber blanket surface, which is evaluated by setting target values for viscosity and surface tension. It should comply with Volatile Organic Compound (VOC) limits, evaluated with the vapour pressure. Environmental, Health and Safety issues are assessed with a model of five EHS indices, and a flammability class is evaluated with a flash point model. The property target values are displayed in Table 3, along with the property weights, operating conditions, linear or non-linear mixture models and pure compound models. A Gaussian function is used for each property performance in Eq. (2).
In the global performance Eq. (2), we set penalties, namely on the occurrence of three consecutive oxygen atoms or of a C-C-C ring (Penalr = 100% => forbidden).
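A structural penalty of this kind amounts to a pattern check on the molecular graph. The sketch below detects an O–O–O chain on a hydrogen-suppressed graph given as an element list and a bond list. The data layout, the function names and the multiplicative way the penalty is applied are all assumptions made for illustration; Eq. (2) itself is not reproduced here.

```python
def has_three_consecutive_oxygens(elements, bonds):
    """True if some oxygen atom is bonded to at least two other oxygen atoms."""
    neighbours = {i: set() for i in range(len(elements))}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    for atom, element in enumerate(elements):
        if element != "O":
            continue
        oxygen_neighbours = sum(1 for n in neighbours[atom] if elements[n] == "O")
        if oxygen_neighbours >= 2:
            return True
    return False


def penalised_performance(performance, elements, bonds, penalty=1.0):
    """Assumed rule: scale the performance by (1 - penalty) when the pattern occurs.

    penalty=1.0 plays the role of the 100% penalty, i.e. a forbidden structure.
    """
    if has_three_consecutive_oxygens(elements, bonds):
        return performance * (1.0 - penalty)
    return performance


# Usage: C-O-O-O-C is rejected, while a peroxide C-O-O-C is left unchanged.
trioxide = (["C", "O", "O", "O", "C"], [(0, 1), (1, 2), (2, 3), (3, 4)])
peroxide = (["C", "O", "O", "C"], [(0, 1), (1, 2), (2, 3)])
```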
The organic solvent structure is split into two fragments, one with a core synthon traceable from various renewable biomass material stocks and the other built from fewer than 10 groups among a pool of simple and complex groups displayed in Fig. 8.
The search is run with the following set of parameters: number of generations (300), population size (100), elitism (30) and the probabilities of crossover, mutation, insertion and deletion (respectively 65, 15, 10 and 10). It is completed in less than 40 min. Cyclic compounds are excluded. The probabilities of composition change and molecule change are 0.7 and 0.3, respectively.
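One plausible reading of these settings is that each modification step draws an operator at random according to the listed probabilities. The sketch below shows such a weighted draw with the values of this case study. The roulette-style selection and every name in the code are assumptions; how IBSS actually combines the operator choice with the composition/molecule choice is not described in this section.

```python
import random

# Operator probabilities (in %) and change probabilities quoted for case study 1.
OPERATOR_WEIGHTS = {"crossover": 65, "mutation": 15, "insertion": 10, "deletion": 10}
P_COMPOSITION_CHANGE = 0.7  # the remaining 0.3 goes to a molecule change


def pick_operator(rng, weights=OPERATOR_WEIGHTS):
    """Draw one modification operator with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def pick_target(rng, p_composition=P_COMPOSITION_CHANGE):
    """Decide whether a modification acts on the composition or on the molecule."""
    return "composition" if rng.random() < p_composition else "molecule"


# Usage: a seeded generator keeps the sketch reproducible.
rng = random.Random(0)
step = (pick_target(rng), pick_operator(rng))
```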
5.1.2. Results
The output file displays a list of one hundred mixtures rated by their performance (Heintz et al., 2012), and the best solution (Fig. 9) emerged after 40 generations.
Sinha and Achenie (2003) suggested a mixture of γ-butyrolactone and water (45/55 mol%), for which posterior evaluation of the performance with our criteria gives 0.94. Our solution reaches a performance of 0.96 for a composition of 31 mol%. Confidentiality issues prevent us from displaying the formula of the organic molecule, which is different from γ-butyrolactone. Fig. 9 shows how the organic molar fraction affects the RED property. The minimum is at 0.3, indicating that the genetic algorithm correctly finds the best solution. It also shows that adding water to the biosolvent improves its cleaning ability.
Table 3
Blanket wash substitution: calculable properties, targets, models and parameters.

| Property name | Weight | Target | Performance function (a) | Mixture model | Pure cpnd model | Operating conditions |
|---|---|---|---|---|---|---|
| Molecular weight | 1 | <200 g/mol | G(20, 0.8) | – | | |
| Flash point | 1 | >323.15 K | G(5, 0.8) | Linear | Catoire et al. (2006) | |
| Vapor pressure | 1 | <0.00267 bar | G(10^−4, 0.8) | Linear | Riedel (1954) | T = 298.15 K |
| RED | 4 | =0 | G(1, 0.9) | Volumic fraction | HSPiP | See Eq. (6) |
| Env. waste | 0.2 | >8 | G(1, 0.8) | Linear | Weis and Visco (2010) | |
| Env. impact | 0.2 | >8 | G(1, 0.8) | Linear | Weis and Visco (2010) | |
| Health | 0.2 | >8 | G(1, 0.8) | Linear | Weis and Visco (2010) | |
| Safety | 0.2 | >8 | G(1, 0.8) | Linear | Weis and Visco (2010) | |
| LCA | 0.2 | >8 | G(1, 0.8) | Linear | Weis and Visco (2010) | |
| Viscosity | 1 | ∈ [0.8; 1.4] cP | G(0.1, 0.8) | Nonlinear; Tamura and Kurata (1952) | Conte et al. (2008) | T = 298.15 K |
| Surface tension | 1 | ∈ [30; 45] dyn/cm2 | G(5, 0.8) | Nonlinear; Rice and Teja (1982) | Conte et al. (2008) | T = 298.15 K |
| Density | 1 | ∈ [0.9; 1.1] | G(0.05, 0.8) | Linear | HSPiP 3.1, Model, 2010 | |
| Log(Ws) | 4 | >4 mg/L | G(0.5, 0.8) | Linear | Marrero and Gani (2002) | |

(a) G(Tol, Val): Gaussian type function, see Table 1.
5.2. Case study 2: substitution of chlorinated paraffins
5.2.1. Problem setting
We seek an alternative molecule to chlorinated paraffins (CPs). With a general formula CnH2n+2−xClx, they have been used as additives in high-temperature lubricants and cutting fluids for metal working, and as plasticizers and flame retardants in plastics, sealants and leather (Bayen et al., 2006). Sourced from petroleum, their use and risks (toxicity, potential carcinogenicity) have been assessed and are regulated (EU Report, 2008).
Table 4 summarizes the targeted physical properties.
We seek a suitable alternative sourced from levulinic acid (LA), which is known as a biomass platform chemical with derivatives already in use as lubricants, coatings and printing/inks (Bozell et al., 2000). The structure of the substitute molecule consists of two fragments: one is fixed as LA and the other is constructed as an assembly of fewer than 10 groups among 19 basic groups and 8 complex groups, as displayed in Fig. 10.
Two different searches are executed: one uses only basic and complex groups to generate candidates (set 1); the other uses a fixed fragment (levulinic acid) and generates LA derivatives using basic and complex groups (set 2).
The following parameters are used: number of generations (300), population size (100), elitism (10) and the probabilities of crossover, mutation, insertion and deletion (respectively 20, 50, 15 and 15). The formation of cyclic and aromatic compounds is not allowed. No penalty related to specific chemical sub-structures is set.
5.2.2. Results
The search with set 1 (no LA fragment) produces the molecule shown in Fig. 11, which was brought out by the genetic algorithm after the 100th generation with a performance of 0.9122. It closely resembles known CP alternatives, which are also displayed in Fig. 11. Notice that viscosity is not computed because no suitable group contribution value exists in Joback and Reid's model. The user can overcome a failure in property calculation by allowing the performance equation to be computed without those properties. In such a situation, property weights are normalized appropriately.

Fig. 9. Performance and property values for the best biomass derivative mixture and influence of its fraction on RED property.
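The renormalization mentioned above can be sketched as a weighted average restricted to the properties that could actually be computed. This is only one way to read "normalized appropriately": dividing by the sum of the remaining weights is an assumption, the function name is hypothetical, and the performance values in the usage lines are made up.

```python
def global_performance(performances, weights):
    """Weighted average of property performances, skipping failed properties.

    `performances` maps a property name to a value in [0, 1], or to None when
    the property model could not be evaluated for the candidate.
    """
    usable = {name: p for name, p in performances.items() if p is not None}
    total_weight = sum(weights[name] for name in usable)
    if total_weight == 0:
        raise ValueError("no property could be evaluated")
    return sum(weights[name] * p for name, p in usable.items()) / total_weight


# Usage with the unit weights of Table 4 and illustrative performance values;
# viscosity is missing, so the remaining four weights are renormalized.
weights = {"melting point": 1, "boiling point": 1, "flash point": 1,
           "viscosity": 1, "log Kow": 1}
performances = {"melting point": 1.0, "boiling point": 0.9, "flash point": 1.0,
                "viscosity": None, "log Kow": 0.8}
score = global_performance(performances, weights)
```

With this rule, a candidate is neither rewarded nor punished for a property that the model cannot handle, which is why such candidates still need experimental confirmation.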
Introducing the LA fragment in the molecule, we display the influence of kmax (maximum number of building groups in fragment 2) on the performance of the genetic algorithm in Fig. 12. It shows that acceptable solutions are found if we use kmax > 4. The maximum performance (1.000) is always found for kmax ≥ 6, and found more rapidly with kmax = 7.
With set 2 (LA + basic + complex groups), the maximum performance (1.00) is achieved after 100 generations for kmax = 7. The best 10 molecules of the 100th generation are given in Fig. 13, along with a general scheme for the present approach, and their predicted property values are given in Table 5.
For set 2, the flash point of molecules 6 and 9 could not be estimated. Indeed, group contributions do not exist for the group >CO (molecules 1–8 and 10) and for the group CHCO (molecule 9). The Configuration Interaction “CI” correction of the Hukkerikar model was not used (Hukkerikar et al., 2012). For the molecules other than 6 and 9, the viscosity was not predicted because the Joback and Reid method does not handle the phosphate group. Besides, the method of Hukkerikar et al. does not have the contribution of the group aC-PO4 for the boiling point; therefore, the boiling point values shown in Table 5 are only first approximations that do not take into account the aC-PO4 group. Such property model deficiencies for complex structures confirm that computer-based CAPD approaches must be validated by laboratory experiments. As mentioned before, the global performance was renormalized.
Analysing the results for molecules 6 and 9, the predicted melting point values are greater than the target value, but the difference remains below the reported average absolute error of the model (17.65 K). These molecules can be retained for further experimental analysis of their melting point. Nevertheless, they will likely be rejected by chemists because they exhibit a double ketone sequence, –C(=O)–C(=O)–, that is known to be unstable. Alternatively, this rule could be coded in the IBSS tool and a penalized performance could be used instead.
Table 4
CPs substitution case: product requirements and calculable properties with corresponding CAMD parameters.

| Product requirement | Calculable property | Weight | Target | Performance function (a) | Pure cpnd model | Operating conditions |
|---|---|---|---|---|---|---|
| Liquid at usage temperature | Melting point | 1 | <283.15 K | G(10, 0.8) | Hukkerikar et al. (2012) | |
| | Boiling point | 1 | >523.15 K | G(5, 0.8) | Hukkerikar et al. (2012) | |
| Non flammable at high temperatures | Flash point | 1 | >505.15 K | G(5, 0.8) | Hukkerikar et al. (2012) | |
| Fluidity | Viscosity | 1 | >37 cP | G(10, 0.8) | Joback and Reid (1987) | T = 298.15 K |
| Low potential to bioaccumulation | Log(Kow) | 1 | <3 | G(2, 0.8) | Hukkerikar et al. (2012) | |

(a) G(Tol, Val): Gaussian type function, see Table 1.

Table 5
Predicted properties for the best 10 candidates for chlorinated paraffins substitution (100th generation, kmax = 7, levulinic + basic groups + complex groups).

| Molecule | Performance | Melting point (K) | Boiling point (K) | Flash point (K) | Viscosity (cP) | Log(Kow) |
|---|---|---|---|---|---|---|
| Molecule 1 | 1.0000 | 274.80 | 358.67 | 518.95 | – | 0.31 |
| Molecule 2 | 1.0000 | 266.34 | 360.24 | 507.30 | – | 0.30 |
| Molecule 3 | 0.9999 | 283.30 | 362.01 | 520.23 | – | 0.36 |
| Molecule 4 | 0.9986 | 284.72 | 361.60 | 506.98 | – | −0.97 |
| Molecule 5 | 0.9982 | 284.94 | 362.59 | 527.74 | – | −0.82 |
| Molecule 6 | 0.9976 | 284.14 | 317.97 | – | 35.19 | −0.72 |
| Molecule 7 | 0.9964 | 285.71 | 353.64 | 512.73 | – | −1.45 |
| Molecule 8 | 0.9902 | 287.39 | 369.86 | 537.69 | – | 0.37 |
| Molecule 9 | 0.9884 | 280.08 | 307.51 | – | 32.38 | −0.96 |
| Molecule 10 | 0.9656 | 291.30 | 364.25 | 522.11 | – | −1.33 |

5.3. Case study 3: find solvents for extraction of natural antioxidants
5.3.1. Problem setting
The objective is to find a solvent for methyl p-coumarate (MpCA), an ester of cinnamic acid isolated from plants. This class of compounds is known for its antibacterial, antifungal and/or antiviral activity (Galanakis et al., 2013; Sova, 2012). Panteli et al. (2010) found that tert-pentanol was the best solvent for MpCA among tert-butanol, tert-pentanol, ethyl acetate and n-hexane.
Here we search for a glycerol-based solvent. The molecular structure we look for consists of two fragments. Fragment Fgt1 is taken from a list of two glycerol derivatives with either an sn-1 or an sn-2 substituent. The other fragment is constructed as an assembly of fewer than 10 groups among those displayed in Fig. 14.
We use the IBSS tool with a two-level search. At level 2, we keep the three best molecules from the list of candidates found at level 1 and improve the prediction of their solubility by using a solid–liquid equilibrium calculation instead of the RED property (see Eq. (6)) used at level 1. The solubility x of an ideal solid phase in a liquid solvent is given by Eq. (7) in first approximation (Prausnitz et al., 1998):

$$\ln x = -\frac{\Delta H_m}{RT}\left(1 - \frac{T}{T_m}\right) - \ln\gamma \qquad (7)$$

where γ is the activity coefficient (computed by the UNIFAC modified Dortmund 1993 model), ΔHm and Tm are the fusion enthalpy and the melting temperature of MpCA, respectively, R is the gas constant and T is the temperature.

Fig. 10. A chlorinated paraffin example and building blocks used to generate alternative molecules.

Fig. 12. Influence of the maximum number of building blocks on the genetic algorithm performance.
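The solid–liquid equilibrium relation of Eq. (7) can be evaluated with a few lines of code once the activity coefficient is known. The sketch below is written so that the solubility falls below one when T is lower than Tm. The function name is hypothetical, the activity coefficient is passed in as a plain number (in practice it depends on the composition and would come from an activity model such as UNIFAC, typically inside an iteration), and the numerical inputs in the usage line are invented, not MpCA data.

```python
import math

R = 8.314462618  # gas constant, J/(mol K)


def sle_solubility(delta_h_m, t_m, t, gamma=1.0):
    """Mole-fraction solubility of a solid from its fusion properties.

    delta_h_m: fusion enthalpy (J/mol); t_m: melting temperature (K);
    t: system temperature (K); gamma: activity coefficient of the solute.
    """
    ln_x = -(delta_h_m / (R * t)) * (1.0 - t / t_m) - math.log(gamma)
    return math.exp(ln_x)


# Usage with illustrative values only: ideal case versus a non-ideal solvent.
x_ideal = sle_solubility(delta_h_m=25000.0, t_m=410.0, t=298.15)
x_real = sle_solubility(delta_h_m=25000.0, t_m=410.0, t=298.15, gamma=2.0)
```

At fixed temperature, a larger activity coefficient lowers the predicted solubility in direct proportion, which is what makes this level 2 criterion sensitive to solvent–solute interactions.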
Table 6 summarizes the physical properties to be satisfied by a potential candidate.
The following set of parameters is used: number of generations (300), population size (50), elitism (5), kmax (4) and the probabilities of crossover, mutation, insertion and deletion (respectively 40, 50, 5 and 5). The formation of cyclic and aromatic compounds is not allowed.
5.3.2. Results
The maximum performance (0.9802) is achieved in 60 generations. The best 10 molecules of the 60th generation are given in Fig. 15 and their properties are displayed in Table 7.
None of those molecules exhibits an sn-2 glycerol derivative. Indeed, the contribution of the sn-2 OCHCH2OH group to the melting point prediction is much larger (10.9421) than the sn-1 OCH2CHOH group contribution (1.7527) in Hukkerikar's method (Hukkerikar et al., 2012). Thus, isomers can show a difference in melting point of up to 100 K. Even though all the generated molecules have predicted melting points greater than the target, this difference is within the reported average absolute error of the model (17.65 K), and these molecules can be retained for further experimental analysis. It can also be noted that the best molecule is a compromise, with a larger RED than other molecules but with a better overall performance.

Fig. 13. Candidates for chlorinated paraffins substitution. Building blocks: levulinic acid + basic groups + complex groups. 100th generation, kmax = 7.

Table 6
MpCA solvent extraction case: product requirements and calculable properties with corresponding CAMD parameters.

| Product requirement | Calculable property | Weight | Target | Performance function (a) | Pure cpnd model, Level 1 | Pure cpnd model, Level 2 |
|---|---|---|---|---|---|---|
| Must dissolve MpCA | RED | 2 | =0 | G(1, 0.95) | Hukkerikar et al. (2012) (b) | UNIFAC + SLE at 25 °C |
| Liquid at room temperature | Melting point | 1 | <283.15 K | G(10, 0.8) | Hukkerikar et al. (2012) | – |
| | Boiling point | 1 | >373.15 K | G(5, 0.8) | Hukkerikar et al. (2012) | – |
| Non flammability | Flash point | 1 | >343.15 K | G(1050.8) | Hukkerikar et al. (2012) | – |
| Low potential to bioaccumulation | Log(Kow) | 1 | <3 | G(2, 0.8) | Hukkerikar et al. (2012) | – |

(a) G(Tol, Val): Gaussian type function, see Table 1.
(b) The Hukkerikar model is used to compute the Hansen solubility parameters, which are used to compute RED.

Table 7
Predicted properties for the best 10 candidates for methyl p-coumarate solubilization (60th generation, kmax = 4, glycerol + basic groups). Columns from Performance to RED are level 1 results; the solubility fraction is the level 2 result.

| Molecule | Performance | Melting point (K) | Boiling point (K) | Flash point (K) | Log(Kow) | RED | Solubility fraction |
|---|---|---|---|---|---|---|---|
| Molecule 1 | 0.9802 | 285.24 | 532.92 | 349.35 | 0.03 | 1.04 | 0.0896 |
| Molecule 2 | 0.9732 | 287.34 | 548.81 | 341.47 | 1.29 | 0.99 | 0.0787 |
| Molecule 3 | 0.9710 | 290.63 | 529.80 | 349.19 | 0.17 | 0.75 | 0.0902 |
| Molecule 4 | 0.9635 | 283.62 | 537.89 | 339.35 | 1.07 | 0.99 | – |
| Molecule 5 | 0.9570 | 293.23 | 523.40 | 345.25 | 0.09 | 0.74 | – |
| Molecule 6 | 0.9471 | 290.25 | 535.61 | 340.53 | −0.84 | 1.24 | – |
| Molecule 7 | 0.9213 | 298.71 | 539.53 | 356.66 | 0.55 | 0.74 | – |
| Molecule 8 | 0.9196 | 295.18 | 535.27 | 343.21 | −1.09 | 1.46 | – |
| Molecule 9 | 0.9150 | 295.13 | 534.63 | 338.28 | 0.78 | 0.67 | – |
| Molecule 10 | 0.9095 | 300.37 | 556.14 | 347.41 | 0.96 | 0.77 | – |
The top 3 molecules proposed by level 1 are retained for further analysis of their solubility with the UNIFAC–SLE approach. The level 2 results reverse the order of molecules 1 and 2 relative to the level 1 ranking based on the RED model: the molar fraction solubility of molecule 2 is predicted at 0.0787, lower than that of molecule 1 at 0.0896. The predicted solubility of molecule 3 is the highest: 0.0902. This value agrees with the level 1 RED prediction, in which molecule 3 has the lowest RED value of the three.
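The comparison between the two levels can be replayed with the values reported in Table 7 for the three retained molecules. The short script below only sorts those published numbers; the dictionary layout and variable names are illustrative choices.

```python
# Level 1 performance and RED, and level 2 solubility, copied from Table 7.
candidates = [
    {"name": "Molecule 1", "performance": 0.9802, "red": 1.04, "solubility": 0.0896},
    {"name": "Molecule 2", "performance": 0.9732, "red": 0.99, "solubility": 0.0787},
    {"name": "Molecule 3", "performance": 0.9710, "red": 0.75, "solubility": 0.0902},
]

# Level 1 solubility criterion: the lower the RED, the better.
by_red = [c["name"] for c in sorted(candidates, key=lambda c: c["red"])]

# Level 2 criterion: the higher the predicted mole-fraction solubility, the better.
by_solubility = [c["name"] for c in
                 sorted(candidates, key=lambda c: c["solubility"], reverse=True)]

print(by_red)         # ['Molecule 3', 'Molecule 2', 'Molecule 1']
print(by_solubility)  # ['Molecule 3', 'Molecule 1', 'Molecule 2']
```

Molecule 3 leads under both criteria, while molecules 1 and 2 swap places, which is the shift discussed in the paragraph above.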
Fig. 15. Candidates for methyl p-coumarate solubilization. Glycerol and basic groups as building blocks; 60th generation.
6. Conclusion
We have described the methods, data structures and software implementation of IBSS, a computer aided product design (CAPD) tool that is aimed at finding mixtures. Based on CAMD concepts, it is able to optimize simultaneously the mixture components, compositions and mixture conditions.
With the help of a model driven engineering approach, the architecture and the functional needs of the CAPD tool were devised, and specifically aimed at finding mixtures with molecules that can be sourced from pools of renewable materials. This is done by setting constraints on the mixture molecules or on fragments within molecules, like bio-sourced synthons.
A molecular representation based on an adjacency matrix was selected. It was made flexible to represent both atom-based structures and fragment-based structures. Diagonal hydrogen-suppressed atomic elements of the matrix were described with a novel coding of four indexes describing the atom number, the highest bond type, the atom neighbouring context and the number of hydrogen atoms. The coding was also devised for the identification of chemical groups or descriptors necessary to evaluate properties with models from various authors. A complex group describing polyatomic structures was created to represent polyatomic chemical functions and synthons. It made it possible to keep bio-sourced synthons intact during the search and to promote their occurrence in the final solution if their performance was deemed high enough.
A genetic algorithm was selected to optimize simultaneously the mixture elements, its composition and additional operating conditions. It was made capable of performing a multilevel search with different objective functions. Modification operators were adapted to cope with the mixture search context and with the aforementioned molecular representation. Classical crossover and mutation operators were used, while insertion and deletion operators were adapted to insert or delete side branches. A new substitution operator was proposed to maintain cyclic molecules through the genetic algorithm generations. The building of fragments from basic or complex groups was made more efficient by using a vector group classification inventorying building groups on the basis of their external connections.
The mixture performance is calculated through a weighted sum of property performances that can be penalized if specific molecular patterns occur, like those found in toxic molecules.
The tool was implemented as a set of three independent software components, namely a MMI component, a CAPD search component and a property library component, associated with a database of basic and complex functional groups. To cope with the difficulty for some users of expressing requirements in terms of numerical target values, we distinguished product requirements (better suited to basic users) from calculable properties (for expert users).
Future work is ongoing to set the CAPD tool within a virtual laboratory decision-making framework as described by Heintz et al. (2014). We also intend to benefit from the flexible tool architecture, first by adding alternative solving methods other than the genetic algorithm, and second by testing the suitability of the molecular graph representation for novel property models based on higher dimensionality, which are useful to handle conformation-dependent properties.
Acknowledgment
This scientific work was supported by the French National Research Agency (InBioSynSolv ANR-CP2D-2009-08) in partnership with Rhodia-Solvay Company, ENSCL and LCA-INPT.
Appendix A. Supplementary data
Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.compchemeng.2014.09.009.
References
Achenie LEK, Gani R, Venkatasubramanian V. Computer aided molecular design: theory and practice. Amsterdam: Elsevier; 2003.
Anastas P, Warner J. Green chemistry theory and practice. Oxford: Oxford University Press; 1998. p. 135.
Bayen S, Obbard JP, Thomas GO. Chlorinated paraffins: a review of analysis and environmental occurrence. Environ Int 2006;32:915–29.
Bozell JJ, Moens L, Elliott DC, Wang Y, Neuenschwander GG, Fitzpatrick SW, et al. Production of levulinic acid and use as a platform chemical for derived products. Resour Conserv Recycl 2000;28:227–39.
Catoire L, Paulmier S, Naudet V. Experimental determination and estimation of closed cup flash points of mixtures of flammable solvents. Process Saf Prog 2006;25:33–9.
Chemmangattuvalappil NG, Eden MR. A novel methodology for property-based molecular design using multiple topological indices. Ind Eng Chem Res 2013;52:7090–103.
Churi N, Achenie LEK. Novel mathematical programming model for computer aided molecular design. Ind Eng Chem Res 1996;35:3788–94.
Churi N, Achenie LEK. The optimal design of refrigerant mixtures for a two-evaporator refrigeration system. Comput Chem Eng 1997;21S:S349–54.
Constantinou L, Gani R. New group contribution method for estimating properties of pure compounds. AIChE J 1994;40:1697–710.
Constantinou L, Bagherpour K, Gani R, Klein JA, Wu DT. Computer aided product design: problem formulations, methodology and applications. Comput Chem Eng 1996;20:685–702.
Conte E. Innovation in Integrated Chemical product-process Design: Development through a Model-based Systems Approach. Technical University of Denmark (DTU); 2010, PhD Thesis.
Conte E, Gani R, Ng KM. Design of formulated products: a systematic methodology. AIChE J 2011;57:2431–49.
Conte E, Gani R. Chemicals-based formulation design. Comput Aided Chem Eng 2011;29:1588–92.
Conte E, Martinho A, Matos HA, Gani R. Combined group-contribution atom connectivity index-based methods for estimation of surface tension and viscosity. Ind Eng Chem Res 2008;47:7940–54.
Costa R, Moggridge GD, Saraiva PM. Chemical product engineering: an emerging paradigm within chemical engineering. AIChE J 2006;52:1976–86.
Del Castillo E, Montgomery DC, McCarville DR. Modified desirability functions for multiple response optimization. J Qual Technol 1996;28:337–45.
Duvedi AP, Achenie LEK. On the design of environmentally benign refrigerant mixtures: a mathematical programming approach. Comput Chem Eng 1997;21:915–23.
ECHA. REACH market final report; 2012, Available from: http://ec.europa.eu/enterprise/sectors/chemicals/files/reach/review2012/market-final-reporten.pdf [last accessed February 2014].
EU report. European Union Risk Assessment report, alkanes, C10-13, chloro. European Chemicals Bureau; 2008, http://echa.europa.eu/documents/10162/6434698/oratsaddendumalkanesc10-13chloroen.pdf [last accessed September 2013].
Galanakis CM, Goulas V, Tsakona S, Manganaris GA, Gekas V. A knowledge base for the recovery of natural phenols with different solvents. Int J Food Prop 2013;16:382–96.
Gallenos SA. Mini-review on chemical similarity and prediction of toxicity. Curr Comput Aided Drug Des 2006;2:105–22.
Gani R, Fredenslund A. Computer aided molecular and mixture design with specified property constraints. Fluid Phase Equilib 1993;82:39–46.
Gani R, Nielsen B, Fredenslund A. A group contribution approach to computer-aided molecular design. AIChE J 1991;37:1318–32.
Gani R. Chemical product design: challenges and opportunities. Comput Chem Eng 2004;28:2441–57.
Gani R, Harper PM, Hostrup M. Automatic creation of missing groups through connectivity index for pure-component property prediction. Ind Eng Chem Res 2005;44:7262–9.
Harper PM, Gani R, Kolar P, Ishikawa T. Computer-aided molecular design with combined molecular modeling and group contribution. Fluid Phase Equilib 1999;158–160:337–47.
Heintz J, Touche I, Teles Dos Santos M, Gerbaud V. An integrated framework for product formulation by computer aided mixture design. Comput Aided Chem Eng 2012;30:702–6.
Heintz J, Belaud JP, Gerbaud V. Chemical enterprise model and decision-making framework for sustainable chemical product design. Comput Ind 2014;65:505–20.
Herring RH III, Eden MR. De novo molecular design using a graph-based genetic algorithm approach. Comput Aided Chem Eng 2014;33:7–12.
HSPiP 3.1. http://hansen-solubility.com/index.php; 2010.
Hukkerikar AS, Sarup B, Ten Kate A, Abildskov J, Sin G, Gani R, et al. Group-contribution+ (GC+) based estimation of properties of pure components: improved property estimation and uncertainty analysis. Fluid Phase Equilib 2012;321:25–43.
Joback R, Reid RC. Estimation of pure-component properties from group-contributions. Chem Eng Commun 1987;57:233–43.
Karelson M, Lobanov VS, Katritzky AR. Quantum-chemical descriptors in QSAR/QSPR studies. Chem Rev 1996;96:1027–43.
Karunanithi AT, Achenie LEK, Gani R. A new decomposition-based computer-aided molecular/mixture design methodology for the design of optimal solvents and solvent mixtures. Ind Eng Chem Res 2005;44:4785–97.
Klein JA, Wu DT, Gani R. Computer aided mixture design with specified property constraints. Comput Chem Eng 1992;16S:S229–36.
Korichi M, Gerbaud V, Floquet P, Meniai AH, Nacef S, Joulia X. Computer aided aroma design I – molecular knowledge framework. Chem Eng Proc: Process Intensif 2008;47:1902–11.
Kroll P, Kruchten P. The rational unified process made easy: a practitioner’s guide to the RUP. 1st ed. Boston: Addison Wesley; 2003.
Lin B, Chavali S, Camarda K, Miller DC. Computer-aided molecular design using Tabu search. Comput Chem Eng 2005;29:337–47.
Marrero J, Gani R. Group-contribution based estimation of pure component properties. Fluid Phase Equilib 2001;183–184:183–208.
Marrero J, Gani R. Group-contribution-based estimation of octanol/water partition coefficient and aqueous solubility. Ind Eng Chem Res 2002;41:6623–33.
Martin TM, Young DM. Prediction of the acute toxicity (96-h LC50) of organic compounds to the fathead minnow (Pimephales promelas) using a group contribution method. Chem Res Toxicol 2001;14:1378–85.
Mattei M, Hill M, Kontogeorgis GM, Gani R. A comprehensive framework for surfactant selection and design for emulsion based chemical product design. Fluid Phase Equilib 2014a;362:288–99.
Mattei M, Yunus NA, Kalakul S, Kontogeorgis GM, Woodley JM, Gernaey KV, et al. The virtual product-process design laboratory for structured chemical product design and analysis. Comput Aided Chem Eng 2014b;33:61–6.
Nannoolal Y, Rarey J, Ramjugernath D, Cordes W. Estimation of pure component properties: Part 1. Estimation of the normal boiling point of non-electrolyte organic compounds via group contributions and group interactions. Fluid Phase Equilib 2004;226:45–63.
Nannoolal Y, Rarey J, Ramjugernath D. Estimation of pure component properties: Part 2. Estimation of critical property data by group contribution. Fluid Phase Equilib 2007;252:1–27.
Ng LY, Chemmangattuvalappil NG, Ng DKS. Optimal chemical product design via fuzzy optimisation based inverse design techniques. Comput Aided Chem Eng 2014;33:325–9.
Garnier E, Bliard C, Nieddu M. The emergence of doubly green chemistry, a narrative approach. Eur Rev Ind Econ Policy 2012;4:1.
Ourique JE, Silva Telles A. Computer-aided molecular design with simulated annealing and molecular graphs. Comput Chem Eng 1998;22S:S615–8.
Panteli E, Saratsioti P, Stamatis H, Voutsas E. Solubilities of cinnamic acid esters in organic solvents. J Chem Eng Data 2010;55:745–9.
Papadopoulos AI, Stijepovic M, Linke P, Seferlis P, Voutetakis S. Molecular design of working fluid mixtures for organic Rankine cycles. Comput Aided Chem Eng 2013;32:289–94.
Prausnitz JW, Lichtenthaler RN, de Azevedo GE. Molecular thermodynamics of fluid phase equilibria. 3rd ed. New York: Prentice-Hall; 1998.
REACH. Regulation text, corrigendum and amendments; 2006, Available from: http://ec.europa.eu/enterprise/sectors/chemicals/reach/indexen.htm
Rice P, Teja AS. A generalized corresponding-states method for the prediction of surface tension of pure liquids and liquid mixtures. J Colloid Interface Sci 1982;86:158–63.
Riedel L. Eine neue universelle Dampfdruckformel Untersuchungen über eine Erweiterung des Theorems der übereinstimmenden Zustände. Teil I. Chem Ing Tec 1954;26:83–9.
Sinha M, Achenie LEK. CAMD in solvent mixture design. In: Achenie LEK, Gani R, Venkatasubramanian V, editors. Computer aided molecular design: theory and practice. Amsterdam: Elsevier; 2003. p. 261–87 [Chapter 11].
Solvason CC, Chemmangattuvalappil NG, Eden MR. A systematic method for integrating product attributes and molecular synthesis. Comput Chem Eng 2009;33(5):977–91.
Sova M. Antioxidant and antimicrobial activities of cinnamic acid derivatives. Mini-Rev Med Chem 2012;12:749–67.
Tamura M, Kurata M. On the viscosity of binary mixture of liquids. Bull Chem Soc Jpn 1952;25(1):32–8.
Vaidyanathan R, El-Halwagi M. Computer-aided synthesis of polymers and blends with target properties. Ind Eng Chem Res 1996;35:627–34.
Veith GD, Konasewich DE. Structure–activity correlations in studies of toxicity and bioconcentration with aquatic organisms. Windsor, ON: Great Lakes Research Advisory Board; 1975.
Venkatasubramanian V, Chan K, Caruthers JM. Computer-aided molecular design using genetic algorithms. Comput Chem Eng 1994;18:833–44.
VOC. VOC Solvents Emissions Directive (Directive 1999/13/EC) amended through article 13 of the Paints Directive (Directive 2004/42/EC); 2004.
Weis DC, Visco DP. Computer-aided molecular design using the signature molecular descriptor: application to solvent selection. Comput Chem Eng 2010;34:1018–29.
Yunus NA, Gernaey KV, Woodley J, Gani R. A systematic methodology for design of tailor-made blended products. Comput Chem Eng 2014;66:201–13.