
Computer aided product design tool for sustainable chemical product development


Open Archive Toulouse Archive Ouverte (OATAO)

OATAO is an open access repository that collects the work of Toulouse researchers and makes it freely available over the web where possible.

This is an author-deposited version published in: http://oatao.univ-toulouse.fr/
Eprints ID: 12221

Identification number: DOI: 10.1016/j.compchemeng.2014.09.009
Official URL: http://dx.doi.org/10.1016/j.compchemeng.2014.09.009

To cite this version: Heintz, Juliette and Belaud, Jean-Pierre and Pandya, Nishant and Teles dos Santos, Moises and Gerbaud, Vincent. Computer aided product design tool for sustainable chemical product development. (2014) Computers & Chemical Engineering, vol. 71, pp. 362-376. ISSN 0098-1354

Any correspondence concerning this service should be sent to the repository administrator: staff-oatao@inp-toulouse.fr


Computer aided product design tool for sustainable product development

Juliette Heintz (a, b, 3), Jean-Pierre Belaud (a, b), Nishant Pandya (a, b, 2), Moises Teles Dos Santos (a, b, 1), Vincent Gerbaud (a, b, ∗)

a Université de Toulouse, INP, UPS, LGC (Laboratoire de Génie Chimique), 4 allée Emile Monso, F-31432 Toulouse Cedex 04, France
b CNRS, LGC (Laboratoire de Génie Chimique), F-31432 Toulouse Cedex 04, France

Keywords: Computer aided product design; Genetic algorithm; Molecular graph; Bio-based molecule; Sustainable product design

A computer aided product design (CAPD) tool is proposed that finds mixtures matching target properties. Genetic algorithm crossover and mutation operators are complemented by insertion and deletion operators adapted for side branches. A new substitution operator is devised for cyclic molecules. The mixture fitness is evaluated by a weighted sum of property performances. Molecules are represented by molecular graphs. They are split into molecular fragments, which are built from polyatomic groups. Molecules or molecular fragments can be fixed, constrained or left free for building a new molecule. Building blocks are chemical functional groups or bio-sourced synthons. A specific coding of hydrogen-suppressed atoms is devised that can be used with the various property estimation models that require atom connectivity information. Illustration is provided through three case studies that find levulinic, glycerol and bio-based derivatives as substitutes for chlorinated paraffin, methyl p-coumarate ester solvent and blanket wash solvent, respectively.

1. Introduction

The chemical industries are on the front line of sustainable development because of the potential impact of their product and process activities on the environment, health and safety. Regulations such as the European REACH (REACH, 2006) and VOC (VOC, 2004) directives, or the keen interest of consumers in eco-labelled products, push the chemical industries to reconsider the products which they use and produce.

In Europe, the cost of registering chemicals to comply with REACH could exceed €2.1 billion, based on about 30,000 substances (ECHA, 2012). There is therefore a strong incentive to find substitute molecules and chemical products. New products need to obey environmental, health and safety constraints in addition to the usual product and process requirements. Economists have argued that a doubly green chemistry perspective prevails among chemical industries engaged in green activities: one green for the reduction of their impacts on the environment and one green for the use of renewable raw materials (Garnier et al., 2012). The first perspective is a direct transcription of the definition of sustainable growth in the founding 1987 Brundtland report. The second is the seventh principle of green chemistry (Anastas and Warner, 1998). Because it should make sustainability issues such as toxicity or degradability easier to meet, the use of bio-sourced molecules or synthons is a major stimulus when looking for a new product.

∗ Corresponding author at: CNRS, LGC (Laboratoire de Génie Chimique), F-31432 Toulouse Cedex 04, France. Tel.: +33534323651.

E-mail address: Vincent.Gerbaud@ensiacet.fr (V. Gerbaud).

1 Current address: Department of Chemical Engineering, Polytechnic School of the University of São Paulo, Avenida Professor Lineu Prestes, 05088-900 São Paulo, Brazil.

2 Current address: Shroff S.R. Rotary Institute of Chemical Technology, Block No. 402, At & PO–Vataria, Bharuch, Gujarat 393001, India.

3 Current address: Prosim SA, 51 rue Ampère, Immeuble Stratège A, 31670 Labège, France.

To find a substitute molecule, the usual 'trial and error' approach seems inefficient unless high throughput screening is used. Instead, reverse engineering approaches like Computer Aided Molecular Design (CAMD) are fit to handle several properties and to propose molecular structures matching the target values of these properties. In some cases, the problem of substituting a molecule may result in proposing a mixture. This further brings forth the challenge of computing mixture properties, which may not always obey a linear mixing rule.

This paper presents a Computer Aided Molecular Design tool and its tailoring for finding alternative bio-sourced molecules and mixtures, with the help of model driven engineering (MDE) concepts. The CAPD tool follows the general methodology of CAMD tools with several modifications. By using a genetic algorithm, this tool simultaneously optimizes the molecular structure of the components and their compositions in the mixture in order to best fit the desired properties at the normal operating conditions set by the user.

After a section presenting background information related to CAMD, we describe the data structures and methods. They concern the molecular representation, the atom coding and the fragment builder, along with specific genetic operators that build or delete side chemical branches and enhance changes in aromatic rings while keeping their aromaticity. Their implementation into a tool made of three software components is then presented using MDE concepts. Three case studies illustrate some of the features of the tool: mixture search (case 1), search of a molecule with predefined bio-sourced synthons (case 2) and two-level search (case 3).

2. Background

Computer Aided Molecular Design (CAMD) aims at finding molecules that satisfy a set of property targets defined in advance (Achenie et al., 2003). CAMD relies upon four main concepts, namely a molecular representation model, a set of property calculation models, a solving method and a performance criterion. Candidate molecules can be searched in a database or built from chemical groups. Their fitness is evaluated with property estimation models, by comparing the estimated property values with the target values. They are then discriminated according to their performance and either modified, kept as is or rejected, with the help of the solving algorithm. During the problem setting, in addition to the initial definition of the property target values, chemical blocks are pre-selected to be used in the molecular construction.

The CAMD problem solving method has often been tailored to a specific representation model. The early "generate and test" method was developed for a set of chemical groups that were also used by the group-contribution property estimation method (Gani et al., 1991; Constantinou et al., 1996). Candidates were described by a vector of groups and their occurrences. However, a single vector may correspond to several isomers, in which case a final step is required to generate the true molecules. To overcome this, some representations explicitly describing the group interconnections have been used: a genetic algorithm with adapted operators was used to generate polymers with a symbol string encoding (Venkatasubramanian et al., 1994), a binary representation of atom connectivity in molecules was used with a MINLP method (Churi and Achenie, 1996), an adjacency matrix was used with simulated annealing (Ourique and Silva Telles, 1998), a graph representation was used with TABU search (Lin et al., 2005) and, recently, a graph-based representation derived from signature descriptors was used with a genetic algorithm (Herring and Eden, 2014). These explicit molecular representations are fit for many kinds of property estimation methods, once a routine is provided for finding the groups or descriptors of the corresponding estimation method.
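The ambiguity of a group-occurrence vector can be seen on a small example. The Python sketch below is our own illustration, not part of the cited methods: it stores two hydrogen-suppressed ethers as labelled graphs (the labels are illustrative, not IBSS identifiers) and compares them first by group counts, then by a simple neighbour signature that keeps the connectivity (a quick check, not a full graph isomorphism test). Diethyl ether and methyl propyl ether share the same group vector but not the same connectivity.

```python
from collections import Counter

# Hydrogen-suppressed graphs: node -> group label, plus a list of bonds.
# The labels are illustrative and are not the IBSS BG.ID codes.
diethyl_ether = {
    "groups": {0: "CH3", 1: "CH2", 2: "O", 3: "CH2", 4: "CH3"},
    "bonds": [(0, 1), (1, 2), (2, 3), (3, 4)],
}
methyl_propyl_ether = {
    "groups": {0: "CH3", 1: "O", 2: "CH2", 3: "CH2", 4: "CH3"},
    "bonds": [(0, 1), (1, 2), (2, 3), (3, 4)],
}


def group_vector(mol):
    """Occurrences of each group, ignoring how the groups are connected."""
    return Counter(mol["groups"].values())


def neighbour_signature(mol):
    """Each group paired with the sorted labels of its neighbours."""
    neighbours = {node: [] for node in mol["groups"]}
    for a, b in mol["bonds"]:
        neighbours[a].append(mol["groups"][b])
        neighbours[b].append(mol["groups"][a])
    return sorted(
        (mol["groups"][node], tuple(sorted(labels)))
        for node, labels in neighbours.items()
    )


same_vector = group_vector(diethyl_ether) == group_vector(methyl_propyl_ether)
same_connectivity = neighbour_signature(diethyl_ether) == neighbour_signature(methyl_propyl_ether)
print(same_vector, same_connectivity)  # True False
```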

Regarding the fitness of a candidate molecule, the differences between the predicted and target values of all properties are aggregated in a global objective function through either an arithmetic mean (Vaidyanathan and El-Halwagi, 1996) or a geometric mean (Del Castillo et al., 1996). The geometric mean severely penalizes the fitness when a single predicted property is far from its target. In that way it is more discriminant than the arithmetic mean.
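A minimal sketch of the two aggregation schemes, assuming each individual performance is already scaled to [0, 1] with 1 meaning on target (the function names are ours). With two properties close to target and one far from it, the arithmetic mean still rates the candidate fairly well, while the geometric mean drops sharply.

```python
import math


def arithmetic_fitness(perfs):
    """Arithmetic mean of individual property performances."""
    return sum(perfs) / len(perfs)


def geometric_fitness(perfs):
    """Geometric mean: one near-zero performance pulls the whole score down."""
    return math.prod(perfs) ** (1.0 / len(perfs))


perfs = [0.9, 0.9, 0.01]  # two properties close to target, one far from it
print(round(arithmetic_fitness(perfs), 3))  # 0.603
print(round(geometric_fitness(perfs), 3))   # 0.201
```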

The evaluation of the performance of each molecule relies upon the calculation of property values that have been classified as product properties, process-related properties and usage-related properties (Costa et al., 2006). Product attributes found desirable or undesirable by consumers belong to the latter class. For the CAMD problem, product requirements have to be translated into target property values, which has been done by using problem templates (Mattei et al., 2014a,b). Most product and process properties are usually described by group contribution methods (Joback and Reid, 1987; Constantinou and Gani, 1994; Martin and Young, 2001; Marrero and Gani, 2001, 2002; Nannoolal et al., 2004, 2007; Hukkerikar et al., 2012) or by QSAR/TI (topological index)/QSPR methods (Veith and Konasewich, 1975; Karelson et al., 1996; Gani et al., 2005; Chemmangattuvalappil and Eden, 2013). Some environmental, health and safety (EHS) properties, like R-phrases or CMR classification, are described by similarity methods that rely upon finding specific molecular patterns in molecules (Gallenos, 2006).

The problem of designing a mixture is referred to as Computer Aided Product Design (CAPD), where the individual molecules within the mixture and their composition must be found. Some CAMD methods have been extended to CAPD with an additional composition search (Klein et al., 1992; Gani and Fredenslund, 1993; Vaidyanathan and El-Halwagi, 1996; Duvedi and Achenie, 1997; Churi and Achenie, 1997; Sinha and Achenie, 2003). Overall, CAPD raises new issues compared to CAMD. Firstly, more properties have to be matched, including more usage-related product properties or the mixture stability. Secondly, several mixture property models, such as those for boiling point and flash point, exhibit non-linear mixing rules and need to be solved with built-in routines, which may increase the computation time. Thirdly, some usage-related properties may not be described by any suitable prediction model.
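To illustrate why a mixture boiling point does not follow a linear mixing rule, the sketch below computes the bubble temperature of an ideal liquid from Raoult's law by bisection. The ideal-solution assumption, the bisection solver and the Antoine constants (commonly tabulated values for benzene and toluene, used here only for illustration) are ours; they do not describe the mixture routines of the tool.

```python
import math

# Antoine constants: log10(P [mmHg]) = A - B / (T [°C] + C).
# Commonly tabulated values, used here purely as an illustration.
ANTOINE = {
    "benzene": (6.90565, 1211.033, 220.79),
    "toluene": (6.95464, 1344.8, 219.482),
}


def p_sat(name, t_c):
    """Pure-component vapour pressure in mmHg at t_c in °C."""
    a, b, c = ANTOINE[name]
    return 10 ** (a - b / (t_c + c))


def bubble_temperature(x, p_mmhg=760.0, lo=0.0, hi=200.0, tol=1e-6):
    """Bubble point of an ideal liquid (Raoult's law), by bisection on T."""
    def residual(t):
        return sum(xi * p_sat(name, t) for name, xi in x.items()) - p_mmhg
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:  # total vapour pressure too high: T too high
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


x = {"benzene": 0.5, "toluene": 0.5}
t_mix = bubble_temperature(x)
t_linear = sum(xi * bubble_temperature({name: 1.0}) for name, xi in x.items())
print(f"bubble point {t_mix:.1f} °C, linear mole-fraction average {t_linear:.1f} °C")
```

Even for this ideal pair, the bubble point lies a few degrees below the mole-fraction average of the pure boiling points, which is why an iterative routine is needed rather than a linear rule.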

Several approaches have been taken to solve the CAPD problem. Some have performed a sequential search of each mixture component individually, before checking mixture properties, stability and composition (Gani, 2004; Conte et al., 2011; Papadopoulos et al., 2013; Mattei et al., 2014a,b); others have decomposed the problem into a set of subproblems (Karunanithi et al., 2005); still others have solved the problem globally for a given application, for example polymer blends (Vaidyanathan and El-Halwagi, 1996). As part of a methodology for the design of formulated products, Gani and co-workers (Conte and Gani, 2011; Conte et al., 2011; Mattei et al., 2014a,b) have conceived the Virtual Product-Process Design Laboratory. They propose to run a design scenario sequentially within a computer aided stage: select a problem template and translate product needs into properties (Mattei et al., 2014a,b); choose an active ingredient of the product from the database; design the solvents with their MIXD algorithm, either from a pre-defined list or generated with a CAMD tool (Conte, 2010); add additives from another list; and finally optimize the composition. After the computer aided stage, a verification scenario is run with more accurate models, possibly involving model developments. A final experimental validation ends the design activity. To overcome the problem of consumer attributes not described by models, Solvason et al. (2009) combined an enumerating CAMD technique with the MDOE (mixture design of experiments) technique. Illustrating their approach with the formulation of a refrigerant mixture, they first solve a reverse formulation problem to find property relations that match user-defined attributes. Those relations are then used as targets of a reverse problem aiming at finding the suitable mixture.

3. Methods and data structures

We have developed a CAPD tool named IBSS (Integrated Bio Sourced Search). It follows the general methodology of CAMD tools and is aimed at finding mixtures in which some molecules may bear bio-sourced fragments. The problem of finding a single molecule is handled as a mixture with one element. The methods and data structures developed for this tailoring are now presented.

3.1. Optimization problem

The CAPD problem is multi-objective since several properties must be matched. It is transformed into a single-objective problem aiming at maximizing a global performance, GloPerf, described by an objective function OF, subject to k equality constraints and l inequality constraints on the property targets P. It can be modelled as follows:

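$$
\begin{aligned}
\mathrm{OF} &= \max\; \mathrm{GloPerf}(MG_i, z_i, cond_j) \\
\text{s.t.}\quad & P_k(MG_i, z_i, cond_j) = P_{k,\mathrm{fixed}} \\
& P_{l,\mathrm{lower\,bound}} \le P_l(MG_i, z_i, cond_j) \le P_{l,\mathrm{upper\,bound}} \\
& \text{constraints on } MG_i,\ z_i,\ cond_j
\end{aligned}
\tag{1}
$$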

The optimization variables are the molecular graph structures MG_i of the individual mixture components i, the mixture compositions z_i and the j conditions cond_j. The conditions cond_j affect the performance calculation by setting the conditions under which the properties are calculated.

The optimization variables can be constrained to allow the user to tailor the problem: the composition z_i of any molecule and any condition cond_j can be fixed, bounded or free. For example, the user can impose the mole fraction of an ingredient, specify the physical state of a molecule or define the range of operating conditions. Any molecule MG_i of the mixture can be fixed (e.g. an active ingredient), sourced from a list of molecules (e.g. a list of additives or solvents) or left free for optimization. In the latter case, one or more chemical fragments can be fixed or taken from a list of fragments to design the molecule (e.g. to impose a renewable material derivative fragment in the molecule).
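As an illustration of how fixed, bounded and free compositions can coexist with the requirement that mole fractions sum to one, the following sketch draws random compositions by simple rejection sampling. The spec format and the function `sample_composition` are hypothetical and do not describe the IBSS data model or its sampling scheme.

```python
import random


def sample_composition(specs, rng, max_tries=1000):
    """Draw mole fractions that respect each item's spec and sum to one.

    specs: one entry per mixture item, either ("fixed", z), ("bounded", lo, hi)
    or ("free",). The last non-fixed item absorbs the remainder; draws that
    violate a bound are rejected and retried.
    """
    fixed_total = sum(s[1] for s in specs if s[0] == "fixed")
    variable = [i for i, s in enumerate(specs) if s[0] != "fixed"]
    if not variable:
        if abs(fixed_total - 1.0) > 1e-9:
            raise ValueError("fixed mole fractions must sum to one")
        return [s[1] for s in specs]

    def bounds(spec):
        return (spec[1], spec[2]) if spec[0] == "bounded" else (0.0, 1.0)

    for _ in range(max_tries):
        z = [s[1] if s[0] == "fixed" else 0.0 for s in specs]
        remaining = 1.0 - fixed_total
        feasible = True
        for i in variable[:-1]:
            lo, hi = bounds(specs[i])
            hi = min(hi, remaining)  # cannot exceed what is left to share
            if hi < lo:
                feasible = False
                break
            z[i] = rng.uniform(lo, hi)
            remaining -= z[i]
        last_lo, last_hi = bounds(specs[variable[-1]])
        if feasible and last_lo <= remaining <= last_hi:
            z[variable[-1]] = remaining
            return z
    raise RuntimeError("no feasible composition found")


# A fixed ingredient, a bounded co-solvent and a free component.
specs = [("fixed", 0.6), ("bounded", 0.1, 0.3), ("free",)]
print(sample_composition(specs, random.Random(1)))
```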

The global performance, GloPerf, is formulated as the product of a penalty function and of a normalized weighted sum of the n_p individual performances PropPerf_p, each computed with respect to one property target and assigned a weight w_p:

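$$
\mathrm{GloPerf}(MG_i, z_i, cond_j) = \min_{r\,|\,\delta_r = 1}\big(\delta_r\,(1 - \mathrm{Penal}_r)\big) \cdot \frac{\sum_{p=1}^{n_p} w_p\,\mathrm{PropPerf}_p(MG_i, z_i, cond_j)}{\sum_{p=1}^{n_p} w_p}
\tag{2}
$$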

The penalty function, min(δ_r·(1 − Penal_r)) taken over the violated rules (δ_r = 1), is related to user-defined rules; when no rule is violated, this factor equals 1. Each rule r contains data related to a molecular pattern described as an open molecular graph and is assigned a penalty percentage Penal_r. δ_r is equal to 1 if the r-th rule is violated and 0 otherwise. Typical rules describe structures that are unrealistic from the chemical synthesis point of view, or molecular patterns that are correlated with toxicity.
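Eq. (2) can be sketched as follows, assuming PropPerf_p values in [0, 1] and penalties given as fractions (1.0 for 100 %). The function name `glo_perf` is ours; the penalty factor is the minimum of (1 − Penal_r) over the violated rules, and 1 when no rule is violated.

```python
def glo_perf(prop_perfs, weights, violated_penalties=()):
    """Global performance of Eq. (2).

    prop_perfs: PropPerf_p values; weights: matching w_p values;
    violated_penalties: Penal_r (as fractions) of the rules with delta_r = 1.
    """
    if len(prop_perfs) != len(weights) or sum(weights) <= 0:
        raise ValueError("need one weight per property and a positive weight sum")
    # Minimum of (1 - Penal_r) over violated rules; 1 when no rule is violated.
    penalty_factor = min((1.0 - p for p in violated_penalties), default=1.0)
    weighted_mean = sum(w * f for w, f in zip(weights, prop_perfs)) / sum(weights)
    return penalty_factor * weighted_mean


print(glo_perf([0.75, 0.5], [1.0, 1.0]))          # 0.625
print(glo_perf([0.75, 0.5], [1.0, 1.0], [0.5]))   # 0.3125
print(glo_perf([0.75, 0.5], [1.0, 1.0], [1.0]))   # 0.0, forbidden pattern
```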

Each individual performance PropPerf_p for property p compares the predicted value x with the target value P. The user can select among the mathematical functions F(x) shown in Table 1: Gaussian (Venkatasubramanian et al., 1994), desirability functions (Del Castillo et al., 1996) or straight functions.

CAMD solution robustness is undermined by the prediction uncertainty of the property models. Solutions have been proposed in the literature, like the use of fuzzy logic operators to define property ranges with upper and lower bounds associated with degrees of satisfaction (Ng et al., 2014), as can be done here with the straight function representation. Alternatively, the known property model uncertainty of some group contribution methods (Hukkerikar et al., 2012) can be used to define the Tol parameter in the Gaussian function representation.
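Table 1 itself is not reproduced here, so the two functions below use assumed forms: a Gaussian centred on the target with spread Tol, and a trapezoidal "straight" function equal to 1 inside a user range and decreasing linearly to 0 over an assumed slack. They only illustrate how Tol or the range bounds could encode model uncertainty; the function names are ours.

```python
import math


def gaussian_perf(x, target, tol):
    """Assumed Gaussian form: 1 on target, exp(-1/2) at one Tol away."""
    return math.exp(-((x - target) ** 2) / (2.0 * tol ** 2))


def straight_perf(x, lower, upper, slack):
    """Assumed trapezoidal form: 1 inside [lower, upper], then a linear
    decrease reaching 0 at a distance `slack` outside the range."""
    if lower <= x <= upper:
        return 1.0
    if slack <= 0:
        return 0.0
    distance = lower - x if x < lower else x - upper
    return max(0.0, 1.0 - distance / slack)


print(gaussian_perf(355.0, 350.0, 5.0))          # about 0.607
print(straight_perf(350.0, 300.0, 340.0, 20.0))  # 0.5
```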

Table 1

Property performance functions.

3.2. The search algorithm

The search algorithm selected is a genetic algorithm with an elitism policy, as proposed earlier for CAMD by Venkatasubramanian et al. (1994). Modification operators are added to alter the mixture composition, conditions and molecules and to perform a multilevel search. The population size, the elitism value, the number of levels and all the operator probabilities are defined by the user.

The initial population of individuals is generated randomly within the predefined constraints on the optimization variables MG_i, z_i and cond_j. The method for building fragments from chemical building blocks is described in Section 3.5.
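A generic elitist genetic algorithm loop is sketched below on a toy problem (recovering a target vector of small integers). Only the elitism policy comes from the text; tournament selection, one-point crossover, the parameter defaults and the toy fitness are our assumptions, and molecules, compositions and conditions are not modelled. Because the `n_elite` best individuals are copied unchanged, the best fitness never decreases from one generation to the next.

```python
import random


def evolve(fitness, random_individual, mutate, crossover, pop_size=30,
           n_elite=2, generations=60, p_cross=0.8, p_mut=0.3, seed=0):
    """Elitist genetic algorithm (maximisation); returns best and history."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    history = [max(map(fitness, population))]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_population = ranked[:n_elite]  # elites survive unchanged
        while len(next_population) < pop_size:
            # Tournament selection (size 3) of two parents: an assumption.
            parent_a = max(rng.sample(ranked, 3), key=fitness)
            parent_b = max(rng.sample(ranked, 3), key=fitness)
            if rng.random() < p_cross:
                child = crossover(parent_a, parent_b, rng)
            else:
                child = list(parent_a)
            if rng.random() < p_mut:
                child = mutate(child, rng)
            next_population.append(child)
        population = next_population
        history.append(max(map(fitness, population)))
    return max(population, key=fitness), history


TARGET = [1, 2, 0, 3, 1]  # toy stand-in for a set of property targets


def fitness(ind):
    return -sum(abs(g - t) for g, t in zip(ind, TARGET))


def random_individual(rng):
    return [rng.randint(0, 4) for _ in TARGET]


def mutate(ind, rng):
    child = list(ind)  # copy, so that parents and elites are never modified
    child[rng.randrange(len(child))] = rng.randint(0, 4)
    return child


def crossover(a, b, rng):
    cut = rng.randrange(1, len(a))  # one-point crossover
    return a[:cut] + b[cut:]


best, history = evolve(fitness, random_individual, mutate, crossover)
print(best, fitness(best))
```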

The CAPD search can be performed in several sequential levels (Harper et al., 1999; Korichi et al., 2008). At a low level, simple and/or fast-computing property prediction models are used over a large population. Then, as the level increases, more complex and/or time-consuming models are used over a smaller population originating from the fittest individuals of the previous level. At the next level, the same set of building blocks and molecular structures is kept. Meanwhile, the objective function can be modified according to the user's initial choices: property estimation models can be dropped, added or substituted by more complex ones.
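The hand-over between levels can be sketched as follows: each level ranks the survivors of the previous level with its own model and keeps a smaller number of them. The two scoring functions are hypothetical stand-ins for a fast and a more detailed property model, and the genetic search run inside each level is omitted.

```python
def multilevel_search(candidates, levels):
    """Sequential levels: each (score, keep) pair ranks the survivors of the
    previous level with its own model and keeps the best `keep` of them."""
    survivors = list(candidates)
    for score, keep in levels:
        survivors = sorted(survivors, key=score, reverse=True)[:keep]
    return survivors


def cheap_model(x):      # hypothetical fast, approximate score
    return -abs(x - 50)


def detailed_model(x):   # hypothetical slower, more accurate score
    return -abs(x - 47)


result = multilevel_search(range(100), [(cheap_model, 10), (detailed_model, 3)])
print(sorted(result))  # [46, 47, 48]
```

The detailed model can only choose among the individuals kept by the cheaper level, which is why the first levels should keep a population large enough not to discard good candidates.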

3.3. Mixture representation data

3.3.1. Mixture representation

The mixture structure is customisable, as presented in Fig. 1. Each mixture is an assembly of items and conditions. Each item contains one molecule and one mole fraction value. Each molecule is further split into interconnected fragments, which are in turn built from basic or complex functional groups.

Initially, the user defines the mixture structure: the number of molecules, their type (fixed, list or free) and the composition constraints. For each free molecule, the user sets the number of fragments, the fragment types (fixed, list or free) and the fragment interconnections. For each free fragment, the user defines the building groups to be used and their maximum number. Different building block lists can be used for different fragments. A molecule may contain a single free fragment; in that case the fragment has no external connections.
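One possible way to hold this user-defined structure in code is sketched below with Python dataclasses. The class and field names, the building block names and the `temperature_K` condition are hypothetical, not the IBSS classes; the `problems` method only checks two rules stated above (fixed mole fractions cannot exceed 1, and a lone fragment has no external connections) plus the validity of fragment links.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Fragment:
    kind: str                                   # "fixed", "list" or "free"
    building_blocks: List[str] = field(default_factory=list)  # allowed BG/CG names
    max_groups: Optional[int] = None            # maximum number of groups (free fragment)
    external_connections: int = 0


@dataclass
class Molecule:
    kind: str                                   # "fixed", "list" or "free"
    fragments: List[Fragment] = field(default_factory=list)
    links: List[Tuple[int, int]] = field(default_factory=list)  # fragment interconnections


@dataclass
class Item:
    molecule: Molecule
    mole_fraction: Optional[float] = None       # None: left to the optimizer


@dataclass
class Mixture:
    items: List[Item]
    conditions: Dict[str, float] = field(default_factory=dict)

    def problems(self) -> List[str]:
        """Return a list of inconsistencies (empty when the structure is valid)."""
        issues = []
        fixed = sum(i.mole_fraction for i in self.items if i.mole_fraction is not None)
        if fixed > 1.0:
            issues.append("fixed mole fractions exceed 1")
        for n, item in enumerate(self.items):
            frags = item.molecule.fragments
            for a, b in item.molecule.links:
                if not (0 <= a < len(frags) and 0 <= b < len(frags)):
                    issues.append(f"item {n}: link ({a}, {b}) refers to a missing fragment")
            if len(frags) == 1 and frags[0].external_connections != 0:
                issues.append(f"item {n}: a lone fragment cannot have external connections")
        return issues


water = Molecule("fixed", [Fragment("fixed", ["H2O"])])
solvent = Molecule(
    "free",
    [Fragment("list", ["synthon_A", "synthon_B"], external_connections=1),
     Fragment("free", ["CH3", "CH2", "OH", "COO"], max_groups=9, external_connections=1)],
    links=[(0, 1)],
)
mixture = Mixture([Item(water), Item(solvent)], conditions={"temperature_K": 298.15})
print(mixture.problems())  # []
```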

3.3.2. Molecular representation

We have selected the molecular graph for the molecular representation; it is described by an adjacency matrix (Achenie et al., 2003). The diagonal elements are either a basic group (hydrogen-suppressed atom), a complex group or a fragment. The off-diagonal elements are bond type connections. A matrix whose diagonal contains basic groups exclusively is hereafter called an extended molecular graph.

Fig. 1. Overview of the mixture structure and its substructures.

A molecular graph is the aggregation of its fragment graphs. Fig. 2 describes the acetoin molecule (3-hydroxybutanone) built as a structure of three interconnected fragments. Fragment 1 is built from two basic groups and one complex group. Fragments 2 and 3 each consist of a single basic group.

Basic groups (BGs) represent a hydrogen-suppressed atom, like O, OH, CH3, CH2, >CH2, NH3, N, etc. BGs are often similar to some first-order groups in group contribution methods. They are assigned a "BG.ID" attribute, an 'elementary group' integer EG = P1P2P3P4 that is displayed in the diagonal of the extended molecular graph. P1 refers to the atomic number preceded by a 1 (106 for C, 107 for N, 108 for O, 117 for Cl, ...), P2 to the highest bond order, P3 to the type of the atom it is attached to, including its occurrence in an aromatic or non-aromatic cycle, and P4 to the number of implicit hydrogen atoms bonded to the atom. BGs also bear a "bond vector" attribute that represents the number of single, double and triple bonds. For example, for BG.short formula = "=C<", BG.ID = "106200" and BG.bond vector = "[2;1;0]".
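The BG.ID coding can be sketched as below. The sketch assumes that P2, P3 and P4 are single digits, as in the six-digit example "106200"; the numeric meaning of the P3 codes is not given in the text, so 0 is simply taken from that example. The valence check, which compares the bond vector plus implicit hydrogens with the usual valence of the atom, is our addition.

```python
ATOMIC_NUMBER = {"C": 6, "N": 7, "O": 8, "Cl": 17}
VALENCE = {"C": 4, "N": 3, "O": 2, "Cl": 1}


def bg_id(element, highest_bond_order, attached_type, implicit_h):
    """EG = P1P2P3P4, with P1 = '1' followed by the two-digit atomic number."""
    p1 = 100 + ATOMIC_NUMBER[element]          # 106 for C, 117 for Cl
    return f"{p1}{highest_bond_order}{attached_type}{implicit_h}"


def decode_bg_id(eg):
    """Split a six-character EG string into (P1, P2, P3, P4)."""
    return int(eg[:3]), int(eg[3]), int(eg[4]), int(eg[5])


def bond_vector_consistent(element, bond_vector, implicit_h):
    """Bonds [single; double; triple] plus implicit H must fill the valence."""
    single, double, triple = bond_vector
    return single + 2 * double + 3 * triple + implicit_h == VALENCE[element]


print(bg_id("C", 2, 0, 0))                         # 106200 ("=C<")
print(decode_bg_id("106200"))                      # (106, 2, 0, 0)
print(bond_vector_consistent("C", (2, 1, 0), 0))   # True
```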

Complex functional groups (CGs) are multi-atom groups. They are useful for a compact description of multi-atom chemical functions, like carboxyl R–COOH, nitrite R–O–N=O, nitro R–NO2, peroxy R–O–O–R′, ester R–COO–R′, acetal RCH(OR′)(OR′′) or sulfenyl R–S–O–R′ groups. A non-exhaustive list of BGs and CGs along with their BG.ID is provided as Supplementary material. As they are kept intact when modification operators are applied, complex groups are suitable to describe bio-sourced molecule derivatives or synthons and to keep them in the candidate molecules.

CGs inherit the basic group attributes, but their "ID" attribute has no specific meaning and is assigned a unique incremental value 2xxxxx defined by the user. Additional CG attributes are a molecular graph "CG.graph", describing the complex group in terms of basic groups, and a "CG.connectionVector" integer attribute that represents the number and location of the external connections of the group in the molecular graph (see Fig. 2).

Fragments "Fgt" are described by an adjacency matrix and an external connection vector. The adjacency matrix contains basic or complex functional group information (see Fig. 2). The vector represents the single, double or triple bond external connections and their location on the functional groups. Similar bond type external connections on one group are distinguished by a letter.

3.4. Group vector

The classification of functional groups by Constantinou et al. (1996) is adapted into group vectors "GV" that provide the list of basic and complex functional groups authorized by the user to build a free fragment where k groups are allowed. A group vector GV is represented in the following way:

$$
GV = \{N_1, N_2, \ldots, N_n\}
\tag{3}
$$

where N_i is the number of groups in the fragment that have i connections, i ranging from 1 to n. Basic functional groups have up to 4 connections. Some groups contain sulphur and phosphorus atoms, whose valence can be 6 and 5 respectively; they are nevertheless described with 4 connections (see the list of basic groups in the Supplementary material). For complex groups, the number of connections n can be higher than 4. The authors have used some groups with 6 external connections, especially groups sourced from renewable synthons.

In addition, the following chemical feasibility rules, derived from the octet rule, hold:

$$
\sum_{j=1}^{n} N_j = k
\tag{4}
$$

$$
\sum_{j=1}^{n} N_j\,(2 - j) = 2m - \mathrm{ext}_{\mathrm{connections}}
\tag{5}
$$

where m is equal to 1 minus the number of cycles in the fragment (a number bounded by the maximum number of cycles allowed by the user) and ext_connections is the number of external connections of the fragment.
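Eqs. (4) and (5) can be checked by brute-force enumeration; the function below is our sketch, not the IBSS routine. For k = 6, n = 4, an acyclic fragment (m = 1) and one external connection, it returns exactly the five group vectors listed in step (2) of Section 3.5.

```python
from itertools import product


def group_vectors(k, n, m, ext_connections):
    """All GV = (N1, ..., Nn) satisfying Eq. (4) and Eq. (5)."""
    rhs = 2 * m - ext_connections
    vectors = []
    for gv in product(range(k + 1), repeat=n):
        if sum(gv) != k:  # Eq. (4): k groups in the fragment
            continue
        # Eq. (5): sum_j N_j (2 - j) = 2m - ext_connections
        if sum(n_j * (2 - j) for j, n_j in enumerate(gv, start=1)) == rhs:
            vectors.append(gv)
    return vectors


print(group_vectors(k=6, n=4, m=1, ext_connections=1))
# [(1, 5, 0, 0), (2, 3, 1, 0), (3, 1, 2, 0), (3, 2, 0, 1), (4, 0, 1, 1)]
```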

3.5. Fragment creation

The method for building a free fragment from a preselected list of basic or complex functional groups is developed to ensure a diversity of structures, in particular heterocycles and aromatic cycles, which may often disappear during modification by a genetic algorithm.

The procedure for constructing a fragment is described as an activity diagram in Fig. 3 and is explained below:

(1) Initialization of the user parameters: the minimum k_min and maximum k_max number of BG and/or CG functional groups in a fragment, the maximum number of cycles m_max, the number of external connections of the fragment ext_connections, and the preselected BGs and CGs ordered into N_j lists.

(2) Choice of the fragment parameters: a value of k between k_min and k_max and a value of m between 1 and 1 − m_max are randomly selected. Using those values, the possible group vectors are determined. For example, assuming that groups with up to 4 external connections exist (n = 4) and k = 6, five group vectors are possible for an acyclic fragment (m = 1) with one external connection: GV1 = {1,5,0,0}, GV2 = {2,3,1,0}, GV3 = {3,1,2,0}, GV4 = {3,2,0,1}, GV5 = {4,0,1,1}. A GV_i is picked and used as a basis for the fragment construction. Each group vector has the same probability of being chosen, to ensure a higher diversity of the generated structures.

(3) Addition of a new element to the fragment: at this point either an acyclic group or an entire cycle structure can be added. The decision is made randomly, considering the number of cycles yet to be constructed and the remaining number of elements that can be inserted. If an acyclic group is selected, an N_j is selected randomly from GV_i; then one of the groups of the N_j list is picked and added. Step 3 is repeated until all k groups have been added (go to step 6). If a cycle is selected, go to step 4.

(4) Cycle building: the cycle size and its aromaticity are decided before the cycle construction starts. Then all the elements that form the cycle are inserted one after the other until the cycle is complete. Side branches are inserted only once the cycle is closed, on the cycle groups that still bear unsaturated external connections. This way, the elements of a cycle are consecutive in the adjacency matrix, which makes the graph easier to handle.

(5) Fused cycle building: at the end of step 4 it is possible to add a fused cycle. The decision is made randomly, considering the number of cycles yet to be constructed and the remaining number of elements that can be inserted. Then its size and aromaticity are decided. For a non-aromatic fused cycle, the attachment points of the fused cycle are searched on the last inserted cycle and its adjacent cycles. A pair of attachment points is randomly chosen and the shortest path between these two points is determined. This path is then considered as part of the fused cycle to build. Then the number of elements still to be added is randomly chosen and the elements are added one by one. The process to construct an aromatic fused cycle is the same, with supplementary constraints: the attachment points must be consecutive in the last cycle, they must be connected by a double bond and they must be aromatic groups. This process goes on until the fragment is complete (go to step 6).

(6) Fragment complete: once all the groups have been added, the molecular consistency of the fragment is tested. If the fragment is consistent, the complex groups it may contain are expanded into basic groups to create the extended molecular graph of the fragment.
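The acyclic part of steps (2) and (3) can be sketched as building a random tree whose groups have the connection counts prescribed by a group vector. This is a simplification under stated assumptions: group identities (which BG or CG fills each slot) and all cycle and fused-cycle building are omitted, and the leaf-attachment construction used here is our choice, not necessarily the order in which IBSS adds groups.

```python
import random


def acyclic_skeleton(gv, ext_connections=1, seed=None):
    """Random tree-shaped fragment skeleton for a group vector GV = (N1..Nn).

    Returns (connections, internal, bonds): the number of connections of each
    group slot, how many of them are used inside the fragment, and the bonds.
    """
    rng = random.Random(seed)
    connections = [j for j, n_j in enumerate(gv, start=1) for _ in range(n_j)]
    k = len(connections)
    internal = list(connections)
    # Reserve external connections on groups that keep at least one internal bond.
    for _ in range(ext_connections):
        choices = [i for i, d in enumerate(internal) if d >= 2 or (k == 1 and d >= 1)]
        if not choices:
            raise ValueError("no group can carry an external connection")
        internal[rng.choice(choices)] -= 1
    if sum(internal) != 2 * (k - 1):
        raise ValueError("GV is not consistent with an acyclic fragment")
    bonds, remaining, alive = [], list(internal), set(range(k))
    while len(alive) > 2:
        # A leaf (one free connection) is attached to a group with >= 2 free ones.
        leaf = rng.choice(sorted(i for i in alive if remaining[i] == 1))
        hub = rng.choice(sorted(i for i in alive if remaining[i] >= 2))
        bonds.append((leaf, hub))
        remaining[leaf] -= 1
        remaining[hub] -= 1
        alive.remove(leaf)
    if len(alive) == 2:
        bonds.append(tuple(sorted(alive)))  # the last two groups close the tree
    return connections, internal, bonds


connections, internal, bonds = acyclic_skeleton((3, 2, 0, 1), seed=42)
print(connections, bonds)
```

Each loop iteration keeps the invariant "sum of free connections = 2 × (number of remaining groups − 1)", so a leaf and a group with spare connections always exist until two groups remain.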

3.6. Molecule modification operators

The operators used to alter the molecular graph adjacency matrix are summarized in Fig. 4: mutation, crossover, insertion, deletion and substitution.

Regarding the insertion and deletion operators, we have added a specific branch constructor or destructor. We have also developed a new substitution operator to improve the chances of structural changes in aromatic rings without breaking their aromaticity.

• The mutation operator consists in the random replacement of a single group by a group from the N_i reference list that bears the same external connections, e.g. >NH by >CH2, both with two single bonds, in Fig. 4.

• The crossover operator consists in randomly choosing two identical non-cyclic bond types (single, double or triple bond) in two extended molecular graphs. This is checked with the P3 value in the ID attribute of a group. The semi-graphs are then switched and recombined to form two new molecules.

• The insertion operator consists in the random addition of a group in the graph. We have made it possible to insert a group that has more than two connections, like –CH<, thus enabling the completion of the graph with a new branch (Fig. 4).

• The deletion operator consists in the random removal of a group with at least two connections of the same type in the graph. To be consistent with the insertion operator, it is possible to delete a group that has more than two external connections, which may induce the deletion of a side branch of the molecule. In Fig. 4, the group >C< is randomly chosen for deletion. The extra branches are deleted and the two remaining branches are directly reconnected. The NH branch is deleted because it is not consistent with the remaining connection type, so another suitable group, –F, is reconnected instead.

Fig. 3. Activity diagram of the creation of a free fragment object.

• The new substitution operator combines the principles of mutation and insertion and consists in the replacement of a group by a group that has more connections. It was developed because the other operators performed poorly in modifying aromatic cycles without destroying their aromaticity. A mutation operator alone would be ineffective because of the limited number of aromatic groups: here it only allows replacing CaromH by Narom and conversely. The crossover operator cannot be applied on rings. The insertion and deletion operators would only allow adding or deleting the aromatic heteroatoms –O– and –NH– while maintaining the aromaticity.

Fig. 4. Molecule modification operators.
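The crossover operator can be sketched on hydrogen-suppressed graphs stored as (labels, bonds). The sketch assumes connected molecules with single bonds only, so it does not reproduce the bond-type check based on P3; a bond is treated as non-cyclic when removing it disconnects the graph. All function names are ours.

```python
import random


def _reachable(n_atoms, bonds, start, cut):
    """Atoms reachable from `start` once bond `cut` is removed."""
    adjacency = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        if {a, b} != set(cut):
            adjacency[a].append(b)
            adjacency[b].append(a)
    seen, stack = {start}, [start]
    while stack:
        for nb in adjacency[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen


def acyclic_bonds(mol):
    """Bonds that do not belong to a ring (removing them splits the graph)."""
    labels, bonds = mol
    return [bd for bd in bonds if bd[1] not in _reachable(len(labels), bonds, bd[0], bd)]


def _half(mol, atoms, anchor):
    """Renumbered sub-graph on `atoms` and the new index of `anchor`."""
    labels, bonds = mol
    order = sorted(atoms)
    index = {old: new for new, old in enumerate(order)}
    sub_bonds = [(index[a], index[b]) for a, b in bonds if a in atoms and b in atoms]
    return [labels[i] for i in order], sub_bonds, index[anchor]


def _join(half_x, half_y):
    """Concatenate two semi-graphs and bond their anchor groups."""
    labels_x, bonds_x, anchor_x = half_x
    labels_y, bonds_y, anchor_y = half_y
    shift = len(labels_x)
    bonds = bonds_x + [(a + shift, b + shift) for a, b in bonds_y]
    bonds.append((anchor_x, anchor_y + shift))
    return labels_x + labels_y, bonds


def crossover(mol_a, mol_b, rng):
    """Cut one acyclic bond in each parent and swap the semi-graphs."""
    ua, va = rng.choice(acyclic_bonds(mol_a))
    ub, vb = rng.choice(acyclic_bonds(mol_b))
    side_ua = _reachable(len(mol_a[0]), mol_a[1], ua, (ua, va))
    side_va = set(range(len(mol_a[0]))) - side_ua
    side_ub = _reachable(len(mol_b[0]), mol_b[1], ub, (ub, vb))
    side_vb = set(range(len(mol_b[0]))) - side_ub
    child_1 = _join(_half(mol_a, side_ua, ua), _half(mol_b, side_vb, vb))
    child_2 = _join(_half(mol_b, side_ub, ub), _half(mol_a, side_va, va))
    return child_1, child_2


propanol = (["CH3", "CH2", "CH2", "OH"], [(0, 1), (1, 2), (2, 3)])
methylcyclopropane = (["CH2", "CH2", "CH", "CH3"], [(0, 1), (1, 2), (2, 0), (2, 3)])
print(acyclic_bonds(methylcyclopropane))  # [(2, 3)]
print(crossover(propanol, methylcyclopropane, random.Random(0)))
```

Ring bonds are never cut, and each anchor group loses one bond and gains exactly one, so the number of connections of every group is preserved in the children.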

4. Implementation

The methods described above have been implemented in the "IBSS" CAPD software prototype. The iterative process of the IBM-RUP software development method (Kroll and Kruchten, 2003) was applied. It is centred on the architecture and driven by the functional needs that the CAPD tool should cover. Those needs were expressed by the partners of the French ANR CDP2D 2009 project InBioSynSolv, which aimed at designing biosolvents. They highlighted that the sustainability of a candidate mixture may arise from the occurrence of bio-sourced fragments within the molecular structure and from the use of EHS property models to evaluate the performance of each candidate. To speed up the implementation process, Model Driven Engineering (MDE) principles were followed with the help of UML 2.0 (Unified Modelling Language) and BPMN (Business Process Modelling Notation) diagrams. This produced architectural, behavioural, functional and structural UML 2.0 views that are now briefly presented.

4.1. Architectural view

The CAPD tool is developed as component-based software. Each of the three components, 'Man–Machine Interface (MMI)', 'SEARCH' and 'P3', is a software package that encapsulates a set of functions and data and communicates with the other components through interfaces (Fig. 5). Next to them, an XML-structured database contains the basic and complex groups.

In the spirit of MDE, effective coding was started once the CAPD tool architecture and needs definition were well advanced. The first component, MMI, is written in Java and aims at providing an input XML file to the search component. The second component, "SEARCH", is built around an object-oriented architecture and is written in C# within the Visual Studio® environment. The third component, the property calculation component "P3", is a Dynamic-Link Library written in VB.NET. It contains a library of property estimation models and automatic group finders or molecular descriptor routines, used to translate the molecular graph information sent by the search component into suitable inputs for the property estimation models. As a standalone component, P3 offers various interface methods that enable its use with the search component or with other independent software. Currently, thirty properties can be estimated from twenty property estimation models, which are listed in the Supplementary material.

4.2. Behavioural view

The behavioural view presents the different processes of the tool and highlights the component interoperability (Fig. 6).

The interoperability between the MMI and the Search components is asynchronous, via an XML file as input and a text file for the results. The interoperability between the Search and the Property calculation components is synchronous, in the manner of a Windows library. The XML file is generated through the MMI component and loaded in the search component interface. Then the Resolution package launches the search algorithm, shown in Fig. 6 as a genetic algorithm with a single level. First, the initial population is generated. Then properties are evaluated by the property calculation component to calculate the performance of each mixture. The population is modified and evaluated again until a stop criterion is satisfied, which ends the search. Results are then saved in text format by the MMI package of the search component, and the data in these files are afterwards displayed by the MMI component.

4.3. Functional view

The functional view highlights the potential users and the main functionalities of the software, like the function "launching a CAPD search" represented in Fig. 7. In compliance with the hierarchical decision making process for sustainable product design presented elsewhere (Heintz et al., 2014), we distinguish the expert user from the basic user. The former can access all parameters. The basic user role is intended for people with moderate expertise in property estimation and chemistry, typically business or technical managers. The basic user has access to the following functionalities of the tool: selecting pools of chemical building blocks classified by raw material source, choosing the product requirements to be considered, defining the rough structure of the product mixture and setting generic product constraints, like a fixed ingredient within the mixture.


Fig. 5. CAPD tool UML component diagram.

Fig. 6. BPMN diagram of the behaviour of the three IBSS components.

Fig. 7 describes the use case diagram of the 'launch a search' functionality. It gives access to the definition of the problem data through three data subsets and describes the actions available to the user.

• The mixture data relate to the structure of the mixture and its components: building blocks and compositions. With these parameters, the user can customize the mixture by defining the possible fixed parts and the degrees of freedom of the different variable parts.

• The objective function data are related to the properties, their target values, their estimation models and the conditions used to calculate these properties.

• The genetic algorithm search parameters are data that can directly influence the speed and the effectiveness of the search: population size, elitism, modification operator probabilities and search level.


Table 2

Examples of product requirements and associated calculable properties.

Product requirement | Calculable property | Default pure compound calculation model
Fluidity | Viscosity | Conte et al. (2008)
Fluidity | Molecular weight | Atomic weight summation
Volatility | Boiling point | Marrero and Gani (2001)
Volatility | Vapor pressure | Riedel (1954)
Toxicity | Log(Kow) | Marrero and Gani (2002)
Toxicity | −Log(LC50) | Martin and Young (2001)
Toxicity | Log(BCF) | Veith and Konasewich (1975)

The mixture data should preferably be set before the objective function data, because some property calculation models must be chosen for each mixture component, so the number of components must be known beforehand. The algorithm data can be specified at any time. Any data set can be saved and retrieved independently. When the data are correctly defined, the application lets the user run the search algorithm. The results are displayed as a list of product candidates, which the user can choose to save.

4.4. Structural view

The structural view concerns the model abstractions, the object classes and their relationships.

• The mixture data structure implements the hierarchy described earlier in Figs. 1 and 2 (mixture > composition + molecules + conditions > fragments > building blocks), along with methods like the fragment constructor displayed in Fig. 3. A routine is developed to use the basic and complex groups stored in the "Block" database.

• The search algorithm data structure contains the genetic algorithm parameters and the modification operator methods described earlier.

• The objective function structure contains all the data and methods related to the property target values and performance functions, in addition to the property estimation models for pure compounds and mixtures. Moreover, a basic user logged in the MMI component can identify product requirements as needs (Yunus et al., 2014) rather than as precise property names and models. These requirements are translated by an expert user into a set of calculable properties with specific targets and property estimation models, as done by others (Mattei et al., 2014a,b). Table 2 displays an example.

5. Case studies

5.1. Case study 1: blanket wash mixture substitution

5.1.1. Problem setting

“Blanket washes” are needed to clean ink residues from rubber blankets in the lithographic printing process. To replace petroleum-sourced solvents, Sinha and Achenie (2003) used a CAPD approach on a pre-selection of seven water-soluble, low EHS-impact solvents and solved an MINLP problem to find the optimal composition of the aqueous blend. Heintz et al. (2014) revisited the whole decision process leading to a substitute blanket wash. It led to the specification of a tree of requirements, translated here into property target values for a CAPD search with the IBSS tool. We search for an aqueous mixture and simultaneously optimize the organic molecule and its molar composition.

A first requirement consists in solubilizing the phenolic resin ink, which is evaluated by the Relative Energy Difference (RED) property given by Eq. (6). RED is computed from the Hansen solubility parameters δD, δP and δH as the Hansen distance to the ink divided by the ink solubility radius:

$$\mathrm{RED} = \frac{\sqrt{4(\delta_D - 19.7)^2 + (\delta_P - 11.6)^2 + (\delta_H - 11.6)^2}}{12.7} \qquad (6)$$

Fig. 8. Organic molecular structure and building blocks in the blanket wash mixture search.

If RED < 1, the solvent dissolves the solute. For the best solubilization results, we set a target value of RED = 0.
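Eq. (6) is simple to evaluate once the Hansen solubility parameters of the solvent are known. The following Python sketch computes RED for the ink centre (19.7, 11.6, 11.6) and radius 12.7 of Eq. (6). For mixtures, Table 3 lists a "volume fraction" mixture model; we assume here that this means the usual Hansen practice of averaging each parameter weighted by volume fraction. The function names `red` and `blend_hsp` are hypothetical.

```python
import math
from typing import Sequence, Tuple

HSP = Tuple[float, float, float]  # (delta_D, delta_P, delta_H) in MPa^0.5

INK_CENTER: HSP = (19.7, 11.6, 11.6)  # phenolic resin ink, from Eq. (6)
INK_RADIUS = 12.7                     # ink solubility radius R0, from Eq. (6)


def red(solvent: HSP, center: HSP = INK_CENTER, radius: float = INK_RADIUS) -> float:
    """Relative Energy Difference: Hansen distance Ra to the solute over its radius R0."""
    d_d, d_p, d_h = (s - c for s, c in zip(solvent, center))
    ra = math.sqrt(4.0 * d_d ** 2 + d_p ** 2 + d_h ** 2)
    return ra / radius


def blend_hsp(components: Sequence[HSP], volume_fractions: Sequence[float]) -> HSP:
    """Volume-fraction-weighted average of the component HSPs (assumed blending rule)."""
    if abs(sum(volume_fractions) - 1.0) > 1e-9:
        raise ValueError("volume fractions must sum to 1")
    d, p, h = (
        sum(phi * comp[i] for comp, phi in zip(components, volume_fractions))
        for i in range(3)
    )
    return (d, p, h)


# A solvent sitting exactly on the ink centre gives RED = 0 (best solubilization);
# RED < 1 means the solvent lies inside the ink solubility sphere.
print(red(INK_CENTER))  # 0.0
```

In a mixture search, `red(blend_hsp(...))` would be evaluated for each candidate composition, so RED varies with the organic fraction as in Fig. 9.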

Other requirements refer to the cleaning process conditions. The solvent should keep its fluidity over the rubber blanket surface, which is evaluated by setting target values for viscosity and surface tension. It should comply with Volatile Organic Compound (VOC) limits, evaluated with the vapour pressure. Environmental, Health and Safety (EHS) issues are assessed with a five-index EHS model, and a flammability class is evaluated with a flash point model. The property target values are displayed in Table 3 along with the property weights, operating conditions, linear or non-linear mixture models and pure compound models. A Gaussian function is used for each property performance in Eq. (2).
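Table 1, which defines the Gaussian-type performance function G(Tol, Val), is not part of this section, so the sketch below assumes one plausible form: a performance of 1 when the property meets its target, equal to Val when the property misses the target by Tol, and decaying as a Gaussian of the deviation beyond that. Treat it as an illustration of how Tol and Val shape the score, not as the IBSS definition; the function name is hypothetical.

```python
import math


def gaussian_performance(value: float, low: float, high: float, tol: float, val: float) -> float:
    """Assumed G(Tol, Val): 1 inside the target interval [low, high], Val at a
    distance Tol outside it, with a Gaussian decay exp(ln(Val) * (d / Tol)^2)."""
    if not (0.0 < val < 1.0 and tol > 0.0):
        raise ValueError("expected 0 < Val < 1 and Tol > 0")
    if value < low:
        d = low - value
    elif value > high:
        d = value - high
    else:
        return 1.0
    return val ** ((d / tol) ** 2)


# One-sided target, e.g. molecular weight < 200 g/mol with G(20, 0.8):
mw_perf = gaussian_performance(220.0, -math.inf, 200.0, tol=20.0, val=0.8)
# Interval target, e.g. viscosity in [0.8; 1.4] cP with G(0.1, 0.8):
visc_perf = gaussian_performance(1.1, 0.8, 1.4, tol=0.1, val=0.8)
```

Encoding "<", ">", "=" and "∈" targets as one interval with infinite or equal bounds keeps a single function for all rows of Table 3.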

In the global performance Eq. (2), we set penalties for the occurrence of three consecutive oxygen atoms or of a C–C–C ring (Penal_r = 100%, i.e. such structures are forbidden).
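The global performance of Eq. (2) is defined earlier in the paper. As a hedged sketch, we assume a weighted mean of the property performances multiplied by (1 − penalty), where a penalty of 100% removes the candidate; we also include the weight renormalization that the text applies when a property cannot be computed (Section 5.2.2). The function name and the multiplicative form of the penalty are our assumptions.

```python
from typing import Dict, Optional


def global_performance(
    performances: Dict[str, Optional[float]],
    weights: Dict[str, float],
    penalty: float = 0.0,
) -> float:
    """Weighted mean of property performances, scaled by (1 - penalty).

    A performance of None marks a property that could not be computed; it is
    skipped and the remaining weights are renormalized. penalty = 1.0 (100%)
    gives a zero score, i.e. the candidate is effectively forbidden.
    """
    if not 0.0 <= penalty <= 1.0:
        raise ValueError("penalty must lie in [0, 1]")
    available = {p: v for p, v in performances.items() if v is not None}
    total_weight = sum(weights[p] for p in available)
    if total_weight == 0.0:
        return 0.0
    score = sum(weights[p] * v for p, v in available.items()) / total_weight
    return score * (1.0 - penalty)


weights = {"RED": 4.0, "Viscosity": 1.0, "Flash point": 1.0}
perf = {"RED": 1.0, "Viscosity": 0.5, "Flash point": None}  # flash point not computable
score = global_performance(perf, weights)                   # (4*1.0 + 1*0.5) / 5
forbidden = global_performance(perf, weights, penalty=1.0)  # e.g. a C-C-C ring found
```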

The organic solvent structure is split into two fragments: one with a core synthon traceable to various renewable biomass feedstocks, and the other built from fewer than 10 groups among a pool of simple and complex groups displayed in Fig. 8.

The search is run with the following set of parameters: number of generations (300), population size (100), elitism (30) and the probabilities of crossover, mutation, insertion and deletion (65, 15, 10 and 10%, respectively). It is completed in less than 40 min. Cyclic compounds are excluded. The probabilities of a composition change and of a molecule change are 0.7 and 0.3, respectively.
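The search settings can be bundled and validated in one object. The Python sketch below stores the case study 1 values, picks one modification operator per offspring by roulette wheel, and then decides between a composition change (0.7) and a molecule change (0.3). Whether IBSS draws operators exactly this way is an assumption, and the class and method names are hypothetical. The check that operator probabilities sum to 100 holds for all three case studies.

```python
import random
from dataclasses import dataclass

OPERATORS = ("crossover", "mutation", "insertion", "deletion")


@dataclass(frozen=True)
class GAParameters:
    generations: int
    population_size: int
    elitism: int                 # individuals copied unchanged to the next generation
    crossover: float             # operator probabilities, in percent
    mutation: float
    insertion: float
    deletion: float
    p_composition_change: float = 0.7  # a molecule change has probability 1 - p

    def __post_init__(self) -> None:
        total = self.crossover + self.mutation + self.insertion + self.deletion
        if abs(total - 100.0) > 1e-9:
            raise ValueError(f"operator probabilities sum to {total}, not 100")
        if not 0 <= self.elitism <= self.population_size:
            raise ValueError("elitism must lie between 0 and the population size")
        if not 0.0 <= self.p_composition_change <= 1.0:
            raise ValueError("p_composition_change must lie in [0, 1]")

    def pick_operator(self, rng: random.Random) -> str:
        """Roulette-wheel choice of one modification operator."""
        weights = [self.crossover, self.mutation, self.insertion, self.deletion]
        return rng.choices(OPERATORS, weights=weights, k=1)[0]

    def pick_target(self, rng: random.Random) -> str:
        """Decide whether the composition or a molecule of the mixture is modified."""
        return "composition" if rng.random() < self.p_composition_change else "molecule"


# Case study 1 settings.
CASE1 = GAParameters(300, 100, 30, 65, 15, 10, 10, p_composition_change=0.7)
```

Passing an explicit `random.Random` instance keeps runs reproducible, which helps when comparing parameter sets such as the kmax study of case 2.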

5.1.2. Results

The output file displays a list of one hundred mixtures ranked by their performance (Heintz et al., 2012), and the best solution (Fig. 9) emerged after 40 generations.

Sinha and Achenie (2003) suggested a mixture of γ-butyrolactone and water (45/55 mol%), whose performance, evaluated a posteriori with our criteria, is 0.94. Our solution reaches a performance of 0.96 for a composition of 31 mol% of the organic molecule. Confidentiality issues prevent us from displaying the formula of this organic molecule, which is different from γ-butyrolactone. Fig. 9 shows how the organic molar fraction affects the RED property. The minimum is at 0.3, close to the composition found, indicating that the genetic algorithm correctly finds the best solution. It also shows that adding water to the biosolvent improves its cleaning ability.


Table 3
Blanket wash substitution: calculable properties, targets, models and parameters.

Property name | Weight | Target | Performance function (a) | Mixture model | Pure compound model | Operating conditions
Molecular weight | 1 | < 200 g/mol | G(20, 0.8) | – | |
Flash point | 1 | > 323.15 K | G(5, 0.8) | Linear | Catoire et al. (2006) |
Vapor pressure | 1 | < 0.00267 bar | G(10⁻⁴, 0.8) | Linear | Riedel (1954) | T = 298.15 K
RED | 4 | = 0 | G(1, 0.9) | Volume fraction | HSPiP | See Eq. (6)
Env. waste | 0.2 | > 8 | G(1, 0.8) | Linear | Weis and Visco (2010) |
Env. impact | 0.2 | > 8 | G(1, 0.8) | Linear | Weis and Visco (2010) |
Health | 0.2 | > 8 | G(1, 0.8) | Linear | Weis and Visco (2010) |
Safety | 0.2 | > 8 | G(1, 0.8) | Linear | Weis and Visco (2010) |
LCA | 0.2 | > 8 | G(1, 0.8) | Linear | Weis and Visco (2010) |
Viscosity | 1 | ∈ [0.8; 1.4] cP | G(0.1, 0.8) | Nonlinear; Tamura and Kurata (1952) | Conte et al. (2008) | T = 298.15 K
Surface tension | 1 | ∈ [30; 45] dyn/cm | G(5, 0.8) | Nonlinear; Rice and Teja (1982) | Conte et al. (2008) | T = 298.15 K
Density | 1 | ∈ [0.9; 1.1] | G(0.05, 0.8) | Linear | HSPiP 3.1, Model, 2010 |
Log(Ws) | 4 | > 4 mg/L | G(0.5, 0.8) | Linear | Marrero and Gani (2002) |

(a) G(Tol, Val): Gaussian-type function, see Table 1.

5.2. Case study 2: substitution of chlorinated paraffins

5.2.1. Problem setting

We seek an alternative molecule to chlorinated paraffins (CPs). With the general formula $\mathrm{C}_n\mathrm{H}_{2n+2-x}\mathrm{Cl}_x$, they have been used as additives in high-temperature lubricants and cutting fluids for metalworking, and as plasticizers and flame retardants in plastics, sealants and leather (Bayen et al., 2006). Sourced from petroleum, their use and risks (toxicity, carcinogenic potential) have been assessed and are regulated (EU Report, 2008).
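Table 2 lists atomic weight summation as the default molecular weight model, and the CP general formula makes it easy to illustrate. The sketch below builds element counts for given n and x and sums standard atomic weights (IUPAC conventional values, rounded). C10H17Cl5 is only an illustrative choice within the short-chain C10–13 range covered by the EU risk assessment, and the function names are hypothetical.

```python
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "Cl": 35.45}  # g/mol


def cp_formula(n: int, x: int) -> dict:
    """Element counts of a chlorinated paraffin CnH(2n+2-x)Clx."""
    if n < 1 or not 0 <= x <= 2 * n + 2:
        raise ValueError("need n >= 1 and 0 <= x <= 2n + 2")
    return {"C": n, "H": 2 * n + 2 - x, "Cl": x}


def molecular_weight(counts: dict) -> float:
    """Molecular weight by atomic weight summation."""
    return sum(ATOMIC_WEIGHTS[element] * k for element, k in counts.items())


# Illustrative short-chain member: C10H17Cl5.
c10 = cp_formula(10, 5)
print(c10, round(molecular_weight(c10), 2))
```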

Table 4 summarizes the targeted physical properties.

We seek a suitable alternative sourced from levulinic acid (LA), which is known as a biomass platform chemical with derivatives already in use as lubricants, coatings and printing inks (Bozell et al., 2000). The structure of the substitute molecule consists of two fragments: one is fixed as LA and the other is constructed as an assembly of fewer than 10 groups among 19 basic groups and 8 complex groups, as displayed in Fig. 10.

Two different searches are executed: one uses only basic and complex groups to generate candidates (set 1); the other uses a fixed fragment (levulinic acid) and generates LA derivatives using basic and complex groups (set 2).

The following parameters are used: number of generations (300), population size (100), elitism (10) and the probabilities of crossover, mutation, insertion and deletion (20, 50, 15 and 15%, respectively). The formation of cyclic and aromatic compounds is not allowed. No penalty related to a specific chemical sub-structure is set.

5.2.2. Results

The search with set 1 (no LA fragment) produces the molecule shown in Fig. 11, which the genetic algorithm brought out after the 100th generation with a performance of 0.9122. It shows strong similarities with known CP alternatives, also displayed in Fig. 11. Note that viscosity is not computed because no suitable group contribution value exists in Joback and Reid's model. The user can overcome a failure in property calculation by allowing the performance equation to be computed without those properties. In such a situation, the property weights are normalized accordingly.

Fig. 9. Performance and property values for the best biomass derivative mixture and influence of its fraction on the RED property.

Introducing the LA fragment in the molecule, we display in Fig. 12 the influence of kmax (the maximum number of building groups in fragment 2) on the performance of the genetic algorithm. It shows that acceptable solutions are found if kmax > 4. The maximum performance (1.000) is always found for kmax ≥ 6, and it is found more rapidly with kmax = 7.

With set 2 (LA + basic + complex groups), the maximum performance (1.00) is achieved after 100 generations for kmax = 7. The best 10 molecules of the 100th generation are given in Fig. 13, along with a general scheme of the present approach, and their predicted values of various properties are given in Table 5.

For set 2, the flash point of molecules 6 and 9 could not be estimated. Indeed, group contributions do not exist for the group >CO (molecules 1–8 and 10) and for the group CHCO (molecule 9). The atom connectivity index ("CI") extension of the Hukkerikar model, which can supply contributions for missing groups, was not used (Hukkerikar et al., 2012). For the molecules other than 6 and 9, the viscosity was not predicted because the Joback and Reid method does not handle the phosphate group. Besides, the method of Hukkerikar et al. has no boiling point contribution for the aC-PO4 group; therefore, the boiling point values shown in Table 5 are only first approximations that do not take the aC-PO4 group into account. Such property model deficiencies for complex structures confirm that computer-based CAPD approaches must be validated by laboratory experiments. As mentioned before, the global performance was renormalized.

Analysing the results for molecules 6 and 9, the predicted melting point values are greater than the target value, but the excess remains below the reported average absolute error of the model (17.65 K). These molecules can be retained for further experimental analysis of their melting point. Nevertheless, they will likely be discarded by chemists because they exhibit a double ketone sequence, –C(=O)–C(=O)–, which is known to be unstable. Alternatively, this rule could be coded in the IBSS tool so that a penalized performance is used instead.
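The double ketone rule mentioned above is easy to express on a hydrogen-suppressed molecular graph. The sketch below uses a plain edge list with bond orders rather than the four-index adjacency-matrix coding of IBSS, and flags any two singly bonded carbons that each carry a C=O double bond. Its result could set a structural penalty in the global performance; the function names and graph format are hypothetical. Note that this definition also flags other 1,2-dicarbonyls, such as oxalates or α-keto esters, which may or may not be wanted.

```python
from typing import Dict, List, Tuple

Bond = Tuple[int, int]  # pair of atom indices in a hydrogen-suppressed graph


def neighbours(bonds: Dict[Bond, int], i: int) -> List[Tuple[int, int]]:
    """(neighbour index, bond order) pairs for atom i."""
    result = []
    for (a, b), order in bonds.items():
        if a == i:
            result.append((b, order))
        elif b == i:
            result.append((a, order))
    return result


def is_carbonyl_carbon(atoms: List[str], bonds: Dict[Bond, int], i: int) -> bool:
    """True if atom i is a carbon double-bonded to an oxygen."""
    return atoms[i] == "C" and any(
        atoms[j] == "O" and order == 2 for j, order in neighbours(bonds, i)
    )


def has_double_ketone(atoms: List[str], bonds: Dict[Bond, int]) -> bool:
    """True if two carbonyl carbons share a single bond: a -C(=O)-C(=O)- sequence."""
    return any(
        order == 1
        and is_carbonyl_carbon(atoms, bonds, a)
        and is_carbonyl_carbon(atoms, bonds, b)
        for (a, b), order in bonds.items()
    )


# Butane-2,3-dione, CH3-C(=O)-C(=O)-CH3 (hydrogens suppressed)
dione_atoms = ["C", "C", "O", "C", "O", "C"]
dione_bonds = {(0, 1): 1, (1, 2): 2, (1, 3): 1, (3, 4): 2, (3, 5): 1}
# Butan-2-one, CH3-C(=O)-CH2-CH3
ketone_atoms = ["C", "C", "O", "C", "C"]
ketone_bonds = {(0, 1): 1, (1, 2): 2, (1, 3): 1, (3, 4): 1}
```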

5.3. Case study 3: find solvents for extraction of natural antioxidants

5.3.1. Problem setting

The objective is to find a solvent for methyl p-coumarate (MpCA), an ester of cinnamic acid isolated from plants. This class of compounds is known for its antibacterial, antifungal and/or antiviral activity (Galanakis et al., 2013; Sova, 2012). Panteli et al. (2010) found that tert-pentanol was the best solvent for MpCA among tert-butanol, tert-pentanol, ethyl acetate and n-hexane.

Table 4
CP substitution case: product requirements and calculable properties with corresponding CAMD parameters.

Product requirement | Calculable property | Weight | Target | Performance function (a) | Pure compound model | Operating conditions
Liquid at usage temperature | Melting point | 1 | < 283.15 K | G(10, 0.8) | Hukkerikar et al. (2012) |
Liquid at usage temperature | Boiling point | 1 | > 523.15 K | G(5, 0.8) | Hukkerikar et al. (2012) |
Nonflammable at high temperatures | Flash point | 1 | > 505.15 K | G(5, 0.8) | Hukkerikar et al. (2012) |
Fluidity | Viscosity | 1 | > 37 cP | G(10, 0.8) | Joback and Reid (1987) | T = 298.15 K
Low potential to bioaccumulate | Log(Kow) | 1 | < 3 | G(2, 0.8) | Hukkerikar et al. (2012) |

(a) G(Tol, Val): Gaussian-type function, see Table 1.

Table 5
Predicted properties for the best 10 candidates for chlorinated paraffin substitution (100th generation, kmax = 7, levulinic acid + basic groups + complex groups).

Candidate | Performance | Melting point (K) | Boiling point (K) | Flash point (K) | Viscosity (cP) | Log(Kow)
Molecule 1 | 1.0000 | 274.80 | 358.67 | 518.95 | – | 0.31
Molecule 2 | 1.0000 | 266.34 | 360.24 | 507.30 | – | 0.30
Molecule 3 | 0.9999 | 283.30 | 362.01 | 520.23 | – | 0.36
Molecule 4 | 0.9986 | 284.72 | 361.60 | 506.98 | – | −0.97
Molecule 5 | 0.9982 | 284.94 | 362.59 | 527.74 | – | −0.82
Molecule 6 | 0.9976 | 284.14 | 317.97 | – | 35.19 | −0.72
Molecule 7 | 0.9964 | 285.71 | 353.64 | 512.73 | – | −1.45
Molecule 8 | 0.9902 | 287.39 | 369.86 | 537.69 | – | 0.37
Molecule 9 | 0.9884 | 280.08 | 307.51 | – | 32.38 | −0.96
Molecule 10 | 0.9656 | 291.30 | 364.25 | 522.11 | – | −1.33

Here we search for a glycerol-based solvent. The molecular structure we look for consists of two fragments. Fragment Fgt1 is taken from a list of two glycerol derivatives with either an sn-1 or an sn-2 substituent. The other fragment is constructed as an assembly of fewer than 10 groups among those displayed in Fig. 14.

We use the IBSS tool with a two-level search. At level 2, we keep the three best molecules from the list of candidates found at level 1 and improve the prediction of their solubility by using a solid–liquid equilibrium (SLE) calculation instead of the RED property (see Eq. (6)) used at level 1. As a first approximation, the solubility x of a pure (ideal) solid phase in a liquid solvent is given by Eq. (7) (Prausnitz et al., 1998):

$$\ln x = -\frac{\Delta H_m}{RT}\left(1 - \frac{T}{T_m}\right) - \ln\gamma \qquad (7)$$

where γ is the activity coefficient (computed with the modified UNIFAC (Dortmund 1993) model), ΔHm and Tm are the fusion enthalpy and the melting temperature of MpCA, respectively, R is the gas constant and T the temperature.

Fig. 10. A chlorinated paraffin example and building blocks used to generate alternative molecules.

Fig. 12. Influence of the maximum number of building blocks on the genetic algorithm performance.
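Eq. (7) is implicit in x when γ depends on composition, so it is usually solved iteratively. The Python sketch below uses damped successive substitution on x = x_ideal / γ(x). Modified UNIFAC (Dortmund) is far too long to reproduce here, so a one-parameter Margules expression stands in for the activity coefficient, and the melting data are hypothetical numbers, not MpCA properties. Like Eq. (7), the sketch neglects heat capacity terms; the function names are hypothetical.

```python
import math
from typing import Callable

R = 8.314462618  # gas constant, J/(mol K)


def ideal_solubility(t: float, tm: float, dh_fus: float) -> float:
    """Ideal mole-fraction solubility: ln x = -(dH_m / RT) (1 - T / Tm)."""
    return math.exp(-dh_fus / (R * t) * (1.0 - t / tm))


def sle_solubility(t: float, tm: float, dh_fus: float,
                   gamma: Callable[[float], float],
                   tol: float = 1e-10, max_iter: int = 200) -> float:
    """Solve x * gamma(x) = x_ideal (Eq. 7) by damped successive substitution."""
    x_id = ideal_solubility(t, tm, dh_fus)
    x = x_id
    for _ in range(max_iter):
        x_new = 0.5 * (x + x_id / gamma(x))  # damping helps when gamma varies strongly
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    raise RuntimeError("SLE solubility did not converge")


def margules_gamma(x: float, a: float = 1.0) -> float:
    """Stand-in activity model for the solute: ln(gamma) = A (1 - x)^2 (hypothetical A)."""
    return math.exp(a * (1.0 - x) ** 2)


# Hypothetical inputs, NOT MpCA data: T = 298.15 K, Tm = 360 K, dHm = 25 kJ/mol.
x_ideal = ideal_solubility(298.15, 360.0, 25000.0)
x_real = sle_solubility(298.15, 360.0, 25000.0, margules_gamma)
```

Swapping `margules_gamma` for a UNIFAC routine with the same one-argument signature would leave the solver unchanged.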

Table 6 summarizes the physical properties to be satisfied by a potential candidate.

The following set of parameters is used: number of generations (300), population size (50), elitism (5), kmax (4) and the probabilities of crossover, mutation, insertion and deletion (40, 50, 5 and 5%, respectively). The formation of cyclic and aromatic compounds is not allowed.

5.3.2. Results

The maximum performance (0.9802) is achieved in 60 generations. The best 10 molecules of the 60th generation are given in Fig. 15 and their properties are displayed in Table 7.

None of those molecules exhibits an sn-2 glycerol derivative. Indeed, in Hukkerikar's method (Hukkerikar et al., 2012), the contribution of the sn-2 OCHCH2OH group to the melting point prediction is much larger (10.9421) than that of the sn-1 OCH2CHOH group (1.7527). Thus, isomers can show a difference in melting point of up to 100 K. Even though all the generated molecules have predicted melting points greater than the target, this difference is within the reported average absolute error of the model (17.65 K), and these molecules can be retained for further experimental analysis. It can also be noted that the best molecule is a compromise, with a larger RED than other molecules but with an overall better performance.

Fig. 13. Candidates for chlorinated paraffin substitution. Building blocks: levulinic acid + basic groups + complex groups; 100th generation, kmax = 7.

Table 6
MpCA solvent extraction case: product requirements and calculable properties with corresponding CAMD parameters.

Product requirement | Calculable property | Weight | Target | Performance function (a) | Pure compound model, level 1 | Pure compound model, level 2
Must dissolve MpCA | RED | 2 | = 0 | G(1, 0.95) | Hukkerikar et al. (2012) (b) | UNIFAC + SLE at 25 °C
Liquid at room temperature | Melting point | 1 | < 283.15 K | G(10, 0.8) | Hukkerikar et al. (2012) | –
Liquid at room temperature | Boiling point | 1 | > 373.15 K | G(5, 0.8) | Hukkerikar et al. (2012) | –
Nonflammability | Flash point | 1 | > 343.15 K | G(1050.8) | Hukkerikar et al. (2012) | –
Low potential to bioaccumulate | Log(Kow) | 1 | < 3 | G(2, 0.8) | Hukkerikar et al. (2012) | –

(a) G(Tol, Val): Gaussian-type function, see Table 1.
(b) The Hukkerikar model is used to compute the Hansen solubility parameters, which are used to compute RED.

Table 7
Predicted properties for the best 10 candidates for methyl p-coumarate solubilization (60th generation, kmax = 4, glycerol + basic groups). All columns are level 1 results except the solubility fraction (level 2).

Candidate | Performance | Melting point (K) | Boiling point (K) | Flash point (K) | Log(Kow) | RED | Solubility fraction (level 2)
Molecule 1 | 0.9802 | 285.24 | 532.92 | 349.35 | 0.03 | 1.04 | 0.0896
Molecule 2 | 0.9732 | 287.34 | 548.81 | 341.47 | 1.29 | 0.99 | 0.0787
Molecule 3 | 0.9710 | 290.63 | 529.80 | 349.19 | 0.17 | 0.75 | 0.0902
Molecule 4 | 0.9635 | 283.62 | 537.89 | 339.35 | 1.07 | 0.99 | –
Molecule 5 | 0.9570 | 293.23 | 523.40 | 345.25 | 0.09 | 0.74 | –
Molecule 6 | 0.9471 | 290.25 | 535.61 | 340.53 | −0.84 | 1.24 | –
Molecule 7 | 0.9213 | 298.71 | 539.53 | 356.66 | 0.55 | 0.74 | –
Molecule 8 | 0.9196 | 295.18 | 535.27 | 343.21 | −1.09 | 1.46 | –
Molecule 9 | 0.9150 | 295.13 | 534.63 | 338.28 | 0.78 | 0.67 | –
Molecule 10 | 0.9095 | 300.37 | 556.14 | 347.41 | 0.96 | 0.77 | –

The top 3 molecules proposed by level 1 are retained for further analysis of their solubility with the UNIFAC–SLE approach. The level 2 results swap molecules 1 and 2 compared with their level 1 ranking by the RED model (RED of 0.99 for molecule 2 versus 1.04 for molecule 1): the molar fraction solubility of molecule 2 is predicted at 0.0787, lower than that of molecule 1 at 0.0896. The predicted solubility of molecule 3 is the highest, at 0.0902. This agrees with the level 1 RED prediction, in which molecule 3 has the lowest RED value of the three.
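The rank change between the two levels can be reproduced directly from the Table 7 values. The short Python sketch below ranks the three retained molecules by level 1 RED (lower is better) and by level 2 SLE solubility (higher is better); the helper name `rank` is hypothetical.

```python
from typing import Dict, List

# Level 1 RED and level 2 SLE mole-fraction solubility of the three retained
# candidates, copied from Table 7.
TOP3: Dict[str, Dict[str, float]] = {
    "Molecule 1": {"RED": 1.04, "x_sle": 0.0896},
    "Molecule 2": {"RED": 0.99, "x_sle": 0.0787},
    "Molecule 3": {"RED": 0.75, "x_sle": 0.0902},
}


def rank(candidates: Dict[str, Dict[str, float]], key: str, higher_is_better: bool) -> List[str]:
    """Order candidate names from best to worst according to one criterion."""
    return sorted(candidates, key=lambda name: candidates[name][key], reverse=higher_is_better)


by_red = rank(TOP3, "RED", higher_is_better=False)          # level 1 criterion
by_solubility = rank(TOP3, "x_sle", higher_is_better=True)  # level 2 criterion
print(by_red)         # ['Molecule 3', 'Molecule 2', 'Molecule 1']
print(by_solubility)  # ['Molecule 3', 'Molecule 1', 'Molecule 2']
```

Molecule 3 stays first under both criteria, while molecules 1 and 2 exchange places.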

Fig. 15. Candidates for methyl p-coumarate solubilization. Glycerol and basic groups as building blocks; 60th generation.

6. Conclusion

We have described the methods, data structures and software implementation of IBSS, a computer aided product design (CAPD) tool aimed at finding mixtures. Based on CAMD concepts, it is able to simultaneously optimize the mixture components, compositions and mixture conditions.

With the help of a model-driven engineering approach, the architecture and the functional needs of the CAPD tool were devised, specifically aimed at finding mixtures with molecules that can be sourced from pools of renewable materials. This is done by setting constraints on the mixture molecules or on fragments within molecules, such as bio-sourced synthons.

A molecular representation based on an adjacency matrix was selected. It was made flexible enough to represent both atom-based and fragment-based structures. The diagonal elements of the matrix, which hold the hydrogen-suppressed atoms, were described with a novel coding of four indexes describing the atom number, the highest bond type, the atom neighbouring context and the number of hydrogen atoms. The coding was also devised for the identification of the chemical groups or descriptors needed to evaluate properties with models from various authors. A complex group describing polyatomic structures was created to represent polyatomic chemical functions and synthons. It made it possible to keep bio-sourced synthons intact during the search and to promote their occurrence in the final solution when their performance was deemed high enough.

A genetic algorithm was selected to simultaneously optimize the mixture elements, its composition and additional operating conditions. It was made capable of performing a multilevel search with different objective functions. Modification operators were adapted to cope with the mixture search context and with the aforementioned molecular representation. Classical crossover and mutation operators were used, while insertion and deletion operators were adapted to insert or delete side branches. A new substitution operator was proposed to maintain cyclic molecules through the genetic algorithm generations. The building of fragments from basic or complex groups was made more efficient by using a vector group classification that inventories building groups on the basis of their external connections.
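The text does not detail the vector group classification, so the following Python sketch is only a hypothetical interpretation: groups are indexed by their number of external connections, and a helper keeps only the groups that still allow the fragment to be closed within the remaining block budget. The group pool is illustrative, and the bond from the fragment to the rest of the molecule, which would add one open point, is ignored for simplicity.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical pool of building groups with their number of external connections.
GROUPS: List[Tuple[str, int]] = [
    ("CH3", 1), ("OH", 1), ("Cl", 1),
    ("CH2", 2), ("O", 2), ("CO", 2),
    ("CH", 3),
    ("C", 4),
]


def classify(groups: List[Tuple[str, int]]) -> Dict[int, List[str]]:
    """Inventory the groups by number of external connections."""
    inventory: Dict[int, List[str]] = defaultdict(list)
    for name, connections in groups:
        inventory[connections].append(name)
    return dict(inventory)


def admissible(inventory: Dict[int, List[str]], open_points: int, blocks_left: int) -> List[str]:
    """Groups that can be attached now while the fragment can still be closed.

    Attaching a group with c connections consumes one open attachment point and
    creates c - 1 new ones; each later block can close at most one open point.
    """
    allowed: List[str] = []
    for connections in sorted(inventory):
        new_open = open_points + connections - 2
        if 0 <= new_open <= blocks_left - 1:
            allowed.extend(inventory[connections])
    return allowed


inventory = classify(GROUPS)
print(admissible(inventory, open_points=1, blocks_left=1))  # only terminal groups
print(admissible(inventory, open_points=1, blocks_left=3))  # up to 3-connection groups
```

Indexing by connection count means the builder looks up a short list instead of filtering the whole group database at every step.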


The mixture performance is calculated through a weighted sum of property performances, which can be penalized if specific molecular patterns occur, such as those found in toxic molecules.

The tool was implemented as a set of three independent software components, namely an MMI component, a CAPD search component and a property library component, associated with a database of basic and complex functional groups. To cope with the difficulty some users have in expressing requirements as numerical target values, we distinguished product requirements (better suited to basic users) from calculable properties (for expert users).

Future work is ongoing to set the CAPD tool within a virtual laboratory decision-making framework, as described by Heintz et al. (2014). We also intend to benefit from the flexible tool architecture, first by adding solving methods other than the genetic algorithm, and second by testing the suitability of the molecular graph representation for novel property models of higher dimensionality, which are useful to handle conformation-dependent properties.

Acknowledgment

This scientific work was supported by the French National Research Agency (InBioSynSolv ANR-CP2D-2009-08) in partnership with Rhodia-Solvay Company, ENSCL and LCA-INPT.

Appendix A. Supplementary data

Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.compchemeng.2014.09.009.

References

Achenie LEK, Gani R, Venkatasubramanian V. Computer aided molecular design: theory and practice. Amsterdam: Elsevier; 2003.

Anastas P, Warner J. Green chemistry theory and practice. Oxford: Oxford University Press; 1998. p. 135.

Bayen S, Obbard JP, Thomas GO. Chlorinated paraffins: a review of analysis and environmental occurrence. Environ Int 2006;32:915–29.

Bozell JJ, Moens L, Elliott DC, Wang Y, Neuenscwander GG, Fitzpatrick SW, et al. Production of levulinic acid and use as a platform chemical for derived products. Resour Conserv Recycl 2000;28:227–39.

Catoire L, Paulmier S, Naudet V. Experimental determination and estimation of closed cup flash points of mixtures of flammable solvents. Process Saf Prog 2006;25:33–9.

Chemmangattuvalappil NG, Eden MR. A novel methodology for property-based molecular design using multiple topological indices. Ind Eng Chem Res 2013;52:7090–103.

Churi N, Achenie LEK. Novel mathematical programming model for computer aided molecular design. Ind Eng Chem Res 1996;35:3788–94.

Churi N, Achenie LEK. The optimal design of refrigerant mixtures for a two-evaporator refrigeration system. Comput Chem Eng 1997;21S:S349–54.

Constantinou L, Gani R. New group contribution method for estimating properties of pure compounds. AIChE J 1994;40:1697–710.

Constantinou L, Bagherpour K, Gani R, Klein JA, Wu DT. Computer aided product design: problem formulations, methodology and applications. Comput Chem Eng 1996;20:685–702.

Conte E. Innovation in integrated chemical product-process design: development through a model-based systems approach. Technical University of Denmark (DTU); 2010. PhD Thesis.

Conte E, Gani R, Ng KM. Design of formulated products: a systematic methodology. AIChE J 2011;57:2431–49.

Conte E, Gani R. Chemicals-based formulation design. Comput Aided Chem Eng 2011;29:1588–92.

Conte E, Martinho A, Matos HA, Gani R. Combined group-contribution and atom connectivity index-based methods for estimation of surface tension and viscosity. Ind Eng Chem Res 2008;47:7940–54.

Costa R, Moggridge GD, Saraiva PM. Chemical product engineering: an emerging paradigm within chemical engineering. AIChE J 2006;52:1976–86.

Del Castillo E, Montgomery DC, McCarville DR. Modified desirability functions for multiple response optimization. J Qual Technol 1996;28:337–45.

Duvedi AP, Achenie LEK. On the design of environmentally benign refrigerant mixtures: a mathematical programming approach. Comput Chem Eng 1997;21:915–23.

ECHA. REACH market final report; 2012. Available from: http://ec.europa.eu/enterprise/sectors/chemicals/files/reach/review2012/market-final-reporten.pdf [last accessed February 2014].

EU report. European Union Risk Assessment report, alkanes, C10-13, chloro. European Chemicals Bureau; 2008. http://echa.europa.eu/documents/10162/6434698/oratsaddendumalkanesc10-13chloroen.pdf [last accessed September 2013].

Galanakis CM, Goulas V, Tsakona S, Manganaris GA, Gekas V. A knowledge base for the recovery of natural phenols with different solvents. Int J Food Prop 2013;16:382–96.

Gallenos SA. Mini-review on chemical similarity and prediction of toxicity. Curr Comput Aided Drug Des 2006;2:105–22.

Gani R, Fredenslund A. Computer aided molecular and mixture design with specified property constraints. Fluid Phase Equilib 1993;82:39–46.

Gani R, Nielsen B, Fredenslund A. A group contribution approach to computer-aided molecular design. AIChE J 1991;37:1318–32.

Gani R. Chemical product design: challenges and opportunities. Comput Chem Eng 2004;28:2441–57.

Gani R, Harper PM, Hostrup M. Automatic creation of missing groups through connectivity index for pure-component property prediction. Ind Eng Chem Res 2005;44:7262–9.

Harper PM, Gani R, Kolar P, Ishikawa T. Computer-aided molecular design with combined molecular modeling and group contribution. Fluid Phase Equilib 1999;158–160:337–47.

Heintz J, Touche I, Teles Dos Santos M, Gerbaud V. An integrated framework for product formulation by computer aided mixture design. Comput Aided Chem Eng 2012;30:702–6.

Heintz J, Belaud JP, Gerbaud V. Chemical enterprise model and decision-making framework for sustainable chemical product design. Comput Ind 2014;65:505–20.

Herring RH III, Eden MR. De novo molecular design using a graph-based genetic algorithm approach. Comput Aided Chem Eng 2014;33:7–12.

HSPiP 3.1. http://hansen-solubility.com/index.php; 2010.

Hukkerikar AS, Sarup B, Ten Kate A, Abildskov J, Sin G, Gani R, et al. Group-contribution+ (GC+) based estimation of properties of pure components: improved property estimation and uncertainty analysis. Fluid Phase Equilib 2012;321:25–43.

Joback KG, Reid RC. Estimation of pure-component properties from group-contributions. Chem Eng Commun 1987;57:233–43.

Karelson M, Lobanov VS, Katritzky AR. Quantum-chemical descriptors in QSAR/QSPR studies. Chem Rev 1996;96:1027–43.

Karunanithi AT, Achenie LEK, Gani R. A new decomposition-based computer-aided molecular/mixture design methodology for the design of optimal solvents and solvent mixtures. Ind Eng Chem Res 2005;44:4785–97.

Klein JA, Wu DT, Gani R. Computer aided mixture design with specified property constraints. Comput Chem Eng 1992;16S:S229–36.

Korichi M, Gerbaud V, Floquet P, Meniai AH, Nacef S, Joulia X. Computer aided aroma design I: molecular knowledge framework. Chem Eng Proc: Process Intensif 2008;47:1902–11.

Kroll P, Kruchten P. The rational unified process made easy: a practitioner's guide to the RUP. 1st ed. Boston: Addison Wesley; 2003.

Lin B, Chavali S, Camarda K, Miller DC. Computer-aided molecular design using Tabu search. Comput Chem Eng 2005;29:337–47.

Marrero J, Gani R. Group-contribution based estimation of pure component properties. Fluid Phase Equilib 2001;183–184:183–208.

Marrero J, Gani R. Group-contribution-based estimation of octanol/water partition coefficient and aqueous solubility. Ind Eng Chem Res 2002;41:6623–33.

Martin TM, Young DM. Prediction of the acute toxicity (96-h LC50) of organic compounds to the fathead minnow (Pimephales promelas) using a group contribution method. Chem Res Toxicol 2001;14:1378–85.

Mattei M, Hill M, Kontogeorgis GM, Gani R. A comprehensive framework for surfactant selection and design for emulsion based chemical product design. Fluid Phase Equilib 2014a;362:288–99.

Mattei M, Yunus NA, Kalakul S, Kontogeorgis GM, Woodley JM, Gernaey KV, et al. The virtual product-process design laboratory for structured chemical product design and analysis. Comput Aided Chem Eng 2014b;33:61–6.

Nannoolal Y, Rarey J, Ramjugernath D, Cordes W. Estimation of pure component properties: Part 1. Estimation of the normal boiling point of non-electrolyte organic compounds via group contributions and group interactions. Fluid Phase Equilib 2004;226:45–63.

Nannoolal Y, Rarey J, Ramjugernath D. Estimation of pure component properties: Part 2. Estimation of critical property data by group contribution. Fluid Phase Equilib 2007;252:1–27.

Ng LY, Chemmangattuvalappil NG, Ng DKS. Optimal chemical product design via fuzzy optimisation based inverse design techniques. Comput Aided Chem Eng 2014;33:325–9.

Garnier E, Bliard C, Nieddu M. The emergence of doubly green chemistry, a narrative approach. Eur Rev Ind Econ Policy 2012;4:1.

Ourique JE, Silva Telles A. Computer-aided molecular design with simulated annealing and molecular graphs. Comput Chem Eng 1998;22S:S615–8.

Panteli E, Saratsioti P, Stamatis H, Voutsas E. Solubilities of cinnamic acid esters in organic solvents. J Chem Eng Data 2010;55:745–9.

Papadopoulos AI, Stijepovic M, Linke P, Seferlis P, Voutetakis S. Molecular design of working fluid mixtures for organic Rankine cycles. Comput Aided Chem Eng 2013;32:289–94.


Prausnitz JM, Lichtenthaler RN, de Azevedo EG. Molecular thermodynamics of fluid phase equilibria. 3rd ed. New York: Prentice-Hall; 1998.

REACH. Regulation text, corrigendum and amendments; 2006. Available from: http://ec.europa.eu/enterprise/sectors/chemicals/reach/indexen.htm

Rice P, Teja AS. A generalized corresponding-states method for the prediction of surface tension of pure liquids and liquid mixtures. J Colloid Interface Sci 1982;86:158–63.

Riedel L. Eine neue universelle Dampfdruckformel Untersuchungen über eine Erweiterung des Theorems der übereinstimmenden Zustände. Teil I. Chem Ing Tech 1954;26:83–9.

Sinha M, Achenie LEK. CAMD in solvent mixture design. In: Achenie LEK, Gani R, Venkatasubramanian V, editors. Computer aided molecular design: theory and practice. Amsterdam: Elsevier; 2003. p. 261–87 [Chapter 11].

Solvason CC, Chemmangattuvalappil NG, Eden MR. A systematic method for integrating product attributes and molecular synthesis. Comput Chem Eng 2009;33(5):977–91.

Sova M. Antioxidant and antimicrobial activities of cinnamic acid derivatives. Mini-Rev Med Chem 2012;12:749–67.

Tamura M, Kurata M. On the viscosity of binary mixture of liquids. Bull Chem Soc Jpn 1952;25(1):32–8.

Vaidyanathan R, El-Halwagi M. Computer-aided synthesis of polymers and blends with target properties. Ind Eng Chem Res 1996;35:627–34.

Veith GD, Konasewich DE. Structure–activity correlations in studies of toxicity and bioconcentration with aquatic organisms. Windsor, ON: Great Lakes Research Advisory Board; 1975.

Venkatasubramanian V, Chan K, Caruthers JM. Computer-aided molecular design using genetic algorithms. Comput Chem Eng 1994;18:833–44.

VOC. VOC Solvents Emissions Directive (Directive 1999/13/EC) amended through article 13 of the Paints Directive (Directive 2004/42/EC); 2004.

Weis DC, Visco DP. Computer-aided molecular design using the signature molecular descriptor: application to solvent selection. Comput Chem Eng 2010;34:1018–29.

Yunus NA, Gernaey KV, Woodley J, Gani R. A systematic methodology for design of tailor-made blended products. Comput Chem Eng 2014;66:201–13.

Figure

Fig. 2. Molecular graph representation of 3-hydroxybutanone as three fragments.
Fig. 3. Activity diagram of the creation of a free fragment object.
Fig. 4. Molecule modification operators.
Fig. 7 describes the use case diagram of the ‘launch a search’ functionality. It gives access to the definition of the problem data through three data subsets and describes actions available to the user.