• Aucun résultat trouvé

Data-driven modeling and learning in science and engineering

N/A
N/A
Protected

Academic year: 2021

Partager "Data-driven modeling and learning in science and engineering"

Copied!
12
0
0

Texte intégral

(1)

Science Arts & Métiers (SAM)

is an open access repository that collects the work of Arts et Métiers Institute of

Technology researchers and makes it freely available over the web where possible.

This is an author-deposited version published in:

https://sam.ensam.eu

Handle ID: .

http://hdl.handle.net/10985/19480

To cite this version :

Francisco J. MONTÁNS, Francisco CHINESTA, Rafael GÓMEZBOMBARELLI, J. Nathan KUTZ

Datadriven modeling and learning in science and engineering Comptes Rendus Mécanique

-Vol. 347, n°11, p.845-855 - 2019

Any correspondence concerning this service should be sent to the repository

(2)

Data-Based Engineering Science and Technology / Sciences et technologies de l’ingénierie basées sur les données

Data-driven

modeling

and

learning

in

science

and

engineering

Francisco

J.

Montáns

a,

,

Francisco

Chinesta

b

,

Rafael

Gómez-Bombarelli

c

,

J.

Nathan Kutz

d

aEscuelaTécnicaSuperiordeIngenierosAeronáuticos,UniversidadPolitécnicadeMadrid,PlazaCardenalCisneros3,28045Madrid,Spain bESIGroupChair@PIMM,ArtsetMétiersParisTech,151,boulevarddel’Hôpital,75013Paris,France

cDepartmentofMaterialsScienceandEngineering,MassachusettsInstituteofTechnology,77MassachusettsAvenue,Cambridge, MA02139,USA

d DepartmentofAppliedMathematics,UniversityofWashington,Seattle,WA98195,USA

a

b

s

t

r

a

c

t

Keywords: Data-drivenscience Data-drivenmodeling Artificialintelligence Machinelearning Data-science Bigdata

In thepast, data inwhichscience and engineering isbased, was scarceand frequently obtained by experiments proposed to verify a given hypothesis. Each experiment was able toyield onlyvery limiteddata. Today, data is abundant and abundantly collected in each single experiment at a very small cost. Data-driven modeling and scientific discoveryisachangeofparadigmonhowmanyproblems,bothinscienceandengineering, are addressed. Some scientific fields have been using artificial intelligence for some time due to the inherent difficulty in obtaining laws and equations to describe some phenomena.However,todaydata-drivenapproachesarealsofloodingfieldslikemechanics and materials science, where thetraditional approach seemed tobe highly satisfactory. In this paper we review the application of data-driven modeling and model learning procedurestodifferentfieldsinscienceandengineering.

1. Introduction

The walk of humankind inlife is a complex journey of learning through observation andexperiencing ofthe world, fromwhichwecollectpropertydata,sometimeseasilyquantifiable,sometimesmorequalitative.Also,fromobservation,we relateeventstothatpropertydata.Itisfromrepetitiveexperiencesfromwhichwecandeterminesomepatternsthatrelate eventstodata,andeventstoeventsthemselves.Inthecaseofsciencediscovery,thesepatternsandrelationsareformalized inlaws andequations,the dataare formalized in propertiesandvariables, andtheobservations are formalized inevent measurements,whichmaybeactionsorpropertiesthemselves.Lawsandequations,typicalinscience,allowustoperform predictionsandfacilitatethetransmissionofthelearningprocedureinaverycompactmanner,withtheminimumamount of information.However, the classical process of learning in science is a slow process which needs much observational experience,usuallyfromexpensiveproposedexperiments,astodiscoverthemainvariablesinvolvedandtheirinfluenceon eventsforaprobablyhugeamountofpossiblecombinations,missingfrequentlyunforeseenrelevantvariables.Furthermore, theclassical scientific approachishypotheses-driven andhenceisbiasedby them.Thescientific methodwas established

*

Correspondingauthor.

E-mailaddresses:Fco.Montans@upm.es(F.J. Montáns),Francisco.Chinesta@ensam.eu(F. Chinesta),rafagb@mit.edu(R. Gómez-Bombarelli),kutz@uw.edu

(3)

because ofthenaturalbiasesandweaknessesofthe humanmind,includingthenaturalbiashumanshavewhen seeking metaphysicalexplanations whicharenotbasedonrealobservations([1] and[2],citingFrancisBaconinNovumOrganum Scientiarum, AD 1620). However, the classical scientific method is still biased by the deductive thinking of the human mind.Data-drivenproceduresseek,ifpossible,anunbiasedimplicitapproachtoourlearningexperiencebasedonrawdata fromrealobservations.Theseprocedures havethe additionaladvantage oftestingcorrelationsbetweendifferentvariables and observations, learning unforeseen patternsin nature and allowing us to discover newscientific lawsor even more, performingpredictionswithouttheavailabilityofsuchlaws.Wearelivingtheeraofdatascience,whichimpactsallaspects of life [3]. Data-driven procedures are givingrise to a neweconomy. Data-based science will also changeour lives and howwedoscience.Datacollection,data-mining[4] anddata-visualizationwillalsobeofparamountimportanceinscience discovery.Data-drivenscientificdiscoveryisconsideredthefourthparadigm[5],sofoundingagencieshavebeguntoinvest significantlyindata-drivenscience(TheEconomist,Briefing5/6/2017).Forexample,NationalScienceFoundationoftheUSA grantedin2014$31millioninawardstolaythegroundworkindata-science(NSFNewsRelease14-132).Thepurposeofthis briefreviewoftherelativelynovelfieldofdata-drivenmodelinginscienceandengineering, istogiveascentofdifferent approachesandapplicationsinseveralscientificandengineeringfieldsofdata-drivenmodeling.Therefore,samplingsome approachesandapplicationsisourpurpose;nointentionismadetoincludeallpossibleworks,applicationsorprocedures, butjusttogiveabigpictureofsomepathsfollowedindifferentfields.

2. Data-drivendiscoveryinscience

2.1. Computerpowerfacilitatescomputer-baseddiscovery

Theever-increasingriseofcomputationalpowerinthepastfewdecadeshasledtosignificantadvancesinstatisticaland machinelearning techniques[6].Thecollectionofalgorithmsanddataminingmethodsdevelopedasaresulthaveformed the core mathematical architecture of artificial intelligence (AI) agents, and although AI has a long history in scientific discovery [7],data-drivenapproachesinmoderncomputerscannowingestandprocessalgorithmsatscale.Thishasbeen largelyenabledbytheplummetingcostsofsensors,computationalpower,anddatastoragetechnologies.Indeed,suchvast quantitiesofdataaffordusnewopportunitiesfordata-drivendiscovery,which,asmentioned,hasbeenreferredtoasthe 4thparadigmofscience[5].

In mostapplicationfields inthe engineering, physical andbiological sciences, physicalmodels are expressedas aset ofgoverningconstitutiverelations,spatio-temporalrelations,and/ordynamicalsystems.Data-drivendiscovery inthese ap-plication areasare specificallyconstructedtodiscoverconstitutiverelationsanddifferentialgoverningequationsinawide variety offields,[8], conservationlaws[9] or propagationofnonlinear waves[10], spatio-temporaldynamics [11], devel-opmentofpredictiveproceduresformoleculardynamicsfornanoscaleflow[12],andhealthmonitoring[13].Inparticular, data-driventechniquesmaybeextremelyimportantincomplexareasoflifesciences[14],allowingforunveilingunknown biologicalmechanisms[15].Thus,thereareincreasinglyfundinginitiativestoobtainnewmethods,softwaretoolsand train-ing within the frameworkofBig Data analysisinhealth [13].In thefield ofdata-driven modelingin agro-environmental science,anoverviewmaybefoundin[16] andthereinreferences.

From the Schrödinger equationof quantummechanics toMaxwell’s equationsforelectromagneticpropagation, know-ing thegoverninglawshasallowed fortransformative technologicalimpactinsociety(e.g.,smartphones,internet,lasers, satellites). And just asNewton builtupon thework of Keplerand others,proposing the existence ofgravity in order to derive F

=

ma andexplainKepler’sellipticorbits,thediscoveryofafundamentalgoverninglawiscriticalfortechnological development,enablingunprecedentedengineeringandscientificprogress,suchassendingarockettothemoon.

2.2. Algorithmsfordata-drivenmodeling,discoveryoflawsandlearningphysicalconstraints

Therecentandrapidincreaseintheavailabilityofmeasurementdataofphysicalsystemshasspurredthedevelopment of many data-driven methods for modeling and predicting dynamics. At the forefront of data-driven methods are deep neuralnetworks(DNNs).DNNsnotonlyachievesuperiorperformancefortaskssuchasimage classification,buttheyhave alsobeenshowntobeeffectiveforfuturestatepredictionofdynamicalsystems[10].AkeylimitationofDNNs,andsimilar data-driven methods, is the lack of interpretability of the resulting model: they are focused on prediction and do not providegoverningequationsorclearlyinterpretablemodelsintermsoftheoriginalvariableset.Analternativedata-driven approach usessymbolicregressiontoidentifydirectly thestructureofa nonlineardynamicalsystemfromdata[17]. This worksremarkablywellfordiscoveringinterpretablephysicalmodels,butsymbolicregressioniscomputationallyexpensive and can be difficult to scale to large problems. However, the discovery process can be reformulated in terms ofsparse regression [8], providing a computationally tractable alternative, thus leveraging the power of symbolic regression with computational tractability. Thesecontrasting techniquesshow the diversity ofstrategies that can be employed toextract meaningfulphysicsfromdata.Theyalsohighlightthefactthatmachinelearningandartificialintelligencealgorithmsmaybe capableoflearningphysicsprinciplesandconstraints[9].Usingmodernsparseregressionarchitecturesandneuralnetworks, severalcriticaltasksmaybe enactedfromdataalone:(i) thediscovery offirstprinciplesmodels,(ii)theidentificationof physical constraints and conservation laws, and(iii) improved models using known physics. A diversity of architectures

(4)

allowsonetoalsodevelopblackboxandgraybox1 modelingstrategiesforcomplexsystemswherephysicsisonlypartially known.Additionally,thearchitecturenotonlyallowsonetoimposephysicsconstraints,orbake-inphysics,butitcanalso discoverphysicalconstraints thatneed tobe baked-in,i.e.one can constrainlearning andonecan learnconstraints [18]. Thus,notonlymayparsimoniousandinterpretablephysicalmodelsbediscoveredasadirectresultofsuchstrategies,but critical insightssuch asconservationlaws andphysical constraintscould also be discovered. Theseinnovations have the potentialtodiscovergeneralizablemodelswhichcanbemodifiedtohandlemulti-scalephysics,noisysystems,andlimited data.

3. Data-drivenmodelinginmechanicalengineeringandmaterialsscience 3.1. Thechangeofparadigminsolidmechanics

Whereasdata-driven(big-data)applicationshavebeenextensivelyusedinmanyfieldsformorethanadecade,thistype ofapproachhasattractedtheattentiononlyrecentlytoresearchersinthefield ofmodelingofsolids. Oneofthereasons forthisisthattraditionally,themechanicsofsolidshasfollowedaquitesuccessfuldeterministicapproachinwhich,with relativelylittleavailableexperimentaldata,relativelymeaningfulpredictionswhereobtainedingeneral,complexsituations. Furthermore,information about thebehavior of a material has been traditionallypassed to thecommunity through the specificationoffewmaterialparametersforaspecificconstitutivemodel.However,thedata-drivenchangeofparadigmhas reachedalsothesolid mechanicscommunity.Theseare mostprobablythemain reasons.(1)Currentlythecomputational power islarge, so the analysisof nonlinearsolids isroutinely being performedin industry.This hasfostered interestin simulatingmorecomplexmaterials,whichatthesametimearegeneratedandoptimizedbymaterialscientiststoachieve somedesiredproperties.(2)Thevarietyfoundinbiologicalmaterials,includinglivingmaterials,alongwiththedifficultyof theircharacterizationbecauseofthedifferentstructurefromspecimentospecimen, fromlocationtolocation,andbecause of aging andtime, alsoprompted the search for modeling tools that are not based on a specific modeling structure or function,butthatcanrepresentalargervarietyofmaterials,conceptuallysimilarbutwithawidespanofpossiblebehaviors. Withinthisfamilyofapproaches, areconstitutive manifold approaches.(3)Currently thereisa large amountofavailable material data formanytypes ofmaterials, so there is a need forunstructured modelingthat iscapable forassimilating thesedata,possiblyobtainedfromdiversetypesoftestingorobservationsunderdifferentconditions.(4)Therearecurrently successfulmodelorder reductiontechniques whichreduce the curseof dimensionality,allowing ustodevelop a modern version ofthe sliderule,precomputingoff-line theproblemformanypossibilities andsaving areducedrepresentationof them,soinformationcanbeeasilypassedoverandusedtorebuildspecificsolutionsatagiventime.

3.2. Constitutivemanifoldsandreducedrepresentationsformultiscaleanalysis

Oneofthemainproblemsaddressedcurrentlybydata-drivenmodelsisthemultiscaleanalysisofheterogeneous mate-rials, see[19] andthe morerecentworks[20–25].Forthecaseofsoils, multi-field,multi-scaleporoplasticitydata-driven modeling usingrecursive homogenizations and deep learning hasbeen performedby Wangand Sun[26]. Nonlinear hy-perelastic problems havebeen alsoaddressed in[27] through constitutivemaps. Here, one ofthe main purposes ofthe data-drivenprocedures istovirtually testmicrostructured solids, probablywithvery complexmicrostructures,andto de-velopaconstitutive manifold,seeFig.1.Thismanifold iscomputedoff-line, oftenthroughreducedsamplingandreduced representation(Fig.1),forexamplehyper-reduction[28,29] andProperGeneralizedDecompositions(PGD)[20,30,31].PGD approximationshavealsobeen usedcombinedwiththe LATINapproachformultiscaleanalysis[32].In[33],PGDis effi-cientlyconsideredfordata-assimilation.Onceanoff-linerepresentationisobtained,duringanalysisatthecontinuumlevel, a material behaviorrepresentation closest tothe precomputedmanifold is searched.The advantage isthe general repre-sentation ofmaterial behavior andmoderateonlinecomputational effort. Tothisregard, clustering techniqueshavebeen presentedasatoolforavoidingthecurseofdimensionality[34,35].Clusteringhasbeenusedforlongtimeindatadriven techniques,see,e.g., [36].In[37,38],seealso[39],numericallyexplicitsplinepotentials(NEXP)areproposed torepresent thehyperelasticbehaviorofmultiscalematerials,andthecoefficientsarecomputedsamplingthematerialinastrainspace. The advantage isthat an analytical differentiable function is available andconstitutive tangents forNewton schemes are easilycomputed.

Within the framework of solid mechanics, there are other applications where data-driven approaches are increas-inglypursued. Forexample,in[40],within thecontextofIntegratedComputationalMaterialsScience Engineering(ICME), reduced-orderdata-drivenmodelingisemployedforassessingthehighcyclefatigueperformanceofpolycrystallinealpha-Ti structures. Fatigueproblems arealso analyzed in [41] and alsoin[24], wherea data-driven approachis usedto identify the smallfatigue crackdriving force. A review ofdata science inmaterials science is givenin [42]. Diffusion inrandom heterogeneousmedia is studiedin[43]. Areview ofdata-driven techniques,classificationofvariables andmaterial prop-erties,andspeciallyofmachinelearninginmaterials informatics,isgivenin[44].In[45],modelreductiontechniquesare

1 Incontrasttotheoretical(fullydeterminedfromphysics,nodataneeded)andblack-boxmodels(fullydeterminedfromdata,nophysicalunderstanding

needed),a“graybox”modelisamodelemployingsomephysicsorbasicunderstanding,butwhichstillneedstobecalibratedfrom,orcompletedwith, data.

(5)

Fig. 1. Data-drivenapproachesinsolidmechanics.Macro(left):variablesandpossibleparametersaremacroscopic;data-drivenproceduresdeterminethe constitutive/designmanifoldsfromobservedmacroscopicbehavior.Micro-macro(center):amodelforthemicroscopicbehaviorisused,withmaterial parametersmeaningfulatthemicroscopicscale.Massivesimulationsareperformedtodevelopeithermicroormacroconstitutivemanifoldsasafunction ofparametersatthemicroscale.Reducedrepresentationsmaybeusedatanylevel.Macro-micro-macro(right):Minimalmicroscopicinformationisused (e.g.,thematerialstructure),assumedconstitutivelawsareavoided.Therawkinematicmicrostructuraldependenciesarepushedtothecontinuumscale, carryingthemicrostructuralvariables.Compliantmacro-microbehaviorisobtainedatthecontinuumlevel,solvingbothbehaviorsatonce.

alsoemployedforviscoplasticanalyses.Complexrelationsbetweenvariablesofprocessesandstructurerelationsinadditive manufacturingarealsoobtainedthroughamultiscale,multiphysicsapproachin[31].Adifferentapplicationisgivenin[22], where the nonlinearanisotropic electrical response of graphene/polymernanocomposites isstudied employing computa-tionalhomogenization basedonneuralnetworks.Neuralnetworkshavealsobeenusedinthecharacterizationofmaterials withaflexibleanalyticalmodelinthebackground(e.g.,[46],[47]).Thissamegroupusedgeneticalgorithmsanddata-based metamodelsoffrictionlaws(frictionresponsesurfaces)inthedesignoftirestooptimizetheirwearperformance[48],[49]. 3.3. Continuumapproachandrepresentationfunctions

As mentioned,data-drivenapproachesinsolidmechanicsentail usuallyalargenumberoffiniteelementcomputations to determinethebehaviorofthematerialforarepresentativeamountofcombinationsandloadingdomain,typicallytens [50] orhundredsofthousands[38] ofanalyses.Then,itisimportanttoselectaproperreducedsamplingandrepresentation approach [21]. Somehowin thislineisthecubic splinerepresentationforhyperelasticmaterials [51,52],which hasbeen extended toanisotropicmaterials [53,54] anddamage [55].Thistypeofprocedurehasalsobeenemployedinlargestrain nonequilibrium(strain-leveldependent)viscoelasticity[56,57].Data-drivenapproachesarealsousedinsolidmechanicsfor different purposes.For example,to be ableto determine thebehavior ofa material fromobserved deformation patterns of structures [58–60]. The idea isto fully substitute the material lawsby constitutive manifolds[61,62] preserving only the conservationlaws.Adata-drivenformulationfullyconsistent withthermodynamicswasproposed in[33,63]. The ini-tial smallstrain deterministicapproach in[58] hasalso beenextendedto noisy dataanddynamics[64], whereShannon entropy-maximizingschemes,frequentlyusedinimageprocessing[65] areemployed.Similarproblemsareaddressedinthe animation industry,whererealistic simulationsofthedeformationsofclothforrealistic visualperceptionarealsopursued [66].Ahybridapproachcombiningfirstordermodelswithdata-basedenrichmentwasaddressedin[67].

3.4. Bio-inspiredmodelsanddata-drivenmodelinginbiomechanics

Inthefieldofbiomechanics,andspeciallyincharacterizingsofttissues,cellsandtheirbehavior,data-drivenapproaches lookpromising,becauseadeepknowledgethatmaybringtraditionallaws,andevenrelationsbetweenvariables,islacking. Forexample,in[68] data-driven reduced-ordermodelsfromatomisticsimulationsare employedtodevelopamicrotubule model forcells. Theinterest indata-driven approachesforbiomechanicsishighlighted in[69]. Withinthecontext of bi-ologicalsystems,a polynomial orderheuristicalgorithm isdevelopedin[70] with thepurposeofinferring thegoverning

(6)

behaviorofdynamicalsystems.Areviewofdata-drivenmodelingofbiologicalprocesses,atdifferentscalesandfrom differ-entperspectives,isgivenin[71].Biological systemsalsoinspiredata-drivenapproachesastheso-calledartificialimmune systems (see,e.g., [72–74]).Thisis an adaptative computationalsystemwhich replicates some propertiesofthe immune systemaserrortolerance,redundancy anddiversityofsystems,distributionoftasks,dynamic learning,systemadaptation and, specially,self-monitoring.Thistechnique hasbeenused,forexample,by[75,76] for damagedetectionincomposites andinStructuralHealthMonitoring(SHM).InSHM,thecombinationofphysics-basedassessmentandofdata-driven proce-duresindamageidentificationcanbefoundin[77].Previousdata-drivenapproaches, asstochasticsubspaceidentification, werealsousedforfiniteelementmodelupdatingindamageevolutionin[78].

3.5. Physics- andstructure-baseddata-drivenapproaches

The difficulties in considering all aspects of engineering modeling and science with data-driven procedures, andthe interestintakingadvantageofthelearningsfromtheclassicalapproach,iscurrentlymotivatingamixedapproachinwhich data-drivenmodelingisguidedby somephysicalinsight[79].The purposeofdevelopingmixedapproachesisto improve thereliabilityoftheobtainedrelationsthroughfundamentalprinciples,likeconservationlaws(e.g.,energyconservationand maximumentropy).

Theusualapproaches,explainedinthepreviousparagraphs,areeithermacroormicro-macroapproaches, seeFig.1.In theformer,alltheprocedureisperformedandthecontinuumscale.Inmicro-macro(structure-based)approaches, constitu-tiverelationsaredefinedatthemicroscaleasafunctionofmaterialparameters.Inamacro-micro-macroapproach(Fig.1), themicroscaleonlydefinessomekinematicrelationsbetweenmicromechanicalvariables.Theserelationsarepushedtothe continuum scalerelatingthemto macroscopicones. Atthecontinuum level,adata-drivenmethodisapplieddetermining thebehavioratbothscales[80].

4. Data-drivenproceduresinotherengineeringfields

Apartfromthealreadymentionedengineeringapplications,data-drivenproceduresare beingusedinalargevarietyof otherengineeringfields.Inthissectionwejustsamplerepresentativeapplicationsindifferenttopics.

4.1. Industrialprocessing

Data-driven techniqueshavebeenused forsome yearsinindustrialprocessingbothformonitoringandforpredicting. A review of thesetechniques maybe found in[81]. Data-driven techniques combinedwith physically-based models are alsopresentinvirtual,digitalandhybridtwinsasreportedin[82].Data-drivenmodelingoftheproductionprocessesinthe automotiveindustrycanbefoundin[83].Intheproductionofbiofuels,createdbymicroorganisms,wherethereisaneedto engineerthemicroorganism’smetabolism,theoptimizationofthehostandthepathwaysastomaximizetheproductionof thefuelisperformedbydata-drivenapproachesin[84].Softsensorsarecomputer-basedvirtualsensorsgivinginformation aboutaprocess.Theyareused,forexample,innewautomobilestogiveremainingfuelreading,avoidingoscillationsofthe gauge.Anearlyreviewofdata-drivensoftsensorsinthechemicalproductionindustryarereviewedin[85].Toimprovethe duration ofproducts andtoavoidproblems duetostochastic variations inbatch processes,asubspace-aided data-driven approach isproposed in[86] and appliedto fed-batchpenicillinproduction.Physics-baseddata-driven modelingisbeing proposedinproductionengineeringandincontrol,e.g.,tocontrolheating/coolingsystemsinbuildings[87].

4.2. Gasandoilindustry

Of course, the oil and gas industry, as well as earth scientists and engineers, have also benefit from data-driven procedures inallaspects ofmodelingsoilbehavior, explorationandproduction,includingseismicanalysis, reservoir char-acterization,managementandproduction.Thebook[88] givesasummaryofdata-driventechniquesusedinthefield.Data mining,patternrecognitionandmachinelearningalgorithmsareusedin[89] forfullreservoirmodelingofshaleassetsfor hydrocarbonproduction.Mixedphysics-based,data-drivenapproachesarealsostartingtobeproposedinthisfield,see,e.g., [90–92].Inthecaseofearthquakeengineering, Songetal. [93] givedata-drivencomputercodes foraccurately predicting theperformanceofbuildingsfromrawdatabases.

4.3. Fluidandparticledynamics

Data-driven proceduresare beingincreasingly usedinfluid dynamics,especially toimproveaccuracyofsimulationsin turbulencewhenusingReynoldsAveragedNavierStokes(RANS)modeling.Duraisamy[94] developedadata-drivenapproach tomodelturbulentandtranslationalflows.Fromtheir data-drivenprocedure,applyinginverseproblems,theyinferredthe functionalformofdeficienciesinknownclosuremodelsandthenimprovedthemusingmachinelearningtoobtainaccurate predictions;seealso[95].Data-drivenapproaches(convolutionalnetworks)arealsousedin[96] toaccelerateNavier-Stokes simulations.Data-drivenmodelingtoimprovethepredictionoftheReynoldsstressanisotropyintheshearlayerof jet-in-crossflowsimulationsisalsoused[97].In[98],a reviewofdata-driventechniquestomodelturbulence,withemphasisin

(7)

Fig. 2. Thestepstodrug-discovery.Artificialintelligenceanddata-drivenproceduresusingmultipledatasets/databasesmaysubstantiallyacceleratethefirst steps,inwhichmoleculesareselected,combined,improvedandrefined,savingimportantfunds.

reducinguncertaintiesinRANSmodelsisgiven.Data-drivendimensionreductionproceduresappliedtodynamicalsystems, bothformodaldecompositionsandfortransferfunctions,arestudiedin[99] amongothers.Reducedorderrepresentations based on dynamic mode decomposition methods [100] combined with proper orthogonal decompositions are also em-ployedinnavalhullshapeoptimizationforimproveddragandliftproperties[101].Threedifferentmethodstolearnslosh dynamics andadvancingthesolution intime havealsobeen studiedin[102], namelyProperOrthogonal Decomposition, LocallyLinearEmbeddingandTopologicalDataAnalysis.Thesemethodsprovidedsolutionsfasterthanrealtimeinalaptop computer.

A different applicationespecially well-suited for data-driven procedures is the simulation ofcrowds. The behavior of crowdsandindividualswithina crowdhasbeenfrequentlymodeledasfluidsandparticles withinit,butitisrecognized that actual crowdsfollowcomplexlawsbecausethey reactto externalandinternal stimuli. Theworks[103,104] develop an agent-based data-driven simulation procedure to simulatecrowds in computer graphics simulations. The movements of thecrowdshavebeen obtainedfromaerial recordings,fromwhichthe behaviorofindividualsobtainedfromthose of surroundingoneswasdevised.Asimilarwork,butaimedatemergencyplans,ispresentedin[105].

4.4. Bioengineering

Bioengineeringisanotherfield inwhichdata-drivenalgorithmsmayfindextraordinaryapplications.Forexample, phys-iologicalvariablesasbloodglucoseandhormonessuchasinsulin,cortisol,epinephrine,glucagon,andtheirdynamic inter-relationsdeterminethemetabolicconditionsinpatientswithdiabetes.Data-drivenmodelsfordiagnosis,glucoseprediction ininsulin-dependentpatientsandtreatmentmanagementmaybefound,forexample,inthebookeditedbyMarmarelisand Mitsis [106].In[107], aninversedata-drivenregressionprocedureisdevelopedtocompute thecardiacelectricdiffusivity fromelectrocardiogramsignals. Also,inmedicalimaging,[108] employnon-parametricdensityestimationandedge confi-dencemapsinthesegmentation ofbrainimagesobtainedfrommagneticresonanceimaging. Ofcourse,data-driventools asanaidtomedicaldiagnosisanddecisionshavebeenusedformorethantwodecades[109].Examplesofapplicationsare [110] forbreastcancerand[111] forcoronaryheartdisease.AreviewofBigDataapplicationsinbiomedicalresearchmay befoundin[112].

5. Data-drivenchemistryanddrugdiscovery 5.1. Therelevanceofdata-drivenapproachesinchemistry

Thediscoveryofnewchemicalcompoundssuchassmallmoleculedrugs,andtheassignmentofnewapplicationlabels to existing ones are very complex processes. Usually, the lack of deterministic approaches to predict performance from structureandthecomplexityofcarryingoutdiscreteoptimizationoverchemicalgraphsresultinverycostlytrial-and-error teststoarriveataproductwiththedesiredperformance.Theprocedureofdrugdiscoverytypicallyfollowsdifferentstages (seeFig.2):(1)targetvalidation,(2)primaryandsecondaryassaydevelopment(high-throughputscreening), (3)hittolead compound,(4)leadoptimization,(5)preclinicaldrugdevelopmentand(6)clinicaldrugdevelopment[113].

Theavailabilityofcuratedlabeledandunlabeleddatainthechemicalsciencesisverylargecomparedwithsomeother physicalsciences.Sinceitsoriginsmanydecadesago,theChemicalAbstractsServicehascompiledalistofover144million knownsubstances,andabout67millionproteinandDNAsequences(www.cas.org).Morethan15,000substancesareadded each day. Thesize ofpotential chemical space, however,isoverwhelmingly largerthan ourability toexplore itby hand. Differentestimatesforthenumberofchemicallyaccessiblemoleculesrangefrom1,030to10,100.Therefore,thenumberof possiblecombinationstoexploreisverylarge.Thespeedatwhichdataisbeinggeneratedishigherthanthespeedatwhich

(8)

wecananalyzethem.Inthisregard,data-driventechniquesareimportant,andtheyareincreasinglybeingusedinguessing ornarrowingthesearchforcompoundswithgivendesiredcharacteristics[114].Thisisespeciallyimportantbecauseeven thoughthenumberofpossibletargetshasbeenincreasing,theactualnumberofnewdruglaunchesisdecreasing,whereas thecostsassociatedwiththeirdevelopmentisincreasingsteadily[115].

5.2. Data-drivenproceduresindrugdiscovery

Computationalmodelingindrugdiscovery hasbeenusedforsometimeinindustry.Becauseofthehighlycompetitive landscapeandthelargeeconomicincentives,drugdiscoveryisthestrongestdriverinthedevelopmentofcheminformatics anddata-driven toolsin chemistry, includingdata integration[116,117]. An early analysisofthe integration ofdata and knowledge indrugdiscoveryis giveninthereview paperofSearls [118].As mentionedby [119],data-driven algorithms mustbe used toidentifycompounds witha minimumnumberofliabilitiesinlead-like anddrug-likehits, sincehit lists havemorethan1,000compoundsifdrug-likehitsorleadsarenotdiscarded.High-throughputvirtualscreeningapproaches havealonghistoryindrugdiscovery,wherepredictivemachinelearningtoolsareusedtoprioritizecompoundsand iden-tifyleads ininternal databaseswithmillionsofcompounds(Fig. 2). Aninteractive data-drivenvisual analyticstechnique, namedConTour,wasdevelopedbyPartletal. [120] toenabletheanalysisofcompoundsbasedonmulti-correlateddatasets. Areviewofdata-reductiontechniquesindrugdiscoveryusingprincipalcomponentanalysis,Bayesian analysis,hierarchical clustering,similarityanalysisandprojections,canbefoundin[121] andthereinreferences.Oneofthemostappealing re-centdevelopmentsincomputer-drivendrugdiscoveryisthecombinationofsupervisedmachinelearningmodelstopredict performance withunsupervised modelsto generatenovel promisingcompounds ina fullyautomatic manner. Combining thelargenumberofunlabeledchemicalsthatsomehowcharacterizethenatureofaccessiblechemicalspace,andthe abun-dant activity labels from high-through combinatorial chemistry, automatic chemical design is closer to practice. In this line,Gómez-Bombarelli etal. [122] havedevelopedanautomaticprocedurebasedoncontinuousvector representationsof moleculesandtheuseofneuralnetworkstoperforminversedesignofmolecules.Alargenumberofrelatedworkscontinue toexploretheapplicationofdeepneuralnetworkstothistask,includingsyntaxandgrammarbasedgenerativemodelsthat canwritechemicalgraphs[123–125],deep-reinforcementlearningtools[126,127].

6. Dataqualityandstochasticprocesses

Inanydata-drivenmodelorprocess, dataquality isveryimportant,sinceerroneousorbiaseddatamayproduce erro-neousmodelsanderroneousdecisions.ArecentdramaticexamplemaybethecrashoftwoB-737-Maxairplanes.Whereas thecrashesarestillunderinvestigation,preliminaryfindingsnotethaterroneousdatafromasingleangle-of-attacksensor mayhaveproducedtheanti-stallsoftwaretotilttheairplanedown,anissuecentraltothefatalaccidents[128].Itis appar-entthatdataredundancyanddatatime-seriesanalysesmay,inmostoccasions,enhancedataqualitytoyieldbettermodels andmodelpredictions.AccordingtoISO9001:2015standard,thedefinitionandassessmentofdataqualitydependsonthe context anduseofdata,see also[129,130]. There areseveralreviewson data-qualityresearch, datacuration (correction, reorganization,maintenance,integrationandpreparationofdata),data-qualitydefinitionsandattributesindifferentfields, seeforexample[131,132],and[133] forlinkeddata.Therecentbook[134] reviewsmanydataqualityaspectsindifferent fields.

Withinthecontextofmodellearninginscienceandengineering,quantitativedataquality,asobtainedfromexperiments, seemscentraltothequalityofmodelsandpredictionsthemselves.Inthisregard,somecharacteristics(qualitydimensions) arevery important,asaccuracy, completeness,consistencyandcredibility[135,129].Thereare algorithmsfordata-quality analysisandcuration.Fortimeseries,algorithmscandetectandcorrectissueslikedatashifts,divergingpatterns,unphysical patterns,mismatchedrecordsortrends,noisepatterns,etc,seeforexample[136,137] andthereinreferences.Insomefields like Prognostic and Health Management of systems, data clustering plays a key role in differentiating multiple system conditions.Algorithmsarereviewedin[138] regardingclusteringdifferentiationandqualityenhancement;seealsotherein references.

Thequalityindescriptionofaprocessisoftenrelatedtotheunderstandingofitsstochasticnatureorthatofthevariables andobservationsinvolved,sotheapplicationofdata-drivenmethods tostochasticphenomena andstochastic processesis alsoacurrentareaofresearch.Forexample,Houetal. [139] studytheapproximationcapabilityofdeepgenerativenetworks incapturingtheposteriordistributioninBayesianinverseproblemsthroughlearningatransportmap.Raissietal. introduce in[140] theconceptofparametric Gaussian processesinordertoencode massiveamounts ofdatainasmallnumberof data points.A remarkablefeature isthat their parametric Gaussian processes quantifythe uncertaintyof thepredictions associatedwiththeprocessimperfections.Surrogatemodelsfacilitatesimplerandfasterwaysofinquiringapproximate so-lutions inthespace ofdesignvariables. Forthe caseofstochastic,high-dimensional andvariable-fidelity-sourcesystems, Yang andPerdikaris [141] presentadeeplearning probabilistic proceduretoconstructthesepredictivedata-driven surro-gates, basedoninput-output pairs,withquantified uncertainty.Other worksarethose ofSoizeandGhanem [142] which proposeaprocedureforgeneratingrealizationsofarandomvectorinanunknownsubsetoftheEuclideanspacewhichis consistentwithobservationaldataofthevector,andthatofSoizeandFarhat[143],whichpresentsafastpredictor-corrector approachforcomputingthevector-valuedhyperparameterforanovelnonparametricprobabilisticmethodwiththepurpose ofquantifyingtheuncertaintiesofthemodel-forminnonlinearcomputationalmechanics;seealsothereinreferences.

(9)

7. Conclusions

The 21st Centuryis considered the Century ofBig Data. The change ofcentury hasbrought a historic changein our society. Computers, Internet and the new digital devices are producing a large amount of data. Current computational power andcloudcomputingalsoallowforanunforeseen numberofsimulations.However, insteadofbeingoverwhelmed bysuchamountofrawinformation,wearelearninghowtotakeadvantageofthenewparadigm.Softwarecompanieshave taughtusthatmuchbenefitandkeyinformationmaybeobtainedfromdataanalytics.Then,wearelearningnewwaysof doingthings,amongthem,scienceandengineering.

Data-drivenproceduresfocusondataandtrytoextractvariablesandrelationsdirectlyfromrawdata,givingfrequently more accurate responses without the use of classical analytical laws andequations. However, many open questions re-main,andinsome occasions,drawbackshavebeenfoundasthelackoffulfillmentofsomephysicalprinciples.Then,new physics-baseddata-drivenproceduresaregettingin.

Data-drivenproceduresarealsousedtopredictwhatsciencewillbedoneinscience(afieldknownasScienceofScience [144]),whichmayberelevanttofundingagenciesandtoresearcherslookingforlongtermfunding[145].However,weare atthestart ofanewepoque inwhichdatasciencehasshownmanyremarkablesuccess cases,so data-drivenalgorithms arenotneededforpredictingthatfundingagencieswillincreasinglyinvestmoreandmorefundsindata-drivenscience. Acknowledgements

FJMacknowledgessupportfromAgenciaEstataldeInvestigación ofSpain,grantPGC-2018-097257-B-C32.JNK acknowl-edgessupportfromtheAirForceOfficeofScientificResearch(AFOSR)grantFA9550-17-1-0329.

References

[1] Wikipediateam(Addressed11/22/2018):https://en.wikipedia.org/wiki/Novum_Organum.

[2]F.Mazzocchi,Couldbigdatabetheendofthetheoryinscience?Afewremarksontheepistemology ofdata-drivenscience,EMBORep.16 (10)

(2015)1250–1255.

[3]L.Cao,Datascience:acomprehensiveoverview,ACMComput.Surv.50 (3)(2017)43.

[4]U.Fayyad,G.Piatetsky-Saphiro,P.Smyth,Fromdataminingtoknowledgediscoveryindatabases,AIMag.17 (3)(1996)37–54.

[5]T.Hey,S.Tansley,K.M.Tolle,TheFourthParadigm:Data-IntensiveScientificDiscovery,Vol. 1,MicrosoftResearch,Redmond,WA,2009.

[6]C.M.Bishop,PatternRecognitionandMachineLearning,InformationScienceandStatistics,Springer-VerlagNewYorkInc.,Secaucus,NJ,USA,2006.

[7]P.Langley,J.M.Zytkow,Data-drivenapproachestoempiricaldiscovery,Artif.Intell.40(1989)283–312.

[8]S.L.Brunton,J.L.Proctor,J.N.Kutz,Discoveringgoverningequationsfromdatabysparseidentificationofnonlineardynamicalsystems,Proc.Natl. Acad.Sci.(2016),201517384.

[9]J.-C.Loiseau,S.L.Brunton,ConstrainedsparseGalerkinregression,J.FluidMech.838(2018)42–67.

[10]M.Raissi,P.Perdikaris,G.E.Karniadakis,Physicsinformeddeeplearning(PartII):data-drivendiscoveryofnonlinearpartialdifferentialequations, arXiv:1711.10566,2017.

[11]S.H.Rudy,S.L.Brunton,J.L.Proctor,J.N.Kutz,Data-drivendiscoveryofpartialdifferentialequations,Sci.Adv.3 (4)(2017)e1602614.

[12]P.Angelikopoulos,C.Papadimitriou,P.Loumoutsakos,Datadriven,predictivemolecular dynamicsfornanoscaleflowsimulationsunderuncertainty,

J. Phys.Chem.B117 (47)(2013)14808–14816.

[13]P.E.Bourne,V.Bonazzi,M.Dunn,E.D.Green,M.Guyer,G.Komatsoulis,J.Larkin,B.Russell,TheNIHbigdatatoknowledge(BD2K)initiative,J.Amer. Med.Inform.Assoc.22 (6)(2015)1114.

[14]N.Merchant,E.Lyons,S.Goff,M.Vaughn,D.Ware,D.Mickos,P.Antin,TheiPlantcollaborative:cyberintraestructureforenablingdatatodiscovery forthelifesciences,PLoSBiol.14 (1)(2016)e1002342.

[15]A.Gaudinier,S.M.Brady,Mappingtranscriptionalnetworksinplants:data-drivendiscoveryofnovelbiologicalmechanisms,Annu.Rev.PlantBiol.67

(2016)575–594.

[16]R.Lokers,R.Knapen,S.Janssen,Y.vanRanden,J.Jansen,AnalysisofBigDatatechnologiesforuseinagro-environmentalscience,Environ.Model.

Softw.84(2016)494–504.

[17]M.Schmidt,H.Lipson,Distillingfree-formnaturallawsfromexperimentaldata,Science324 (5923)(2009)81–85.

[18]K.Kaiser,J.N.Kutz,S.L.Brunton,Discoveringconservationlawsfromdataforcontrol,in:57thIEEEConferenceonDecisionandControl,MiamiBeach,

FL,Dec17–19,2018,2018,pp. 6415–6421.

[19]B.A.Le,J.Yvonnet,Q.C.He,Computationalhomogenization ofnonlinearelasticmaterialsusingneuralnetworks,Int.J.Numer.MethodsEng.104

(2015)1061–1084.

[20]F.ElHalabi,D.González,J.A.Sanz-Herrera,M.Doblaré,APGD-basedmultiscaleformulationfornon-linearsolidmechanicsundersmalldeformations,

Comput.MethodsAppl.Mech.Eng.305(2016)806–826.

[21]F.Fritzen,O.Kunc,Two-stagedata-drivenhomogenizationfornonlinearsolidsusingareducedordermodel,Eur.J.Mech.A,Solids69(2018)201–220.

[22]X.Lu,D.G.Giovanis,J.Yvonnet,V.Papadopoulus,F.Detrez,J.Bai,Adata-drivencomputationalhomogenization methodbasedonneuralnetworksfor

thenonlinearanisotropicelectricalresponseofgraphene/polymernanocomposites,Comput.Mech.64(2019)307–321.

[23] N.H.Paulson,M.W.Priddy,D.L.McDowell,S.R.Kalidindi,Data-drivenreduced-ordermodelsforrank-orderingthehighcyclefatigueperformanceof polycrystallinemicrostructures,Mater.Des.154(2018)170–183,https://doi.org/10.1016/j.matdes.2018.05.009.

[24] A.Rovinelli,M.D.Sangid,H.Proudhon,W.Ludwig,Usingmachinelearningandadata-drivenapproachtoidentifythesmallfatiguecrackdriving forceinpolycrystallinematerials,Comput.Mech.4(2018)35,https://doi.org/10.1038/s411524-018-0094-7.

[25]W.Yan,S.Lin,O.L.Kafka,Y.Lian,C.Yu,Z.Liu,J.Yan,S.Wolff,H.Wu,E.Ndip-Agbor,M.Mozaffar,K.Ehmann,J.Cao,G.J.Wagner,W.K.Liu,Data-driven multi-scalemulti-physicsmodelstoderiveprocess-structure-propertyrelationshipsforadditivemanufacturing,Comput.Mech.61(2018)521–541.

[26]K.Wang,W.Sun,Amultiscalemulti-permeabilityporoplasticitymodellinkedbyrecursivehomogenizationsanddeeplearning,Comput.Methods

Appl.Mech.Eng.334(2018)337–380.

[27]I.Temizer,T.I.Zohdi,Anumericalmethodforhomogenizationinnon-linearelasticity,Comput.Mech.40(2007)281–298.

[28]D.Ryckelynck,Apriorihyperreductionmethod:anadaptiveapproach,J.Comput.Phys.202(2005)346–366.

(10)

[30]F.Chinesta,A.Leygue,F.Bordeu,J.V.Aguado,E.Cueto,D.Gonzalez,I.Alfaro,ParametricPGDbasedcomputationalvademecumforefficientdesign,

optimizationandcontrol,Arch.Comput.MethodsEng.20 (1)(2013)31–59.

[31]D.Neron,P.Ladeveze,Propergeneralizeddecompositionformultiscaleandmultiphysicsproblems,Arch.Comput.MethodsEng.17(2010)351–372.

[32]M.Cremonesi,P.-A.Neron,D.Guidault,P.Ladeveze,APGD-basedhomogenizationtechniquefor theresolutionofnonlinearmultiscaleproblems,

Comput.MethodsAppl.Mech.Eng.267(2013)275–292.

[33]D.González,A.Badias,I.Alfaro,F.Chinesta,E.Cueto,Modelorderreductionforreal-timedataassimilationthroughextendedKalmanfilters,Comput.

MethodsAppl.Mech.Eng.326(2017)679–693.

[34]M.A.Bessa,R.Bostanabad,Z.Liu,A.Hu,D.W.Apley,C.Brinson,W.Chen,W.K.Liu,Aframeworkfordata-drivenanalysisofmaterialsunderuncertainly:

counteringthecurseofdimensionality,Comput.MethodsAppl.Mech.Eng.320(2017)633–667.

[35]S.Tang,L.Zhang,W.K.Liu,Fromvirtualclusteringanalysistoself-consistentclusteringanalysis:amathematicalstudy,Comput.Mech.62(2018) 1143–1460.

[36]P.Ma,C.I.Castillo-Davis,W.Zhong,J.S.Liu,Adata-drivenclusteringmethodfortimecoursegeneexpressiondata,NucleicAcidsRes.34(2006) 1261–1269.

[37]J.Yvonnet,D.Gonzalez,Q.-C.He,Numericallyexplicitpotentialsforthehomogenization ofnonlinearelastichomogeneousmaterials,Comput.

Meth-odsAppl.Mech.Eng.198(2009)2723–2737.

[38]J.Yvonnet,E.Monteiro,Q.-C.He,Computationalhomogenizationmethodandreduceddatabasemodelforhyperelasticheterogeneousstructures,Int.

J. MultiscaleComput.Eng.11(2013)201–225.

[39]L.Xia,P.Breitkopf,Multiscalestructuraltopologyoptimizationwithanapproximateconstitutivemodelforlocalmaterialmicrostructure,Comput.

MethodsAppl.Mech.Eng.104(2015)1061–1084.

[40] N.H.Paulson,M.W.Priddy,D.L.McDowell,S.R.Kalidindi,Data-drivenreduced-ordermodelsforrank-orderingthehighcyclefatigueperformanceof polycrystallinemicrostructures,Mater.Des.154(2018)170–183,https://doi.org/10.1016/j.matdes.2018.05.009.

[41]D.Ryckelynck,Hyper-reductionframeworkformodelcalibrationinplasticity-inducedfatigue,Adv.Model.Simul.Eng.Sci.3 (15)(2016).

[42]S.R.Kalidindi,M.DeGraef,Materialsdatascience:currentstatusandfutureoutlook,Annu.Rev.Mater.Res.45(2015)171–193.

[43]B.Ganapathysubramanian,N.Zabaras,Modelingdiffusioninrandomheterogeneousmedia:data-drivenmodels,stochasticcollocationandthe

varia-tionalmultiscalemethod,J.Comput.Phys.226(2007)326–353.

[44] R.Ramprasad,R.Batra,G.Pilania,A.Mannodi-Kanakkithodi,C.Kim,Machinelearninginmaterialsinformatics:recentapplicationsandprospects, Comput.Mat.3(2017)54,https://doi.org/10.1038/s41524-017-0056-5.

[45]N.Relun,D.Neron,P.A.Boucard,AmodelreductiontechniquebasedonthePGDforelastic-viscoplasticcomputationalanalysis,Comput.Mech.51

(2013)83–92.

[46]C.Zopf,M.Kaliske,Numericalcharacterizationofuncuredelastomersbyaneuralnetworkbasedapproach,Comput.Struct.182(2017)504–525.

[47]I.Kopal,I.Labaj,M.Harnicarova,J.Valicek,D.Hruby,Predictionofthetensileresponseofcarbonblackfilledrubberblendsbyartificialneural

network,Polymers10 (6)(2018)644.

[48]A.Serafinska,N.Hassoun,M.Kaliske,Numericaloptimizationofwearperformance.Utilizingametamodelbasedfrictionlaw,Comput.Struct.165

(2016)10–23.

[49]W.Graf,M.Gütz,F.Leichsenring,M.Kaliske,Computationalintelligenceforefficientnumericaldesignofstructureswithuncertainparameters,in:

2015IEEESymposiumSeriesonComputationalIntelligence,2015,pp. 1824–1831.

[50]S.Bhattacherjee,K.Matous,Anonlinear manifold-basedreducedordermodelformultiscaleanalysisofheterogeneoushyperelasticmaterials,J.

Com-put.Phys.313(2016)635–653.

[51]T.Sussman,K.J.Bathe,Amodelofincompressibleisotropichyperelasticmaterialbehaviorusingsplineinterpolationsoftension-compressiondata,

Commun.Numer.MethodsEng.25 (1)(2009)53–63.

[52]J.Crespo,M.Latorre,F.J.Montáns,WYPIWYGhyperelasticityforisotropic,compressiblematerials,Comput.Mech.59 (1)(2017)73–92.

[53]J.Crespo,F.J.Montáns,Function-refreshalgorithmsfordeterminingthestoredenergydensityofnonlinearelasticorthotropicmaterialsdirectlyfrom experimentaldata,Int.J.Non-LinearMech.107(2018)16–33.

[54]J.Crespo,F.J.Montáns,Generalsolutionprocedurestocomputethestoredenergydensityofconservativesolidsdirectlyfromexperimentaldata,Int. J. Eng.Sci.141(2019)16–34.

[55]M.Miñano,F.J.Montáns,WYPiWYGdamagemechanicsforsoftmaterials:adata-drivenapproach,Arch.Comput.MethodsEng.25(2018)165–193.

[56]M.Latorre,F.J.Montáns,Fullyanisotropicfinitestrainviscoelasticitybasedonareversemultiplicativedecompositionandlogarithmicstrains,Comput. Struct.163(2016)56–70.

[57]M.Latorre,F.J.Montáns,Strain-leveldependentnonequilibriumanisotropicviscoelasticity:applicationtotheabdominalmuscle, J.Biomech.Eng.

139 (10)(2017)101007.

[58]T.Kichdoerfer,M.Ortiz,Data-drivencomputationalmechanics,Comput.MethodsAppl.Mech.Eng.304(2016)81–101.

[59] A.Leygue,M.Coret,J.Rethore,L.Stainier,E.Verron,Datadrivenconstitutiveidentification,2017,HALID:hal-01452492v2. [60]L.T.K.Nguyen,M-A.Keip,Adata-drivenapproachtononlinearelasticity,Comput.Struct.194(2018)97–115.

[61]R.Ibañez,D.Borzacchiello,J.V.Aguado,E.Abisset-Chavane,E.Cueto,P.Ladeveze,F.Chinesta,Data-drivennon-linearelasticity:constitutivemanifold

constructionandproblemdiscretization,Comput.Mech.60(2017)813–826.

[62]R.Ibañez,E.Abisset-Chavanne,J.V.Aguado,D.Gonzalez,E.Cueto,F.Chinesta,Amanifold-basedmethodologicalapproachtodata-driven

computa-tionalelasticityandinelasticity,Arch.Comput.MethodsEng.25 (1)(2018)47–57.

[63] D.González,F.Chinesta,E.Cueto, Thermodynamicallyconsistentdata-drivencomputationalmechanics,Contin. Mech.Thermodyn.31 (1)(2019) 239–253,https://doi.org/10.1007/s00161-018-0677-z.

[64]T.Kirchdoerfer,M.Ortiz,Datadrivencomputingwithnoisymaterialdatasets,Comput.MethodsAppl.Mech.Eng.326(2017)622–641.

[65]S.F.Gull,J.Skilling,Maximumentropymethodinimageprocessing,IEEProc.F,Commun.RadarSignalProcess.131(1984)646–659.

[66]H.Wang,J.F.O’Brien,R.Ramammoorthi,Data-drivenelasticmodelsforcloth:modelingandmeasurement,ACMTrans.Graph.30 (4)(2011)71.

[67] R.Ibañez,E.Abisset-Chavanne,D.Gonzalez,J.L.Duval,E.Cueto,F.Chinesta,Hybridconstitutivemodeling:data-drivenlearningofcorrectionsto plasticitymodels,Int.J.Mater.Form.12 (4)(2019)717–725,https://doi.org/10.1007/s12289-018-1448-x.

[68]Y.Feng,S.Mitran,Data-drivenreduced-ordermodelofmicrotubulemechanics,Cytoskeleton75(2018)45–60.

[69]J.Ku,J.L.Hicks,T.Hastie,J.Leskivec,C.Re,S.L.Delp,Themobilizecenter:anNIHbigdatatoknowledgecentertoadvancehumanmovementresearch

andimprovemobility,J.Amer.Med.Inform.Assoc.22 (6)(2015)1120–1125.

[70]E.Cai,P.Ranjan,P.Pan,M.Wuebbens,D.Marculescu,Efficientdata-drivenmodellearningfordynamicalsystems,in:ProceedingsoftheIntelligent

SystemsforMolecularBiologyMeeting,ISMB2016,Orlando,July8-12,2016,2016.

[71] J.Hasenauer,N.Jagiella,S.Hross,F.J.Theis,Data-drivenmodelingofbiologicalmulti-scaleprocesses,J.CoupledSyst.MultiscaleDyn.3 (2)(2015) 101–121,https://doi.org/10.1166/jcsmd.2015.1069.

[72]B.Chen,C.Zang,Artificialimmunepatternrecognitionforstructuredamageclassification,Comput.Struct.87(2013)1394–1407.

[73]S.A.Hofmeyr,S.Forrest,Architectureforanartificialimmunesystem,Evol.Comput.8 (4)(2000)443–473.

(11)

[75]M.Anaya,D.A.Tibaduiza-Burgos,F.Pozo,Abioinspiredmethodologybasedonanartificialimmunesystemfordamagedetectioninstructuralhealth

monitoring,ShockVib.2015(2014)648097.

[76]M. Anaya,D.A.Tibaduiza,F.Pozo,Detectionandclassificationofstructuralchanges usingartificial immunesystemsandfuzzyclustering,Int.J.

Bio-Inspir.Comput.9 (1)(2017)35–52.

[77]F.J.M.GuerradosSantosCavadas,StructuralHealthMonitoringofBridges:Physics-BasedAssessmentandData-DrivenDamageIdentification,Ph. D.

thesis,UniversidadedoPorto,2016.

[78]A.S.Kompalka,S.Reese,O.T.Bruhns,Experimentalinvestigationofdamageevolutionbydata-drivenstochasticsubspaceidentificationanditerative

finiteelementmodelupdating,Arch.Appl.Mech.77 (8)(2007)559–573.

[79]A.Karpatne,G.Alturi,J.H.Faghmous,M.Steinbach,A.Banerjee,A.Ganguly,S.Shekhar,N.Samatova,V.Kumar,Theory-guideddatascience:anew paradigmforscientificdiscoveryfromdata,IEEETrans.Knowl.DataEng.29(2017)2318–2331.

[80]V.J.Amores,J.M.Benitez,F.J.Montáns,Data-driven,structure-basedhyperelasticmanifolds:amacro-micro-macroapproach,arXiv:1903.11545 [cond -mat.mtrl-sci].

[81]S.Yin,S.X.Ding,X.Xie,H.Luo,Areviewonbasicdata-drivenapproachesforindustrialprocessmonitoring,IEEETrans.Ind.Electron.61 (11)(2014) 6418–6428.

[82] F.Chinesta,E.Cueto,E.Abisset,J.L.Duval,F.ElKhaldi,Virtual,digitalandhybridtwins.Anewparadigmindata-basedengineeringandengineered data,Arch.Comput.MethodsEng.(2019),https://doi.org/10.1007/s11831-018-9301-4,inpress.

[83]J.Wang,Q.Chang,G.Xiao,N.Wang,S.Li,Datadrivenproductionmodelingandsimulationofcomplexautomobilegeneralassemblyplant,Comput.

Ind.62 (7)(2011)765–775.

[84]P.P.Peralta-Yahya,F.Zhang,S.B.DelCardayre,J.D.Keasling,Microbialengineeringfortheproductionofadvancedbiofuels,Nature488 (7411)(2012) 320.

[85]P.Kadlec,B.Gabrys,S.Strandt,Data-drivensoftsensorsintheprocessindustry,Comput.Chem.Eng.33 (4)(2009)795–814.

[86]S.Yin,S.X.Ding,A.H.A.Sari,H.Hao,Data-drivenmonitoringforstochasticsystemsanditsapplicationonbatchprocess,Int.J.Syst.Sci.44 (7)(2013) 1366–1376.

[87]S.A.Vaghefi,M.A.Jafari,J.Zhu,J.Brouwer,Y.Lu,Ahybridphysics-basedanddatadrivenapproachtooptimalcontrolofbuildingcooling/heating systems,IEEETrans.Autom.Sci.Eng.13 (2)(2014)600–610.

[88]K.R.Holdaway,HarnessOilandGasBigDatawithAnalytics:OptimizeExplorationandProductionwithData-DrivenModels,Wiley,NewJersey,2014.

[89]S.Esmaili,S.D.Mohaghegh,Fullfieldreservoirmodelingofshaleassetsusingadvanceddata-drivenanalytics,Geosci.Front.7 (1)(2016)11–20.

[90]Y.Zhang,J.He,C.Yang,J.Xie,R.Fitzmorris,X-H.Wen,Aphysics-baseddata-drivenmodelforhistorymatching,prediction,andcharacterizationof unconventionalreservoirs,Soc.Pet.Eng.J.23 (4)(2018)SPE-191126-PA.

[91] Z.Guo,A.C.Reynolds,H.Zhao,Aphysics-baseddata-drivenmodelforhistorymatching,prediction,andcharacterizationofwaterfloodingperformance, Soc.Pet.Eng.J.(2018),https://doi.org/10.2118/182660-PA.

[92]T.Kaneko,R.Wada,M.Ozaki,T.Inoue,Combiningphysics-basedanddata-drivenmodelsforestimationofWOBduringultra-deepoceandrilling,in:

ASME201837thInternationalConferenceonOcean,OffshoreandArctic Engineering,Madrid,Spain,2018,PaperOMAE2018-78229,V008T11A007.

[93] I.Song,I.H.Cho,R.W.Wong,Anadvancedstatisticalapproachtodata-drivenearthquakeengineering,J.Earthq.Eng.(2018),https://doi.org/10.1080/ 13632469.2018.1461713,inpress.

[94]K.Duraisamy,Z.J.Zhang,A.P.Singh,Newapproachesinturbulenceandtransitionmodelingusingdata-driventechniques,in:53rdAIAAAerospace

SciencesMeeting,AIAASciTechForumAIAA2015-1284,2015,pp. 1–14.

[95]E.J.Parish,K.Duraisamy,Aparadigmfordata-drivenpredictivemodelingusingfieldinversionandmachinelearning,J.Comput.Phys.305(2016)

758–774.

[96]J.Tompson,K.Schlachter,P.Sprechmann,K.Perlin,Acceleratingfluidsimulationwithconvolutionalnetworks,arXiv:1607.03597,2017.

[97]J.Ling,A.Ruiz,G.Lacaze,J.Oefelein,Uncertaintyanddata-drivenmodeladvancesforajet-in-crossflow,J.Turbomach.139 (2)(2016)021008.

[98] K.Duraisamy,G.Laccarino,H.Xiao,Turbulencemodelingintheageofdata,Annu.Rev.FluidMech.51(2019)357–377,https://doi.org/10.1146/ annurev-fluid-010518-040547.

[99]S.Klus,F.Nüske,P.Koltai,H.Wu,I.Kevrekidis,C.Schötte,F.Noé,Data-drivenmodelreductionandtransferoperatorapproximation,J.NonlinearSci.

28 (3)(2018)985–1010.

[100]V.Beltran,S.LeClainche,J.M.Vega,Temporalextrapolationofquasi-periodicsolutionsviaDMD-likemethods,in:2018FluidDynamicsConference,

AIAAAviationForum(AIAA2018-3092),2018.

[101]N.Demo,M.Tezzele,G.Gustin,G.Lavini,G.Rozza,Shapeoptimizationbymeansofproperorthogonaldecompositionanddynamicmode decompo-sition,arXiv:1803.07368v2,2018.

[102]B.Moya,D.González,I.Alfaro,F.Chinesta,E.Cueto,Learningsloshdynamicsbymeansofdata,Comput.Mech.64(2019)511–523.

[103]K.H.Lee,M.G.Choi,Q.Hong,J.Lee,Groupbehaviorfromvideo:adata-drivenapproachtocrowdsimulation,in:SCA’07:Proceedingsofthe2007

ACMSIGGRAPH/EurographicsSymposiumonComputerAnimation,2007,pp. 109–118.

[104]P.Charalambous,Y.Chrysanthou,ThePAGcrowd:agraphbasedapproachforefficientdata-drivencrowdsimulation,Comput.Graph.Forum33 (8)

(2014)95–108.

[105]J.Zhang,H.Liu,Y.Li,X.Qin,S.Wang,Video-drivengroupbehaviorsimulationbasedonsocialcomparisontheory,PhysicaAStatist.Mech.Appl.512

(2018)620–634.

[106]V.Marmarelis,G.Mitsis,Data-DrivenModelingforDiabetes,LectureNotesinBioengineering,Springer-Verlag,Berlin,2014.

[107]O.Zettinig,T.Mansi,D.Neumann,B.Georgescu,S.Rapaka,P.Seegerer,E.Kayvanpour,F.Sedaghat-Hamedani,A.Amr,J.Haas,H.Steen,Data-driven estimationofcardiacelectricaldiffusivityfrom12-leadECGsignals,Med.ImageAnal.18 (8)(2014)1361–1376.

[108]J.R.Jimenez-Alanaiz,V.Medina-Banuelos,O.Yanez-Suarez,Data-drivenbrainMRIsegmentationsupported onedgeconfidenceandaprioritissue information,IEEETrans.Med.Imaging25 (1)(2006)74–83.

[109]J.P.Kassirer,Diagnosticreasoning,Ann.Intern.Med.110 (11)(1989)893–900.

[110]X.H.Wang,B.Zheng,W.F.Good,J.L.King,Y.H.Chang,Computer-assisteddiagnosisofbreastcancerusingadata-drivenBayesianbeliefnetwork,Int. J. Med.Inform.54 (2)(1999)115–126.

[111]Y.Xing,W.Wang,Z.Zhao,Combinationdataminingmethodswithnewmedicaldatatopredictingoutcomeofcoronaryheartdisease,in:

Interna-tionalConferenceonConvergenceInformationTechnology,Gyeongju,SouthKorea,21-23Nov2007,IEEE,2007.

[112]J.Luo,M.Wu,D.Gopukumar,Y.Zhao,Bigdataapplicationinbiomedicalresearchandhealthcare:aliteraturereview,Biomed.Inform.Insights8

(2016)1–10.

[113]K.H.Bleicher,H.J.Böhm,K.Müller,A.I.Alanine,Hitandleadgeneration:beyondhigh-throughputscreening,Nat.Rev.DrugDiscov.2(2003)369–378.

[114]Y.Jing,Y.Bian,Z.Hu,L.Wang,X-Q.S.Xie,Deeplearningfordrugdesign:anartificialintelligenceparadigmfordrugdiscoveryinthebigdataera, AAPSJ.20(2018)58.

[115]J.Dews,Strategictrendsinthedrugindustry,DrugDiscov.Today8(2003)411–420.

[116]N.Kumar,B.S.Hendriks,K.A.Janes,D.deGraaf,D.A.Lauffenburger,Applyingcomputationalmodelingtodrugdiscoveryanddevelopment,Drug

(12)

[117]H.P.Fisher,S.Heyse,Fromtargetstoleads:theimportanceofadvanceddataanalysisfordecisionsupportindrugdiscovery,Curr.Opin.DrugDiscov. Devel.8 (3)(2005)334–346.

[118]D.B.Searls,Dataintegration:challengesfordrugdiscovery,Nat.Rev.DrugDiscov.4(2005)45–58.

[119]T.Wunberg,M.Hendrix,A.Hillisch,M.Lobell,H.Meier,C.Schmeck,H.Wild,B.Hinzen,Improvingthehit-to-leadprocess:data-drivenassessment ofdrug-likeandlead-likescreeninghits,DrugDiscov.Today11(2006)175–180.

[120]C.Partl,A.Lex,M.Streit,H.Strobelt,A.-M.Wassermann,H.Pfister,D.Schmalstieg,ConTour:data-drivenexplorationofmulti-relationaldatasetsfor drugdiscovery,IEEETrans.Vis.Comput.Graph.20 (12)(2014)1883–1892.

[121]T.J.Howe,G.Mahieu,P.Marichal,T.Tabruyn,P.Vugts,Datareduction andrepresentationindrugdiscovery,DrugDiscov.Today12 (1–2)(2007) 45–53.

[122]R.Gómez-Bombarelli,J.N.Wei,D.Duvenaud,J.M.Hernández-Lobato,B.Sánchez-Lengeling,D.Sheberla,J.Aguilera-Iparraguirre,T.D.Hirzel,R.P.Adams, A. Aspuru-Guzik,Automaticchemicaldesignusingadata-drivencontinuousrepresentationofmolecules,ACSCent.Sci.4 (2)(2018)268–276.

[123]M.J.Kusner,B.Paige,J.M.Hernández-Lobato,Grammarvariationalautoencoder,arXiv:1703.01925 [stat],2017.

[124]W.Jin,R.Barzilay,T.Jaakkola,Junctiontreevariationalautoencoderformoleculargraphgeneration,arXiv:1802.04364,2018.

[125]H.Dai,Y.Tian,B.Dai,S.Skiena,L.Song,Syntax-directedvariationalautoencoderforstructureddata,arXiv:1802.08786,2018.

[126]K.Popova,O.Isayev,A.Tropsha,Deepreinforcementlearningfordenovodrugdesign,Sci.Adv.4 (7)(2018)eaap7885.

[127]G.L.Guimaraes,B.Sánchez-Lengeling,P.L.C.Farias,A.Aspuru-Guzik,C.Outeiral,P.L.C.Farias,A.Aspuru-Guzik,Objective-reinforcedgenerative adver-sarialnetworks(ORGAN)forsequencegenerationmodels,arXiv:1705.10843,2017.

[128]J.Nicas,J.Creswell,Boeing’s737Max:1960sDesign,1990sComputingPowerandPaperManuals,TheNewYorkTimes,NYEdition,9Apr2019,A1, 2019.

[129]J.Bicevskis,Z.Bicevska,G.Karnitis,Executabledataqualitymodels,Proc.Comput.Sci.104(2017)138–145.

[130]M.S.Marev,E.Compatangelo,W.Vasconcelos,TowardsaContext-DependentNumericalDataQualityEvaluationFramework,Technicalreport,2018, arXiv:1810.09399.

[131]R.Y.Wang,V.C.Storey,C.P.Firth,Aframeworkforanalysisofdataqualityresearch,IEEETrans.Knowl.DataEng.7 (4)(1995)623–639.

[132]J.Liu,J.Li,W.Li,J.Wu,Rethinkingbigdata:areviewonthedataqualityandusageissues,ISPRSJ.Photogramm.RemoteSens.115(2016)134–142.

[133]A.Zaveri,A.Rula,A.Maurino,R.Pietrobon,J.Lehman,S.Auer,Qualityassessmentforlinkeddata:asurvey,Semant.Web7 (1)(2019)63–93.

[134]C.Batini,M.Scannapieco,DataandInformationQuality.Dimensions,PrinciplesandTechniques,SpringerNatureSwitzerlandAG,2018.

[135]I.Kirchen, D.Schütz,J.Folmer, B.Vogel-Heuser,Metricsfor theevaluation ofdata qualityofsignaldatainindustrial processes,in:IEEE15th InternationalConferenceonIndustrialInformatics,INDIN,24-26July2017,2017.

[136]G.Pastorello,D.Agarwal,T.Samak,C.Poindexter,B.Faybishenko,D.Gunter,R.Hollowgrass,D.Papale,C.Trotta,A.Ribeca,E.Canfora,Observational datapatternsfortimeseriesdataqualityassessment,in:ProceedingsoftheIEEE10thInternationalConferenceoneScience,vol. 1,IEEEComputer Society,2014,pp. 271–277.

[137]R.Gitzel,Dataqualityintimeseries data.Anexperiencereport,in:ProceedingsofCBI2016IndustrialTrack,Paris,France,31Aug2016,2016, pp. 41–49.

[138]Y.Chen,F.Zhu,J.Lee,Dataqualityevaluationandimprovementforprognosticmodelingusingvisualassessmentbaseddatapartitioningmethod,

Comput.Ind.64(2013)214–225.

[139]T.Y.Hou,K.C.Lam,P.Zhang,S.Zhang,SolvingBayesianinverseproblemsfromtheperspectiveofdeepgenerativenetworks,Comput.Mech.64(2019) 395–408.

[140]M.Raissi,H.Babaee,G.E.Karniadakis,ParametricGaussianprocessregressionforbigdata,Comput.Mech.64(2019)409–416.

[141]Y. Yang,P.Perdikaris, Conditionaldeep surrogatemodelsfor stochastic,highdimensional,and multifidelitysystems, Comput.Mech. 64(2019) 417–434.

[142]C.Soize,R.Ghanem,Data-drivenprobabilityconcentrationandsamplingonmanifold,J.Comput.Phys.321(2016)242–258.

[143]C.Soize,C.Farhat,Probabilistic learningformmodelingand quantifyingmodel-formuncertaintiesinnonlinearcomputationalmechanics,Int.J.

Numer.MethodsEng.117(2019)819–843.

[144] S.Fortunato,C.T.Bergstrom,K.Börner,J.A.Evans,D.Helbing,S.Milojevic,A.M.Petersen,F.Radicchi,R.Sinatra,B.Uzzi,A.Vespignani,L.Waltman, D. Wang,A-L.Barabasi,Scienceofscience,Science359 (6379)(2018)eaao0185,https://doi.org/10.1126/science.aao0185.

Figure

Fig. 1. Data-driven approaches in solid mechanics. Macro (left): variables and possible parameters are macroscopic; data-driven procedures determine the constitutive/design manifolds from observed macroscopic behavior
Fig. 2. The steps to drug-discovery. Artificial intelligence and data-driven procedures using multiple datasets/databases may substantially accelerate the first steps, in which molecules are selected, combined, improved and refined, saving important funds.

Références

Documents relatifs

Classical plug and play data assimilation schemes typically utilise a pre-trained data- driven model in a data assimilation scheme. Figure 9.1 highlights the training setup of

In contrast to prior work, we use unique datasets and state- of-the-art data science methods to obtain new insight into the reasons why men and women want to study engineering, the

Using technology may be more environmentally friendly than paper materials, allow far greater autonomy, and be motivating for many – although perhaps fewer than

To do so, we leveraged the metamodeling capabilities provided by MDE environments, which allowed us to design a DEVS metamodel, but also to automate many operations

We have cloned a construction kit-like self-inactivating lenti- viral expression vector family which is compatible to state-of-the-art packaging and pseudotyping technologies

IJSLE will include the work of those at the D-Lab at MIT, Engineering Projects In Community Service at Purdue and other EPICS programs, Engineers Without Borders at

Figure 6 draws the big picture of the Execution Model implementation using the Visitor pattern by giving the example of Software Activity, Activity Edge and

To make UML4SPM process models executable, the semantics of the metamodel is implemented in terms of operations and instructions using Kermeta.. This implementation is then woven