Science Arts & Métiers (SAM)
is an open access repository that collects the work of Arts et Métiers Institute of
Technology researchers and makes it freely available over the web where possible.
This is an author-deposited version published in:
https://sam.ensam.eu
Handle ID: .
http://hdl.handle.net/10985/19480
To cite this version :
Francisco J. MONTÁNS, Francisco CHINESTA, Rafael GÓMEZBOMBARELLI, J. Nathan KUTZ
Datadriven modeling and learning in science and engineering Comptes Rendus Mécanique
-Vol. 347, n°11, p.845-855 - 2019
Any correspondence concerning this service should be sent to the repository
Data-Based Engineering Science and Technology / Sciences et technologies de l’ingénierie basées sur les données
Data-driven
modeling
and
learning
in
science
and
engineering
Francisco
J.
Montáns
a,∗
,
Francisco
Chinesta
b,
Rafael
Gómez-Bombarelli
c,
J.
Nathan Kutz
daEscuelaTécnicaSuperiordeIngenierosAeronáuticos,UniversidadPolitécnicadeMadrid,PlazaCardenalCisneros3,28045Madrid,Spain bESIGroupChair@PIMM,ArtsetMétiersParisTech,151,boulevarddel’Hôpital,75013Paris,France
cDepartmentofMaterialsScienceandEngineering,MassachusettsInstituteofTechnology,77MassachusettsAvenue,Cambridge, MA02139,USA
d DepartmentofAppliedMathematics,UniversityofWashington,Seattle,WA98195,USA
a
b
s
t
r
a
c
t
Keywords: Data-drivenscience Data-drivenmodeling Artificialintelligence Machinelearning Data-science BigdataIn thepast, data inwhichscience and engineering isbased, was scarceand frequently obtained by experiments proposed to verify a given hypothesis. Each experiment was able toyield onlyvery limiteddata. Today, data is abundant and abundantly collected in each single experiment at a very small cost. Data-driven modeling and scientific discoveryisachangeofparadigmonhowmanyproblems,bothinscienceandengineering, are addressed. Some scientific fields have been using artificial intelligence for some time due to the inherent difficulty in obtaining laws and equations to describe some phenomena.However,todaydata-drivenapproachesarealsofloodingfieldslikemechanics and materials science, where thetraditional approach seemed tobe highly satisfactory. In this paper we review the application of data-driven modeling and model learning procedurestodifferentfieldsinscienceandengineering.
1. Introduction
The walk of humankind inlife is a complex journey of learning through observation andexperiencing ofthe world, fromwhichwecollectpropertydata,sometimeseasilyquantifiable,sometimesmorequalitative.Also,fromobservation,we relateeventstothatpropertydata.Itisfromrepetitiveexperiencesfromwhichwecandeterminesomepatternsthatrelate eventstodata,andeventstoeventsthemselves.Inthecaseofsciencediscovery,thesepatternsandrelationsareformalized inlaws andequations,the dataare formalized in propertiesandvariables, andtheobservations are formalized inevent measurements,whichmaybeactionsorpropertiesthemselves.Lawsandequations,typicalinscience,allowustoperform predictionsandfacilitatethetransmissionofthelearningprocedureinaverycompactmanner,withtheminimumamount of information.However, the classical process of learning in science is a slow process which needs much observational experience,usuallyfromexpensiveproposedexperiments,astodiscoverthemainvariablesinvolvedandtheirinfluenceon eventsforaprobablyhugeamountofpossiblecombinations,missingfrequentlyunforeseenrelevantvariables.Furthermore, theclassical scientific approachishypotheses-driven andhenceisbiasedby them.Thescientific methodwas established
*
Correspondingauthor.E-mailaddresses:Fco.Montans@upm.es(F.J. Montáns),Francisco.Chinesta@ensam.eu(F. Chinesta),rafagb@mit.edu(R. Gómez-Bombarelli),kutz@uw.edu
because ofthenaturalbiasesandweaknessesofthe humanmind,includingthenaturalbiashumanshavewhen seeking metaphysicalexplanations whicharenotbasedonrealobservations([1] and[2],citingFrancisBaconinNovumOrganum Scientiarum, AD 1620). However, the classical scientific method is still biased by the deductive thinking of the human mind.Data-drivenproceduresseek,ifpossible,anunbiasedimplicitapproachtoourlearningexperiencebasedonrawdata fromrealobservations.Theseprocedures havethe additionaladvantage oftestingcorrelationsbetweendifferentvariables and observations, learning unforeseen patternsin nature and allowing us to discover newscientific lawsor even more, performingpredictionswithouttheavailabilityofsuchlaws.Wearelivingtheeraofdatascience,whichimpactsallaspects of life [3]. Data-driven procedures are givingrise to a neweconomy. Data-based science will also changeour lives and howwedoscience.Datacollection,data-mining[4] anddata-visualizationwillalsobeofparamountimportanceinscience discovery.Data-drivenscientificdiscoveryisconsideredthefourthparadigm[5],sofoundingagencieshavebeguntoinvest significantlyindata-drivenscience(TheEconomist,Briefing5/6/2017).Forexample,NationalScienceFoundationoftheUSA grantedin2014$31millioninawardstolaythegroundworkindata-science(NSFNewsRelease14-132).Thepurposeofthis briefreviewoftherelativelynovelfieldofdata-drivenmodelinginscienceandengineering, istogiveascentofdifferent approachesandapplicationsinseveralscientificandengineeringfieldsofdata-drivenmodeling.Therefore,samplingsome approachesandapplicationsisourpurpose;nointentionismadetoincludeallpossibleworks,applicationsorprocedures, butjusttogiveabigpictureofsomepathsfollowedindifferentfields.
2. Data-drivendiscoveryinscience
2.1. Computerpowerfacilitatescomputer-baseddiscovery
Theever-increasingriseofcomputationalpowerinthepastfewdecadeshasledtosignificantadvancesinstatisticaland machinelearning techniques[6].Thecollectionofalgorithmsanddataminingmethodsdevelopedasaresulthaveformed the core mathematical architecture of artificial intelligence (AI) agents, and although AI has a long history in scientific discovery [7],data-drivenapproachesinmoderncomputerscannowingestandprocessalgorithmsatscale.Thishasbeen largelyenabledbytheplummetingcostsofsensors,computationalpower,anddatastoragetechnologies.Indeed,suchvast quantitiesofdataaffordusnewopportunitiesfordata-drivendiscovery,which,asmentioned,hasbeenreferredtoasthe 4thparadigmofscience[5].
In mostapplicationfields inthe engineering, physical andbiological sciences, physicalmodels are expressedas aset ofgoverningconstitutiverelations,spatio-temporalrelations,and/ordynamicalsystems.Data-drivendiscovery inthese ap-plication areasare specificallyconstructedtodiscoverconstitutiverelationsanddifferentialgoverningequationsinawide variety offields,[8], conservationlaws[9] or propagationofnonlinear waves[10], spatio-temporaldynamics [11], devel-opmentofpredictiveproceduresformoleculardynamicsfornanoscaleflow[12],andhealthmonitoring[13].Inparticular, data-driventechniquesmaybeextremelyimportantincomplexareasoflifesciences[14],allowingforunveilingunknown biologicalmechanisms[15].Thus,thereareincreasinglyfundinginitiativestoobtainnewmethods,softwaretoolsand train-ing within the frameworkofBig Data analysisinhealth [13].In thefield ofdata-driven modelingin agro-environmental science,anoverviewmaybefoundin[16] andthereinreferences.
From the Schrödinger equationof quantummechanics toMaxwell’s equationsforelectromagneticpropagation, know-ing thegoverninglawshasallowed fortransformative technologicalimpactinsociety(e.g.,smartphones,internet,lasers, satellites). And just asNewton builtupon thework of Keplerand others,proposing the existence ofgravity in order to derive F
=
ma andexplainKepler’sellipticorbits,thediscoveryofafundamentalgoverninglawiscriticalfortechnological development,enablingunprecedentedengineeringandscientificprogress,suchassendingarockettothemoon.2.2. Algorithmsfordata-drivenmodeling,discoveryoflawsandlearningphysicalconstraints
Therecentandrapidincreaseintheavailabilityofmeasurementdataofphysicalsystemshasspurredthedevelopment of many data-driven methods for modeling and predicting dynamics. At the forefront of data-driven methods are deep neuralnetworks(DNNs).DNNsnotonlyachievesuperiorperformancefortaskssuchasimage classification,buttheyhave alsobeenshowntobeeffectiveforfuturestatepredictionofdynamicalsystems[10].AkeylimitationofDNNs,andsimilar data-driven methods, is the lack of interpretability of the resulting model: they are focused on prediction and do not providegoverningequationsorclearlyinterpretablemodelsintermsoftheoriginalvariableset.Analternativedata-driven approach usessymbolicregressiontoidentifydirectly thestructureofa nonlineardynamicalsystemfromdata[17]. This worksremarkablywellfordiscoveringinterpretablephysicalmodels,butsymbolicregressioniscomputationallyexpensive and can be difficult to scale to large problems. However, the discovery process can be reformulated in terms ofsparse regression [8], providing a computationally tractable alternative, thus leveraging the power of symbolic regression with computational tractability. Thesecontrasting techniquesshow the diversity ofstrategies that can be employed toextract meaningfulphysicsfromdata.Theyalsohighlightthefactthatmachinelearningandartificialintelligencealgorithmsmaybe capableoflearningphysicsprinciplesandconstraints[9].Usingmodernsparseregressionarchitecturesandneuralnetworks, severalcriticaltasksmaybe enactedfromdataalone:(i) thediscovery offirstprinciplesmodels,(ii)theidentificationof physical constraints and conservation laws, and(iii) improved models using known physics. A diversity of architectures
allowsonetoalsodevelopblackboxandgraybox1 modelingstrategiesforcomplexsystemswherephysicsisonlypartially known.Additionally,thearchitecturenotonlyallowsonetoimposephysicsconstraints,orbake-inphysics,butitcanalso discoverphysicalconstraints thatneed tobe baked-in,i.e.one can constrainlearning andonecan learnconstraints [18]. Thus,notonlymayparsimoniousandinterpretablephysicalmodelsbediscoveredasadirectresultofsuchstrategies,but critical insightssuch asconservationlaws andphysical constraintscould also be discovered. Theseinnovations have the potentialtodiscovergeneralizablemodelswhichcanbemodifiedtohandlemulti-scalephysics,noisysystems,andlimited data.
3. Data-drivenmodelinginmechanicalengineeringandmaterialsscience 3.1. Thechangeofparadigminsolidmechanics
Whereasdata-driven(big-data)applicationshavebeenextensivelyusedinmanyfieldsformorethanadecade,thistype ofapproachhasattractedtheattentiononlyrecentlytoresearchersinthefield ofmodelingofsolids. Oneofthereasons forthisisthattraditionally,themechanicsofsolidshasfollowedaquitesuccessfuldeterministicapproachinwhich,with relativelylittleavailableexperimentaldata,relativelymeaningfulpredictionswhereobtainedingeneral,complexsituations. Furthermore,information about thebehavior of a material has been traditionallypassed to thecommunity through the specificationoffewmaterialparametersforaspecificconstitutivemodel.However,thedata-drivenchangeofparadigmhas reachedalsothesolid mechanicscommunity.Theseare mostprobablythemain reasons.(1)Currentlythecomputational power islarge, so the analysisof nonlinearsolids isroutinely being performedin industry.This hasfostered interestin simulatingmorecomplexmaterials,whichatthesametimearegeneratedandoptimizedbymaterialscientiststoachieve somedesiredproperties.(2)Thevarietyfoundinbiologicalmaterials,includinglivingmaterials,alongwiththedifficultyof theircharacterizationbecauseofthedifferentstructurefromspecimentospecimen, fromlocationtolocation,andbecause of aging andtime, alsoprompted the search for modeling tools that are not based on a specific modeling structure or function,butthatcanrepresentalargervarietyofmaterials,conceptuallysimilarbutwithawidespanofpossiblebehaviors. Withinthisfamilyofapproaches, areconstitutive manifold approaches.(3)Currently thereisa large amountofavailable material data formanytypes ofmaterials, so there is a need forunstructured modelingthat iscapable forassimilating thesedata,possiblyobtainedfromdiversetypesoftestingorobservationsunderdifferentconditions.(4)Therearecurrently successfulmodelorder reductiontechniques whichreduce the curseof dimensionality,allowing ustodevelop a modern version ofthe sliderule,precomputingoff-line theproblemformanypossibilities andsaving areducedrepresentationof them,soinformationcanbeeasilypassedoverandusedtorebuildspecificsolutionsatagiventime.
3.2. Constitutivemanifoldsandreducedrepresentationsformultiscaleanalysis
Oneofthemainproblemsaddressedcurrentlybydata-drivenmodelsisthemultiscaleanalysisofheterogeneous mate-rials, see[19] andthe morerecentworks[20–25].Forthecaseofsoils, multi-field,multi-scaleporoplasticitydata-driven modeling usingrecursive homogenizations and deep learning hasbeen performedby Wangand Sun[26]. Nonlinear hy-perelastic problems havebeen alsoaddressed in[27] through constitutivemaps. Here, one ofthe main purposes ofthe data-drivenprocedures istovirtually testmicrostructured solids, probablywithvery complexmicrostructures,andto de-velopaconstitutive manifold,seeFig.1.Thismanifold iscomputedoff-line, oftenthroughreducedsamplingandreduced representation(Fig.1),forexamplehyper-reduction[28,29] andProperGeneralizedDecompositions(PGD)[20,30,31].PGD approximationshavealsobeen usedcombinedwiththe LATINapproachformultiscaleanalysis[32].In[33],PGDis effi-cientlyconsideredfordata-assimilation.Onceanoff-linerepresentationisobtained,duringanalysisatthecontinuumlevel, a material behaviorrepresentation closest tothe precomputedmanifold is searched.The advantage isthe general repre-sentation ofmaterial behavior andmoderateonlinecomputational effort. Tothisregard, clustering techniqueshavebeen presentedasatoolforavoidingthecurseofdimensionality[34,35].Clusteringhasbeenusedforlongtimeindatadriven techniques,see,e.g., [36].In[37,38],seealso[39],numericallyexplicitsplinepotentials(NEXP)areproposed torepresent thehyperelasticbehaviorofmultiscalematerials,andthecoefficientsarecomputedsamplingthematerialinastrainspace. The advantage isthat an analytical differentiable function is available andconstitutive tangents forNewton schemes are easilycomputed.
Within the framework of solid mechanics, there are other applications where data-driven approaches are increas-inglypursued. Forexample,in[40],within thecontextofIntegratedComputationalMaterialsScience Engineering(ICME), reduced-orderdata-drivenmodelingisemployedforassessingthehighcyclefatigueperformanceofpolycrystallinealpha-Ti structures. Fatigueproblems arealso analyzed in [41] and alsoin[24], wherea data-driven approachis usedto identify the smallfatigue crackdriving force. A review ofdata science inmaterials science is givenin [42]. Diffusion inrandom heterogeneousmedia is studiedin[43]. Areview ofdata-driven techniques,classificationofvariables andmaterial prop-erties,andspeciallyofmachinelearninginmaterials informatics,isgivenin[44].In[45],modelreductiontechniquesare
1 Incontrasttotheoretical(fullydeterminedfromphysics,nodataneeded)andblack-boxmodels(fullydeterminedfromdata,nophysicalunderstanding
needed),a“graybox”modelisamodelemployingsomephysicsorbasicunderstanding,butwhichstillneedstobecalibratedfrom,orcompletedwith, data.
Fig. 1. Data-drivenapproachesinsolidmechanics.Macro(left):variablesandpossibleparametersaremacroscopic;data-drivenproceduresdeterminethe constitutive/designmanifoldsfromobservedmacroscopicbehavior.Micro-macro(center):amodelforthemicroscopicbehaviorisused,withmaterial parametersmeaningfulatthemicroscopicscale.Massivesimulationsareperformedtodevelopeithermicroormacroconstitutivemanifoldsasafunction ofparametersatthemicroscale.Reducedrepresentationsmaybeusedatanylevel.Macro-micro-macro(right):Minimalmicroscopicinformationisused (e.g.,thematerialstructure),assumedconstitutivelawsareavoided.Therawkinematicmicrostructuraldependenciesarepushedtothecontinuumscale, carryingthemicrostructuralvariables.Compliantmacro-microbehaviorisobtainedatthecontinuumlevel,solvingbothbehaviorsatonce.
alsoemployedforviscoplasticanalyses.Complexrelationsbetweenvariablesofprocessesandstructurerelationsinadditive manufacturingarealsoobtainedthroughamultiscale,multiphysicsapproachin[31].Adifferentapplicationisgivenin[22], where the nonlinearanisotropic electrical response of graphene/polymernanocomposites isstudied employing computa-tionalhomogenization basedonneuralnetworks.Neuralnetworkshavealsobeenusedinthecharacterizationofmaterials withaflexibleanalyticalmodelinthebackground(e.g.,[46],[47]).Thissamegroupusedgeneticalgorithmsanddata-based metamodelsoffrictionlaws(frictionresponsesurfaces)inthedesignoftirestooptimizetheirwearperformance[48],[49]. 3.3. Continuumapproachandrepresentationfunctions
As mentioned,data-drivenapproachesinsolidmechanicsentail usuallyalargenumberoffiniteelementcomputations to determinethebehaviorofthematerialforarepresentativeamountofcombinationsandloadingdomain,typicallytens [50] orhundredsofthousands[38] ofanalyses.Then,itisimportanttoselectaproperreducedsamplingandrepresentation approach [21]. Somehowin thislineisthecubic splinerepresentationforhyperelasticmaterials [51,52],which hasbeen extended toanisotropicmaterials [53,54] anddamage [55].Thistypeofprocedurehasalsobeenemployedinlargestrain nonequilibrium(strain-leveldependent)viscoelasticity[56,57].Data-drivenapproachesarealsousedinsolidmechanicsfor different purposes.For example,to be ableto determine thebehavior ofa material fromobserved deformation patterns of structures [58–60]. The idea isto fully substitute the material lawsby constitutive manifolds[61,62] preserving only the conservationlaws.Adata-drivenformulationfullyconsistent withthermodynamicswasproposed in[33,63]. The ini-tial smallstrain deterministicapproach in[58] hasalso beenextendedto noisy dataanddynamics[64], whereShannon entropy-maximizingschemes,frequentlyusedinimageprocessing[65] areemployed.Similarproblemsareaddressedinthe animation industry,whererealistic simulationsofthedeformationsofclothforrealistic visualperceptionarealsopursued [66].Ahybridapproachcombiningfirstordermodelswithdata-basedenrichmentwasaddressedin[67].
3.4. Bio-inspiredmodelsanddata-drivenmodelinginbiomechanics
Inthefieldofbiomechanics,andspeciallyincharacterizingsofttissues,cellsandtheirbehavior,data-drivenapproaches lookpromising,becauseadeepknowledgethatmaybringtraditionallaws,andevenrelationsbetweenvariables,islacking. Forexample,in[68] data-driven reduced-ordermodelsfromatomisticsimulationsare employedtodevelopamicrotubule model forcells. Theinterest indata-driven approachesforbiomechanicsishighlighted in[69]. Withinthecontext of bi-ologicalsystems,a polynomial orderheuristicalgorithm isdevelopedin[70] with thepurposeofinferring thegoverning
behaviorofdynamicalsystems.Areviewofdata-drivenmodelingofbiologicalprocesses,atdifferentscalesandfrom differ-entperspectives,isgivenin[71].Biological systemsalsoinspiredata-drivenapproachesastheso-calledartificialimmune systems (see,e.g., [72–74]).Thisis an adaptative computationalsystemwhich replicates some propertiesofthe immune systemaserrortolerance,redundancy anddiversityofsystems,distributionoftasks,dynamic learning,systemadaptation and, specially,self-monitoring.Thistechnique hasbeenused,forexample,by[75,76] for damagedetectionincomposites andinStructuralHealthMonitoring(SHM).InSHM,thecombinationofphysics-basedassessmentandofdata-driven proce-duresindamageidentificationcanbefoundin[77].Previousdata-drivenapproaches, asstochasticsubspaceidentification, werealsousedforfiniteelementmodelupdatingindamageevolutionin[78].
3.5. Physics- andstructure-baseddata-drivenapproaches
The difficulties in considering all aspects of engineering modeling and science with data-driven procedures, andthe interestintakingadvantageofthelearningsfromtheclassicalapproach,iscurrentlymotivatingamixedapproachinwhich data-drivenmodelingisguidedby somephysicalinsight[79].The purposeofdevelopingmixedapproachesisto improve thereliabilityoftheobtainedrelationsthroughfundamentalprinciples,likeconservationlaws(e.g.,energyconservationand maximumentropy).
Theusualapproaches,explainedinthepreviousparagraphs,areeithermacroormicro-macroapproaches, seeFig.1.In theformer,alltheprocedureisperformedandthecontinuumscale.Inmicro-macro(structure-based)approaches, constitu-tiverelationsaredefinedatthemicroscaleasafunctionofmaterialparameters.Inamacro-micro-macroapproach(Fig.1), themicroscaleonlydefinessomekinematicrelationsbetweenmicromechanicalvariables.Theserelationsarepushedtothe continuum scalerelatingthemto macroscopicones. Atthecontinuum level,adata-drivenmethodisapplieddetermining thebehavioratbothscales[80].
4. Data-drivenproceduresinotherengineeringfields
Apartfromthealreadymentionedengineeringapplications,data-drivenproceduresare beingusedinalargevarietyof otherengineeringfields.Inthissectionwejustsamplerepresentativeapplicationsindifferenttopics.
4.1. Industrialprocessing
Data-driven techniqueshavebeenused forsome yearsinindustrialprocessingbothformonitoringandforpredicting. A review of thesetechniques maybe found in[81]. Data-driven techniques combinedwith physically-based models are alsopresentinvirtual,digitalandhybridtwinsasreportedin[82].Data-drivenmodelingoftheproductionprocessesinthe automotiveindustrycanbefoundin[83].Intheproductionofbiofuels,createdbymicroorganisms,wherethereisaneedto engineerthemicroorganism’smetabolism,theoptimizationofthehostandthepathwaysastomaximizetheproductionof thefuelisperformedbydata-drivenapproachesin[84].Softsensorsarecomputer-basedvirtualsensorsgivinginformation aboutaprocess.Theyareused,forexample,innewautomobilestogiveremainingfuelreading,avoidingoscillationsofthe gauge.Anearlyreviewofdata-drivensoftsensorsinthechemicalproductionindustryarereviewedin[85].Toimprovethe duration ofproducts andtoavoidproblems duetostochastic variations inbatch processes,asubspace-aided data-driven approach isproposed in[86] and appliedto fed-batchpenicillinproduction.Physics-baseddata-driven modelingisbeing proposedinproductionengineeringandincontrol,e.g.,tocontrolheating/coolingsystemsinbuildings[87].
4.2. Gasandoilindustry
Of course, the oil and gas industry, as well as earth scientists and engineers, have also benefit from data-driven procedures inallaspects ofmodelingsoilbehavior, explorationandproduction,includingseismicanalysis, reservoir char-acterization,managementandproduction.Thebook[88] givesasummaryofdata-driventechniquesusedinthefield.Data mining,patternrecognitionandmachinelearningalgorithmsareusedin[89] forfullreservoirmodelingofshaleassetsfor hydrocarbonproduction.Mixedphysics-based,data-drivenapproachesarealsostartingtobeproposedinthisfield,see,e.g., [90–92].Inthecaseofearthquakeengineering, Songetal. [93] givedata-drivencomputercodes foraccurately predicting theperformanceofbuildingsfromrawdatabases.
4.3. Fluidandparticledynamics
Data-driven proceduresare beingincreasingly usedinfluid dynamics,especially toimproveaccuracyofsimulationsin turbulencewhenusingReynoldsAveragedNavierStokes(RANS)modeling.Duraisamy[94] developedadata-drivenapproach tomodelturbulentandtranslationalflows.Fromtheir data-drivenprocedure,applyinginverseproblems,theyinferredthe functionalformofdeficienciesinknownclosuremodelsandthenimprovedthemusingmachinelearningtoobtainaccurate predictions;seealso[95].Data-drivenapproaches(convolutionalnetworks)arealsousedin[96] toaccelerateNavier-Stokes simulations.Data-drivenmodelingtoimprovethepredictionoftheReynoldsstressanisotropyintheshearlayerof jet-in-crossflowsimulationsisalsoused[97].In[98],a reviewofdata-driventechniquestomodelturbulence,withemphasisin
Fig. 2. Thestepstodrug-discovery.Artificialintelligenceanddata-drivenproceduresusingmultipledatasets/databasesmaysubstantiallyacceleratethefirst steps,inwhichmoleculesareselected,combined,improvedandrefined,savingimportantfunds.
reducinguncertaintiesinRANSmodelsisgiven.Data-drivendimensionreductionproceduresappliedtodynamicalsystems, bothformodaldecompositionsandfortransferfunctions,arestudiedin[99] amongothers.Reducedorderrepresentations based on dynamic mode decomposition methods [100] combined with proper orthogonal decompositions are also em-ployedinnavalhullshapeoptimizationforimproveddragandliftproperties[101].Threedifferentmethodstolearnslosh dynamics andadvancingthesolution intime havealsobeen studiedin[102], namelyProperOrthogonal Decomposition, LocallyLinearEmbeddingandTopologicalDataAnalysis.Thesemethodsprovidedsolutionsfasterthanrealtimeinalaptop computer.
A different applicationespecially well-suited for data-driven procedures is the simulation ofcrowds. The behavior of crowdsandindividualswithina crowdhasbeenfrequentlymodeledasfluidsandparticles withinit,butitisrecognized that actual crowdsfollowcomplexlawsbecausethey reactto externalandinternal stimuli. Theworks[103,104] develop an agent-based data-driven simulation procedure to simulatecrowds in computer graphics simulations. The movements of thecrowdshavebeen obtainedfromaerial recordings,fromwhichthe behaviorofindividualsobtainedfromthose of surroundingoneswasdevised.Asimilarwork,butaimedatemergencyplans,ispresentedin[105].
4.4. Bioengineering
Bioengineeringisanotherfield inwhichdata-drivenalgorithmsmayfindextraordinaryapplications.Forexample, phys-iologicalvariablesasbloodglucoseandhormonessuchasinsulin,cortisol,epinephrine,glucagon,andtheirdynamic inter-relationsdeterminethemetabolicconditionsinpatientswithdiabetes.Data-drivenmodelsfordiagnosis,glucoseprediction ininsulin-dependentpatientsandtreatmentmanagementmaybefound,forexample,inthebookeditedbyMarmarelisand Mitsis [106].In[107], aninversedata-drivenregressionprocedureisdevelopedtocompute thecardiacelectricdiffusivity fromelectrocardiogramsignals. Also,inmedicalimaging,[108] employnon-parametricdensityestimationandedge confi-dencemapsinthesegmentation ofbrainimagesobtainedfrommagneticresonanceimaging. Ofcourse,data-driventools asanaidtomedicaldiagnosisanddecisionshavebeenusedformorethantwodecades[109].Examplesofapplicationsare [110] forbreastcancerand[111] forcoronaryheartdisease.AreviewofBigDataapplicationsinbiomedicalresearchmay befoundin[112].
5. Data-drivenchemistryanddrugdiscovery 5.1. Therelevanceofdata-drivenapproachesinchemistry
Thediscoveryofnewchemicalcompoundssuchassmallmoleculedrugs,andtheassignmentofnewapplicationlabels to existing ones are very complex processes. Usually, the lack of deterministic approaches to predict performance from structureandthecomplexityofcarryingoutdiscreteoptimizationoverchemicalgraphsresultinverycostlytrial-and-error teststoarriveataproductwiththedesiredperformance.Theprocedureofdrugdiscoverytypicallyfollowsdifferentstages (seeFig.2):(1)targetvalidation,(2)primaryandsecondaryassaydevelopment(high-throughputscreening), (3)hittolead compound,(4)leadoptimization,(5)preclinicaldrugdevelopmentand(6)clinicaldrugdevelopment[113].
Theavailabilityofcuratedlabeledandunlabeleddatainthechemicalsciencesisverylargecomparedwithsomeother physicalsciences.Sinceitsoriginsmanydecadesago,theChemicalAbstractsServicehascompiledalistofover144million knownsubstances,andabout67millionproteinandDNAsequences(www.cas.org).Morethan15,000substancesareadded each day. Thesize ofpotential chemical space, however,isoverwhelmingly largerthan ourability toexplore itby hand. Differentestimatesforthenumberofchemicallyaccessiblemoleculesrangefrom1,030to10,100.Therefore,thenumberof possiblecombinationstoexploreisverylarge.Thespeedatwhichdataisbeinggeneratedishigherthanthespeedatwhich
wecananalyzethem.Inthisregard,data-driventechniquesareimportant,andtheyareincreasinglybeingusedinguessing ornarrowingthesearchforcompoundswithgivendesiredcharacteristics[114].Thisisespeciallyimportantbecauseeven thoughthenumberofpossibletargetshasbeenincreasing,theactualnumberofnewdruglaunchesisdecreasing,whereas thecostsassociatedwiththeirdevelopmentisincreasingsteadily[115].
5.2. Data-drivenproceduresindrugdiscovery
Computationalmodelingindrugdiscovery hasbeenusedforsometimeinindustry.Becauseofthehighlycompetitive landscapeandthelargeeconomicincentives,drugdiscoveryisthestrongestdriverinthedevelopmentofcheminformatics anddata-driven toolsin chemistry, includingdata integration[116,117]. An early analysisofthe integration ofdata and knowledge indrugdiscoveryis giveninthereview paperofSearls [118].As mentionedby [119],data-driven algorithms mustbe used toidentifycompounds witha minimumnumberofliabilitiesinlead-like anddrug-likehits, sincehit lists havemorethan1,000compoundsifdrug-likehitsorleadsarenotdiscarded.High-throughputvirtualscreeningapproaches havealonghistoryindrugdiscovery,wherepredictivemachinelearningtoolsareusedtoprioritizecompoundsand iden-tifyleads ininternal databaseswithmillionsofcompounds(Fig. 2). Aninteractive data-drivenvisual analyticstechnique, namedConTour,wasdevelopedbyPartletal. [120] toenabletheanalysisofcompoundsbasedonmulti-correlateddatasets. Areviewofdata-reductiontechniquesindrugdiscoveryusingprincipalcomponentanalysis,Bayesian analysis,hierarchical clustering,similarityanalysisandprojections,canbefoundin[121] andthereinreferences.Oneofthemostappealing re-centdevelopmentsincomputer-drivendrugdiscoveryisthecombinationofsupervisedmachinelearningmodelstopredict performance withunsupervised modelsto generatenovel promisingcompounds ina fullyautomatic manner. Combining thelargenumberofunlabeledchemicalsthatsomehowcharacterizethenatureofaccessiblechemicalspace,andthe abun-dant activity labels from high-through combinatorial chemistry, automatic chemical design is closer to practice. In this line,Gómez-Bombarelli etal. [122] havedevelopedanautomaticprocedurebasedoncontinuousvector representationsof moleculesandtheuseofneuralnetworkstoperforminversedesignofmolecules.Alargenumberofrelatedworkscontinue toexploretheapplicationofdeepneuralnetworkstothistask,includingsyntaxandgrammarbasedgenerativemodelsthat canwritechemicalgraphs[123–125],deep-reinforcementlearningtools[126,127].
6. Dataqualityandstochasticprocesses
Inanydata-drivenmodelorprocess, dataquality isveryimportant,sinceerroneousorbiaseddatamayproduce erro-neousmodelsanderroneousdecisions.ArecentdramaticexamplemaybethecrashoftwoB-737-Maxairplanes.Whereas thecrashesarestillunderinvestigation,preliminaryfindingsnotethaterroneousdatafromasingleangle-of-attacksensor mayhaveproducedtheanti-stallsoftwaretotilttheairplanedown,anissuecentraltothefatalaccidents[128].Itis appar-entthatdataredundancyanddatatime-seriesanalysesmay,inmostoccasions,enhancedataqualitytoyieldbettermodels andmodelpredictions.AccordingtoISO9001:2015standard,thedefinitionandassessmentofdataqualitydependsonthe context anduseofdata,see also[129,130]. There areseveralreviewson data-qualityresearch, datacuration (correction, reorganization,maintenance,integrationandpreparationofdata),data-qualitydefinitionsandattributesindifferentfields, seeforexample[131,132],and[133] forlinkeddata.Therecentbook[134] reviewsmanydataqualityaspectsindifferent fields.
Withinthecontextofmodellearninginscienceandengineering,quantitativedataquality,asobtainedfromexperiments, seemscentraltothequalityofmodelsandpredictionsthemselves.Inthisregard,somecharacteristics(qualitydimensions) arevery important,asaccuracy, completeness,consistencyandcredibility[135,129].Thereare algorithmsfordata-quality analysisandcuration.Fortimeseries,algorithmscandetectandcorrectissueslikedatashifts,divergingpatterns,unphysical patterns,mismatchedrecordsortrends,noisepatterns,etc,seeforexample[136,137] andthereinreferences.Insomefields like Prognostic and Health Management of systems, data clustering plays a key role in differentiating multiple system conditions.Algorithmsarereviewedin[138] regardingclusteringdifferentiationandqualityenhancement;seealsotherein references.
Thequalityindescriptionofaprocessisoftenrelatedtotheunderstandingofitsstochasticnatureorthatofthevariables andobservationsinvolved,sotheapplicationofdata-drivenmethods tostochasticphenomena andstochastic processesis alsoacurrentareaofresearch.Forexample,Houetal. [139] studytheapproximationcapabilityofdeepgenerativenetworks incapturingtheposteriordistributioninBayesianinverseproblemsthroughlearningatransportmap.Raissietal. introduce in[140] theconceptofparametric Gaussian processesinordertoencode massiveamounts ofdatainasmallnumberof data points.A remarkablefeature isthat their parametric Gaussian processes quantifythe uncertaintyof thepredictions associatedwiththeprocessimperfections.Surrogatemodelsfacilitatesimplerandfasterwaysofinquiringapproximate so-lutions inthespace ofdesignvariables. Forthe caseofstochastic,high-dimensional andvariable-fidelity-sourcesystems, Yang andPerdikaris [141] presentadeeplearning probabilistic proceduretoconstructthesepredictivedata-driven surro-gates, basedoninput-output pairs,withquantified uncertainty.Other worksarethose ofSoizeandGhanem [142] which proposeaprocedureforgeneratingrealizationsofarandomvectorinanunknownsubsetoftheEuclideanspacewhichis consistentwithobservationaldataofthevector,andthatofSoizeandFarhat[143],whichpresentsafastpredictor-corrector approachforcomputingthevector-valuedhyperparameterforanovelnonparametricprobabilisticmethodwiththepurpose ofquantifyingtheuncertaintiesofthemodel-forminnonlinearcomputationalmechanics;seealsothereinreferences.
7. Conclusions
The 21st Centuryis considered the Century ofBig Data. The change ofcentury hasbrought a historic changein our society. Computers, Internet and the new digital devices are producing a large amount of data. Current computational power andcloudcomputingalsoallowforanunforeseen numberofsimulations.However, insteadofbeingoverwhelmed bysuchamountofrawinformation,wearelearninghowtotakeadvantageofthenewparadigm.Softwarecompanieshave taughtusthatmuchbenefitandkeyinformationmaybeobtainedfromdataanalytics.Then,wearelearningnewwaysof doingthings,amongthem,scienceandengineering.
Data-drivenproceduresfocusondataandtrytoextractvariablesandrelationsdirectlyfromrawdata,givingfrequently more accurate responses without the use of classical analytical laws andequations. However, many open questions re-main,andinsome occasions,drawbackshavebeenfoundasthelackoffulfillmentofsomephysicalprinciples.Then,new physics-baseddata-drivenproceduresaregettingin.
Data-drivenproceduresarealsousedtopredictwhatsciencewillbedoneinscience(afieldknownasScienceofScience [144]),whichmayberelevanttofundingagenciesandtoresearcherslookingforlongtermfunding[145].However,weare atthestart ofanewepoque inwhichdatasciencehasshownmanyremarkablesuccess cases,so data-drivenalgorithms arenotneededforpredictingthatfundingagencieswillincreasinglyinvestmoreandmorefundsindata-drivenscience. Acknowledgements
FJMacknowledgessupportfromAgenciaEstataldeInvestigación ofSpain,grantPGC-2018-097257-B-C32.JNK acknowl-edgessupportfromtheAirForceOfficeofScientificResearch(AFOSR)grantFA9550-17-1-0329.
References
[1] Wikipediateam(Addressed11/22/2018):https://en.wikipedia.org/wiki/Novum_Organum.
[2]F.Mazzocchi,Couldbigdatabetheendofthetheoryinscience?Afewremarksontheepistemology ofdata-drivenscience,EMBORep.16 (10)
(2015)1250–1255.
[3]L.Cao,Datascience:acomprehensiveoverview,ACMComput.Surv.50 (3)(2017)43.
[4]U.Fayyad,G.Piatetsky-Saphiro,P.Smyth,Fromdataminingtoknowledgediscoveryindatabases,AIMag.17 (3)(1996)37–54.
[5]T.Hey,S.Tansley,K.M.Tolle,TheFourthParadigm:Data-IntensiveScientificDiscovery,Vol. 1,MicrosoftResearch,Redmond,WA,2009.
[6]C.M.Bishop,PatternRecognitionandMachineLearning,InformationScienceandStatistics,Springer-VerlagNewYorkInc.,Secaucus,NJ,USA,2006.
[7]P.Langley,J.M.Zytkow,Data-drivenapproachestoempiricaldiscovery,Artif.Intell.40(1989)283–312.
[8]S.L.Brunton,J.L.Proctor,J.N.Kutz,Discoveringgoverningequationsfromdatabysparseidentificationofnonlineardynamicalsystems,Proc.Natl. Acad.Sci.(2016),201517384.
[9]J.-C.Loiseau,S.L.Brunton,ConstrainedsparseGalerkinregression,J.FluidMech.838(2018)42–67.
[10]M.Raissi,P.Perdikaris,G.E.Karniadakis,Physicsinformeddeeplearning(PartII):data-drivendiscoveryofnonlinearpartialdifferentialequations, arXiv:1711.10566,2017.
[11]S.H.Rudy,S.L.Brunton,J.L.Proctor,J.N.Kutz,Data-drivendiscoveryofpartialdifferentialequations,Sci.Adv.3 (4)(2017)e1602614.
[12]P.Angelikopoulos,C.Papadimitriou,P.Loumoutsakos,Datadriven,predictivemolecular dynamicsfornanoscaleflowsimulationsunderuncertainty,
J. Phys.Chem.B117 (47)(2013)14808–14816.
[13]P.E.Bourne,V.Bonazzi,M.Dunn,E.D.Green,M.Guyer,G.Komatsoulis,J.Larkin,B.Russell,TheNIHbigdatatoknowledge(BD2K)initiative,J.Amer. Med.Inform.Assoc.22 (6)(2015)1114.
[14]N.Merchant,E.Lyons,S.Goff,M.Vaughn,D.Ware,D.Mickos,P.Antin,TheiPlantcollaborative:cyberintraestructureforenablingdatatodiscovery forthelifesciences,PLoSBiol.14 (1)(2016)e1002342.
[15]A.Gaudinier,S.M.Brady,Mappingtranscriptionalnetworksinplants:data-drivendiscoveryofnovelbiologicalmechanisms,Annu.Rev.PlantBiol.67
(2016)575–594.
[16]R.Lokers,R.Knapen,S.Janssen,Y.vanRanden,J.Jansen,AnalysisofBigDatatechnologiesforuseinagro-environmentalscience,Environ.Model.
Softw.84(2016)494–504.
[17]M.Schmidt,H.Lipson,Distillingfree-formnaturallawsfromexperimentaldata,Science324 (5923)(2009)81–85.
[18]K.Kaiser,J.N.Kutz,S.L.Brunton,Discoveringconservationlawsfromdataforcontrol,in:57thIEEEConferenceonDecisionandControl,MiamiBeach,
FL,Dec17–19,2018,2018,pp. 6415–6421.
[19]B.A.Le,J.Yvonnet,Q.C.He,Computationalhomogenization ofnonlinearelasticmaterialsusingneuralnetworks,Int.J.Numer.MethodsEng.104
(2015)1061–1084.
[20]F.ElHalabi,D.González,J.A.Sanz-Herrera,M.Doblaré,APGD-basedmultiscaleformulationfornon-linearsolidmechanicsundersmalldeformations,
Comput.MethodsAppl.Mech.Eng.305(2016)806–826.
[21]F.Fritzen,O.Kunc,Two-stagedata-drivenhomogenizationfornonlinearsolidsusingareducedordermodel,Eur.J.Mech.A,Solids69(2018)201–220.
[22]X.Lu,D.G.Giovanis,J.Yvonnet,V.Papadopoulus,F.Detrez,J.Bai,Adata-drivencomputationalhomogenization methodbasedonneuralnetworksfor
thenonlinearanisotropicelectricalresponseofgraphene/polymernanocomposites,Comput.Mech.64(2019)307–321.
[23] N.H.Paulson,M.W.Priddy,D.L.McDowell,S.R.Kalidindi,Data-drivenreduced-ordermodelsforrank-orderingthehighcyclefatigueperformanceof polycrystallinemicrostructures,Mater.Des.154(2018)170–183,https://doi.org/10.1016/j.matdes.2018.05.009.
[24] A.Rovinelli,M.D.Sangid,H.Proudhon,W.Ludwig,Usingmachinelearningandadata-drivenapproachtoidentifythesmallfatiguecrackdriving forceinpolycrystallinematerials,Comput.Mech.4(2018)35,https://doi.org/10.1038/s411524-018-0094-7.
[25]W.Yan,S.Lin,O.L.Kafka,Y.Lian,C.Yu,Z.Liu,J.Yan,S.Wolff,H.Wu,E.Ndip-Agbor,M.Mozaffar,K.Ehmann,J.Cao,G.J.Wagner,W.K.Liu,Data-driven multi-scalemulti-physicsmodelstoderiveprocess-structure-propertyrelationshipsforadditivemanufacturing,Comput.Mech.61(2018)521–541.
[26]K.Wang,W.Sun,Amultiscalemulti-permeabilityporoplasticitymodellinkedbyrecursivehomogenizationsanddeeplearning,Comput.Methods
Appl.Mech.Eng.334(2018)337–380.
[27]I.Temizer,T.I.Zohdi,Anumericalmethodforhomogenizationinnon-linearelasticity,Comput.Mech.40(2007)281–298.
[28]D.Ryckelynck,Apriorihyperreductionmethod:anadaptiveapproach,J.Comput.Phys.202(2005)346–366.
[30]F.Chinesta,A.Leygue,F.Bordeu,J.V.Aguado,E.Cueto,D.Gonzalez,I.Alfaro,ParametricPGDbasedcomputationalvademecumforefficientdesign,
optimizationandcontrol,Arch.Comput.MethodsEng.20 (1)(2013)31–59.
[31]D.Neron,P.Ladeveze,Propergeneralizeddecompositionformultiscaleandmultiphysicsproblems,Arch.Comput.MethodsEng.17(2010)351–372.
[32]M.Cremonesi,P.-A.Neron,D.Guidault,P.Ladeveze,APGD-basedhomogenizationtechniquefor theresolutionofnonlinearmultiscaleproblems,
Comput.MethodsAppl.Mech.Eng.267(2013)275–292.
[33]D.González,A.Badias,I.Alfaro,F.Chinesta,E.Cueto,Modelorderreductionforreal-timedataassimilationthroughextendedKalmanfilters,Comput.
MethodsAppl.Mech.Eng.326(2017)679–693.
[34]M.A.Bessa,R.Bostanabad,Z.Liu,A.Hu,D.W.Apley,C.Brinson,W.Chen,W.K.Liu,Aframeworkfordata-drivenanalysisofmaterialsunderuncertainly:
counteringthecurseofdimensionality,Comput.MethodsAppl.Mech.Eng.320(2017)633–667.
[35]S.Tang,L.Zhang,W.K.Liu,Fromvirtualclusteringanalysistoself-consistentclusteringanalysis:amathematicalstudy,Comput.Mech.62(2018) 1143–1460.
[36]P.Ma,C.I.Castillo-Davis,W.Zhong,J.S.Liu,Adata-drivenclusteringmethodfortimecoursegeneexpressiondata,NucleicAcidsRes.34(2006) 1261–1269.
[37]J.Yvonnet,D.Gonzalez,Q.-C.He,Numericallyexplicitpotentialsforthehomogenization ofnonlinearelastichomogeneousmaterials,Comput.
Meth-odsAppl.Mech.Eng.198(2009)2723–2737.
[38]J.Yvonnet,E.Monteiro,Q.-C.He,Computationalhomogenizationmethodandreduceddatabasemodelforhyperelasticheterogeneousstructures,Int.
J. MultiscaleComput.Eng.11(2013)201–225.
[39]L.Xia,P.Breitkopf,Multiscalestructuraltopologyoptimizationwithanapproximateconstitutivemodelforlocalmaterialmicrostructure,Comput.
MethodsAppl.Mech.Eng.104(2015)1061–1084.
[40] N.H.Paulson,M.W.Priddy,D.L.McDowell,S.R.Kalidindi,Data-drivenreduced-ordermodelsforrank-orderingthehighcyclefatigueperformanceof polycrystallinemicrostructures,Mater.Des.154(2018)170–183,https://doi.org/10.1016/j.matdes.2018.05.009.
[41]D.Ryckelynck,Hyper-reductionframeworkformodelcalibrationinplasticity-inducedfatigue,Adv.Model.Simul.Eng.Sci.3 (15)(2016).
[42]S.R.Kalidindi,M.DeGraef,Materialsdatascience:currentstatusandfutureoutlook,Annu.Rev.Mater.Res.45(2015)171–193.
[43]B.Ganapathysubramanian,N.Zabaras,Modelingdiffusioninrandomheterogeneousmedia:data-drivenmodels,stochasticcollocationandthe
varia-tionalmultiscalemethod,J.Comput.Phys.226(2007)326–353.
[44] R.Ramprasad,R.Batra,G.Pilania,A.Mannodi-Kanakkithodi,C.Kim,Machinelearninginmaterialsinformatics:recentapplicationsandprospects, Comput.Mat.3(2017)54,https://doi.org/10.1038/s41524-017-0056-5.
[45]N.Relun,D.Neron,P.A.Boucard,AmodelreductiontechniquebasedonthePGDforelastic-viscoplasticcomputationalanalysis,Comput.Mech.51
(2013)83–92.
[46]C.Zopf,M.Kaliske,Numericalcharacterizationofuncuredelastomersbyaneuralnetworkbasedapproach,Comput.Struct.182(2017)504–525.
[47]I.Kopal,I.Labaj,M.Harnicarova,J.Valicek,D.Hruby,Predictionofthetensileresponseofcarbonblackfilledrubberblendsbyartificialneural
network,Polymers10 (6)(2018)644.
[48]A.Serafinska,N.Hassoun,M.Kaliske,Numericaloptimizationofwearperformance.Utilizingametamodelbasedfrictionlaw,Comput.Struct.165
(2016)10–23.
[49]W.Graf,M.Gütz,F.Leichsenring,M.Kaliske,Computationalintelligenceforefficientnumericaldesignofstructureswithuncertainparameters,in:
2015IEEESymposiumSeriesonComputationalIntelligence,2015,pp. 1824–1831.
[50]S.Bhattacherjee,K.Matous,Anonlinear manifold-basedreducedordermodelformultiscaleanalysisofheterogeneoushyperelasticmaterials,J.
Com-put.Phys.313(2016)635–653.
[51]T.Sussman,K.J.Bathe,Amodelofincompressibleisotropichyperelasticmaterialbehaviorusingsplineinterpolationsoftension-compressiondata,
Commun.Numer.MethodsEng.25 (1)(2009)53–63.
[52]J.Crespo,M.Latorre,F.J.Montáns,WYPIWYGhyperelasticityforisotropic,compressiblematerials,Comput.Mech.59 (1)(2017)73–92.
[53]J.Crespo,F.J.Montáns,Function-refreshalgorithmsfordeterminingthestoredenergydensityofnonlinearelasticorthotropicmaterialsdirectlyfrom experimentaldata,Int.J.Non-LinearMech.107(2018)16–33.
[54]J.Crespo,F.J.Montáns,Generalsolutionprocedurestocomputethestoredenergydensityofconservativesolidsdirectlyfromexperimentaldata,Int. J. Eng.Sci.141(2019)16–34.
[55]M.Miñano,F.J.Montáns,WYPiWYGdamagemechanicsforsoftmaterials:adata-drivenapproach,Arch.Comput.MethodsEng.25(2018)165–193.
[56]M.Latorre,F.J.Montáns,Fullyanisotropicfinitestrainviscoelasticitybasedonareversemultiplicativedecompositionandlogarithmicstrains,Comput. Struct.163(2016)56–70.
[57]M.Latorre,F.J.Montáns,Strain-leveldependentnonequilibriumanisotropicviscoelasticity:applicationtotheabdominalmuscle, J.Biomech.Eng.
139 (10)(2017)101007.
[58]T.Kichdoerfer,M.Ortiz,Data-drivencomputationalmechanics,Comput.MethodsAppl.Mech.Eng.304(2016)81–101.
[59] A.Leygue,M.Coret,J.Rethore,L.Stainier,E.Verron,Datadrivenconstitutiveidentification,2017,HALID:hal-01452492v2. [60]L.T.K.Nguyen,M-A.Keip,Adata-drivenapproachtononlinearelasticity,Comput.Struct.194(2018)97–115.
[61]R.Ibañez,D.Borzacchiello,J.V.Aguado,E.Abisset-Chavane,E.Cueto,P.Ladeveze,F.Chinesta,Data-drivennon-linearelasticity:constitutivemanifold
constructionandproblemdiscretization,Comput.Mech.60(2017)813–826.
[62]R.Ibañez,E.Abisset-Chavanne,J.V.Aguado,D.Gonzalez,E.Cueto,F.Chinesta,Amanifold-basedmethodologicalapproachtodata-driven
computa-tionalelasticityandinelasticity,Arch.Comput.MethodsEng.25 (1)(2018)47–57.
[63] D.González,F.Chinesta,E.Cueto, Thermodynamicallyconsistentdata-drivencomputationalmechanics,Contin. Mech.Thermodyn.31 (1)(2019) 239–253,https://doi.org/10.1007/s00161-018-0677-z.
[64]T.Kirchdoerfer,M.Ortiz,Datadrivencomputingwithnoisymaterialdatasets,Comput.MethodsAppl.Mech.Eng.326(2017)622–641.
[65]S.F.Gull,J.Skilling,Maximumentropymethodinimageprocessing,IEEProc.F,Commun.RadarSignalProcess.131(1984)646–659.
[66]H.Wang,J.F.O’Brien,R.Ramammoorthi,Data-drivenelasticmodelsforcloth:modelingandmeasurement,ACMTrans.Graph.30 (4)(2011)71.
[67] R.Ibañez,E.Abisset-Chavanne,D.Gonzalez,J.L.Duval,E.Cueto,F.Chinesta,Hybridconstitutivemodeling:data-drivenlearningofcorrectionsto plasticitymodels,Int.J.Mater.Form.12 (4)(2019)717–725,https://doi.org/10.1007/s12289-018-1448-x.
[68]Y.Feng,S.Mitran,Data-drivenreduced-ordermodelofmicrotubulemechanics,Cytoskeleton75(2018)45–60.
[69]J.Ku,J.L.Hicks,T.Hastie,J.Leskivec,C.Re,S.L.Delp,Themobilizecenter:anNIHbigdatatoknowledgecentertoadvancehumanmovementresearch
andimprovemobility,J.Amer.Med.Inform.Assoc.22 (6)(2015)1120–1125.
[70]E.Cai,P.Ranjan,P.Pan,M.Wuebbens,D.Marculescu,Efficientdata-drivenmodellearningfordynamicalsystems,in:ProceedingsoftheIntelligent
SystemsforMolecularBiologyMeeting,ISMB2016,Orlando,July8-12,2016,2016.
[71] J.Hasenauer,N.Jagiella,S.Hross,F.J.Theis,Data-drivenmodelingofbiologicalmulti-scaleprocesses,J.CoupledSyst.MultiscaleDyn.3 (2)(2015) 101–121,https://doi.org/10.1166/jcsmd.2015.1069.
[72]B.Chen,C.Zang,Artificialimmunepatternrecognitionforstructuredamageclassification,Comput.Struct.87(2013)1394–1407.
[73]S.A.Hofmeyr,S.Forrest,Architectureforanartificialimmunesystem,Evol.Comput.8 (4)(2000)443–473.
[75]M.Anaya,D.A.Tibaduiza-Burgos,F.Pozo,Abioinspiredmethodologybasedonanartificialimmunesystemfordamagedetectioninstructuralhealth
monitoring,ShockVib.2015(2014)648097.
[76]M. Anaya,D.A.Tibaduiza,F.Pozo,Detectionandclassificationofstructuralchanges usingartificial immunesystemsandfuzzyclustering,Int.J.
Bio-Inspir.Comput.9 (1)(2017)35–52.
[77]F.J.M.GuerradosSantosCavadas,StructuralHealthMonitoringofBridges:Physics-BasedAssessmentandData-DrivenDamageIdentification,Ph. D.
thesis,UniversidadedoPorto,2016.
[78]A.S.Kompalka,S.Reese,O.T.Bruhns,Experimentalinvestigationofdamageevolutionbydata-drivenstochasticsubspaceidentificationanditerative
finiteelementmodelupdating,Arch.Appl.Mech.77 (8)(2007)559–573.
[79]A.Karpatne,G.Alturi,J.H.Faghmous,M.Steinbach,A.Banerjee,A.Ganguly,S.Shekhar,N.Samatova,V.Kumar,Theory-guideddatascience:anew paradigmforscientificdiscoveryfromdata,IEEETrans.Knowl.DataEng.29(2017)2318–2331.
[80]V.J.Amores,J.M.Benitez,F.J.Montáns,Data-driven,structure-basedhyperelasticmanifolds:amacro-micro-macroapproach,arXiv:1903.11545 [cond -mat.mtrl-sci].
[81]S.Yin,S.X.Ding,X.Xie,H.Luo,Areviewonbasicdata-drivenapproachesforindustrialprocessmonitoring,IEEETrans.Ind.Electron.61 (11)(2014) 6418–6428.
[82] F.Chinesta,E.Cueto,E.Abisset,J.L.Duval,F.ElKhaldi,Virtual,digitalandhybridtwins.Anewparadigmindata-basedengineeringandengineered data,Arch.Comput.MethodsEng.(2019),https://doi.org/10.1007/s11831-018-9301-4,inpress.
[83]J.Wang,Q.Chang,G.Xiao,N.Wang,S.Li,Datadrivenproductionmodelingandsimulationofcomplexautomobilegeneralassemblyplant,Comput.
Ind.62 (7)(2011)765–775.
[84]P.P.Peralta-Yahya,F.Zhang,S.B.DelCardayre,J.D.Keasling,Microbialengineeringfortheproductionofadvancedbiofuels,Nature488 (7411)(2012) 320.
[85]P.Kadlec,B.Gabrys,S.Strandt,Data-drivensoftsensorsintheprocessindustry,Comput.Chem.Eng.33 (4)(2009)795–814.
[86]S.Yin,S.X.Ding,A.H.A.Sari,H.Hao,Data-drivenmonitoringforstochasticsystemsanditsapplicationonbatchprocess,Int.J.Syst.Sci.44 (7)(2013) 1366–1376.
[87]S.A.Vaghefi,M.A.Jafari,J.Zhu,J.Brouwer,Y.Lu,Ahybridphysics-basedanddatadrivenapproachtooptimalcontrolofbuildingcooling/heating systems,IEEETrans.Autom.Sci.Eng.13 (2)(2014)600–610.
[88]K.R.Holdaway,HarnessOilandGasBigDatawithAnalytics:OptimizeExplorationandProductionwithData-DrivenModels,Wiley,NewJersey,2014.
[89]S.Esmaili,S.D.Mohaghegh,Fullfieldreservoirmodelingofshaleassetsusingadvanceddata-drivenanalytics,Geosci.Front.7 (1)(2016)11–20.
[90]Y.Zhang,J.He,C.Yang,J.Xie,R.Fitzmorris,X-H.Wen,Aphysics-baseddata-drivenmodelforhistorymatching,prediction,andcharacterizationof unconventionalreservoirs,Soc.Pet.Eng.J.23 (4)(2018)SPE-191126-PA.
[91] Z.Guo,A.C.Reynolds,H.Zhao,Aphysics-baseddata-drivenmodelforhistorymatching,prediction,andcharacterizationofwaterfloodingperformance, Soc.Pet.Eng.J.(2018),https://doi.org/10.2118/182660-PA.
[92]T.Kaneko,R.Wada,M.Ozaki,T.Inoue,Combiningphysics-basedanddata-drivenmodelsforestimationofWOBduringultra-deepoceandrilling,in:
ASME201837thInternationalConferenceonOcean,OffshoreandArctic Engineering,Madrid,Spain,2018,PaperOMAE2018-78229,V008T11A007.
[93] I.Song,I.H.Cho,R.W.Wong,Anadvancedstatisticalapproachtodata-drivenearthquakeengineering,J.Earthq.Eng.(2018),https://doi.org/10.1080/ 13632469.2018.1461713,inpress.
[94]K.Duraisamy,Z.J.Zhang,A.P.Singh,Newapproachesinturbulenceandtransitionmodelingusingdata-driventechniques,in:53rdAIAAAerospace
SciencesMeeting,AIAASciTechForumAIAA2015-1284,2015,pp. 1–14.
[95]E.J.Parish,K.Duraisamy,Aparadigmfordata-drivenpredictivemodelingusingfieldinversionandmachinelearning,J.Comput.Phys.305(2016)
758–774.
[96]J.Tompson,K.Schlachter,P.Sprechmann,K.Perlin,Acceleratingfluidsimulationwithconvolutionalnetworks,arXiv:1607.03597,2017.
[97]J.Ling,A.Ruiz,G.Lacaze,J.Oefelein,Uncertaintyanddata-drivenmodeladvancesforajet-in-crossflow,J.Turbomach.139 (2)(2016)021008.
[98] K.Duraisamy,G.Laccarino,H.Xiao,Turbulencemodelingintheageofdata,Annu.Rev.FluidMech.51(2019)357–377,https://doi.org/10.1146/ annurev-fluid-010518-040547.
[99]S.Klus,F.Nüske,P.Koltai,H.Wu,I.Kevrekidis,C.Schötte,F.Noé,Data-drivenmodelreductionandtransferoperatorapproximation,J.NonlinearSci.
28 (3)(2018)985–1010.
[100]V.Beltran,S.LeClainche,J.M.Vega,Temporalextrapolationofquasi-periodicsolutionsviaDMD-likemethods,in:2018FluidDynamicsConference,
AIAAAviationForum(AIAA2018-3092),2018.
[101]N.Demo,M.Tezzele,G.Gustin,G.Lavini,G.Rozza,Shapeoptimizationbymeansofproperorthogonaldecompositionanddynamicmode decompo-sition,arXiv:1803.07368v2,2018.
[102]B.Moya,D.González,I.Alfaro,F.Chinesta,E.Cueto,Learningsloshdynamicsbymeansofdata,Comput.Mech.64(2019)511–523.
[103]K.H.Lee,M.G.Choi,Q.Hong,J.Lee,Groupbehaviorfromvideo:adata-drivenapproachtocrowdsimulation,in:SCA’07:Proceedingsofthe2007
ACMSIGGRAPH/EurographicsSymposiumonComputerAnimation,2007,pp. 109–118.
[104]P.Charalambous,Y.Chrysanthou,ThePAGcrowd:agraphbasedapproachforefficientdata-drivencrowdsimulation,Comput.Graph.Forum33 (8)
(2014)95–108.
[105]J.Zhang,H.Liu,Y.Li,X.Qin,S.Wang,Video-drivengroupbehaviorsimulationbasedonsocialcomparisontheory,PhysicaAStatist.Mech.Appl.512
(2018)620–634.
[106]V.Marmarelis,G.Mitsis,Data-DrivenModelingforDiabetes,LectureNotesinBioengineering,Springer-Verlag,Berlin,2014.
[107]O.Zettinig,T.Mansi,D.Neumann,B.Georgescu,S.Rapaka,P.Seegerer,E.Kayvanpour,F.Sedaghat-Hamedani,A.Amr,J.Haas,H.Steen,Data-driven estimationofcardiacelectricaldiffusivityfrom12-leadECGsignals,Med.ImageAnal.18 (8)(2014)1361–1376.
[108]J.R.Jimenez-Alanaiz,V.Medina-Banuelos,O.Yanez-Suarez,Data-drivenbrainMRIsegmentationsupported onedgeconfidenceandaprioritissue information,IEEETrans.Med.Imaging25 (1)(2006)74–83.
[109]J.P.Kassirer,Diagnosticreasoning,Ann.Intern.Med.110 (11)(1989)893–900.
[110]X.H.Wang,B.Zheng,W.F.Good,J.L.King,Y.H.Chang,Computer-assisteddiagnosisofbreastcancerusingadata-drivenBayesianbeliefnetwork,Int. J. Med.Inform.54 (2)(1999)115–126.
[111]Y.Xing,W.Wang,Z.Zhao,Combinationdataminingmethodswithnewmedicaldatatopredictingoutcomeofcoronaryheartdisease,in:
Interna-tionalConferenceonConvergenceInformationTechnology,Gyeongju,SouthKorea,21-23Nov2007,IEEE,2007.
[112]J.Luo,M.Wu,D.Gopukumar,Y.Zhao,Bigdataapplicationinbiomedicalresearchandhealthcare:aliteraturereview,Biomed.Inform.Insights8
(2016)1–10.
[113]K.H.Bleicher,H.J.Böhm,K.Müller,A.I.Alanine,Hitandleadgeneration:beyondhigh-throughputscreening,Nat.Rev.DrugDiscov.2(2003)369–378.
[114]Y.Jing,Y.Bian,Z.Hu,L.Wang,X-Q.S.Xie,Deeplearningfordrugdesign:anartificialintelligenceparadigmfordrugdiscoveryinthebigdataera, AAPSJ.20(2018)58.
[115]J.Dews,Strategictrendsinthedrugindustry,DrugDiscov.Today8(2003)411–420.
[116]N.Kumar,B.S.Hendriks,K.A.Janes,D.deGraaf,D.A.Lauffenburger,Applyingcomputationalmodelingtodrugdiscoveryanddevelopment,Drug
[117]H.P.Fisher,S.Heyse,Fromtargetstoleads:theimportanceofadvanceddataanalysisfordecisionsupportindrugdiscovery,Curr.Opin.DrugDiscov. Devel.8 (3)(2005)334–346.
[118]D.B.Searls,Dataintegration:challengesfordrugdiscovery,Nat.Rev.DrugDiscov.4(2005)45–58.
[119]T.Wunberg,M.Hendrix,A.Hillisch,M.Lobell,H.Meier,C.Schmeck,H.Wild,B.Hinzen,Improvingthehit-to-leadprocess:data-drivenassessment ofdrug-likeandlead-likescreeninghits,DrugDiscov.Today11(2006)175–180.
[120]C.Partl,A.Lex,M.Streit,H.Strobelt,A.-M.Wassermann,H.Pfister,D.Schmalstieg,ConTour:data-drivenexplorationofmulti-relationaldatasetsfor drugdiscovery,IEEETrans.Vis.Comput.Graph.20 (12)(2014)1883–1892.
[121]T.J.Howe,G.Mahieu,P.Marichal,T.Tabruyn,P.Vugts,Datareduction andrepresentationindrugdiscovery,DrugDiscov.Today12 (1–2)(2007) 45–53.
[122]R.Gómez-Bombarelli,J.N.Wei,D.Duvenaud,J.M.Hernández-Lobato,B.Sánchez-Lengeling,D.Sheberla,J.Aguilera-Iparraguirre,T.D.Hirzel,R.P.Adams, A. Aspuru-Guzik,Automaticchemicaldesignusingadata-drivencontinuousrepresentationofmolecules,ACSCent.Sci.4 (2)(2018)268–276.
[123]M.J.Kusner,B.Paige,J.M.Hernández-Lobato,Grammarvariationalautoencoder,arXiv:1703.01925 [stat],2017.
[124]W.Jin,R.Barzilay,T.Jaakkola,Junctiontreevariationalautoencoderformoleculargraphgeneration,arXiv:1802.04364,2018.
[125]H.Dai,Y.Tian,B.Dai,S.Skiena,L.Song,Syntax-directedvariationalautoencoderforstructureddata,arXiv:1802.08786,2018.
[126]K.Popova,O.Isayev,A.Tropsha,Deepreinforcementlearningfordenovodrugdesign,Sci.Adv.4 (7)(2018)eaap7885.
[127]G.L.Guimaraes,B.Sánchez-Lengeling,P.L.C.Farias,A.Aspuru-Guzik,C.Outeiral,P.L.C.Farias,A.Aspuru-Guzik,Objective-reinforcedgenerative adver-sarialnetworks(ORGAN)forsequencegenerationmodels,arXiv:1705.10843,2017.
[128]J.Nicas,J.Creswell,Boeing’s737Max:1960sDesign,1990sComputingPowerandPaperManuals,TheNewYorkTimes,NYEdition,9Apr2019,A1, 2019.
[129]J.Bicevskis,Z.Bicevska,G.Karnitis,Executabledataqualitymodels,Proc.Comput.Sci.104(2017)138–145.
[130]M.S.Marev,E.Compatangelo,W.Vasconcelos,TowardsaContext-DependentNumericalDataQualityEvaluationFramework,Technicalreport,2018, arXiv:1810.09399.
[131]R.Y.Wang,V.C.Storey,C.P.Firth,Aframeworkforanalysisofdataqualityresearch,IEEETrans.Knowl.DataEng.7 (4)(1995)623–639.
[132]J.Liu,J.Li,W.Li,J.Wu,Rethinkingbigdata:areviewonthedataqualityandusageissues,ISPRSJ.Photogramm.RemoteSens.115(2016)134–142.
[133]A.Zaveri,A.Rula,A.Maurino,R.Pietrobon,J.Lehman,S.Auer,Qualityassessmentforlinkeddata:asurvey,Semant.Web7 (1)(2019)63–93.
[134]C.Batini,M.Scannapieco,DataandInformationQuality.Dimensions,PrinciplesandTechniques,SpringerNatureSwitzerlandAG,2018.
[135]I.Kirchen, D.Schütz,J.Folmer, B.Vogel-Heuser,Metricsfor theevaluation ofdata qualityofsignaldatainindustrial processes,in:IEEE15th InternationalConferenceonIndustrialInformatics,INDIN,24-26July2017,2017.
[136]G.Pastorello,D.Agarwal,T.Samak,C.Poindexter,B.Faybishenko,D.Gunter,R.Hollowgrass,D.Papale,C.Trotta,A.Ribeca,E.Canfora,Observational datapatternsfortimeseriesdataqualityassessment,in:ProceedingsoftheIEEE10thInternationalConferenceoneScience,vol. 1,IEEEComputer Society,2014,pp. 271–277.
[137]R.Gitzel,Dataqualityintimeseries data.Anexperiencereport,in:ProceedingsofCBI2016IndustrialTrack,Paris,France,31Aug2016,2016, pp. 41–49.
[138]Y.Chen,F.Zhu,J.Lee,Dataqualityevaluationandimprovementforprognosticmodelingusingvisualassessmentbaseddatapartitioningmethod,
Comput.Ind.64(2013)214–225.
[139]T.Y.Hou,K.C.Lam,P.Zhang,S.Zhang,SolvingBayesianinverseproblemsfromtheperspectiveofdeepgenerativenetworks,Comput.Mech.64(2019) 395–408.
[140]M.Raissi,H.Babaee,G.E.Karniadakis,ParametricGaussianprocessregressionforbigdata,Comput.Mech.64(2019)409–416.
[141]Y. Yang,P.Perdikaris, Conditionaldeep surrogatemodelsfor stochastic,highdimensional,and multifidelitysystems, Comput.Mech. 64(2019) 417–434.
[142]C.Soize,R.Ghanem,Data-drivenprobabilityconcentrationandsamplingonmanifold,J.Comput.Phys.321(2016)242–258.
[143]C.Soize,C.Farhat,Probabilistic learningformmodelingand quantifyingmodel-formuncertaintiesinnonlinearcomputationalmechanics,Int.J.
Numer.MethodsEng.117(2019)819–843.
[144] S.Fortunato,C.T.Bergstrom,K.Börner,J.A.Evans,D.Helbing,S.Milojevic,A.M.Petersen,F.Radicchi,R.Sinatra,B.Uzzi,A.Vespignani,L.Waltman, D. Wang,A-L.Barabasi,Scienceofscience,Science359 (6379)(2018)eaao0185,https://doi.org/10.1126/science.aao0185.