Science Arts & Métiers (SAM) is an open access repository that collects the work of Arts et Métiers Institute of Technology researchers and makes it freely available over the web where possible.

This is an author-deposited version published in: https://sam.ensam.eu
Handle ID: http://hdl.handle.net/10985/19561

To cite this version:
Quercus HERNÁNDEZ, Alberto BADÍAS, David GONZÁLEZ, Francisco CHINESTA, Elías CUETO - Structure-preserving neural networks - Journal of Computational Physics, p. 1-16, 2020

Any correspondence concerning this service should be sent to the repository administrator.
Structure-preserving neural networks ✩

Quercus Hernández a, Alberto Badías a, David González a, Francisco Chinesta b, Elías Cueto a,*

a Aragon Institute of Engineering Research (I3A), Universidad de Zaragoza, Maria de Luna 3, E-50018 Zaragoza, Spain
b ESI Chair and PIMM Lab, ENSAM ParisTech, 155 Boulevard de l'Hôpital, 75013 Paris, France

Keywords: Scientific machine learning; Neural networks; Structure preservation; GENERIC

Abstract
We develop a method to learn physical systems from data that employs feedforward neural networks and whose predictions comply with the first and second principles of thermodynamics. The method employs a minimum amount of data by enforcing the metriplectic structure of dissipative Hamiltonian systems in the form of the so-called General Equation for the Non-Equilibrium Reversible-Irreversible Coupling, GENERIC (Öttinger and Grmela (1997) [36]). The method does not need to enforce any kind of balance equation, and thus no previous knowledge of the nature of the system is needed. Conservation of energy and dissipation of entropy in the prediction of previously unseen situations arise as a natural by-product of the structure of the method. Examples of the performance of the method are shown that comprise conservative as well as dissipative systems, discrete as well as continuous ones.
1. Introduction
With the irruption of the so-called fourth paradigm of science [17], a growing interest is detected in the machine learning of scientific laws. A plethora of methods have been developed that are able to produce more or less accurate predictions about the response of physical systems in previously unseen situations, by employing techniques ranging from classical regression to the most sophisticated deep learning methods.
For instance, recent works in solid mechanics have substituted the constitutive equations with experimental data [1,25], while conserving the traditional approach on physical laws with high epistemic value (i.e., balance equations, equilibrium). Similar approaches have applied this concept to the unveiling (or correction) of plasticity models [21], while others created the new concept of constitutive manifold [20,22]. Other approaches are designed to unveil an explicit, closed-form expression for the physical law governing the phenomenon at hand [3].
An interest is observed in the incorporation of already existing scientific knowledge into these data-driven procedures. This interest is two-fold. On the one hand, we prefer not to get rid of centuries of scientific knowledge and rely exclusively on powerful machine learning strategies. Existing theories have proved to be useful in the prediction of physical phenomena and are still in the position of helping to produce very accurate predictions. This is the procedure followed in the so-called data-driven computational mechanics approach mentioned before. On the other hand, these theories help to keep the consumption of data to a minimum. Data are expensive to produce and to maintain. Already existing scientific knowledge can alleviate the amount of data needed to produce a successful prediction.

✩ This project has been partially funded by the ESI Group through the ESI Chair at ENSAM ParisTech and through the project "Simulated Reality: an intelligence augmentation system based on hybrid twins and augmented reality" at the University of Zaragoza. The support of the Spanish Ministry of Economy and Competitiveness through grant number CICYT-DPI2017-85139-C2-1-R and by the Regional Government of Aragon, through the project T24_20R, and the European Social Fund, are also gratefully acknowledged.

* Corresponding author. E-mail addresses: quercus@unizar.es (Q. Hernández), abadias@unizar.es (A. Badías), gonzal@unizar.es (D. González), francisco.chinesta@ensam.eu (F. Chinesta), ecueto@unizar.es (E. Cueto).
The mentioned works on data-driven computational mechanics usually rely on traditional machine learning algorithms, which are very precise and well tested but usually computationally expensive. With the recent advances in data processing, computing resources and machine learning, neural networks have become a powerful tool to analyze traditionally hard problems such as image classification [26,45], speech recognition [12,18] or data compression [42,46]. These new machine learning methods outperform many of the traditional ones, both in modeling capacity and computational time (once trained, certain neural networks can easily handle real-time requirements). Recent works in the machine learning community [29,31,34] have shown that neural networks are also versatile in constrained optimization.
This is the approach followed by several authors in the context of physical simulations, which aim to solve a set of partial differential equations (PDEs) in complex dynamical systems. Physical problems must inherently satisfy certain conditions dictated by physics, often formulated as conservation laws, which can be imposed on a neural network using extra loss terms in the constrained optimization process [32].
Similar constraints are imposed in the so-called physics-informed neural networks approach [40,48]. This family of methods employs neural networks to solve highly nonlinear partial differential equations (PDEs), resulting in very accurate and numerically stable results. However, they rely on prior knowledge of the governing equations of the problem.
The authors have introduced the so-called thermodynamically consistent data-driven computational mechanics [7,9,10]. Unlike other existing works, this approach does not impose any particular balance equation to solve for. Instead, it relies on the imposition of the right thermodynamic structure on the resulting predictions, as dictated by the so-called GENERIC formalism [15]. As will be seen, this ensures conservation of energy and the right amount of entropy dissipation, thus giving rise to predictions satisfying the first and second principles of thermodynamics. These techniques, however, employ regression to unveil the thermodynamic structure of the problem at the sampling points. For previously unseen situations, they employ interpolation on the matrix manifold describing the system.
Recent works in symplectic networks [23] have bypassed those drawbacks by exploiting the mathematical properties of Hamiltonian systems, so no prior knowledge of the system is required. However, this technique only operates on conservative systems with no entropy generation.
The aim of this work is the development of a new structure-preserving neural network architecture capable of predicting the time evolution of a system based on experimental observations of the system, with no prior knowledge of its governing equations, and valid for both conservative and dissipative systems. The key idea is to merge the proven computational power of neural networks in highly nonlinear physics with thermodynamically consistent data-driven algorithms. The resulting methodology, as will be seen, is a powerful neural network architecture, conceptually very simple (based on standard feed-forward methodologies), that exploits the right thermodynamic structure of the system as unveiled from experimental data, and that produces interpretable results [33].
The outline of the paper is as follows. A brief description of the problem setup is presented in Section 2. Next, in Section 3, the methodology is presented, covering both the GENERIC formalism and the feed-forward neural networks used to solve the stated problem. This technique is then applied to different physical systems of increasing complexity: a double thermo-elastic pendulum (Section 4) and a Couette flow in a viscoelastic fluid (Section 5). The paper is completed with a discussion in Section 6.
2. Problem statement
Weinan E seems to be the first author in interpreting the process of learning physical systems as the solution of a dynamical system [6]. Consider a system whose governing variables will be hereafter denoted by z ∈ M ⊆ R^n, with M the state space of these variables, which is assumed to have the structure of a differentiable manifold in R^n. The problem of learning a given physical phenomenon can thus be seen as the one of finding an expression for the time evolution of its governing variables z,

$$\dot{z} = \frac{dz}{dt} = F(x, z, t), \quad x \in \Omega \subseteq \mathbb{R}^D, \quad t \in I = (0, T], \quad z(0) = z_0, \tag{1}$$

where x and t refer to the space and time coordinates within a domain of D = 2, 3 dimensions. F(x, z, t) is the function that gives, after a prescribed time horizon T, the flow map z_0 → z(z_0, T).

While this problem can be seen as a general supervised learning problem (we fix both z_0 and z), when we have additional information about the physics being represented by the sought function F, it is legitimate to try to include it in the search procedure. W. E seems to have been the first in suggesting to impose a Hamiltonian structure on F if we know that energy is conserved, for instance [6]. Very recently, two different approaches have followed this same rationale [2,23].
For conservative systems, therefore, imposing a Hamiltonian structure seems a very appealing way to obtain thermodynamics-aware results. However, when the system is dissipative, this method does not provide valid results. Given the importance of dissipative phenomena (viscous solids, fluid dynamics, ...), we explore the right thermodynamic structure to impose on the search methodology.
The goal of this paper is to develop a new method of solving Eq. (1) using state-of-the-art deep learning tools, in order to predict the time evolution of the state variables of a given system. The solution is forced to fulfill the basic thermodynamic requirements of energy conservation and the entropy inequality restriction via the GENERIC formalism, presented in the next section.
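To make the supervised setup of Eq. (1) concrete, the following sketch generates snapshot data (z_0, z_1, ..., z_N) from a dynamical system with a forward Euler integrator. The right-hand side F used here is an arbitrary toy damped oscillator chosen only for illustration; it is not one of the systems studied in this paper.

```python
import numpy as np

def generate_trajectory(F, z0, dt, n_steps):
    """Integrate dz/dt = F(z, t) with forward Euler to build a
    trajectory of snapshots for the learning problem of Eq. (1)."""
    z = np.array(z0, dtype=float)
    traj = [z.copy()]
    for n in range(n_steps):
        z = z + dt * F(z, n * dt)
        traj.append(z.copy())
    return np.stack(traj)

# Hypothetical toy right-hand side: a lightly damped oscillator.
F = lambda z, t: np.array([z[1], -z[0] - 0.1 * z[1]])
Z = generate_trajectory(F, z0=[1.0, 0.0], dt=0.01, n_steps=100)
```

Pairs of consecutive rows of `Z` then play the role of the input/target pairs (z_n, z_{n+1}) in the supervised learning problem.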
3. Methodology
In this section we develop the appropriate thermodynamic structure for dissipative systems. Classical systems modeling can be done at a variety of scales. We could think of the most detailed (yet often impractical) scale of molecular dynamics, where energy conservation applies and the Hamiltonian paradigm can be imposed. However, the number of degrees of freedom and, noteworthy, the time scale render this approach of little interest for many applications. On the other side of the spectrum lies thermodynamics, where only conserved, invariant quantities are described and thus there is no need for conservation principles. At any other (mesoscopic) scale, unresolved degrees of freedom give rise to the appearance of fluctuation in the results (or its equivalent, dissipation). At these scales, traditional modeling procedures imply expressing physical insights in the form of governing equations [13]. These equations are then validated from experimental observations.
Alternatively, thermodynamics can be thought of as a meta-physics, in the sense that it is actually a theory of theories [14]. It provides us with the right theoretical framework in which basic principles are met. And, in particular, for any of these intermediate or mesoscopic scales, a so-called metriplectic structure emerges. The term metriplectic comes from the combination of symplectic and Riemannian (metric) geometry and emphasizes the fact that there are conservative as well as dissipative contributions to the general evolution of such a system. Once such a geometric structure is found for the system, we are in the position of fixing the framework in which our neural networks can look for the adequate prediction of the future states of the system. The particular metriplectic structure that we employ for such a task is known, as stated before, as GENERIC.
3.1. The GENERIC formalism
The “General Equation for Non-Equilibrium Reversible-Irreversible Coupling” (GENERIC) formalism [15,36] establishes a mathematical framework in order to model the dynamics of a system. Furthermore, it is compatible with classical equilibrium thermodynamics [35], preserving the symmetries of the system as stated in Noether’s theorem. It has served as the basis for the development of several consistent numerical integration algorithms that exploit these desirable properties [11,41].
The GENERIC structure for the evolution in Eq. (1) is obtained after finding two algebraic or differential operators

$$L : T^*M \to TM, \quad M : T^*M \to TM,$$

where T^*M and TM represent, respectively, the cotangent and tangent bundles of M. As in general Hamiltonian systems, there will be an energy potential, which we will denote hereafter by E(z). In order to take into account the dissipative effects, a second potential (the so-called Massieu potential) is introduced in the formulation. It is, of course, the entropy potential of the GENERIC formulation, S(z). With all these ingredients, we arrive at a description of the dynamics of the system of the type

$$\frac{dz}{dt} = L \frac{\partial E}{\partial z} + M \frac{\partial S}{\partial z}. \tag{2}$$

As shown in Eq. (2), the time evolution of the system described by the nonlinear operator F(x, z, t) presented in Eq. (1) is now split into two separate terms:

• Reversible term: it accounts for all the reversible (non-dissipative) phenomena of the system. In the context of classical mechanics, this term is equivalent to Hamilton's equations of motion, which relate the particle position and momentum. The operator L(z) is the Poisson matrix (it defines a Poisson bracket) and is required to be skew-symmetric (a cosymplectic matrix).

• Non-reversible term: the rest of the (dissipative) phenomena of the system are modeled here. The operator M(z) is the friction matrix and is required to be symmetric and positive semi-definite.

The GENERIC formulation of the problem is completed with the following so-called degeneracy conditions,

$$L \frac{\partial S}{\partial z} = M \frac{\partial E}{\partial z} = 0. \tag{3}$$

The first condition expresses the reversible nature of the L contribution to the dynamics, whereas the second requirement expresses the conservation of the total energy by the M contribution. This means nothing other than that the energy potential does not contribute to the production of entropy and, conversely, that the entropy functional does not contribute to the reversible dynamics. This mutual degeneracy requirement, in addition to the already mentioned requirements on the L and M matrices, ensures that:
$$\frac{\partial E}{\partial t} = \frac{\partial E}{\partial z} \cdot \frac{\partial z}{\partial t} = \frac{\partial E}{\partial z} \cdot \left( L \frac{\partial E}{\partial z} + M \frac{\partial S}{\partial z} \right) = 0,$$
which expresses the conservation of energy in an isolated system, also known as the first law of thermodynamics. Applying the same reasoning to the entropy S:
$$\frac{\partial S}{\partial t} = \frac{\partial S}{\partial z} \cdot \frac{\partial z}{\partial t} = \frac{\partial S}{\partial z} \cdot \left( L \frac{\partial E}{\partial z} + M \frac{\partial S}{\partial z} \right) = \frac{\partial S}{\partial z} \cdot M \frac{\partial S}{\partial z} \geq 0,$$

which guarantees the entropy inequality, that is, the second law of thermodynamics.
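The two identities above can be checked numerically. The sketch below builds random toy operators (not the L and M of any system in this paper) that satisfy skew-symmetry, positive semi-definiteness and the degeneracy conditions by construction, and verifies that the resulting GENERIC evolution conserves E while producing entropy:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
gE = rng.normal(size=n)   # stand-in for dE/dz at some state
gS = rng.normal(size=n)   # stand-in for dS/dz at the same state

# Skew-symmetric L with L·gS = 0: project a random skew matrix
A = rng.normal(size=(n, n))
L0 = A - A.T
P_S = np.eye(n) - np.outer(gS, gS) / (gS @ gS)  # projector orthogonal to gS
L = P_S @ L0 @ P_S.T                            # still skew, annihilates gS

# Symmetric positive semi-definite M with M·gE = 0
B = rng.normal(size=(n, n))
P_E = np.eye(n) - np.outer(gE, gE) / (gE @ gE)
M = P_E @ (B @ B.T) @ P_E.T                     # symmetric PSD, annihilates gE

zdot = L @ gE + M @ gS    # GENERIC evolution, Eq. (2)
dE = gE @ zdot            # first law: should vanish
dS = gS @ zdot            # second law: should be non-negative
```

By skew-symmetry, gE·L·gE = 0, and by the degeneracy condition M·gE = 0 (with M symmetric), gE·M·gS = 0, so dE = 0; likewise dS reduces to gS·M·gS ≥ 0.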
3.2. Proposed integration algorithm
Once the learning procedure is accomplished, our neural network is expected to integrate the system dynamics in time, given previously unseen initial conditions. In order to numerically solve the GENERIC equation, we formulate the discretized version of Eq. (2) following previous works [11]:

$$\frac{z_{n+1} - z_n}{\Delta t} = L \cdot \frac{DE}{Dz} + M \cdot \frac{DS}{Dz}. \tag{4}$$

The time derivative of the original equation is discretized with a forward Euler scheme in time increments Δt, where z_{n+1} = z(t + Δt). L and M are the discretized versions of the Poisson and friction matrices. Last, DE/Dz and DS/Dz represent the discrete gradients, which can be approximated in a finite element sense as

$$\frac{DE}{Dz} \simeq A z, \quad \frac{DS}{Dz} \simeq B z, \tag{5}$$

where A and B represent the discrete matrix form of the gradient operators.
Finally, manipulating Eq. (4) algebraically with Eq. (5) and including the degeneracy conditions of Eq. (3), the proposed integration scheme for predicting the dynamics of a physical system is the following:

$$z_{n+1} = z_n + \Delta t \, (L \cdot A z_n + M \cdot B z_n), \tag{6}$$

subject to:

$$L \cdot B z_n = 0, \quad M \cdot A z_n = 0,$$

ensuring the thermodynamical consistency of the resulting model.
To sum up, the main objective of this work is to compute the form of the A(z) and B(z) gradient operator matrices, subject to the degeneracy conditions, in order to integrate the initial system state variables z_0 over certain time steps Δt of the time interval I. Usually, the form of the matrices L and M is known in advance, given the vast literature in the field. If necessary, these terms can also be computed [11].

3.3. Feed-forward neural networks
In the introduction we already mentioned the intrinsic power of neural networks in many fields. The main reason why neural networks are able to learn and reproduce such a variety of problems is that they are considered to be universal approximators [4,19], meaning that they are capable of approximating any measurable function to any desired degree of accuracy. The main limitation of this technique is the correct selection of the tuning parameters of the network, also called hyperparameters.
Another family of universal approximators is polynomials, as they can approximate any infinitely differentiable function as a Taylor power series expansion. The main difference is that neural networks rely on composition of functions rather than on sums of power series:

$$\hat{y} = (f^{[L]} \circ f^{[L-1]} \circ \ldots \circ f^{[l]} \circ \ldots \circ f^{[2]} \circ f^{[1]})(x). \tag{7}$$

Eq. (7) shows that the desired output ŷ from a given input x of a neural network is a composition of different functions f^{[l]}, acting as building blocks of the network in a total of L layers. The challenge is to select the best combination of functions in the correct order such that it approximates the solution of the studied problem.
Fig. 1. Representation of a single neuron (left) as a part of a fully connected neural net (right).
The simplest building block of artificial deep neural network architectures is the neuron or perceptron (Fig. 1, left). Several neurons are stacked in a multilayer perceptron (MLP), which is mathematically defined as follows:

$$x^{[l]} = \sigma(w^{[l]} x^{[l-1]} + b^{[l]}), \tag{8}$$

where l is the index of the current layer, x^{[l-1]} and x^{[l]} are the layer input and output vectors respectively, w^{[l]} is the weight matrix of the layer, b^{[l]} is its bias vector and σ is the activation function. If no activation function is applied, the MLP is equivalent to a linear operator. However, σ is chosen to be a nonlinear function in order to increase the capacity to model more complex problems, which are commonly nonlinear. In classification problems, the traditional activation function is the logistic function (sigmoid), whereas in regression problems the Rectified Linear Unit (ReLU) [8] or the hyperbolic tangent are commonly used. In this work, we use a deep neural network architecture known as a feed-forward neural network [44]. It consists of several layers of perceptrons with no cyclic connections, as shown in Fig. 1 (right).
The input of the neural net is the state vector at a given time step, z_n, and the outputs are the concatenated GENERIC matrices A_n^net and B_n^net: for a system with n state variables, the number of inputs and outputs are N_in = n and N_out = 2n². Then, using the GENERIC integration scheme, the state vector at the next time step, z_{n+1}^net, is obtained. This method is repeated for the whole simulation time T, with a total of N_T snapshots.
The state variables of a general dynamical system may differ by several orders of magnitude from each other, due to their own physical nature or measurement units. Thus, a pre-processing of the input data (scaling or normalization) can improve the model performance and stability.
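A typical standardization step of this kind, computed from training-set statistics only so that no test information leaks into the preprocessing (a generic sketch, not code from the paper):

```python
import numpy as np

def standardize(Z_train, Z):
    """Z-score each state variable using the mean and standard
    deviation of the training split only."""
    mu = Z_train.mean(axis=0)
    sigma = Z_train.std(axis=0)
    sigma[sigma == 0.0] = 1.0   # guard against constant variables
    return (Z - mu) / sigma
```

The same `mu` and `sigma` are then reused to normalize any test trajectory before it is fed to the network.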
The number of hidden layers N_h depends on the complexity of the problem. Increasing the net size raises the computational power of the net to model more complex phenomena. However, it slows the training process and could lead to data overfitting, limiting its generalization and extrapolation capacity. The size of the hidden layers is chosen to be the same as the output size of the net, N_out.
The cost function for our neural network is composed of three different terms:

• Data loss: the main loss condition is the agreement between the network output and the real data. It is computed as the squared error sum between the predicted state vector z_{n+1}^net and the ground truth solution z_{n+1}^GT for each time step,

$$\mathcal{L}^{\text{data}}_n = \| z^{\text{GT}}_{n+1} - z^{\text{net}}_{n+1} \|_2^2. \tag{9}$$

• Fulfillment of the degeneracy conditions: the cost function also accounts for the degeneracy conditions in order to ensure the thermodynamic consistency of the solution, implemented as the sum of the squared elements of the degeneracy vectors for each time step,

$$\mathcal{L}^{\text{degen}}_n = \| L \cdot B^{\text{net}}_n z^{\text{net}}_n \|_2^2 + \| M \cdot A^{\text{net}}_n z^{\text{net}}_n \|_2^2. \tag{10}$$

This term acts as a regularization of the loss function and, at the same time, is responsible for ensuring thermodynamic consistency. So to speak, it is the cornerstone of our method.

• Regularization: in order to avoid overfitting, an extra L2 regularization term L^reg is added to the loss function, defined as the sum over the squared weight parameters of the network,

$$\mathcal{L}^{\text{reg}} = \sum_l^L \sum_i^{n^{[l]}} \sum_j^{n^{[l+1]}} \left( w^{[l]}_{i,j} \right)^2. \tag{11}$$

The total cost function is computed as the sum squared error (SSE) of the data loss and degeneracy residual, in addition to the regularization term, at the end of the simulation time T for each train case. The regularization loss is highly dependent on the size of the network layers and has a different scaling with respect to the other terms, so it is compensated with the regularization hyperparameter (weight decay) λ_r. An additional weight λ_d is added to the data loss term, which accounts for the relative scaling error with respect to the degeneracy conditions:

$$\mathcal{L} = \sum_{n=0}^{N_T} \left( \lambda_d \mathcal{L}^{\text{data}}_n + \mathcal{L}^{\text{degen}}_n \right) + \lambda_r \mathcal{L}^{\text{reg}}. \tag{12}$$

The usual backpropagation algorithm [37] is then used to calculate the gradient of the loss function for each net parameter (weight and bias vectors), which are updated with the gradient descent technique [43]. The process is then repeated for a maximum number of epochs n_epoch. The resulting training algorithm is sketched in Fig. 2.

Fig. 2. Sketch of a structure-preserving neural network training algorithm.
The proposed methodology is tested with two different databases of nonlinear physical systems, split into a partition of train cases (N_train = 80% of the database) and test cases (N_test = 20% of the database). The net performance is evaluated with the mean squared error (MSE) of the state variables prediction, associated with the data loss term, Eq. (9), over all the time snapshots,

$$\text{MSE}^{\text{data}}(z_i) = \frac{1}{N_T} \sum_{n=0}^{N_T} \| z^{\text{GT}}_{i,n} - z^{\text{net}}_{i,n} \|_2^2. \tag{13}$$

The same procedure is applied to the degeneracy constraint, associated with the degeneracy loss term, Eq. (10), over all the time snapshots,

$$\text{MSE}^{\text{degen}}(z_i) = \frac{1}{N_T} \sum_{n=0}^{N_T} \left( \| L \cdot B^{\text{net}}_{i,n} z^{\text{net}}_{i,n} \|_2^2 + \| M \cdot A^{\text{net}}_{i,n} z^{\text{net}}_{i,n} \|_2^2 \right). \tag{14}$$

As a general error magnitude of the algorithm, the average MSE over both the train (N = N_train) and test (N = N_test) trajectories is also reported for both the data (m = data) and degeneracy (m = degen) constraints,

$$\overline{\text{MSE}}^{m}(z) = \frac{1}{N} \sum_{i=1}^{N} \text{MSE}^{m}(z_i). \tag{15}$$

Algorithm 1 and Algorithm 2 show pseudocode for the training and test processes, respectively. The proposed method is fully implemented in PyTorch [38] and trained on an Intel Core i7-8665U CPU.
4. Validation examples: double thermo-elastic pendulum
4.1. Description
The first example is a double thermo-elastic pendulum (Fig. 3), consisting of two masses m_1 and m_2 connected by two springs of variable lengths λ_1 and λ_2 and natural lengths at rest λ_1^0 and λ_2^0.

The set of variables describing the double pendulum is here chosen to be

$$S = \{ z = (q_1, q_2, p_1, p_2, s_1, s_2) \in (\mathbb{R}^2 \times \mathbb{R}^2 \times \mathbb{R}^2 \times \mathbb{R}^2 \times \mathbb{R} \times \mathbb{R}),\ q_1 \neq 0,\ q_1 \neq q_2 \}, \tag{16}$$

where q_i, p_i and s_i are the position, linear momentum and entropy of each mass i = 1, 2.

Algorithm 1 Pseudocode for the training algorithm.

    Load train database: z^GT (train partition), Δt, L, M;
    Define network architecture: N_in, N_out = 2N_in², N_h, σ_j;
    Define hyperparameters: η, λ_d, λ_r;
    Initialize w_{i,j}, b_j;
    for epoch ← 1, n_epoch do
        for train_case ← 1, N_train do
            Initialize state vector: z_0^net ← z_0^GT;
            Initialize losses: L^data, L^degen ← 0;
            for snapshot ← 1, N_T do
                Forward propagation: [A_n^net, B_n^net] ← Net(z_n^GT);                               Eq. (8)
                Time integration: z_{n+1}^net ← z_n^net + Δt (L·A_n^net z_n^net + M·B_n^net z_n^net);  Eq. (4)
                Update data loss: L^data ← L^data + L_n^data;                                        Eq. (9)
                Update degeneracy loss: L^degen ← L^degen + L_n^degen;                               Eq. (10)
            end for
            SSE loss function: L ← λ_d L^data + L^degen + λ_r L^reg;                                 Eq. (11), Eq. (12)
            Backward propagation;
            Optimizer step;
        end for
        Learning rate scheduler;
    end for
Algorithm 2 Pseudocode for the test algorithm.

    Load test database: z^GT (test partition), Δt, L, M;
    Load network parameters;
    for test_case ← 1, N_test do
        Initialize state vector: z_0^net ← z_0^GT;
        for snapshot ← 1, N_T do
            Forward propagation: [A_n^net, B_n^net] ← Net(z_n^net);                                   Eq. (8)
            Time step integration: z_{n+1}^net ← z_n^net + Δt (L·A_n^net z_n^net + M·B_n^net z_n^net);  Eq. (4)
            Update state vector: z_n^net ← z_{n+1}^net;
            Update snapshot: n ← n + 1;
        end for
        Compute MSE^data, MSE^degen;                                                                 Eq. (13), Eq. (14)
    end for
    Compute average MSE^data, MSE^degen;                                                             Eq. (15)
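Algorithm 2 amounts to a rollout in which the network is queried at its own predictions. A sketch of that loop, with a constant-output stand-in for the trained network (the stand-in matrices are illustrative only, not from the paper):

```python
import numpy as np

def rollout(net, z0, L, M, dt, n_steps):
    """Test-time integration (in the spirit of Algorithm 2): starting
    from z_0, query the network for [A_n, B_n] at the current *predicted*
    state and apply the GENERIC step of Eq. (6) recursively."""
    z = np.array(z0, dtype=float)
    traj = [z.copy()]
    for _ in range(n_steps):
        A, B = net(z)                              # forward propagation
        z = z + dt * (L @ (A @ z) + M @ (B @ z))   # Eq. (6)
        traj.append(z.copy())
    return np.stack(traj)

# Hypothetical stand-in for a trained net: constant A = I, B = 0.
n = 2
L_mat = np.array([[0.0, 1.0], [-1.0, 0.0]])
M_mat = np.zeros((n, n))
net = lambda z: (np.eye(n), np.zeros((n, n)))
traj = rollout(net, [1.0, 0.0], L_mat, M_mat, dt=0.01, n_steps=100)
```

In the real method, `net` is the trained feed-forward network and the rollout error is what MSE^data of Eq. (13) measures.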
Fig. 3. Double thermo-elastic pendulum.
The lengths of the springs λ_1 and λ_2 are defined solely in terms of the positions as

$$\lambda_1 = \sqrt{q_1 \cdot q_1}, \quad \lambda_2 = \sqrt{(q_2 - q_1) \cdot (q_2 - q_1)}.$$

The total energy of the system can be expressed as the sum of the kinetic energy of the two masses K_i and the internal energy of the springs e_i for i = 1, 2,

$$E = E(z) = \sum_i K_i(z) + \sum_i e_i(\lambda_i, s_i), \quad K_i = \frac{1}{2 m_i} |p_i|^2. \tag{17}$$

Fig. 4. Loss evolution of data and degeneracy constraints for each epoch of the structure-preserving neural network training process of the double pendulum example.
The total entropy of the double pendulum is the sum of the entropies of the two masses s_i,

$$S = S(z) = s_1 + s_2. \tag{18}$$

This model includes thermal effects in the stretching of the springs due to the Gough-Joule effect. The absolute temperatures T_i at each spring are obtained through Eq. (19). These temperature changes induce a heat flux between both springs, proportional to the temperature difference and a conductivity constant κ > 0,

$$T_i = \frac{\partial e_i}{\partial s_i}. \tag{19}$$

In this case, there is a clear contribution of both conservative Hamiltonian mechanics (mass movement) and non-Hamiltonian dissipative effects (heat flux), resulting in a non-zero friction matrix (M ≠ 0). Thus, the GENERIC matrices associated with this physical system are known to be [11]

$$L = \begin{bmatrix} 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ -1 & 0 & 0 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}, \quad
M = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & -1/2 \\ 0 & 0 & 0 & 0 & -1/2 & 1 \end{bmatrix}. \tag{20}$$

4.2. Database and hyperparameters
The training database is generated with a thermodynamically consistent time-stepping algorithm [41] in MATLAB. The masses of the double pendulum are set to m_1 = 1 kg and m_2 = 2 kg, joined with springs of natural lengths λ_1^0 = 2 m and λ_2^0 = 1 m, thermal constants C_1 = 0.02 J and C_2 = 0.2 J, and a conductivity constant κ = 0.5. The simulation time of the movement is T = 60 s in time increments of Δt = 0.3 s (N_T = 200 snapshots). The database consists of the state vector, Eq. (16), of 50 different trajectories with random initial conditions of position q_i and linear momentum p_i of both masses m_i (i = 1, 2) around a mean position and linear momentum of q_1 = [4.5, 4.5] m, p_1 = [2, 4.5] kg·m/s, and q_2 = [-0.5, 1.5] m, p_2 = [1.4, -0.2] kg·m/s, respectively. Although the initial conditions of the simulations are similar, they result in a wide variety of mass trajectories due to the chaotic behavior of the system.

This database is split randomly into 40 train trajectories and 10 test trajectories. Thus, there is a total of 8,000 training snapshots and 2,000 test snapshots.

The net input and output sizes are N_in = 10 and N_out = 2N_in² = 200. The state vector is normalized based on the training set statistical mean and standard deviation. The number of hidden layers is N_h = 5, with ReLU activation functions and linear activation in the last layer. The network is initialized according to the Kaiming method [16] with normal distribution, and the optimizer used is Adam [24] with a weight decay of λ_r = 10⁻⁵ and a data loss weight of λ_d = 10². A multistep learning rate scheduler is used, starting at η = 10⁻³ and decaying by a factor of γ = 0.1 at epochs 600 and 1200. The training process ends when a fixed number of epochs n_epoch = 1800 is reached.

Fig. 5. Time evolution of the state variables in a test trajectory of a double thermo-elastic pendulum using a time-stepping solver (Ground Truth, GT) and the proposed GENERIC integration scheme (SPNN). Since every variable has a vectorial character, both components are depicted and labeled as X and Y, respectively.
Table 1
Mean squared error of the data loss (MSE^data) and degeneracy loss (MSE^degen) for all the state variables of the double pendulum.

State variable        MSE^data             MSE^degen
                      Train      Test      Train      Test
q1 [m]         X      1.95·10⁻²  3.87·10⁻²  3.56·10⁻⁸  4.43·10⁻⁸
               Y      2.72·10⁻²  8.21·10⁻²  4.74·10⁻⁸  5.77·10⁻⁸
q2 [m]         X      2.04·10⁻²  3.65·10⁻²  9.28·10⁻⁸  8.55·10⁻⁸
               Y      2.72·10⁻²  3.73·10⁻²  3.55·10⁻⁸  5.01·10⁻⁸
p1 [kg·m/s]    X      6.43·10⁻⁴  1.33·10⁻⁴  4.00·10⁻⁸  7.08·10⁻⁸
               Y      1.06·10⁻³  4.06·10⁻³  1.21·10⁻⁷  1.40·10⁻⁷
p2 [kg·m/s]    X      4.88·10⁻⁴  9.84·10⁻⁴  6.00·10⁻⁸  4.58·10⁻⁸
               Y      8.47·10⁻⁴  1.79·10⁻⁴  9.76·10⁻⁸  1.20·10⁻⁷
s1 [J/K]              1.21·10⁻⁵  3.51·10⁻⁵  1.31·10⁻⁷  2.06·10⁻⁷
s2 [J/K]              1.22·10⁻⁵  3.18·10⁻⁵  2.40·10⁻⁷  2.95·10⁻⁷

Table 2
Mean squared error of the energy (MSE(E)) and entropy (MSE(S)) of the double pendulum.

Variable   Train      Test
E [J]      7.99·10⁻³  8.86·10⁻³
S [J/K]    6.52·10⁻⁸  6.33·10⁻⁸
4.3. Results
Fig. 5 shows the time evolution of the state variables (position, momentum and entropy) of each mass given by the solver and the neural net.

Table 1 shows the mean squared error of the data and degeneracy loss terms for all the state variables of the double pendulum. The results are computed separately as the mean over all the train and test trajectories using Eq. (15).

Fig. 6 and Fig. 7 show the time evolution of the internal and kinetic energy (Eq. (17)) and the entropy (Eq. (18)), respectively, for the two pendulum masses (i = 1, 2). The total energy is conserved and the total entropy satisfies the entropy inequality, fulfilling the first and second laws of thermodynamics respectively. The mean error for both train and test trajectories is reported in Table 2.

Fig. 6. Time evolution of the energy in a test trajectory of a double thermo-elastic pendulum using a time-stepping solver (Ground Truth, GT) and the proposed GENERIC integration scheme (Net).

Fig. 7. Time evolution of the entropy in a test trajectory of a double thermo-elastic pendulum using a time-stepping solver (Ground Truth, GT) and the proposed GENERIC integration scheme (SPNN).
5. Couette flow of an Oldroyd-B fluid

5.1. Description

The second example is a shear (Couette) flow of an Oldroyd-B fluid model. This is a constitutive model for viscoelastic fluids, consisting of linear elastic dumbbells (representing polymer chains) immersed in a solvent.
The Oldroyd-B model arises in the modeling of flows of diluted polymeric solutions. This model can be obtained both from a purely macroscopic point of view as well as from a microscopic one, by modeling polymer chains as linear dumbbells diluted in a Newtonian substrate. Alternatively, it can also be obtained by considering the deviatoric part T of the stress tensor σ (the so-called extra-stress tensor) to be of the form

$$T + \lambda_1 \overset{\triangledown}{T} = \eta_0 \dot{\gamma} + \lambda_2 \overset{\triangledown}{\dot{\gamma}}, \tag{21}$$

Fig. 8. Couette flow in an Oldroyd-B fluid.

where the triangle denotes the non-linear Oldroyd's upper-convected derivative [39]. Coefficients η_0, λ_1 and λ_2 are model parameters. It is standard to denote the strain rate tensor by γ̇ = ∇^s v = D. Finally, the stresses in the solvent (denoted by a subscript s) and polymer (denoted by a subscript p) are given by

$$T = \eta_s \dot{\gamma} + \tau, \quad \text{so that} \quad \tau + \lambda_1 \overset{\triangledown}{\tau} = \eta_p \dot{\gamma},$$

which is the constitutive equation for the elastic stress.
Pseudo-experimental data are obtained by the CONNFFESSIT technique [27], based on the Fokker-Planck equation [28]. This equation is solved by converting it into its corresponding Itô stochastic differential equation,

$$dr_x = \left( \frac{\partial v}{\partial y} r_y - \frac{1}{2\text{We}} r_x \right) dt + \frac{1}{\sqrt{\text{We}}}\, dV_t, \quad dr_y = -\frac{1}{2\text{We}} r_y\, dt + \frac{1}{\sqrt{\text{We}}}\, dW_t, \tag{22}$$

where v is the flow velocity, r = [r_x, r_y] is the position vector, with r_x = r_x(y, t) and, assuming a Couette flow, r_y = r_y(t) depending only on time; We stands for the Weissenberg number and V_t, W_t are two independent one-dimensional Brownian motions. This equation is solved via Monte Carlo techniques, by replacing the mathematical expectation by the empirical mean.

The model relies on the microscopic description of the state of the dumbbells. Thus, it is particularly useful to base the microscopic description on the evolution of the conformation tensor c = ⟨rr⟩, that is, the second moment of the dumbbell end-to-end distance distribution function. This tensor is in general not experimentally measurable and plays the role of an internal variable. The expected xy stress component of the tensor will be given by

$$\tau = \frac{\epsilon}{\text{We}} \frac{1}{K} \sum_{k=1}^{K} r_x r_y,$$

where K is the number of simulated dumbbells and ε = ν_p/ν_s is the ratio of the polymer to solvent viscosities.
The state variables selected for this problem are the position of the fluid on each node of the mesh (see Fig. 8), its velocity v in the x direction, the internal energy e and the conformation tensor shear component τ,

$$S = \{ z = (q, v, e, \tau) \in (\mathbb{R}^2 \times \mathbb{R} \times \mathbb{R} \times \mathbb{R}) \}. \tag{23}$$

The GENERIC matrices associated with each node of this physical system are the following:

$$L = \begin{bmatrix} 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ -1 & 0 & 0 & 1 & -1 \\ 0 & 0 & -1 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix}, \quad
M = \begin{bmatrix} 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}. \tag{24}$$
In order to simulate a measurement of real captured data, Gaussian noise is added to the state vector, computed as a random variable following a normal distribution with zero mean and standard deviation proportional to the standard deviation of the database, σ_z, and a noise level ν,

$$z^{\text{GT}}_{\text{noise}} = z^{\text{GT}} + \nu \cdot \sigma_z \cdot \mathcal{N}(0, 1). \tag{25}$$

Fig. 9. Loss evolution of data and degeneracy constraints for each epoch of the neural network training process of the Couette flow example.
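The noise model of Eq. (25) is straightforward to reproduce (a generic sketch, with per-variable standard deviations computed column-wise over the database):

```python
import numpy as np

def add_measurement_noise(Z, nu, seed=0):
    """Eq. (25): corrupt the ground-truth database with zero-mean Gaussian
    noise scaled by each state variable's standard deviation sigma_z and
    the noise level nu."""
    rng = np.random.default_rng(seed)
    sigma_z = Z.std(axis=0)
    return Z + nu * sigma_z * rng.standard_normal(Z.shape)
```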
The results of both the noise-free and the noisy databases are compared with two different network architectures:

• Unconstrained network: this architecture is the same as the proposed network, but removing the degeneracy conditions of the energy and entropy, Eq. (10), from the loss function. These conditions ensure the thermodynamic consistency of the resulting integrator, so not including them negatively affects the accuracy of the results, as will be seen.

• Black-box network: in this case, no GENERIC architecture is imposed; the network acts as a black-box integrator trained to directly predict the state vector time evolution z_{t+1} from the previous time step z_t. This naive approach is shown to be inappropriate, as no physical restrictions are given to the model.

5.2. Database and hyperparameters
The training database for this Oldroyd-B model is generated in MATLAB with a multiscale approach [28] in dimensionless form. The fluid is discretized in the vertical direction with N = 100 elements (101 nodes) in a total height of H = 1. A total of 10,000 dumbbells were considered at each nodal location in the model. The lid velocity is set to V = 1, the viscoelastic Weissenberg number to We = 1 and the Reynolds number to Re = 0.1. The simulation time of the movement is T = 1 in time increments of Δt = 0.0067 (N_T = 150 snapshots).

The database consists of the state vector (Eq. (23)) of the 100 node trajectories (excluding the node at h = H, for which a no-slip condition v = 0 has been imposed). This database is split into 80 train trajectories and 20 test trajectories.

The net input and output sizes are N_in = 5 and N_out = 2N_in² = 50. The number of hidden layers is N_h = 5, with ReLU activation functions and linear activation in the last layer. The network is initialized according to the Kaiming method [16] with normal distribution, and the optimizer used is Adam [24], with a weight decay of λ_r = 10⁻⁵ and a data loss weight of λ_d = 10³. A multistep learning rate scheduler is used, starting at η = 10⁻³ and decaying by a factor of γ = 0.1 at epochs 500 and 1000. The training process ends when a fixed number of epochs n_epoch = 1500 is reached. The same parameters are also considered for the noisy database network (ν = 1%) and the unconstrained network.

The black-box network training parameters are analogous to those of the structure-preserving network, except for the output size N_out = N_in = 5. Several network architectures were tested, and the lowest error is achieved with N_h = 5 hidden layers and 25 neurons per layer.

The time evolution of the data (L^data) and degeneracy (L^degen) loss terms for each training epoch is shown in Fig. 9.

5.3. Results
Fig. 10 shows the time evolution of the state variables (position q, velocity v, internal energy e, and conformation tensor shear component τ) given by the solver and by the neural net. There is good agreement between both plots. Moreover, the proposed scheme is able to predict the time evolution of the flow for several snapshots beyond the training simulation time T = 1, as shown in the same figure.

Table 3 shows the mean squared error of the data and degeneracy loss terms for all the state variables of the Couette flow of an Oldroyd-B fluid. The results are computed separately as the mean over all the train and test trajectories using Eq. (15).
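The per-trajectory error measure can be written out explicitly. The following is a plain restatement of a trajectory-wise mean squared error in the spirit of Eq. (15), not the authors' code; the function name `mse` and the toy data are illustrative:

```python
# Mean squared error of a predicted trajectory against the ground truth,
# averaged over all snapshots and all state-variable components.
def mse(pred, truth):
    n = sum(len(snapshot) for snapshot in truth)  # total number of scalar entries
    return sum((p - t) ** 2
               for zp, zt in zip(pred, truth)
               for p, t in zip(zp, zt)) / n

# Two-snapshot toy trajectory with two state variables per snapshot:
print(mse([[1.0, 2.0], [0.0, 0.0]], [[1.0, 4.0], [0.0, 0.0]]))  # 1.0
```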
Fig. 11 shows a box plot of the data error (MSE_data) for the train and test sets in the four studied architectures. The results of the structure-preserving neural network outperform the other two approaches, even with noisy training data. The error of the unconstrained neural network is more than an order of magnitude larger than that of our approach, proving the importance of the degeneracy conditions in the GENERIC formulation. Last, the naive black-box approach shows the worst performance of the four networks, as no physical restriction is considered.
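The noisy training data discussed above are generated with the perturbation of Eq. (25). A minimal sketch, assuming per-variable database standard deviations; the names `add_noise`, `sigma_z` and `nu` are illustrative, not from the authors' code:

```python
import random

# Sketch of the noise model of Eq. (25): each state component is perturbed by
# zero-mean Gaussian noise whose spread is the per-variable standard deviation
# of the database, sigma_z, scaled by the noise level nu.
def add_noise(z_gt, sigma_z, nu=0.01):
    return [z + nu * s * random.gauss(0.0, 1.0) for z, s in zip(z_gt, sigma_z)]

z_gt = [0.5, 0.5, 1.0, 0.2, 0.05]     # one snapshot of (q_x, q_y, v, e, tau)
sigma_z = [0.3, 0.0, 0.6, 0.1, 0.02]  # per-variable std of the database
z_noisy = add_noise(z_gt, sigma_z, nu=0.01)  # nu = 1%, as in Section 5.2
```

Note that a variable with zero spread in the database (here the constant y coordinate) is left unperturbed, consistent with the zero error reported for it in Table 3.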
Fig. 10. Time evolution of the state variables at five test nodes of a Couette flow using a solver (Ground Truth, GT) and the proposed GENERIC integration scheme (Net). The dotted vertical line represents the simulation time T = 1 of the training dataset.
Table 3
Mean squared error of the data loss (MSE_data) and degeneracy loss (MSE_degen) for all the state variables of the Couette flow.

State variable   MSE_data (train)   MSE_data (test)   MSE_degen (train)   MSE_degen (test)
q [-], X         5.40·10^−6         6.29·10^−6        1.72·10^−7          1.96·10^−7
q [-], Y         0.00               0.00              0.00                0.00
v [-]            3.23·10^−5         4.75·10^−5        1.19·10^−6          1.48·10^−6
e [-]            7.85·10^−6         6.60·10^−6        7.06·10^−7          9.11·10^−7
τ [-]            2.36·10^−5         1.26·10^−5        1.07·10^−6          1.31·10^−6
Fig. 11. Box plots of the data integration mean squared error (MSE_data) of the Couette flow in both train and test cases.
With respect to our previous work [11], which employed a piecewise linear regression approach, these examples show similar levels of accuracy but a much greater level of robustness. For instance, this same example was included in the mentioned reference. However, in that case the problem had to be solved with the help of a reduced-order model with only six degrees of freedom, due to the computational burden of the approach. In our former approach, the GENERIC structure was identified by piecewise linear regression for each of the few global modes of the approximation. So to speak, in that case we learnt the characteristics of the flow. Here, on the contrary, the net is able to find an approximation for any velocity value at the 101 nodes of the mesh (say, fluid particles) without any difficulty: in this case, we are learning the behavior of fluid particles. It will be interesting, however, to study to what extent the use of variational autoencoders, as in Bertalan et al. [2], could help in solving more intricate models. Autoencoders help in determining the actual number of degrees of freedom needed to represent a given physical phenomenon.

Table 4
Computation training time of the proposed algorithm for the two reported examples in the noise-free networks.

Example           Epoch time      Total time
Double Pendulum   2.44 s/epoch    73.18 min
Couette Flow      1.22 s/epoch    30.53 min
6. Conclusions
In this work we have presented a new methodology to ensure thermodynamic consistency in the deep learning of physical phenomena. In contrast to existing methods, this methodology does not need to know in advance any information related to balance equations or the precise form of the PDE governing the phenomenon at hand. The method is constructed on top of the right thermodynamic principles, which ensure the fulfillment of energy conservation and entropy production. It is valid, therefore, for conservative as well as dissipative systems, thus overcoming previous approaches in the field.
When compared with our previous works in the field (see González et al. [11]), the present methodology proved to be more robust, allowing us to find approximations for systems with orders of magnitude more degrees of freedom. This new approach is also less computationally demanding. For the double pendulum case, the snapshot optimization of the GENERIC matrices proposed in [11] has a measured performance of 10 min per trajectory, which adds up to 400 minutes considering the 40 studied trajectories, whereas our new neural-network approach trains in only 73.18 minutes. The computational time of the other examples is shown in Table 4.
The reported results show good agreement between the network output and the synthetic ground-truth solution, even with moderately noisy data. We have also shown the importance of including the degeneracy conditions of the GENERIC formulation in the neural network constraints, as they ensure the thermodynamic consistency of the integrator. The structure-preserving neural network outperforms other naive black-box approaches, since the physical constraints act as an inductive bias, facilitating the learning process. However, the error can be further reduced using several techniques:
• Database: As a general method of increasing the precision of an Euler integration scheme, the time step Δt can be decreased so that the total number of snapshots is increased. However, the database will then be larger, slowing the training process. Similarly, the database can be enriched with a wider variety of cases, improving the predictive capabilities of the net.
• Integration scheme: A higher-order Runge-Kutta integration scheme could be introduced in Eq. (4) in order to obtain higher solution accuracy [47]. However, it requires several forward passes through the neural net for each time step, increasing the complexity of the integration scheme and of the training process. Additionally, GENERIC-based integration schemes have shown very good performance even for first-order approaches [41].
• Net architecture: To increase the computational power of the net, more and larger hidden layers N_h can be added. However, this could lead to a more over-fitted solution, which limits the prediction power and versatility of the net. It also increases the computational cost of both the training and the testing of the net.
• Training hyperparameters: The neural networks trained in this work could be optimized using several hyperparameter tuning methods, such as random search, Bayesian optimization or gradient-based optimization, to obtain a more efficient solution.

Several open questions remain as future work. A more exhaustive analysis can be performed to evaluate the influence of noisy data on the integrator evolution, in order to add robustness to the method and even predict wider simulation times using incremental learning [4,30].
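For reference, the first-order update that the integration-scheme point would replace with Runge-Kutta is a single forward-Euler step of the GENERIC equation, z_{t+1} = z_t + Δt (L ∇E + M ∇S). A minimal sketch with plain Python lists; the gradient arguments stand in for the network-predicted energy and entropy gradients, and all names are illustrative:

```python
# One explicit Euler step of the GENERIC evolution equation:
# z_{t+1} = z_t + dt * (L @ grad_E + M @ grad_S).
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def euler_step(z, L, M, grad_E, grad_S, dt):
    rhs = [le + ms for le, ms in zip(matvec(L, grad_E), matvec(M, grad_S))]
    return [zi + dt * ri for zi, ri in zip(z, rhs)]

# Toy 2-variable system: skew-symmetric L (reversible part),
# symmetric positive semi-definite M (dissipative part).
L = [[0.0, 1.0], [-1.0, 0.0]]
M = [[0.0, 0.0], [0.0, 1.0]]
z1 = euler_step([1.0, 0.0], L, M, grad_E=[0.0, 1.0], grad_S=[1.0, 0.0], dt=0.1)
print(z1)  # [1.1, 0.0]
```

The skew symmetry of L and the symmetry of M are exactly the algebraic properties that the degeneracy constraints of Eq. (10) protect during learning.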
CRediT authorship contribution statement

• Quercus Hernández: software, investigation, validation
• Alberto Badías: software, investigation, validation
• David González: data, investigation, validation
• Francisco Chinesta: methodology, validation

Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgements
This project has been partially funded by the ESI Group through the ESI Chair at ENSAM Arts et Metiers Institute of Technology, and through the project 2019-0060 "Simulated Reality" at the University of Zaragoza. The support of the Spanish Ministry of Economy and Competitiveness through grant number CICYT-DPI2017-85139-C2-1-R, and by the Regional Government of Aragon, grant T24_20R, and the European Social Fund, is also gratefully acknowledged.
References
[1] Jacobo Ayensa-Jiménez, Mohamed H. Doweidar, Jose A. Sanz-Herrera, Manuel Doblaré, A new reliability-based data-driven approach for noisy experimental data with physical constraints, Comput. Methods Appl. Mech. Eng. 328 (2018) 752–774.
[2] Tom Bertalan, Felix Dietrich, Igor Mezić, Ioannis G. Kevrekidis, On learning Hamiltonian systems from data, Chaos: Interdiscip. J. Nonlinear Sci. 29 (12) (Dec 2019) 121107.
[3] Steven L. Brunton, Joshua L. Proctor, J. Nathan Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proc. Natl. Acad. Sci. (2016).
[4] Euntae Choi, Kyungmi Lee, Kiyoung Choi, Autoencoder-based incremental class learning without retraining on old data, arXiv preprint, arXiv:1907.07872, 2019.
[5] George Cybenko, Approximation by superpositions of a sigmoidal function, Math. Control Signals Syst. 2 (4) (1989) 303–314.
[6] Weinan E, A proposal on machine learning via dynamical systems, Commun. Math. Stat. 5 (1) (Mar 2017) 1–11.
[7] Chady Ghnatios, Iciar Alfaro, David González, Francisco Chinesta, Elias Cueto, Data-driven GENERIC modeling of poroviscoelastic materials, Entropy 21 (12) (2019).
[8] Xavier Glorot, Antoine Bordes, Yoshua Bengio, Deep sparse rectifier neural networks, in: Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, AISTATS, 2011, pp. 315–323.
[9] D. González, F. Chinesta, E. Cueto, Consistent data-driven computational mechanics, AIP Conf. Proc. 1960 (1) (2018) 090005.
[10] David González, Francisco Chinesta, Elías Cueto, Learning corrections for hyperelastic models from data, Front. Mater. 6 (2019) 14.
[11] David González, Francisco Chinesta, Elías Cueto, Thermodynamically consistent data-driven computational mechanics, Contin. Mech. Thermodyn. 31 (1) (2019) 239–253.
[12] Alex Graves, Abdel-rahman Mohamed, Geoffrey Hinton, Speech recognition with deep recurrent neural networks, in: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, 2013, pp. 6645–6649.
[13] Miroslav Grmela, Generic guide to the multiscale dynamics and thermodynamics, Comput. Phys. Commun. 2 (3) (2018) 032001.
[14] Miroslav Grmela, Vaclav Klika, Michal Pavelka, Gradient and generic evolution towards reduced dynamics, 2019.
[15] Miroslav Grmela, Hans Christian Öttinger, Dynamics and thermodynamics of complex fluids. I. Development of a general formalism, Phys. Rev. E 56 (6) (1997) 6620.
[16] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Delving deep into rectifiers: surpassing human-level performance on ImageNet classification, in: Proceedings of the IEEE International Conference on Computer Vision, ICCV, 2015, pp. 1026–1034.
[17] Tony Hey, Stewart Tansley, Kristin Tolle, The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft Research, October 2009.
[18] Geoffrey Hinton, Li Deng, Dong Yu, George E. Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, et al., Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups, IEEE Signal Process. Mag. 29 (6) (2012) 82–97.
[19] Kurt Hornik, Maxwell Stinchcombe, Halbert White, et al., Multilayer feedforward networks are universal approximators, Neural Netw. 2 (5) (1989) 359–366.
[20] Rubén Ibanez, Emmanuelle Abisset-Chavanne, Jose Vicente Aguado, David Gonzalez, Elias Cueto, Francisco Chinesta, A manifold learning approach to data-driven computational elasticity and inelasticity, Arch. Comput. Methods Eng. 25 (1) (2018) 47–57.
[21] Rubén Ibáñez, Emmanuelle Abisset-Chavanne, David González, Jean-Louis Duval, Elias Cueto, Francisco Chinesta, Hybrid constitutive modeling: data-driven learning of corrections to plasticity models, Int. J. Mater. Forming 12 (4) (2019) 717–725.
[22] Ruben Ibañez, Domenico Borzacchiello, Jose Vicente Aguado, Emmanuelle Abisset-Chavanne, Elías Cueto, Pierre Ladevèze, Francisco Chinesta, Data-driven non-linear elasticity: constitutive manifold construction and problem discretization, Comput. Mech. 60 (5) (2017) 813–826.
[23] Pengzhan Jin, Aiqing Zhu, George Em Karniadakis, Yifa Tang, Symplectic networks: intrinsic structure-preserving networks for identifying Hamiltonian systems, arXiv preprint, arXiv:2001.03750, 2020.
[24] Diederik P. Kingma, Jimmy Ba, Adam: a method for stochastic optimization, arXiv preprint, arXiv:1412.6980, 2014.
[25] Trenton Kirchdoerfer, Michael Ortiz, Data-driven computational mechanics, Comput. Methods Appl. Mech. Eng. 304 (2016) 81–101.
[26] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet classification with deep convolutional neural networks, in: F. Pereira, C.J.C. Burges, L. Bottou, K.Q. Weinberger (Eds.), Advances in Neural Information Processing Systems, vol. 25, Curran Associates, Inc., 2012, pp. 1097–1105.
[27] Manuel Laso, Hans Christian Öttinger, Calculation of viscoelastic flow using molecular models: the CONNFFESSIT approach, J. Non-Newton. Fluid Mech. 47 (1993) 1–20.
[28] Claude Le Bris, Tony Lelievre, Multiscale modelling of complex fluids: a mathematical initiation, in: Multiscale Modeling and Simulation in Science, Springer, 2009, pp. 49–137.
[29] Jay Yoon Lee, Sanket Vaibhav Mehta, Michael Wick, Jean-Baptiste Tristan, Jaime Carbonell, Gradient-based inference for networks with output constraints, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2019.
[30] Zhizhong Li, Derek Hoiem, Learning without forgetting, in: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
[31] Pablo Márquez-Neila, Mathieu Salzmann, Pascal Fua, Imposing hard constraints on deep networks: promises and limitations, arXiv preprint, arXiv:1706.02025, 2017.
[32] Jim Magiera, Deep Ray, Jan S. Hesthaven, Christian Rohde, Constraint-aware neural networks for Riemann problems, J. Comput. Phys. 409 (2020) 109345.
[33] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, Bin Yu, Interpretable machine learning: definitions, methods, and applications, arXiv preprint, arXiv:1901.04592, 2019.
[34] Yatin Nandwani, Abhishek Pathak, Parag Singla, et al., A primal dual formulation for deep learning with constraints, in: Advances in Neural Information Processing Systems, 2019.
[35] Hans Christian Öttinger, Beyond Equilibrium Thermodynamics, John Wiley & Sons, 2005.
[36] Hans Christian Öttinger, Miroslav Grmela, Dynamics and thermodynamics of complex fluids. II. Illustrations of a general formalism, Phys. Rev. E 56 (6) (1997) 6633.
[37] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, Adam Lerer, Automatic differentiation in PyTorch, in: Autodiff Workshop: The Future of Gradient-Based Machine Learning Software and Techniques, 2017.
[38] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Kopf, Edward Yang, Zachary DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, Soumith Chintala, PyTorch: an imperative style, high-performance deep learning library, in: H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems, vol. 32, Curran Associates, Inc., 2019, pp. 8026–8037.
[39] R.G. Owens, T.N. Phillips, Computational Rheology, Imperial College Press, 2002.
[40] Maziar Raissi, Paris Perdikaris, George E. Karniadakis, Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, J. Comput. Phys. 378 (2019) 686–707.
[41] Ignacio Romero, Thermodynamically consistent time-stepping algorithms for non-linear thermomechanical systems, Int. J. Numer. Methods Eng. 79 (6) (2009) 706–732.
[42] Jonathan Romero, Jonathan P. Olson, Alan Aspuru-Guzik, Quantum autoencoders for efficient compression of quantum data, Quantum Sci. Technol. 2 (4) (2017) 045001.
[43] Sebastian Ruder, An overview of gradient descent optimization algorithms, arXiv preprint, arXiv:1609.04747, 2016.
[44] Jürgen Schmidhuber, Deep learning in neural networks: an overview, Neural Netw. 61 (2015) 85–117.
[45] Hoo-Chang Shin, Holger R. Roth, Mingchen Gao, Le Lu, Ziyue Xu, Isabella Nogues, Jianhua Yao, Daniel Mollura, Ronald M. Summers, Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning, IEEE Trans. Med. Imaging 35 (5) (2016) 1285–1298.
[46] Lucas Theis, Wenzhe Shi, Andrew Cunningham, Ferenc Huszár, Lossy image compression with compressive autoencoders, arXiv preprint, arXiv:1703.00395, 2017.
[47] Yi-Jen Wang, Chin-Teng Lin, Runge-Kutta neural network for identification of dynamical systems in high accuracy, IEEE Trans. Neural Netw. 9 (2) (1998) 294–307.
[48] Dongkun Zhang, Ling Guo, George Em Karniadakis, Learning in modal space: solving time-dependent stochastic PDEs using physics-informed neural networks, SIAM J. Sci. Comput. 42 (2) (2020) A639–A665.