A knowledge‐enhanced deep reinforcement learning‐based shape optimizer for aerodynamic mitigation of wind‐sensitive structures

(1)

3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 DOI: 10.1111/mice.12655 O R I G I N A L A R T I C L E

A knowledge-enhanced deep reinforcement learning-based

shape optimizer for aerodynamic mitigation of

wind-sensitive structures

Shaopeng

Li

1

_Reda

_Snaiki

1,2

_Teng

_Wu

1

1_{Department of Civil, Structural and} Environmental Engineering, University at Buffalo State University of New York, Buffalo, New York, USA

2_{Department of Applied Sciences,} Université du Québec à Chicoutimi, Chicoutimi, Québec, Canada

Correspondence

Teng Wu, Department of Civil, Structural and Environmental Engineering, Univer-sity at Buffalo, Buffalo, NY 14260, USA. Email:[email protected]

Abstract

Structural shape optimization plays an important role in the design of wind-Q1

sensitive structures. The numerical evaluation of aerodynamic performance for each shape search and update during the optimization process typically involves significant computational costs. Accordingly, an effective shape optimization algorithm is needed. In this study, the reinforcement learning (RL) method with deep neural network (DNN)-based policy is utilized for the first time as a shape optimization scheme for aerodynamic mitigation of wind-sensitive structures. In addition, “tacit” domain knowledge is leveraged to enhance the training effi-ciency. Both the specific direct-domain knowledge and general cross-domain knowledge are incorporated into the deep RL-based aerodynamic shape opti-mizer via the transfer-learning and meta-learning techniques, respectively, to reduce the required datasets for learning an effective RL policy. Numerical exam-ples for aerodynamic shape optimization of a tall building are used to demon-strate that the proposed knowledge-enhanced deep RL-based shape optimizer outperforms both gradient-based and gradient-free optimization algorithms. Q2

1 INTRODUCTION

The rapid increase in height of buildings and span of bridges makes these slender structures extremely sensi-tive to winds. In addition to optimizing structural proper-ties (e.g., Kociecki & Adeli,2014; Park & Adeli,1997) and utilizing structural control techniques (e.g., Kim & Adeli, 2005; Wang & Adeli,2015), various aerodynamic mitiga-tion strategies by modifying external shapes are employed in the design process. The selection of an appropriate aero-dynamic shape is traditionally based on several candi-dates resulting from a designer’s engineering experience and judgment. Usually the iterative procedure to update these baseline geometries is not triggered unless a safety or serviceability issue of the structure under aerodynamic © 2020 Computer-Aided Civil and Infrastructure Engineering

loads is identified. In case an aerodynamic improvement is required, a limited number of aerodynamic mitigation options are available, for example, corner modification or helical twisting for high-rise buildings (Davenport, 1971; Tanaka, Tamura, Ohtake, Nakai, & Kim,2012) and edge fairing or central slot adding for long-span bridges (Nagao, Utsunomiya, Oryu, & Manabe, 1993; Yang, Wu, Ge, & Kareem, 2015). Although this cut-and-try design, essen-tially based on intuition, is routinely used by the engi-neering community as a viable problem-solving approach, a mathematically optimal (or near optimal) aerodynamic configuration and hence a cost-effective shape design is not necessarily acquired. However, the rapid increase of structural height/span (with innovative cross sections) and recent advances of performance-based wind engineering

(2)

3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55

methodology (with various engineering objectives) have placed a demand for more cost-effective aerodynamic designs. To this end, there is a need for an automated pro-cess to facilitate the comprehensive search of shape design space that is rigorously guided by optimization algorithms and the efficient evaluation of aerodynamic performance with each updated structural geometry (Ding & Kareem, 2018). To achieve this goal, the mathematical programming techniques are applied to the aerodynamic shape optimiza-tion process (Topping,1983). The problem formulation of the aerodynamic shape optimization generally consists of assessment of aerodynamic performance, parameteriza-tion of external shape, and specificaparameteriza-tion of a set of geo-metric requirements, respectively, represented by objective functions, design variables, and design constrains (Skinner & Zare-Behtash,2018).

A wind tunnel experiment is considered as one of the most reliable ways to assess structural aerodynamic per-formance. However, a systematic testing procedure involv-ing automated fabrication of structural models, acqui-sition, and processing of input–output data, and con-trol of fan operations is not currently available. Usu-ally, the computational fluid dynamics (CFD) simula-tion, along with its mesh update schemes, is utilized in each search step during the shape optimization process for aerodynamic mitigation of wind-sensitive structures (Elshaer & Bitsuamlak,2018). Due to the extreme complex-ities involved in the bluff-body aerodynamics and wind– structure interactions at large Reynolds numbers, high computational cost is needed for a reliable CFD simula-tion. Although surrogate models can be used to effectively alleviate the computational burden, further investigation may be needed to enhance their performance in terms of interpolation/extrapolation accuracy in the simulation (e.g., using adaptive surrogate models) and this considera-tion is outside the scope of the current study (Peherstorfer, Willcox, & Gunzburger,2018; Yazdi & Neyshabouri,2014). On the other hand, there have been significant efforts on the development of effective optimization algorithms that can achieve the globally optimal solution with a rela-tively small number of iterations (Skinner & Zare-Behtash, 2018).

Among numerous mathematical formulations of var-ious aerodynamic shape search and update rules, the gradient-based optimization algorithms (e.g., basic ent decent, gradient decent with momentum and gradi-ent descgradi-ent with adaptive step size) are widely employed since they are easy to implement and sample efficient. However, the gradient-based algorithms often get trapped in local optima that heavily depend on the start points. The accumulated engineering experience and intuition may be helpful in appropriate selection of initial config-urations (baseline designs); however, they provide little

contribution to intelligently guiding the search of globally optimal solution since effective communication between the human-readable knowledge through cognition pro-cess (e.g., thought, experience, and sense) and machine-readable information for computational algorithms has not been well established yet. To increase the chance of acquiring the global optima from the whole search space, a number of gradient-free optimization algorithms (e.g., genetic algorithm, particle swarm optimization [PSO], and simulated annealing) have been developed at the expense of sample efficiency. It is noted that both gradient-based and gradient-free optimization algorithms are essentially hand-design approaches, where the determination of their parameters for a specific application is usually based on a costly, manual trial-and-error process (Andrychowicz et al., 2016). To further enhance the automation level based on mathematical programming techniques and hence save computational cost, an auto-learned optimiza-tion approach based on increasingly popular deep learn-ing techniques would probably be a better choice. With newer and more powerful learning algorithms, deep learn-ing has been utilized in many engineerlearn-ing fields (e.g., Benito-Picazo, Domínguez, Palomo, & López-Rubio,2020; Simões, Lau, & Reis,2020; Sørensen, Nielsen, & Karstoft, 2020). Despite the existing applications in civil engineering (e.g., Liang,2019; Rafiei & Adeli,2017a; Rafiei, Khushefati, Demirboga, & Adeli,2017), its great potential to improve the optimization scheme for aerodynamic mitigation has not been well explored yet. To this end, the reinforcement learning (RL) methodology will be utilized here for the first time as a data-driven shape optimization scheme for aero-dynamic mitigation of wind-sensitive structures. In RL set-ting, the effective policy (i.e., shape search and update rule) with a goal to efficiently achieve the globally opti-mal solution (i.e., maximizing aerodynamic mitigation) can be learnt by an agent (i.e., structure) through inter-acting with its environment (i.e., wind) based on an auto-mated trial-and-error process (Sutton & Barto, 2018). In addition, the RL policy will be represented by a deep neural network (DNN). The obtained deep RL-based optimizer, by leveraging recent advances in deep learning, shows great promise in structural shape optimization for aerodynamic mitigation that is characterized as a typical nonlinear, high-dimensional, and nonconvex problem (Mnih et al., 2015).

Learning a DNN-based policy usually involves a large amount of data. As mentioned above, the generation of high-quality input–output data of structural aerodynamics (i.e., aerodynamic performance with each set of updated design variables) from CFD simulations is very expen-sive. To reduce the required training datasets, the prior domain knowledge can be leveraged to enhance the reg-ularization mechanism during the training process of a

(3)

3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55

neural network (Psichogios & Ungar,1992). For example, deep learning enhanced by “explicit” domain knowledge in terms of physics-based and/or semiempirical equations has been recently utilized for effective simulations of tropi-cal cyclone winds (Snaiki & Wu,2019) and nonlinear struc-tural dynamics (Wang & Wu,2020) with a small training dataset. It is noted that both the tropical cyclone winds and nonlinear structural dynamics are actually governed by the Newton’s second law, and the corresponding “explicit” domain knowledge is represented by the Navier–Stokes equations and equations of motion, respectively. On the other hand, the no-free-lunch theorem for search and opti-mization indicates that a universal law and associated gov-erning equations for the optimization system may not exist (Wolpert & Macready, 1997). Hence, it is challenging to incorporate the equation-based “explicit” domain knowl-edge into the deep RL-based shape optimizer. Accord-ingly, the equation-free “tacit” domain knowledge will be leveraged here to greatly enhance the training efficiency. In this study, both the specific direct-domain knowledge and general cross-domain knowledge extracted from the low-cost source tasks will be integrated into the deep RL-based shape optimizer for its applications to high-cost tar-get tasks (Min, Sagarna, Gupta, Ong, & Goh, 2017). The former can be efficiently obtained via the transfer-learning technique based on low-fidelity simulations of the current optimization problem (e.g., Pan & Yang,2009; Yan, Zhu, Kuang, & Wang,2019), and the latter is usually acquired via the meta-learning technique based on a group of inexpen-sive tasks generated from a common probability distribu-tion (e.g., multivariate Gaussian distribudistribu-tion) that reflects important high-level structures of the current optimiza-tion problem (e.g., Finn, Abbeel, & Levine,2017; Zhou, Li, & Zare,2017). It is noted that the “tacit” domain knowl-edge usually presents a heuristic nature and its inappropri-ate incorporation into the target task may result in a neg-ative impact (Rosenstein, Marx, Kaelbling, & Dietterich, 2005). Accordingly, suitable relatedness (or similarity) and transferability measures between source and target tasks should be established (Eaton & Lane,2008). In this study, the low-cost source tasks for extracting both direct- and cross-domain knowledge are carefully selected to avoid the negative knowledge transfer. Numerical examples of a simple case study (i.e., shape optimization of a high-rise building cross section to minimize its drag) are carried out by the proposed scheme as well as gradient-based and gradient-free algorithms. The comparison results demon-strate that an improved performance of the developed knowledge-enhanced deep RL-based shape optimizer for aerodynamic mitigation of wind-sensitive structures is achieved.

F I G U R E 1 Typical process of aerodynamic shape optimization

2 AERODYNAMIC SHAPE

OPTIMIZATION SCHEMES

A general optimization problem could be simply formu-lated as

min

𝒙 𝐺 (𝒙) (1a)

where 𝒙 = [𝑥₁, 𝑥₂, … , 𝑥_𝑛] is a vector of n design variables and 𝐺(𝒙) is the objective function. The optimization pro-cess is usually subjected to R_cequality and/or S_cinequality constraints:

𝐶𝑟 (𝒙) = 0 𝑟 = 1, 2, … , 𝑅𝑐, (1b) 𝐷𝑠(𝒙)≤ 0 𝑠 = 1, 2, … , 𝑆𝑐. (1c) In the setting of aerodynamic shape optimization of wind-sensitive structures, the design variables 𝒙 charac-terize the external shape; the objective function 𝐺(𝒙) rep-resents the aerodynamic performance of wind-sensitive structures (e.g., the drag coefficient for tall buildings or the critical flutter wind speed for long-span bridges); the equality and inequality constraints are usually based on practical considerations (e.g., geometric symmetry and structural dimension). A typical aerodynamic shape opti-mization process is shown in Figure1. The optimization process utilizes an optimizer to propose a new design (i.e., to take “action”) based on aerodynamic performance of current external shape (i.e., “state”). As mentioned in the previous section, the evaluations of aerodynamic perfor-mance for bluff-body structures using CFD are very expen-sive due to the nature of turbulent flow field and intense flow separation. Hence, it is highly desirable to reduce the amount of CFD-based performance evaluations for the aerodynamic shape optimization problems. To this end, a sample-efficient optimizer that can achieve the glob-ally optimal solution with a relatively small number of

(4)

3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55

iterations is needed. In this section, the conventional opti-mization schemes used for comparison purposes in this study are first briefly reviewed for the sake of complete-ness. Then, the newly developed deep RL-based optimiza-tion schemes are introduced in the context of aerodynamic shape optimization.

2.1 Conventional optimization schemes

The conventional optimization schemes for aerodynamic shape optimization could be generally classified into gradient-based and gradient-free methods. Based on the argument that the objective function decreases fastest in the direction of negative gradient, the basic gradient descent takes the increment of design variables propor-tional to the negative gradient of objective function at current design (Skinner & Zare-Behtash,2018). The gra-dients here are computed using a simple finite-difference method. It is noted that the computational cost for opti-mization may be reduced by using other approaches to cal-culate gradients (e.g., adjoint method). Since the gradient-based methods follow a deterministic rule to calculate the next design (i.e., moving greedily in direction of steepest descent based on gradient), the optimization process is likely to be stuck in local optima (Skinner & Zare-Behtash, 2018).

To address the issue of being trapped in local optima, the gradient-free methods could be introduced to better search for global optima. Instead of using a deterministic search and update rule as in the gradient-based methods, the gradient-free methods (e.g., genetic algorithm, PSO, and simulated annealing) usually follow rules that allow for random explorations in the design space. Furthermore, they can benefit from working with a population of candi-date designs to search the design space with shared infor-mation among the population (Skinner & Zare-Behtash, 2018). Despite the high chance of finding global optima, the gradient-free methods usually need a significant num-ber of samples.

2.2 Deep reinforcement learning-based

optimization schemes

Considering the inherent limitations resulting from the hand-designed search and update rules for conventional gradient-based and gradient-free methods, it is desir-able to design an auto-learned rule for optimization problems:

𝒙𝑡+1= 𝜋 [𝒙0, 𝒙1, … , 𝒙𝑡, 𝐺 (𝒙0) , 𝐺 (𝒙1) , … , 𝐺 (𝒙𝑡)] , (2)

where function 𝜋 represents a general form of effective optimization scheme in terms of reaching global optima with a relatively small number of iterations. It can be effectively obtained based on the RL methodology without human intervention (Silver et al.,2017).

2.2.1 RL in aerodynamic shape

optimization

The mathematical model for RL in a fully observable envi-ronment is usually based on the Markov decision process characterized by a tuple [S, A, P(s_t, s_t+1, at), R(st, st+1, at)], where S and A denote the set of state s and action a, respec-tively; P(s_t, s_t+1, at) is the state-transition probability from state s_tto state s_t₊₁under action a_t(Sutton & Barto,2018). After moving from s_t to s_t+1 under action at, an immedi-ate reward r_t is received from the environment based on the reward function R(s_t, s_t+1, at). It is noted that P(st,

s_t+1, at) and R(st, st+1, at) are the properties of the envi-ronment. In RL, an agent aims to learn a policy 𝜋 that maps from state to action [i.e., 𝑎 = 𝜋(𝑠)] such that the expected cumulative reward E(∑∞_{𝑘 = 0}𝛾𝑘𝑟_𝑡+𝑘) (also known as return 𝑅_return= ∑∞_𝑘=0𝛾𝑘𝑟_𝑡+𝑘) is maximized, where the discount factor 𝛾 (usually 0≤ 𝛾 ≤ 1) determines the rela-tive importance of future reward compared with immedi-ate reward. The policy maximizing the expected cumula-tive reward is known as the optimal policy 𝜋∗. Unlike a closely related field of dynamic programming with explic-itly given environment dynamics (i.e., the state-transition probability and reward function), the environment in RL is usually unknown and the optimal policy 𝜋∗ is learned based only on the agent’s interaction experiences with the observable environment.

A schematic description of RL to learn an effective search and update rule for aerodynamic shape optimiza-tion 𝜋 is shown in Figure 2. The term “effective” policy is used here instead of “optimal” policy since it is usu-ally difficult to exhaust the policy space to find the best one in practice. It is noted that the strict optimality of the policy is generally not the primary concern considering effectively finding the optimal aerodynamic shape is the focus. Although only 𝒙𝑡and 𝐺(𝒙𝑡) are shown in Figure2 for the sake of simplicity, the state could include all previ-ously evaluated designs 𝒙₀, 𝒙₁, … , 𝒙_𝑡and their performance 𝐺(𝒙0), 𝐺(𝒙1), … , 𝐺(𝒙𝑡) in wind environment as indicated in Equation (2). The action resulting from the policy deter-mines the new design 𝒙𝑡+1. The RL agent (i.e., the struc-ture) interacts with the wind environment to obtain an effective policy 𝜋 such that the optimal aerodynamic shape could be found within limited number of steps, which is learned by maximizing the user-defined rewards based on

(5)

3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55

F I G U R E 2 Schematic of RL in aerodynamic shape optimiza-tion

selected RL algorithm. Among numerous RL algorithms, the value-based and policy-based methods are two most popularly used ones in optimization problems.

2.2.2 Value-based methods

In addition to the automated trial-and-error search through interacting with the environment, another impor-tant feature of RL is the use of delayed reward. In such a case, actions may affect not only the immediate reward but also the next situation, and through that, all subsequent rewards, and accordingly a state-value function 𝑣𝜋(𝑠) is defined as the expected cumulative future reward starting from the state s and following policy 𝜋 afterwards (Sut-ton & Barto,2018). The optimal policy 𝜋_∗corresponds to the optimal state-value function 𝑣𝜋∗(𝑠) that is larger than those following all other policies for all states. However, it is impossible to extract 𝜋∗based only on 𝑣𝜋∗(𝑠) due to the lack of action knowledge. To address this issue, the action-value function 𝑞_𝜋(𝑠, 𝑎) (also known as the state-action value) is introduced as the expected cumulative future reward starting from the state s, taking action a and following policy 𝜋 afterwards (Sutton & Barto,2018). The optimal policy could be identified by searching a greedy action that leads to highest value, that is, 𝑎 = argmax

𝑎 [𝑞𝜋∗(𝑠, 𝑎)] (where 𝑞𝜋∗(𝑠, 𝑎) is optimal action-value function). Most of value-based methods to obtain 𝜋_∗(e.g., Q learning) are based on the Bellman equation, which recursively relates the action value of current state to sum of the immediate reward and the discounted action value of next state (Sutton & Barto,2018):

𝑄𝑘+1 (𝑠𝑡, 𝑎𝑡) = 𝑄𝑘 (𝑠𝑡, 𝑎𝑡) +𝜂𝑄 [ 𝑟𝑡+ 𝛾 max_𝑎 𝑄𝑘(𝑠𝑡+1, 𝑎) − 𝑄𝑘(𝑠𝑡, 𝑎𝑡) ] , (3)

where capital Q indicates an estimate of lower-case q; sub-script “k” represents the iteration number; 𝜂_𝑄is the learn-ing rate.

For a high-dimensional continuous state space, the tab-ular representation of Q functions (e.g., a lookup table) in conventional Q learning could be replaced by function approximators (e.g., a DNN). In a deep Q learning, the input of deep Q network is the high-dimensional contin-uous state and the output is the Q value for each discrete action (Mnih et al.,2015). It is noted that the combination of Q learning with the DNN-based function approxima-tions often suffers from divergence due mainly to two rea-sons, namely strong correlation between the consecutive samples (𝑠𝑡, 𝑎𝑡, 𝑟𝑡, 𝑠𝑡+1,𝑎𝑡+1, 𝑟𝑡+1, …) and nonstationarity of the target [𝑟𝑡+ 𝛾 max_𝑎 𝑄𝑘(𝑠𝑡+1, 𝑎)] in Equation (3) (Mnih et al.,2015). To address the divergence issue from corre-lation, a replay buffer is usually employed to store the past experiences and the randomly sampled experiences from the replay buffer are utilized to update the deep Q network. The use of replay buffer not only removes the strong corre-lation of samples but also improves sample efficiency with repetitively accessed learning experiences. To overcome the divergence issue from nonstationarity, an additional DNN called a target Q network with weights slowly track-ing that of the original Q network is introduced to make the target Q value changing slowly and hence improve stabil-ity (Mnih et al.,2015). Although the deep Q learning shows very promising results in solving complicated tasks with high-dimensional continuous state space due to the pow-erful function approximation ability of DNN, it can only handle the low-dimensional discrete action space due to the curse of dimensionality (Lillicrap et al.,2016). To con-sider general application of RL to aerodynamic shape opti-mization, the policy in Figure 2needs to be represented by a DNN mapping from continuous states to continuous actions.

2.2.3 Policy-based methods

Policy-based methods are popularly employed in RL prob-lems with a continuous action space. In contrast to value-based methods, the policy-value-based algorithms directly learn a parameterized policy 𝑎 = 𝜋(𝑠|𝜽) (where 𝜽 is the pol-icy parameters) mapping states to high-dimensional con-tinuous actions without using the action-value function as the intermediary to compute the policy. In the policy-based methods, the RL agent directly updates the policy parame-ter 𝜽 (e.g., the weights of DNN) by gradient ascent (Sutton & Barto,2018):

(6)

3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55

F I G U R E 3 Incorporation of “tacit” knowledge into deep RL-based aerodynamic shape optimizer

where 𝜂𝑃𝐵is the learning rate; 𝐽(𝜽𝑘) is a performance index of current policy 𝜋(𝑠|𝜽_𝑘) in terms of expected cumulative reward and could be estimated with the cumulative reward (i.e., return 𝑅_return) of sampled sequences using current policy; ∇𝜽𝐽(𝜽𝑘) is the gradient of performance index with respect to policy parameters 𝜽.

Although it is straightforward and effective to adjust the policy parameters in the direction of policy gradient, 𝑅returnand hence obtained gradients usually present high variance in a stochastic environment and learning diffi-culty may occur. To reduce the variance of policy gradi-ents, the action value 𝑄(𝑠, 𝑎) following current policy is employed to estimate 𝐽(𝜽𝑘) (Sutton & Barto, 2018). The obtained “actor-critic” scheme actually inherits essential features from both policy-based and value-based methods, where the “actor” proposes the action 𝑎 = 𝜋(𝑠|𝜽) as in policy-based methods and the “critic” evaluates the qual-ity of the action (i.e., action value 𝑄[𝑠, 𝑎 = 𝜋(𝑠|𝜽)]) as in value-based methods. In the application of deep Q func-tion to the “actor-critic” scheme, the previously menfunc-tioned numerical tricks of replay buffer and target network for deep Q learning should also be adopted to improve the learning performance.

3 KNOWLEDGE-ENHANCED DEEP

RL-BASED SHAPE OPTIMIZER

Deep RL algorithms usually start from a random policy (i.e., DNN with randomly initialized weights), and hence a large amount of interactions with the high-cost CFD envi-ronment may be necessary for convergence to an effec-tive DNN-based policy. To further enhance the search effi-ciency, the domain knowledge will be incorporated into

the current learning problem. Unlike recent attempts of using the “explicit” domain knowledge in terms of physics-based and/or semiempirical equations to enhance train-ing efficiency of conventional deep learntrain-ing (Snaiki & Wu, 2019; Wang & Wu,2020), the domain knowledge utilized to efficiently obtain aerodynamic shape optimization pol-icy is equation-free “tacit” domain knowledge due to the nonexistence of a universal law and associated governing equations for the optimization system according to the no-free-lunch theorem for search and optimization (Wolpert & Macready, 1997). As shown in Figure 3, the “tacit” knowledge extracted from the low-cost environment in this study includes both specific direct-domain knowl-edge and general cross-domain knowlknowl-edge. The former is obtained via the transfer-learning technique based on low-fidelity simulations of the current optimization problem (e.g., Pan & Yang,2009; Yan et al.,2019), and the latter is acquired via the meta-learning technique based on a group of inexpensive tasks generated from a common probabil-ity distribution (e.g., multivariate Gaussian distribution) that reflects important high-level structures of the current optimization problem (e.g., Finn et al., 2017; Zhou et al., 2017).

3.1 Incorporation of specific

direct-domain knowledge via transfer

learning

To incorporate the specific direct-domain knowledge in the deep RL-based shape optimizer, the widely used trans-fer learning in deep learning community, which stores knowledge gained from solving one problem (source task) and applies it to a different but related problem (target

(7)

3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55

F I G U R E 4 Incorporation of specific direct-domain knowledge into deep RL-based shape optimizer via transfer learning

task), is utilized here (Pan & Yang,2009). Among various transfer learning schemes (e.g., instance-based, feature-based, parameter-feature-based, and relational-based ones), this study utilizes the parameter-based approach transferring the knowledge in terms of the weights of DNN-based policy from the source task (e.g., in a low-cost environ-ment based on Reynolds-averaged Navier–Stokes [RANS] scheme) to the target task (e.g., in a high-cost environment based on large-eddy simulation [(LES] scheme). As shown in Figure4, the policy is represented by a fully connected feedforward DNN (i.e., a multilayer perceptron [MLP]). The input (state) of the MLP is the current design 𝒙𝑡and the output (action) is the design variation Δ𝒙𝑡for the cal-culation of the next design 𝒙𝑡+1= 𝒙𝑡 + Δ𝒙𝑡:

Δ 𝒙𝑡= 𝜋MLP (𝒙𝑡𝜽𝜋MLP) , (5) where the 𝜽𝜋MLP_{is the weights of the policy network 𝜋}_MLP_. Since there is a unique objective function in transfer learn-ing, an effective search direction at current design 𝒙𝑡is suf-ficient for the purpose of optimization. Hence, only 𝒙𝑡is utilized here as state and input to the MLP-based policy. In addition, the 𝐺(𝒙_𝑡) can be fully determined by 𝒙_𝑡 and hence not included as state. It is noted that the objective function in the low-cost environment is denoted as 𝐺_𝑙(𝒙) to differentiate from the objective function 𝐺(𝒙) in high-cost environment. The reward 𝑟𝑡received at each step for min-imizing the objective function is chosen to be the aerody-namic performance improvement of proposed new design 𝐺𝑙(𝒙𝑡+1) compared to the baseline design 𝐺𝑙(𝒙0), that is, 𝑟𝑡 = 𝐺𝑙(𝒙0) − 𝐺𝑙(𝒙𝑡+1). In the case where the optimization purpose is to maximize the objective function, a negative sign needs to be added to the reward.

The deep deterministic policy gradient (DDPG) algo-rithm, which has been successfully applied in numerous continuous control tasks with high sample-efficiency (Lill-icrap et al.,2016), is utilized to obtain the effective policy for aerodynamic shape optimization. In addition to the policy network 𝜋MLP(𝒙𝑡|𝜽𝜋MLP), there are three additional DNN in DDPG, namely Q network [𝑄MLP(𝒙𝑡, Δ𝒙𝑡|𝜽𝑄MLP)], target policy network [𝜋MLP′(𝒙𝑡|𝜽𝜋MLP′)] and target Q network [𝑄MLP′(𝒙𝑡, Δ𝒙𝑡|𝜽𝑄MLP

′

)], with 𝜽𝑄MLP_{, 𝜽}𝜋MLP′_{, and 𝜽}𝑄MLP′ representing their corresponding weights. The policy 𝜋MLP(𝒙𝑡|𝜽𝜋MLP) is learned in an “actor-critic” mode. The “actor,” represented by policy network 𝜋_MLP(𝒙_𝑡|𝜽𝜋MLP_), follows a deterministic policy to output the action Δ𝒙𝑡 based on the observed state 𝒙𝑡. The “critic,” represented by the Q network 𝑄MLP(𝒙𝑡, Δ𝒙𝑡|𝜽𝑄MLP), evaluates the actor’s action value to provide valuable update informa-tion by encouraging the acinforma-tions leading to large future rewards (large Q values) and penalizing the actions lead-ing to small future rewards (small Q values). The delayed copies of the policy network 𝜋MLP(𝒙𝑡|𝜽𝜋MLP) and Q net-work 𝑄MLP(𝒙𝑡, Δ𝒙𝑡|𝜽𝑄MLP) are used to compute the tar-get policy network 𝜋MLP′(𝒙𝑡|𝜽𝜋MLP′) and target Q net-work 𝑄_MLP′(𝒙_𝑡, Δ𝒙_𝑡|𝜽𝑄MLP′). The learning details based on DDPG is presented in Algorithm 1. The trained weights of policy networks (and other three networks) are utilized as the initial weights for training progress in the high-cost environment using the same DDPG algorithm.

3.2 Incorporation of general

cross-domain knowledge via meta learning

In the case that the specific direct-domain knowledge extracted from the low-cost environment (e.g., low-fidelity

(8)

3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55

A l g o r i t h m 1 Training MLP-based optimizer using DDPG

simulations) is not available, the meta-learning tech-nique may be utilized to incorporate general cross-domain knowledge extracted from a set of prescribed functions that share high-level similarities with the objective func-tion of the current aerodynamic shape optimizafunc-tion prob-lem into the deep RL-based aerodynamic shape optimizer. Meta learning has recently drawn a great attention due to the fact that knowledge obtained from learning to mas-ter a set of tasks can be generalized to masmas-ter new tasks (Finn et al.,2017), and it is used here to leverage the gen-eralization ability of the trained deep RL-based optimizer based on a set of prescribed low-cost functions for optimiz-ing the high-cost objective functions for an unseen aero-dynamic shape optimization problem. Since the effective optimization policy in meta learning is obtained from and will be utilized for optimizing a large number of func-tions, the current design 𝒙𝑡 and corresponding objective function 𝐺(𝒙𝑡) may not be able to provide enough informa-tion to take next acinforma-tion. In this study, all previously eval-uated designs and associated aerodynamic performance [i.e., 𝒙0, 𝐺(𝒙0), 𝒙1, 𝐺(𝒙1), … , 𝒙𝑡, 𝐺(𝒙𝑡)] are considered as state input to the policy network. Accordingly, the recur-rent neural networks (RNN) is used to parameterize the policy due to its convenience to pass information across time steps (Li, Wu, & Liu,2020; Wu & Kareem,2011). In addition, RNN utilizes shared weights for different time steps and hence greatly reduces the number of weights

to be learned in an optimization problem. To solve the issue of vanishing and exploding gradients in backpropaga-tion for a plain-vanilla RNN, the long-short time memory (LSTM) cell is utilized to replace the conventional neurons in the hidden layers (Chen et al.,2017; Zhou et al.,2017). As shown in Figure5, the proposed next design of aerody-namic shape 𝒙𝑡+1(and associated vector 𝒉𝑡+1representing cell output) is calculated through the RNN with LSTM cell (denoted by 𝜋𝑅𝑁𝑁with weights of 𝜽RNN):

[𝒙𝑡+1, 𝒉𝑡+1] = 𝜋RNN [𝒙𝑡, 𝐺(𝒙𝑡) , 𝒉𝑡|𝜽RNN, (6) where cell output vector 𝒉_𝑡is composed of LSTM cell state and hidden state with the memory contributions from all previously evaluated designs and their associated objective functions [𝒙0, 𝐺(𝒙0), 𝒙1, 𝐺(𝒙1), … , 𝒙𝑡−1, 𝐺(𝒙𝑡−1)]. The effective optimization policy 𝜋RNN is achieved by maxi-mizing the cumulative reward (return) of the 𝑛step opti-mization steps in the low-cost environment, i.e., 𝑅return=

− ∑𝑛step

𝑡 = 1𝐺𝑙(𝒙𝑡). It should be noted that functions used to construct the low-cost environment in meta learning is usually synthesized based on a common probability distri-bution (e.g., multivariate Gaussian distridistri-bution) and easy to learn, and hence the basic policy-gradient algorithm performs well. The learning details based on policy gra-dient used in this study is described in Algorithm2. The trained RNN-based optimizer in the low-cost environment

(9)

3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55

F I G U R E 5 Incorporation of general cross-domain knowledge into deep RL-based shape optimizer via meta learning

A l g o r i t h m 2 Training RNN-based optimizer using policy gradient

is directly utilized in current aerodynamic shape optimiza-tion process without further learning to highlight its gen-eralization ability.

4 NUMERICAL EXAMPLES

Due to the extremely high computational demand for three-dimensional (3D) CFD simulations (e.g., Elshaer & Bitsuamlak, 2018; Kim & Yhim, 2014), a simple case study of aerodynamic shape optimization of the cross sec-tion of a typical high-rise building is used to demonstrate the improved performance of the proposed knowledge-enhanced deep RL-based aerodynamic shape optimizer

compared to conventional based and gradient-free schemes. Since state of the practice in analyzing wind effects on 3D realistic slender structures is essentially based on the two-dimensional (2D) cross-section aerody-namic properties according to strip theory (e.g., Daven-port,1962; Hao & Wu,2018; Hou & Sarkar,2018; Scanlan, 1978), the 2D numerical examples can be considered as the fundamental building blocks for more complex 3D realis-tic scenarios. As shown in Figure6, the baseline design is a square with nondimensional width D = 1 and rounded corners with radius rc = 0.4, and the straight-line seg-ments are fixed while the rounded parts are allowed to change. The optimization task is to minimize the mean drag force coefficient 𝜇𝐶𝑑 by updating the two design

(10)

3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55

F I G U R E 6 Geometric configuration of a typical tall-building cross section

variables Δ𝑦₁∗ and Δ𝑦∗₂ defined as the relative displace-ments in the 𝑦∗direction with respect to the control points in a baseline design located at 𝑥₁∗= –0.16 and 𝑥∗₂= –0.04 (Ding & Kareem,2018). The resulting aerodynamic shape is obtained through interpolation with a cubic spline pass-ing through the new control points. Considerpass-ing the exis-tence of the four axes of symmetry, the geometry could be fully determined by the two design variables Δ𝑦₁∗and Δ𝑦∗₂. The constraints |Δ𝑦∗₁| ≤ 0.1 and |Δ𝑦₂∗| ≤ 0.1 are imposed to limit the maximum allowable geometric change. It is noted that the parameterization scheme used here has been successfully applied to the aerodynamic shape opti-mization of tall buildings (Bernardini, Spence, Wei, & Kareem, 2015; Ding & Kareem, 2018). Although the 2D numerical examples are employed for the sake of conve-nience, it is expected that the knowledge-enhanced deep RL-based shape optimizer will present greater promise in high-dimensional applications due mainly to the inclusion of DNN function approximators (Chen et al., 2017; Yan et al.,2019).

4.1 Specific direct-domain

knowledge-enhanced deep RL-based

optimization

To effectively incorporate the specific direct-domain knowledge into the current aerodynamic shape optimiza-tion process, the low-fidelity low-cost RANS simulaoptimiza-tions and high-fidelity high-cost LES simulations are employed as the source and target tasks, respectively. The RANS-level and LES-level CFD simulations given by Ding and Kareem (2018) are utilized to evaluate the mean drag force coef-ficient 𝜇𝐶𝑑. The CFD simulations are carried out based on the Open Source Field Operation and Manipulation (OpenFOAM) C ++ class library, where the spatial domain is discretized utilizing the Finite volume method. Specifi-cally, a uniform wind flow approaching the building at a

F I G U R E 7 Velocity fields of CFD simulations for a selected shape design

fixed angle of attack is considered. The Reynolds number is 105for both RANS and LES. In a computational domain of 30D × 20D (where D is the cross-section width), structured mesh is utilized with a mesh number of around 435,000 (for RANS) and 1,418,000 (for LES), and the mesh

inde-pendence is checked following the AJI guideline (Tomi- Q3 naga et al.,2008). The k–ω shear stress transport model is

used for RANS, while dynamic Lagrangian subgrid-scale model for LES. The y+ number is around 40 (with stan-dard wall functions) for RANS and 1 for LES. The Courant– Friedrichs–Lewy numbers for RANS and LES are 0.1 and 0.4, respectively. The numerical simulations are termi-nated after four cycles of flow passing the computational domain. The computational time for one 2D shape design is around 10 and 150 h for RANS and LES using 16 CPU cores [Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50 GHz]. The representative velocity fields of RANS and LES for a selected shape design are shown in Figure7. It is noted that the RANS-based simulations are first used for establishing a reliable surrogate model over the design space to save the computational demand (Ding & Kareem,2018). Hence, the computational cost in low-cost environment is consid-ered as negligible compared to that in high-cost environ-ment. In this numerical example, the computational bud-get is restricted to 20 LES-based evaluations of the objec-tive functions for each method. Since each “step” repre-sents one LES-based evaluation, the maximum step n_step for the optimization is 20. The hyperparameters of DDPG used in the numerical example are shown in Table1. The trained weights of the four networks in DDPG in the low-cost RANS environment are used as the initial weights for learning and optimizing processes in the high-cost environment.

(11)

3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 T A B L E 1 Hyperparameters of DDPG Hyperparameters Values

Number of layers (policy network) 4 Number of neurons (policy network) 40 Learning rate of policy network 𝜂𝜋 0.0001 Activation functions in hidden

layers (policy network)

Rectified linear unit Activation functions in output layer

(policy network)

Hyper tangent Number of layers (Q network) 4

Number of neurons (Q network) 40 Learning rate of Q network 𝜂𝑄 0.001 Activation functions in hidden

layers (Q network)

Rectified linear unit Activation functions in output layer

(Q network)

Linear

Discount factor 𝛾 0.99

Update factor 𝜏 0.001

Batch size 𝑛𝑏𝑎𝑡𝑐ℎ 128

For the sake of simplicity, the basic gradient descent (Skinner & Zare-Behtash, 2018) and PSO (Kennedy & Eberhart,1995) are, respectively, selected as examples of gradient-based and gradient-free optimization schemes, respectively, and used for comparison with deep RL-based shape optimizer. For a fair comparison, the parameters of the basic gradient descent and PSO methods are selected to ensure that both of them present good performance in the optimization based on low-cost simulations. The step size 𝜂_BGDof the basic gradient descent is set to be 0.01, and each step requires extra evaluations of the objective func-tion to compute its gradient. The populafunc-tion size n_popand step size 𝜂PSO of PSO are set to be 5 and 1, respectively. The starting point is taken as (–0.1, –0.1) far from the opti-mal design for effective evaluation of various optimization schemes. The sampled designs for different methods are shown in Figure8a–d, where the cross mark with a step number aside denotes the proposed designs in the mization process. It is noted that the deep RL-based opti-mizer without integrating domain knowledge (i.e., directly interacting with the high-cost environment), as shown in Figure 8c, does not necessarily perform better than con-ventional optimization schemes for this simple 2D case considering the random initialization of the optimization policy. The comparison results in Figure 8findicate that the specific direct-domain knowledge from RANS-level simulations can greatly facilitate the efficient search of deep RL-based optimizer for the optimal (or near opti-mal) design in LES-based simulations, which demonstrate that direct-domain knowledge-enhanced deep RL-based optimizer outperforms both based and

gradient-T A B L E 2 Hyperparameters of policy gradient

Hyperparameters Values

LSTM cell state and hidden state size

100 Learning rate of policy

network η_RNN

0.0001

Batch size n_b 128

free optimization algorithms in this case study. The two design variables are –0.011 and 0.028 for the selected opti-mal shape, which results in a drag coefficient of 0.276. The small oscillations in the sampled designs as indicated in Figure8dmay be attributed to the synchronized learning and optimizing processes in the high-cost environment.

4.2 General cross-domain

knowledge-enhanced deep RL-based

optimization

To effectively incorporate the general cross-domain knowl-edge into the current optimization process, it is impor-tant to select a set of appropriate prescribed functions for constructing a low-cost environment. For the aero-dynamic shape optimization of tall buildings, the objec-tive function is likely to have multiple local optima. Considering the nonconvex continuous Gaussian process functions are often used for the synthesis of functions with multiple local optima (e.g., Chen et al., 2017; Zhou et al.,2017), they are utilized here by assuming that the high-cost objective functions of aerodynamic shape opti-mization could be well approximated by a mixture of Gaussian process functions. Accordingly, a set of sup-porting points 𝑿 = [𝒙1, 𝒙2, … , 𝒙𝑚] (m × d) and their corresponding values 𝑮𝑠= [𝐺1, 𝐺2, … 𝐺𝑚] (m × 1) are first randomly generated, where m represents the number of supporting points and d denotes the dimension of design variables. The synthetic Gaussian process function passing the supporting points is then given by

𝐺𝑙 (𝒙) = (𝑲_𝑿,𝑿−1 𝑮𝑠)𝑇𝑲𝑿(𝒙) (7) where the essential element in matrix 𝑲𝑿,𝑿 (m by m) is exp(−|𝒙𝑖−𝒙𝑗|2

2𝑙2 ) (for i = 1, 2,..., m and j = 1, 2,..., m) and the essential element in matrix 𝑲𝑿(𝒙) (m by 1) is exp(−|𝒙𝑘−𝒙|

2 2𝑙2 ) (for k = 1, 2,. . . , m). The parameters m and l are set to be 6 and 0.5, respectively, in this case. A total of 4000 synthetic Gaussian process functions are utilized to train the RNN-based shape optimizer using policy-gradient algorithm, and the used hyperparameters are shown in Table2. The obtained RNN-based shape optimizer and the

(12)

3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55

F I G U R E 8 Optimization of 𝜇_𝐶𝑑using different schemes with the starting point (–0.1, –0.1) corresponding effective policy are directly employed in the

current aerodynamic shape optimization process without further training.

The proposed designs by the general cross-domain knowledge-enhanced deep RL-based shape optimizer in the current aerodynamic optimization process are shown in Figure 8e. It is shown that the developed optimizer first explores in design space for a few steps and then efficiently march towards the global optimum. The com-parison results in Figure 8f indicate that the general cross-domain knowledge from a mixture of Gaussian pro-cess functions can greatly facilitate the efficient search of deep RL-based optimizer for the optimal (or near opti-mal) design in LES-based simulations, which demonstrate that the general cross-domain knowledge-enhanced deep RL-based optimizer outperforms both gradient-based and gradient-free optimization schemes in this case study. In addition to the starting point (–0.1, –0.1), the comparative study with another starting point (–0.1, 0.1) is further car-ried out to more comprehensively demonstrate the com-putational advantage of the proposed aerodynamic shape optimizer. The simulation results are given in Figure9, and it is shown that both specific direct-domain and general cross-domain knowledge-enhanced deep RL-based opti-mizers still present improved performance compared to conventional methods in the case of a different starting point. Other starting points (e.g., points of (0.1, 0.1) and (0.1, –0.1)) are also investigated, and it is found that the knowledge-enhanced deep RL-based optimizer can always quickly reach to optimum. It is noted that the performance

F I G U R E 9 Comparison of various methods with the starting point (–0.1, 0.1)

of each optimization scheme used in this numerical exam-ple may vary with model parameters, wind conditions (e.g., angles of attack), and other factors. Hence, a comprehen-sive parametric study is needed before a more general con-clusion in terms of optimization efficiency can be obtained.

5 CONCLUSION

This study developed a novel aerodynamic shape optimizer for wind-sensitive structures using knowledge-enhanced deep RL. The RL approach with a DNN-based policy (for shape search and update) is utilized as a data-driven shape optimization scheme for aerodynamic mitigation of wind-sensitive structures, and the equation-free domain

(13)

3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55

knowledge is leveraged to remarkably enhance the train-ing efficiency. It is shown that the specific direct-domain knowledge learnt from low-fidelity computational fluid dynamics simulations (with RANS equations) and general cross-domain knowledge learned from a mixture of Gaus-sian process functions can be incorporated into the deep RL-based aerodynamic shape optimizer via the transfer-learning and meta-transfer-learning techniques, respectively, and both greatly facilitated the efficient search of an effective RL policy to obtain the optimal (or near optimal) design in expensive high-fidelity CFD simulations (i.e., large-eddy simulations). Numerical examples for aerodynamic shape optimization of the cross section of a typical tall build-ing demonstrated the improved performance of both spe-cific direct-domain and general cross-domain knowledge-enhanced deep RL-based shape optimizers compared to conventional gradient-based and gradient-free optimiza-tion algorithms. To simultaneously incorporate both spe-cific direct-domain and general cross-domain knowledge into the same deep RL-based shape optimization for a further improved performance would be an interesting research topic to explore. Other advanced deep learn-ing schemes (e.g., enhanced probabilistic neural network, Ahmadlou & Adeli, 2010; neural dynamic classification, Rafiei & Adeli,2017b; dynamic ensemble learning, Alam, Siddique, & Adeli, 2020; and finite element machine, Pereira, Piteri, Souza, Papa, & Adeli,2020) should be also explored to enhance simulation accuracy and efficiency of the aerodynamic shape optimization. Another direc-tion of future work is to extend current simple 2D aero-dynamic shape optimization to more complex nonlinear, high-dimensional and nonconvex problems for aerody-namic mitigation of large-scale wind-sensitive structures.

A C K N O W L E D G M E N T S

The support from Institute of Bridge Engineering at Uni-versity at Buffalo is gratefully acknowledged. The authors are also thankful to Ms. Fei Ding from University of Notre Dame for the valuable assistance on CFD simulations.

F U N D I N G I N F O R M A T I O N

Institute of Bridge Engineering at University at Buffalo

R E F E R E N C E S

Ahmadlou, M., & Adeli, H. (2010). Enhanced probabilistic neural network with local decision circles: A robust classifier. Integrated Computer-Aided Engineering, 17(3), 197–210.

Alam, K. M. R., Siddique, N., & Adeli, H. (2020). A dynamic ensem-ble learning algorithm for neural networks. Neural Computing and Applications, 32(12), 8675–8690.

Andrychowicz, M., Denil, M., Gomez, S., Hoffman, M. W., Pfau, D., Schaul, T., . . . De Freitas, N. (2016). Learning to learn by gradient descent by gradient descent. Paper presented at the Proceedings of

30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

Benito-Picazo, J., Domínguez, E., Palomo, E. J., & López-Rubio, E. (2020). Deep learning-based video surveillance system man-aged by low cost hardware and panoramic cameras. Integrated Computer-Aided Engineering, 27(4), 373–387.

Bernardini, E., Spence, S. M., Wei, D., & Kareem, A. (2015). Aero-dynamic shape optimization of civil structures: A CFD-enabled Kriging-based approach. Journal of Wind Engineering and Indus-trial Aerodynamics, 144, 154–164.

Chen, Y., Hoffman, M. W., Colmenarejo, S. G., Denil, M., Lillicrap, T. P., Botvinick, M., & de Freitas, N. (2017). Learning to learn without gradient descent by gradient descent. Paper presented at the Pro-ceedings of the 34th International Conference on Machine Learn-ing(ICML 2017), Sydney, Australia: .

Davenport, A. G. (1962). Buffeting of a suspension bridge by storm winds. Journal of the Structural Division, 88(3), 233–268.

Davenport, A. G. (1971). The response of six building shapes to turbu-lent wind. Philosophical Transactions of the Royal Society of Lon-don. Series A, Mathematical and Physical Sciences, 269(1199), 385– 394.

Ding, F., & Kareem, A. (2018). A multi-fidelity shape optimization via surrogate modeling for civil structures. Journal of Wind Engineer-ing and Industrial Aerodynamics, 178, 49–56.

Eaton, E., & Lane, T. (2008). Modeling transfer relationships between learning tasks for improved inductive transfer. In W. Daelemans et al. (Eds.), Joint european conference on machine learning and knowledge discovery in databases(pp. 317–332). LNAI 5211. Berlin, Germany: Springer.

Kennedy, J., &. Eberhart, R. (1995). Particle swarm optimization. In Proceedings of the IEEE international conference on neural net-works Perth, Australia(Vol. 4, pp. 1942–1948). Piscataway, NJ: IEEE Press.

Elshaer, A., & Bitsuamlak, G. (2018). Multiobjective aerodynamic optimization of tall building openings for wind-induced load reduction. Journal of Structural Engineering, 144(10), 04018198. Finn, C., Abbeel, P., & Levine, S. (2017). Model-agnostic

meta-learning for fast adaptation of deep networks. In Proceedings of 34th International Conference on Machine Learning (ICML 2017), Sydney, Australia(pp. 1126–1135). Proceedings of Machine Learn-ing Research (Vol. 70). Copenhagen, Denmark: MLR Press. Hao, J., & Wu, T. (2018). Downburst-induced transient response of a

long-span bridge: A CFD-CSD-based hybrid approach. Journal of Wind Engineering and Industrial Aerodynamics, 179, 273–286. Hou, F., & Sarkar, P. P. (2018). A time-domain method for predicting

wind-induced buffeting response of tall buildings. Journal of Wind Engineering and Industrial Aerodynamics, 182, 61–71.

Kim, H., & Adeli, H. (2005). Wind-induced motion control of 76-story benchmark building using the hybrid damper-TLCD system. Jour-nal of Structural Engineering, 131(12), 1794–1802.

Kim, B. C., & Yhim, S. S. (2014). Buffeting analysis of a cable-stayed bridge using three-dimensional computational fluid dynamics. Journal of Bridge Engineering, 19(11), 04014044.

Kociecki, M., & Adeli, H. (2014). Two-phase genetic algorithm for topology optimization of free-form steel space-frame roof structures with complex curvatures. Engineering Applications of Artificial Intelligence, 32, 218–227.

Li, T., Wu, T., & Liu, Z. (2020). Nonlinear unsteady bridge aerody-namics: Reduced-order modeling based on deep LSTM networks.

(14)

3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55

Journal of Wind Engineering and Industrial Aerodynamics, 198, 104116.

Liang, X. (2019). Image based post disaster inspection of reinforced concrete bridge systems using deep learning with Bayesian opti-mization. Computer Aided Civil and Infrastructure Engineering, 34(5), 415–430.

Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., . . . Wierstra, D. (2016). Continuous control with deep reinforce-ment learning. Paper presented at Proceedings of 4th International Conference on Learning Representations (ICLR 2016), San Juan, Puerto Rico.

Min, A. T. W., Sagarna, R., Gupta, A., Ong, Y. S., & Goh, C. K. (2017). Knowledge transfer through machine learning in aircraft design. IEEE Computational Intelligence Magazine, 12(4), 48–60. Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J.,

Belle-mare, M. G., . . . Petersen, S. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533. Nagao, F., Utsunomiya, H., Oryu, T., & Manabe, S. (1993).

Aerody-namic efficiency of triangular fairing on box girder bridge. Journal of Wind Engineering and Industrial Aerodynamics, 49(1–3), 565– 574.

Pan, S. J., & Yang, Q. (2009). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345– 1359.

Park, H. S., & Adeli, H. (1997). Data parallel neural dynamics model for integrated design of large steel structures, Computer-Aided Civil and Infrastructure Engineering, 12(5), 311–326.

Peherstorfer, B., Willcox, K., & Gunzburger, M. (2018). Survey of mul-tifidelity methods in uncertainty propagation, inference, and opti-mization. SIAM Review, 60(3), 550–591.

Pereira, D. R., Piteri, M. A., Souza, A. N., Papa, J. P., & Adeli, H. (2020). FEMa: A finite element machine for fast learning. Neural Comput-ing and Applications, 32(10), 6393–6404.

Psichogios, D. C., & Ungar, L. H. (1992). A hybrid neural network-first principles approach to process modeling. AIChE Journal, 38(10), 1499–1511.

Rafiei, M. H., Khushefati, W. H., Demirboga, R., & Adeli, H. (2017). Supervised deep restricted Boltzmann machine for estimation of concrete. ACI Materials Journal, 114(2), 237–244.

Rafiei, M. H., & Adeli, H. (2017a). A novel machine learning-based algorithm to detect damage in high-rise building structures. The Structural Design of Tall and Special Buildings, 26(18), e1400. Rafiei, M. H., & Adeli, H. (2017b). A new neural dynamic

classifica-tion algorithm. IEEE Transacclassifica-tions on Neural Networks and Learn-ing Systems, 28(12), 3074–3083.

Rosenstein, M. T., Marx, Z., Kaelbling, L. P., & Dietterich, T. G. (2005). To transfer or not to transfer. Paper presented at NIPS 2005 work-shop on transfer learning, British Columbia, Canada.

Scanlan, R. H. (1978). The action of flexible bridges under wind, I: Flutter theory. Journal of Sound and Vibration, 60(2), 187–199. Silver, D., Schrittwieser, J., Simonyan, K., Antonoglou, I., Huang, A.,

Guez, A., . . . Chen, Y. (2017). Mastering the game of go without human knowledge. Nature, 550(7676), 354–359.

Simões, D., Lau, N., & Reis, L. P. (2020). Exploring communication protocols and centralized critics in multi-agent deep learning. Inte-grated Computer-Aided Engineering, 27(4), 333–351.

Skinner, S. N., & Zare-Behtash, H. (2018). State-of-the-art in aerody-namic shape optimisation methods. Applied Soft Computing, 62, 933–962.

Snaiki, R., & Wu, T. (2019). Knowledge-enhanced deep learning for simulation of tropical cyclone boundary-layer winds. Journal of Wind Engineering and Industrial Aerodynamics, 194, 103983. Sørensen, R. A., Nielsen, M., & Karstoft, H. (2020). Routing in

con-gested baggage handling systems using deep reinforcement learn-ing. Integrated Computer-Aided Engineering, 27(2), 139–152. Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An

intro-duction. Cambridge, MA: MIT Press.

Tanaka, H., Tamura, Y., Ohtake, K., Nakai, M., & Kim, Y. C. (2012). Experimental investigation of aerodynamic forces and wind pres-sures acting on tall buildings with various unconventional config-urations. Journal of Wind Engineering and Industrial Aerodynam-ics, 107, 179–191.

Tominaga, Y., Mochida, A., Yoshie, R., Kataoka, H., Nozu, T., Yoshikawa, M., & Shirasawa, T. (2008). AIJ guidelines for prac-tical applications of CFD to pedestrian wind environment around buildings. Journal of Wind Engineering and Industrial Aerodynam-ics, 96(10-11), 1749–1761.

Topping, B. H. V. (1983). Shape optimization of skeletal structures: A review. Journal of Structural Engineering, 109(8), 1933–1951. Wang, N., & Adeli, H. (2015). Robust vibration control of wind-excited

highrise building structures. Journal of Civil Engineering and Man-agement, 21(8), 967–976.

Wang, H., & Wu, T. (2020). Knowledge-enhanced deep learning for wind-induced nonlinear structural dynamic analysis. Journal of Structural Engineering, 146(11), 04020235.

Wolpert, D. H., & Macready, W. G. (1997). No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1), 67–82.

Wu, T., & Kareem, A. (2011). Modeling hysteretic nonlinear behavior of bridge aerodynamics via cellular automata nested neural net-work. Journal of Wind Engineering and Industrial Aerodynamics, 99(4), 378–388.

Yan, X., Zhu, J., Kuang, M., & Wang, X. (2019). Aerodynamic shape optimization using a novel optimizer based on machine learning techniques. Aerospace Science and Technology, 86, 826–835. Yang, Y., Wu, T., Ge, Y., & Kareem, A. (2015). Aerodynamic

stabiliza-tion mechanism of a twin box girder with various slot widths. Jour-nal of Bridge Engineering, 20(3), 04014067.

Yazdi, J., & Neyshabouri, S. S. (2014). Adaptive surrogate modeling for optimization of flood control detention dams. Environmental Modelling & Software, 61, 106–120.

Zhou, Z., Li, X., & Zare, R. N. (2017). Optimizing chemical reac-tions with deep reinforcement learning. ACS Central Science, 3(12), 1337–1344.

How to cite this article: Li S, Snaiki R, Wu T. A

knowledge-enhanced deep reinforcement learning-based shape optimizer for aerodynamic mitigation of wind-sensitive structures. Comput

Aided Civ Inf. 2020;1–14.