Identification of Dynamic Models in Complex Networks With Prediction Error Methods: Predictor Input Selection

(1)

HAL Id: hal-01355016

https://hal.archives-ouvertes.fr/hal-01355016

Submitted on 25 Apr 2019

HAL

is a multi-disciplinary open access archive for the deposit and dissemination of sci- entific research documents, whether they are pub- lished or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

L’archive ouverte pluridisciplinaire

HAL, est

destinée au dépôt et à la diffusion de documents scientifiques de niveau recherche, publiés ou non, émanant des établissements d’enseignement et de recherche français ou étrangers, des laboratoires publics ou privés.

Copyright

With Prediction Error Methods: Predictor Input Selection

Arne Dankers, Paul M. J. van den Hof, Xavier Bombois, Peter S. C. Heuberger

To cite this version:

Arne Dankers, Paul M. J. van den Hof, Xavier Bombois, Peter S. C. Heuberger. Identification of Dynamic Models in Complex Networks With Prediction Error Methods: Predictor Input Selection.

IEEE Transactions on Automatic Control, Institute of Electrical and Electronics Engineers, 2016, 61

(4), pp.937-952. �10.1109/TAC.2015.2450895�. �hal-01355016�

(2)

Identification of Dynamic Models in Complex Networks with Prediction Error Methods - Predictor Input Selection

Arne Dankers, Member, IEEE,Paul M. J. Van den Hof, Fellow, IEEE, Xavier Bombois and Peter S. C.

Heuberger

Abstract—This paper addresses the problem of obtaining an estimate of a particular module of interest that is embedded in a dynamic network with known interconnection structure. In this paper it is shown that there is considerable freedom as to which variables can be included as inputs to the predictor, while still obtaining consistent estimates of the particular module of interest. This freedom is encoded into sufficient conditions on the set of predictor inputs that allow for consistent identification of the module. The conditions can be used to design a sensor placement scheme, or to determine whether it is possible to obtain consistent estimates while refraining from measuring particular variables in the network. As identification methods the Direct and Two Stage Prediction-Error methods are considered.

Algorithms are presented for checking the conditions using tools from graph theory.

Index Terms—System identification, closed-loop identification, graph theory, dynamic networks, linear systems.

I. INTRODUCTION

S

YSTEMS IN ENGINEERING are becoming more complex and interconnected. Consider for instance, power systems, telecommunication systems, and distributed control systems. Since many of these systems form part of the foundation of our modern society, their seamless operation is paramount. However, the increasing complexity and size of the systems poses real engineering challenges (in maintaining sta- bility of the electrical power grid, increasing data throughput of telecommunication networks, etc.). These systems cannot be operated, designed, and maintained without the help of models.

Tools fromsystem identificationare well suited to construct models using measurements obtained from a system. However, the field of system identification is primarily focused on identifying open and closed-loop systems. Recently, there has been a move to consider more complex interconnection structures.

The literature on identification and dynamic networks can be split into two categories based on whether the interconnection structure of the network is assumed to be known or not. In the latter the objective is generally to detect the topology of the network, whereas in the former the focus has mainly been to identify (part of) the dynamical transfers in the network based on open-loop and closed-loop identification techniques.

The work of Arne Dankers is supported in part by the National Science and Research Council (NSERC) of Canada.

A. Dankers is with the Delft Center for Systems and Control, Delft University of Technology,[email protected]

P.M.J. Van den Hof is with the Department of Electrical Engineering, Eindhoven University of Technology,[email protected]

P.S.C. Heuberger is with the Department of Mechanical Engineering, Eindhoven University of Technology,[email protected]

X. Bombois is with Laboratoire Amp`ere, Ecole Centrale de Lyon (Ecully, France),[email protected]

The topology detection literature is primarily based on the methods of Granger andGranger Causality[1]. In [2], [3] it is shown that it is possible to distinguish between open and closed-loop systems (using a parametric approach). Recently, this line of reasoning has been extended to more general networks in [4], [5] (using a non-parametric approach). Several methods have appeared that automate Granger’s method for detection of causal relations by using regularization terms to set certain links in the network to zero. For instance, [6], [7]

directly implement an`0norm, whereas [8] uses the LASSO ([9]), and [10] uses a compressed sensing approach. In [11]

a Bayesian approach for topology detection is presented. The main features that these algorithms have in common is that all internal variables in the network are assumed to be known, each internal variable is driven by an independent stochastic variable, and most papers assume all transfer functions in the network are strictly proper. Under these conditions it is shown that topology detection is possible.

Although the structure detection problem is very interesting, the underlying identification techniques, even for the case that the network structure is known, have not been fully developed yet. In particular if we consider situations that go beyond the rather restrictive conditions mentioned above.

As a result, identification of (particular modules in) dynamic networks for a given interconnection structure is a relevant problem to address. Moreover, for a large number of systems in engineering the interconnection structure of the network is known (power systems, telecommunication systems etc.).

In the identification of dynamic networks attention has been given to the study ofspatially distributed systems, where each node is connected only to its direct neighbors and the modules are assumed to be identical [12], [13] or not [14], [15], [16].

In these papers emphasis is on numerically fast algorithms.

In [17] closed-loop prediction-error identification methods have been extended to the situation of dynamic networks and analyzed in terms of consistency properties. The interconnection topology is very general and goes beyond the spatially distributed topology. The approach is to focus on identifying a single module embedded in a network with known interconnection structure and with general conditions on noise disturbances. Both noise and known user-defined signals (called reference signals) can drive or excite the network, while the presence of reference signals can be used to relax assumptions on the noise in the system. In the analysis of [17] it is required thatallsignals that directly map into the output of the considered module are taken as predictor inputs, and therefore they all need to be measured.

In this paper we consider an extension of the problem

(3)

setting in [17]. The objective is to identify a particular module embedded in a dynamic network, and to analyze the flexibility that exists in which selection of measured variables leads to consistent identification of the module of interest. The variables that are measured are available to use as predictor inputs, i.e. variables that are used to predict the value of a particular internal variable. Specifically, the question addressed in this paper is: given a dynamic network with known interconnection structure, for which selection of predictor inputs can we guarantee that a particular module of interest can be estimated consistently?

Our approach is actually a local approach where only a limited number of variables need to be measured in order to identify the object of interest. The resulting algorithms can be applied to small to medium scale networks, or to large networks with sparse interconnection structures. It can also be used to design a sensor placement scheme tailored specifically to identifying a particular module in the network. Thus, it may be possible to avoid measuring variables that are expensive, difficult or unsafe to measure.

In order to make the step towards a selection of predictor input variables, the dynamics that appear between a selection of measured variables in a network is described in a so-called immersed network. The conditions for consistent module estimates are derived in a general context, and then specified for the Direct and Two-Stage Prediction-Error Methods, as formalized for a dynamic network case in [17]. This paper is based on the preliminary results of [18], [19] but developed and formulated here in a stronger and unifying framework, by relying predominantly on an analysis that is independent of the particular identification algorithm.

In Section II dynamic networks are defined. In Section III the prediction-error identification framework is presented, including generalizations of the Direct and Two-Stage identification methods. In Section IV an immersed network is defined as the network that is constructed by discarding nonmeasured node variables. Additionally general conditions are formulated on the predictor input variables to ensure consistent estimation of the module dynamics. In Sections V and VI the conditions on predictor inputs are specified for each identification method separately. In Section VII an algorithm based on graph theory is presented to check the conditions.

II. SYSTEMDEFINITION ANDSETUP

A. Dynamic Networks

The networks that are considered in this paper are built up ofLelements (or nodes), related toLscalarinternal variables wj, j= 1, . . . , L. It is assumed that each internal variable is such that it can be written as:

w_j(t) = X

k∈Nj

G⁰_jk(q)w_k(t) +r_j(t) +v_j(t) (1) where G⁰_jk(q), k∈ Nj is a proper rational transfer function, q⁻¹ is the delay operator (i.e.q⁻¹u(t) =u(t−1)) and,

• Nj is the set of indices of internal variables with direct causal connections to w_j, i.e.i∈ Nj iffG⁰_ji6= 0;

• v_j is an unmeasured disturbance variablethat is a sta- tionary stochastic process with rational spectral density:

vj =H_j⁰(q)ejwhereej is a white noise process, andH_j⁰ is a monic, stable, minimum phase transfer function;

• rj is an external variable that is known and can be manipulated by the user; it is an important variable that can provide deliberate (user-chosen) excitation to the network.

It may be that the disturbance and/or external variable are not present at some nodes. The entire network is defined by:





 w₁ w₂ ... w_L







=







0 G⁰₁₂ · · · G⁰_1L G⁰₂₁ 0 . .. ...

... . .. . .. G⁰_L−1 _L G⁰_L1 · · · G⁰_{L L−1} 0











 w₁ w₂ ... w_L





 +





 r₁ r₂ ... r_L





 +





 v₁ v₂ ... v_L





 ,

whereG⁰_jkis non-zero if and only ifk∈ Nj for rowj. Using an obvious notation results in:

w=G⁰w+r+v (2) where, w,r, andv are vectors. If an external or disturbance variable is absent at nodei, theith entry ofrorvrespectively is0. Eq. (2) is thedata generating system.

There exists a path from wi to wj if there exist integers n1, . . . , nksuch thatG⁰_jn

1G⁰_n₁_n₂· · ·G⁰_n

kiis non-zero. Likewise there exists a path from ri to wj (or vi towj) if there exist integersn1, . . . , nk such thatG⁰_jn₁G⁰_n₁_n₂· · ·G⁰_n

ki is non-zero.

The following sets will be used throughout the paper:

• R and V denote the sets of indices of all external and disturbance variables respectively present in the network.

• Rj andVj denote the sets of indices of all the external and disturbance variables respectively with a path towj. A directed graph of a dynamic network can be used to represent a network. A directed graph is a collection of nodes connected by directed edges. A directed graph of a dynamic network can be constructed as follows:

1. Let allwk,k∈ {1, . . . , L} be nodes.

2. Let allv_k,k∈ V andr_m,m∈ Rbe nodes.

3. For all i, j ∈ {1, . . . , L} if G_ji 6= 0, then add a directed edge from nodew_i to nodew_j.

4. For all k∈ V add a directed edge fromv_k tow_k. 5. For all k∈ R add a directed edge fromr_k tow_k.

More concepts from graph theory will be used throughout the paper, but they will be presented where they are applicable.

The following is an example of a dynamic network.

Example 1:Consider a network defined by:





 w₁ w₂ w₃ w4

w5

w6







=







0 0 0 G⁰₁₄ 0 0

G⁰₂₁ 0 G⁰₂₃ 0 0 0

0 G⁰₃₂ 0 0 0 0

0 0 0 0 0 G⁰₄₆

0 G⁰₅₂ 0 G⁰₅₄ 0 G⁰₅₆ 0 0 G⁰₆₃ 0 G⁰₆₅ 0











 w₁ w₂ w₃ w4

w5

w6





 +





 v₁ v₂ v₃ v4

v5

v6







shown in Fig. 1a. Its graph is shown in Fig. 1b.

All networks are assumed to satisfy the following conditions.

Assumption 1:

(4)

w1

G⁰₂₁

w2

G⁰₂₃ G⁰₃₂

w3

G⁰₆₃ w6

G⁰₅₆ G⁰₆₅

w5

G⁰₅₄ w4

G⁰₄₆

G⁰14

G⁰₅₂ v1

v2

v3

v4

v5

v6

w1

w2

w3

w4

w5

w6

v1

v2

v3

v4

v5

v6

(a) (b)

Fig. 1. A diagram (a) and graph (b) of the network for Examples 1 and 2. In (a), each rectangle represents a transfer function, and each circle represents a summation. For clarity labels of the wi’s have been placed inside the summations indicating that the output of the sum is the variablewi.

(a) The network is well-posed in the sense that all principal minors of lim_z→∞(I−G⁰(z))are non-zero.

(b) (I−G⁰)⁻¹ is stable.

(c) All rm,m∈ Rare uncorrelated to allvk,k∈ V.¹ The well-posedness property [20] ensures that bothG⁰and (I−G⁰)⁻¹ only contain proper (causal) transfer functions, and still allows the occurrence of algebraic loops.

In this paper the set of internal variables chosen as predictor inputs plays an important role. For this reason, it is convenient to partition (2) accordingly. Let Dj denote the set of indices of the internal variables that are chosen as predictor inputs.

Let Zj denote the set of indices not in {j} ∪ Dj, i.e.

Zj = {1, . . . , L} \ {{j} ∪ Dj}. Let wD denote the vector [w_k₁ · · · w_k_n]^T, where {k1, . . . , k_n} = Dj. Let rD denote the vector [r_k₁ · · · r_k_n]^T, where {k1, . . . , k_n} = Dj, and where the`th entry is zero if r_` is not present in the network (i.e. ` /∈ R). The vectors wZ, vD, vZ and rZ are defined analogously. The ordering of the elements of wD,vD, andrD

is not important, as long as it is the same for all vectors. The transfer function matrix between wD andwj is denoted G⁰_j_D. The other transfer function matrices are defined analogously.

By this notation, the network equations (2) are rewritten as:



 wj

wD

wZ



=





0 G⁰_j_D G⁰_j_Z G_D⁰_j G_DD⁰ G_DZ⁰ G_Z⁰_j G_ZD⁰ G_ZZ⁰







 wj

wD

wZ



+



 vj

vD

vZ



+



 rj

rD

rZ



, (3) whereG_DD⁰ andG_ZZ⁰ have zeros on the diagonal.

III. PREDICTIONERRORIDENTIFICATION AND

EXTENSION TODYNAMICNETWORKS

In this section, the prediction-error framework is presented with a focus on using the techniques in a network setting.

It is an identification framework based on the one-step-ahead predictor model [21].

1Throughout this paper r uncorrelated to v will mean that the cross- correlation functionRrv(τ)is zero for allτ.

A. Prediction Error Identification

Letw_jdenote the variable which is to be predicted, i.e. it is the output of the module of interest. Thepredictor inputsare those (known) variables that will be used to predict w_j. The sets D_j andP_j are used to denote the sets of indices of the internal and external variables respectively that are chosen as predictor inputs -wk is a predictor input iff k∈ Dj, andrk

is a predictor input iffk ∈ Pj. The one-step-ahead predictor for wj is then [21]:

ˆ

wj(t|t−1, θ) =H_j⁻¹(q, θ) X

k∈Dj

Gjk(q, θ)wk(t)

+X

k∈Pj

Fjk(q, θ)rk(t) +

1−H_j⁻¹(q, θ)

wj(t) (4) where H_j(q, θ) is a monic noise model, G_jk(θ) models the dynamics between w_k to w_j,k ∈ D_j, and F_jk(q, θ)models the dynamics between rk towj, k∈ Pj. The importance of includingFjk(q, θ)will become evident later in the paper. The prediction error is then:

εj(t, θ) =wj(t)−wˆj(t|t−1, θ)

=Hj(θ)⁻¹

wj−X

k∈Dj

Gjk(θ)wk−X

k∈Pj

Fjk(θ)rk

(5) where arguments q and t have been dropped for notational clarity. The parameterized transfer functionsGjk(θ),k∈ Dj, Fjk(θ),k∈ Pj, and Hj(θ)are estimated by minimizing the sum of squared (prediction) errors:

V_j(θ) = 1 N

N−1

X

t=0

ε²_j(t, θ). (6) where N is the length of the data set. Let θˆ_N denote the minimizer of (6). Under standard (weak) assumptions ([21]) θˆ_N →θ^∗ with probability 1 asN → ∞where

θ^∗= arg min

θ∈Θ

E¯[ε²_j(·, θ)] and E¯ := lim

N→∞

1 N

N−1

X

t=0

E,

and E is the expected value operator [21]. The function E¯[ε²_j(t, θ)]is denoted V¯j(θ). IfGjk(q, θ^∗) =G⁰_jk the module transfer is estimated consistently.

As in closed-loop identification, identification in networks may have the problem that the disturbance affecting the “output”w_jis correlated to one or more of the predictor inputs. In the closed-loop identification literature several methods have been developed to deal with this problem such as the Direct and Two Stage Methods [22], [23], [24]. Both methods have been extended to a network setting [17]. Generalizations of both methods to allow for a flexible choice of predictor inputs are presented in the following subsections.

B. The Direct Method

The Direct Method for identifyingG⁰_ji(q)is defined by the following algorithm.

Algorithm 1:Direct Method.

1. Selectw_j as the output variable to be predicted.

(5)

2. Choose the internal and external variables to include as predictor inputs (choose D_j andP_j).

3. Construct the predictor (4).

4. Obtain estimates G_jk(q,θˆ_N),k∈ D_j,F_jk(q,θˆ_N),k∈ P_j andHj(q,θˆN) by minimizing the sum of squared prediction errors (6).

In [17] Step 2 of the algorithm is replaced by a fixed choice, namely,Dj=Nj, andPj=∅.

C. Two Stage Method

In the Two Stage Method, the predictor inputs are not internal variables, but projections of internal variables. The projection of w_k onto an external variable r_k is defined as follows. Any variable w_k can be written as:

w_k = X

m∈Rk

F_km⁰ r_m+ X

m∈Vk

H_km⁰ v_m. (7)

whereF_km⁰ andH_km⁰ are proper stable transfer functions. Let w_k^(r^m⁾:=F_km⁰ rm. The termw^(r_k^m⁾is the projection ofwkonto causally time shifted versions ofrm(referred to as simply the projection ofw_kontor_m). If there are more external variables available, then w_k can be projected onto a set of external variablesr_m,m∈ T_j, which is denoted by

w_k⁽^T^j⁾:= X

m∈Tj

w^(r_k^m⁾= X

m∈Tj

F_km⁰ rm. (8)

An estimate of w_k⁽^T^j⁾ can be obtained by estimating F_km⁰ , m∈ Tj (using a Prediction-Error Method for instance) using a parametrized model Fkm(q, γ) with γ a parameter vector, resulting in an estimated model Fkm(q,γˆN). This model is used to generate the simulated signal:

ˆ

w_k⁽^T^j⁾(ˆγ_N) = X

m∈Tj

F_km(q,ˆγ_N)r_m(t).

The Two Stage Method is defined as follows.

Algorithm 2: Two Stage Method.

1. Select wj as the output variable to be predicted.

2. Choose the external variables to project onto (choose Tj).

3. Choose the internal and external variables to include as predictor inputs (choose Dj andPj).

4. Obtain estimates wˆ⁽_k^T^j⁾ of w_k⁽^T^j⁾ for eachk∈ Dj. 5. Construct the predictor

ˆ

wj(t|t−1, θ) = X

k∈D_j

Gjk(θ) ˆw_k⁽^T^j⁾+ X

k∈P_j

Fjk(θ)rk. (9)

6. Obtain estimates G_jk(q,θˆ_N),k∈ Dj andF_jk(q,θˆ_N),k∈ Pjby minimizing the sum of squared prediction errors (6).

This algorithm is a generalization of the one in [17].

Remark 1: In Step 5 of the algorithm a noise model is optional. For simplicity it is not included in (9).

IV. CONSISTENT IDENTIFICATION ON THE BASIS OF A SUBSET OF PREDICTOR INPUT VARIABLES

When only a subset of all node variables in a network is available from measurements, a relevant question becomes:

what are the dynamical relationships between the nodes in this subset of measured variables? In Section IV-A it is shown that when only a selected subset of internal variables is considered, the dynamic relationships between these variables can be described by an immersed network. Several properties of the immersed network are investigated. In Section IV-B it is shown under which conditions the dynamics that appear between two internal variables remain invariant when reducing the original network to the immersed one. In Section IV-C the results of identification in networks are characterized. It is shown that it is the dynamics of the modules in the immersed network that are being identified, and conditions for consistency of general identification results are formulated. The results presented in this section are independent of an identification method.

A. The Immersed Network

In this subsection, we show that there exists a unique dynamic network consisting only of a given subset of internal variables, that still exactly describes the dynamics between the selected variables. Moreover, we show that this network can be constructed by applying an algorithm from graph theory for constructing an immersed graph. Given the selected variables wk, k∈ {j} ∪ Dj, the remaining variables wn, n∈ Zj are sequentially removed from the network.

The following proposition shows that there is a unique char- acterization of the dynamics between the selected variables.

Proposition 1: Consider a dynamic network as defined in Section II-A that satisfies Assumption 1. Consider the set of internal variables{wk},k∈ Dj∪ {j}. There exists a network:

wj(t) wD(t)

= ˘G⁰(q,Dj) wj(t)

wD(t)

+ ˘F⁰(q,Dj)





rj(t)+vj(t) rD(t)+vD(t) rZ(t)+vZ(t)



, (10) where G˘⁰ and F˘⁰ are unique transfer matrices of the form (using a notation analogous to that of (3)):

G˘⁰=

"

0 G˘⁰_j_D G˘_D⁰_j G˘_DD⁰

#

and F˘⁰=

F˘_jj⁰ 0 F˘_j⁰_Z 0 F˘_DD⁰ F˘_DZ⁰

, (11) whereG˘_DD⁰ has zeros on the diagonal,F˘_DD⁰ is diagonal, and if there is an index ` such that both v` andr` are not present, then the corresponding column ofF˘⁰ is set to all zeros.

See Appendix X-A for the proof. Proposition 1 is in line with the result of [25] where conditions have been formulated for the existence of a unique interconnection matrix G˘⁰ on the basis of a transfer function from external inputs to node signals. The conditions in [25] typically reflect that enough entries of G˘⁰ and F˘⁰ are known (or set to zero as in our case). Proposition 1 is formulated for a particular structure, that matches with our dynamic network setup. EnforcingG˘⁰ to have zeros on the diagonal results in a network that does not have any “self-loops”, i.e. no paths that enter and leave the same node. This matches the assumptions imposed on the data

(6)

w1

w2

w3

w4

w5

w6

v1

v2

v3

v4

v5

v6

lift paths through

w3

w1

w2

w3

w4

w5

w6

v1

v2

v3

v4

v5

v6

lift paths through

w6

w1

w2

w3

w4

w5

w6

v1

v2

v3

v4

v5

v6

Fig. 2. Example of constructing an immersion graph. In Step 1 internal variable w3 is removed, and in step 2 variable w6. Edges between w’s have been emphasized in thick black lines since these connections define the interconnection structure of the corresponding dynamic network.

generating system (2). Enforcing the leading square matrix of F˘⁰ to be diagonal results in a network where each rk, k ∈ Dj∪ {j} only has a path to the corresponding internal variable w_k (matching the interconnection structure of (2)).

The effect of the remaining external variables is encoded in F_j⁰_Z andF_DZ⁰ without any pre-defined zero entries.

Denote the noise in (10) as:

v˘_j

˘ vD

=

F˘_jj⁰ 0 0 F˘_DD⁰

v_j vD

+

F˘_j⁰_Z F˘_DZ⁰

vZ. (12) Then by the Spectral Factorization Theorem [26], there exists a unique, monic, stable, minimum phase spectral factorH˘⁰:

˘vj

˘ vD

=

"

H˘_jj⁰ H˘_j⁰_D H˘_D⁰_j H˘_DD⁰

#

˘ ej

˘ eD

. (13)

where[˘ej ˘e_D^T]^T is a white noise process.

In the following text it is shown that a network of the form (10) can be constructed using ideas from graph theory.

In graph theory, one way to remove nodes from a graph is by constructing animmersedgraph. A graph G⁰ is an immersion of G if G⁰ can be constructed from G by lifting pairs of adjacent edges and then deleting isolated nodes [27]. Lifting an edge is defined as follows. Given three adjacent nodes a, b,c, connected by edgesab andbc, the lifting of path abcis defined as removing edgesabandbcand replacing them with the edge ac. In Fig. 2 an immersed graph of the network of Example 1 is constructed by first removing the node w₃ and connecting v₃ to w₂ andw₆, and subsequently removing w₆ and connecting v₆ tow₅ andw₄.

In this way an immersed network can be constructed by an algorithm that manipulates the dynamics of the network iteratively. To keep track of the changes in the transfer functions iteratively, let G⁽ⁱ⁾mn and Fmn⁽ⁱ⁾ denote the transfer functions of the direct connections wn to wm and from rn

andvn towm, respectively, at iterationi of the algorithm.

Algorithm 3: Constructing an immersed network.

1. Initialize. Start with the original network:

• G⁽⁰⁾mn=G⁰_mn for allm, n∈ {1, . . . , L}, and

• F_kk⁽⁰⁾= 1, for allk∈ R ∪ V,Fmn⁽⁰⁾ = 0otherwise.

2. Remove eachw_k,k∈ Z_j from the network, one at a time.

Letd=card(Z_j). LetZ_j={k₁, . . . , k_d}.

fori= 1 :d

(a) Let Iki denote the set of internal variables with edges to w_k_i. Let Oki denote the set of nodes with edges from w_k_i. Lift all paths w_n → w_k_i →w_m, n∈ I_k_i, m∈ O_k_i. The transfer function of each new edge from wn →wm isG⁽ⁱ⁾mn=G⁽ⁱ⁻¹⁾_mk

i G⁽ⁱ⁻¹⁾_k

in . (b) LetI_k^r

i denote the set of external or disturbance variables with edges tow_k_i. Lift all paths r_n → w_k_i → wm,n∈ I_k^r

i,m∈ Ok_i. The transfer function for each new edge fromr_n→w_mis Fnm⁽ⁱ⁾ =F_nk⁽ⁱ⁻¹⁾

i G⁽ⁱ⁻¹⁾_k

in . (c) If there are multiple edges between two nodes, merge

the edges into one edge. The transfer function of the merged edge is equal to the sum of the transfer functions of the edges that are merged.

(d) remove the nodewki from the network.

end

3. Remove all self-loops from the network. If node wm has a self loop, then divide all the edges enteringwm by(1− G^(d)mm(q))(i.e. one minus the loop transfer function).

Let G˘ⁱ⁰ and F˘ⁱ⁰ denote the final transfer matrices of the immersed network.

Remark 2:Algorithm 3 has a close connection to Mason’s Rules [28], [29]. However, Mason was mainly concerned with the calculation of the transfer function from thesources (external and noise variables) to asink(internal variable). This is equivalent to obtaining the immersed network withDj =∅, i.e. all internal variables except one are removed. Importantly, Algorithm 3 is an iterative algorithm which allows for easy implementation (even for large networks), whereas Mason’s rules are not iterative and complicated even for small networks.

w1

G⁰₂₁

w2

G⁰23 G⁰32

w3

G⁰63

w6

G⁰56

G⁰65

w5

G⁰₅₄ w4

G⁰₄₆

G⁰₁₄

G⁰₅₂ v1

v2

v3

v4

v5

v6

w1

G˘⁰₂₁

w2

G˘⁰₅₂ w5

G˘⁰₅₄ G˘⁰₄₅ w4 G˘⁰₁₄

G˘⁰₄₂

˘ v1

˘ v2

˘ v4

˘ v5

(a) (b)

Fig. 3. (a)Original dynamic network considered in Example 2.(b)Immersed network withw3andw6removed.

Example 2:Consider the dynamic network shown in Figure 3a. The graph of this network is shown in the first graph of Fig.

2. Supposew3andw6are to be removed from the network (i.e.

Zj={3,6}). By Algorithm 3 the network shown in Figure 3b

(7)

results. The transfer functions of the immersed network are:

G˘ⁱ⁰(q,Dj)=







0 0 G⁰₁₄ 0

G⁰₂₁

1−G⁰₂₃G⁰₃₂ 0 0 0

0 G⁰₃₂G⁰₄₆G⁰₆₃ 0 G⁰₄₆G⁰₆₅ 0 ^G⁰⁵²_1−G^+G⁰⁵⁶0^G⁰⁶³^G⁰³²

56G⁰₆₅

G⁰₅₄

1−G⁰₅₆G⁰₆₅ 0







F˘ⁱ⁰(q,Dj)=







1 0 0 0 0 0

0 _1−G¹₀

23G⁰₃₂ 0 0 _1−G^G₀⁰²³

23G⁰₃₂ 0 0 0 1 0 G⁰₄₆G⁰₆₃ G⁰₄₆ 0 0 0 _1−G¹0

56G⁰₆₅

G⁰₅₆G⁰₆₃ 1−G⁰₅₆G⁰₆₅

G⁰₅₆ 1−G⁰₅₆G⁰₆₅





 .

Note that the immersed network (shown in Figure 3b) is represented by the last graph shown in Figure 2.

Interestingly, the matrix F˘ⁱ⁰ in Example 2 has the same structure as that of F˘⁰ in Proposition 1. This alludes to a connection between the network characterized in (10) and immersed networks as defined by Algorithm 3.

Proposition 2: The matrices G˘⁰ and F˘⁰ of the network characterized by (10) and the matrices G˘ⁱ⁰ andF˘ⁱ⁰ defined

by Algorithm 3 are the same.

The proof is in Appendix X-B. Since, by Proposition 2 the matrices in (10) are the same as those of the immersed network, the superscript i will be dropped from this point on in the matrices defined by Algorithm 3. An important consequence of Proposition 2 is that (by Proposition 1) the immersed network is unique.

Instead of calculating the matrices of the immersed network iteratively, it is also possible to derive analytic expressions for the matrices G˘⁰andF˘⁰.

Proposition 3: Consider a dynamic network as defined in (2) that satisfies Assumption 1. For a given set {j} ∪ Dj the transfer function matricesG˘⁰andF˘⁰of the immersed network are:²

"

0 G˘⁰_j_D G˘_D⁰_j G˘_DD⁰

#

= 1−G˜jj

I−diag( ˜G_DD⁰ )

⁻¹ 0 G˜⁰_j_D G˜_D⁰_j G˜_DD⁰ −diag( ˜G_DD⁰ )

F˘_jj⁰ 0 F˘_j⁰_Z 0 F˘_DD⁰ F˘_DZ⁰

= 1−G˜jj

I−diag( ˜G_DD⁰ ) ⁻¹

1 0 F˜_j⁰_Z 0 I F˜_DZ⁰

where G˜jj G˜jD

G˜Dj G˜DD

=

0 G⁰_j_D G_D⁰_j G_DD⁰

+ G⁰_j_Z

G_DZ⁰

(I−G_ZZ⁰ )⁻¹

G_Z⁰_j G_ZD⁰ ,

F˜jZ

F˜DZ

= G⁰_j_Z

G_DZ⁰

(I−G_ZZ⁰ )⁻¹.

The proof is in Appendix X-C. The transfer functionsG˜_mn correspond to G^(d)mn in Step 3 of Algorithm 3.

The immersed network inherits some useful properties from the original network.

Lemma 1: Consider a dynamic network as defined in (2) that satisfies Assumption 1 and a given set {j} ∪ Dj. 1. Consider the paths from wn to wm, n, m∈ Dj that pass

only through nodesw`,`∈ Zj in the original network. If

2The argumentsq orDj (or both) of G˘⁰_jk(q,Dj) andF˘_jk⁰(q,Dj)are sometimes dropped for notational clarity.

w1

G⁰₂₁ G⁰₁₂ w2

G⁰₃₁ w3

G⁰₂₃

G⁰₄₁ w4

G⁰₅₄ w5 G⁰25

G⁰₁₅

v1 r1

v2

v3

v4

v5

r5

Fig. 4. Network analyzed in Examples 3 and 7.

all these paths andG⁰_mn(q) have a delay (are zero), then G˘⁰_mn(q,Dj)has a delay (is zero).

2. Consider the paths fromrn towm(orvn towm),n∈ Zj, m∈ Dj. If all these paths pass through at least one node w`,`∈ Dj thenF˘_mn⁰ (q,Dj) = 0.

For a proof see Appendix X-D.

B. Conditions to EnsureG˘⁰_ji(q,Dj) =G⁰_ji(q)

A central theme in the previous section was that the transfer function G˘⁰_ji(Dj) in the immersed network may not be the same as the transfer function G⁰_ji in the original network.

In other words, by selecting a subset of internal variables to be taken into account, the dynamics between two internal variables might change. In this section conditions are presented under which the module of interest,G⁰_ji, remains unchanged in the immersed network, i.e.G˘⁰_ji(q,Dj) =G⁰_ji(q).

The following two examples illustrate two different phe- nomena related to the interconnection structure that can cause the dynamicsG˘⁰_ji(q,Dj)to be different fromG⁰_ji(q).

Example 3:Consider the dynamic network





 w1

w₂ w₃ w₄ w₅







=







0 G⁰₁₂ 0 0 G⁰₁₅ G⁰₂₁ 0 G⁰₂₃ 0 G⁰₂₅ G⁰₃₁ 0 0 0 0 G⁰₄₁ 0 0 0 0

0 0 0 G⁰₅₄ 0











 w1

w₂ w₃ w₄ w₅





 +





 v1+r1

v₂ v₃ v₄ v₅+r₅





 .

shown in Fig. 4. The objective of this example, is to chooseD2

such that in the immersed network G˘⁰₂₁(D2) =G⁰₂₁ (denoted in green). A key feature of the interconnection structure in this example is that there are multiple paths fromw1 tow2: w1→w2,w1→w3→w2,w1→w4→w5→w2, etc..

Start by choosingD2={1}, then by Proposition 3, G˘⁰₂₁(q,{1}) =G⁰₂₁(q)+G⁰₂₃(q)G⁰₃₁(q)+G⁰₂₅(q)G⁰₅₄(q)G⁰₄₁(q).

Two of the terms comprising this transfer function correspond to the two paths fromw₁tow₂that pass only throughw_k,k∈ Z₂ (Z₂={3,4,5}). From Algorithm 3 this is not surprising since the paths G⁰₂₃G⁰₃₁ and G⁰₂₅G⁰₅₄G⁰₄₁ must be lifted to remove the nodesw₃, w₄ andw₅ from the original network.

Clearly, for this choice ofD2,G˘⁰₂₁(D2)6=G⁰₂₁. Now choose D₂={1,5}. By Proposition 3

G˘⁰₂₁(q,{1,5}) =G⁰₂₁(q) +G⁰₂₃(q)G⁰₃₁(q).

Again, one of the terms comprisingG˘⁰₂₁(q,{1,5})corresponds to the (only) path fromw1tow2that passes only throughwk, k∈ Z2 (Z2={3,4}).

(8)

Finally, choose D₂ = {1,3,5}. By Proposition 3 G˘⁰₂₁(q,{1,3,5}) =G⁰₂₁(q)as desired. Note that for this choice of D2 every path exceptG⁰₂₁ from w2 tow1 is ”blocked” by

a node inD2.

In general, one internal variable wk from every independent path wi to wj must be included in Dj to ensure that G˘⁰_ji(q,Dj) =G⁰_ji(q). This is proved later in Proposition 4.

However, before presenting the proposition, there is a sec- ond phenomenon related to the interconnection structure of the network that can cause the dynamicsG˘⁰_ji(q,D_j)to be different from G⁰_ji(q), as illustrated in the next example.

w1

G⁰₂₁

G⁰₁₂ w2

G⁰₃₂

G⁰₂₃ w3

v1

r1

v2

v3

Fig. 5. Network that is analyzed in Example 4.

Example 4: Consider the network shown in Fig. 5. The objective of this example, is to choose D₂ such that in the immersed network G˘⁰₂₁(D2) =G⁰₂₁(denoted in green).

Note that in this network there is only one independent path from w1 tow2. ChooseD2={1}. By Proposition 3

G˘⁰₂₁(q,{1}) = G⁰₂₁(q) 1−G⁰₂₃(q)G⁰₃₂(q)

which is not equal toG⁰₂₁(q)as desired. The reason the factor

1

1−G⁰₂₃G⁰₃₂ appears is because when lifting the pathG23G32a self-loop fromw2tow2results. Thus, in step 3 of Algorithm 3 the transfer functions of the edges coming intow2are divided by the loop transfer function.

For the choice D2={1,3}, G˘⁰₂₁({1,3}) =G⁰₂₁ as desired.

Note that in for this choice of D₂, all paths fromw₂ to w₂

are ”blocked” by a node in D₂.

In general, ifD_j is chosen such that no self-loops fromw_j towj result due to the lifting of the paths when constructing the immersed network, the denominator in Step 3 of Algorithm 3 is reduced to 1. From these two examples we see that:

• Every parallel path fromwi towj should run through an input in the predictor model, and

• Every loop on the outputwj should run through an input in the predictor model.

This is formalized in the following proposition.

Proposition 4: Consider a dynamic network as defined in Section II-A that satisfies Assumption 1. The transfer function G˘⁰_ji(q,Dj)in the immersed network is equal to G⁰_ji(q)ifDj

satisfies the following conditions:

(a) i∈ Dj,j /∈ Dj,

(b) every pathwitowj, excluding the pathG⁰_ji, goes through a nodewk,k∈ Dj,

(c) every loop wj towj goes through a nodewk,k∈ Dj. The proof is in Appendix X-E. The formulated conditions are used to make appropriate selections for the node variables that are to be measured and to be used as predictor inputs. In the following section it is shown that it is possible to identify the dynamics of the immersed network.

C. Estimated Dynamics in Predictor Model

In this section it is shown that the estimated dynamics between the predictor inputs and the module output w_j, are equal to G˘⁰_jk(Dj). The result confirms that the estimated dynamics are a consequence of the interconnection structure and the chosen predictor inputs. In addition conditions are presented that ensure that the estimates of G˘⁰_jk(Dj) are consistent. The results in this section are not specific to a particular identification method.

To concisely present the result, it is convenient to have a notation for a predictor which is a generalization of both the Direct and Two Stage Methods. Consider the predictor

ˆ

w_j(t|t−1, θ) =H_j⁻¹(q,θ) X

k∈Dj

G_jk(q,θ)w^(X)_k (t)

+ X

k∈Pj

F_jk(q,θ)r_k(t) +

1−H_j⁻¹(q,θ)

w_j(t) (14) where X denotes a (sub)set of the variables r_k, v_k, k ∈ {1, . . . , L}. Note that both predictors (4) and (9) are special cases of the predictor (14). For the Direct Method, choose X = {rk₁, . . . , rk_n, v`₁, . . . , v`_n}, where {k1, . . . , kn} = R, and {`1, . . . , `_n} = V. Then w_k⁽^X⁾ =w_k. For the Two Stage Method, choose X ={r_k₁, . . . , r_k_n}, where {k₁, . . . , k_n} = T_j.

A key concept in the analysis of this section is theoptimal output error residual, which will be discussed next. From (10), w_j can be expressed in terms ofw_k,k∈ D_j as

wj=X

k∈Dj

G˘⁰_jkwk+X

k∈Zj∩Rj

F˘_jk⁰rk+X

k∈Zj∩Vj

F˘_jk⁰vk+vj+rj. (15)

Note that by Lemma 1 someF˘_jk⁰(q,Dj)may be zero depend- ing on the interconnection structure. Letwk be expressed in terms of a component dependent on the variables in X, and a component dependent on the remaining variables, denoted w_k=w^(X)_k +w^(⊥X)_k . In addition, split the sum involving the r_k-dependent terms according to whether r_k is in P_j or not.

Then, from (15):

wj=X

k∈Dj

G˘⁰_jkw_k^(X)+X

k∈Dj

G˘⁰_jkw^(⊥X)_k +X

k∈Pj

F˘_jk⁰rk

+ X

k∈((Zj∪{j})∩Rj)\Pj

F˘_jk⁰r_k+ X

k∈Zj∩Vj

F˘_jk⁰v_k+v_j. (16)

When choosing an Output Error predictor (i.e.Hj(q, θ) = 1), with predictor inputs w_k^(X⁾,k∈ Dj andr_k,k∈ Pj, the part of (16) that is not modeled can be lumped together into one term. This term is theoptimal output error residualofw_j, and is denotedp_j:

pj(Dj) := X

k∈Dj

G˘⁰_jkw^(⊥X_k ⁾+ X

k∈((Zj∪{j})∩Rj)\Pj

F˘_jk⁰rk+ ˘vj, (17)

wherev˘_jis given byP

k∈Zj∩Vj

F˘_jk⁰v_k+v_jin accordance with (12). Consequently,wj equals:

wj = X

k∈Dj

G˘⁰_jkw^(X)_k + X

k∈Pj

F˘_jk⁰rk+pj. (18)