Chapter 3
First principles models of bioprocesses versus Hybrid neural network models
In this chapter, we attempt to answer the following question: “Which kind of macroscopic models must we use to obtain the best possible mathematical representation of a bioprocess?”.
In chapter 2, three kinds of macroscopic models were presented: the first principles (section 2.2.3.1), the black box (section 2.2.3.2) and the hybrid models (section 2.2.3.3). All these modelling techniques have their advantages and their drawbacks. Nevertheless, chapter 3 proposes to compare only two of them. The black box models are not considered. Indeed, numerous studies already compare different neural networks (the most common black boxes) with the corresponding serial hybrid models (Psichogios and Ungar, 1992; James et al., 2002). The results of these works converge to the same conclusion: hybrid models, which combine the available prior knowledge with the neural network nonlinear mapping capabilities, provide better results in terms of extrapolation.
Hence, this chapter rejects pure neural networks without any physical meaning, in order to evaluate models which exploit the available prior knowledge, the well known following mass balances:
$$\frac{d\boldsymbol{\xi}(t)}{dt} = \mathbf{K}\,\boldsymbol{\varphi}(\boldsymbol{\xi},t) - D(t)\,\boldsymbol{\xi}(t) + \mathbf{F}(t) - \mathbf{Q}(t) \tag{3.1}$$
However, not all hybrid architectures are considered. Indeed, we put the parallel approach aside as it is not commonly used. Serial hybrid modelling is usually preferred since it avoids the difficult selection of a reaction scheme and/or a kinetic model structure. Hence, we propose to compare first principles models with hybrid neural network models in which the neural network replaces either only the kinetics or the whole reaction term, as shown in the following equations
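$$\frac{d\boldsymbol{\xi}(t)}{dt} = \mathbf{K}\,\mathbf{NN}(\boldsymbol{\xi},t) - D(t)\,\boldsymbol{\xi}(t) + \mathbf{F}(t) - \mathbf{Q}(t) \tag{3.2}$$
$$\frac{d\boldsymbol{\xi}(t)}{dt} = \mathbf{NN}(\boldsymbol{\xi},t) - D(t)\,\boldsymbol{\xi}(t) + \mathbf{F}(t) - \mathbf{Q}(t) \tag{3.3}$$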
where NN ( ξ , t ) represents the neural network.
Regarding the neural networks in these expressions, we test the MultiLayer Perceptron (MLP) and the Radial Basis Function (RBF) network, which are the most widespread universal approximators. More precisely, the selected MLP is a neural network with one sigmoid hidden layer and a linear output layer (2.39). Cybenko (1989) demonstrated its capability to approximate any nonlinear continuous function arbitrarily accurately.
In order to compare the white and grey box versions efficiently, an efficient first principles model has to be considered. Hence, we select the general kinetic model structure (2.32) proposed by Bogaerts (1999), which can represent any activation and/or inhibition effect of each component in the culture and comes with a systematic identification procedure based on its linearization.
Concerning the determination of a good reaction scheme, we adopt the
systematic procedure (Hulhoven et al., 2005) to generate and compare a group
of reaction schemes given a set of component measurements. It is useful when
no knowledge is available on the relations between the macroscopic species to
be considered.
This comparison, which relies on direct and cross validation results as well as on the parameter estimation errors, is based on three different databases. The first two datasets are obtained with a simulator which reproduces the behaviour of a simple microbial growth process. The first database contains batch cultures while the second one consists of fed-batch cultures. Tests on these databases show how first principles and hybrid models handle simple case studies. On the other hand, the capabilities of macroscopic models in real complex cases are tested with the third database, which consists of 7 fed-batch bacterial cultures from a bioprocess used by one of our industrial partners to produce an enzyme.
This chapter begins with the description of the different systematic parameter identification methods. First, we present the systematic procedure of Hulhoven et al. (2005) for the determination of a good reaction scheme. Afterwards, we describe how to identify the kinetic parameters of the general model (2.32).
Lastly, we propose systematic identification methods for the neural network parameters. After this discussion about parameter estimation (section 3.1), we present our three databases (section 3.2) before considering the comparison of the different macroscopic models (section 3.3). Finally, some conclusions are drawn in section 3.4.
3.1 Systematic parameter identification methods
This section describes the different parameter identification methods used in this work. First, we explain the systematic procedure of Hulhoven et al. (2005), which generates C-identifiable reaction schemes given a set of components whose measurements are available. The comparison of the obtained pseudo-stoichiometries leads to the best possible reaction scheme identifiable independently of the kinetics.
Afterwards, we explain how to estimate the kinetic parameters of the model (2.32) and finally how to identify the parameters of serial hybrid models based on a MultiLayer Perceptron or a Radial Basis Function network. This section ends with remarks on the optimization algorithms used in this work to identify the parameters.
3.1.1 Systematic identification of the pseudo-stoichiometry
The systematic procedure developed in Hulhoven et al. (2005) to generate and compare C-identifiable reaction schemes relies on the decoupled identification method proposed in Bastin and Dochain (1990), which allows the pseudo-stoichiometric coefficients to be estimated independently of the kinetics.
Moreover, it uses the necessary and sufficient condition of identifiability of the pseudo-stoichiometric coefficients presented in Chen and Bastin (1996). Based on the maximum likelihood estimators proposed in (Bogaerts et al., 2003; Bogaerts and Hanus, 2000), this systematic procedure estimates the pseudo-stoichiometric coefficients of each identifiable reaction scheme while taking all the measurement errors into account. For each given number of reactions, the final values of the cost function are used to compare and rank the candidate reaction schemes.
The next section describes the decoupling method and the condition which make it possible to estimate the pseudo-stoichiometry and the kinetics independently. Section 3.1.1.2 deals with the systematic generation of the C-identifiable reaction schemes while section 3.1.1.3 tackles the problem of parameter estimation.
3.1.1.1 Identification of the pseudo-stoichiometry independently of the kinetics
The decoupling method, which makes it possible to estimate the pseudo-stoichiometry independently of the kinetics, relies on a structural property of the general dynamic model (3.1) and a state-space transformation. This section briefly presents their principles.
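If $\mathbf{K} \in \Re^{N \times M}$ is of rank $M$ ($M \le N$)
$$\operatorname{rank}(\mathbf{K}) = M \tag{3.4}$$
then there always exists a partition of the state vector $\boldsymbol{\xi}$
$$\boldsymbol{\xi} = \begin{bmatrix} \boldsymbol{\xi}_a^T & \boldsymbol{\xi}_b^T \end{bmatrix}^T \tag{3.5}$$
so that the corresponding partition of $\mathbf{K}$
$$\mathbf{K} = \begin{bmatrix} \mathbf{K}_a^T & \mathbf{K}_b^T \end{bmatrix}^T \tag{3.6}$$
involves a matrix $\mathbf{K}_a \in \Re^{M \times M}$ of full rank
$$\operatorname{rank}(\mathbf{K}_a) = M \tag{3.7}$$
Given such a partition of $\mathbf{K}$, the following matrix equation
$$\mathbf{C}\,\mathbf{K}_a + \mathbf{K}_b = \mathbf{0}_{N-M,\,M} \tag{3.8}$$
has a unique solution $\mathbf{C} \in \Re^{(N-M) \times M}$.
It is therefore possible to define an auxiliary vector $\mathbf{z} \in \Re^{N-M}$:
$$\mathbf{z}(t) = \mathbf{C}\,\boldsymbol{\xi}_a(t) + \boldsymbol{\xi}_b(t) \tag{3.9}$$
whose dynamics are independent of the kinetics $\boldsymbol{\varphi}(\boldsymbol{\xi},t)$:
$$\frac{d\mathbf{z}(t)}{dt} = -D(t)\,\mathbf{z}(t) + \mathbf{C}\,\mathbf{u}_a(t) + \mathbf{u}_b(t) \tag{3.10}$$
where the corresponding partition of $\mathbf{u} = \mathbf{F} - \mathbf{Q}$ is given by
$$\mathbf{u} = \begin{bmatrix} \mathbf{u}_a^T & \mathbf{u}_b^T \end{bmatrix}^T \tag{3.11}$$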
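The independence of (3.10) from the kinetics can be verified in one step. The short derivation below is a sketch that only uses the partitioned form of the mass balances (3.1) together with (3.8) and (3.9); no additional assumption is introduced.

```latex
\begin{aligned}
\frac{d\mathbf{z}}{dt}
  &= \mathbf{C}\,\frac{d\boldsymbol{\xi}_a}{dt} + \frac{d\boldsymbol{\xi}_b}{dt} \\
  &= \mathbf{C}\left(\mathbf{K}_a\boldsymbol{\varphi} - D\,\boldsymbol{\xi}_a + \mathbf{u}_a\right)
     + \left(\mathbf{K}_b\boldsymbol{\varphi} - D\,\boldsymbol{\xi}_b + \mathbf{u}_b\right) \\
  &= \underbrace{\left(\mathbf{C}\mathbf{K}_a + \mathbf{K}_b\right)}_{=\,\mathbf{0}\ \text{by (3.8)}}\boldsymbol{\varphi}
     - D\left(\mathbf{C}\boldsymbol{\xi}_a + \boldsymbol{\xi}_b\right)
     + \mathbf{C}\mathbf{u}_a + \mathbf{u}_b \\
  &= -D\,\mathbf{z} + \mathbf{C}\mathbf{u}_a + \mathbf{u}_b
\end{aligned}
```

The reaction term vanishes because C is chosen precisely so that the combination of rows of K defining z is zero.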
An estimate $\hat{\mathbf{C}}$ of the matrix $\mathbf{C}$ can be computed on the basis of equations (3.9) and (3.10) and measurements of $\boldsymbol{\xi}_a$ and $\boldsymbol{\xi}_b$. Finally, estimates of $\mathbf{K}_a$ and $\mathbf{K}_b$ can be deduced from $\hat{\mathbf{C}}$ using (3.8). This last estimation is unique if the necessary and sufficient condition of C-identifiability (Chen and Bastin, 1996) is fulfilled:
Let $\mathbf{k}_i$ be a vector containing the unknown elements in the $i$-th column of the matrix $\mathbf{K}$. $\mathbf{k}_i$ is C-identifiable if and only if there exists at least one partition $\mathbf{K}_a \in \Re^{M \times M}$ with $\operatorname{rank}(\mathbf{K}_a) = M$ which does not contain any element of $\mathbf{k}_i$.
From this condition, a sufficient condition of C-identifiability can be deduced, which will be useful in the following:
A reaction scheme (and the associated pseudo-stoichiometry matrix) is C-identifiable if there exists a partition $\mathbf{K} = \begin{bmatrix} \mathbf{K}_a^T & \mathbf{K}_b^T \end{bmatrix}^T$ where $\mathbf{K}_a \in \Re^{M \times M}$ is of full rank and does not contain any unknown coefficient of $\mathbf{K}$.
Note that if this condition is fulfilled, then $\mathbf{K}_a$ is known and the estimate $\hat{\mathbf{K}}_b$ is simply deduced from (3.8) by
$$\hat{\mathbf{K}}_b = -\hat{\mathbf{C}}\,\mathbf{K}_a \tag{3.12}$$
This condition will be further exploited in the next section in order to develop a systematic generation and evaluation procedure of C-identifiable reaction schemes.
3.1.1.2 Systematic generation of C-identifiable reaction schemes
The aim of this procedure is to generate, given a set of $N$ macroscopic species and a number $M$ of reactions, the C-identifiable reaction schemes associated with a diagonal submatrix $\mathbf{K}_a$. As the number of reactions is a priori unknown, all the solutions with $M \le N$ have to be tested systematically.
For a given number of reactions $M$, candidate reaction schemes are characterized by a pseudo-stoichiometry matrix (of rank $M$) for which there exists a partition $\mathbf{K} = \begin{bmatrix} \mathbf{K}_a^T & \mathbf{K}_b^T \end{bmatrix}^T$ where $\mathbf{K}_a \in \Re^{M \times M}$ is of full rank and does not contain any unknown coefficient of $\mathbf{K}$ (according to the sufficient condition of C-identifiability).
Within the yield coefficient matrix $\mathbf{K}$, each column $j \in [1, M]$ contains null coefficients (w.r.t. species not involved in the reaction $j$), unknown coefficients and one coefficient $\mathbf{K}(i,j)$ whose value is fixed to $\pm 1$ (denoting that the $j$-th reaction is normalized w.r.t. the $i$-th component, whose yield coefficient is equal to $+1$ if it is produced and $-1$ if it is consumed). As the sufficient condition of C-identifiability requires that $\mathbf{K}_a$ does not contain any unknown coefficient of $\mathbf{K}$, all the unknown coefficients belong to $\mathbf{K}_b$. In turn, each column of $\mathbf{K}_a$ only contains null coefficients and one coefficient equal to $\pm 1$. Moreover, the C-identifiability condition imposes a rank condition on the matrix $\mathbf{K}_a$ ($\operatorname{rank}(\mathbf{K}_a) = M$). In order to respect this latter condition, each row of $\mathbf{K}_a$ only contains one nonzero element. Therefore, the matrix $\mathbf{K}_a$ takes the following form
$$\mathbf{K}_a = \operatorname{diag}\{\pm 1, \pm 1, \ldots, \pm 1\} \tag{3.13}$$
As Hulhoven et al. (2005) point out, it appears from this definition of the matrix $\mathbf{K}_a$ that the number of reactions has to be smaller than the number of macroscopic species ($M < N$). Indeed, $M = N$ implies $\mathbf{K}_a = \mathbf{K}$. In this particular case, the reaction scheme would correspond to $N$ reactions, each involving only one macroscopic species. Such a reaction scheme is meaningless from a biological point of view.
Which elements of $\boldsymbol{\xi}$ do we have to select in order to define the partition $\boldsymbol{\xi} = \begin{bmatrix} \boldsymbol{\xi}_a^T & \boldsymbol{\xi}_b^T \end{bmatrix}^T$ inducing the partition $\mathbf{K} = \begin{bmatrix} \mathbf{K}_a^T & \mathbf{K}_b^T \end{bmatrix}^T$ where $\mathbf{K}_a$ is of the form (3.13)? The idea is to test all the possible combinations of a set of $M$ elements ($\boldsymbol{\xi}_a \in \Re^{M}$) from the set of $N$ available macroscopic species. The number of possible combinations is thus given by:
$$\binom{N}{M} = \frac{N!}{M!\,(N-M)!} \tag{3.14}$$
The total number of combinations, for all the possible numbers of reactions, is the following
$$\sum_{M=1}^{N-1} \binom{N}{M} = \sum_{M=1}^{N-1} \frac{N!}{M!\,(N-M)!} = 2^N - 2 \tag{3.15}$$
Once a combination is selected, the choice of the values $+1$ or $-1$ for the diagonal elements of $\mathbf{K}_a$ has to be made. This choice is obvious when we know a priori whether a component is always consumed (value $-1$) or always produced (value $+1$). Such a component is then characterized by an indicator $\delta\xi_j = -1$ or $\delta\xi_j = +1$, respectively. When this a priori knowledge is not available, Hulhoven et al. (2005) suggest a computational procedure which deduces from measurements of $\boldsymbol{\xi}_a$ and $\boldsymbol{\xi}_b$, the corresponding values of $D$, $\mathbf{u}_a$, $\mathbf{u}_b$, the equations (3.1) and (3.13) as well as from the inequality $\varphi_j(t) \ge 0 \;\; \forall j, t$ that
$$\frac{d\xi_j(t)}{dt} + D(t)\,\xi_j(t) - u_j(t) \ge 0 \;\; (\le 0) \quad \forall t \tag{3.16}$$
$$\rightarrow \quad \delta\xi_j = +1 \;\; (-1) \tag{3.17}$$
and
$$\mathbf{K}_a(j,j) = \delta\xi_j \tag{3.18}$$
However, the time derivative in (3.16) has to be approximated and the sign of the inequality be tested statistically on the basis of the noisy measurements.
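The sign test on (3.16) can be sketched as follows. This is a minimal illustration, not the procedure of Hulhoven et al. (2005): the derivative is approximated by finite differences between consecutive samples, and the statistical test is replaced by a simple tolerance band `tol` (a hypothetical parameter standing in for the noise level). The function name `sign_indicator` is also an assumption of this sketch.

```python
def sign_indicator(t, xi, D, u, tol):
    """Return the indicator of one component: +1, -1 or 0.

    t, xi, D, u are lists of samples for a single component.
    tol is a hypothetical noise tolerance replacing a proper statistical test.
    """
    r = []
    for k in range(len(t) - 1):
        # finite-difference approximation of d(xi)/dt on [t_k, t_{k+1}]
        dxi = (xi[k + 1] - xi[k]) / (t[k + 1] - t[k])
        # midpoint values of the other terms of (3.16)
        xm = 0.5 * (xi[k] + xi[k + 1])
        Dm = 0.5 * (D[k] + D[k + 1])
        um = 0.5 * (u[k] + u[k + 1])
        r.append(dxi + Dm * xm - um)
    if max(r) <= tol and min(r) >= -tol:
        return 0   # undetermined within the tolerance: keep the component in xi_b
    if all(x >= -tol for x in r):
        return +1  # always produced
    if all(x <= tol for x in r):
        return -1  # always consumed
    return 0       # produced and consumed: the component must belong to xi_b


# Batch example (D = 0, u = 0) with a monotonically increasing concentration
t = [0.0, 1.0, 2.0, 3.0]
zeros = [0.0] * 4
print(sign_indicator(t, [1.0, 2.0, 4.0, 8.0], zeros, zeros, tol=0.1))
```

With real data, the tolerance would have to be derived from the measurement error covariances rather than fixed by hand.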
Moreover, certain components do not satisfy either of the two inequalities (3.16): they are consumed in some reaction(s) and produced in other one(s). The concentration of such a component cannot belong to the vector $\boldsymbol{\xi}_a$ as it is not possible to reproduce simultaneous consumption and production of a component by only one reaction. Such a component is then characterized by $\delta\xi_j = 0$ and has to belong to the vector $\boldsymbol{\xi}_b$. This reduces the total number of possible solutions (3.15) to
$$\sum_{M=1}^{N-N_0-1} \binom{N-N_0}{M} = \sum_{M=1}^{N-N_0-1} \frac{(N-N_0)!}{M!\,(N-N_0-M)!} \tag{3.19}$$
where $N_0$ is the number of components which are concurrently consumed and produced.
Until now, possible partitions and their corresponding indicators $\delta\xi_j$ of the component concentrations have been determined. The next step is to write the corresponding matrices $\mathbf{K}_a$ under the form (3.13). Since these matrices satisfy the C-identifiability condition, it is possible to estimate the corresponding matrices $\mathbf{C}$ and to compute, from equation (3.12), the unknown yield coefficients contained in $\mathbf{K}_b$. A maximum likelihood estimator (Bogaerts et al., 2003), taking all the measurement errors on the different component concentrations into account, can be used to obtain the different estimates. It is presented in the following section.
For a given number of reactions, the different schemes have to be compared on the basis of the numerical values taken by the maximum likelihood cost function. The “best” scheme corresponds to the lowest value of this criterion.
This comparison is however not possible between reaction schemes with different numbers of reactions M . The selection between “best” candidate reaction schemes with different values of M has to be considered from another, more global, point of view, e.g. using the numerical values of a cost function for the identification of a complete simulation model, including pseudo-stoichiometry, kinetics and initial conditions (Bogaerts and Hanus, 2000).
Besides the sign of the diagonal elements of all the possible matrices $\mathbf{K}_a$, it is possible to have some a priori knowledge on the sign of the elements of $\mathbf{K}_b$. Let $\mathbf{K}_{b,constr}$ be the matrix of sign constraints on the unknown pseudo-stoichiometric coefficients of $\mathbf{K}_b$. The elements of $\mathbf{K}_{b,constr}$ impose the following constraints on the elements of $\mathbf{K}_b$
$$\mathbf{K}_{b,constr}(i,j) = -1 \;\rightarrow\; \mathbf{K}_b(i,j) \le 0 \tag{3.20}$$
$$\mathbf{K}_{b,constr}(i,j) = +1 \;\rightarrow\; \mathbf{K}_b(i,j) \ge 0 \tag{3.21}$$
$$\mathbf{K}_{b,constr}(i,j) = 0 \;\rightarrow\; \text{no constraints on } \mathbf{K}_b(i,j) \tag{3.22}$$
Since the objective of the optimization problem is to estimate the matrix $\mathbf{C}$, the constraints imposed on the elements of $\mathbf{K}_b$ have to be transposed to the elements of $\mathbf{C}$. A matrix $\mathbf{C}_{constr}$ can simply be deduced from the following relation:
$$\mathbf{C}_{constr} = -\mathbf{K}_{b,constr}\,\mathbf{K}_a^{-1} \tag{3.23}$$
and an estimation of $\mathbf{C}$ can be obtained through an optimization under the unilateral constraints imposed by $\mathbf{C}_{constr}$.
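Because the matrix K_a of (3.13) is diagonal with entries ±1, it is its own inverse, so relations (3.12) and (3.23) reduce to sign flips of the columns. The sketch below illustrates this with plain Python lists; the function names `kb_from_c` and `c_constr_from_kb_constr` and the numerical values are hypothetical.

```python
def kb_from_c(C, ka_diag):
    """K_b = -C K_a (3.12) for a diagonal K_a = diag(ka_diag)."""
    return [[-c * d for c, d in zip(row, ka_diag)] for row in C]


def c_constr_from_kb_constr(Kb_constr, ka_diag):
    """C_constr = -K_b,constr K_a^{-1} (3.23); K_a^{-1} = K_a when entries are +/-1."""
    return [[-k * d for k, d in zip(row, ka_diag)] for row in Kb_constr]


# Hypothetical case: M = 2 reactions, N - M = 1 remaining component
ka_diag = [+1, -1]                       # first species produced, second consumed
print(kb_from_c([[0.5, -2.0]], ka_diag))             # yield coefficients in K_b
print(c_constr_from_kb_constr([[-1, +1]], ka_diag))  # sign constraints on C
```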
To conclude, Hulhoven et al. (2005) summarize the procedure in the following steps:
(1) select a set of measurable macroscopic species which are relevant for system description and in agreement with the intended purpose of the model;
(2) for each component $\xi_j$, $j \in [1, N]$, assign a value of $\delta\xi_j$ equal to −1 if it is always consumed, +1 if it is always produced, and 0 if both situations may occur. This analysis results from a priori knowledge on the system or from a simple analysis based on equations (3.16);
(3) for all the possible numbers of reactions $M \in [1, N - 1 - N_0]$, the following steps have to be repeated:
a) for a given number $M$, determine the number of possible combinations of $M$ macroscopic species from a set of $N - N_0$ available species characterized by $\delta\xi_j \ne 0$;
b) for each of these combinations,
I. perform the corresponding permutations in the state vector $\boldsymbol{\xi}$ in order to determine the vector $\boldsymbol{\xi}_a$;
II. define the corresponding partition of the matrix $\mathbf{K}$, with the matrix $\mathbf{K}_a$ of the form $\mathbf{K}_a = \operatorname{diag}\{\delta\xi_j\}$;
III. compute an estimate $\hat{\mathbf{C}}$, under the constraints defined in (3.23);
IV. compute the corresponding estimate $\hat{\mathbf{K}}_b$ on the basis of (3.12);
c) classify the resulting reaction schemes according to the value of the cost function.
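The enumeration part of steps (2) to (3.b.II) can be sketched as follows. This is an illustrative outline under stated assumptions: the indicators are supplied by hand, the function name `candidate_partitions` is hypothetical, and the estimation steps (3.b.III), (3.b.IV) and (3.c) are not covered.

```python
from itertools import combinations


def candidate_partitions(delta):
    """Enumerate candidate (indices of xi_a, diagonal of K_a) pairs.

    delta[j] is the indicator of component j: -1 (always consumed),
    +1 (always produced) or 0 (both); components with 0 stay in xi_b.
    """
    eligible = [j for j, d in enumerate(delta) if d != 0]
    schemes = []
    # M runs from 1 to N - 1 - N0, i.e. strictly fewer than the eligible species
    for m in range(1, len(eligible)):
        for idx in combinations(eligible, m):
            ka_diag = [delta[j] for j in idx]  # K_a = diag{delta_j}
            schemes.append((idx, ka_diag))
    return schemes


# Hypothetical indicators for N = 4 species, one of which (index 2) has delta = 0
delta = [-1, +1, 0, +1]
for idx, ka_diag in candidate_partitions(delta):
    print(idx, ka_diag)
```

For this example, N − N_0 = 3 and the loop produces the 3 + 3 = 6 candidates counted by (3.19).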
3.1.1.3 Maximum likelihood estimation of the pseudo-stoichiometry
To perform step (3.b.III) of the systematic generation of the pseudo-stoichiometry described above, a maximum likelihood estimator is employed in Hulhoven et al. (2005); such a criterion takes all the measurement errors on all the signals and at each measurement time into account.
The authors identify the elements of the matrix $\mathbf{C}$ under unilateral constraints (inequality constraints imposed by $\mathbf{C}_{constr}$). Bogaerts et al. (2003) propose to enforce the sign constraints by identifying the unknown elements of $\mathbf{K}_b$ after a logarithmic transformation. Hence, if $\mathbf{k}_+$ and $-\mathbf{k}_-$ are respectively the vectors containing the positive and negative elements of $\mathbf{K}_b$, the new procedure identifies the vectors $\ln(\mathbf{k}_+)$ and $\ln(\mathbf{k}_-)$ without unilateral constraints, while respecting the sign constraints on the elements of $\mathbf{k}_+$ and $-\mathbf{k}_-$.
The method proposed by Bogaerts et al. (2003) is presented here.
On the basis of (3.9) and (3.10), measurements of $\boldsymbol{\xi}_a$ and $\boldsymbol{\xi}_b$ and the knowledge of $D$, $\mathbf{u}_a$ and $\mathbf{u}_b$, it is possible to estimate the matrix $\mathbf{K}_b$ independently of the kinetics. Indeed, the solution of (3.10) is given by
$$\mathbf{z}(t) = e^{-\int_0^t D(\tau)\,d\tau} \left( \mathbf{z}(0) + \int_0^t e^{\int_0^\tau D(\kappa)\,d\kappa} \left( \mathbf{C}\,\mathbf{u}_a(\tau) + \mathbf{u}_b(\tau) \right) d\tau \right) \tag{3.24}$$
which can be substituted into (3.9) to obtain the following equation:
$$\mathbf{C}\,\boldsymbol{\xi}_a(t) + \boldsymbol{\xi}_b(t) = e^{-\int_0^t D(\tau)\,d\tau} \left( \mathbf{z}(0) + \int_0^t e^{\int_0^\tau D(\kappa)\,d\kappa} \left( \mathbf{C}\,\mathbf{u}_a(\tau) + \mathbf{u}_b(\tau) \right) d\tau \right) \tag{3.25}$$
or
$$\boldsymbol{\eta}_b(t) = -\mathbf{C}\,\boldsymbol{\eta}_a(t) + e^{-\int_0^t D(\kappa)\,d\kappa}\,\mathbf{z}(0) \tag{3.26}$$
where
$$\boldsymbol{\eta}_a(t) = \boldsymbol{\xi}_a(t) - e^{-\int_0^t D(\tau)\,d\tau} \int_0^t e^{\int_0^\tau D(\kappa)\,d\kappa}\,\mathbf{u}_a(\tau)\,d\tau \tag{3.27}$$
and
$$\boldsymbol{\eta}_b(t) = \boldsymbol{\xi}_b(t) - e^{-\int_0^t D(\tau)\,d\tau} \int_0^t e^{\int_0^\tau D(\kappa)\,d\kappa}\,\mathbf{u}_b(\tau)\,d\tau \tag{3.28}$$
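The signals (3.27) and (3.28) have to be computed numerically from sampled data. The following sketch does this for a single component with the trapezoidal rule; the choice of quadrature and the function name `eta` are assumptions of this illustration, not something prescribed by the cited works.

```python
import math


def eta(t, xi, D, u):
    """Evaluate eta(t_k) = xi(t_k) - exp(-I(t_k)) * J(t_k) at every sample.

    I(t) is the integral of D from 0 to t and J(t) is the integral of
    exp(I(tau)) * u(tau) from 0 to t, both by the trapezoidal rule.
    t, xi, D, u are lists of samples for one component, with t[0] = 0.
    """
    I = 0.0  # running integral of the dilution rate
    J = 0.0  # running integral of exp(I) * u
    out = [xi[0]]
    for k in range(1, len(t)):
        dt = t[k] - t[k - 1]
        I_new = I + 0.5 * dt * (D[k - 1] + D[k])
        J += 0.5 * dt * (math.exp(I) * u[k - 1] + math.exp(I_new) * u[k])
        I = I_new
        out.append(xi[k] - math.exp(-I) * J)
    return out


# In batch mode (D = 0, u = 0) eta simply equals the measured concentration
print(eta([0.0, 1.0, 2.0], [1.0, 1.5, 2.5], [0.0] * 3, [0.0] * 3))
```

For a constant dilution rate D and a constant input u, the subtracted term has the closed form u(1 − e^{−Dt})/D, which gives a convenient check of the quadrature.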
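The model structure (3.26), evaluated at the measurement sampling times $t_{s,k}$ of an experiment $s$ ($s \in [1, S]$, $k \in [1, N_s]$), can finally be rewritten in the general form:
$$\mathbf{Y}_{s,k} = \boldsymbol{\theta}_s^T(\boldsymbol{\theta}_{st})\,\boldsymbol{\varphi}_{s,k} \tag{3.29}$$
with
$$\mathbf{Y}_{s,k} = \boldsymbol{\eta}_{b,s}(t_{s,k}) \in \Re^{N-M} \tag{3.30}$$
$$\boldsymbol{\varphi}_{s,k}^T = \begin{bmatrix} -\boldsymbol{\eta}_{a,s}^T(t_{s,k}) & e^{-\int_0^{t_{s,k}} D(\kappa)\,d\kappa} \end{bmatrix} \in \Re^{1 \times (M+1)} \tag{3.31}$$
and
$$\boldsymbol{\theta}_s^T(\boldsymbol{\theta}_{st}) = \begin{bmatrix} \mathbf{C}_{(1,:)}(\boldsymbol{\theta}_{st}) & z_s^{(1)}(0) \\ \vdots & \vdots \\ \mathbf{C}_{(N-M,:)}(\boldsymbol{\theta}_{st}) & z_s^{(N-M)}(0) \end{bmatrix} \in \Re^{(N-M) \times (M+1)} \tag{3.32}$$
where $\boldsymbol{\eta}_{b,s}$, $\boldsymbol{\eta}_{a,s}$ and $\mathbf{z}_s$ correspond to the culture conditions of experiment $s$, $\mathbf{C}_{(i,:)}$ is the $i$-th row of $\mathbf{C}$, $z_s^{(i)}$ is the $i$-th component of $\mathbf{z}_s$ and $\boldsymbol{\theta}_{st}$ is the vector of unknown parameters
$$\boldsymbol{\theta}_{st} = \begin{bmatrix} \ln(\mathbf{k}_+)^T & \ln(\mathbf{k}_-)^T & \mathbf{z}_1^T(0) & \cdots & \mathbf{z}_S^T(0) \end{bmatrix}^T \tag{3.33}$$
where $\mathbf{k}_+$ is the vector containing all the unknown coefficients of $\mathbf{K}_b$ under positivity constraints, $-\mathbf{k}_-$ is the vector containing the other coefficients, and $\mathbf{z}_i^T(0)$ corresponds to the initial condition of $\mathbf{z}$ for the $i$-th experiment.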
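The maximum likelihood estimation of $\boldsymbol{\theta}_{st}$ is given by
$$\hat{\boldsymbol{\theta}}_{st} = \underset{\boldsymbol{\theta}_{st}}{\operatorname{ArgMin}}\; \frac{1}{2} \sum_{s=1}^{S} \sum_{k=1}^{N_s} \left( \mathbf{Y}_{m,s,k} - \boldsymbol{\theta}_s^T(\boldsymbol{\theta}_{st})\,\boldsymbol{\varphi}_{m,s,k} \right)^T \left( \mathbf{Q}_{Y,s,k} + \boldsymbol{\theta}_s^T(\boldsymbol{\theta}_{st})\,\mathbf{Q}_{\varphi,s,k}\,\boldsymbol{\theta}_s(\boldsymbol{\theta}_{st}) \right)^{-1} \left( \mathbf{Y}_{m,s,k} - \boldsymbol{\theta}_s^T(\boldsymbol{\theta}_{st})\,\boldsymbol{\varphi}_{m,s,k} \right) \tag{3.34}$$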
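where $\mathbf{Y}_{m,s,k}$ and $\boldsymbol{\varphi}_{m,s,k}$ are measurements, differing from the true values $\mathbf{Y}_{s,k}$ and $\boldsymbol{\varphi}_{s,k}$ due to the presence of noises which are assumed to corrupt only the concentration measurements contained in $\boldsymbol{\xi}_b$ and $\boldsymbol{\xi}_a$. These noises are assumed to be white, normally distributed with zero mean and known time varying covariances $\mathbf{Q}_{Y,s,k}$ and $\mathbf{Q}_{\varphi,s,k}$:
$$\mathbf{Y}_{m,s,k} = \mathbf{Y}_{s,k} + \boldsymbol{\varepsilon}_{Y,s,k} \tag{3.35}$$
$$\boldsymbol{\varphi}_{m,s,k} = \boldsymbol{\varphi}_{s,k} + \boldsymbol{\varepsilon}_{\varphi,s,k} \tag{3.36}$$
$$E[\boldsymbol{\varepsilon}_{Y,s,k}] = \mathbf{0} \tag{3.37}$$
$$E[\boldsymbol{\varepsilon}_{\varphi,s,k}] = \mathbf{0} \tag{3.38}$$
$$E[\boldsymbol{\varepsilon}_{Y,s,k}\,\boldsymbol{\varepsilon}_{Y,t,l}^T] = \mathbf{Q}_{Y,s,k}\,\delta_{s,t}\,\delta_{k,l} \tag{3.39}$$
$$E[\boldsymbol{\varepsilon}_{\varphi,s,k}\,\boldsymbol{\varepsilon}_{\varphi,t,l}^T] = \mathbf{Q}_{\varphi,s,k}\,\delta_{s,t}\,\delta_{k,l} \tag{3.40}$$
$$E[\boldsymbol{\varepsilon}_{Y,s,k}\,\boldsymbol{\varepsilon}_{\varphi,t,l}^T] = \mathbf{0} \quad \forall s, t, k, l \tag{3.41}$$
where $E$ represents the mathematical expectation and $\delta_{k,l}$ is the Kronecker symbol.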
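The structure of the criterion (3.34) is easiest to see in the scalar-output case N − M = 1, where the weighting matrix reduces to the scalar variance q_Y + θᵀQ_φθ of the residual. The sketch below evaluates the cost for this special case only; the function name `ml_cost` and the numerical values are hypothetical, and the parameter vector is passed directly as θ_s rather than through the logarithmic parametrization (3.33).

```python
def ml_cost(theta, Y_m, phi_m, q_Y, Q_phi):
    """Cost (3.34) for a scalar output (N - M = 1).

    theta : list of M + 1 values (one row of C followed by z(0))
    Y_m   : list of scalar measurements, one per sample
    phi_m : list of regressor vectors (lists of M + 1 values)
    q_Y   : list of output noise variances, one per sample
    Q_phi : list of (M + 1) x (M + 1) regressor noise covariance matrices
    """
    n = len(theta)
    cost = 0.0
    for y, phi, qy, Qp in zip(Y_m, phi_m, q_Y, Q_phi):
        resid = y - sum(th * p for th, p in zip(theta, phi))
        # variance of the residual: q_Y + theta^T Q_phi theta
        var = qy + sum(theta[i] * Qp[i][j] * theta[j]
                       for i in range(n) for j in range(n))
        cost += 0.5 * resid ** 2 / var
    return cost


theta = [2.0, 1.0]
phi_m = [[1.0, 1.0], [2.0, 0.5]]
no_noise_on_phi = [[[0.0, 0.0], [0.0, 0.0]]] * 2
# With Q_phi = 0 the criterion is an ordinary weighted least squares cost
print(ml_cost(theta, [4.0, 4.5], phi_m, [1.0, 1.0], no_noise_on_phi))
```

Setting Q_φ = 0 corresponds to the Markov simplification; a nonzero Q_φ inflates the residual variance and therefore down-weights samples whose regressors are uncertain.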
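As developed in Bogaerts (1999), an estimation of the covariance matrix of the parameter estimation errors is given by
$$\hat{E}[\tilde{\boldsymbol{\theta}}_{st}\,\tilde{\boldsymbol{\theta}}_{st}^T] \approx \left( \sum_{s=1}^{S} \sum_{k=1}^{N_s} \boldsymbol{\Theta}_s^T(\hat{\boldsymbol{\theta}}_{st}, \hat{\boldsymbol{\varphi}}_{s,k}) \left( \mathbf{Q}_{Y,s,k} + \boldsymbol{\theta}_s^T(\hat{\boldsymbol{\theta}}_{st})\,\mathbf{Q}_{\varphi,s,k}\,\boldsymbol{\theta}_s(\hat{\boldsymbol{\theta}}_{st}) \right)^{-1} \boldsymbol{\Theta}_s(\hat{\boldsymbol{\theta}}_{st}, \hat{\boldsymbol{\varphi}}_{s,k}) \right)^{-1} \tag{3.42}$$
where
$$\tilde{\boldsymbol{\theta}}_{st} = \hat{\boldsymbol{\theta}}_{st} - \boldsymbol{\theta}_{st} \tag{3.43}$$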
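$$\boldsymbol{\Theta}_s(\hat{\boldsymbol{\theta}}_{st}, \hat{\boldsymbol{\varphi}}_{s,k}) = \left[ \frac{\partial \boldsymbol{\theta}_s^T(\boldsymbol{\theta}_{st})}{\partial \boldsymbol{\theta}_{st}(1)}\,\hat{\boldsymbol{\varphi}}_{s,k} \;\; \cdots \;\; \frac{\partial \boldsymbol{\theta}_s^T(\boldsymbol{\theta}_{st})}{\partial \boldsymbol{\theta}_{st}(\dim \boldsymbol{\theta}_{st})}\,\hat{\boldsymbol{\varphi}}_{s,k} \right]_{\boldsymbol{\theta}_{st} = \hat{\boldsymbol{\theta}}_{st}} \in \Re^{(N-M) \times \dim(\boldsymbol{\theta}_{st})} \tag{3.44}$$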
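and
$$\hat{\boldsymbol{\varphi}}_{s,k} = \boldsymbol{\varphi}_{m,s,k} + \mathbf{Q}_{\varphi,s,k}\,\boldsymbol{\theta}_s(\hat{\boldsymbol{\theta}}_{st}) \left( \mathbf{Q}_{Y,s,k} + \boldsymbol{\theta}_s^T(\hat{\boldsymbol{\theta}}_{st})\,\mathbf{Q}_{\varphi,s,k}\,\boldsymbol{\theta}_s(\hat{\boldsymbol{\theta}}_{st}) \right)^{-1} \left( \mathbf{Y}_{m,s,k} - \boldsymbol{\theta}_s^T(\hat{\boldsymbol{\theta}}_{st})\,\boldsymbol{\varphi}_{m,s,k} \right) \tag{3.45}$$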
Note that the nonlinear optimization problem (3.34) only guarantees a unique solution if the measurement noises are time invariant (which means that $\mathbf{Q}_{Y,s,k} = \mathbf{Q}_Y$, $\mathbf{Q}_{\varphi,s,k} = \mathbf{Q}_\varphi$, $\forall s, k$). However, it is possible to obtain a unique initial guess of $\boldsymbol{\theta}_{st}$ in a systematic way. It consists in considering time invariant matrices for the measurement noise covariance or in reducing the problem to a Markov estimate where the covariance matrix $\mathbf{Q}_{Y,s,k}$ is diagonal and the covariance matrix $\mathbf{Q}_{\varphi,s,k} = \mathbf{0}$, $\forall s, k$. Of course this simplification may only serve as a unique initial guess as it relies on the assumption that the measurements contained in $\boldsymbol{\varphi}_{s,k}$ are not corrupted by noise. Although this assumption is most of the time definitely unacceptable, this kind of error is often made in the literature.
3.1.2 First principles kinetic parameter identification
Once a pseudo-stoichiometry is determined, a kinetic model structure must be chosen and its parameters identified. As mentioned in the introduction of this chapter, we select the model structure developed by Bogaerts and Hanus (2000) to model the kinetics:
$$\phi_j(\boldsymbol{\xi}, t) = \alpha_j \prod_{h} \xi_h(t)^{\gamma_{hj}} \prod_{l} e^{-\beta_{lj}\,\xi_l(t)}, \quad j \in [1, M] \tag{3.46}$$
Indeed, this model is general since it can represent the activation as well as the inhibition effects of each component in a culture. Moreover, it has the advantage of a systematic identification procedure.
In this section, we recall this latter procedure which proceeds in two steps:
• a first linear estimation based on the mass balances (3.1), the component concentration measurements, the knowledge of the inputs (dilution rate or feeding rate), the previously obtained pseudo-stoichiometric coefficients and finally an estimation of the reaction rates from discrete measurements and an interpolation model;
• and, based on these first parameter estimates, a maximum likelihood
estimation of the kinetic parameters and the initial concentrations of the
experiments used for the identification.
3.1.2.1 Linear first estimation
The kinetic model structure (3.46) can be linearized with respect to its parameters thanks to a logarithmic transformation:
$$\ln \phi_j(\boldsymbol{\xi}, t) = \ln \alpha_j + \sum_{h} \gamma_{hj} \ln \xi_h(t) - \sum_{l} \beta_{lj}\,\xi_l(t), \quad j \in [1, M] \tag{3.47}$$
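This makes it possible to find a linear least squares estimate of the kinetic coefficients (which necessarily exists, is unique and independent of any initial guess):
$$\hat{\boldsymbol{\theta}}_{cin}^{(j)} = \underset{\boldsymbol{\theta}_{cin}^{(j)}}{\operatorname{ArgMin}}\; \frac{1}{2} \sum_{s=1}^{S} \sum_{k=1}^{N_s} \left( Y_{s,k}^{(j)} - \boldsymbol{\varphi}_{s,k}^{(j)T}\,\boldsymbol{\theta}_{cin}^{(j)} \right)^2, \quad j \in [1, M] \tag{3.48}$$
where
$$Y_{s,k}^{(j)} = \ln \hat{\phi}_j(t_{s,k}) \tag{3.49}$$
$$\boldsymbol{\varphi}_{s,k}^{(j)T} = \begin{bmatrix} 1 & \cdots \; \ln \xi_h(t_{s,k}) \; \cdots & \cdots \; -\xi_l(t_{s,k}) \; \cdots \end{bmatrix} \tag{3.50}$$
$$\boldsymbol{\theta}_{cin}^{(j)T} = \begin{bmatrix} \ln \alpha_j & \cdots \; \gamma_{hj} \; \cdots & \cdots \; \beta_{lj} \; \cdots \end{bmatrix} \tag{3.51}$$
under the positivity constraints
$$\begin{bmatrix} \cdots \; \gamma_{hj} \; \cdots & \cdots \; \beta_{lj} \; \cdots \end{bmatrix} \ge 0 \tag{3.52}$$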
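The linear estimation (3.48) can be illustrated for one reaction whose rate depends on a single component, so that the regressor (3.50) is [1, ln ξ, −ξ]. The sketch below solves the normal equations with a small Gaussian elimination routine written in plain Python; it ignores the positivity constraints (3.52), and the names `solve` and `fit_kinetics` as well as the data are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def fit_kinetics(xi, rate):
    """Unconstrained least squares fit of ln(rate) = ln(alpha) + gamma*ln(xi) - beta*xi."""
    rows = [[1.0, math.log(x), -x] for x in xi]   # regressors (3.50)
    y = [math.log(r) for r in rate]               # pseudo-measurements (3.49)
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    ln_alpha, gamma, beta = solve(AtA, Aty)
    return math.exp(ln_alpha), gamma, beta


# Noise-free synthetic rates generated with alpha = 0.8, gamma = 1.2, beta = 0.3
xi = [0.5, 1.0, 2.0, 4.0, 8.0]
rate = [0.8 * x ** 1.2 * math.exp(-0.3 * x) for x in xi]
print(fit_kinetics(xi, rate))
```

A constrained solver would be needed to enforce (3.52) when the unconstrained solution returns negative exponents.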
As for the estimates of the reaction rate, we can obtain $\hat{\boldsymbol{\varphi}}(t_{s,k})$ thanks to the following equation
$$\hat{\boldsymbol{\varphi}}(t_{s,k}) = \hat{\mathbf{K}}_a^{-1} \left( \frac{d\hat{\boldsymbol{\xi}}_a(t_{s,k})}{dt} + D(t_{s,k})\,\boldsymbol{\xi}_a(t_{s,k}) - \mathbf{u}_a(t_{s,k}) \right) \tag{3.53}$$
where $\boldsymbol{\xi}_a$, $\mathbf{K}_a$ and $\mathbf{u}_a$ are the quantities defined for the decoupled identification method (section 3.1.1.1). The estimate of the time derivative $d\hat{\boldsymbol{\xi}}_a(t_{s,k})/dt$ can be computed by numerical differentiation of an interpolation model (a cubic spline, for instance) of the vector $\boldsymbol{\xi}_a(t)$.
These first estimates $\hat{\boldsymbol{\theta}}_{cin}^{(j)}$ are based on unreliable assumptions on the measurement errors and on estimates of the signal derivatives. Therefore, these estimates are just considered as an initial guess for the second step of the identification.
3.1.2.2 Markov estimation
On the basis of the identified pseudo-stoichiometry and the first estimation of the kinetics parameters, a new estimation of the kinetics with the initial concentrations of the experiments used for the identification has to be done.
The simulation model {3.1, 3.46} consists of a nonlinear differential system of the form
$$\frac{d\mathbf{x}(t)}{dt} = \mathbf{f}(\mathbf{x}(t), \mathbf{u}(t); \boldsymbol{\theta}) \tag{3.54}$$
where
$$\mathbf{x}(t) = \boldsymbol{\xi}(t) = \begin{bmatrix} \boldsymbol{\xi}_a^T(t) & \boldsymbol{\xi}_b^T(t) \end{bmatrix}^T \tag{3.55}$$
is the state vector containing the concentrations of the components involved in the selected reaction scheme (2.6);
$$\mathbf{u}(t) = \begin{bmatrix} D(t) & \mathbf{F}^T(t) \end{bmatrix}^T \tag{3.56}$$
is here the input vector containing the dilution rate and the external feed rates;
$$\boldsymbol{\theta}^T = \ln \begin{bmatrix} \alpha_j & \cdots \; \gamma_{hj} \; \cdots & \cdots \; \beta_{lj} \; \cdots & \boldsymbol{\xi}^T(0) \end{bmatrix}, \quad j \in [1, M] \tag{3.57}$$
is the vector of the parameters to be identified (logarithm of the kinetic coefficients and initial concentrations in order to ensure their positivity);
and $\mathbf{f}$ is the model structure corresponding to relations {3.1, 3.46}.
Let
$$\boldsymbol{\xi}(t) = \mathbf{g}(t, \mathbf{u}(t), \boldsymbol{\xi}(0); \boldsymbol{\theta}) \tag{3.58}$$
be the solution (generally obtained by numerical integration) of the differential system (3.54) starting from the initial concentrations $\boldsymbol{\xi}(0)$. The sampled measurements, assumed to be corrupted by a white noise $\boldsymbol{\varepsilon}_{y,s,k}$, normally distributed with zero mean and covariance matrix $\mathbf{Q}_{s,k}$, can be written as
$$\mathbf{y}_{m,s,k} = \mathbf{g}(t_{s,k}, \mathbf{u}(t_{s,k}), \mathbf{x}_s(0); \boldsymbol{\theta}) + \boldsymbol{\varepsilon}_{y,s,k} \tag{3.59}$$
($t_{s,k}$ being the $k$-th sample time of the $s$-th experiment). As the noise is assumed to corrupt only the concentration measurements in $\boldsymbol{\xi}$, the maximum likelihood estimate of $\boldsymbol{\theta}$ can then be deduced from a nonlinear Markov estimator
$$\hat{\boldsymbol{\theta}} = \underset{\boldsymbol{\theta}}{\operatorname{ArgMin}}\; \frac{1}{2} \sum_{s=1}^{S} \sum_{k=1}^{N_s} \left( \mathbf{y}_{m,s,k} - \mathbf{g}(t_{s,k}, \mathbf{u}(t_{s,k}), \mathbf{x}_s(0); \boldsymbol{\theta}) \right)^T \mathbf{Q}_{s,k}^{-1} \left( \mathbf{y}_{m,s,k} - \mathbf{g}(t_{s,k}, \mathbf{u}(t_{s,k}), \mathbf{x}_s(0); \boldsymbol{\theta}) \right) \tag{3.60}$$
The initial guess of θ consists, on the one hand, of the first estimate of the
kinetic parameters deduced from the previous linear estimation step and, on the
other hand, of the measurements of ξ at the initial time.
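To make the structure of the Markov criterion (3.60) concrete, the sketch below evaluates it for a hypothetical batch culture with two states (biomass X and substrate S) and one reaction of rate α·X·S^γ. Several simplifications are assumptions of this illustration: a fixed-step fourth-order Runge-Kutta integrator, a diagonal time-invariant covariance, a known yield `k_s`, and initial concentrations passed separately instead of being part of θ.

```python
import math


def simulate(theta, x0, t_samples, k_s=2.0, n_sub=50):
    """Integrate dX/dt = phi, dS/dt = -k_s * phi with phi = alpha * X * S**gamma."""
    alpha, gamma = math.exp(theta[0]), math.exp(theta[1])  # log-parametrization

    def f(x):
        X, S = x
        phi = alpha * X * max(S, 0.0) ** gamma
        return [phi, -k_s * phi]

    x, t, out = list(x0), 0.0, []
    for ts in t_samples:
        h = (ts - t) / n_sub
        for _ in range(n_sub):  # classical RK4 steps between two samples
            k1 = f(x)
            k2 = f([xi + 0.5 * h * k for xi, k in zip(x, k1)])
            k3 = f([xi + 0.5 * h * k for xi, k in zip(x, k2)])
            k4 = f([xi + h * k for xi, k in zip(x, k3)])
            x = [xi + h / 6.0 * (a + 2 * b + 2 * c + d)
                 for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]
        t = ts
        out.append(list(x))
    return out


def markov_cost(theta, x0, t_samples, y_meas, variances):
    """Criterion (3.60) for one experiment and a diagonal covariance matrix."""
    pred = simulate(theta, x0, t_samples)
    return 0.5 * sum((ym - yp) ** 2 / v
                     for row_m, row_p in zip(y_meas, pred)
                     for ym, yp, v in zip(row_m, row_p, variances))


t_s = [0.0, 1.0, 2.0, 3.0]
x0 = [0.1, 5.0]
theta_true = [math.log(0.3), math.log(1.0)]
y = simulate(theta_true, x0, t_s)  # noise-free synthetic measurements
print(markov_cost([math.log(0.4), 0.0], x0, t_s, y, [0.01, 0.04]))
```

In an actual identification, this cost would be passed to a nonlinear optimizer started from the linear estimates of the previous step.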
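The covariance matrix of the parameter estimation errors can also be estimated in this last step (Bogaerts, 1999):
$$\hat{E}[\tilde{\boldsymbol{\theta}}\,\tilde{\boldsymbol{\theta}}^T] \approx \left( \sum_{s=1}^{S} \sum_{k=1}^{N_s} \mathbf{G}_\theta^T(t_{s,k}, \mathbf{u}(t_{s,k}), \mathbf{x}_s(0); \hat{\boldsymbol{\theta}})\; \mathbf{Q}_{s,k}^{-1}\; \mathbf{G}_\theta(t_{s,k}, \mathbf{u}(t_{s,k}), \mathbf{x}_s(0); \hat{\boldsymbol{\theta}}) \right)^{-1} \tag{3.61}$$
where
$$\mathbf{G}_\theta(t_{s,k}, \mathbf{u}(t_{s,k}), \mathbf{x}_s(0); \hat{\boldsymbol{\theta}}) = \left. \frac{\partial \mathbf{g}(t_{s,k}, \mathbf{u}(t_{s,k}), \mathbf{x}_s(0); \boldsymbol{\theta})}{\partial \boldsymbol{\theta}} \right|_{\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}} \tag{3.62}$$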
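This Jacobian is obtained by solving (together with the simulation model (3.54)) the sensitivity equations
$$\frac{\partial \mathbf{G}_\theta(t, \mathbf{u}(t), \mathbf{x}(0); \boldsymbol{\theta})}{\partial t} = \frac{\partial \mathbf{f}(\mathbf{x}, \mathbf{u}(t); \boldsymbol{\theta})}{\partial \mathbf{x}}\, \mathbf{G}_\theta(t, \mathbf{u}(t), \mathbf{x}(0); \boldsymbol{\theta}) + \frac{\partial \mathbf{f}(\mathbf{x}, \mathbf{u}(t); \boldsymbol{\theta})}{\partial \boldsymbol{\theta}} \tag{3.63}$$
with the initial condition
$$\mathbf{G}_\theta(0, \mathbf{u}(0), \mathbf{x}(0); \boldsymbol{\theta}) = \frac{\partial \mathbf{x}(0)}{\partial \boldsymbol{\theta}} = \mathbf{G}_{\theta 0} \tag{3.64}$$
where $\mathbf{G}_{\theta 0}$ is a matrix whose elements are all equal to zero except the ones corresponding to the partial derivative of the elements of $\boldsymbol{\xi}(0) \in \mathbf{x}(0)$ with respect to their homologous $\ln \boldsymbol{\xi}(0) \in \boldsymbol{\theta}$, these partial derivatives being equal to $\boldsymbol{\xi}(0)$.
The Jacobian $\mathbf{G}_\theta(t_{s,k}, \mathbf{u}(t_{s,k}), \mathbf{x}_s(0); \hat{\boldsymbol{\theta}})$ involved in relation (3.61) is thus obtained by evaluating the numerical solution $\mathbf{G}_\theta(t, \mathbf{u}(t), \mathbf{x}(0); \boldsymbol{\theta})$ of the system {(3.54), (3.63)} for $t = t_{s,k}$ and $\boldsymbol{\theta} = \hat{\boldsymbol{\theta}}$.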
At the end of this identification step, all the parameters have been identified:
the pseudo-stoichiometric coefficients (3.34) and the kinetic parameters (3.60).
Hence, the model is completely determined but has of course to be validated (cross validation, study of the correlation matrix of the parametric errors, etc.).
3.1.2.3 Covariance matrix and parameter reduction
The study of the covariance matrix (3.61) can help to reduce the number of kinetic parameters. Indeed, coefficients that are hard to estimate can be suppressed after checking that their information is covered by other parameters. In concrete terms, coefficients with high variance and a sufficient correlation with other parameters can be cancelled out. This cancellation reduces the number of parameters and in turn removes the effect of a component (e.g. activation or inhibition).
Developed by trial and error, the following parameter reduction procedure is therefore proposed:
1. First, the parameters with a variance and a covariance respectively larger than $10^3$ and $10^6$ are cancelled out.
2. Then, the remaining parameters are re-estimated and the parameters with a variance and a covariance respectively larger than 100 and 50 are cancelled out.
3. The same operation is repeated with variance and covariance levels of 4 and 2, respectively.
4. Finally, a last round is carried out with variance and covariance levels of 1.5 and 1, respectively.
Note that the selected thresholds are applied to every parameter, irrespective
of its dimension. Indeed, the parameter positivity is ensured through a
logarithmic transformation (3.57). The elements of (3.61) can therefore be regarded as relative errors.
The successive reduction steps are necessary to isolate the parameters that are hard to estimate. Indeed, the inaccurate estimation of one unimportant parameter can have a significant influence on the estimation of an essential parameter, i.e., this influence can lead to a significant variance and correlation of this last coefficient. In order to avoid such a problem, we propose a step-by-step reduction procedure.
However, the reduction procedure has to be refined: kinetic constants and activation coefficients of reactants cannot be suppressed. Only inhibition coefficients and activation parameters of products can be cancelled out.
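One stage of the reduction procedure can be sketched as follows. The text does not spell out how the covariance threshold is applied, so this sketch assumes that a parameter is cancelled when its variance exceeds the first level and at least one of its covariances with another parameter exceeds the second level in absolute value. The function name `reduction_stage`, the `protected` argument (for kinetic constants and activation coefficients of reactants) and the example matrix are hypothetical. The re-estimation between stages is not shown.

```python
# (variance level, covariance level) for the four successive stages
STAGES = [(1e3, 1e6), (100.0, 50.0), (4.0, 2.0), (1.5, 1.0)]


def reduction_stage(cov, var_max, cov_max, protected=()):
    """Return the indices of the parameters to cancel in one stage.

    cov is the estimated covariance matrix (3.61) as a list of lists.
    Indices listed in `protected` are never cancelled.
    """
    n = len(cov)
    drop = []
    for i in range(n):
        if i in protected:
            continue
        # largest covariance (in absolute value) with any other parameter
        worst = max((abs(cov[i][j]) for j in range(n) if j != i), default=0.0)
        if cov[i][i] > var_max and worst > cov_max:
            drop.append(i)
    return drop


cov = [[200.0, 60.0, 0.0],
       [60.0, 1.0, 0.0],
       [0.0, 0.0, 0.5]]
var_max, cov_max = STAGES[1]
print(reduction_stage(cov, var_max, cov_max))
```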
3.1.3 Neural Network parameter identification
In order to identify systematically neural network parameters, an estimation procedure which is inspired by the RBF estimation technique of Vande Wouwer et al. (2004) has been developed. It proceeds in four steps whatever the kind of hybrid models (3.2 or 3.3) or the considered neural network.
1. A first estimation of the hidden parameters is carried out:
a. for a MultiLayer Perceptron, we assign random values to the weights and biases of the sigmoid hidden layer, since there exists no systematic method to determine a first estimate of these coefficients.
b. for a Radial Basis Function network, the centres of the Gaussians are determined in an unsupervised learning phase in order to cover the data space as well as possible. A clustering algorithm, such as k-means in Matlab, classifies the data according to their similarities, organizes them into groups and computes the centre of each group. Then, the widths are chosen as the mean distances between the different centres (Suykens et al., 1996).
2. On the basis of these values of the hidden parameters, a least squares estimator determines the output parameters. This linear estimation is based on component concentration measurements, the knowledge of the inputs, the pseudo-stoichiometric coefficients (if required) and finally an estimation of the reaction rates (or the reaction term) from discrete measurements and an interpolation model.
3. Starting from the previous parameter values, a nonlinear identification step gives a new estimation. This time, the identification is based on the simulation of the complete hybrid model and an estimator using a slightly modified Markov criterion, the regularization criterion (2.75), which takes the measurement errors into account and improves the generalization of the considered neural network (its ability to respond satisfactorily to unknown inputs).
4. A second nonlinear identification step re-estimates the neural parameters and the initial conditions of the various cultures used for the identification. This last step uses the same estimator as in the third step and can be carried out using the simplex search method.
Since the first estimations of the hidden parameters are carried out, in this work, with Matlab functions (randn.m or kmeans.m), we begin this section with the linear estimation of the output parameters before presenting the nonlinear identification steps. We conclude this section with some remarks about the optimal size of a neural network and other comments about the identification.
3.1.3.1 Linear identification of the output parameters
In the considered hybrid neural network models (3.2) and (3.3), the neural expressions of the kinetics and the reaction term are the following
$$\varphi_j(t_{s,k}) = \sum_{i=1}^{n_{hl}} w_{ji}\,h_i + b_j, \quad j = 1, \ldots, M \tag{3.65}$$
and
$$(\mathbf{K}\boldsymbol{\varphi})_j(t_{s,k}) = \sum_{i=1}^{n_{hl}} w_{ji}\,h_i + b_j, \quad j = 1, \ldots, M \tag{3.66}$$
where
$$h_i = \frac{1}{1 + e^{-\left( \sum_{q=1}^{N} v_{iq}\,\xi_q + \beta_i \right)}} \quad \text{for a MLP} \tag{3.67}$$
and
$$h_i = e^{-\frac{\left\| \boldsymbol{\xi} - \mathbf{C}_i \right\|^2}{r_i^2}} \quad \text{for a RBF} \tag{3.68}$$
j. Hence, it is possible to find a linear least squares estimate of the output parameters (which necessarily exists-if enough independent measurements are available- is unique and independent of any initial guess):
(((( )))) i [[[[ M ]]]]
ArgMin
S
s N
k
j NN T j
k s j
k s m j
NN
S
i NN
, 2 1
ˆ 1
1 1
) 2 ( ) (
, ) (
, , )
(
) (
∈
∈ ∈
∈
−
−
−
−
=
=
=
= ∑ ∑ ∑ ∑ ∑ ∑ ∑ ∑
==
== ====
θ φ θ Y
θ
(3.69)
where
) ˆ (
,) (
, j sk
j k
s
φ t
Y = = = = or (((( )))) (
,)
) ^ (
, j sk
j k
s
Kφ t
Y = = = = (3.70)/(3.71)
[[[[
11 ]]]]
) (
, nhl
T j
k
s
= = = = h K h
φ (3.72)
[[[[
ji j]]]] [[[[
hl]]]]
T j
NN)
w b i 1 , , n
(
= = = =
K∈ ∈ ∈ ∈ K
θ (3.73)
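As for the estimates $\hat{\boldsymbol{\varphi}}$ and $\widehat{(\mathbf{K}\boldsymbol{\varphi})}$, we can obtain them thanks to the following equations
$$\hat{\boldsymbol{\varphi}}(t_{s,k}) = \hat{\mathbf{K}}_a^{-1} \left( \frac{d\hat{\boldsymbol{\xi}}_a(t_{s,k})}{dt} + D(t_{s,k})\,\boldsymbol{\xi}_a(t_{s,k}) - \mathbf{u}_a(t_{s,k}) \right) \tag{3.74}$$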
) ( ) ( ) ) (
)) ( (
(
, , ,^ , ,
^
k s k s k s k
s k
s