Chapter 3 First principles models of bioprocesses versus Hybrid neural network models


In this chapter, we attempt to answer the following question: “Which kind of macroscopic models must we use to obtain the best possible mathematical representation of a bioprocess?”.

In chapter 2, three kinds of macroscopic models were presented: the first principles (section 2.2.3.1), the black box (section 2.2.3.2) and the hybrid models (section 2.2.3.3). All these modelling techniques have their advantages and their drawbacks. Nevertheless, chapter 3 proposes to compare only two of them. The black box models are not considered. Indeed, numerous studies already compare different neural networks (the most common black boxes) with the corresponding serial hybrid models (Psichogios and Ungar, 1992; James et al., 2002). The results of these works converge to the same conclusion: hybrid models, which combine the available prior knowledge with the neural network nonlinear mapping capabilities, provide better results in terms of extrapolation.

Hence, this chapter sets aside pure neural networks, which carry no physical meaning, in order to evaluate models which exploit the available prior knowledge, namely the well-known mass balances:

$$\frac{d\xi(t)}{dt} = K\,\varphi(\xi,t) - D(t)\,\xi(t) + F(t) - Q(t) \qquad (3.1)$$


However, not all hybrid architectures are considered. Indeed, we put the parallel approach aside as it is not currently used. Serial hybrid modelling is usually preferred since it avoids the difficult selection of a reaction scheme and/or a kinetic model structure. Hence, we propose to compare first principles models with hybrid neural network models in which the neural network replaces only the kinetics or the whole reaction term, as shown in the following equations

$$\frac{d\xi(t)}{dt} = K\,NN(\xi,t) - D(t)\,\xi(t) + F(t) - Q(t) \qquad (3.2)$$

$$\frac{d\xi(t)}{dt} = NN(\xi,t) - D(t)\,\xi(t) + F(t) - Q(t) \qquad (3.3)$$

where NN ( ξ , t ) represents the neural network.
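The two serial hybrid forms can be sketched as follows. This is only an illustration under assumptions: the toy dimensions, the random weights and the function names are hypothetical, and the network is the one-hidden-layer sigmoid MLP described in the text.

```python
import numpy as np

def mlp(xi, W1, b1, W2, b2):
    """One sigmoid hidden layer followed by a linear output layer."""
    hidden = 1.0 / (1.0 + np.exp(-(W1 @ xi + b1)))
    return W2 @ hidden + b2

def rhs_kinetics_hybrid(xi, K, D, u, nn):
    """Form (3.2): the network replaces the kinetics only."""
    return K @ nn(xi) - D * xi + u

def rhs_reaction_hybrid(xi, D, u, nn):
    """Form (3.3): the network replaces the whole reaction term."""
    return nn(xi) - D * xi + u

# Hypothetical toy system: N = 2 species, M = 1 reaction, u = F - Q.
K = np.array([[1.0], [-2.0]])
xi = np.array([0.5, 3.0])
D, u = 0.1, np.array([0.0, 1.0])

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 2)), rng.normal(size=3)
W2_M, W2_N = rng.normal(size=(1, 3)), rng.normal(size=(2, 3))
nn_kinetics = lambda x: mlp(x, W1, b1, W2_M, np.zeros(1))   # M outputs
nn_reaction = lambda x: mlp(x, W1, b1, W2_N, np.zeros(2))   # N outputs

d32 = rhs_kinetics_hybrid(xi, K, D, u, nn_kinetics)
d33 = rhs_reaction_hybrid(xi, D, u, nn_reaction)
```

In form (3.2) the reaction term stays in the column space of K, so the pseudo-stoichiometry is still imposed; in form (3.3) the network is free to produce any N-dimensional reaction term.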

Regarding the neural networks in these expressions, we test the MultiLayer Perceptron (MLP) and the Radial Basis Function (RBF) network, which are the most widespread universal approximators. More precisely, the selected MLP is a neural network with one sigmoid hidden layer and a linear output layer (2.39). Cybenko (1989) demonstrated its capability to approximate any nonlinear continuous function arbitrarily accurately.

In order to compare the white and grey box versions efficiently, an efficient first principles model has to be considered. Hence, we select the general kinetic model structure (2.32) proposed by Bogaerts (1999), which can represent any activation and/or inhibition effects from each component in the culture and comes with a systematic identification procedure based on its linearization.

Concerning the determination of a good reaction scheme, we adopt the systematic procedure (Hulhoven et al., 2005) to generate and compare a group of reaction schemes given a set of component measurements. It is useful when no knowledge is available on the relations between the macroscopic species to be considered.

This comparison, which relies on direct and cross validation results as well as on the parameter estimation errors, is based on three different databases. The first two datasets are obtained from a simulator which reproduces the behaviour of a simple microbial growth process. The first database contains batch cultures while the second one consists of fed-batch cultures. Tests on these databases show how first principles and hybrid models handle simple case studies. On the other hand, the capabilities of macroscopic models in real complex cases are tested with the third database, which consists of 7 fed-batch bacterial cultures from a bioprocess used by one of our industrial partners to produce an enzyme.

This chapter begins with the description of the different systematic parameter identification methods. First, we present the systematic procedure of Hulhoven et al. (2005) for the determination of a good reaction scheme. Afterwards, we describe how to identify the kinetic parameters of the general model (2.32). Lastly, we propose systematic identification methods for the neural network parameters. After this discussion about parameter estimation (section 3.1), we present our three databases (section 3.2) before considering the comparison of the different macroscopic models (section 3.3). Finally, some conclusions are drawn in section 3.4.

3.1 Systematic parameter identification methods

This section describes the different parameter identification methods used in this work. First, we explain the systematic procedure of Hulhoven et al. (2005), which generates C-identifiable reaction schemes, given a set of components whose measurements are available. The comparison of the obtained pseudo-stoichiometries leads to the best possible reaction scheme identifiable independently of the kinetics.

Afterwards, we explain how to estimate the kinetic parameters of the model (2.32) and finally how to identify the parameters of serial hybrid models based on a MultiLayer Perceptron or a Radial Basis Function network. This section ends with remarks on the optimization algorithms used in this work to identify the parameters.

3.1.1 Systematic identification of the pseudo-stoichiometry

The systematic procedure developed in Hulhoven et al. (2005) to generate and compare C-identifiable reaction schemes relies on the decoupled identification method proposed in Bastin and Dochain (1990), which allows the pseudo-stoichiometric coefficients to be estimated independently of the kinetics.

Moreover, it uses the necessary and sufficient condition of identifiability of the pseudo-stoichiometric coefficients presented in Chen and Bastin (1996). Based on maximum likelihood estimators proposed in (Bogaerts et al., 2003; Bogaerts and Hanus, 2000), this systematic procedure easily estimates the pseudo-stoichiometric coefficients of each identifiable reaction scheme taking all the measurement errors into account. For each given number of reactions, the final values of the cost function are used to compare and rank the candidate reaction schemes.

The next section describes the decoupling method and condition which allow the pseudo-stoichiometry and the kinetics to be estimated independently. Section 3.1.1.2 deals with the systematic generation of the C-identifiable reaction schemes while section 3.1.1.3 tackles the problem of parameter estimation.

3.1.1.1 Identification of the pseudo-stoichiometry independently of the kinetics

The decoupling method, which allows the pseudo-stoichiometry to be estimated independently of the kinetics, relies on a structural property of the general dynamic model (3.1) and a state-space transformation. This section briefly presents their principles.

If $K \in \Re^{N \times M}$ is of rank $M$ ($M \le N$)

$$\mathrm{rank}(K) = M \qquad (3.4)$$

then there always exists a partition of the state vector $\xi$

$$\xi^T = [\xi_a^T \;\; \xi_b^T] \qquad (3.5)$$

so that the corresponding partition of $K$

$$K^T = [K_a^T \;\; K_b^T] \qquad (3.6)$$

involves a matrix $K_a \in \Re^{M \times M}$ of full rank

$$\mathrm{rank}(K_a) = M \qquad (3.7)$$

Given such a partition of $K$, the following matrix equation

$$C\,K_a + K_b = 0_{N-M,\,M} \qquad (3.8)$$

has a unique solution $C \in \Re^{(N-M) \times M}$.

It is therefore possible to define an auxiliary vector $z \in \Re^{N-M}$:

$$z(t) = C\,\xi_a(t) + \xi_b(t) \qquad (3.9)$$

whose dynamics are independent of the kinetics $\varphi(\xi,t)$:

$$\frac{dz(t)}{dt} = -D(t)\,z(t) + C\,u_a(t) + u_b(t) \qquad (3.10)$$

where the corresponding partition of $u = F - Q$ is given by

$$u^T = [u_a^T \;\; u_b^T] \qquad (3.11)$$
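A small numerical check can make the decoupling concrete. The sketch below uses a hypothetical scheme (three species, one reaction, arbitrary yield values) and verifies that the time derivative of z does not depend on the reaction rate.

```python
import numpy as np

# Hypothetical scheme: N = 3 species, M = 1 reaction, first species chosen as xi_a.
K_a = np.array([[1.0]])            # M x M, full rank
K_b = np.array([[-2.0], [0.5]])    # (N - M) x M

# Unique solution of C K_a + K_b = 0, equation (3.8).
C = -K_b @ np.linalg.inv(K_a)

def dz_dt(xi_a, xi_b, phi, D, u_a, u_b):
    """Time derivative of z = C xi_a + xi_b computed from the mass balances (3.1)."""
    dxi_a = K_a @ phi - D * xi_a + u_a
    dxi_b = K_b @ phi - D * xi_b + u_b
    return C @ dxi_a + dxi_b

xi_a, xi_b = np.array([1.0]), np.array([4.0, 0.2])
D, u_a, u_b = 0.05, np.array([0.0]), np.array([0.3, 0.0])

# Two very different reaction rates give the same dz/dt.
slow = dz_dt(xi_a, xi_b, np.array([0.01]), D, u_a, u_b)
fast = dz_dt(xi_a, xi_b, np.array([10.0]), D, u_a, u_b)
z = C @ xi_a + xi_b
```

Both results coincide with the right-hand side of (3.10), which is why C can be estimated from concentration measurements without any kinetic model.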

An estimate $\hat{C}$ of the matrix $C$ can be computed on the basis of equations (3.9)-(3.10) and measurements of $\xi_a$ and $\xi_b$. Finally, estimates of $K_a$ and $K_b$ can be deduced from $\hat{C}$ using (3.8). This last estimation will be unique if the necessary and sufficient condition of C-identifiability (Chen and Bastin, 1996) is fulfilled:

Let $k_i$ be a vector containing the unknown elements in the $i$-th column of the matrix $K$. $k_i$ is C-identifiable if and only if there exists at least one partition $K_a \in \Re^{M \times M}$ with $\mathrm{rank}(K_a) = M$, which does not contain any element of $k_i$.

From this condition, a sufficient condition of C-identifiability can be deduced, which will be useful in the following:

A reaction scheme (and the associated pseudo-stoichiometry matrix) is C-identifiable if there exists a partition $K^T = [K_a^T \;\; K_b^T]$ where $K_a \in \Re^{M \times M}$ is of full rank and does not contain any unknown coefficients of $K$.

Note that if this condition is fulfilled, then $K_a$ is known and the estimate $\hat{K}_b$ is simply deduced from (3.8) by

$$\hat{K}_b = -\hat{C}\,K_a \qquad (3.12)$$

This condition will be further exploited in the next section in order to develop a systematic generation and evaluation procedure of C-identifiable reaction schemes.

3.1.1.2 Systematic generation of C-identifiable reaction schemes

The aim of this procedure is to generate, given a set of $N$ macroscopic species and a number of $M$ reactions, the C-identifiable reaction schemes associated with a diagonal submatrix $K_a$. As the number of reactions is a priori unknown, all the solutions with $M \le N$ have to be tested systematically.

For a given number of reactions $M$, candidate reaction schemes are characterized by a pseudo-stoichiometry matrix (of rank $M$) for which there exists a partition $K^T = [K_a^T \;\; K_b^T]$ where $K_a \in \Re^{M \times M}$ is of full rank and does not contain any unknown coefficient of $K$ (according to the sufficient condition of C-identifiability).

Within the yield coefficient matrix $K$, each column $j \in [1, M]$ contains null coefficients (w.r.t. species not involved in the reaction $j$), unknown coefficients and one coefficient $K(i,j)$ whose value is fixed to $\pm 1$ (denoting that the $j$-th reaction is normalized w.r.t. the $i$-th component whose yield coefficient is equal to $+1$ if it is produced and $-1$ if it is consumed). As the sufficient condition of C-identifiability requires that $K_a$ does not contain any unknown coefficient of $K$, all the unknown coefficients belong to $K_b$. In turn, each column of $K_a$ only contains null coefficients and one coefficient equal to $\pm 1$. Moreover, the C-identifiability condition imposes a rank condition on the matrix $K_a$ ($\mathrm{rank}(K_a) = M$). In order to respect this latter condition, each row of $K_a$ only contains one nonzero element. Therefore, the matrix $K_a$ takes the following form

$$K_a = \mathrm{diag}\{\pm 1, \pm 1, \ldots, \pm 1\} \qquad (3.13)$$

As Hulhoven et al. (2005) point out, this definition of the matrix $K_a$ implies that the number of reactions has to be smaller than the number of macroscopic species ($M < N$). Indeed, $M = N$ implies $K_a = K$. In this particular case, the reaction scheme would correspond to $N$ reactions each involving only one macroscopic species. Such a reaction scheme is meaningless from a biological point of view.

Which elements of $\xi$ do we have to select in order to define the partition $\xi^T = [\xi_a^T \;\; \xi_b^T]$ inducing the partition $K^T = [K_a^T \;\; K_b^T]$ where $K_a$ is of the form (3.13)? The idea is to test all the possible combinations of a set of $M$ elements ($\xi_a \in \Re^M$) from the set of $N$ available macroscopic species. The number of possible combinations is thus given by:

$$\binom{N}{M} = \frac{N!}{M!\,(N-M)!} \qquad (3.14)$$

The total number of combinations, for all the possible numbers of reactions, is the following

$$\sum_{M=1}^{N-1} \binom{N}{M} = \sum_{M=1}^{N-1} \frac{N!}{M!\,(N-M)!} = 2^N - 2 \qquad (3.15)$$

Once a combination is selected, the choice of the values $+1$ or $-1$ for the diagonal elements of $K_a$ has to be made. This choice is obvious when we know a priori whether a component is always consumed (value $-1$) or always produced (value $+1$). Such a component is then characterized by an indicator $\delta\xi_j = -1$ or $\delta\xi_j = +1$, respectively. When this a priori knowledge is not available, Hulhoven et al. (2005) suggest a computational procedure which deduces from measurements of $\xi_a$ and $\xi_b$, the corresponding values of $D$, $u_a$, $u_b$, the equations (3.1) and (3.13) as well as from the inequality $\varphi_j(t) \ge 0 \;\; \forall j, t$ that

$$\frac{d\xi_j(t)}{dt} + D(t)\,\xi_j(t) - u_j(t) \ge 0 \;\; (\le 0) \quad \forall t \qquad (3.16)$$

$$\rightarrow \; \delta\xi_j = +1 \;\; (-1) \qquad (3.17)$$

and

$$K_a(j,j) = \delta\xi_j \qquad (3.18)$$

However, the time derivative in (3.16) has to be approximated and the sign of the inequality has to be tested statistically on the basis of the noisy measurements.
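The sign test (3.16)-(3.17) can be sketched as below. This is a simplified illustration: a finite-difference derivative and a fixed tolerance stand in for the derivative approximation and the statistical test mentioned in the text, and the function name and the noise-free batch data are hypothetical.

```python
import numpy as np

def indicator(t, xi_j, D, u_j, tol=1e-6):
    """Return +1, -1 or 0 from the sign of d(xi_j)/dt + D xi_j - u_j over time."""
    lhs = np.gradient(xi_j, t) + D * xi_j - u_j
    if np.all(lhs >= -tol):
        return 1          # always produced
    if np.all(lhs <= tol):
        return -1         # always consumed
    return 0              # produced and consumed: must belong to xi_b

# Hypothetical batch culture (D = 0, u = 0), noise-free for clarity.
t = np.linspace(0.0, 5.0, 51)
biomass = np.exp(0.3 * t)
substrate = 10.0 - np.exp(0.3 * t)
intermediate = t * np.exp(-t)

signs = [indicator(t, x, 0.0, 0.0) for x in (biomass, substrate, intermediate)]
```

With noisy measurements the fixed tolerance would have to be replaced by a proper statistical decision rule.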

Moreover, certain components do not satisfy one of the two inequalities (3.16): they are consumed in some reaction(s) and produced in other one(s). The concentration of such a component cannot belong to the vector $\xi_a$ as it is not possible to reproduce simultaneous consumption and production of a component by only one reaction. Such a component is then characterized by $\delta\xi_j = 0$ and has to belong to the vector $\xi_b$. This reduces the total number of possible solutions (3.15) to

$$\sum_{M=1}^{N-N_0-1} \binom{N-N_0}{M} = \sum_{M=1}^{N-N_0-1} \frac{(N-N_0)!}{M!\,(N-N_0-M)!} \qquad (3.19)$$

where $N_0$ is the number of components which are concurrently consumed and produced.
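As a quick sanity check of these counts, the following sketch enumerates the candidate choices of ξ_a for a hypothetical set of four species and compares the result with the sums above. The function name and species labels are illustrative only.

```python
from itertools import combinations
from math import comb

def count_partitions(n_species, n_ambiguous=0):
    """Number of candidate choices of xi_a over all admissible M, cf. (3.15) and (3.19)."""
    eligible = n_species - n_ambiguous       # species with a nonzero indicator
    return sum(comb(eligible, m) for m in range(1, eligible))

species = ["X", "S", "P", "E"]               # hypothetical labels
all_choices = [c for m in range(1, len(species)) for c in combinations(species, m)]
```

For four species this gives 14 candidate partitions; with two of five species both consumed and produced, only 6 remain.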

Until now, possible partitions and their corresponding indicators $\delta\xi_j$ of the component concentrations have been determined. The next step is to write the corresponding matrices $K_a$ under the form (3.13). Since these matrices satisfy the C-identifiability condition, it is possible to estimate the corresponding matrices $C$ and to compute, from equation (3.12), the unknown yield coefficients contained in $K_b$. A maximum likelihood estimator (Bogaerts et al., 2003), taking all the measurement errors on the different component concentrations into account, can be used to obtain the different estimates. It is presented in the following section.

For a given number of reactions, the different schemes have to be compared on the basis of the numerical values taken by the maximum likelihood cost function. The “best” scheme corresponds to the lowest value of this criterion.

This comparison is however not possible between reaction schemes with different numbers of reactions $M$. The selection between “best” candidate reaction schemes with different values of $M$ has to be considered from another, more global, point of view, e.g. using the numerical values of a cost function for the identification of a complete simulation model, including pseudo-stoichiometry, kinetics and initial conditions (Bogaerts and Hanus, 2000).

Besides the sign of the diagonal elements of all the possible matrices $K_a$, it is possible to have some a priori knowledge on the sign of the elements of $K_b$.

Let $K_{b,constr}$ be the matrix of sign constraints on the unknown pseudo-stoichiometric coefficients of $K_b$. The elements of $K_{b,constr}$ impose the following constraints on the elements of $K_b$

$$K_{b,constr}(i,j) = -1 \;\rightarrow\; K_b(i,j) \le 0 \qquad (3.20)$$

$$K_{b,constr}(i,j) = +1 \;\rightarrow\; K_b(i,j) \ge 0 \qquad (3.21)$$

$$K_{b,constr}(i,j) = 0 \;\rightarrow\; \text{no constraints on } K_b(i,j) \qquad (3.22)$$

Since the objective of the optimization problem is to estimate the matrix $C$, the constraints imposed on the elements of $K_b$ have to be transposed to the elements of $C$. A matrix $C_{constr}$ can simply be deduced from the following relation:

$$C_{constr} = -K_{b,constr}\,K_a^{-1} \qquad (3.23)$$

and an estimation of $C$ can be obtained thanks to an optimization under the unilateral constraints imposed by $C_{constr}$.

To conclude, Hulhoven et al. (2005) summarize the procedure in the following steps:

(1) select a set of measurable macroscopic species which are relevant for system description and in agreement with the intended purpose of the model;

(2) for each component $\xi_j$, $j \in [1, N]$, assign a value of $\delta\xi_j$ equal to −1 if it is always consumed, +1 if it is always produced, and 0 if both situations may occur. This analysis results from a priori knowledge on the system or from a simple analysis based on equations (3.16);

(3) for all the possible numbers of reactions $M \in [1, N - 1 - N_0]$, the following steps have to be repeated:

a) for a given number $M$, determine the number of possible combinations of $M$ macroscopic species from a set of $N - N_0$ available species characterized by $\delta\xi_j \ne 0$;

b) for each of these combinations,

I. perform the corresponding permutations in the state vector $\xi$ in order to determine the vector $\xi_a$;

II. define the corresponding partition of the matrix $K$, with the matrix $K_a$ of the form $K_a = \mathrm{diag}\{\delta\xi_j\}$;

III. compute an estimate $\hat{C}$, under the constraints defined in (3.23);

IV. compute the corresponding estimate $\hat{K}_b$ on the basis of (3.12);

c) classify the resulting reaction schemes according to the value of the cost function.
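Steps (3.a) to (3.b.II) of this procedure can be sketched as a simple generator. The sketch only enumerates the partitions and builds the diagonal matrices K_a; the estimation steps III and IV are left out, and the function name and indicator values are hypothetical.

```python
from itertools import combinations
import numpy as np

def candidate_partitions(indicators):
    """Yield (indices of xi_a, indices of xi_b, K_a) for every admissible M."""
    n = len(indicators)
    eligible = [j for j, d in enumerate(indicators) if d != 0]
    n0 = n - len(eligible)
    for m in range(1, n - n0):                  # M = 1 .. N - 1 - N0
        for idx_a in combinations(eligible, m):
            idx_b = [j for j in range(n) if j not in idx_a]
            K_a = np.diag([float(indicators[j]) for j in idx_a])
            yield list(idx_a), idx_b, K_a

delta = [+1, -1, 0, +1]      # hypothetical indicators for four species
schemes = list(candidate_partitions(delta))
```

Each yielded triple would then be passed to the constrained estimation of C and ranked by the final value of the cost function.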

3.1.1.3 Maximum likelihood estimation of the pseudo-stoichiometry

To perform step (3.b.III) of the systematic generation of the pseudo-stoichiometry described above, a maximum likelihood estimator is employed in Hulhoven et al. (2005); such a criterion takes into account all the measurement errors on all the signals and at each measurement time.

The authors identify the elements of the matrix $C$ under unilateral constraints (inequality constraints imposed by $C_{constr}$). Bogaerts et al. (2003) propose to enforce the sign constraints by identifying the unknown elements of $K_b$ after a logarithmic transformation. Hence, if $k^+$ and $-k^-$ are respectively vectors containing the positive and negative elements of $K_b$, the new procedure identifies the vectors $\ln(k^+)$ and $\ln(k^-)$, without unilateral constraints, while respecting the sign constraints on the elements of $k^+$ and $-k^-$.

The method proposed by Bogaerts et al. (2003) is presented here.

On the basis of (3.9) and (3.10), measurements of $\xi_a$ and $\xi_b$ and the knowledge of $D$, $u_a$ and $u_b$, it is possible to estimate the matrix $K_b$ independently of the kinetics. Indeed, the solution of (3.10) is given by

$$z(t) = e^{-\int_0^t D(\tau)\,d\tau} \left( z(0) + \int_0^t e^{\int_0^\tau D(\kappa)\,d\kappa}\,\big(C\,u_a(\tau) + u_b(\tau)\big)\,d\tau \right) \qquad (3.24)$$

which can be substituted into (3.9) to obtain the following equation:

$$\xi_b(t) = e^{-\int_0^t D(\tau)\,d\tau} \left( z(0) + \int_0^t e^{\int_0^\tau D(\kappa)\,d\kappa}\,\big(C\,u_a(\tau) + u_b(\tau)\big)\,d\tau \right) - C\,\xi_a(t) \qquad (3.25)$$

or

$$\eta_b(t) = -C\,\eta_a(t) + e^{-\int_0^t D(\kappa)\,d\kappa}\,z(0) \qquad (3.26)$$

where

$$\eta_a(t) = \xi_a(t) - e^{-\int_0^t D(\kappa)\,d\kappa} \int_0^t e^{\int_0^\tau D(\kappa)\,d\kappa}\,u_a(\tau)\,d\tau \qquad (3.27)$$

and

$$\eta_b(t) = \xi_b(t) - e^{-\int_0^t D(\kappa)\,d\kappa} \int_0^t e^{\int_0^\tau D(\kappa)\,d\kappa}\,u_b(\tau)\,d\tau \qquad (3.28)$$
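The signals η_a and η_b only require the measured concentrations and the known inputs. A minimal numerical sketch, assuming sampled data and trapezoidal integration (the helper names and the chemostat example are hypothetical), is given below.

```python
import numpy as np

def cumtrapz(y, t):
    """Cumulative trapezoidal integral of each column of y over t, starting at zero."""
    steps = 0.5 * (y[1:] + y[:-1]) * np.diff(t)[:, None]
    return np.vstack([np.zeros((1, y.shape[1])), np.cumsum(steps, axis=0)])

def eta(t, xi, D, u):
    """eta(t) = xi(t) - exp(-int_0^t D) * int_0^t exp(int_0^tau D) u(tau) dtau."""
    int_D = cumtrapz(D[:, None], t)                  # shape (n_samples, 1)
    return xi - np.exp(-int_D) * cumtrapz(np.exp(int_D) * u, t)

# Hypothetical non-reacting species fed at a constant rate with constant dilution.
t = np.linspace(0.0, 10.0, 2001)
D = np.full_like(t, 0.1)
u = np.full((t.size, 1), 0.5)
xi0 = 2.0
xi = (xi0 * np.exp(-0.1 * t) + 5.0 * (1.0 - np.exp(-0.1 * t)))[:, None]
eta_vals = eta(t, xi, D, u)
```

For this non-reacting species, η reduces to the initial concentration diluted over time, which is what remains once the contribution of the feed has been removed.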

The model structure (3.26), evaluated at the measurement sampling times $t_{s,k}$ of an experiment $s$ ($s \in [1, S]$, $k \in [1, N_s]$) can finally be rewritten in the general form:

$$Y_{s,k} = \theta_s^T(\theta_{st})\,\phi_{s,k} \qquad (3.29)$$

with

$$Y_{s,k} = \eta_{b,s}(t_{s,k}) \in \Re^{N-M} \qquad (3.30)$$

$$\phi_{s,k} = \begin{bmatrix} -\eta_{a,s}(t_{s,k}) \\ e^{-\int_0^{t_{s,k}} D_s(\kappa)\,d\kappa} \end{bmatrix} \in \Re^{(M+1) \times 1} \qquad (3.31)$$

and

$$\theta_s^T(\theta_{st}) = \begin{bmatrix} C^{(1,:)}(\theta_{st}) & z_s^{(1)}(0) \\ \vdots & \vdots \\ C^{(N-M,:)}(\theta_{st}) & z_s^{(N-M)}(0) \end{bmatrix} \in \Re^{(N-M) \times (M+1)} \qquad (3.32)$$

where $\eta_{b,s}$, $\eta_{a,s}$ and $z_s$ denote $\eta_b$, $\eta_a$ and $z$ under the cell culture conditions of experiment $s$. $C^{(i,:)}$ is the $i$-th row of $C$, $z_s^{(i)}$ is the $i$-th component of $z_s$ and $\theta_{st}$ is the vector of unknown parameters

$$\theta_{st}^T = [\ln(k^+)^T \;\; \ln(k^-)^T \;\; z_1^T(0) \;\cdots\; z_S^T(0)] \qquad (3.33)$$

where $k^+$ is the vector containing all the unknown coefficients of $K_b$ under positivity constraints, $-k^-$ is the vector containing the other coefficients, and $z_i^T(0)$ corresponds to the initial condition of $z$ for the $i$-th experiment.

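The maximum likelihood estimation of $\theta_{st}$ is given by

$$\hat{\theta}_{st} = \underset{\theta_{st}}{\mathrm{ArgMin}}\; \frac{1}{2} \sum_{s=1}^{S} \sum_{k=1}^{N_s} \big(Y_{m,s,k} - \theta_s^T(\theta_{st})\,\phi_{m,s,k}\big)^T \big(Q_{Y,s,k} + \theta_s^T(\theta_{st})\,Q_{\phi,s,k}\,\theta_s(\theta_{st})\big)^{-1} \big(Y_{m,s,k} - \theta_s^T(\theta_{st})\,\phi_{m,s,k}\big) \qquad (3.34)$$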

where $Y_{m,s,k}$ and $\phi_{m,s,k}$ are measurements, differing from the true values $Y_{s,k}$ and $\phi_{s,k}$ due to the presence of noises which are assumed to corrupt only the concentration measurements contained in $\xi_b$ and $\xi_a$. These noises are assumed to be white, normally distributed with zero mean and known time-varying covariances $Q_{Y,s,k}$ and $Q_{\phi,s,k}$:

$$Y_{m,s,k} = Y_{s,k} + \varepsilon_{Y,s,k} \qquad (3.35)$$

$$\phi_{m,s,k} = \phi_{s,k} + \varepsilon_{\phi,s,k} \qquad (3.36)$$

$$E[\varepsilon_{Y,s,k}] = 0 \qquad (3.37)$$

$$E[\varepsilon_{\phi,s,k}] = 0 \qquad (3.38)$$

$$E[\varepsilon_{Y,s,k}\,\varepsilon_{Y,t,l}^T] = Q_{Y,s,k}\,\delta_{s,t}\,\delta_{k,l} \qquad (3.39)$$

$$E[\varepsilon_{\phi,s,k}\,\varepsilon_{\phi,t,l}^T] = Q_{\phi,s,k}\,\delta_{s,t}\,\delta_{k,l} \qquad (3.40)$$

$$E[\varepsilon_{Y,s,k}\,\varepsilon_{\phi,t,l}^T] = 0 \quad \forall s, t, k, l \qquad (3.41)$$

where $E$ represents the mathematical expectation and $\delta_{k,l}$ is the Kronecker symbol.

As developed in Bogaerts (1999), an estimation of the covariance matrix of the parameter estimation errors is given by

$$E[\tilde{\theta}_{st}\,\tilde{\theta}_{st}^T] \approx \left( \sum_{s=1}^{S} \sum_{k=1}^{N_s} \Theta_{s,k}(\hat{\theta}_{st}, \hat{\phi}_{s,k}) \big(Q_{Y,s,k} + \theta_s^T(\hat{\theta}_{st})\,Q_{\phi,s,k}\,\theta_s(\hat{\theta}_{st})\big)^{-1} \Theta_{s,k}^T(\hat{\theta}_{st}, \hat{\phi}_{s,k}) \right)^{-1} \qquad (3.42)$$

where

$$\tilde{\theta}_{st} = \hat{\theta}_{st} - \theta_{st} \qquad (3.43)$$

$$\Theta_{s,k}(\hat{\theta}_{st}, \hat{\phi}_{s,k}) = \left[ \frac{\partial \big(\theta_s^{(:,1)T}(\theta_{st})\,\hat{\phi}_{s,k}\big)}{\partial \theta_{st}} \;\cdots\; \frac{\partial \big(\theta_s^{(:,N-M)T}(\theta_{st})\,\hat{\phi}_{s,k}\big)}{\partial \theta_{st}} \right]_{\theta_{st} = \hat{\theta}_{st}} \in \Re^{\dim(\theta_{st}) \times (N-M)} \qquad (3.44)$$

and

$$\hat{\phi}_{s,k} = \phi_{m,s,k} + Q_{\phi,s,k}\,\theta_s(\hat{\theta}_{st}) \big(Q_{Y,s,k} + \theta_s^T(\hat{\theta}_{st})\,Q_{\phi,s,k}\,\theta_s(\hat{\theta}_{st})\big)^{-1} \big(Y_{m,s,k} - \theta_s^T(\hat{\theta}_{st})\,\phi_{m,s,k}\big) \qquad (3.45)$$

Note that the nonlinear optimization problem (3.34) only guarantees a unique solution if the measurement noises are time invariant (which means that $Q_{Y,s,k} = Q_Y$, $Q_{\phi,s,k} = Q_\phi$, $\forall s, k$). However, it is possible to obtain a unique first initial guess of $\theta_{st}$ in a systematic way. It consists in considering time invariant matrices for the measurement noise covariance or in reducing the problem to a Markov estimate where the covariance matrix $Q_{Y,s,k}$ is diagonal and the covariance matrix $Q_{\phi,s,k} = 0$, $\forall s, k$. Of course this simplification may only serve as a unique initial guess as it relies on the assumption that none of the measurements contained in $\phi_{s,k}$ is corrupted by noise. Although this assumption is most of the time definitely unacceptable, this kind of error is often made in the literature.
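The simplified first guess can be illustrated on a toy case. The sketch below assumes a single simulated batch experiment (so that D = 0, u = 0 and the η signals equal the concentrations), noise on ξ_b only, and ordinary least squares instead of the logarithmic parametrization; all numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 6.0, 50)
xi_a = 0.1 * np.exp(0.5 * t)                       # biomass, K_a = +1
xi_b = 5.0 - 2.0 * (xi_a - xi_a[0])                # substrate, true K_b = -2
xi_b_meas = xi_b + rng.normal(scale=0.01, size=t.size)

# Batch culture: eta_a = xi_a, eta_b = xi_b and exp(-int D) = 1, so (3.26)
# becomes xi_b = -C xi_a + z(0), which is linear in [C, z(0)].
regressors = np.column_stack([-xi_a, np.ones_like(t)])   # rows are phi_{s,k}^T
theta, *_ = np.linalg.lstsq(regressors, xi_b_meas, rcond=None)
C_hat, z0_hat = theta
K_b_hat = -C_hat * 1.0                             # (3.12) with K_a = 1
```

Such an estimate ignores the noise on the regressors, so it is only meant as a starting point for the full criterion (3.34).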


3.1.2 First principles kinetic parameter identification

Once a pseudo-stoichiometry is determined, a kinetic model structure must be chosen and its parameters identified. As mentioned in the introduction of this chapter, we select the model structure developed by Bogaerts and Hanus (2000) to model the kinetics:

$$\varphi_j(\xi,t) = \alpha_j \prod_h \xi_h(t)^{\gamma_{hj}} \prod_l e^{-\beta_{lj}\,\xi_l(t)}, \qquad j \in [1, M] \qquad (3.46)$$

Indeed, this model is general since it can represent the activation as well as the inhibition effects from each component in a culture. Moreover, it presents the advantage of a systematic identification procedure.

In this section, we recall this latter procedure which proceeds in two steps:

• a first linear estimation based on the mass balances (3.1), the component concentration measurements, the knowledge of the inputs (dilution rate or feeding rate), the previously obtained pseudo-stoichiometric coefficients and finally an estimation of the reaction rates from discrete measurements and an interpolation model;

• and, based on these first parameter estimates, a maximum likelihood estimation of the kinetic parameters and the initial concentrations of the experiments used for the identification.

3.1.2.1 Linear first estimation

The kinetic model structure (3.46) can be linearized with respect to its parameters thanks to a logarithmic transformation:

$$\ln \varphi_j(\xi,t) = \ln \alpha_j + \sum_h \gamma_{hj} \ln \xi_h(t) - \sum_l \beta_{lj}\,\xi_l(t), \qquad j \in [1, M] \qquad (3.47)$$

This yields a linear least squares estimate of the kinetic coefficients (which necessarily exists, is unique and independent of any initial guess):

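$$\hat{\theta}_{cin}^{(j)} = \underset{\theta_{cin}^{(j)}}{\mathrm{ArgMin}}\; \frac{1}{2} \sum_{s=1}^{S} \sum_{k=1}^{N_s} \big(Y_{m,s,k}^{(j)} - \phi_{s,k}^{(j)T}\,\theta_{cin}^{(j)}\big)^2, \qquad j \in [1, M] \qquad (3.48)$$

where

$$Y_{m,s,k}^{(j)} = \ln \hat{\varphi}_j(t_{s,k}) \qquad (3.49)$$

$$\phi_{s,k}^{(j)T} = [\,1 \;\cdots\; \ln \xi_h(t_{s,k}) \;\cdots\; -\xi_l(t_{s,k}) \;\cdots\,] \qquad (3.50)$$

$$\theta_{cin}^{(j)T} = [\,\ln \alpha_j \;\cdots\; \gamma_{hj} \;\cdots\; \beta_{lj} \;\cdots\,] \qquad (3.51)$$

under the positivity constraints

$$[\,\cdots\; \gamma_{hj} \;\cdots\; \beta_{lj} \;\cdots\,] \ge 0 \qquad (3.52)$$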
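The linear step can be sketched for one reaction with one activating and one inhibiting component. The example below uses noise-free synthetic rates and hypothetical parameter values; the positivity constraints (3.52) are not enforced by the solver here and are only checked afterwards.

```python
import numpy as np

rng = np.random.default_rng(2)
S = rng.uniform(0.5, 5.0, size=40)      # activating component xi_h
P = rng.uniform(0.0, 2.0, size=40)      # inhibiting component xi_l
alpha, gamma, beta = 0.4, 0.8, 0.3      # hypothetical true values
rate = alpha * S**gamma * np.exp(-beta * P)

# Linearized model (3.47): ln(rate) = ln(alpha) + gamma ln(S) - beta P.
Y = np.log(rate)
Phi = np.column_stack([np.ones_like(S), np.log(S), -P])
theta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
alpha_hat, gamma_hat, beta_hat = np.exp(theta[0]), theta[1], theta[2]
```

With real data the rates are themselves estimates, so a bounded least squares solver would be a natural replacement for the unconstrained call used in this sketch.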

As for the estimates of the reaction rate, we can obtain $\hat{\varphi}(t_{s,k})$ thanks to the following equation

$$\hat{\varphi}(t_{s,k}) = K_a^{-1} \left( \frac{d\xi_a(t_{s,k})}{dt} + D(t_{s,k})\,\xi_a(t_{s,k}) - u_a(t_{s,k}) \right) \qquad (3.53)$$

where $\xi_a$, $K_a$ and $u_a$ are the quantities defined for the decoupled identification method (section 3.1.1.1). The estimate of the time derivative $d\xi_a(t_{s,k})/dt$ can be computed by numerical differentiation of an interpolation model (a cubic spline, for instance) of the vector $\xi_a(t)$.

These first estimates $\hat{\theta}_{cin}^{(j)}$ are based on unreliable assumptions on the measurement errors and on estimates of the signal derivatives. Therefore, these estimates are just considered as an initial guess for the second step of the identification.
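The reaction rate estimate (3.53) can be sketched as follows. For brevity, finite differences stand in for the derivative of a cubic spline, and the function name and the exponential growth data are hypothetical.

```python
import numpy as np

def reaction_rate_estimate(t, xi_a, D, u_a, K_a):
    """phi_hat(t_k) = K_a^{-1} (d xi_a/dt + D xi_a - u_a), evaluated at every sample."""
    dxi_dt = np.gradient(xi_a, t, axis=0)        # stand-in for a spline derivative
    return (dxi_dt + D[:, None] * xi_a - u_a) @ np.linalg.inv(K_a).T

# Hypothetical batch growth: d(X)/dt = 0.5 X, with K_a = [[1]].
t = np.linspace(0.0, 6.0, 601)
X = 0.1 * np.exp(0.5 * t)
phi_hat = reaction_rate_estimate(
    t, X[:, None], np.zeros_like(t), np.zeros((t.size, 1)), np.array([[1.0]])
)
```

On noisy measurements the derivative estimate degrades quickly, which is one reason why these rates only serve to initialize the second identification step.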

3.1.2.2 Markov estimation

On the basis of the identified pseudo-stoichiometry and the first estimation of the kinetics parameters, a new estimation of the kinetics with the initial concentrations of the experiments used for the identification has to be done.

The simulation model {3.1, 3.46} consists of a nonlinear differential system of the form

$$\frac{dx(t)}{dt} = f(x(t), u(t); \theta) \qquad (3.54)$$

where

$$x(t) = \xi(t) = [\xi_a^T(t) \;\; \xi_b^T(t)]^T \qquad (3.55)$$

is the state vector containing the concentrations of the components involved in the selected reaction scheme (2.6);

$$u(t) = [D(t) \;\; F^T(t)]^T \qquad (3.56)$$

is here the input vector containing the dilution rate and the external feed rates;

$$\theta = [\,\ln \alpha_j \;\cdots\; \ln \gamma_{hj} \;\cdots\; \ln \beta_{lj} \;\cdots\; \ln \xi^T(0)\,]^T, \qquad j \in [1, M] \qquad (3.57)$$

is the vector of the parameters to be identified (logarithm of the kinetic coefficients and initial concentrations in order to ensure their positivity);

and $f$ is the model structure corresponding to relations {3.1, 3.46}.

Let

$$\xi(t) = g(t, u(t), \xi(0); \theta) \qquad (3.58)$$

be the solution (generally obtained by numerical integration) of the differential system (3.54) starting from the initial concentrations $\xi(0)$. The sampled measurements, assumed to be corrupted by a white noise $\varepsilon_{y,s,k}$, normally distributed with zero mean and covariance matrix $Q_{s,k}$, can be written as

$$y_{m,s,k} = g(t_{s,k}, u(t_{s,k}), x_s(0); \theta) + \varepsilon_{y,s,k} \qquad (3.59)$$

($t_{s,k}$ being the $k$-th sample time of the $s$-th experiment). As the noise is assumed to corrupt only the concentration measurements in $\xi$, the maximum likelihood estimate of $\theta$ can then be deduced from a nonlinear Markov estimator

$$\hat{\theta} = \underset{\theta}{\mathrm{ArgMin}}\; \frac{1}{2} \sum_{s=1}^{S} \sum_{k=1}^{N_s} \big(y_{m,s,k} - g(t_{s,k}, u(t_{s,k}), x_s(0); \theta)\big)^T Q_{s,k}^{-1} \big(y_{m,s,k} - g(t_{s,k}, u(t_{s,k}), x_s(0); \theta)\big) \qquad (3.60)$$

The initial guess of $\theta$ consists, on the one hand, of the first estimate of the kinetic parameters deduced from the previous linear estimation step and, on the other hand, of the measurements of $\xi$ at the initial time.

The covariance matrix of the parameter estimation errors can also be estimated in this last step (Bogaerts, 1999):

$$E[\tilde{\theta}\,\tilde{\theta}^T] \approx \left( \sum_{s=1}^{S} \sum_{k=1}^{N_s} G_\theta^T(t_{s,k}, u(t_{s,k}), x_s(0); \hat{\theta})\; Q_{s,k}^{-1}\; G_\theta(t_{s,k}, u(t_{s,k}), x_s(0); \hat{\theta}) \right)^{-1} \qquad (3.61)$$

where

$$G_\theta(t_{s,k}, u(t_{s,k}), x_s(0); \hat{\theta}) = \left. \frac{\partial g(t_{s,k}, u(t_{s,k}), x_s(0); \theta)}{\partial \theta} \right|_{\theta = \hat{\theta}} \qquad (3.62)$$

This Jacobian is obtained by solving (together with the simulation model (3.54)) the sensitivity equations

$$\frac{d}{dt} G_\theta(t, u(t), x(0); \theta) = \frac{\partial f(x, u(t); \theta)}{\partial x}\, G_\theta(t, u(t), x(0); \theta) + \frac{\partial f(x, u(t); \theta)}{\partial \theta} \qquad (3.63)$$

with the initial condition

$$G_\theta(0, u(0), x(0); \theta) = \frac{\partial x(0)}{\partial \theta} = G_{\theta 0} \qquad (3.64)$$

where $G_{\theta 0}$ is a matrix whose elements are all equal to zero except the ones corresponding to the partial derivative of the elements of $\xi(0) \in x(0)$ with respect to their homologous $\ln \xi(0) \in \theta$, these partial derivatives being equal to $\xi(0)$.

The Jacobian $G_\theta(t_{s,k}, u(t_{s,k}), x_s(0); \hat{\theta})$ involved in relation (3.61) is thus obtained by evaluating the numerical solution $G_\theta(t, u(t), x(0); \theta)$ of the system {(3.54), (3.63)} for $t = t_{s,k}$ and $\theta = \hat{\theta}$.
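The joint integration of the model and its sensitivities can be illustrated on a deliberately small example. The sketch below assumes a one-state decay model with θ = [ln k, ln ξ(0)], a fixed-step Runge-Kutta scheme and a constant measurement variance; none of these choices come from the procedure itself.

```python
import numpy as np

def augmented_rhs(state, theta):
    """Model (3.54) and sensitivities (3.63) for d(xi)/dt = -k xi, theta = [ln k, ln xi(0)]."""
    xi, G = state[0], state[1:]
    k = np.exp(theta[0])
    df_dx = -k
    df_dtheta = np.array([-k * xi, 0.0])
    return np.concatenate([[-k * xi], df_dx * G + df_dtheta])

def simulate(theta, t_grid):
    """Classical fourth-order Runge-Kutta integration of the augmented system."""
    xi0 = np.exp(theta[1])
    state = np.array([xi0, 0.0, xi0])        # G(0): d xi(0) / d ln xi(0) = xi(0)
    out = [state]
    for t0, t1 in zip(t_grid[:-1], t_grid[1:]):
        h = t1 - t0
        k1 = augmented_rhs(state, theta)
        k2 = augmented_rhs(state + 0.5 * h * k1, theta)
        k3 = augmented_rhs(state + 0.5 * h * k2, theta)
        k4 = augmented_rhs(state + h * k3, theta)
        state = state + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        out.append(state)
    return np.array(out)

theta_hat = np.log([0.3, 2.0])
t = np.linspace(0.0, 5.0, 51)
traj = simulate(theta_hat, t)
xi, G = traj[:, 0], traj[:, 1:]              # G[k] is the 1 x 2 Jacobian at t_k
Q = 0.01**2                                  # assumed constant measurement variance
cov = np.linalg.inv(sum(np.outer(g, g) / Q for g in G))
```

Because the parameters are logarithms, the diagonal of `cov` can be read as approximate relative variances of k and ξ(0).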

At the end of this identification step, all the parameters have been identified: the pseudo-stoichiometric coefficients (3.34) and the kinetic parameters (3.60).

Hence, the model is completely determined but has of course to be validated (cross validation, study of the correlation matrix of the parametric errors, etc.).

3.1.2.3 Covariance matrix and parameter reduction

The study of the covariance matrix (3.61) can help reducing the number of kinetic parameters. Indeed, hardly assessable coefficients can be suppressed after checking that their information is covered by other parameters. In concrete terms, coefficients with high variance and a sufficient correlation with other parameters can be cancelled out. This cancellation reduces the number of parameters, and in turn the effect of a component (e.g. activation or inhibition).

Developed by trial and error, the following parameter reduction procedure is therefore proposed:

1. First, the parameters with a variance and a covariance respectively larger than 10³ and 10⁶ are cancelled out.

2. Then, the remaining parameters are re-estimated and the parameters with a variance and a covariance respectively larger than 100 and 50 are cancelled out.

3. The same operation is repeated with variance and covariance levels of 4 and 2, respectively.

4. Finally, a last round is carried out with variance and covariance levels of 1.5 and 1, respectively.

Note that the selected thresholds are applied to every parameter, irrespective of its physical dimension. Indeed, parameter positivity is ensured through the logarithmic transformation (3.57), so the elements of (3.61) can be regarded as relative errors.
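The link between a logarithmic transformation and relative errors can be made explicit with a first-order argument. Writing $\theta = \ln p$ for a positive parameter $p$ (the symbol $p$ is introduced here for illustration only), a small estimation error satisfies

$$\delta\theta = \delta(\ln p) \approx \frac{\delta p}{p} \quad \Longrightarrow \quad \operatorname{Var}(\theta) \approx \frac{\operatorname{Var}(p)}{p^{2}}$$

so the variance of $\theta$ approximates the squared relative standard deviation of $p$, a dimensionless quantity. This is a small-error approximation, given only as a reading aid for why a single set of thresholds can be applied to parameters with different units.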

Several reduction steps are necessary to isolate the poorly identifiable parameters efficiently. Indeed, the inaccurate estimation of one unimportant parameter can strongly affect the estimation of an essential parameter, leading to a significant variance and correlation of the latter. To avoid this problem, we propose a step-by-step reduction procedure.

However, the reduction procedure has to be refined: kinetic constants and activation coefficients of reactants cannot be suppressed. Only inhibition coefficients and activation parameters of products can be cancelled out.
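The reduction rounds described in this section can be sketched as a simple loop. The Python code below is a minimal illustration under stated assumptions: the parameter names, the `estimate_covariance` callback and the toy covariance values are hypothetical, and a parameter is assumed to be cancelled only when both its variance and its largest covariance with another parameter exceed the levels of the current round.

```python
def reduce_parameters(names, removable, estimate_covariance, stages):
    """Stepwise parameter reduction driven by the covariance matrix.

    names               -- list of parameter names (hypothetical labels)
    removable           -- names that may be cancelled (e.g. inhibition coefficients)
    estimate_covariance -- hypothetical callback that re-estimates the active
                           parameters and returns their covariance matrix
    stages              -- list of (variance_level, covariance_level) pairs
    """
    active = list(names)
    for var_max, cov_max in stages:
        cov = estimate_covariance(active)  # re-estimation before each round
        dropped = []
        for i, name in enumerate(active):
            if name not in removable:
                continue  # kinetic constants and reactant activations are kept
            largest_cov = max(
                (abs(cov[i][j]) for j in range(len(active)) if j != i),
                default=0.0,
            )
            # Assumption: both levels must be exceeded to cancel a parameter.
            if cov[i][i] > var_max and largest_cov > cov_max:
                dropped.append(name)
        active = [n for n in active if n not in dropped]
    return active

def toy_covariance(active):
    """Toy stand-in for a real re-estimation: fixed, made-up values."""
    variance = {"mu_max": 0.2, "K_S": 0.5, "K_I": 250.0, "K_P": 3.0}
    scale = {"mu_max": 1.0, "K_S": 1.0, "K_I": 60.0, "K_P": 1.5}
    return [[variance[a] if a == b else scale[a] * scale[b] for b in active]
            for a in active]

STAGES = [(1e3, 1e6), (100.0, 50.0), (4.0, 2.0), (1.5, 1.0)]
kept = reduce_parameters(["mu_max", "K_S", "K_I", "K_P"], {"K_I", "K_P"},
                         toy_covariance, STAGES)
```

With these made-up values, the inhibition coefficient `K_I` is cancelled in the second round and `K_P` in the last one, while the non-removable parameters are always kept.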

3.1.3 Neural Network parameter identification

To identify the neural network parameters systematically, an estimation procedure inspired by the RBF estimation technique of Vande Wouwer et al. (2004) has been developed. It proceeds in four steps, whatever the kind of hybrid model (3.2 or 3.3) and whatever the neural network considered.

1. A first estimation of the hidden parameters is carried out:

a. for a MultiLayer Perceptron, random values are assigned to the weights and biases of the sigmoid hidden layer, since no systematic method exists to determine a first estimate of these coefficients.

b. for a Radial Basis Function network, the centres of the Gaussians are determined in an unsupervised learning phase so as to cover the data space as well as possible. A clustering algorithm, such as k-means in Matlab©, classifies the data according to their similarities, organizes them into groups and computes the centre of each group. Then, the widths are chosen as the mean distances between the different centres (Suykens et al., 1996).

2. On the basis of these values of the hidden parameters, a least squares estimator determines the output parameters. This linear estimation is based on the component concentration measurements, the knowledge of the inputs, the pseudo-stoichiometric coefficients (if required) and an estimate of the reaction rates (or of the reaction term) obtained from discrete measurements and an interpolation model.

3. Starting from the previous parameter values, a nonlinear identification step gives a new estimate. This time, the identification is based on the simulation of the complete hybrid model and on an estimator using a slightly modified Markov criterion, the regularization criterion (2.75), which takes the measurement errors into account and improves the generalization of the considered neural network (its ability to respond satisfactorily to unknown inputs).

4. A second nonlinear identification step re-estimates the neural parameters together with the initial conditions of the various cultures used for the identification. This last step uses the same estimator as the third step and can be carried out with the simplex search method.
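The first step for an RBF network (centres from clustering, widths from the distances between centres) can be sketched as follows. This is a plain-Python illustration and not the Matlab© routine used in this work: the function names are hypothetical, the k-means initialisation simply takes the first k points, and the width of each Gaussian is interpreted here as its mean distance to the other centres.

```python
import math

def kmeans(points, k, n_iter=50):
    """Plain Lloyd iterations; centres start at the first k points (a simplification)."""
    centres = [list(p) for p in points[:k]]
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centres[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the previous centre if a cluster becomes empty
                centres[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return centres

def rbf_widths(centres):
    """Width of each Gaussian = mean distance to the other centres (needs k >= 2)."""
    k = len(centres)
    return [
        sum(math.dist(centres[i], centres[j]) for j in range(k) if j != i) / (k - 1)
        for i in range(k)
    ]

# Made-up two-dimensional data forming two well-separated groups.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
centres = kmeans(points, k=2)
widths = rbf_widths(centres)
```

These centres and widths only serve as starting values; in the procedure above they are refined by the nonlinear identification steps.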

Since the first estimates of the hidden parameters are obtained, in this work, with Matlab© functions (randn.m or kmeans.m), this section begins with the linear estimation of the output parameters and then presents the nonlinear identification steps. It concludes with some remarks about the optimal size of a neural network and other comments about the identification.


3.1.3.1 Linear identification of the output parameters

In the considered hybrid neural network models (3.2) and (3.3), the neural expressions of the kinetics and the reaction term are the following

$$\varphi_j(t_{s,k}) = \sum_{i=1}^{n_{hl}} w_{ji}\, h_i + b_j, \qquad j = 1, \ldots, M \qquad (3.65)$$

and

$$(\mathbf{K}\boldsymbol{\varphi})_j(t_{s,k}) = \sum_{i=1}^{n_{hl}} w_{ji}\, h_i + b_j, \qquad j = 1, \ldots, M \qquad (3.66)$$

where

$$h_i = \frac{1}{1 + e^{-\left(\sum_{q=1}^{N} v_{iq}\, \xi_q + \beta_i\right)}} \qquad \text{for a MLP} \qquad (3.67)$$

and

$$h_i = e^{-\frac{\lVert \boldsymbol{\xi} - \mathbf{C}_i \rVert^{2}}{r_i^{2}}} \qquad \text{for a RBF} \qquad (3.68)$$
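As a reading aid, the two kinds of hidden units and the linear output layer can be written in a few lines of Python. The function and argument names below are hypothetical, and the Gaussian is assumed to have the form exp(-||ξ - C_i||² / r_i²); the sketch only evaluates the network and does not perform any identification.

```python
import math

def mlp_hidden(xi, V, beta):
    """Sigmoid hidden units: h_i = 1 / (1 + exp(-(sum_q v_iq * xi_q + beta_i)))."""
    return [
        1.0 / (1.0 + math.exp(-(sum(v * x for v, x in zip(row, xi)) + b)))
        for row, b in zip(V, beta)
    ]

def rbf_hidden(xi, centres, radii):
    """Gaussian hidden units: h_i = exp(-||xi - C_i||^2 / r_i^2) (assumed form)."""
    return [
        math.exp(-sum((x - c) ** 2 for x, c in zip(xi, centre)) / r ** 2)
        for centre, r in zip(centres, radii)
    ]

def network_output(h, W, b):
    """Linear output layer: out_j = sum_i w_ji * h_i + b_j."""
    return [sum(w * hi for w, hi in zip(row, h)) + bj for row, bj in zip(W, b)]

# Example: a state vector with two components and a single output.
h_mlp = mlp_hidden([0.0, 0.0], V=[[1.0, 2.0]], beta=[0.0])
h_rbf = rbf_hidden([1.0, 2.0], centres=[[1.0, 2.0]], radii=[3.0])
out = network_output([0.5, 1.0], W=[[2.0, 3.0]], b=[1.0])
```

Because the output is linear in the weights `W` and biases `b` once the hidden outputs are fixed, the output parameters can be estimated by linear least squares.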

Each $\varphi_j$ and $(\mathbf{K}\boldsymbol{\varphi})_j$ is linear in $w_{ji}$ and $b_j$. Hence, it is possible to find a linear least squares estimate of the output parameters (which necessarily exists if enough independent measurements are available, is unique and is independent of any initial guess):

$$\hat{\boldsymbol{\theta}}_{NN}^{(j)} = \underset{\boldsymbol{\theta}_{NN}^{(j)}}{\mathrm{ArgMin}} \sum_{s=1}^{S} \sum_{k=1}^{N_s} \left( Y_{s,k}^{(j)} - \boldsymbol{\varphi}_{s,k}^{(j)T}\, \boldsymbol{\theta}_{NN}^{(j)} \right)^{2}, \qquad j \in [1, M] \qquad (3.69)$$

where

$$Y_{s,k}^{(j)} = \hat{\varphi}_j(t_{s,k}) \quad \text{or} \quad Y_{s,k}^{(j)} = (\widehat{\mathbf{K}\boldsymbol{\varphi}})_j(t_{s,k}) \qquad (3.70)/(3.71)$$

$$\boldsymbol{\varphi}_{s,k}^{(j)T} = [\, h_1 \; \cdots \; h_{n_{hl}} \; 1 \,] \qquad (3.72)$$

$$\boldsymbol{\theta}_{NN}^{(j)T} = [\, w_{ji} \;\; b_j \,], \qquad i \in [1, \ldots, n_{hl}] \qquad (3.73)$$
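A linear least squares estimate of this kind can be computed from the normal equations. The following Python sketch is a minimal illustration for a single output j; the function names and the data are hypothetical, and the normal equations are solved by Gaussian elimination, which is adequate for a small, well-conditioned example but is not claimed to be the solver used in this work.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_output_parameters(H, y):
    """Least squares estimate of [w_j1, ..., w_jn, b_j] for one output j.

    H -- hidden-layer outputs, one row [h_1, ..., h_n] per measurement
    y -- target values Y for that output, one per measurement
    """
    rows = [list(h) + [1.0] for h in H]  # regressor [h_1 ... h_n 1]
    n = len(rows[0])
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    Aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(n)]
    return solve(AtA, Aty)

# Made-up data generated with w = [2, -1] and b = 0.5 (noise-free).
H = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.2]]
y = [-0.5, 2.5, 1.5, 1.3]
theta = fit_output_parameters(H, y)
```

With noise-free data the estimate recovers the generating weights and bias; with real measurements it only provides starting values for the nonlinear steps.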


As for the estimates $\hat{\boldsymbol{\varphi}}$ and $(\widehat{\mathbf{K}\boldsymbol{\varphi}})$, they are obtained from the following equations

$$\hat{\boldsymbol{\varphi}}(t_{s,k}) = \mathbf{K}_a^{-1} \left( \frac{d\boldsymbol{\xi}_a(t_{s,k})}{dt} + D(t_{s,k})\, \boldsymbol{\xi}_a(t_{s,k}) - \mathbf{u}_a(t_{s,k}) \right) \qquad (3.74)$$

$$(\widehat{\mathbf{K}\boldsymbol{\varphi}})(t_{s,k}) = \frac{d\boldsymbol{\xi}(t_{s,k})}{dt} + D(t_{s,k})\, \boldsymbol{\xi}(t_{s,k}) - \mathbf{u}(t_{s,k}) \qquad (3.75)$$

where $\boldsymbol{\xi}_a$, $\mathbf{K}_a$, $\mathbf{u}_a$ are the partitions of $\boldsymbol{\xi}$, $\mathbf{K}$, $\mathbf{u}$ defined for the decoupled identification method (section 3.1.1.1). The estimates of the time derivatives $d\boldsymbol{\xi}_a(t_{s,k})/dt$ and $d\boldsymbol{\xi}(t_{s,k})/dt$ can be computed by numerical differentiation of an interpolation model (a cubic spline, for instance) of the vectors $\boldsymbol{\xi}_a(t)$ and $\boldsymbol{\xi}(t)$.
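The estimation of the reaction term from sampled concentrations can be sketched for a single component as follows. For brevity, this Python illustration replaces the cubic spline by simple finite differences, which is a deliberate simplification; the function names and the data are hypothetical.

```python
def derivative(t, y):
    """Finite-difference derivative of samples y(t): central inside, one-sided at the ends."""
    n = len(t)
    d = [0.0] * n
    d[0] = (y[1] - y[0]) / (t[1] - t[0])
    d[-1] = (y[-1] - y[-2]) / (t[-1] - t[-2])
    for k in range(1, n - 1):
        d[k] = (y[k + 1] - y[k - 1]) / (t[k + 1] - t[k - 1])
    return d

def reaction_term(t, xi, D, u):
    """Estimate (K phi)(t_k) = d xi/dt + D * xi - u for one component."""
    dxi = derivative(t, xi)
    return [dx + Dk * x - uk for dx, Dk, x, uk in zip(dxi, D, xi, u)]

# Made-up samples: xi(t) = t^2, constant dilution rate and constant net feed.
t = [0.0, 1.0, 2.0, 3.0, 4.0]
xi = [tk ** 2 for tk in t]
D = [0.1] * len(t)
u = [0.5] * len(t)
k_phi = reaction_term(t, xi, D, u)
```

On noisy measurements, differentiating raw samples amplifies the noise, which is one reason why a smooth interpolation model is preferred and why the resulting estimates are treated only as an initial guess.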

The estimates of the output parameters $\hat{\boldsymbol{\theta}}_{NN}^{(j)}$ are based on unreliable assumptions on the measurement errors, on estimates of the signal derivatives and on initial values of the hidden parameters. Therefore, these estimates are only considered as an initial guess for the third step of the identification.

3.1.3.2 Nonlinear estimations

On the basis of the first parameter estimates obtained in section 3.1.3.1, a new estimation of the whole set of parameters must be carried out. It is based on the complete model and on the regularization estimator presented in section 2.3.5.2. Afterwards, a last estimation determines the whole set of parameters together with the initial concentrations of the experiments used for the identification.




Within the yield coefficient matrix K , each column ^j ∈ [ 1 , ^M ] contains null