Chapter 2
Bioprocesses and their modelling
This chapter proposes a general discussion of bioprocess modelling. First, we recall what a bioprocess, a cell culture and a bioreactor are (section 2.1).
Afterwards, the general characteristics of mathematical models (section 2.2.1) and of bioprocess models (section 2.2.2) are presented. Then, we describe the most widespread bioprocess models (section 2.2.3) before tackling parameter estimation in section 2.3. Finally, section 2.4 draws some conclusions.
2.1 Bioprocesses
According to Zaid et al. (1999), “a bioprocess may be defined as any process that uses complete living cells or their components (e.g., enzymes, chloroplasts) to effect desired physical or chemical changes”. In other words, a bioprocess consists of a cell culture in a bioreactor, a device designed to create an optimal growth environment. These concepts are explained in the next sections. First, we recall the most important characteristics of a living cell and the possibility of growing dissociated cells outside their natural environment.
Finally, we describe a typical bioreactor.
2.1.1 Cells and cell cultures
The central object of a bioprocess is the cell. A living cell is a highly complex system which is often defined as the smallest autonomous biological unit. Its main tasks are to keep itself alive, to reproduce and to regulate itself.
It is therefore able to build its own constituents and to provide its own energy through the physical and chemical processes which constitute the cell metabolism. The latter consists of a network of thousands of interconnected reactions, the metabolic pathways, which are catalysed by enzymes and accurately controlled by regulation processes.
The autonomy of cells makes it possible to grow dissociated cells outside their natural environment. Such cell cultures find many applications in the biotechnology industry. Indeed, bacteria, yeast or animal cell cultures allow the synthesis of numerous products of interest for the food and pharmaceutical sectors:
vaccines, antibiotics, antibodies, wine, beer, industrial alcohols, yeast or enzymes for food technology. Moreover, some cultures are used in waste treatment and pollution control (Moo-Young, 1985; Bailey and Ollis, 1986).
However, these applications require the use of a bioreactor. Such a system favours cell growth by creating a suitable environment: it monitors and controls the environmental conditions of the cells, such as gas flow rates, temperature, pH, dissolved oxygen level and agitation rate. This is developed in the following section.
2.1.2 Bioreactor
The most common bioreactor used in industry is the classical stirred tank
reactor, such as the one shown in figure 2.1. The central part of the reactor is the
tank containing the growing cells in their culture medium. All the peripheral
devices are used to control and monitor cell growth and production.
Figure 2.1: Schematic description of a perfectly stirred bioreactor (Hulhoven, 2006).
A typical bioreactor involves the following control loops: pH is monitored by a pH probe and controlled by addition of acid or base into the reactor; temperature is monitored by a thermocouple and controlled by a heat exchanger; dissolved oxygen is measured by a probe and controlled through the agitation rate and/or the air flow and/or the gas composition.
Besides these environmental considerations, culture strategies may be used to control nutrient availability. In a batch process, all the culture medium is available to the cells from the start and no medium is added or withdrawn during the culture. A fed-batch process is characterized by the addition of culture medium during the culture, at a predefined or controlled flow rate. In a continuous culture mode, fresh culture medium is added while the culture broth is continuously withdrawn. Finally, an alternative to the continuous culture mode is the perfusion mode, where culture medium is added and withdrawn while the cells are retained in the bioreactor.
2.2 Bioprocess modelling
In order to improve process understanding or performance, different automatic tools can be developed: simulators able to reproduce the system behaviour, software sensors which provide estimates of unmeasured signals, or controllers which maintain optimal conditions. All these tools rely on a representation of the considered system: a mathematical model. Such a model may come in various shapes and be phrased with varying degrees of mathematical formalism. The intended use determines the degree of sophistication that is required to make the model purposeful (Ljung, 1987).
Different kinds of bioprocess models can be distinguished according to their biological interpretability or their level of complexity. However, before describing the most widespread bioprocess models, let us recall some general characteristics of a dynamic model.
2.2.1 General characteristics of models
Before considering bioprocess models, it is important to describe some very general characteristics of mathematical models. For this purpose, let us consider a general dynamic system characterized by:
• its state evolution
$$\frac{d\mathbf{x}(t)}{dt} = \mathbf{f}\big(\mathbf{x}(t), \mathbf{u}(t); \boldsymbol{\theta}(t)\big) \tag{2.1}$$
where
• $\mathbf{x}(t) \in \mathbb{R}^n$ is the state vector of the system;
• $\mathbf{u}(t) \in \mathbb{R}^l$ is the input vector, which contains the variables exciting the system (action or perturbation variables);
• $\boldsymbol{\theta}(t) \in \mathbb{R}^p$ is the vector of model parameters;
• $t$ is the time.
• its output values:
$$\mathbf{y}(t) = \mathbf{h}\big(\mathbf{x}(t)\big) \tag{2.2}$$
where $\mathbf{y}(t) \in \mathbb{R}^m$ is the output vector.
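As a minimal sketch of how (2.1) and (2.2) are used in a simulator, the Python code below integrates the state equation with explicit Euler steps and evaluates the output map at every step. It assumes constant parameters θ and a fixed step size; the function name `simulate` and the first-order example system are hypothetical and not taken from the text.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def simulate(f: Callable[[Vector, Vector, Vector], Vector],
             h: Callable[[Vector], Vector],
             x0: Sequence[float],
             u: Callable[[float], Vector],
             theta: Sequence[float],
             t_end: float,
             dt: float) -> List[Vector]:
    """Integrate dx/dt = f(x, u; theta), eq. (2.1), with explicit Euler steps
    and return the outputs y = h(x), eq. (2.2), at t = 0, dt, 2 dt, ..., t_end."""
    x = list(x0)
    t = 0.0
    outputs = [h(x)]
    for _ in range(int(round(t_end / dt))):
        dx = f(x, u(t), list(theta))
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
        t += dt
        outputs.append(h(x))
    return outputs


# Hypothetical first-order system dx/dt = -theta * x + u, measured directly (y = x)
def decay(x: Vector, u: Vector, theta: Vector) -> Vector:
    return [-theta[0] * x[0] + u[0]]


ys = simulate(decay, lambda x: [x[0]], [1.0], lambda t: [0.0], [1.0], 1.0, 0.001)
print(ys[-1][0])  # close to exp(-1), about 0.368
```

In practice an adaptive ODE solver would replace the Euler loop; the fixed step only keeps the sketch short.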
A model of such a system may have various characteristics; the main ones are recalled here. A model may be:
• Linear or nonlinear
A linear model has to satisfy the superposition principle, i.e. any linear combination of model inputs (and initial state conditions) yields the same linear combination of the states or outputs. In brief, if the system functions ($\mathbf{f}$ and $\mathbf{h}$) can be represented entirely by linear equations, the model is linear. If one or more of these functions is nonlinear, the model is nonlinear.
In practice, many models used in advanced applications, and in particular in bioprocesses, are nonlinear, as will be illustrated later.
• Static or dynamic
Unlike a dynamic model, a static model does not take time into account.
Dynamic models are typically represented by mathematical
expressions such as differential equations, which describe the dynamic
evolution of the state variables (e.g. cell growth, substrate consumption or product formation).
• Lumped or distributed parameters
The parameters of a model are lumped when the model is homogeneous: time is the only independent variable. When some state also varies in space within the system, time is no longer the only independent variable: the model is heterogeneous and the parameters are distributed. The space coordinates are then additional independent variables. Whereas a dynamic lumped parameter model is described by ordinary differential equations, distributed parameter models are typically represented by partial differential equations.
In bioprocess modelling, lumped parameter models are often preferred; they rely on the assumption of a perfectly stirred bioreactor. However, such a hypothesis is not valid for a tubular reactor, as highlighted in Häfele et al. (2006).
• Continuous or discrete
Many models are continuous: their variables are defined for any real value of time. However, measured data are usually obtained through discrete sampling, which implies measurements at discrete time instants:
$$\mathbf{y}(t_k) = \mathbf{h}\big(\mathbf{x}(t_k)\big) \tag{2.3}$$
where $t_k$, $k \in \mathbb{N}$, are the measurement times.
• Deterministic or stochastic
Unlike deterministic models, stochastic models include randomness. This uncertainty is usually introduced as measurement noise. The discrete measurements (2.3) can thus be described by
$$\mathbf{y}(t_k) = \mathbf{h}\big(\mathbf{x}(t_k)\big) + \boldsymbol{\varepsilon}(t_k) \tag{2.4}$$
where $\boldsymbol{\varepsilon}(t_k)$ are the random measurement noises.
It is also possible to introduce noise in the state equation to account for possible model errors:
$$\frac{d\mathbf{x}(t)}{dt} = \mathbf{f}\big(\mathbf{x}(t), \mathbf{u}(t); \boldsymbol{\theta}(t)\big) + \boldsymbol{\varepsilon}_x(t) \tag{2.5}$$
where $\boldsymbol{\varepsilon}_x(t)$ are random noises.
• White or black box
Black boxes are models without physical interpretation. They are usually used when no a priori information about the system is available. White boxes involve a model structure that allows physical interpretation. Artificial Neural Networks are an example of black box models used in bioengineering (Montague and Morris, 1994; Meltser et al., 1996; Chen et al., 2004b; Ferentinos, 2005). White boxes, on the other hand, usually rely on mass balances involving substrate inputs, accumulation and dilution terms, as well as kinetics described by activation, inhibition and saturation coefficients.
Between these two extremes, grey boxes combine the principles of white and black box models. Hybrid models, which replace some model elements by neural networks while the other elements are knowledge-based, are an example of this kind of model. Such structures are used in bioprocess modelling (e.g. Psichogios and Ungar, 1992; de Azevedo et al., 1997; Chen et al., 2000; Chen et al., 2004a; Oliveira, 2004; Vande Wouwer et al., 2004), state observation (e.g.
James et al., 2002; Choi and Park, 2001; Glassey et al., 1997) and
control (e.g. Lübbert and Simutis, 1994; Schubert et al., 1994a; Simutis
et al., 1997; Costa et al., 1998; Tian et al., 2002; Komives and Parker,
2003; Karakuzu et al., 2006). They are studied in section 2.2.3.
All these model characteristics are very general. We now describe in more detail how bioprocess models account for the particularity of bioprocesses: the presence of living cells in the system.
2.2.2 Characteristics of bioprocess models
The classification presented here was proposed by Tsuchiya et al. (1966)
and is still often used in the literature (in Mu et al. (2005), for instance).
The authors introduce two characteristics specific to bioprocess models. First, a model can be structured or unstructured depending on whether it describes intracellular characteristics of the cell (metabolic reactions, cellular processes, etc.) or considers the cell as an entity without internal structure. Second, a model can be segregated or unsegregated depending on whether or not it accounts for the heterogeneity of the cellular population, for example the position of each cell in the cell cycle.
The choice among these properties depends on the objective of the model.
Structured models (e.g. Votruba et al., 1985; Dantigny, 1995; Montesinos et al., 1997; Lei et al., 2001) are used to describe the intrinsic complexity of the system in more detail. As a consequence, they usually involve a high number of equations, with many state variables and parameters to be measured and identified. For this reason, they are particularly difficult to use in process tools like software sensors or controllers. However, they are very useful for process understanding. On the other hand, unstructured models consider living cells regardless of their intracellular subprocesses. Since they focus on the process behaviour, they usually involve only the most significant signals, known as macroscopic species (e.g. substrates, biomass and products of interest).
Hence, only a few states are considered, which makes the model easier to
identify and to use in controllers or software sensors. The choice between structured and
unstructured modelling is therefore an ongoing debate. However, there is a strong
link between these two modelling principles. Indeed, different studies (Provost
and Bastin, 2004; Haag et al., 2005) have shown that a simple model can be deduced from a structured one by model reduction, which supports the macroscopic approach.
The segregation concept concerns the cellular population. In a segregated model, the heterogeneity of the population is taken into account;
for instance, such a model may take the cell cycle into consideration or distinguish different cell states (de Andrès-Toro et al., 1998; Uchiyama and Shioya, 1999).
Unsegregated models, in contrast, assume a homogeneous cellular population in the culture medium. Again, segregated models require a higher degree of complexity but are more realistic than unsegregated ones, which favour model simplicity.
2.2.3 Bioprocess models
In this section, we present different general approaches to bioprocess modelling, focusing on unstructured and unsegregated models. Indeed, such models are more often used in engineering tools for control and state observation: they limit the number of state variables and parameters, and they avoid both the need for numerous expensive sensors and identifiability problems. Among the existing macroscopic models, three kinds can be distinguished (Shioya et al., 1999): first principles models based on the available prior knowledge (white boxes), black box models without biological interpretation and, finally, hybrid models, which are grey boxes because they combine a first principles model with a black box.
The mathematical formalisms of these model
classes are described below.
2.2.3.1 First principles models
A very general approach to describe the dynamics of a bioprocess has been proposed in Bastin and Dochain (1990). It consists of the system of mass balances for the macroscopic species involved in a reaction scheme. Such a reaction scheme is a set of irreversible reactions that describe the main phenomena occurring in the culture. Its expression is the following:
$$\sum_{i \in \mathcal{R}_k} (-\nu_{i,k})\,\xi_i \;\xrightarrow{\;\varphi_k\;}\; \sum_{j \in \mathcal{P}_k} \nu_{j,k}\,\xi_j, \qquad k \in [1, M] \tag{2.6}$$
where
• $M$ is the number of reactions;
• $N$ the number of components (biomass, substrates, enzymes or products of interest);
• $\varphi_k$ is the $k$-th reaction rate;
• $\xi_i$ the $i$-th component;
• $\nu_{i,k}$, $\nu_{j,k}$ the corresponding pseudo-stoichiometric (or yield) coefficients (positive when associated with a component which is produced, negative when it is consumed);
• $\mathcal{R}_k$ the $k$-th set of reactant and catalyst indices;
• and $\mathcal{P}_k$ the $k$-th set of product and (auto-)catalyst indices.
Such a reaction scheme is analogous, but not equivalent, to those commonly
used in classical chemical engineering. Indeed, it does not represent a
stoichiometric relationship between the components. It simply represents a
qualitative relation, called pseudo-stoichiometry. Hence, it is never an
exhaustive description of the process. For instance, non-limiting substrates are often omitted, as are reaction by-products which are not involved in any other reaction or which are of no interest to the user. Consequently, a reaction scheme may be inconsistent with the law of mass conservation.
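The sign convention for the pseudo-stoichiometric coefficients can be made concrete by assembling, from a reaction scheme of type (2.6), the matrix whose entry (i, k) is ν_{i,k}. The two-reaction scheme and the helper name `yield_matrix` below are hypothetical, chosen only to show that consumed components get negative entries, produced ones positive entries, and a pure catalyst (written with the same coefficient on both sides) a zero net entry.

```python
from typing import Dict, List, Tuple

# A reaction is described by its (reactants, products), each mapping a component to its coefficient.
Reaction = Tuple[Dict[str, float], Dict[str, float]]


def yield_matrix(components: List[str], reactions: List[Reaction]) -> List[List[float]]:
    """Return the N x M matrix with entry [i][k] = nu_{i,k}: positive for a produced
    component, negative for a consumed one and zero for an absent component or for
    a catalyst written with the same coefficient on both sides."""
    K = [[0.0] * len(reactions) for _ in components]
    for k, (reactants, products) in enumerate(reactions):
        for i, name in enumerate(components):
            K[i][k] = products.get(name, 0.0) - reactants.get(name, 0.0)
    return K


# Hypothetical scheme on components X (biomass), S (substrate), P (product):
#   phi_1: 2 S -> X                 (growth; X is also an autocatalyst)
#   phi_2: 1.5 S + X -> X + 0.8 P   (production catalysed by X)
components = ["X", "S", "P"]
reactions = [
    ({"S": 2.0}, {"X": 1.0}),
    ({"S": 1.5, "X": 1.0}, {"X": 1.0, "P": 0.8}),
]
K = yield_matrix(components, reactions)
# K == [[1.0, 0.0], [-2.0, -1.5], [0.0, 0.8]]
```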
The general dynamic model of bioprocesses proposed by Bastin and Dochain (1990) relies on the reaction scheme (2.6) and the assumption of a perfectly stirred bioreactor (figure 2.1). It consists of the system of mass balances for each of the $N$ components $\xi_i$:
$$\frac{d\big(V(t)\,\xi_i(t)\big)}{dt} = V(t)\sum_{j \sim i} \nu_{i,j}\,\varphi_j(\boldsymbol{\xi}, t) + F_{in}(t)\,\xi_i^{in}(t) - F_{out}(t)\,\xi_i(t) + Q_i^{in}(t) - Q_i^{out}(t), \qquad i \in [1, N] \tag{2.7}$$
where
• $F_{in} \in \mathbb{R}$ is the influent flow rate;
• $F_{out} \in \mathbb{R}$ is the effluent flow rate;
• $V$ is the culture volume;
• $\xi_i^{in} \in \mathbb{R}$ is the concentration of the component $\xi_i$ in the feeding solution;
• $Q_i^{in}(t)$ and $Q_i^{out}(t)$ are the gaseous inflow and outflow rates of the component $\xi_i$;
• the notation $j \sim i$ means that the summation is taken only over the reactions $j$ which involve the component $i$.
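If we introduce the following matrix notations:
$$\boldsymbol{\xi} = \begin{bmatrix} \xi_1 & \xi_2 & \cdots & \xi_N \end{bmatrix}^T \tag{2.8}$$
$$\boldsymbol{\phi} = \begin{bmatrix} \varphi_1 & \varphi_2 & \cdots & \varphi_M \end{bmatrix}^T \tag{2.9}$$
$$\mathbf{Q}_{in} = \begin{bmatrix} Q_1^{in} & Q_2^{in} & \cdots & Q_N^{in} \end{bmatrix}^T \tag{2.10}$$
$$\mathbf{Q}_{out} = \begin{bmatrix} Q_1^{out} & Q_2^{out} & \cdots & Q_N^{out} \end{bmatrix}^T \tag{2.11}$$
$$\mathbf{K} = \big[\nu_{i,j}\big] \tag{2.12}$$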
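the dynamics of the component quantities can be written in matrix form (with $\boldsymbol{\xi}_{in}$ the vector of feeding concentrations $\xi_i^{in}$):
$$\frac{d\big(V(t)\,\boldsymbol{\xi}(t)\big)}{dt} = V(t)\,\mathbf{K}\boldsymbol{\phi}(\boldsymbol{\xi}, t) + F_{in}(t)\,\boldsymbol{\xi}_{in}(t) - F_{out}(t)\,\boldsymbol{\xi}(t) + \mathbf{Q}_{in}(t) - \mathbf{Q}_{out}(t) \tag{2.13}$$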
while the volume variation of the culture broth is given by
$$\frac{dV(t)}{dt} = F_{in}(t) - F_{out}(t) \tag{2.14}$$
If we combine equations (2.13) and (2.14) and expand the derivative of $V(t)\,\boldsymbol{\xi}(t)$, we obtain
$$\frac{d\boldsymbol{\xi}(t)}{dt} = \mathbf{K}\boldsymbol{\phi}(\boldsymbol{\xi}, t) - \frac{F_{in}(t)}{V(t)}\,\boldsymbol{\xi}(t) + \frac{F_{in}(t)}{V(t)}\,\boldsymbol{\xi}_{in}(t) + \frac{\mathbf{Q}_{in}(t)}{V(t)} - \frac{\mathbf{Q}_{out}(t)}{V(t)} \tag{2.15}$$
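or
$$\frac{d\boldsymbol{\xi}(t)}{dt} = \mathbf{K}\boldsymbol{\phi}(\boldsymbol{\xi}, t) + D(t)\big(\boldsymbol{\xi}_{in}(t) - \boldsymbol{\xi}(t)\big) + \mathbf{q}_{in}(t) - \mathbf{q}_{out}(t) \tag{2.16}$$
where
$$D(t) = \frac{F_{in}(t)}{V(t)} \quad \text{is the dilution rate;} \tag{2.17}$$
$$\mathbf{q}_{in}(t) = \frac{\mathbf{Q}_{in}(t)}{V(t)} \quad \text{the gas-liquid transfer rate;} \tag{2.18}$$
$$\mathbf{q}_{out}(t) = \frac{\mathbf{Q}_{out}(t)}{V(t)} \quad \text{the liquid-gas transfer rate.} \tag{2.19}$$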
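The step from (2.13) and (2.14) to (2.15) and (2.16) can be written out as follows (time arguments dropped for brevity). The effluent term $F_{out}\boldsymbol{\xi}$ cancels, which is why only the influent flow rate appears in the dilution rate (2.17).

```latex
\begin{align*}
\frac{d(V\boldsymbol{\xi})}{dt}
  &= V\frac{d\boldsymbol{\xi}}{dt} + \boldsymbol{\xi}\frac{dV}{dt}
   = V\frac{d\boldsymbol{\xi}}{dt} + (F_{in} - F_{out})\,\boldsymbol{\xi}
   && \text{product rule and (2.14)} \\
V\frac{d\boldsymbol{\xi}}{dt}
  &= V\mathbf{K}\boldsymbol{\phi}(\boldsymbol{\xi},t) + F_{in}\boldsymbol{\xi}_{in} - F_{out}\boldsymbol{\xi}
     + \mathbf{Q}_{in} - \mathbf{Q}_{out} - (F_{in} - F_{out})\,\boldsymbol{\xi}
   && \text{substituting (2.13)} \\
  &= V\mathbf{K}\boldsymbol{\phi}(\boldsymbol{\xi},t) + F_{in}(\boldsymbol{\xi}_{in} - \boldsymbol{\xi})
     + \mathbf{Q}_{in} - \mathbf{Q}_{out}
   && \text{the } F_{out}\boldsymbol{\xi} \text{ terms cancel} \\
\frac{d\boldsymbol{\xi}}{dt}
  &= \mathbf{K}\boldsymbol{\phi}(\boldsymbol{\xi},t) + D(\boldsymbol{\xi}_{in} - \boldsymbol{\xi})
     + \mathbf{q}_{in} - \mathbf{q}_{out}
   && \text{dividing by } V \text{, with (2.17)-(2.19)}
\end{align*}
```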
If we define a vector of external (liquid and gaseous) feed rates
$$\mathbf{F}(t) = D(t)\,\boldsymbol{\xi}_{in}(t) + \mathbf{q}_{in}(t) \tag{2.20}$$
the system of mass balances (2.16) can be written as follows:
$$\frac{d\boldsymbol{\xi}(t)}{dt} = \mathbf{K}\boldsymbol{\phi}(\boldsymbol{\xi}, t) - D(t)\,\boldsymbol{\xi}(t) + \mathbf{F}(t) - \mathbf{Q}(t) \tag{2.21}$$
where
$$\mathbf{Q}(t) = \mathbf{q}_{out}(t) \tag{2.22}$$
The right-hand side of this general dynamic model (2.21) involves two main sets of terms: the reaction term $\mathbf{K}\boldsymbol{\phi}(\boldsymbol{\xi}, t)$ and the transport term $-D(t)\,\boldsymbol{\xi}(t) + \mathbf{F}(t) - \mathbf{Q}(t)$. Let us describe these model elements.
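A minimal Python sketch of the right-hand side of (2.21), written as a function that an ODE solver could call at each time step. The single-reaction example (2 S → X with a Monod rate), the helper name `mass_balance_rhs` and all numerical values are hypothetical; gas exchanges are set to zero.

```python
from typing import Callable, List

Vector = List[float]


def mass_balance_rhs(xi: Vector, t: float, K: List[Vector],
                     phi: Callable[[Vector, float], Vector],
                     D: float, F: Vector, Q: Vector) -> Vector:
    """Right-hand side of eq. (2.21): d(xi)/dt = K phi(xi, t) - D xi + F - Q."""
    rates = phi(xi, t)
    return [sum(K[i][k] * rates[k] for k in range(len(rates))) - D * xi[i] + F[i] - Q[i]
            for i in range(len(xi))]


# Hypothetical example: components [X, S], one reaction 2 S -> X
# with a Monod rate phi = 0.5 * S / (1 + S) * X (illustrative parameter values).
K = [[1.0], [-2.0]]


def phi(xi: Vector, t: float) -> Vector:
    X, S = xi
    return [0.5 * S / (1.0 + S) * X]


D = 0.1                      # dilution rate (1/h)
xi_in = [0.0, 10.0]          # the feed contains substrate only
F = [D * c for c in xi_in]   # eq. (2.20) without gas inputs (q_in = 0)
Q = [0.0, 0.0]               # eq. (2.22): no gaseous outflow
print(mass_balance_rhs([1.0, 1.0], 0.0, K, phi, D, F, Q))  # approximately [0.15, 0.4]
```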
2.2.3.1.1 Transport phenomena
The transport phenomena set of terms is divided into three parts: the dilution term, the external feed rates and the gaseous outflow rates.
Dilution term
The dilution rate $D$ equals $F_{in}/V$, where $F_{in}$ is the flow rate of the incoming liquid phase. Its value depends on the culture strategy.
For a batch process, no liquid substrate is added during the culture: all the substrates are present from the beginning of the culture in sufficient amounts to ensure cell growth:
$$F_{in} = F_{out} = 0 \tag{2.23}$$
$$D = 0 \tag{2.24}$$
For a fed-batch process, one or several substrates are introduced during the cell culture whereas nothing is removed:
$$F_{out} = 0 \tag{2.25}$$
$$D = \frac{F_{in}}{V} \tag{2.26}$$
Finally, in continuous mode, the feed rate equals the effluent rate, so that the culture volume remains constant:
$$F_{in} = F_{out} \tag{2.27}$$
$$D = \frac{F_{in}}{V} \tag{2.28}$$
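The three culture modes differ only in the constraints they impose on F_in and F_out. The hypothetical helper below encodes (2.23)-(2.28) together with the volume balance (2.14), and a short fed-batch loop shows the volume increasing while D = F_in/V decreases. Flow and volume values are illustrative.

```python
from typing import Tuple


def dilution_and_volume_rate(mode: str, F_in: float, F_out: float, V: float) -> Tuple[float, float]:
    """Return (D, dV/dt) for the culture strategies of eqs. (2.23)-(2.28),
    with D = F_in / V and dV/dt = F_in - F_out from eq. (2.14)."""
    if mode == "batch":
        if F_in != 0.0 or F_out != 0.0:
            raise ValueError("batch mode requires F_in = F_out = 0")    # (2.23)
        return 0.0, 0.0                                                  # (2.24)
    if mode == "fed-batch":
        if F_out != 0.0:
            raise ValueError("fed-batch mode requires F_out = 0")        # (2.25)
        return F_in / V, F_in                                            # (2.26)
    if mode == "continuous":
        if F_in != F_out:
            raise ValueError("continuous mode requires F_in = F_out")    # (2.27)
        return F_in / V, 0.0                                             # (2.28)
    raise ValueError(f"unknown culture mode: {mode}")


# Hypothetical fed-batch: initial volume 1 L, constant feed 0.05 L/h, Euler steps over 10 h
V, dt = 1.0, 0.1
for _ in range(100):
    D, dV = dilution_and_volume_rate("fed-batch", 0.05, 0.0, V)
    V += dt * dV
print(round(V, 6), round(D, 4))  # the volume has grown to 1.5 L and D has decreased
```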
External feed rates and gaseous outflow rates
The external feed rates $\mathbf{F}(t)$ correspond to $D(t)\,\boldsymbol{\xi}_{in}(t) + \mathbf{q}_{in}(t)$. In this expression, the first term is related to the introduction of feeding solution in fed-batch or continuous cultures, the concentration of the added medium being given by $\boldsymbol{\xi}_{in}$. The second term represents the gas inputs.
Like the gas outputs, the gas inputs involve gas-liquid transfers. These transport terms are difficult to describe and very sensitive to process conditions such as mixing, culture medium composition, air flow, etc. They are usually described as follows:
$$q(t) = k_L a\,\big(\xi^* - \xi(t)\big) \tag{2.29}$$
where $k_L$ is the liquid phase mass transfer coefficient, $a$ is the gas-liquid interfacial area per unit volume of liquid and $\xi^*$ is the saturation concentration
in the liquid medium.
In this simple equation (2.29), $k_L a$ and $\xi^*$ carry the strong dependence on the culture conditions. Indeed, $\xi^*$ may vary with medium composition, temperature or pressure, while $k_L a$ strongly depends on agitation, air flow, the characteristic size of the reactor (impeller geometry), salinity or the presence of antifoam agents. Values of $\xi^*$ are available in the literature for a few kinds of solutions (van Stroe-Bienen et al., 1993; Eya et al., 1994; Mishima et al., 1996; Mishima et al., 1997), while models have been developed to reproduce the behaviour of $k_L a$ (Vanlaethem et al., 2001; Maier et al., 2004).
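Equation (2.29) is straightforward to evaluate. The sketch below also integrates a simple re-aeration in which the transfer term is the only one acting, so that the concentration relaxes exponentially towards ξ*. The dissolved-oxygen-like numbers (k_La = 20 h⁻¹, ξ* = 7.5 mg/L) and the function name are hypothetical illustrations, not measured values.

```python
def gas_liquid_transfer_rate(kLa: float, xi_sat: float, xi: float) -> float:
    """Eq. (2.29): q = kLa * (xi_sat - xi). kLa in 1/h; the saturation
    concentration xi_sat and xi in any consistent unit (e.g. mg/L)."""
    return kLa * (xi_sat - xi)


# Hypothetical values: kLa = 20 1/h, saturation 7.5 mg/L, current level 3.0 mg/L
print(gas_liquid_transfer_rate(20.0, 7.5, 3.0))  # 90.0 (mg/L/h)

# Re-aeration without consumption, d(xi)/dt = q, with explicit Euler steps over 1 h
xi, dt = 3.0, 0.001
for _ in range(1000):
    xi += dt * gas_liquid_transfer_rate(20.0, 7.5, xi)
# xi is now within 1e-6 of the saturation value 7.5
```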
2.2.3.1.2 Yield coefficient matrix
The main challenge in first principles modelling lies in the reaction term $\mathbf{K}\boldsymbol{\phi}(\boldsymbol{\xi}, t)$, that is, in the selection of an appropriate pseudo-stoichiometry and of a good kinetic model structure.
The choice of the reaction scheme is an important prerequisite, which usually
rests on physical knowledge of the bioprocess. All the information
available about the system must be adequately summarized in order to represent
the main phenomena in a sufficiently detailed manner to reproduce the culture
behaviour, but at the same time with enough simplicity to respect the
characteristics of a macroscopic reaction scheme. In this connection, a few
methods have been developed to deduce macroscopic reaction schemes from
complex networks of intracellular reaction pathways by means of model reduction
procedures (Provost and Bastin, 2004; Haag et al., 2005). Unfortunately, when
new strains are considered in industry, development times are often so short
that little knowledge about metabolic pathways is available, and the application of
the above-mentioned methods is then not possible. To solve this problem, a
second category of methods has been developed, which tries to directly determine a
macroscopic reaction scheme linking the substrates to the products of the
reactions. For instance, Hulhoven et al. (2005) propose a systematic procedure
to generate a group of reaction schemes, given a set of components whose measurements are available. This method relies on the decoupled identification method proposed by Bastin and Dochain (1990), which allows the pseudo-stoichiometric coefficients to be estimated independently of the kinetics (C-identifiability property, Chen and Bastin, 1996). This procedure is particularly useful when no knowledge is available on the relations between the macroscopic species to be considered. It is not the only systematic procedure in the field of bioprocess modelling: Bernard and Bastin (2005) have also developed a method to determine the number of reactions that must be taken into account to reproduce the available data set, the most plausible reaction schemes and the corresponding values of the pseudo-stoichiometric coefficients.
2.2.3.1.3 Kinetics
Once a pseudo-stoichiometry is determined, a kinetic model structure has to be chosen. The reaction rates vary with time and are usually influenced by many physicochemical and biological environmental factors, such as substrate, biomass and product concentrations, pH, temperature, dissolved oxygen concentration or various microbial growth inhibitors. Hence, the reaction rates are commonly expressed as a product of individual terms, each referring to one of the influencing factors:
$$\varphi(\boldsymbol{\xi}, t) = \mu(t)\,X(t) = \mu_X(X)\,\mu_S(S)\,\mu_P(P)\,\mu_{O_2}(O_2)\,\mu_{pH}(pH)\,\mu_T(T)\,\mu_I(I)\,X(t) \tag{2.30}$$
where $\mu(t)$ is the specific growth rate, $X$, $S$, $P$ and $O_2$ are the concentrations of biomass, substrates, products and dissolved oxygen, and $T$ and $I$ are the temperature and the inhibitor concentration.
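The multiplicative structure (2.30) can be sketched as a product of user-supplied factor functions, one per influencing variable. The chosen factors (a Monod term for S and an inhibition-type term for P), their parameter values and the helper name `multiplicative_rate` are hypothetical.

```python
from typing import Callable, Dict


def multiplicative_rate(mu_factors: Dict[str, Callable[[float], float]],
                        conditions: Dict[str, float], X: float) -> float:
    """Eq. (2.30): phi = mu_X(X) mu_S(S) ... mu_I(I) X, one factor per influencing
    variable; variables without a factor are assumed not to affect the rate."""
    mu = 1.0
    for name, factor in mu_factors.items():
        mu *= factor(conditions[name])
    return mu * X


# Hypothetical factors: Monod limitation by S and product inhibition by P
mu_factors = {
    "S": lambda S: 0.4 * S / (0.5 + S),  # mu_S, carries the maximal rate 0.4 1/h
    "P": lambda P: 2.0 / (2.0 + P),      # mu_P, equal to 1 when P = 0
}
conditions = {"S": 0.5, "P": 2.0, "T": 37.0}  # T has no factor here, so it is ignored
print(multiplicative_rate(mu_factors, conditions, X=3.0))  # approximately 0.3
```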
The most widespread kinetic models used for the different terms of equation
(2.30) are Monod-type laws, which are rational expressions similar to the
Monod law given in Table 2.1 (Birol et al., 1998; Cruz et al., 1999; Tsuneda
et al., 2002; Ghaly et al., 2005). These kinetic structures are able to reproduce different behaviours, such as component limitation ($\lim_{\xi_i \to 0} \mu(\xi_i) = 0$), saturation ($\lim_{\xi_i \to \infty} \mu(\xi_i) = \text{constant}$) and inhibition ($\lim_{\xi_i \to \infty} \mu(\xi_i) = 0$), as well as influences of environmental factors (Bastin and Dochain, 1990). However, several of these laws can reproduce the same behaviour. For instance, the Monod and Ming laws (Table 2.1) can both model limitation by a substrate, while the Haldane law and the law of Jerusaliwski and Engambervediev model inhibition by a component.
Moreover, other kinetic structures can represent the same behaviours, such as the Tessier law, which models limitation by a component, or the law of Aiba et al., which models inhibition. This profusion of similar laws makes the choice of an appropriate kinetic model structure arduous. There is no systematic rule to determine the best model. Moreover, the identification of their biological parameters can be time-consuming: these structures are nonlinear and, in most cases, not linearizable. Hence, there is no systematic method to determine a unique first estimate of the different parameters, which would make the identification easier.
Law: structure
Monod: $\mu(S) = \mu^* \dfrac{S}{K_M + S}$
Tessier: $\mu(S) = \mu^* \big(1 - e^{-S(t)/K_M}\big)$
Ming: $\mu(S) = \mu^* \dfrac{S^2}{(K_M + S)^2}$
Aiba et al.: $\mu(P) = \mu^* e^{-K_i P(t)}$
Haldane: $\mu(S) = \mu^* \dfrac{S(t)}{K_M + S(t) + S(t)^2/K_I}$
Jerusaliwski and Engambervediev: $\mu(P) = \mu^* \dfrac{K_I}{K_I + P}$
Table 2.1: Examples of specific growth rates
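The laws of Table 2.1 translate directly into Python. The sketch below, with illustrative parameter values, checks numerically the limitation, saturation and inhibition limits quoted above. It also relies on a property that follows from setting the derivative of the Haldane law to zero: its maximum lies at S = √(K_M K_I). The function names are hypothetical.

```python
import math

# Specific growth rate laws of Table 2.1 (mu_star, K_M, K_I, K_i > 0)


def monod(S, mu_star, K_M):
    return mu_star * S / (K_M + S)


def tessier(S, mu_star, K_M):
    return mu_star * (1.0 - math.exp(-S / K_M))


def ming(S, mu_star, K_M):
    return mu_star * S ** 2 / (K_M + S) ** 2


def haldane(S, mu_star, K_M, K_I):
    # Maximum at S = sqrt(K_M * K_I), where K_M - S**2 / K_I = 0
    return mu_star * S / (K_M + S + S ** 2 / K_I)


def aiba(P, mu_star, K_i):
    return mu_star * math.exp(-K_i * P)


def jerusaliwski(P, mu_star, K_I):
    # Law of Jerusaliwski and Engambervediev, as named in the text
    return mu_star * K_I / (K_I + P)


# The limits quoted in the text, checked for illustrative parameters
print(monod(0.0, 0.5, 2.0))          # 0.0: limitation when S -> 0
print(monod(1e9, 0.5, 2.0))          # close to 0.5: saturation at mu_star
print(haldane(1e9, 0.5, 2.0, 10.0))  # close to 0: inhibition at high S
```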
In order to improve kinetic model structures, Bastin and Dochain (1990) have introduced a unifying approach for modelling $\boldsymbol{\phi}(\boldsymbol{\xi}, t)$. It relies on the fact that a reaction rate is necessarily zero whenever the concentration of one of its reactants is zero. This is represented as follows:
$$\varphi_j(\boldsymbol{\xi}, t) = \alpha_j(\boldsymbol{\xi}, t) \prod_{n \sim j} \xi_n \tag{2.31}$$
where $\alpha_j$ is the specific reaction rate and the product is taken over the reactants $\xi_n$ of reaction $j$.
However, this structure again requires the selection of an appropriate analytical description of $\alpha_j$ among the fifty different growth rate structures proposed in Appendix 1 of Bastin and Dochain (1990).
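A small sketch of the structure (2.31): the reactant concentrations are factored out, so the rate is zero as soon as one of them vanishes, and the choice of α_j determines the law. In the hypothetical growth example below, X (treated as an autocatalyst entering the product) and S are the species of the reaction; choosing α = μ*/(K_M + S) recovers the Monod rate μ* S X/(K_M + S). Names and values are illustrative.

```python
import math
from typing import Callable, List


def rate_231(xi: List[float], t: float, species: List[int],
             alpha: Callable[[List[float], float], float]) -> float:
    """Eq. (2.31): phi_j = alpha_j(xi, t) times the product of xi_n over the indices
    in `species` (the reactants of reaction j): the rate vanishes if any of them is zero."""
    return alpha(xi, t) * math.prod(xi[n] for n in species)


# Hypothetical growth reaction on xi = [X, S] with mu_star = 0.4 and K_M = 0.5
def alpha_monod(xi: List[float], t: float) -> float:
    return 0.4 / (0.5 + xi[1])


print(rate_231([2.0, 0.5], 0.0, [0, 1], alpha_monod))  # 0.4
print(rate_231([2.0, 0.0], 0.0, [0, 1], alpha_monod))  # 0.0: no substrate, no reaction
```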
In order to avoid the arbitrary choice of a particular structure, Bogaerts (1999) generalizes the expression (2.31) by adding an activation coefficient and using negative exponential functions to model inhibition:
$$\varphi_j(\boldsymbol{\xi}, t) = \alpha_j \prod_{h \in \mathcal{R}^*_j} \xi_h(t)^{\gamma_{hj}} \prod_{l \in \mathcal{I}} e^{-\beta_{lj}\,\xi_l(t)}, \qquad j \in [1, M] \tag{2.32}$$
where
• $\alpha_j > 0$ is a kinetic constant describing any dependence of the reaction rate (e.g. on
temperature) other than the one on the concentrations of the
components;
• $\gamma_{hj} > 0$ is the activation coefficient associated with the component $h$ in reaction $j$;
• $\beta_{lj} \geq 0$ is the inhibition coefficient of the component $l$ in reaction $j$;
• $\mathcal{I}$ is the set of all the component indices, while $\mathcal{R}^*_j$ is the set of reactant, catalyst and auto-catalyst components.
Such a structure makes it possible to describe activation and/or inhibition phenomena for any component of the culture (Bogaerts et al., 1999). Moreover, it avoids the problem of time-consuming optimization (Bogaerts and Hanus, 2000). Indeed, this structure is linearizable with respect to its parameters, and it is therefore possible to determine unique initial estimates of the different physical parameters.
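The linearizability of (2.32) comes from taking logarithms: ln φ_j = ln α_j + Σ γ_hj ln ξ_h − Σ β_lj ξ_l, which is linear in (ln α_j, γ_hj, β_lj). The sketch below exploits this with ordinary least squares for a hypothetical single reaction involving one component S (so that R*_j = I = {S}). It illustrates the principle only and is not claimed to be the identification procedure of Bogaerts and Hanus (2000); it requires strictly positive rates and uses synthetic, noise-free data.

```python
import math
from typing import List


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_log_linear(S: List[float], phi: List[float]) -> List[float]:
    """Least-squares (alpha, gamma, beta) for phi = alpha * S**gamma * exp(-beta * S), using
    ln(phi) = ln(alpha) + gamma ln(S) - beta S, linear in (ln alpha, gamma, beta)."""
    rows = [[1.0, math.log(s), -s] for s in S]
    y = [math.log(p) for p in phi]
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    ln_alpha, gamma, beta = solve(AtA, Aty)  # normal equations
    return [math.exp(ln_alpha), gamma, beta]


# Synthetic noise-free rates with hypothetical alpha = 0.8, gamma = 1.2, beta = 0.3
S = [0.5, 1.0, 2.0, 4.0, 6.0, 8.0]
phi = [0.8 * s ** 1.2 * math.exp(-0.3 * s) for s in S]
print(fit_log_linear(S, phi))  # approximately [0.8, 1.2, 0.3]
```

With noisy data, these log-domain estimates would serve as the unique initial guess for a subsequent nonlinear fit rather than as final values.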
However, in order to model particular behaviours of micro-organisms,
specific kinetic structures have been developed. For instance, Zinn et al. (2004)
propose to represent dual nutrient limited growth thanks to basic models
(Monod-type laws). In the same way, Ribes et al. (2004) develop a model
reproducing the cessation of bacteria growth below a certain substrate threshold
concentration. As for Sonnleitner and Käppeli (1986), they model yeast cultures
by considering the limited respiratory capacity of Saccharomyces cerevisiae
(section 4.4.2.1). Regarding diauxic behaviours (observed when two substrates
are available for growth), cybernetic models are developed (Kompala et al.,
1986; Narang et al., 1997; Jones et al., 1999; Kompala,, 1999; Di Serio et al.,
2001a; Di Serio et al., 2001b; Altintas et al., 2002). This kind of models relies
on the assumption that cells optimize the use of the available substrates to
maximize their growth rate at every moment. This principle allows to reproduce
the cell preference for one substrate which is consumed to completion before a
possible growth on the second substrate.
Now that the first principles model has been described with its pseudo-stoichiometric and kinetic parts, we consider black box modelling in the following subsection.
2.2.3.2 Black box models
A black box model considers the system to be modelled in terms of its input and output characteristics. Such a model is based only on experimental data;
it does not rely on any prior knowledge and therefore has no physical interpretation. This kind of model is often preferred when bioprocess knowledge is missing.
Black box models may be static or dynamic depending on their time dependence. If we define two vectors which contain the input ($\mathbf{u}$) and output ($\mathbf{y}$) observations up to time $t$:
$$\mathbf{u}^t = \begin{bmatrix} \mathbf{u}(1) & \mathbf{u}(2) & \cdots & \mathbf{u}(t) \end{bmatrix} \tag{2.33}$$
$$\mathbf{y}^t = \begin{bmatrix} \mathbf{y}(1) & \mathbf{y}(2) & \cdots & \mathbf{y}(t) \end{bmatrix} \tag{2.34}$$
a general dynamic black box model can be expressed as follows:
$$\mathbf{y}(t) = \mathbf{g}\big(\mathbf{u}^{t-1}, \mathbf{y}^{t-1}, \boldsymbol{\theta}\big) + \boldsymbol{\varepsilon}(t) \tag{2.35}$$
where
• $\mathbf{y}(t)$ is a vector which contains the outputs at time $t$;
• $\mathbf{u}(t)$ the inputs at time $t$;
• $\mathbf{g}(\mathbf{u}^{t-1}, \mathbf{y}^{t-1}, \boldsymbol{\theta})$ is the model structure to be chosen in order to reproduce the
behaviour of the system;
• $\boldsymbol{\theta}$ is the vector of parameters to be determined;
• and finally, $\boldsymbol{\varepsilon}$ is the noise on the output measurements.
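To make (2.35) concrete, the sketch below builds regression vectors from finite windows of past inputs and outputs (a common truncation of u^{t−1} and y^{t−1}, assumed here) and fits the simplest possible mapping g, a linear one with one lag of each signal, by least squares. In a neural black box, g would be a nonlinear network instead; the helper names and the synthetic data are hypothetical.

```python
import random
from typing import List, Tuple


def regressors(u: List[float], y: List[float], n_y: int, n_u: int) -> Tuple[List[List[float]], List[float]]:
    """Regression vectors [y(t-1), ..., y(t-n_y), u(t-1), ..., u(t-n_u)] and targets y(t)."""
    start = max(n_y, n_u)
    X = [[y[t - i] for i in range(1, n_y + 1)] + [u[t - i] for i in range(1, n_u + 1)]
         for t in range(start, len(y))]
    return X, y[start:]


def fit_arx11(u: List[float], y: List[float]) -> Tuple[float, float]:
    """Least-squares (a, b) for the linear choice g = a y(t-1) + b u(t-1) in eq. (2.35)."""
    X, target = regressors(u, y, 1, 1)
    s11 = sum(r[0] * r[0] for r in X)
    s12 = sum(r[0] * r[1] for r in X)
    s22 = sum(r[1] * r[1] for r in X)
    c1 = sum(r[0] * z for r, z in zip(X, target))
    c2 = sum(r[1] * z for r, z in zip(X, target))
    det = s11 * s22 - s12 * s12  # 2x2 normal equations solved by Cramer's rule
    return (c1 * s22 - s12 * c2) / det, (s11 * c2 - s12 * c1) / det


# Hypothetical noise-free data from y(t) = 0.8 y(t-1) + 0.5 u(t-1) with a random input
rng = random.Random(1)
u = [rng.uniform(-1.0, 1.0) for _ in range(200)]
y = [0.0]
for t in range(1, 200):
    y.append(0.8 * y[t - 1] + 0.5 * u[t - 1])
print(fit_arx11(u, y))  # approximately (0.8, 0.5)
```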
The selection of the nonlinear mapping $\mathbf{g}(\mathbf{u}^{t-1}, \mathbf{y}^{t-1}, \boldsymbol{\theta})$ is very difficult.
Numerous combinations of basis functions exist, including wavelets, B-splines (Ljung, 2001) or fuzzy models (Campello et al., 2003). However, according to Sjöberg et al. (1995), Suykens and VandeWalle (1998) or Amrit and Saha (2005), the most common black boxes are Artificial Neural Networks (ANN). Such models are simple combinations of basis functions represented in a particular manner. They are described in terms of simple processing elements that are connected into a network, as in figure 2.2. Because of their structural similarities with highly interconnected biological neural networks, a physiological vocabulary has been developed to describe these structures.
Hence, the following subsection proposes some neural definitions.
Figure 2.2: Example of an Artificial Neural Network
2.2.3.2.1 Basic neural network architectures
As explained previously, an Artificial Neural Network is composed of simple processing elements that are connected into a network. These processing elements are called neurons, nodes or units, and transform their inputs according
to a defined function $h$ which provides a single argument to a scalar-valued function $f$, called the activation function (figure 2.3). $h$ and $f$ may have different forms. The most common functions $h$ are the Euclidean norm between the neuron inputs $\mathbf{u}$ and some parameters $\mathbf{C}$ to be determined
$$h(\mathbf{u}, \mathbf{C}) = \lVert \mathbf{u} - \mathbf{C} \rVert \tag{2.36}$$
and a linear combination of the inputs
$$h(\mathbf{u}, \mathbf{V}) = \mathbf{V}\mathbf{u} + \mathbf{B} \tag{2.37}$$
where $\mathbf{V}$ are called weights and $\mathbf{B}$ biases.
Figure 2.3: A neuron
Regarding the activation function $f$, Haykin (1994), Suykens et al. (1996) and Norgaard et al. (2000) propose different versions; linear, sigmoid, piece-wise constant and hyperbolic tangent functions are frequently encountered (figure 2.4).
Figure 2.4: Examples of activation functions, each plotted for x from -4 to 4: piece-wise constant (f = sgn(x)), hyperbolic tangent (f = tanh(x)), linear (f = x) and sigmoid (f = (1 + e^{-x})^{-1}) functions
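The building blocks of a neuron, namely the combination functions (2.36) and (2.37) and the activation functions of figure 2.4, can be sketched as follows. The 0/1 form of the piece-wise constant function and the tanh-based evaluation of the sigmoid (mathematically equal to (1 + e^{−x})^{−1} but immune to overflow for large negative x) are implementation choices, not taken from the text; the function names are hypothetical.

```python
import math
from typing import Callable, List


def h_distance(u: List[float], C: List[float]) -> float:
    """Eq. (2.36): Euclidean norm ||u - C|| between the inputs and the parameters C."""
    return math.sqrt(sum((ui - ci) ** 2 for ui, ci in zip(u, C)))


def h_linear(u: List[float], V: List[float], B: float) -> float:
    """Eq. (2.37) for a single neuron: weighted sum of the inputs plus a bias."""
    return sum(vi * ui for vi, ui in zip(V, u)) + B


# Activation functions of figure 2.4 (the 0/1 threshold form is an assumption)
ACTIVATIONS = {
    "piece-wise constant": lambda x: 1.0 if x >= 0.0 else 0.0,
    "tanh": math.tanh,
    "linear": lambda x: x,
    "sigmoid": lambda x: 0.5 * (1.0 + math.tanh(0.5 * x)),  # equals 1 / (1 + exp(-x))
}


def neuron(u: List[float], h: Callable[[List[float]], float], f: Callable[[float], float]) -> float:
    """Neuron output z = f(h(u)), as in figure 2.3."""
    return f(h(u))


z = neuron([1.0, 2.0], lambda u: h_linear(u, [0.5, -0.25], 0.0), ACTIVATIONS["sigmoid"])
print(z)  # 0.5, since the weighted sum 0.5 * 1 - 0.25 * 2 is zero
```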
In a network, the units can be combined in numerous ways, but they are always organized in layers. Hence, a network presents three kinds of layers: an input layer, which does not compute anything but provides the signals to the second layer, one or more hidden layers and, finally, an output layer.
According to the interconnections between the neurons of the different layers, two classes of structures can be distinguished: feedforward and recurrent neural networks. In a feedforward network (as in figure 2.2), the neuron inputs are outputs of previous layers. Recurrent networks, in contrast, contain feedback elements that make it possible to model dynamic systems. This is why Gadkar et al. (2005) adopt such a model for bioprocess control. However, the use of recurrent networks is limited by the degree of complexity they introduce (Montague and Morris, 1994; Lennox et al., 2001). Hence, numerous researchers prefer the simplest form of recurrent structure: the globally recurrent network, which consists of a feedforward neural network whose outputs are fed back to the input layer (e.g. Karim et al., 1997; de Assis and Filho, 2000; or, for an economic system, Swarup and Simi, 2006).
Since feedforward networks can be used to develop both static and dynamic models, we now describe the most widespread feedforward structures: the MultiLayer Perceptron and the Radial Basis Function Network. Numerous other networks exist, however; let us mention Probabilistic (Specht, 1990; Simon and Karim, 2001), Wavelet (Oussar et al., 1998) or Generalized Regression Neural Networks (Specht, 1991; Kulkarni et al., 2004).
2.2.3.2.2 MultiLayer Perceptron
The MultiLayer Perceptron or MLP is the most widespread neural network.
It presents one or more hidden layers of computation nodes and is characterized
by an input signal propagated in a forward direction, on a layer-by-layer basis
(Haykin, 1999).
The use of such a model implies difficult choices: which activation functions should be used in each neuron? How many hidden layers, and how many nodes, should the network include? Cybenko (1989) partly answers these questions. He shows that any continuous function (on a compact domain) can be approximated, to any desired accuracy, by a network with one hidden layer of sigmoid (or hyperbolic tangent) units and a layer of linear output units.
This ability to approximate any nonlinear function explains why most authors adopt this structure, called a sigmoid neural network.
The mathematical expression of an MLP with one sigmoid hidden layer and a linear output layer is the following:
$$ y_i(t) = \sum_{j=1}^{n_{hl}} w_{ij}\, f\!\left( \sum_{r=1}^{l} v_{jr}\, u_r(t) + \beta_j \right) + b_i, \qquad i = 1, \dots, m \qquad (2.38) $$
or, in matrix form,
$$ \mathbf{y}(t) = \mathbf{W}\, f\big(\mathbf{V}\mathbf{u}(t) + \boldsymbol{\beta}\big) + \mathbf{B} \qquad (2.39) $$
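where
• $n_{hl}$ is the number of hidden neurons;
• $\mathbf{u} \in \mathbb{R}^{l}$ is the input vector;
• $\mathbf{y} \in \mathbb{R}^{m}$ is the output vector;
• $\mathbf{W} \in \mathbb{R}^{m \times n_{hl}}$ contains the weights $w_{ij}$ of the output layer;
• $\mathbf{B} \in \mathbb{R}^{m}$ contains the biases $b_i$ of the output layer;
• $\mathbf{V} \in \mathbb{R}^{n_{hl} \times l}$ contains the weights $v_{jr}$ of the hidden layer;
• $\boldsymbol{\beta} \in \mathbb{R}^{n_{hl}}$ contains the biases $\beta_j$ of the hidden layer;
• and $f(q) = \dfrac{1}{1 + e^{-q}}$ is the sigmoid activation function, applied element-wise in (2.39).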
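As a check on (2.38) and (2.39), the following NumPy sketch computes the output of a one-hidden-layer sigmoid MLP both with explicit sums and in matrix form. The dimensions and random weights are arbitrary assumptions chosen only for illustration; the function names are hypothetical.

```python
import numpy as np


def sigmoid(q):
    # f(q) = 1 / (1 + exp(-q)), applied element-wise
    return 1.0 / (1.0 + np.exp(-q))


def mlp_matrix(u, V, beta, W, B):
    """Eq. (2.39): y = W f(V u + beta) + B."""
    return W @ sigmoid(V @ u + beta) + B


def mlp_loops(u, V, beta, W, B):
    """Eq. (2.38), with explicit sums over hidden units j and inputs r."""
    m, n_hl = W.shape
    l = u.shape[0]
    y = np.zeros(m)
    for i in range(m):
        for j in range(n_hl):
            activation = sum(V[j, r] * u[r] for r in range(l)) + beta[j]
            y[i] += W[i, j] * sigmoid(activation)
        y[i] += B[i]
    return y


rng = np.random.default_rng(0)
l, n_hl, m = 3, 5, 2  # input, hidden and output dimensions (arbitrary)
V = rng.normal(size=(n_hl, l))
beta = rng.normal(size=n_hl)
W = rng.normal(size=(m, n_hl))
B = rng.normal(size=m)
u = rng.normal(size=l)
print(np.allclose(mlp_matrix(u, V, beta, W, B), mlp_loops(u, V, beta, W, B)))  # True
```

The matrix form is the one used in practice, since it evaluates all hidden neurons at once.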
The number of hidden units depends on the modelled system, as we will see in Chapter 3.
2.2.3.2.3 Radial Basis Function Network
The most common alternative to the MLP is the Radial Basis Function network, or RBF network. This kind of ANN is also a universal approximator, able to approximate any continuous nonlinear function arbitrarily accurately. Like the sigmoid neural network, it has three layers, with one hidden layer of radial basis activation functions and one linear output layer. The mathematical expression of such a network is the following:
$$ y_i(t) = \sum_{j=1}^{n_{hl}} w_{ij}\, f\big(\lVert \mathbf{u}(t) - \mathbf{C}_j \rVert\big) + b_i, \qquad i = 1, \dots, m \qquad (2.40) $$
where
• $w_{ij}$ are the weights of the output layer;
• $b_i$ are the biases of the output layer;
• $\mathbf{C}_j \in \mathbb{R}^{l}$ is the centre of the $j$th activation function;
• and $f\big(\lVert \mathbf{u}(t) - \mathbf{C}_j \rVert\big)$ is the $j$th radial basis function, which is usually a Gaussian
$$ f\big(\lVert \mathbf{u}(t) - \mathbf{C}_j \rVert\big) = \exp\!\left( -\frac{\lVert \mathbf{u}(t) - \mathbf{C}_j \rVert^{2}}{r_j^{2}} \right) \qquad (2.41) $$
where $\mathbf{r} \in \mathbb{R}^{n_{hl}}$ contains the widths $r_j$ of the Gaussians centred at $\mathbf{C}_j$.
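A minimal NumPy sketch of the RBF network (2.40) with the Gaussian basis functions (2.41) follows. It assumes that the centres are stored as the rows of an n_hl × l array and uses the Gaussian exactly as written in (2.41); the function name is hypothetical, and other parameterizations of the width are also found in practice.

```python
import numpy as np


def rbf_network(u, centres, widths, w, b):
    """Eqs. (2.40)-(2.41): y_i = sum_j w_ij exp(-||u - C_j||^2 / r_j^2) + b_i.

    u:       (l,) input vector
    centres: (n_hl, l) array whose row j is the centre C_j
    widths:  (n_hl,) array of widths r_j
    w:       (m, n_hl) output weights; b: (m,) output biases
    """
    dist = np.linalg.norm(u[None, :] - centres, axis=1)  # ||u - C_j|| for every j
    phi = np.exp(-(dist / widths) ** 2)                  # Gaussian activations
    return w @ phi + b


centres = np.array([[0.0, 0.0], [10.0, 10.0]])
widths = np.array([1.0, 1.0])
w = np.array([[2.0, 3.0]])
b = np.array([0.5])
y = rbf_network(np.array([0.0, 0.0]), centres, widths, w, b)
print(y)  # about [2.5]: u sits on the first centre, far from the second
```

Each hidden unit only responds to inputs close to its centre, which is the main structural difference with the sigmoid units of the MLP.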
2.2.3.2.4 Artificial Neural Networks and bioprocess modelling
In spite of the variety of neural network proposals, ANNs are not often used to model dynamic bioprocesses. Indeed, when the available process knowledge is not exploited, a large number of parameters is needed to model the considered system. This leads to weak extrapolation properties and requires a large data set for process identification. Hence, numerous authors (e.g. Psichogios and Ungar, 1992; Zorzetto et al., 2000) prefer combining the ability of ANNs to approximate any nonlinear continuous function with the available prior knowledge.
This gave birth to the concept of the hybrid model.
However, feedforward neural networks are commonly used to develop static software sensors, in which they determine a priori unknown nonlinear relationships between on-line signals and off-line component concentrations.
They combine signals such as dissolved oxygen, carbon dioxide evolution rate or oxygen uptake rate in order to reproduce the behaviour of macroscopic species.
Examples of such sensors are described in Warnes et al. (1996), Linko et al. (1997), Linko et al. (1999) and James et al. (2002).
2.2.3.3 Hybrid models
First principles models of bioprocesses are very useful for building engineering tools in biotechnology. They rely on the available prior knowledge, such as the transport phenomena. Moreover, they provide a better understanding of the process behaviour through the determination of an appropriate reaction scheme and kinetic model.
However, the pseudo-stoichiometry and the kinetics are often difficult to select because they are usually unknown a priori. To avoid these complex choices, some authors prefer modelling a bioprocess with black box models that have no physical interpretation. Nevertheless, their numerous parameters lead to weak extrapolation properties and to the need for a large data set for process identification.
A compromise exists between these two extreme modelling techniques: the hybrid model, which combines a first principles model incorporating the available prior knowledge with a black box that serves as an estimator of unmeasured process parameters.
Thompson and Kramer (1994) distinguish two types of hybrid models, the serial and the parallel approaches. In the parallel structure (figure 2.5), a complete first principles model is connected in parallel with a neural network.
The first principles model provides an estimate of the output, while the neural network is trained to compensate for the remaining errors between the model and the observed process behaviour. In the serial structure (figure 2.6), an incomplete first principles model is used: the unknown or poorly known terms (i.e. the pseudo-stoichiometry and/or the kinetics) are represented by a black box, usually a neural network.
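Both structures can be sketched in Python for a simple, hypothetical batch culture (not one of the models studied in this work). In the serial hybrid model, the mass balances dX/dt = μX and dS/dt = −kμX form the first principles part, while the specific growth rate μ, assumed here to depend on S only, is delegated to a black box that can be any Python callable. The Monod-like function used below is only a placeholder for a trained neural network; the explicit Euler integration and all names are assumptions.

```python
import numpy as np


def serial_hybrid_batch(X0, S0, k, mu_blackbox, dt=0.01, t_end=10.0):
    """Serial hybrid model of a batch culture, integrated with explicit Euler.

    First principles part: mass balances dX/dt = mu X and dS/dt = -k mu X.
    Black box part: the specific growth rate mu = mu_blackbox(S), assumed
    to depend on S only (in practice, e.g., a trained neural network).
    """
    n_steps = int(round(t_end / dt))
    X, S = X0, S0
    history = [(0.0, X, S)]
    for n in range(1, n_steps + 1):
        mu = mu_blackbox(S)
        # Both updates use the values of X and S at the previous time step.
        X, S = X + dt * mu * X, max(S - dt * k * mu * X, 0.0)
        history.append((n * dt, X, S))
    return np.array(history)  # columns: time, X, S


def parallel_hybrid(x, first_principles, nn_correction):
    """Parallel hybrid model: first principles prediction plus a neural
    network correction trained on the residual model errors."""
    return first_principles(x) + nn_correction(x)


# Hypothetical placeholder for a trained network (Monod-like shape, not a fit).
mu_placeholder = lambda S: 0.5 * S / (1.0 + S)
traj = serial_hybrid_batch(X0=0.1, S0=5.0, k=2.0, mu_blackbox=mu_placeholder)
print(traj[-1])  # final time, biomass and substrate
```

In the serial structure the black box only replaces the kinetic term, so the mass balances still enforce, for instance, the conservation of X + S/k along the trajectory.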
Figure 2.5: Parallel hybrid model (first principles model and black box, input x, output y)
Figure 2.6: Serial hybrid model (first principles model and black box, input x, output y)
These two types of structures are the subject of numerous papers. For instance, Vande Wouwer et al. (2004) study parallel and serial approaches, while Psichogios and Ungar (1992) and James et al. (2002) compare several kinds of ANNs with the corresponding hybrid models. Other authors develop special identification methods (e.g. Chen et al., 2000; Hanomolo et al., 2000; Graefe et al., 1999), while still others use such structures to model kinetics (Schubbert et al., 1994a), or to monitor (Zorzetto and Wilson, 1996) or control (Chen et al., 2004a) their processes.
Now that the main bioprocess models have been described, we turn to parameter estimation techniques in the following section.
2.3 Parameter estimation
The selection of an appropriate model structure is essential for modelling engineering processes. Nevertheless, the model parameters within the structure are fundamental ingredients and are therefore no less important. Once the structure is selected, the unknown parameters have to be determined. This is usually done by optimizing a criterion that evaluates the agreement between the model output and experimental data.
The following sections introduce the concept of identifiability and the different criteria that are commonly used for model identification. Afterwards, we propose a short overview of numerical optimization methods before presenting the special case of artificial neural networks.
2.3.1 Identifiability
A critical point in model identification is the identifiability of the selected model structure. Identifiability is a necessary condition for successful parameter estimation, since it guarantees the uniqueness of the identified parameters for a given model output. This property should be studied as independently as possible of the values taken by the parameters. In fact, this study should take place before parameter estimation, in order to detect potential structural problems.
In order to characterize the structural identifiability of a model, Walter and Pronzato (1997) consider an idealized framework where
• the process and the model have the same structure $M(\boldsymbol{\theta})$ and are characterized by their parameters $\boldsymbol{\theta}^*$ and $\hat{\boldsymbol{\theta}}$ respectively;
• the data are not corrupted by noise;
• the input of the process and the measurement times can be chosen at will.
Under these conditions, it is always possible (e.g. by choosing $\hat{\boldsymbol{\theta}} = \boldsymbol{\theta}^*$) to tune the parameters of the model so as to make its input-output behaviour identical to that of the process, for any time and any input. Hence, a parameter $\theta_i$ is structurally globally identifiable if, for almost any $\boldsymbol{\theta}^*$,
$$ M(\hat{\boldsymbol{\theta}}) = M(\boldsymbol{\theta}^*) \;\Rightarrow\; \hat{\theta}_i = \theta_i^* \qquad (2.42) $$
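A small numerical illustration of definition (2.42) is given below, assuming a hypothetical model y(t) = ab(1 − e^{−ct}). Since a and b only appear through their product, two different parameter vectors produce identical outputs, so a and b are not structurally globally identifiable, whereas the product ab and c are.

```python
import numpy as np


def model_output(theta, t):
    """Hypothetical model y(t) = a * b * (1 - exp(-c t)), theta = (a, b, c)."""
    a, b, c = theta
    return a * b * (1.0 - np.exp(-c * t))


t = np.linspace(0.0, 10.0, 50)
theta_star = (2.0, 3.0, 0.5)  # parameters of the "process"
theta_hat = (3.0, 2.0, 0.5)   # different a and b, but the same product a * b
print(np.allclose(model_output(theta_star, t), model_output(theta_hat, t)))  # True
```

Here M(θ̂) = M(θ*) holds although θ̂₁ ≠ θ*₁, so the implication (2.42) fails for a; reparameterizing the model with p = ab removes the ambiguity.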
There exist various methods to test the structural identifiability of a model (Walter and Pronzato, 1997). However, for a nonlinear system of equations, for which an analytical solution does not exist or is hard to obtain, structural identifiability is often set aside in favour of practical identifiability. Practical identifiability tests whether an appropriate model can be obtained from the available signals. After model identification, the information contained in the data is checked by computing the Fisher information matrix. This matrix quantifies the sensitivity of the model with respect to the parameters on the basis of the available experimental field (see section 2.3.2).
This check is, however, only local, since it analyzes the correlation between parameters for given experiments (Haag, 2003).
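The following sketch shows how the Fisher information matrix can reveal an identifiability problem in practice. It assumes white output noise with covariance Q = σ²I only, so that the matrix reduces to SᵀQ⁻¹S, where S collects the sensitivities ∂y(t_k)/∂θ, here approximated by central finite differences. For the hypothetical model y(t) = ab(1 − e^{−ct}), the sensitivity columns of a and b are proportional, so the matrix is numerically singular, whereas the reparameterized model y(t) = p(1 − e^{−ct}) yields a well-conditioned matrix. All names and numerical values are illustrative assumptions.

```python
import numpy as np


def model_abc(theta, t):
    """Hypothetical model y(t) = a * b * (1 - exp(-c t)), theta = (a, b, c)."""
    a, b, c = theta
    return a * b * (1.0 - np.exp(-c * t))


def model_pc(theta, t):
    """Same model reparameterized with p = a * b, theta = (p, c)."""
    p, c = theta
    return p * (1.0 - np.exp(-c * t))


def sensitivities(model, theta, t, eps=1e-6):
    """Central finite-difference sensitivities S[k, i] = dy(t_k)/dtheta_i."""
    theta = np.asarray(theta, dtype=float)
    S = np.empty((t.size, theta.size))
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps * max(1.0, abs(theta[i]))
        S[:, i] = (model(theta + step, t) - model(theta - step, t)) / (2.0 * step[i])
    return S


def fisher_information(model, theta, t, sigma):
    """FIM = S^T Q^{-1} S for white output noise with covariance Q = sigma^2 I."""
    S = sensitivities(model, theta, t)
    return S.T @ S / sigma**2


def eigenvalue_ratio(F):
    """Smallest over largest eigenvalue: close to zero means (near) singular."""
    eig = np.linalg.eigvalsh(F)  # ascending order for a symmetric matrix
    return abs(eig[0]) / eig[-1]


t = np.linspace(0.5, 10.0, 20)
print(eigenvalue_ratio(fisher_information(model_abc, (2.0, 3.0, 0.5), t, 0.1)))  # ~0
print(eigenvalue_ratio(fisher_information(model_pc, (6.0, 0.5), t, 0.1)))
```

As noted above, this diagnosis is local: it is evaluated at one parameter value and for one set of sampling times.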
2.3.2 Optimization criteria
The aim of optimization is the determination of a set of variables (parameters) that leads to the best model. This parameter selection usually relies on the minimization of a scalar cost function J (θ ) with respect to the model parameters θ . Such a cost function corresponds to the distance between the experimental measurements and the values produced by the model. Hence, the optimal value of θ depends of course on the chosen distance type, which should therefore always be specified.
In this section, we recall the general principles of the three main approaches associated with the three most common cost functions: the maximum likelihood, the Markov and the least squares estimators. We will see that moving from one criterion to the next requires less a priori information but relies on increasingly restrictive assumptions (Bogaerts, 1999).
2.3.2.1 Maximum Likelihood Criterion
The maximum likelihood estimator is based on the search for the most likely errors given the considered model and the available experimental data.
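Consider a set of measurements $\mathbf{y}_{mk} \in \mathbb{R}^{n_y}$, $\boldsymbol{\varphi}_{mk} \in \mathbb{R}^{n_\varphi}$ ($k \in [1, N]$) of the signals $\mathbf{y}_k$ and $\boldsymbol{\varphi}_k$, which are linked by a nonlinear model depending on the parameters $\boldsymbol{\theta} \in \mathbb{R}^{n_\theta}$:
$$ \mathbf{y}_k = \mathbf{g}_k(\boldsymbol{\varphi}_k, \boldsymbol{\theta}) \qquad (2.43) $$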
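Let us consider that $\mathbf{y}_{mk}$ and $\boldsymbol{\varphi}_{mk}$ are measurements corrupted by independent random errors $\boldsymbol{\varepsilon}_{yk}$ and $\boldsymbol{\varepsilon}_{\varphi k}$:
$$ \mathbf{y}_{mk} = \mathbf{y}_k + \boldsymbol{\varepsilon}_{yk} \qquad (2.44) $$
$$ \boldsymbol{\varphi}_{mk} = \boldsymbol{\varphi}_k + \boldsymbol{\varepsilon}_{\varphi k} \qquad (2.45) $$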
where $\boldsymbol{\varepsilon}_{yk}$ and $\boldsymbol{\varepsilon}_{\varphi k}$ are assumed to be white and Gaussian, with zero mean and known time-varying covariances $\mathbf{Q}_{y,k}$ and $\mathbf{Q}_{\varphi,k}$.
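A minimal illustration of the maximum likelihood principle under these assumptions is sketched below for a hypothetical scalar model y_k = θφ_k with constant (rather than time-varying) variances σ_y² and σ_φ². Maximizing the joint Gaussian likelihood of the errors subject to the model amounts to minimizing ½Σ_k[(y_mk − θφ_k)²/σ_y² + (φ_mk − φ_k)²/σ_φ²] over θ and the unknown true φ_k; eliminating each φ_k analytically leaves the cost ½Σ_k(y_mk − θφ_mk)²/(σ_y² + θ²σ_φ²), minimized here by a simple grid search. The data are simulated for illustration only, and all names are hypothetical.

```python
import numpy as np


def ml_cost(theta, y_m, phi_m, var_y, var_phi):
    """Negative log-likelihood (up to constants) for the hypothetical model
    y_k = theta * phi_k, after eliminating the unknown true phi_k analytically."""
    residual = y_m - theta * phi_m
    return 0.5 * np.sum(residual**2 / (var_y + theta**2 * var_phi))


def ml_estimate(y_m, phi_m, var_y, var_phi, grid):
    """Grid search for the theta that maximizes the likelihood."""
    costs = [ml_cost(th, y_m, phi_m, var_y, var_phi) for th in grid]
    return grid[int(np.argmin(costs))]


# Simulated data (illustration only): true theta = 2, sigma_y = 0.2, sigma_phi = 0.1.
rng = np.random.default_rng(1)
phi_true = np.linspace(1.0, 10.0, 200)
y_m = 2.0 * phi_true + rng.normal(scale=0.2, size=phi_true.size)
phi_m = phi_true + rng.normal(scale=0.1, size=phi_true.size)
theta_hat = ml_estimate(y_m, phi_m, 0.2**2, 0.1**2, np.linspace(0.0, 4.0, 4001))
print(theta_hat)  # close to 2
```

Because errors on both y and φ are accounted for, the denominator depends on θ; neglecting the errors on φ would reduce this cost to an ordinary weighted least squares criterion.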
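Since the Gaussian distributions of the measurement errors are given by
$$ p(\boldsymbol{\varepsilon}_{yk}) = \frac{1}{\sqrt{(2\pi)^{n_y} \det(\mathbf{Q}_{y,k})}} \exp\!\left( -\frac{1}{2}\, \boldsymbol{\varepsilon}_{yk}^{T} \mathbf{Q}_{y,k}^{-1} \boldsymbol{\varepsilon}_{yk} \right) \qquad (2.46) $$
$$ p(\boldsymbol{\varepsilon}_{\varphi k}) = \frac{1}{\sqrt{(2\pi)^{n_\varphi} \det(\mathbf{Q}_{\varphi,k})}} \exp\!\left( -\frac{1}{2}\, \boldsymbol{\varepsilon}_{\varphi k}^{T} \mathbf{Q}_{\varphi,k}^{-1} \boldsymbol{\varepsilon}_{\varphi k} \right) \qquad (2.47) $$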
the joint probability associated with the set of N measurements can be written as follows, given the independence and whiteness assumptions on the measurement errors:
$$ p\big[\boldsymbol{\varepsilon}_{yk}, \boldsymbol{\varepsilon}_{\varphi k};\ k \in [1, N]\big] = \prod_{k=1}^{N} p(\boldsymbol{\varepsilon}_{yk})\, p(\boldsymbol{\varepsilon}_{\varphi k}) = \prod_{k=1}^{N} \frac{1}{(2\pi)^{(n_y + n_\varphi)/2} \sqrt{\det(\mathbf{Q}_{y,k}) \det(\mathbf{Q}_{\varphi,k})}} \exp\!\left( -\frac{1}{2} \left[ \boldsymbol{\varepsilon}_{yk}^{T} \mathbf{Q}_{y,k}^{-1} \boldsymbol{\varepsilon}_{yk} + \boldsymbol{\varepsilon}_{\varphi k}^{T} \mathbf{Q}_{\varphi,k}^{-1} \boldsymbol{\varepsilon}_{\varphi k} \right] \right) \qquad (2.48) $$
The most likely errors maximize the latter probability while satisfying the model hypothesis (2.43). The estimator is then expressed as follows
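$$ \left\{ \hat{\boldsymbol{\varepsilon}}_{yk}, \hat{\boldsymbol{\varepsilon}}_{\varphi k} \right\} = \operatorname*{ArgMax}_{\boldsymbol{\varepsilon}_{yk},\, \boldsymbol{\varepsilon}_{\varphi k}} \prod_{k=1}^{N} \frac{1}{(2\pi)^{(n_y + n_\varphi)/2} \sqrt{\det(\mathbf{Q}_{y,k}) \det(\mathbf{Q}_{\varphi,k})}} \exp\!\left( -\frac{1}{2} \left[ \boldsymbol{\varepsilon}_{yk}^{T} \mathbf{Q}_{y,k}^{-1} \boldsymbol{\varepsilon}_{yk} + \boldsymbol{\varepsilon}_{\varphi k}^{T} \mathbf{Q}_{\varphi,k}^{-1} \boldsymbol{\varepsilon}_{\varphi k} \right] \right) $$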