Claudio A. Gelmi*^{2} · Ricardo Pérez-Correa^{2}
Processes in biotechnology usually represent productions in relatively small quantities and are therefore mostly done in batches. These systems possess an inherently dynamic character making it difficult for scale-up and optimization. Therefore, accurate dynamic models are necessary to predict the behaviour of these processes, which is why models have gained great importance in recent years. However it is difficult to uniquely identify the parameters and state variables, due to the sheer number used. The usual procedure to identify a certain model relies on the quest to identify a set of values for the unknown parameters, such that the integration of the model will represent the existing experimental data as precisely as possible. This procedure, referred to as regression, involves minimizing the differences between experimental data and the model predicted values-which is essentially an optimization problem. Most of the problems in bioprocess engineering applications are highly constrained; they exhibit nonlinear dynamics and sometimes have a non-smooth nature. These properties often result in non-convex and multimodal problems, and therefore are not well predicted by gradient based local methods. In order to provide an approximation of the global optimum, one must use global optimization (GO) methods (Banga et al. 2003). These methods are based on robust solvers that can locate the vicinity of the global solution in a reasonable number of iterations and handle noise and/or discontinuities. Stochastic algorithms, a particular classification of GO methods, treat the objective function as a black box; that is, as a simple relationship between inputs and outputs without further information about relationships among the decision variables (no derivative information). Metaheuristics are particularly useful stochastic global optimization methods which are often used in biotechnology. They consist of an iterative generation process that efficiently finds near-optimal solutions by intelligently combining different learning strategies to explore and exploit the search spaces (Jones et al. 2002). These methods are easy to implement, making them robust for a wide variety of problems. The metaheuristic applied in this work is a scatter search method (SS), which is a population-based method that has recently shown promising outcomes for solving combinatorial and nonlinear optimization problems (Egea et al. 2007). Nonetheless, a good model fitting doesn’t ensure that a suitable model can be found. Models with a great number of parameters generally fit the experimental data better, but they tend to estimate parameters with large confidence intervals. Therefore, the quality of the parameter estimation, in terms of accuracy, needs to be checked before reaching a meaningful interpretation of the results. The procedures required to test the quality of the parameter estimation are called post-regression diagnostics, which encompassvarious methods (Jaqaman and Danuser, 2006): Parameter sensitivity (how parameter variations affect response variables). Parameter significance (determine parameter confidence intervals). Parameter identifiability (detection of cross correlation between parameters). Model validity, also known as structural identifiability, implies that the model’s parameters can be uniquely identified. If the post-regression diagnostics are unsatisfactory, the model has to be modified and/or the experimental data has to be repeated or complemented with new experimental data. Furthermore, the diagnostic methods can give valuable recommendations for modifying experimental setups in order to increase the data’s information content. It has to be noted that there are several methods that systematically optimize experiments so that their information content is maximized. The different criteria and problems encountered are described in other publications (Chu and Hahn, 2008; Franceschini and Macchietto, 2008), and will not be discussed here. Finally, post-regression diagnostics are similar in function to pre-regression diagnostics, since the process of linking data to models is inherently cyclical and iterative (Figure 1). In this research, a published
dynamic model (Gelmi et al. 2002) that consists of eight coupled ordinary
differential equations and 14 unknown parameters was analyzed. The model
presented in Gelmi et al. 2002 describes the growth and production of a
secondary metabolite in solid substrate cultivation (SSC) (for more details see Kinetic model). The model was fitted to different experimental
conditions using a modern version of the SS algorithm, SSm (Rodriguez-Fernandez
et al. 2006; Egea et al. 2007). Parameter sensitivity, significance and identifiability were
applied in order to assess the model’s validity. As a consequence of the
post-regression analysis, a modified model with a reduced parameter set was
postulated and fitted to the experimental data using The lumped parameter model
(Gelmi et al. 2002) describes the changes in the biomass
of Oxygen transfer resistance is low. Available nitrogen is the limiting substrate. The carbon source is not limiting. Temperature (T) and water activity (aw) remain constant throughout the cultivation. The model parameters remain constant during the cultivation. The total amount of
measurable dry biomass ( Assuming a first order death rate, the active biomass is: Here, Experimental data (not shown in this article, see Gelmi et al. 2000) shows that the rate of urea consumption follows a zero order kinetics and is expressed as:
In Equation 4, 0.47
corresponds to the urea nitrogen content and The microorganism consumes starch for growth and maintenance, Since GA The differential equations
for CO The specific growth rate, Here, _{3} production rate, where The above model was
calibrated using experimental data of four cultures at different temperature
and a 1) Temperature = 25ºC, Water Activity = 0.992. 2) Temperature = 25ºC, Water Activity = 0.999. 3) Temperature = 31ºC, Water Activity = 0.985. 4) Temperature = 31ºC, Water Activity = 0.992. Further details regarding the experimental setup and the above model are available in Gelmi et al. (2000) and Gelmi et al. (2002).
The model parameters were estimated by minimizing the sum of squared residual errors between predicted and experimental data. Because not all variables were measured simultaneously and because variables have different scales, the sums of squared residuals were weighted by the squared number of measurements and by the squared mean experimental value. Thus, the following least square function was minimized: where Θ is the
parameter space, j of the
variable i, m is the number of measured variables, n is the number of measurements and is the mean of experimental
measurements for the variable _{i}i.The optimization procedure
was solved numerically using SSm, which is a routine developed in MATLAB
The regression diagnostic
tests were computed using the software package MATLAB Parameter sensitivity measures to what extent the model’s outputs are influenced by variations in parameter values. Absolute sensitivity is defined as: It is particularly useful for complex systems that involve a large number of variables and parameters, in which it is crucial to identify either the most important ones or the non-sensitive parameters (Yue et al. 2008). Non-sensitive parameters are parameters that don’t influence the state variables. In dynamic models, parameter sensitivity varies with time and thus parameters can be non-sensitive for certain time intervals and sensitive for others. Since parameters and response variables can differ extremely in scale, a normalized sensitivity is useful to facilitate comparison: The derivatives of Equation 13 were determined numerically by the finite difference method. A central difference approach was used with a 1% perturbation factor around the nominal parameter values. Parameter determinability or identifiability calculates the correlation matrix of the model parameters in order to determine if the fitted parameters are locally identifiable for a given time interval (Jacquez and Greif, 1985). The matrix elements represent the parameter cross correlations, defined as:
If two parameters are highly correlated, it is said that they are a priori unidentifiable. These parameters affect the measured variables in exactly the same way (or same opposite way). Parameter significance and
confidence intervals were calculated after the parameters were fitted to the
experimental data. In order to establish the statistical significance of
estimates, a Student where i^{th} parameter. The t-values
may be tested by reference to a t-distribution with (n-p)
degrees of freedom; higher t-values than those in the distribution tend
to indicate reliable estimates (t-value > 1.96 for large values of n-p).
Very low t-values suggest that confidence intervals or regions involving
that parameter may include zero and such a situation means that the parameter
could be statistically dropped from the model.The usual method to obtain the parameter variances relies on estimating them through the diagonal elements of the Fisher Information Matrix(FIM) (Landaw and DiStefano III, 1984; Petersen et al. 2001): In the case of biologic and chemical dynamic processes, the FIM can be obtained by (Chu and Hahn, 2008): The matrix product is
summed over all experimental times. In this equation, We numerically estimated the derivatives of the sensitivity matrix using a central finite difference: This task proved far from
trivial. Usually, one would make the Δ Notice that in order to apply Equation 18, all experimental variables need to have measurements at the same experimental times. This is not our case. However, the frequency of the measurements was very similar and many of them were sampled at the same times. This made it reasonable to interpolate the different measurements using a uniform time interval vector. The interpolated values and times are shown in Figure 2, Figure 3, Figure 4 and Figure 5. In this section we will present and discuss the main results of the different diagnostic methods used in this paper (see Pre/post-regression diagnostics), along with the modifications performed based on our analysis. Later, we will show how the fitted parameters compare with those reported by Araya et al. (2007) using Orthogonal Collocation on Finite Elements (OCFE). Finally, we contrast the performance of the global optimization technique used here, with standard techniques used in previous publications. The identifiability analysis showed that several pairs of the 14 model parameters were highly correlated. Primarily, parameter K_{ij} > 0.98) in all conditions. The implication of this is that
there are several combinations of µ_{max} and k values that lead to identical biomass levels. The usual procedure to deal with
this situation is to hold constant one of the correlated parameters. The
parameter to fix should be the one that is well known in the literature so that
a reliable value for the less known parameter is obtained. In our case we fixed _{N}µ_{max} as we found reliable values in Gelmi et al. (2000).Significant correlation
was also found in parameters associated with starch, CO m, _{S}Y and _{X/CO2}m, _{CO2}Y and _{X/O2}m. As before, the better known value should be fixed
in order to make a reliable estimation for the other. However, no reliable
values for those parameters exist in the literature, since to date few
investigations in SSC have been done in _{CO2}G. fujikuroi. Consequently,
other experiments have to be planned under each condition in order to identify
one of the parameters independently. In this study we fixed Y_{X/S}_{, }Yand _{X/CO2} Y arbitrarily to the
values obtained in Gelmi et al. (2002), as shown in Table 1. However, the
oxygen yields are similar to (even smaller than) those previously reported in
SSF-between 3.8 and 2.7 g/g- (Soccol et al. 1993; Rodríguez-León et al. 1999;
Brand et al. 2001; Brand et al. 2002). Furthermore, a recent publication by
Chavez-Parga et al. (2008) reported yields similar (Y_{X/O2}_{x/s} = 0.22-1.87
g/g and Y_{x/n} = 11-49.4 g/g) to the ones reported by Gelmi et al. (2002).
The confidence intervals shown in Figure 6 show that the estimated
biomass/nitrogen yields are reliable.
Our sensitivity analysis
showed that the degradation rate _{3} equation shows that regardless of the culture conditions evaluated, state
variables show no sensitivity on any state variable at any culture condition.
Therefore, in order to facilitate the optimization effort, we set k to zero. All other parameters showed to affect at least one of the state
variables of the model. The results are shown in Figure 7._{p}
First, calibrations with the full set of 14 parameters were performed. The regression was unstable, in the sense that very different estimates produced equally good results. This is due to the effects of unidentifiable parameters described in Identifiability analysis. As a direct consequence, we reduced the parameter space to 9 parameters. After the reduction in dimensionality, the calibration results were stable. Figure 2, Figure 3, Figure 4 and Figure 5 show our calibration curves in comparison with the curves resulting from OCFE. Under all culture conditions, good parameter fits were produced. In Table 6 the objective function values were compared to the ones found in Araya et al. (2007) and Gelmi et al. (2002).
In this section we present
the To perform a fair comparison, we calculated the standard
deviations for all three parameter sets using the same method described in Methods.
For all three parameter sets, First of all, it can be
seen that the significance generally increased by reducing the number of
parameters (Table 2, Table 3, Table 4 and Table 5, t-value (OCFE) versus t-value
(14 par.)). Therefore, in this case the parameters will have similar confidence
intervals.Significance problems only
occurred in the case of k can be estimated reliably by sampling more frequently in the zone where dGA_{i}_{3}/dk_{i} is most sensitive (i.e., t > 50 hrs, see Figure 1, column GA_{3},
parameter k)._{i}
The Scatter Search algorithm proved to be an effective and easy tool to handle a global minimization problem for a typical biologic dynamic model. Regression diagnostics are crucial tools to validate and modify the model, as they could eradicate important correlations between parameters and reduce the model’s uncertainty. Furthermore, the diagnostic methods could give us important recommendations regarding sampling times for further experiments; this will help to obtain unique parameter estimates and will increase the statistical significance of the model. ARAYA,
M.M.; ARRIETA, J.J.; PÉREZ-CORREA, J.R.; BIEGLER, L.T. and JORQUERA, H. (2007). Fast and reliable calibration of solid substrate fermentation
kinetic models using advanced non-linear programming techniques. BANGA,
J.; BALSA-CANTO, E.; MOLES, C.G. and ALONSO, A.A. (2003). Improving food
processing using modern optimization methods. BRAND,
D.; PANDEY, A.; RODRIGUEZ-LEON, J.A.; ROUSSOS, S.; BRAND, I. and SOCCOL, C.R.
(2001). Packed bed column fermenter and kinetic modeling for upgrading the
nutritional quality of coffee husk in solid-state fermentation. BRAND,
D.; PANDEY, A.; RODRIGUEZ-LEON, J.A.; ROUSSOS, S.; BRAND, I. and SOCCOL, C.R. (2002). Relationship
between coffee husk caffeine degradation and respiration of CHAVEZ-PARGA,
M.C.; GONZALEZ-ORTEGA, O.; NEGRETE-RODRÍGUEZ, M.L.; VALLARINO, I.G.;
GONZÁLEZ-ALATORRE, G. and ESCAMILLA-SILVA, E.M. (2008). Kinetic
of the gibberellic acid and bikaverin production in an airlift bioreactor. CHU,
Y. and HAHN, J.
(2008). Integrating parameter selection with experimental design under
uncertainty for nonlinear dynamic systems. EGEA,
J.; RODRÍGUEZ-FERNÁNDEZ, M.; BANGA, J. and MARTÍ, R. (2007). Scatter search for
chemical and bio-process optimization. FRANCESCHINI,
G. and
MACCHIETTO, S. (2008). Model-based design of experiments for parameter
precision: state of the art. GELMI,
C.; PÉREZ-CORREA, R.;
GONZÁLEZ, M. and
AGOSIN, E. (2000). Solid substrate cultivation of GELMI,
C.; PÉREZ-CORREA, R. and AGOSIN, E. (2002). Modelling JACQUEZ,
J.A. and GREIF,
P. (1985). Numerical parameter identifiability and estimability:
Integrating identifiability, estimability, and optimal sampling design. JAQAMAN,
K. and DANUSER,
G. (2006). Linking data to models: Data regression. JONES,
D.; MIRRAZAVI, S.K. and TAMIZ, M. (2002). Multi-objective meta-heuristics: an
overview of the current state-of-the-art. LANDAW, E.W. and DISTEFANO III, J.J.
(1984). Multiexponential, multicompartmental, and noncompartmental modeling.
II. Data analysis and statistical considerations. PETERSEN, B.;
GERNAEY, G. and
VANROLLEGHEM, P.A. (2001). Practical
identifiability of model parameters by combined respirometric-titrimetric
measurements. RODRIGUEZ-FERNANDEZ,
M.; EGEA, J. and
BANGA, J. (2006). Novel metaheuristic for
parameter estimation in nonlinear dynamic biological systems. RODRÍGUEZ-LEÓN,
J.A.; DOMENECH, F.; LEÓN, M.; MÉNDEZ, T.; RODRÍGUEZ, T.E. and PANDEY, A. (1999). Production of spores of SOCCOL, C.; LEON, J.R.; MARIN, B.; ROUSSOS, S. and RAIMBAULT, M. (1993). Growth kinetics
of YUE, H.;
BROWN, M.; HE, F.; JIA, J. and
KELL, D.B. (2008). Sensitivity analysis and robust experimental design
of a signal transduction pathway system. Note: Electronic Journal of Biotechnology is not responsible if on-line references cited on manuscripts are not available any more after the date of publication. |

Full Text - Improved calibration of a solid substrate fermentation model