
Statistical Modeling in Bioprocess Monitoring

Michael Melcher1,2, Theresa Scharl1, and Friedrich Leisch1,2


Austrian Centre of Industrial Biotechnology, Graz, Austria
Institute of Applied Statistics and Computing, University of Natural Resources and Life
Sciences, Vienna, Austria

A major goal in optimizing the production of biopharmaceuticals is the real-time monitoring, control
and fault detection of these bioprocesses. A limited understanding of these complex multi-step
processes and the inaccessibility of key process parameters, such as the biomass or
product concentration, pose a serious problem. Currently, success or failure of such a process is
determined by labour- and cost-intensive offline measurements (i.e. a sample is drawn and analysed
in a chemical laboratory), in many cases with a delay of several hours or even days. On the other
hand, a great variety of online sensor and analyser systems exists, providing physical and chemical
information on the system on a virtually continuous time grid. Machine learning techniques have
proven to be the missing link, translating these data into the previously mentioned quantities
of interest, e.g. the product quality and quantity [1].
In a recent study, a series of 25 bacterial E. coli fermentations performed within a full-factorial
experimental design served as the data basis. Offline measurements of up to 18 responses (biomass
and product concentrations, various supernatants and nucleotides) were conducted at hourly or
bihourly intervals. Routinely recorded process variables (pH, feed and base consumption, O2 and CO2
in exhaust gas etc.) as well as two-dimensional multi-wavelength fluorescence spectroscopic signals
served as predictors.
The applicability of four machine learning methods, namely random forests (RF), neural networks (NN),
partial least squares (PLS) and boosted models, was investigated. As major results we obtained [2]:
• Models solely based on routinely measured process variables (i.e. without additional costs)
give a satisfying prediction accuracy of about ± 4 % for the cell dry mass.
• Additional spectroscopic information allows for an estimation of the product (protein) concentration within ± 12 %.
• While all of the above modeling techniques provide feasible results, a combined approach of
random forests and neural networks came out on top: random forest as a variable selection
tool and neural networks as modeling technique.
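
The combined approach in the last point can be illustrated with a minimal sketch. It assumes Python with NumPy and scikit-learn, and uses synthetic stand-in data rather than the fermentation data. The selection criterion (impurity-based importances), the cut-off `n_keep`, the network size and the error measure are all illustrative assumptions, not the settings of the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the fermentation data): 300 samples,
# 40 candidate predictors, of which only the first 5 carry signal.
n_samples, n_features = 300, 40
coefs = np.array([3.0, -2.0, 1.5, 1.0, 2.5])
X = rng.normal(size=(n_samples, n_features))
y = X[:, :5] @ coefs + rng.normal(scale=0.1, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 1: random forest as a variable selection tool.
# Predictors are ranked by impurity-based importance on the training set only.
forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)
n_keep = 8  # hypothetical cut-off, chosen for illustration
selected = np.argsort(forest.feature_importances_)[::-1][:n_keep]

# Step 2: neural network as the modeling technique, fitted on the
# selected predictors only (standardised first, as MLPs are scale-sensitive).
net = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
)
net.fit(X_train[:, selected], y_train)
y_pred = net.predict(X_test[:, selected])

# One possible accuracy measure: mean absolute error relative to the
# response range. This is an assumption, not necessarily the measure
# behind the percentages quoted above.
rel_error = np.mean(np.abs(y_pred - y_test)) / (y_test.max() - y_test.min())
print("selected predictors:", sorted(selected.tolist()))
print(f"relative error: {100 * rel_error:.1f} %")
```

Ranking the predictors on the training split only keeps the held-out samples untouched by the selection step, so the reported error is not flattered by information leakage.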
[1] G. Striedner and K. Bayer: An Advanced Monitoring Platform for Rational Design of Recombinant Processes,
in C.-F. Mandenius and N. J. Titchener-Hooker: Measurement, Monitoring, Modelling and Control of Bioprocesses.
Springer, 2013, 65-84.
[2] M. Melcher, T. Scharl, B. Spangl, M. Luchner, M. Cserjan, K. Bayer, F. Leisch and G. Striedner: The potential of
random forest and neural networks for biomass and recombinant protein modeling in E. coli fed-batch fermentations.
Biotechnology Journal, DOI: 10.1002/biot.201400790.