Predictive Clustering Hybrid Regression (pCHR) Approach and Its Application to Sucrose-Based Biohydrogen Production

A predictive clustering hybrid regression (pCHR) approach was developed and evaluated using a dataset from an H2-producing, sucrose-based bioreactor operated for 15 months. The aim was to model and predict the H2 production rate from the available information on the envirome and metabolome of the bioprocess. Self-organizing maps (SOM) and a Sammon map were used to visualize the dataset and to identify the main metabolic patterns and clusters in the bioprocess data. Three metabolic clusters were detected: acetate coupled with other metabolites, butyrate only, and transition phases. The pCHR model combines the principles of k-means clustering, kNN classification, and regression. It performed well in both modeling and predicting the H2 production rate, with mean squared error values of 0.0014 and 0.0032, respectively.
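
The abstract names the three building blocks of pCHR but not how they are wired together. The following is only a minimal sketch of one plausible arrangement, not the authors' implementation: k-means partitions the training samples, a separate linear regression is fitted within each cluster, and a new sample is routed to a cluster by kNN voting before the corresponding regression is applied. The function names (kmeans, fit_pchr, predict_pchr), the farthest-point initialisation, the choice of linear least squares, and the toy data are all assumptions introduced for illustration; none of the numbers come from the bioreactor dataset.

```python
import numpy as np

def kmeans(X, k, n_iter=100):
    # Deterministic farthest-point initialisation (an assumption of this sketch).
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assign every sample to its nearest center, then recompute the centers.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def fit_pchr(X, y, k):
    # Cluster the inputs, then fit one least-squares model (with intercept) per cluster.
    labels, _ = kmeans(X, k)
    models = {}
    for j in np.unique(labels):
        mask = labels == j
        A = np.column_stack([X[mask], np.ones(mask.sum())])
        coef, *_ = np.linalg.lstsq(A, y[mask], rcond=None)
        models[int(j)] = coef
    return labels, models

def predict_pchr(X_train, labels, models, X_new, n_neighbors=3):
    # Route each new sample to a cluster by kNN majority vote, then apply that cluster's model.
    preds = []
    for x in X_new:
        nearest = np.argsort(np.linalg.norm(X_train - x, axis=1))[:n_neighbors]
        cluster = int(np.bincount(labels[nearest]).argmax())
        preds.append(float(np.append(x, 1.0) @ models[cluster]))
    return np.array(preds)

# Toy data (made up): two well-separated regimes, each with its own linear response.
grid = np.array([[i, j] for i in range(3) for j in range(3)], dtype=float)
X_a, X_b = grid, grid + 10.0
X_train = np.vstack([X_a, X_b])
y_train = np.concatenate([2.0 * X_a[:, 0] + 1.0, 30.0 - X_b[:, 0] - X_b[:, 1]])

labels, models = fit_pchr(X_train, y_train, k=2)
X_new = np.array([[1.5, 0.5], [10.5, 11.5]])
pred = predict_pchr(X_train, labels, models, X_new)
print(pred)  # approximately [4. 8.]

# Mean squared error on the training samples, the metric quoted in the abstract.
mse = float(np.mean((predict_pchr(X_train, labels, models, X_train) - y_train) ** 2))
print(mse)
```

Fitting a separate model per cluster lets each metabolic regime have its own input-output relation, while the kNN step decides which regime an unseen sample belongs to. In practice the number of clusters, the number of neighbours, and the form of the regression would all need to be chosen and validated against held-out data.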



