Oncogene Identification using Filter based Approaches between Various Cancer Types in Lung

Lung cancer is the leading cause of cancer-related death in both men and women. Identifying cancer-associated genes and their related pathways is essential for the prevention of many types of cancer. In this work, two filter approaches, information gain and the biomarker identifier (BMI), are used to identify genes associated with different types of small-cell and non-small-cell lung cancer. A new method for determining the BMI thresholds is proposed, which uses k-means clustering to prioritize genes as primary, secondary or tertiary. Sets of key genes were identified that occur in several pathways. The modified BMI proved well suited to microarray data and is therefore proposed as a powerful tool in the search for new, so far undiscovered cancer-related genes.
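
The two ideas named above, scoring genes with a filter and then grouping the scores into three priority tiers with k-means, can be illustrated with a minimal sketch. It is not the authors' implementation: the BMI score is not defined in this abstract, so the sketch only computes information gain for a single expression threshold, and the clustering step accepts any per-gene filter score. All function names, gene names and score values below are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Information gain of splitting one gene's expression values at a threshold."""
    low = [y for x, y in zip(values, labels) if x <= threshold]
    high = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    # Weighted entropy of the two partitions (empty partitions are skipped).
    remainder = sum(len(part) / n * entropy(part) for part in (low, high) if part)
    return entropy(labels) - remainder


def kmeans_1d(scores, k=3, iterations=100):
    """Plain Lloyd-style k-means on scalar scores; returns the k centroids."""
    ordered = sorted(scores)
    # Deterministic initialisation (an assumption): spread over the sorted scores.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for s in scores:
            nearest = min(range(k), key=lambda j: abs(s - centroids[j]))
            clusters[nearest].append(s)
        # Keep the old centroid if a cluster ends up empty.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def prioritize(gene_scores):
    """Map each gene to a tier according to its nearest score centroid."""
    names = ("primary", "secondary", "tertiary")
    ranked = sorted(kmeans_1d(list(gene_scores.values()), k=3), reverse=True)
    tiers = {}
    for gene, score in gene_scores.items():
        nearest = min(range(3), key=lambda j: abs(score - ranked[j]))
        tiers[gene] = names[nearest]
    return tiers


# Toy example: four samples of one gene, two cancer types, split at 5.0.
ig = information_gain([1.0, 2.0, 8.0, 9.0], ["A", "A", "B", "B"], threshold=5.0)

# Hypothetical per-gene filter scores (higher means more discriminative).
toy_scores = {"gene_a": 0.95, "gene_b": 0.90, "gene_c": 0.55,
              "gene_d": 0.50, "gene_e": 0.10, "gene_f": 0.05}
tiers = prioritize(toy_scores)
```

In this sketch the tier boundaries come from the score distribution itself rather than from fixed cut-offs, which is the motivation for a clustering-based threshold; the choice of three clusters mirrors the primary, secondary and tertiary groups.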




