A General Framework for Knowledge Discovery Using High Performance Machine Learning Algorithms

This paper proposes a general framework for storing, analyzing, and extracting knowledge from two-dimensional echocardiographic images, color Doppler images, non-medical images, and general data sets. A number of high-performance data mining algorithms are used to carry out this task. The framework comprises four layers: physical storage, object identification, knowledge discovery, and user level. The techniques employed include an active contour model to identify the cardiac chambers, pixel classification to segment color Doppler echo images, a universal model for image retrieval, a Bayesian method for classification, and parallel algorithms for image segmentation. Using the efficiently constructed feature vector database, one can perform data mining tasks such as clustering and classification with efficient algorithms, as well as image mining given a query image. All of these facilities are included in the framework, which is supported by a state-of-the-art user interface (UI). The algorithms were tested on actual patient data and the Corel image database, and the results show that they outperform previously reported results.
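As a rough illustration of how the knowledge discovery and user levels might consume a feature vector database, the following minimal Python sketch trains a Gaussian naive Bayes classifier on stored feature vectors and retrieves the stored images nearest to a query. It is not the implementation used in this work: the feature values, labels, image identifiers, and function names are all hypothetical, and Gaussian naive Bayes with Euclidean nearest-neighbor retrieval is assumed here only as a stand-in for the Bayesian classification and image retrieval methods named above.

```python
import math
from collections import defaultdict

# Hypothetical feature vector database: image id -> (feature vector, label).
# The two features and the labels are invented for illustration only.
FEATURE_DB = {
    "img01": ([0.20, 1.10], "normal"),
    "img02": ([0.25, 1.00], "normal"),
    "img03": ([0.30, 1.20], "normal"),
    "img04": ([0.80, 2.10], "abnormal"),
    "img05": ([0.90, 2.00], "abnormal"),
    "img06": ([0.85, 2.30], "abnormal"),
}


def train_gaussian_nb(db):
    """Estimate per-class priors, feature means, and feature variances."""
    by_class = defaultdict(list)
    for vector, label in db.values():
        by_class[label].append(vector)
    total = sum(len(vectors) for vectors in by_class.values())
    model = {}
    for label, vectors in by_class.items():
        n = len(vectors)
        dims = len(vectors[0])
        means = [sum(v[d] for v in vectors) / n for d in range(dims)]
        # A small floor on the variance avoids division by zero.
        variances = [
            max(sum((v[d] - means[d]) ** 2 for v in vectors) / n, 1e-6)
            for d in range(dims)
        ]
        model[label] = (n / total, means, variances)
    return model


def classify(model, query):
    """Return the class with the highest log posterior for the query vector."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for x, mu, var in zip(query, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def retrieve(db, query, k=2):
    """Return the ids of the k stored images closest to the query vector."""
    ranked = sorted(db, key=lambda image_id: math.dist(db[image_id][0], query))
    return ranked[:k]


model = train_gaussian_nb(FEATURE_DB)
query_vector = [0.82, 2.05]  # hypothetical features extracted from a query image
predicted = classify(model, query_vector)
nearest = retrieve(FEATURE_DB, query_vector)
print(predicted, nearest)
```

In this sketch the classifier and the retrieval step share the same stored vectors, which mirrors the idea that a single feature vector database can serve both classification and query-by-image tasks.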



