Performance Optimization of Data Mining Application Using Radial Basis Function Classifier

Text data mining is a process of exploratory data analysis. Classification maps data into predefined groups or classes. It is often referred to as supervised learning because the classes are determined before the data are examined. This paper describes a proposed radial basis function classifier and compares it with an existing radial basis function classifier by means of comparative cross-validation. The feasibility and benefits of the proposed approach are demonstrated on a data mining problem, direct marketing, which has become an important application field of data mining. Comparative cross-validation estimates accuracy by either stratified k-fold cross-validation or equivalent repeated random subsampling. The proposed method may have high bias, and its performance (accuracy estimation, in our case) may also be poor because of high variance. Accordingly, the accuracy obtained with the proposed radial basis function classifier was lower than with the existing radial basis function classifier. However, the proposed classifier shows a small improvement in runtime and a larger improvement in precision and recall. For the proposed method, both classification accuracy and prediction accuracy are determined, and the prediction accuracy is comparatively high.
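
The two accuracy-estimation schemes named above can be illustrated with a minimal sketch. It is not the implementation used in this paper: the function names, the round-robin stratification, the reading of "equivalent" as "same test fraction", and the majority-class stand-in classifier (used in place of a radial basis function classifier) are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Return k (train_idx, test_idx) pairs; each fold roughly keeps the class mix."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's examples round-robin so every fold gets a share.
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, test))
    return splits


def repeated_random_subsampling(n, test_fraction, repeats, seed=0):
    """Return `repeats` independent random (train_idx, test_idx) hold-out splits."""
    rng = random.Random(seed)
    n_test = max(1, int(round(n * test_fraction)))
    splits = []
    for _ in range(repeats):
        perm = list(range(n))
        rng.shuffle(perm)
        splits.append((sorted(perm[n_test:]), sorted(perm[:n_test])))
    return splits


def estimate_accuracy(fit_predict, X, y, splits):
    """Average test accuracy of `fit_predict` over the given splits."""
    scores = []
    for train, test in splits:
        preds = fit_predict([X[i] for i in train],
                            [y[i] for i in train],
                            [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def majority_class(train_X, train_y, test_X):
    """Hypothetical stand-in classifier: always predicts the most frequent training label."""
    most_common = max(set(train_y), key=train_y.count)
    return [most_common] * len(test_X)
```

Any classifier exposing the same `fit_predict(train_X, train_y, test_X)` shape could be passed to `estimate_accuracy` with either list of splits, for example `stratified_k_fold(y, 10)` or `repeated_random_subsampling(len(y), 0.1, 10)`, so that two classifiers are scored on identical partitions.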



