Data Preprocessing for Supervised Learning

Many factors affect the success of Machine Learning (ML) on a given task, and the representation and quality of the instance data are first and foremost. If the data contain much irrelevant or redundant information, or are noisy and unreliable, then knowledge discovery during the training phase becomes more difficult. It is well known that data preparation and filtering steps take up a considerable amount of the processing time in ML problems. Data pre-processing includes data cleaning, normalization, transformation, feature extraction and selection, among other steps; its product is the final training set. It would be convenient if a single sequence of pre-processing algorithms gave the best performance on every data set, but this is not the case. We therefore present the most well-known algorithms for each step of data pre-processing, so that practitioners can choose the combination that performs best on their own data set.
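As an illustrative sketch only (not taken from the text), the following Python snippet, assuming scikit-learn and NumPy are available, chains three of the steps named above (cleaning via imputation of missing values, normalization, and filter-based feature selection) into a single pipeline whose output is the final training set; the data set and the parameter choices are hypothetical.

# Minimal pre-processing sketch (assumes scikit-learn and NumPy are installed):
# imputation, normalization, and filter-based feature selection chained into one
# pipeline whose output plays the role of the final training set.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical training data: 100 instances, 20 features, ~5% missing values.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
X[rng.random(X.shape) < 0.05] = np.nan           # inject missing values
y = rng.integers(0, 2, size=100)                 # binary class labels

preprocess = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="mean")),  # data cleaning: fill missing values
    ("scale", StandardScaler()),                 # normalization: zero mean, unit variance
    ("select", SelectKBest(f_classif, k=10)),    # feature selection: keep the 10 best features
])

X_train = preprocess.fit_transform(X, y)         # the product of pre-processing
print(X_train.shape)                             # (100, 10)

In practice each stage (imputation strategy, scaling method, selection criterion and number of retained features) would be chosen per data set, which is precisely the point made above: no single fixed sequence is best for all problems.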



