Studies of Rule Induction by STRIM from the Decision Table with Contaminated Attribute Values from Missing Data and Noise — In the Case of Critical Dataset Size —

STRIM (Statistical Test Rule Induction Method) has been proposed as a method for effectively inducing if-then rules from a decision table, which is regarded as a sample set obtained from the population of interest. Its usefulness has been confirmed by simulation experiments in which the rules are specified in advance, and by comparison with conventional methods. However, further development is needed before STRIM can be applied to the analysis of real-world datasets. The first requirement is to determine the dataset size needed to induce the true rules, since finding statistically significant rules is the core of the method. The second is to examine how well rules can be induced from datasets whose attribute values are contaminated by missing data and noise, since real-world datasets usually contain such contaminated data. This paper examines the first problem theoretically, in connection with the rule length. The second problem is then examined in a simulation experiment that uses the critical dataset size derived in the first step. The experimental results show that STRIM is highly robust in the analysis of datasets with contaminated attribute values, and hence is applicable to real-world data.




