Frequent Itemset Mining Using Rough-Sets

Frequent pattern mining is the process of finding patterns (sets of
items, subsequences, substructures, etc.) that occur frequently in a
data set. It was first proposed in the context of frequent itemsets
and association rule mining, and it is used to find inherent
regularities in data, for example, which products are often purchased
together. Its applications include basket data analysis,
cross-marketing, catalog design, sales campaign analysis, Web log
(click stream) analysis, and DNA sequence analysis. However, one of
the bottlenecks of frequent itemset mining is that, as the data grow,
the time and resources required to mine them increase at an
exponential rate. In this investigation, a new algorithm is proposed
that can be used as a pre-processor for frequent itemset mining.
FASTER (FeAture SelecTion using Entropy and Rough sets) is a hybrid
pre-processing algorithm that uses entropy to carry out record
reduction and rough sets to carry out feature (attribute) selection.
Used as a pre-processor for frequent itemset mining, FASTER can
produce a speed-up of 3.1 times compared with the original algorithm
while maintaining an accuracy of 71%.
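
To make the notion of a frequent itemset concrete, the following is a minimal Apriori-style sketch in Python. It illustrates only the mining step that a pre-processor such as FASTER would feed, not FASTER itself, whose entropy and rough-set procedures are not specified in this abstract. The function name `frequent_itemsets`, the `min_support` parameter, and the sample baskets are assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset that appears
    in at least `min_support` transactions (hypothetical helper)."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every individual item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    result = {}
    k = 1
    while candidates:
        # Support count = number of transactions containing the candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        # Join frequent k-itemsets into (k+1)-item candidates and keep only
        # those whose k-item subsets are all frequent (Apriori pruning).
        k += 1
        candidates = {
            a | b
            for a in frequent
            for b in frequent
            if len(a | b) == k
            and all(frozenset(s) in frequent for s in combinations(a | b, k - 1))
        }
    return result


# Illustrative basket data (made up for this sketch).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
found = frequent_itemsets(baskets, min_support=2)
```

Every pass rescans all records, and the number of candidate itemsets can grow combinatorially with the number of attributes, which is why reducing records and attributes before mining can shorten the run.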




