An Efficient Approach to Mining Frequent Itemsets on Data Streams

The increasing importance of data stream arising in a wide range of advanced applications has led to the extensive study of mining frequent patterns. Mining data streams poses many new challenges amongst which are the one-scan nature, the unbounded memory requirement and the high arrival rate of data streams. In this paper, we propose a new approach for mining itemsets on data stream. Our approach SFIDS has been developed based on FIDS algorithm. The main attempts were to keep some advantages of the previous approach and resolve some of its drawbacks, and consequently to improve run time and memory consumption. Our approach has the following advantages: using a data structure similar to lattice for keeping frequent itemsets, separating regions from each other with deleting common nodes that results in a decrease in search space, memory consumption and run time; and Finally, considering CPU constraint, with increasing arrival rate of data that result in overloading system, SFIDS automatically detect this situation and discard some of unprocessing data. We guarantee that error of results is bounded to user pre-specified threshold, based on a probability technique. Final results show that SFIDS algorithm could attain about 50% run time improvement than FIDS approach.




References:
[1] C.Raissi, P. Poncelet, Teisseire, "Towards a new approach for mining
frequent itemsets on data stream", J Intell Inf Syst , 2007, vol. 28, pp.
23-36, 2007.
[2] J,.Xu Yu , Z.Chong , H. Lu, Z. Zhang , A. Zhou b," A false negative
approach to mining frequent itemsets from high speed transactional data
streams", Information Sciences 176, 1986-2015,2006.
[3] G. Giannella, J. Han, J. Pei, X. Yan, P.Yu," Mining frequent patterns in
data streams at multiple time granularities", In Next generation data
mining. New York: MIT, 2003.
[4] H. Li, F. Lee, S.Y., M. Shan," An efficient algorithm for mining frequent
itemsets over the entire history of data streams".,In Proceedings of the
1st InternationalWorkshop on Knowledge Discovery in Data
streams,2004.
[5] R. Jin, G. Agrawal," An Algorithm for In-Core Frequent Itemset Mining
on Streaming Data"
[6] P. Domingos , G. Hulten," Mining high-speed data streams",In
Proceedings of the ACM Conference on Knowledge and Data
Discovery (SIGKDD), 2000.
[7] A. Cheng, Y. Ke,W. Ng, "A survey on algorithms for mining frequent
itemsets over data streams", Knowl Inf Syst, 2007.
[8] M. Charikar, K. Chen, M. Farach," Finding frequent items in data
streams". Theor Comput Sci, vol. 312, pp.3-15, 2004.
[9] J.Cheng,Y. Ke,W. Ng," Maintaining frequent itemsets over high-speed
data streams", In Proceedings of the 10th Pacific-asia Conference on
knowledge discovery and data mining, Singapore, pp. 462-467, April
2006.
[10] R.Agrawal,T. Imielinski,A. Swami , "Mining association rules between
sets of items in large databases", In Proceedings of the ACM SIGMOD
international conference on management of data, Washington DC, pp
207-216,1993.
[11] H. Chernoff, A measure of asymptotic efficiency for tests of a
hypothesis based on the sum of observations, The Annals of
Mathematical Statistics 23 (4) 493-507, 1953.
[12] M. Charikar, K. Chen, M. Farach," Finding frequent items in data
streams", in Proceedings of the International Colloquium on Automata,
Languages and Programming (ICALP), pp. 693-703,2002.
[13] T. Calders , N. Dexters , B. Goethals ,"Mining Frequent Items in a
Stream Using Flexible Windows"
[14] X. Sun Maria, E. Orlowska , X. Li, "Finding Frequent Itemsets in High-
Speed Data Streams".
[15] X. Han Dong, W. Ng, K. Wong,V. Lee, "Discovering Frequent Sets
from Data Streams with CPU Constraint", This paper appeared at the
AusDM 2007, Gold Coast, Australia. Conferences in Research and
Practice in Information Technology (CRPIT), Vol.70
[16] H. Chernoff, "A measure of asymptotic efficiency for thests of a
hypothesis based on the sum of observations", in Annals of
Mathematical Statistics, pp. 493, 1952.
[17] R. Agrawal, R. Srikant, " Fast algorithms for mining association rules",
In Proc. Int. Conf. Very Large Data Bases (VLDB'94), 487.499, 1994
[18] J. Han, J. Pei, Y. Yin," Mining frequent patterns without candidate
generation", In Proc. 2000 ACM-SIGMOD Int. Conf. Management of
Data (SIGMOD'00), 1.12, 2000.
[19] J. Pei, J. Han, R. Mao," CLOSET: An efficient algorithm for mining
frequent closed itemsets", In Proc. ACM-SIGMOD Int.Workshop Data
Mining and Knowledge Discovery (DMKD'00), 11.20,2000.
[20] M. Zaki, C. Hsiao," CHARM: An efficient algorithm for closed itemset
mining", In Proc. SIAM Int. Conf. Data Mining, 457.473, 2002.
[21] R. Agrawal, R. Srikant," Mining sequential patterns", In Proc. Int. Conf.
Data Engineering (ICDE'95), 3.14, 1995.
[22] M. Kuramochi, G. Karypis, "Frequent subgraph discovery", In Proc.Int.
Conf. Data Mining (ICDM'01), 313.320, 2001.
[23] K. Beyer, R. Ramakrishnan," Bottom-up computation of sparse and
iceberg cubes", In Proc. ACM-SIGMOD Int. Conf.Management of Data
(SIGMOD'99), 359.370,1999.
[24] T. Imielinski, L. Khachiyan,A. Abdulghani,"Cubegrades: Generalizing
association rules", Data Mining and knowledge Discovery
6:219.258,2002.
[25] B. Liu, W. Hsu, Y. Ma," Integrating classification and association rule
mining", In Proc. Int. Conf. Knowledge Discovery and Data Mining
(KDD'98), 80.86,1998.
[26] H. Wang, J. Yang, W. Wang, P. Yu," Clustering by pattern similarity in
large data sets", In Proc. ACM-SIGMOD Int. Conf. on Management of
Data (SIGMOD'02), 418.427,2002.
[27] R. Karp, C. Papadimitriou, S. Shenker, "A simple algorithm for finding
frequent elements in streams and bags", ACM Trans. Database Systems
2003.
[28] Y. Chi, H. Wang, P. Yu, R. Muntz," Moment: Maintaining closed
frequent itemsets over a stream sliding window", In Proceedings of
International Conference on Data Missing Conference, pp. 59-66, 2004.
[29] C.Jin, W. Qian, C. Sha, J.Yu, A. Zhou," Dynamically maintaining
frequent items over a data stream", In Proceedings of International
Conference on Information and Knowledge Management Conference,
pp. 287-29, Washington, District of Columbia, 2004.
[30] H. Li, S. Lee, M. Shan, "An efficient algorithm for mining frequent
itemsets over the entire history of data streams", In Proceedings of the
1st International Workshop on Knowledge Discovery in Data streams,
2003.
[31] G. Manku,R. Motwani, " Approximate frequency counts over data
streams. In Proceedings of very Large Databases Conference, pp. 346-
357, Hong Kong, China, 2002.
[32] Y. Chen, G. Dong, J. Han, B. Wah, J. Wang, ," Multidimensional
regression analysis of time-series data streams" In VLDB
Conference,2002.
[33] W. Teng, M. Chen, P. Yu, "A regression-based temporal patterns mining
schema for data streams", In Proceedings of very large Databases
Conference, pp. 93-104, Berlin, 2003.
[34] X. Hong , W. Keong , K. Leong Ong ,v. C S Lee , "Discovering
Frequent Sets from Data Streams with CPU Constraint", Sixth
Australasian Data Mining Conference, Australia. Conferences in
Research and Practice in Information Technology (CRPIT), Vol. 70,
2007.