Application and Limitation of Parallel Modeling in Multidimensional Sequential Pattern

The goal of data mining algorithms is to discover useful information embedded in large databases. One of the most important data mining problems is the discovery of frequently occurring patterns in sequential data. In a multidimensional sequence, each event depends on more than one dimension. The resulting search space is very large, and serial algorithms do not scale to very large datasets. Addressing this requires scalable parallel implementations of sequence mining algorithms. In this paper, we present a model for multidimensional sequences and describe a parallel algorithm based on data parallelism. Simulation experiments show good load balancing and acceptable, scalable speedup across different numbers of processors and problem sizes, and demonstrate that our approach can work efficiently in a real parallel computing environment.
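As a rough illustration of what data parallelism means for this problem, the sketch below partitions a small sequence database, counts candidate-pattern support in each partition independently, and sums the local counts. It is not the algorithm from the paper: the event representation (a tuple of dimension values), the function names, the round-robin partitioning, and the toy data are all assumptions made for illustration. A thread pool is used only to show the structure; a real implementation would more likely use separate processes or message passing.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Assumption: a multidimensional event is a tuple of dimension values,
# e.g. (item, location), and a sequence is a list of such events.

def contains(sequence, pattern):
    """Return True if pattern occurs in sequence as an order-preserving subsequence."""
    it = iter(sequence)
    # Each `any` consumes the shared iterator, so matches must appear in order.
    return all(any(event == wanted for event in it) for wanted in pattern)

def local_counts(partition, candidates):
    """Count, within one partition, how many sequences contain each candidate."""
    counts = Counter()
    for sequence in partition:
        for pattern in candidates:
            if contains(sequence, pattern):
                counts[pattern] += 1
    return counts

def parallel_support(database, candidates, workers=2):
    """Partition the database, count support per partition, then sum the counts."""
    # Round-robin partitioning (hypothetical choice) keeps partition sizes balanced.
    partitions = [database[i::workers] for i in range(workers)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for counts in pool.map(lambda p: local_counts(p, candidates), partitions):
            total.update(counts)  # Counter.update adds counts together
    return total

# Toy data, invented for this example only.
database = [
    [("bread", "store_A"), ("milk", "store_B")],
    [("bread", "store_A"), ("eggs", "store_A"), ("milk", "store_B")],
    [("milk", "store_B"), ("bread", "store_A")],
]
candidates = [
    (("bread", "store_A"), ("milk", "store_B")),
    (("eggs", "store_A"),),
]
support = parallel_support(database, candidates, workers=2)
```

Because each partition is counted independently, the only communication in this sketch is the final summation of the local counters, which is the property that makes data-parallel support counting easy to balance across processors.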



