A Comparative Study of GTC and PSP Algorithms for Mining Sequential Patterns Embedded in Database with Time Constraints

This paper will consider the problem of sequential
mining patterns embedded in a database by handling the time
constraints as defined in the GSP algorithm (level wise algorithms).
We will compare two previous approaches GTC and PSP, that
resumes the general principles of GSP. Furthermore this paper will
discuss PG-hybrid algorithm, that using PSP and GTC. The results
show that PSP and GTC are more efficient than GSP. On the other
hand, the GTC algorithm performs better than PSP. The PG-hybrid
algorithm use PSP algorithm for the two first passes on the database,
and GTC approach for the following scans. Experiments show that
the hybrid approach is very efficient for short, frequent sequences.

Authors:



References:
[1] F. Masseglia, P. Poncelet, M. Teisseire. ”Efficient mining of sequential
patterns with time constraints: Reducing the combinations”. Expert
Systems with Applications, Volume 36, pages 2677-2690, 2009.
[2] F. Cathala, F. Masseglia, P. Poncelet. ”The PSP Approach for Mining
Sequential Patterns”. ACM, pagesv176-184, 1998.
[3] Th. Rincy.N, Y. Pandey. ”Performance evaluation on state of the art
sequential pattern mining algorithms”. International Journal of Computer
Applications (0975-8887), Volume 65, N0. 14, March 2013.
[4] M. Lin, S. Hseueh, C. Chang. ”Mining closed sequential patterns with
time constraints”. International Journal Of Information Science And
Engineering Volume 24, pages 33-46, 2008.
[5] M. Morzy, T. Zakrzewicz, ”Efficient constraint-based sequential pattern
mining using dataset filtering techniques”. In Proceedings of the Baltic
conference, BalticDB IS, pages 213-224, 2002.
[6] M. J. Zaki, ”SPADE: An efficient algorithm for mining frequent
sequences”. Machine Learning Journal, Volume 42, NO. 1, pages 31-60,
2001.
[7] R. Srikant and R. Agrawal. ”Mining Sequential Patterns: Generalizations
and Performance Improvements”. In Proc. of the EDBT’96, Avignon,
France, Sept 1996.
[8] U. M. Fayad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors.
”Advances in Knowledge Discovery and Data Mining”. AAAI Press,
1996.
[9] J. Wijsen. ”Condensed Representation of Database Repairs for Consistent
Query Answering”. In International Conference on Database Theory
(ICDT), Springer-Verlag, LNCS 2572, pages 378-393, 2003.
[10] P. C. Kanellakis. ”Elements of Relational Database Theory”. In Jan van
Leeuwen, editor, Handbook of Theoretical Computer Science, volume B,
chapter 17, Elsevier/MIT Press, pages 1073-1158, 1990.
[11] J. Fry, G. Xian, S. Jin, J. Dewitz, C. Homer, L. Yang, C. Barnes,
N. Herold, J. Wickham. ”Completion of the 2006 National Land
Cover Database for the conterminous United States”, Photogrammetric
Engineering Remote Sensing, Volume 77, NO. 9, pages 858-863, 2011.
[12] B. C. Ooi, C. Yu, K. L. Tan, and H. V. Jagadish. ”Indexing the distance:
an efficient method to knn processing”. In Procdf the Int. Conf. on Very
Large Data Bases, 2001.
[13] K. V. Ravi Kanth, D. Agrawal, Amr El Abbadi, and Ambuj K. Singh.
”Dimensionality reduction for similarity searching in dynamic databases”.
In Proc. A CM SIGMOD Int. Conf. on Management of Data, 1998.
[14] J. Griss, Y. Perez-Riverol, H. Hermjakob, J. A. Vizcaino. ”Identifying
novel biomarkers through data mining-A realistic scenario”. Proteomics
Clin. Appl., Volume 9, pages 437-443, 2015.
[15] N. Roussopoulos, S. Kelley, and F. Vincent. ”Nearest neighbor queries”.
In Proc. A CM SIGMOD Int. Conf. on Management of Data, pages 71-79,
May 1995.
[16] H. Samet. ”Recent developments in linear quadtree-based geographic
information systems”. Image and Vision Computing, Volume 5, NO. 3,
pages 187-197, Aug. 1987.
[17] T. Sellis, N. Roussopoulos, and C. Faloutsos. ”The r+-tree: A dynamic
index for multi-dimensional objects”. Procdf the Int. Conf. on Very Large
Data Bases, Volume 13, pages 507-518, 1988.
[18] Y. Theodoridis and T. K. Sellis. ”Optimization issues in r-tree
construction”. In Geographic Information Systems (IGIS), pages 270-273,
1994.
[19] F. Wang. ”Relational-linear quadtree approach for two-dimensional
spatial representation and manipulation”. IEEE Trans. on Knowledge and
Data Engineering, Volume 3, NO. 1, pages 118-122, Mar. 1991.
[20] J. Chomicki, J. Marcinkowski, and S. Staworko, ”Computing consistent
query answers using conflict hypergraphs,” in CIKM, D. Grossman,
L. Gravano, C. Zhai, O. Herzog, and D. A. Evans, Eds. ACM, pages
417-426, 2004.