Tool for Fast Detection of Java Code Snippets

This paper presents general results on the Java source
code snippet detection problem. We propose the tool which uses
graph and subgraph isomorphism detection. A number of solutions
for all of these tasks have been proposed in the literature. However,
although that all these solutions are really fast, they compare just the
constant static trees. Our solution offers to enter an input sample
dynamically with the Scripthon language while preserving an
acceptable speed. We used several optimizations to achieve very low
number of comparisons during the matching algorithm.





References:
[1] Christophe-André M. Irniger, “Graph Matching - Filtering Databases of
Graphs Using Machine Learning Techniques,” 2005, ISBN 1-58603-
557-6.
[2] B.D. McKay, “ractical graph isomorphism,” In Congressus
Numerantium, volume 30, 1981, pages 45–87.
[3] J.R. Ullmann, “An algorithm for subgraph isomorphism,” Journal ofthe
Association for Computing Machinery, 1976, pages 31-42.
[4] G. Levi, “A note on the derivation of maximal common subgraphs
oftwo directed or undirected graphs,” Calcolo, 1972, pages 341-354.
[5] J. McGregor, “Backtrack search algorithms and the maximal common
subgraph problem,” Software-Practice and Experience, 1982, pages 23-
34.
[6] G.A. Stephen, “String Searching Algorithms,” World Scientific, 1994.
[7] H. Bunke and K. Shearer, “A graph distance metric based on the
maximalcommon subgraph,” Pattern Recognition Letters, 1998, pages
255-259.
[8] M.-L. Fernandez and G. Valiente, “A graph distance metric combining
maximum common subgraph and minimum common supergraph,”
Pattern Recognition Letters, 2001, pages 753-758.
[9] W.D. Wallis, P. Shoubridge, M. Kraetz, and D. Ray, “Graph distances
using graph union,” Pattern Recognition Letters, May 2001, pages 701-
704.
[10] A. Sanfeliu and K.S. Fu, “A distance measure between attributed
relationalgraphs for pattern recognition,” IEEE Transactions on
Systems, Man, and Cybernetics, 1983, 353-363.
[11] W.H. Tsai and K.S. Fu, “Error-correcting isomorphisms of
attributedrelational graphs for pattern recognition.” IEEE Transactions
on Systems, Man, and Cybernetics, 1979, pages 757-768.
[12] M.A. Eshera and K.S. Fu, “A graph distance measure for image
analysis,” IEEE Transactions on Systems, Man, and Cybernetics, 1984,
pages 398-408.
[13] E.K. Wong, “Three-dimensional object recognition by attributed
graphs,” In H. Bunke and A. Sanfeliu, editors, Syntactic and Structural
Pattern Recognition- Theory and Applications, World Scientific, 1990,
pages 381-414.
[14] R. Wilson and E.R. Hancock, “Levenshtein distance for graph spectral
features,” In J. Kittler, M. Petrou, and M. Nixon, editors, Proc. 17th Int.
Conference on Pattern Recognition, volume 2, 2004, pages 489-492.
[15] A. Robles-Kelly and E.R. Hancock, “Graph edit distance from spectral
seriation,” IEEE Transactions on Pattern Analysis and Machine
Intelligence, 2005, pages 365-378.
[16] T. Caelli and S. Kosinov, “Inexact graph matching using eigensubspace
projection clustering,” Int. Journal of Pattern Recognition and Artificial
Intelligence, 2004, pages 329-355.
[17] T. Caelli and S. Kosinov, “An eigenspace projection clustering method
for inexact graph matching,” IEEE Transactions on Pattern Analysis
and Machine Intelligence, 2004, pages 515-519.
[18] Ch. K. Roy, J. R. Cordy, and R. Koschke. 2009. “Comparison and
evaluation of code clone detection techniques and tools: A qualitative
approach,” Sci. Comput. Program. 74, 7 (May 2009), pp. 470-495.
[19] I. D. Baxter, A. Yahin, L. Moura, M. Sant'Anna, and L. Bier. 1998.
“Clone Detection Using Abstract Syntax Trees,” in Proceedings of the International Conference on Software Maintenance (ICSM '98). IEEE
Computer Society, Washington, DC, USA, pp. 368-377.
[20] V. Wahler, D. Seipel, J. W. von Gudenberg, and G. Fischer, "Clone
detection in source code by frequent itemset techniques," In SCAM,
2004.
[21] R. Koschke, R.Falke, P. Frenzel, “Clone Detection Using Abstract
Syntax Suffix Trees, Reverse Engineering,” 2006. WCRE '06. 13th
Working Conference on, ISBN 0-7695-2719-1, 2006, pages 253-262.
[22] Tomáš Bublík., Miroslav Virius.: “New language for searching Java
code snippets,” in: ITAT 2012. Proc. of the 12th national conference
ITAT. diar, Sep 17 – 21 2012. Pavol Jozef Safrik University in Kosice.
p. 35 – 40.
[23] J. T. Yao and M. Zhang. 2004. “A Fast Tree Pattern Matching
Algorithm for XML Query,” in Proceedings of the 2004
IEEE/WIC/ACM International Conference on Web Intelligence (WI
’04). IEEE Computer Society, Washington, DC, USA, pp. 235-241.
[24] G. Valiente, "Algorithms on Trees and Graphs," Springer, ISBN
3540435506, 2002, page 170.
[25] R. Shamir, D. Tsur, "Faster Subtree Isomorphism," In Journal of
Algorithms, Volume 33 Issue 2, 1999, pages 267-280,
doi:10.1006/jagm.1999.104