Syntax Sensitive and Language Independent Detection of Code Clones

This paper proposes a new technique for detecting code clones from both the lexical and the syntactic point of view, based on the PALEX source code representation. PALEX code contains the recorded parsing actions together with lexical formatting information, including white space and comments. The parsing actions (shift, reduce, and reading a token) are recorded while the compiler analyzes the source code, so the complete list is available once the analysis finishes. The proposed technique has the advantages of being syntax-sensitive and language-independent.
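The idea of recording parsing actions and comparing them can be illustrated with a minimal sketch. It is not the PALEX format or the paper's implementation: the toy grammar (assignments built from identifiers, numbers, and `+`), the tuple layout of the actions, and the names `tokenize`, `parse`, `normalize`, and `longest_shared_run` are all assumptions made for illustration. White space and comments, which PALEX also keeps, are ignored here.

```python
import re
from difflib import SequenceMatcher

# Toy grammar (an assumption for this sketch, not taken from the paper):
#   stmt -> ID = expr ;     expr -> expr + term | term     term -> ID | NUM
TOKEN_RE = re.compile(r"\s*(?:(?P<NUM>\d+)|(?P<ID>[A-Za-z_]\w*)|(?P<OP>[=+;]))")


def tokenize(src):
    """Return (kind, text) pairs; operators use their own text as the kind."""
    tokens, pos, src = [], 0, src.rstrip()
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        kind, text = m.lastgroup, m.group(m.lastgroup)
        tokens.append((text if kind == "OP" else kind, text))
        pos = m.end()
    return tokens


def reduce_all(stack, lookahead, actions):
    """Apply reductions to the top of the stack until none applies."""
    while True:
        if stack[-1] in ("ID", "NUM") and lookahead != "=":
            rule, n, lhs = "term -> " + stack[-1], 1, "term"
        elif stack[-3:] == ["expr", "+", "term"]:
            rule, n, lhs = "expr -> expr + term", 3, "expr"
        elif stack[-1] == "term":
            rule, n, lhs = "expr -> term", 1, "expr"
        elif stack[-4:] == ["ID", "=", "expr", ";"]:
            rule, n, lhs = "stmt -> ID = expr ;", 4, "stmt"
        else:
            return
        del stack[-n:]
        stack.append(lhs)
        actions.append(("reduce", rule))


def parse(src):
    """Shift-reduce parse of src; returns the recorded list of parsing actions."""
    actions, stack = [], []
    tokens = tokenize(src)
    for i, (kind, text) in enumerate(tokens):
        actions.append(("token", kind, text))  # reading a token
        stack.append(kind)
        actions.append(("shift", kind))
        lookahead = tokens[i + 1][0] if i + 1 < len(tokens) else None
        reduce_all(stack, lookahead, actions)
    if any(symbol != "stmt" for symbol in stack):
        raise SyntaxError("input is not a sequence of complete statements")
    return actions


def normalize(actions):
    """Drop token text so that renamed identifiers still compare equal."""
    return [action[:2] for action in actions]


def longest_shared_run(src_a, src_b):
    """Length of the longest contiguous run of actions common to both sources."""
    a, b = normalize(parse(src_a)), normalize(parse(src_b))
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size
```

In this sketch, `parse("total = price + tax ;")` records 17 actions, and `longest_shared_run("total = price + tax ;", "sum = x + y ; z = 1 ;")` also returns 17: the first statement of the second fragment produces the same action sequence once identifier names are dropped. Because only the action list is compared, the matching step does not depend on the source language, provided a parser for that language can record its actions.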
