A New Edit Distance Method for Finding Similarity in Dna Sequence

The P-Bigram method is a string comparison methods base on an internal two characters-based similarity measure. The edit distance between two strings is the minimal number of elementary editing operations required to transform one string into the other. The elementary editing operations include deletion, insertion, substitution two characters. In this paper, we address the P-Bigram method to sole the similarity problem in DNA sequence. This method provided an efficient algorithm that locates all minimum operation in a string. We have been implemented algorithm and found that our program calculated that smaller distance than one string. We develop PBigram edit distance and show that edit distance or the similarity and implementation using dynamic programming. The performance of the proposed approach is evaluated using number edit and percentage similarity measures.




References:
[1] Saul B. Needleman, Christlan D. Wunsch, "A General Applicable to the
Search for Similarities in the Amino Acid Sequence of two Proteins", J
Mol. Biol, Vol. 48, 1970, pp. 443-453.
[2] David Sankoff, "Simultaneous solution of the RNA folding alignment
and protosequence problems", Siam J. Appl Math, Vol. 45, 1985, pp.
810-825.
[3] A drien Coyette, et al., "Trainable Sketch Recognizer of Graphical User
Interface Design", International Federation for Information Processing,
Vol. 1, 2007, pp. 124-135.
[4] Levenshtein V.I. "Binary codes capable of correcting deletions,
insertions and reversals", Soviet Physics Doklady, Vol. 8, 1966, pp. 705-
710.
[5] Gusfield. "Algorithms on String Trees and Sequences", Computer
science and Computational Biology Cambridge University Press, 1997.
[6] Hall, P.A.V, Dowling, G.R. "Approximate string matching", ACM
Computing Surveys, Vol. 4, 1980, pp. 381-402.
[7] Christen, P. "A Comparison of Personal Name Matching Techniques and
Practical Issues", Technical Report TR-CS-06-02, Joint Computer
Science Technical Report Series, Department of Computer Science,
2006.
[8] Cohen, W. Ravikumar, P. Fienberg, S."A comparison of string distance
metrics for name-matching tasks", IJCAI Workshop on Information
Integration on the Web, Acapulco, Mexico, 2003, pp. 73-78.
[9] AbdulJaleel, N. Larkey, L.S. "Statistical transliteration for English-
Arabic cross language information retrieval", CIKM, 2003, PP. 139-146.
[10] Linden, K, "Multilingual Modeling of Cross-Lingual Spelling Variants
spelling variants Information Retrieval", Vol. 3, 2006, pp. 295-310.
[11] Ristad, E.S. Yianilos, P.N. "Learning string-edit distance", IEEE
Transactions or Pattern Analysis and Machine Intelligence, 1998.
[12] Carlo Batini, Monica Scannapieco, "Data Quality Concepts,
Methodologies and Techniqes", Springer, DCSA, 1998, pp. 117-127.
[13] Heikki Hyyrö, Ayumi Shinohara, "New Bit-Parallel-Distance
Algorithm", Nikoletseas, LNCS 3503, 2005, pp. 380-390.
[14] Adrian. Horia Dediu, et al., "A fast Longest Common Subsequence
Algorithm for Similar Strings", Language and Automata Theory and
Applications, International Conference, LATA, 2010, pp. 82-93.
[15] Heikki Hyyrö, "Restricted Transposition Invariant Approximate String
Matching Under Edit Distance", SPIRE, LNCS 3772, 2005, pp. 256-
266.
[16] M.H. Alsuwaiyel, "Algorithms design techniques and analysis", World
Scientific Connecting Great Minds, Vol. 7, 1999, pp. 203-208.
[17] Dekang Lin(1998), "An Information Theoretic Definition of similarity",
Proceedings of the 15th international conference on statistic", Citeseer.