Comparison of Phylogenetic Trees of Multiple Protein Sequence Alignment Methods

Multiple sequence alignment is a fundamental step in many bioinformatics applications, such as phylogenetic analysis. Many alignment methods have been proposed, and each gives a different result for the same data set and consequently yields a different phylogenetic tree. The choice of alignment method therefore affects the resulting tree. However, the literature contains no evaluation of multiple alignment methods based on a comparison of their phylogenetic trees. This work evaluates eight aligners (ClustalX, T-Coffee, SAGA, MUSCLE, MAFFT, DIALIGN, ProbCons and Align-m) based on the phylogenetic trees (test trees) they produce on a given data set. Trees are estimated with the Neighbor-Joining method. Three criteria, namely dNNI, dRF and Id_Tree, are established to test the ability of each alignment method to produce a test tree close to the reference tree (true tree). Results show that the method producing the most accurate alignment gives the test tree nearest to the reference tree. MUSCLE outperforms all other aligners with respect to the three criteria on all data sets, and performs particularly well when sequence identities are within 10-20%. It is followed by T-Coffee at lower sequence identity (<10%), Align-m at 20-30% identity, and ClustalX and ProbCons at 30-50% identity. It is also observed that when sequence identities are higher (>30%), the tree scores of all methods become similar.
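
To make the tree-comparison idea concrete, the following is a minimal sketch of one such criterion, assuming that dRF denotes the Robinson-Foulds distance between a test tree and the reference tree. The nested-tuple tree encoding, the function names (leaf_set, splits, rf_distance) and the five-leaf example trees are illustrative assumptions, not taken from this work. The sketch counts the non-trivial leaf bipartitions that appear in exactly one of the two trees, treating both trees as unrooted.

```python
def leaf_set(tree):
    """Return the set of leaf names in a nested-tuple tree (hypothetical encoding)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions (splits) induced by the internal edges.

    Each split is stored as the side that does NOT contain a fixed anchor leaf,
    so the same bipartition always has the same representation.
    """
    all_leaves = leaf_set(tree)
    anchor = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(visit(child) for child in node))
        side = all_leaves - clade if anchor in clade else clade
        # Skip trivial splits: the empty side (root) and single-leaf sides.
        if 1 < len(side) < len(all_leaves) - 1:
            found.add(side)
        return clade

    visit(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Robinson-Foulds distance: number of splits present in exactly one tree."""
    if leaf_set(tree_a) != leaf_set(tree_b):
        raise ValueError("trees must be defined on the same set of leaves")
    return len(splits(tree_a) ^ splits(tree_b))


# Illustrative trees only; these are not data from the study.
reference_tree = ((("A", "B"), "C"), ("D", "E"))
test_tree = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(reference_tree, test_tree))
```

A distance of zero means the two trees share every non-trivial split, so a smaller value indicates a test tree closer to the reference tree.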



