Feature Extraction Technique for Predicting the Antigenic Variants of the Influenza Virus

In genetics, the impact of neighboring amino acids on
a target site is referred to as the nearest-neighbor effect or simply
neighbor effect. In this paper, we propose a new method, wavelet particle
decomposition, that represents the one-dimensional neighbor effect
using wavelet packet decomposition. The main idea rests on the
known dependence of wavelet packet sub-bands on the location and
order of neighboring samples. The method decomposes the value of
a signal sample into small values called particles, each representing part
of the neighbor effect information. The results show that the
information obtained from the particle decomposition can be used to
create better model variables or features. As an example, the approach
has been applied to improve the correlation between the test-to-reference
sequence distance and the titer in the hemagglutination inhibition assay.
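
The dependence of wavelet packet sub-bands on the order of neighboring samples can be illustrated with a minimal sketch. It is not the particle decomposition proposed here, only a plain Haar wavelet packet transform written from scratch. The function names, the choice of the Haar filter, and the numeric encoding of the sample window are all assumptions made for illustration.

```python
import math


def haar_step(signal):
    """One Haar analysis step: return (approximation, detail), each half as long."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_packet(signal, levels):
    """Full Haar wavelet packet tree.

    Returns the 2**levels sub-bands at the deepest level, in natural
    (tree) order. len(signal) must be divisible by 2**levels.
    """
    bands = [list(signal)]
    for _ in range(levels):
        next_bands = []
        for band in bands:
            approx, detail = haar_step(band)
            next_bands.extend([approx, detail])
        bands = next_bands
    return bands


# Hypothetical numeric encoding of a short window of samples (illustrative values).
original = [1.0, 4.0, 2.0, 3.0]
# Same values, but the first two neighbors are exchanged.
swapped = [4.0, 1.0, 2.0, 3.0]

bands_original = wavelet_packet(original, levels=2)
bands_swapped = wavelet_packet(swapped, levels=2)

for index, (a, b) in enumerate(zip(bands_original, bands_swapped)):
    print(index, [round(x, 6) for x in a], [round(x, 6) for x in b])
```

In this toy case the two sub-bands derived from the approximation branch are identical for both windows, while the two derived from the detail branch differ. The sub-bands therefore carry information about the order of neighboring samples, not only about their values.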



