Meta-Learning for Hierarchical Classification and Applications in Bioinformatics

Hierarchical classification is a special type of classification task in which the class labels are organised into a hierarchy, with more generic class labels being ancestors of more specific ones. Meta-learning for classification-algorithm recommendation consists of recommending to the user a classification algorithm, chosen from a pool of candidate algorithms, for a given dataset, based on the past performance of the candidate algorithms on other datasets. Meta-learning is normally used in conventional, non-hierarchical classification. By contrast, this paper proposes a meta-learning approach for the more challenging task of hierarchical classification, and evaluates it on a large number of bioinformatics datasets. Hierarchical classification is especially relevant for bioinformatics problems, as protein and gene functions tend to be organised into a hierarchy of class labels. The proposed meta-learning approach recommends the best hierarchical classification algorithm for a given hierarchical classification dataset. This work's contributions are: 1) proposing an algorithm for splitting hierarchical datasets into new datasets to increase the number of meta-instances, 2) proposing meta-features for hierarchical classification, and 3) interpreting decision-tree meta-models for hierarchical classification algorithm recommendation.
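The general idea of algorithm recommendation from past performance can be illustrated with a minimal sketch. It is not the method proposed in this work: it swaps in a simple nearest-neighbour meta-model instead of a decision tree, and the meta-features (number of nodes in the class hierarchy, hierarchy depth, number of instances), the algorithm names and all numeric values are hypothetical, chosen only for illustration.

```python
from math import sqrt


def ancestors(label, parent):
    """Return the ancestors of `label`, from its parent up to the root."""
    chain = []
    while label in parent:
        label = parent[label]
        chain.append(label)
    return chain


def meta_features(parent, n_instances):
    """Compute a few illustrative (assumed) meta-features of a hierarchical dataset."""
    nodes = set(parent) | set(parent.values())
    depth = max(len(ancestors(node, parent)) for node in nodes)
    return (float(len(nodes)), float(depth), float(n_instances))


def recommend(query, meta_instances):
    """Return the best-algorithm label of the nearest stored meta-instance."""
    # Scale each meta-feature by its range over the stored meta-instances,
    # so that no single meta-feature dominates the distance.
    columns = list(zip(*(feats for feats, _ in meta_instances)))
    spans = [(max(col) - min(col)) or 1.0 for col in columns]

    def distance(a, b):
        return sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, spans)))

    return min(meta_instances, key=lambda mi: distance(query, mi[0]))[1]


# Toy class hierarchy given as a child -> parent map; "root" has no parent.
toy_parent = {"A": "root", "B": "root", "A.1": "A", "A.2": "A", "A.1.1": "A.1"}

# Hypothetical meta-instances: (meta-features, best algorithm on that dataset).
meta_instances = [
    ((5.0, 2.0, 200.0), "algorithm_X"),
    ((40.0, 5.0, 3000.0), "algorithm_Y"),
    ((8.0, 3.0, 500.0), "algorithm_X"),
]

query = meta_features(toy_parent, n_instances=450)
print(query, recommend(query, meta_instances))
```

Each stored pair plays the role of one meta-instance: a previously seen dataset described by its meta-features and labelled with the algorithm that performed best on it. A decision-tree meta-model would replace the `recommend` function while keeping the same meta-instance representation.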



