Abstract: Classification is an important topic in machine learning
and bioinformatics. Many datasets have been introduced for
classification tasks. A dataset contains multiple features, and the quality of features influences the classification accuracy of the dataset.
The power of classification for each feature differs. In this study, we
suggest the Classification Influence Index (CII) as an indicator of classification power for each feature. CII enables evaluation of the
features in a dataset and improved classification accuracy by transformation of the dataset. By conducting experiments using CII
and the k-nearest neighbor classifier to analyze real datasets, we confirmed that the proposed index provided meaningful improvement
of the classification accuracy.
Abstract: Classification is one of the primary themes in
computational biology. The accuracy of classification strongly
depends on quality of a dataset, and we need some method to
evaluate this quality. In this paper, we propose a new graphical
analysis method using 'Membership-Deviation Graph (MDG)' for
analyzing quality of a dataset. MDG represents degree of
membership and deviations for instances of a class in the dataset. The
result of MDG analysis is used for understanding specific feature and
for selecting best feature for classification.