An Empirical Evaluation of Performance of Machine Learning Techniques on Imbalanced Software Quality Data

The development of change prediction models can help the software practitioners in planning testing and inspection resources at early phases of software development. However, a major challenge faced during the training process of any classification model is the imbalanced nature of the software quality data. A data with very few minority outcome categories leads to inefficient learning process and a classification model developed from the imbalanced data generally does not predict these minority categories correctly. Thus, for a given dataset, a minority of classes may be change prone whereas a majority of classes may be non-change prone. This study explores various alternatives for adeptly handling the imbalanced software quality data using different sampling methods and effective MetaCost learners. The study also analyzes and justifies the use of different performance metrics while dealing with the imbalanced data. In order to empirically validate different alternatives, the study uses change data from three application packages of open-source Android data set and evaluates the performance of six different machine learning techniques. The results of the study indicate extensive improvement in the performance of the classification models when using resampling method and robust performance measures.

Alternative Methods to Rank the Impact of Object Oriented Metrics in Fault Prediction Modeling using Neural Networks

The aim of this paper is to rank the impact of Object Oriented(OO) metrics in fault prediction modeling using Artificial Neural Networks(ANNs). Past studies on empirical validation of object oriented metrics as fault predictors using ANNs have focused on the predictive quality of neural networks versus standard statistical techniques. In this empirical study we turn our attention to the capability of ANNs in ranking the impact of these explanatory metrics on fault proneness. In ANNs data analysis approach, there is no clear method of ranking the impact of individual metrics. Five ANN based techniques are studied which rank object oriented metrics in predicting fault proneness of classes. These techniques are i) overall connection weights method ii) Garson-s method iii) The partial derivatives methods iv) The Input Perturb method v) the classical stepwise methods. We develop and evaluate different prediction models based on the ranking of the metrics by the individual techniques. The models based on overall connection weights and partial derivatives methods have been found to be most accurate.

Application of Artificial Neural Network for Predicting Maintainability Using Object-Oriented Metrics

Importance of software quality is increasing leading to development of new sophisticated techniques, which can be used in constructing models for predicting quality attributes. One such technique is Artificial Neural Network (ANN). This paper examined the application of ANN for software quality prediction using Object- Oriented (OO) metrics. Quality estimation includes estimating maintainability of software. The dependent variable in our study was maintenance effort. The independent variables were principal components of eight OO metrics. The results showed that the Mean Absolute Relative Error (MARE) was 0.265 of ANN model. Thus we found that ANN method was useful in constructing software quality model.