An Empirical Evaluation of Performance of Machine Learning Techniques on Imbalanced Software Quality Data

The development of change prediction models can help the software practitioners in planning testing and inspection resources at early phases of software development. However, a major challenge faced during the training process of any classification model is the imbalanced nature of the software quality data. A data with very few minority outcome categories leads to inefficient learning process and a classification model developed from the imbalanced data generally does not predict these minority categories correctly. Thus, for a given dataset, a minority of classes may be change prone whereas a majority of classes may be non-change prone. This study explores various alternatives for adeptly handling the imbalanced software quality data using different sampling methods and effective MetaCost learners. The study also analyzes and justifies the use of different performance metrics while dealing with the imbalanced data. In order to empirically validate different alternatives, the study uses change data from three application packages of open-source Android data set and evaluates the performance of six different machine learning techniques. The results of the study indicate extensive improvement in the performance of the classification models when using resampling method and robust performance measures.

A Genetic Algorithm Based Classification Approach for Finding Fault Prone Classes

Fault-proneness of a software module is the probability that the module contains faults. A correlation exists between the fault-proneness of the software and the measurable attributes of the code (i.e. the static metrics) and of the testing (i.e. the dynamic metrics). Early detection of fault-prone software components enables verification experts to concentrate their time and resources on the problem areas of the software system under development. This paper introduces Genetic Algorithm based software fault prediction models with Object-Oriented metrics. The contribution of this paper is that it has used Metric values of JEdit open source software for generation of the rules for the classification of software modules in the categories of Faulty and non faulty modules and thereafter empirically validation is performed. The results shows that Genetic algorithm approach can be used for finding the fault proneness in object oriented software components.