Cross-Project Software Fault Prediction at the Design Phase

Software fault prediction models are built from source code, metrics computed from the same or a previous version of the code, and the associated fault data. Some companies do not store or track all of the artifacts required for software fault prediction. For such companies, training data drawn from other projects is one potential solution. The earlier a fault is predicted, the less it costs to correct. The training data consist of metrics and associated fault data at the function/module level. This paper investigates early-stage fault prediction using cross-project data, focusing on design metrics. An empirical analysis is carried out to validate design metrics for cross-project fault prediction, using Naïve Bayes as the machine learning technique. The design-phase metrics of other projects can serve as an initial guideline for projects where no previous fault data are available. We analyze seven datasets from the NASA Metrics Data Program that offer design as well as code metrics. Overall, the results of cross-project learning are comparable to those of within-company learning.
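As a concrete illustration of the cross-project setup described above, the following is a minimal sketch in Python: a Gaussian Naïve Bayes classifier is trained on the design metrics of one project's modules and evaluated on a different project. The file names (cm1.csv, kc1.csv), the metric column names, and the "defects" label are illustrative assumptions, since NASA MDP / PROMISE distributions differ in naming; this is not the paper's exact experimental protocol.

```python
# Minimal sketch of cross-project fault prediction with Naive Bayes.
# File names, column names, and the metric subset below are assumed
# for illustration; adjust them to the dataset actually in use.
import pandas as pd
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import recall_score, roc_auc_score

# Hypothetical design-phase metric columns (measures derivable from
# design artifacts, before code-level metrics are available).
DESIGN_METRICS = ["cyclomatic_complexity", "design_complexity",
                  "node_count", "edge_count", "branch_count"]
LABEL = "defects"  # assumed 0/1 (or boolean) module fault label

def load_project(path):
    """Load one project's module-level design metrics and fault labels."""
    df = pd.read_csv(path)
    X = df[DESIGN_METRICS].values
    y = df[LABEL].astype(int).values
    return X, y

# Cross-project setting: train on one project, test on another.
X_train, y_train = load_project("cm1.csv")  # source project (assumed file)
X_test, y_test = load_project("kc1.csv")    # target project (assumed file)

model = GaussianNB()
model.fit(X_train, y_train)
pred = model.predict(X_test)
prob = model.predict_proba(X_test)[:, 1]

# Probability of detection (recall on faulty modules) and AUC are
# common evaluation measures in fault prediction studies.
print("PD (recall):", recall_score(y_test, pred))
print("AUC:", roc_auc_score(y_test, prob))
```

For comparison, a within-company baseline would instead train and test on splits of the same project's data; the cross-project result being comparable to that baseline is the study's central observation.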



