An empirical comparative study of novel approaches for software defect prediction on class imbalance datasets

  • K. Nitalaksheswara Rao Andhra University, Visakhapatnam.
  • Ch. Satyananda Reddy Andhra University, Visakhapatnam.
Keywords: Knowledge Discovery, Software Defect datasets, Imbalanced data

Abstract

Software defect prediction using data mining techniques is one of the best practices for finding defective modules. Existing classification techniques support efficient knowledge discovery on class-balanced datasets. Real-world data, however, are rarely balanced: one class typically predominates over the other. Such data sources are known as class-imbalanced or skewed data sources. The defect prediction rate on class-imbalanced datasets falls as the degree of imbalance increases. The proposed algorithms consist of novel oversampling and undersampling techniques that remove noisy and weak instances from both the majority and minority classes to improve performance on class-imbalanced data streams. We conduct experiments on class-imbalanced software defect datasets with three methods, using four evaluation measures. The results suggest that the problem of class-imbalanced software defect datasets can be effectively solved.
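To make the idea of class imbalance and resampling concrete, the following is a minimal sketch in Python. It is not the algorithm proposed in the paper, whose details the abstract does not give. It assumes a generic nearest-neighbour noise filter followed by plain random oversampling, and every function name, the toy data, and the choice of k = 3 are hypothetical, chosen only for illustration.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def remove_noisy(points, labels, k=3):
    """Drop instances whose k nearest neighbours all carry a different label.

    Generic noise filter (an assumption), applied to both classes.
    """
    kept_points, kept_labels = [], []
    for i, (p, y) in enumerate(zip(points, labels)):
        # Squared Euclidean distance from p to every other instance.
        others = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), labels[j])
            for j, q in enumerate(points)
            if j != i
        )
        neighbour_labels = [lab for _, lab in others[:k]]
        if any(lab == y for lab in neighbour_labels):
            kept_points.append(p)
            kept_labels.append(y)
    return kept_points, kept_labels


def random_oversample(points, labels, seed=0):
    """Duplicate random minority instances until both classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    deficit = max(counts.values()) - counts[minority]
    pool = [p for p, y in zip(points, labels) if y == minority]
    extra = [rng.choice(pool) for _ in range(deficit)]
    return points + extra, labels + [minority] * deficit


# Toy data: label 0 = non-defective (majority), label 1 = defective (minority).
# The last point is a minority instance sitting inside the majority cluster.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8),
          (5, 5), (5, 6), (6, 5), (0.4, 0.4)]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

clean_points, clean_labels = remove_noisy(points, labels, k=3)
balanced_points, balanced_labels = random_oversample(clean_points, clean_labels)

print(imbalance_ratio(labels))           # ratio before any processing
print(imbalance_ratio(clean_labels))     # ratio after noise removal
print(imbalance_ratio(balanced_labels))  # ratio after oversampling
```

In this sketch the noise filter runs before oversampling so that mislabelled or overlapping minority instances are not duplicated. Whether the paper orders its steps this way is not stated in the abstract.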

Author Biographies

K. Nitalaksheswara Rao, Andhra University, Visakhapatnam.

Department of Computer Science and Systems Engineering

 

Ch. Satyananda Reddy, Andhra University, Visakhapatnam.

Department of Computer Science and Systems Engineering

 

Published
2021-12-31