Coronary Artery Disease Prediction Using Decision Trees and Multinomial Naïve Bayes with k-Fold Cross Validation

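  • Endang S Kresnawati, Department of Mathematics, Faculty of Mathematics and Natural Sciences (FMIPA), Universitas Sriwijaya, Indonesia
  • Yulia Resti, Department of Mathematics, Faculty of Mathematics and Natural Sciences (FMIPA), Universitas Sriwijaya, Indonesia
  • Bambang Suprihatin, Department of Mathematics, Faculty of Mathematics and Natural Sciences (FMIPA), Universitas Sriwijaya, Indonesia
  • M. Rendy Kurniawan, Department of Mathematics, Faculty of Mathematics and Natural Sciences (FMIPA), Universitas Sriwijaya, Indonesia
  • Widya Ayu Amanda, Department of Mathematics, Faculty of Mathematics and Natural Sciences (FMIPA), Universitas Sriwijaya, Indonesia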

Abstract

Coronary artery disease has been the leading cause of death worldwide for at least two decades (2000-2019), and over that period it showed the largest increase in mortality of any cause of death. Successful early prediction of coronary artery disease from medical data benefits not only patients but also the stability of a country's economy. This paper predicts coronary artery disease risk with two statistical learning methods, Multinomial Naïve Bayes and Decision Tree, evaluated by 10-fold cross validation; numerical variables are first discretized into categorical variables. The Decision Tree outperformed Multinomial Naïve Bayes in predicting coronary artery disease, achieving 99.63% accuracy, 100% sensitivity, 99.33% specificity, 99.23% precision, and 100% negative predictive value (NPV). These measures indicate that the Decision Tree method is suitable for predicting coronary artery disease, including on independent data (other coronary artery disease datasets with the same predictor variables). The results also show that discretizing numerical variables according to references different from those used in previous studies can improve the methods' performance in predicting coronary artery disease.
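The pipeline summarized in the abstract has four steps: discretize the numerical predictors, fit a classifier, evaluate it with k-fold cross validation, and compute accuracy, sensitivity, specificity, precision, and NPV from the confusion matrix. The plain-Python sketch below illustrates these steps under stated assumptions and is not the authors' implementation. The helper names `discretize`, `CategoricalNaiveBayes`, `k_fold_predictions`, and `binary_metrics` are hypothetical. The cut points are illustrative only: the total-cholesterol bands follow the NCEP ATP III classification (below 200, 200-239, and 240 mg/dL or more), which is not necessarily the discretization the paper used. The Naïve Bayes variant treats each discretized feature as a per-class multinomial over its categories, with Laplace smoothing (alpha = 1 by assumption). Folds are drawn by plain random shuffling without stratification, and out-of-fold predictions are pooled before the metrics are computed. The Decision Tree is not reproduced here.

```python
import math
import random
from collections import Counter, defaultdict


def discretize(value, cut_points, labels):
    """Map a number to a category. cut_points ascend; len(labels) == len(cut_points) + 1.

    The thresholds passed in are illustrative assumptions, not the paper's cut points.
    """
    for cut, label in zip(cut_points, labels):
        if value < cut:
            return label
    return labels[-1]


class CategoricalNaiveBayes:
    """Naive Bayes in which each feature is a per-class multinomial over its categories.

    Laplace smoothing with `alpha` (assumed default 1.0) avoids zero probabilities.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.n = len(y)
        self.class_counts = Counter(y)
        self.classes = sorted(self.class_counts)
        # counts[c][j][v] = number of class-c training rows whose feature j equals v
        self.counts = {c: defaultdict(Counter) for c in self.classes}
        self.categories = defaultdict(set)  # categories seen for each feature j
        for row, label in zip(X, y):
            for j, v in enumerate(row):
                self.counts[label][j][v] += 1
                self.categories[j].add(v)
        return self

    def predict_one(self, row):
        # Choose the class with the largest log posterior (up to a shared constant).
        best, best_score = None, -math.inf
        for c in self.classes:
            score = math.log(self.class_counts[c] / self.n)  # log prior
            for j, v in enumerate(row):
                n_cats = len(self.categories[j])
                num = self.counts[c][j][v] + self.alpha
                den = self.class_counts[c] + self.alpha * n_cats
                score += math.log(num / den)  # smoothed log likelihood
            if score > best_score:
                best, best_score = c, score
        return best


def k_fold_predictions(X, y, k=10, seed=0, alpha=1.0):
    """Out-of-fold predictions: each row is predicted by a model that never saw it."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)  # assumption: random, non-stratified folds
    folds = [idx[i::k] for i in range(k)]
    y_pred = [None] * len(y)
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        model = CategoricalNaiveBayes(alpha).fit(
            [X[i] for i in train_idx], [y[i] for i in train_idx]
        )
        for i in test_idx:
            y_pred[i] = model.predict_one(X[i])
    return y_pred


def safe_div(a, b):
    return a / b if b else float("nan")


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, specificity, precision, and NPV from a 2x2 confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "sensitivity": safe_div(tp, tp + fn),  # TP / (TP + FN)
        "specificity": safe_div(tn, tn + fp),  # TN / (TN + FP)
        "precision": safe_div(tp, tp + fp),  # TP / (TP + FP)
        "npv": safe_div(tn, tn + fn),  # TN / (TN + FN)
    }


if __name__ == "__main__":
    # Hypothetical toy records (total cholesterol in mg/dL, smoker, CAD label); not real data.
    raw = [(185, "no", 0), (190, "no", 0), (210, "no", 0), (175, "yes", 0), (198, "no", 0),
           (250, "yes", 1), (245, "yes", 1), (230, "yes", 1), (260, "no", 1), (255, "yes", 1)]
    chol_bands = ([200, 240], ["desirable", "borderline_high", "high"])  # NCEP ATP III bands
    X = [[discretize(chol, *chol_bands), smoker] for chol, smoker, _ in raw]
    y = [label for _, _, label in raw]
    print(binary_metrics(y, k_fold_predictions(X, y, k=5)))
```

This sketch pools out-of-fold predictions, which is one common way to report k-fold results. Averaging the per-fold metrics is another, and the two can differ slightly. In scikit-learn, comparable building blocks include `CategoricalNB`, `DecisionTreeClassifier`, `StratifiedKFold`, and `cross_val_predict`.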


Published: 2021-07-31