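Classification of Health Risks for Pregnant Women Using Random Oversampling to Address Data Imbalance (Klasifikasi Risiko Kesehatan Ibu Hamil Menggunakan Random Oversampling Untuk Mengatasi Ketidakseimbangan Data)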


Authors

  • Riska Aryanti, Universitas Bina Sarana Informatika, Jakarta, Indonesia
  • Titik Misriati, Universitas Bina Sarana Informatika, Jakarta, Indonesia
  • Rahmat Hidayat, Universitas Bina Sarana Informatika, Jakarta, Indonesia

DOI:

https://doi.org/10.30865/klik.v3i5.728

Keywords:

Classification; Imbalance Data; PSO; Random Oversampling; Health Risks for Pregnant Women

Abstract

Data imbalance is a common problem in classification, including maternal health risk classification. It occurs when one class (the minority, often the positive class) has far fewer samples than the other, which can make a classification model inaccurate and biased toward predicting the majority class. One way to address this problem is random oversampling. In this study, random oversampling is applied to the classification of maternal health risks, and particle swarm optimization (PSO) is used for attribute weighting to further improve the results of random oversampling and the model's performance. The results show that random oversampling improves accuracy and reduces errors in predicting minority classes, and that PSO also contributes to improving the model's accuracy. Tested with 10-fold cross-validation on the maternal health risk data, the random forest algorithm achieves an accuracy of 80.77%. With random oversampling, accuracy rises to 81.86%, and after optimization with PSO it reaches 82.92%, an increase of 2.15 percentage points over the 80.77% baseline.
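
The random oversampling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `random_oversample`, the toy rows (age, systolic blood pressure), and the labels are hypothetical and chosen only for illustration. The sketch duplicates randomly chosen rows of each smaller class until every class has as many rows as the largest one.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement from the class's own rows.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: (age, systolic blood pressure) with a risk label.
rows = [(25, 120), (30, 118), (28, 115), (22, 110), (35, 140), (40, 150)]
labels = ["low", "low", "low", "low", "high", "high"]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

After the call, both classes in the toy data contain four rows. When oversampling is combined with cross-validation, it is commonly applied to the training folds only, so that duplicated rows do not leak into the held-out fold; whether the study did this is not stated in the abstract.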





ARTICLE HISTORY


Published: 2023-04-30

Section

Articles