Research Details

Application of Machine Learning Techniques to Predict Breast Cancer Survival

Title Application of Machine Learning Techniques to Predict Breast Cancer Survival
Authors Jaree Thongkam, Vatinee Sukmak, Papidchaya Klangnok

Type Research article
Abstract Despite recent significant advances in big data analytics, there is substantial evidence that machine learning techniques can perform poorly when building prediction models. This research investigated the performance and effectiveness of six machine learning techniques, namely Naive Bayes (NB), PART, Random Forest (RF), Support Vector Machine (SVM), AdaBoost, and Bagging, in order to advance the existing understanding of model behavior with big data. A large hospital-based breast cancer dataset from the SEER data file, containing diagnostic information from 2005 to 2014, was used. C4.5 was used to eliminate outliers, and the Synthetic Minority Oversampling TEchnique (SMOTE) was used to balance the dataset. Stratified 10-fold cross-validation was used to divide the dataset in order to reduce the bias and variance of the experimental results. Accuracy, G-mean (G), F-measure, and the Matthews correlation coefficient (MCC) were employed as criteria for the overall performance of the models, while sensitivity, specificity, and precision were used as criteria for a more detailed, per-class view of model performance. The experimental results indicate that RF is superior to NB, PART, SVM, AdaBoost, and Bagging on all criteria. In addition, models generated from datasets with few outliers and balanced classes outperform models generated from the original dataset on both the detailed and the overall performance criteria.
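The evaluation criteria named in the abstract can all be derived from a binary confusion matrix. The following is a minimal sketch using the standard textbook definitions, not the authors' implementation; the function name `binary_metrics`, the helper `_safe_div`, and the example counts are hypothetical and are not taken from the paper.

```python
import math


def _safe_div(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute the abstract's evaluation criteria from confusion-matrix counts.

    tp, fp, tn, fn are the true-positive, false-positive, true-negative,
    and false-negative counts for a binary classifier.
    """
    sensitivity = _safe_div(tp, tp + fn)   # recall on the positive class
    specificity = _safe_div(tn, tn + fp)   # recall on the negative class
    precision = _safe_div(tp, tp + fp)
    accuracy = _safe_div(tp + tn, tp + fp + tn + fn)
    # G-mean: geometric mean of sensitivity and specificity.
    g_mean = math.sqrt(sensitivity * specificity)
    # F-measure: harmonic mean of precision and sensitivity.
    f_measure = _safe_div(2 * precision * sensitivity, precision + sensitivity)
    # Matthews correlation coefficient, defined as 0 when any margin is empty.
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = _safe_div(tp * tn - fp * fn, mcc_denominator)
    return {
        "accuracy": accuracy,
        "g_mean": g_mean,
        "f_measure": f_measure,
        "mcc": mcc,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in binary_metrics(tp=40, fp=10, tn=45, fn=5).items():
        print(f"{name}: {value:.4f}")
```

G-mean and MCC are commonly reported alongside accuracy on imbalanced data because accuracy alone can look high when a model mostly predicts the majority class.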
Journal name In book: Multi-disciplinary Trends in Artificial Intelligence
Volume
Issue / Number
Year published 2021
Pages 141-151
Website https://link.springer.com/chapter/10.1007/978-3-030-80253-0_13