
Machine Learning Methods Investigate Liver Cancer Prediction Problem
HU Xuemei, LI Jiali, JIANG Huifeng
Journal of Systems Science and Mathematical Sciences
2022, 42 (2): 417-433.
DOI: 10.12341/jssms21168
Liver cancer has the second-highest fatality rate among all cancers, and machine learning methods can improve the accuracy of disease prediction. In this paper we apply machine learning methods to the pre-diagnosis problem for liver cancer, with the aim of improving prediction accuracy. First, 10 indicators affecting liver cancer are selected as predictors, and 579 liver cancer patients are divided into two groups: a training sample of 492 randomly selected patients and a testing sample of the remaining 87 patients. Then, we use the training sample to build six classifiers: logistic regression, $L_{2}$ penalized logistic regression, Support Vector Machine (SVM), Gradient Boosting Decision Tree (GBDT), Artificial Neural Network (ANN) and eXtreme Gradient Boosting (XGBoost). For logistic regression and $L_{2}$ penalized logistic regression, the Newton-Raphson algorithm is used to obtain iteratively reweighted least squares estimators of the model parameters, the probabilities that a patient's tumor cells are malignant or benign are estimated, and the optimal threshold is determined to predict tumor traits. Finally, the confusion matrix, sensitivity and specificity are computed on the testing sample, and the ROC curve is drawn to evaluate prediction accuracy. The results show that, in terms of prediction accuracy, $L_{2}$ penalized logistic regression ranks first, SVM second, XGBoost third, logistic regression fourth and GBDT fifth, while ANN and random forest have the lowest prediction accuracy.
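The logistic-regression part of the pipeline described above can be sketched as follows. This is a minimal illustration and not the authors' code. The function names, the penalty weight `lam`, the unpenalized intercept, the threshold grid and the use of Youden's J statistic to pick the threshold are all assumptions, since the abstract does not state them. The data are synthetic and only mimic the 492/87 split sizes.

```python
import numpy as np

def _sigmoid(z):
    # Clip the linear predictor to avoid overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_l2_logistic_newton(X, y, lam=1.0, n_iter=50, tol=1e-8):
    """Newton-Raphson (IRLS) for L2-penalized logistic regression.

    X: (n, p) predictors without an intercept column; y: (n,) labels in {0, 1}.
    lam=0 gives ordinary logistic regression. Assumption: intercept unpenalized.
    """
    n, p = X.shape
    Xb = np.hstack([np.ones((n, 1)), X])
    P = lam * np.eye(p + 1)
    P[0, 0] = 0.0
    beta = np.zeros(p + 1)
    for _ in range(n_iter):
        prob = _sigmoid(Xb @ beta)
        w = prob * (1.0 - prob)                 # IRLS weights
        grad = Xb.T @ (y - prob) - P @ beta     # penalized score
        hess = Xb.T @ (Xb * w[:, None]) + P     # penalized information matrix
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def predict_proba(X, beta):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return _sigmoid(Xb @ beta)

def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels in {0, 1}."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tp, fp, tn, fn

def sensitivity_specificity(y_true, y_pred):
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec

def best_threshold(y_true, prob, grid=None):
    """Pick the threshold maximizing Youden's J = sensitivity + specificity - 1.

    Assumption: the abstract does not say which optimality criterion is used.
    """
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)
    scores = []
    for t in grid:
        sens, spec = sensitivity_specificity(y_true, (prob >= t).astype(int))
        scores.append(sens + spec - 1.0)
    return float(grid[int(np.argmax(scores))])

# Synthetic stand-in data: 579 cases, 10 predictors, 492 train / 87 test.
rng = np.random.default_rng(0)
X_all = rng.normal(size=(579, 10))
true_beta = rng.normal(size=10)
y_all = (rng.random(579) < _sigmoid(X_all @ true_beta)).astype(float)
idx = rng.permutation(579)
train, test = idx[:492], idx[492:]

beta_hat = fit_l2_logistic_newton(X_all[train], y_all[train], lam=1.0)
t_opt = best_threshold(y_all[train], predict_proba(X_all[train], beta_hat))
y_pred = (predict_proba(X_all[test], beta_hat) >= t_opt).astype(int)
print("confusion (TP, FP, TN, FN):", confusion_counts(y_all[test], y_pred))
print("sensitivity, specificity:", sensitivity_specificity(y_all[test], y_pred))
```

In this sketch the threshold is tuned on the training sample and applied unchanged to the testing sample, so the reported sensitivity and specificity are not optimistically biased by the threshold search. Sweeping the threshold over the whole grid and recording each (1 - specificity, sensitivity) pair would trace out an ROC curve.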

