### Analysis of Influence Factors and Prediction for Employee Turnover

WANG Guanpeng, QIN Shuangyan, CUI Hengjian

1. School of Mathematical Sciences, Capital Normal University, Beijing 100048
Received: 2021-06-11; Revised: 2022-01-28; Online: 2022-06-25; Published: 2022-07-29

WANG Guanpeng, QIN Shuangyan, CUI Hengjian. Analysis of Influence Factors and Prediction for Employee Turnover[J]. Journal of Systems Science and Mathematical Sciences, 2022, 42(6): 1616-1632.

This article applies high-dimensional variable screening to analyze the factors that influence employee turnover and to predict the probability of employee turnover. For high-dimensional data, the MV (mean variance; see Cui, et al. (2015)) method and the LASSO method are used to select variables related to employee turnover, which are then entered into the classification model. To ensure prediction accuracy, four classification models are used to predict the probability of employee turnover: support vector machine, random forest, XGBoost, and the logistic model. Over 100 experiments, the random forest model combined with MV variable selection attains a higher average classification accuracy than the other 7 models, reaching 95.43%. These results are checked by changing the ratio of training set to validation set, by subsampling 80% of the sample data, and by adding random disturbances. In each check the average classification accuracy of the random forest model with the MV method remains higher, which indicates that the model is robust.
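To make the screening step concrete, here is a minimal sketch of MV-based variable ranking. It assumes the sample form of the MV index from Cui, et al. (2015), namely $\widehat{MV}(X\mid Y)=\frac{1}{n}\sum_{r}\sum_{j}\hat p_r[\hat F_r(X_j)-\hat F(X_j)]^2$, where $\hat F$ is the empirical distribution function of a covariate and $\hat F_r$ is its empirical distribution function within class $r$. The function names `mv_index` and `mv_screen`, the number of variables kept, and the synthetic data are illustrative assumptions and are not taken from the paper.

```python
import random
from bisect import bisect_right
from collections import defaultdict


def mv_index(x, y):
    """Sample MV index of covariate x given class labels y (assumed form)."""
    n = len(x)
    sorted_all = sorted(x)
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[yi].append(xi)
    total = 0.0
    for values in groups.values():
        values.sort()
        p_r = len(values) / n  # estimated class proportion
        for xj in x:
            f_all = bisect_right(sorted_all, xj) / n      # empirical F(x_j)
            f_r = bisect_right(values, xj) / len(values)  # empirical F_r(x_j)
            total += p_r * (f_r - f_all) ** 2
    return total / n


def mv_screen(columns, y, keep):
    """Rank covariates by MV index and return the indices of the top `keep`."""
    scores = [mv_index(col, y) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda k: scores[k], reverse=True)
    return ranked[:keep], scores


# Synthetic illustration only (not the paper's data):
# column 0 shifts with the class label, the other nine columns are pure noise.
rng = random.Random(0)
n = 300
y = [rng.randint(0, 1) for _ in range(n)]
columns = [[rng.gauss(2.0 * yi, 1.0) for yi in y]]
columns += [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(9)]

selected, scores = mv_screen(columns, y, keep=3)
print(selected)
```

In a pipeline like the one described above, the columns in `selected` would then be passed to a classifier such as a random forest. That step is omitted here to keep the sketch dependency-free.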


[1] Cui H, Li R, Zhong W. Model-free feature screening for ultrahigh dimensional discriminant analysis. Journal of the American Statistical Association, 2015, 110(510): 630-641.
[2] Akaike H. Information theory and an extension of the maximum likelihood principle. 2nd International Symposium on Information Theory, Springer, 1973.
[3] Schwarz G. Estimating the dimension of a model. Annals of Statistics, 1978, 6: 461-464.
[4] Fitzgerald M A, Bergman C J, Resurreccion A P, et al. Partial least squares regression: A tutorial. Analytica Chimica Acta, 1986, 186: 1-17.
[5] Anderson T W. An Introduction to Multivariate Statistical Analysis, 3rd Ed. Wiley Series in Probability and Mathematical Statistics, 2003.
[6] Hoerl A E, Kennard R W. Ridge regression: Biased estimation for non-orthogonal problems. Technometrics, 1970, 12: 55-68.
[7] Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society Series B, 1996, 58(1): 273-282.
[8] Zou H. The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 2006, 101(476): 1418-1429.
[9] Fan J, Li R. Variable selection via non-concave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 2001, 96(456): 1348-1360.
[10] Cui H, Zhong W. A distribution-free test of independence and its application to variable selection. Computational Statistics and Data Analysis, 2019, 139: 117-133.
[11] 代春倩, 赵良伟, 崔恒建. 基于MV扫描和Logistic回归下的手机媒体性别营销. 统计与管理, 2018, 6: 60-66. (Dai C Q, Zhao L W, Cui H J. Mobile media gender marketing based on MV scanning and logistic regression. Statistics and Management, 2018, 6: 60-66.)
[12] 茆诗松, 程依明, 璞晓龙. 概率论与数理统计教程(第二版). 北京: 高等教育出版社, 2011. (Miao S S, Cheng Y M, Pu X L. Probability Theory and Mathematical Statistics Course (Second Edition). Beijing: Higher Education Press, 2011.)
[13] Huang D, Li R, Wang H. Feature screening for ultrahigh dimensional categorical data with applications. Journal of Business and Economic Statistics, 2014, 32(2): 237-244.
[14] Breiman L. Random forests. Machine Learning, 2001, 45: 5-32.
[15] 李芸, 胡可, 董欣雨, 等. 基于SVM算法的企业员工离职预警研究. 中国商论, 2020, 6: 20-22. (Li Y, Hu K, Dong X Y, et al. Research on early warning of enterprise employee turnover based on SVM algorithm. China Business Review, 2020, 6: 20-22.)
[16] Chen T, Guestrin C. XGBoost: A scalable tree boosting system. The 22nd ACM SIGKDD International Conference, 2016.
[17] Eugene C. Introduction to Deep Learning. Cambridge, MA: MIT Press, 2015.