
• Funding: Supported by the National Natural Science Foundation of China (11731013).

RUAN Tengfei, ZHANG Sanguo, SHEN Liyong. Ordered Data Classification Based on Proportional Odds Model[J]. Journal of Systems Science and Mathematical Sciences, 2022, 42(10): 2817-2833.

### Ordered Data Classification Based on Proportional Odds Model

RUAN Tengfei, ZHANG Sanguo, SHEN Liyong

1. School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049
• Received: 2022-01-18 Revised: 2022-05-06 Published: 2022-11-04

Classification tasks can be divided into ordered and unordered data classification according to whether the categories are ordered. The traditional proportional odds model is popular, but it assumes that the coefficients of the covariates are the same across categories, and this assumption is not always suitable in practice. This article improves the proportional odds model by no longer requiring the coefficients of different categories to be the same and by combining it with a fused-LASSO or fused-MCP regularization penalty. We use the MM algorithm to solve the model and select the regularization parameters by the minimum BIC criterion. Both simulation studies and real data analysis demonstrate that POM-LASSO (improved proportional odds model with fused-LASSO penalty) and POM-MCP (improved proportional odds model with fused-MCP penalty) perform better than the traditional proportional odds model on ordered multi-class classification tasks.
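The relaxation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made here for illustration: the cumulative logit parameterization logit P(Y <= k | x) = alpha_k - x·beta_k, the names `cumulative_probs` and `fused_lasso_penalty`, and a fused-LASSO penalty that sums absolute differences between the coefficient vectors of adjacent cut points. When all coefficient vectors are identical, the sketch reduces to the classical proportional odds model and the penalty is zero.

```python
import math

def cumulative_probs(x, alpha, B):
    """Return [P(Y <= k | x) for k = 1..K-1] under a cumulative logit model.

    alpha: K-1 thresholds (assumed increasing).
    B: K-1 coefficient vectors, one per cut point (hypothetical layout).
    Identical rows in B give the classical proportional odds model.
    """
    probs = []
    for a_k, b_k in zip(alpha, B):
        eta = a_k - sum(b * v for b, v in zip(b_k, x))
        probs.append(1.0 / (1.0 + math.exp(-eta)))
    return probs

def fused_lasso_penalty(B, lam):
    """lam * sum of |B[k+1][j] - B[k][j]| over adjacent cut points and features."""
    total = 0.0
    for lower, upper in zip(B, B[1:]):
        total += sum(abs(u - l) for l, u in zip(lower, upper))
    return lam * total

# Toy inputs (made up for illustration): 3 ordered categories, 2 features.
x = [1.0, 2.0]
alpha = [-1.0, 1.0]
B_shared = [[0.5, -0.25], [0.5, -0.25]]  # same coefficients at every cut point
B_free = [[0.5, -0.25], [0.8, -0.25]]    # first coefficient differs across cut points

print(cumulative_probs(x, alpha, B_shared))
print(fused_lasso_penalty(B_shared, lam=2.0))
print(fused_lasso_penalty(B_free, lam=2.0))
```

Under this assumed form, the penalty pulls adjacent coefficient vectors toward each other, so a large regularization parameter recovers the shared-coefficient model while a small one lets the categories differ. Note that with category-specific coefficients the cumulative probabilities are not guaranteed to be monotone in k for every x.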
