Ordered Data Classification Based on Proportional Odds Model

RUAN Tengfei, ZHANG Sanguo, SHEN Liyong

  1. School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049
  • Received: 2022-01-18 Revised: 2022-05-06 Published: 2022-11-04
  • Funding:
    Supported by the National Natural Science Foundation of China (11731013).

RUAN Tengfei, ZHANG Sanguo, SHEN Liyong. Ordered Data Classification Based on Proportional Odds Model[J]. Journal of Systems Science and Mathematical Sciences, 2022, 42(10): 2817-2833.
In machine learning, classification tasks can be divided into unordered data classification and ordered data classification, according to whether the categories are ordered. The proportional odds model is a commonly used model for ordered data classification. However, the traditional proportional odds model assumes that the coefficients are the same across categories, and this assumption does not always hold in practice. This article improves the proportional odds model by no longer requiring the coefficients of different categories to be identical, and introduces fused-LASSO and fused-MCP regularization penalties. The model is solved with the MM algorithm, and the regularization parameters are selected by minimizing the BIC. Both simulation studies and real data analysis show that POM-LASSO (the improved proportional odds model with the fused-LASSO penalty) and POM-MCP (the improved proportional odds model with the fused-MCP penalty) perform better than the traditional proportional odds model on ordered multi-class classification tasks.
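The relaxation described in the abstract can be illustrated with a minimal sketch. It assumes the usual cumulative-logit form of the proportional odds model, P(Y <= j | x) = sigmoid(alpha_j - x'beta_j), and a fused-LASSO penalty on the differences between coefficient vectors of adjacent categories. The abstract does not state the paper's exact formulation, so the function names, the sign convention, and the penalty form below are illustrative assumptions. The MM solver, the fused-MCP penalty, and the BIC-based selection are not shown.

```python
import numpy as np

def cumulative_probs(x, alpha, beta):
    """P(Y <= j | x) for j = 1..K-1 under a cumulative logit model.

    alpha: (K-1,) increasing thresholds.
    beta:  (K-1, p) one coefficient row per threshold; identical rows
           correspond to the classical proportional odds assumption.
    """
    eta = alpha - beta @ x  # (K-1,) linear predictors
    return 1.0 / (1.0 + np.exp(-eta))

def class_probs(x, alpha, beta):
    """P(Y = k | x) for k = 1..K, by differencing cumulative probabilities."""
    cum = np.concatenate(([0.0], cumulative_probs(x, alpha, beta), [1.0]))
    return np.diff(cum)

def fused_lasso_penalty(beta, lam):
    """lam * sum of |beta[j+1, p] - beta[j, p]| over adjacent thresholds (assumed form)."""
    return lam * np.abs(np.diff(beta, axis=0)).sum()

# Hypothetical example with K = 4 categories and p = 2 features.
alpha = np.array([-1.0, 0.5, 2.0])
x = np.array([1.0, 2.0])

# Classical proportional odds: every threshold shares the same coefficients.
beta_po = np.tile(np.array([0.8, -0.3]), (3, 1))
probs = class_probs(x, alpha, beta_po)

# Relaxed model: the last threshold uses a different first coefficient.
beta_relaxed = beta_po.copy()
beta_relaxed[2, 0] = 1.0

penalty_po = fused_lasso_penalty(beta_po, lam=0.5)
penalty_relaxed = fused_lasso_penalty(beta_relaxed, lam=0.5)
```

Under this assumed penalty, identical rows incur zero cost, so a large regularization parameter pulls the relaxed model back toward the classical proportional odds model. Note that with category-specific coefficients the cumulative probabilities are not guaranteed to be monotone in j, so a practical implementation would need to handle that case.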
