
The Asymmetric Least Squares Estimator Based on Minimax Concave Penalty

ZHANG Xiaoqin1, WEI Xiali2, MI Zichuan1, LI Shunyong3

  1. School of Statistics, Shanxi University of Finance and Economics, Taiyuan 030006;
  2. School of Economics and Management, Shanxi University, Taiyuan 030006;
  3. School of Mathematical Sciences, Shanxi University, Taiyuan 030006
  • Received: 2021-05-13 Revised: 2021-10-19 Online: 2022-05-25 Published: 2022-07-23
  • Funding:
    Supported by the National Social Science Fund of China (17BTJ010) and the Natural Science Foundation of Shanxi Province (201901D111320).

ZHANG Xiaoqin, WEI Xiali, MI Zichuan, LI Shunyong. The Asymmetric Least Squares Estimator Based on Minimax Concave Penalty[J]. Journal of Systems Science and Mathematical Sciences, 2022, 42(5): 1344-1360.
The minimax concave penalty (MCP) is a popular nonconvex penalty that is widely used in variable selection. Asymmetric least squares (ALS) regression, unlike ordinary least squares regression, can characterize the entire conditional distribution of the response variable. This paper proposes a sparse asymmetric least squares regression model with the MCP penalty (MCP-ALS) and establishes the properties of the resulting estimator. First, under certain regularity conditions, the MCP-ALS estimator enjoys the oracle property when the covariate dimension is fixed; in the high-dimensional setting, it enjoys a weak oracle property when the regression error has finite moments. Second, by varying the asymmetric weight, the proposed method can identify the covariates that cause heteroscedasticity. Simulation results show that the method performs well in variable selection and detects heteroscedasticity effectively. Finally, the method is applied to a diabetes dataset. The analysis shows that it achieves variable selection while uncovering potential relationships between the explanatory variables and the response variable, which may serve as a reference for predicting and controlling the condition of diabetic patients.
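
To make the two ingredients named in the abstract concrete, here is a minimal sketch of an MCP-penalized asymmetric least squares objective. It uses the standard textbook forms of the asymmetric squared loss (Newey and Powell) and the MCP (Zhang); the function names, the `1/n` scaling of the loss, and the parameter names `tau`, `lam`, and `gamma` are assumptions for illustration and may differ from the paper's exact formulation.

```python
import numpy as np


def als_loss(residuals, tau):
    """Asymmetric squared loss: weight tau for r >= 0, (1 - tau) for r < 0."""
    r = np.asarray(residuals, dtype=float)
    weights = np.where(r < 0, 1.0 - tau, tau)
    return weights * r ** 2


def mcp_penalty(beta, lam, gamma):
    """Minimax concave penalty, applied elementwise (gamma > 1 assumed).

    lam*|b| - b^2 / (2*gamma)  if |b| <= gamma*lam
    gamma*lam^2 / 2            otherwise
    """
    b = np.abs(np.asarray(beta, dtype=float))
    inner = lam * b - b ** 2 / (2.0 * gamma)
    outer = 0.5 * gamma * lam ** 2
    return np.where(b <= gamma * lam, inner, outer)


def mcp_als_objective(beta, X, y, tau, lam, gamma):
    """Assumed objective: mean asymmetric squared loss plus summed MCP."""
    residuals = y - X @ beta
    return als_loss(residuals, tau).mean() + mcp_penalty(beta, lam, gamma).sum()
```

With `tau = 0.5` the loss is a symmetric squared loss scaled by one half, so the objective reduces to an MCP-penalized least squares criterion. Minimizing the objective at several values of `tau` and comparing which coefficients are selected is one way to read the abstract's remark about detecting covariates that drive heteroscedasticity; the optimizer itself is not sketched here.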
