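Two-Stage Stepwise Variable Selection Based on Random Forests

FENG Panfeng, WEN Yongxian

Journal of Systems Science and Mathematical Sciences, 2018, Vol. 38, Issue (1): 119-130.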

DOI: 10.12341/jssms13325


Abstract

Variable selection is particularly important in high-dimensional data processing, and ranking variables by importance is a key problem. In this paper, we propose a two-stage stepwise variable selection algorithm based on random forests (abbreviated as TSRF). The first stage introduces an improved variable importance ranking method that aims to sharpen the separation between important variables and noise variables. The second stage performs stepwise variable selection based on random forests. The feasibility and effectiveness of the method are verified on simulated data by Monte Carlo simulation. As an empirical study, we apply TSRF to QTL mapping of grains-per-panicle data in rice and compare its results with those of SCAD penalized regression, Elastic Net regression, and the conventional QTL mapping software WinQTLcart2.5. The results show that TSRF can effectively select variables.
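The two stages described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' TSRF implementation: the tiny forest, the plain permutation importance averaged over repeated forests (standing in for the paper's improved ranking, whose details are not given here), the forward rule with tolerance `tol`, and all names and parameter values (`grow`, `forest_oob`, `two_stage_select`, `n_trees`, `repeats`) are assumptions made for illustration. The simulated data are likewise hypothetical and are not the paper's.

```python
import random


def mean(values):
    return sum(values) / len(values)


def sse(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def grow(X, y, rows, feats, depth, rng):
    """Grow a small regression tree; each node tries a random subset of feats."""
    if depth == 0 or len(rows) < 10:
        return mean([y[i] for i in rows])
    best = None
    mtry = max(1, len(feats) // 3)
    for j in rng.sample(feats, mtry):
        # Simplification: candidate thresholds are feature values at 8 random rows.
        for t in [X[i][j] for i in rng.sample(rows, 8)]:
            left = [i for i in rows if X[i][j] <= t]
            right = [i for i in rows if X[i][j] > t]
            if len(left) < 5 or len(right) < 5:
                continue
            score = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or score < best[0]:
                best = (score, j, t, left, right)
    if best is None:
        return mean([y[i] for i in rows])
    _, j, t, left, right = best
    return (j, t, grow(X, y, left, feats, depth - 1, rng),
            grow(X, y, right, feats, depth - 1, rng))


def predict(tree, x):
    while isinstance(tree, tuple):
        j, t, low, high = tree
        tree = low if x[j] <= t else high
    return tree


def forest_oob(X, y, feats, n_trees, rng):
    """Fit a bagged forest on feats; return (OOB MSE, permutation importances)."""
    n = len(y)
    oob_sum, oob_cnt = [0.0] * n, [0] * n
    imp = {j: 0.0 for j in feats}
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]
        oob = sorted(set(range(n)) - set(boot))
        tree = grow(X, y, boot, feats, 4, rng)
        base = mean([(y[i] - predict(tree, X[i])) ** 2 for i in oob])
        for i in oob:
            oob_sum[i] += predict(tree, X[i])
            oob_cnt[i] += 1
        for j in feats:
            # Permute feature j among the out-of-bag rows and measure the MSE increase.
            shuffled = [X[i][j] for i in oob]
            rng.shuffle(shuffled)
            err = mean([(y[i] - predict(tree, X[i][:j] + [v] + X[i][j + 1:])) ** 2
                        for i, v in zip(oob, shuffled)])
            imp[j] += (err - base) / n_trees
    mse = mean([(y[i] - oob_sum[i] / oob_cnt[i]) ** 2
                for i in range(n) if oob_cnt[i]])
    return mse, imp


def two_stage_select(X, y, n_trees=30, repeats=3, tol=0.05, seed=0):
    rng = random.Random(seed)
    feats = list(range(len(X[0])))
    # Stage 1 (assumed): average permutation importance over repeated forests.
    total = {j: 0.0 for j in feats}
    for _ in range(repeats):
        _, imp = forest_oob(X, y, feats, n_trees, rng)
        for j in feats:
            total[j] += imp[j] / repeats
    ranking = sorted(feats, key=lambda j: -total[j])
    # Stage 2 (assumed): add variables in rank order while OOB error drops by > tol.
    selected, best = [], sse(y) / len(y)
    for j in ranking:
        mse, _ = forest_oob(X, y, selected + [j], n_trees, rng)
        if mse < best * (1 - tol):
            selected.append(j)
            best = mse
    return ranking, selected


# Hypothetical simulated data: variables 0 and 1 carry signal, 2..5 are noise.
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1) for _ in range(6)] for _ in range(120)]
y = [3 * row[0] - 2 * row[1] + data_rng.gauss(0, 0.3) for row in X]
ranking, selected = two_stage_select(X, y)
print("ranking:", ranking)
print("selected:", selected)
```

In practice one would replace the toy forest with an established random forest implementation; the sketch only shows how an importance ranking can feed a stepwise search driven by out-of-bag error.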

Keywords

Random forests / variable selection / variable importance / QTL mapping.

Cite this article

FENG Panfeng, WEN Yongxian. Two-Stage Stepwise Variable Selection Based on Random Forests. Journal of Systems Science and Mathematical Sciences, 2018, 38(1): 119-130. https://doi.org/10.12341/jssms13325
