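Selective ensemble learning can improve ensemble classification ability
while using fewer base classifiers. The diversity among the base
classifiers and their average accuracy are two important factors that
affect ensemble performance. When the base classifiers in an ensemble are
highly diverse, their average accuracy tends to be low; when their average
accuracy is high, their diversity tends to be low. Ensemble performance is
therefore best when the two are in balance. To find this balance, a
selective ensemble method based on reverse binary glowworm swarm
optimization and a diversity measure (RBGSODSEN) is proposed. First,
multiple base extreme learning machines (ELMs) are trained independently
using bootstrap sampling to build the initial pool of base ELMs. Second,
the diversity measure is used to pre-select from the initial pool: base
ELMs with good diversity and prediction accuracy are retained, and those
with poor overall performance are discarded, which reduces the
computational complexity of the selective ensemble. Next, a reverse binary
glowworm swarm optimization algorithm (RBGSO) is proposed by modifying how
glowworm positions are updated and by introducing reverse search,
co-evolution, and random mutation mechanisms. Finally, RBGSO performs a
second selection on the base ELMs remaining after pre-selection and picks
the sub-ensemble of base ELMs with the best ensemble performance.
Experimental results on 25 benchmark datasets show that, compared with
other selective ensemble methods, RBGSODSEN selects fewer base ELMs and
achieves better prediction performance, with good stability,
effectiveness, and significance.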
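The trade-off between diversity and average accuracy can be made concrete with a small sketch. The text does not say which diversity measure is used, so the pairwise disagreement measure below is an assumption chosen for illustration, and the labels and predictions are toy values, not data from the experiments.

```python
from itertools import combinations

def accuracy(pred, y):
    """Fraction of samples that one classifier labels correctly."""
    return sum(p == t for p, t in zip(pred, y)) / len(y)

def disagreement(pred_a, pred_b):
    """Fraction of samples on which two classifiers give different labels.

    Assumption: the disagreement measure stands in for the unspecified
    diversity measure of the source.
    """
    return sum(a != b for a, b in zip(pred_a, pred_b)) / len(pred_a)

def pool_statistics(preds, y):
    """Return (mean accuracy, mean pairwise disagreement) of a pool."""
    mean_acc = sum(accuracy(p, y) for p in preds) / len(preds)
    pairs = list(combinations(preds, 2))
    mean_div = sum(disagreement(a, b) for a, b in pairs) / len(pairs)
    return mean_acc, mean_div

# Toy validation labels and predictions of three hypothetical base classifiers.
y = [0, 1, 1, 0, 1, 0]
preds = [
    [0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0],
]
mean_acc, mean_div = pool_statistics(preds, y)
print(f"mean accuracy = {mean_acc:.3f}, mean disagreement = {mean_div:.3f}")
```

A pre-selection step could rank base classifiers by a combination of these two quantities; how the two are weighted is not specified in the text.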
Abstract
Selective ensemble learning can enhance ensemble classification
ability while using fewer base classifiers. The diversity and the
average accuracy of the base classifiers are two important
indicators that affect ensemble performance. If the base
classifiers in an ensemble system are highly diverse, their average
accuracy is low; if their average accuracy is high, their diversity
is low. Hence, a balance between the two lets the ensemble perform
best.
To find this balance, a selective ensemble method based on reverse
binary glowworm swarm optimization and a diversity measure
(RBGSODSEN) is proposed. First, an initial pool of base extreme
learning machines (ELMs) is constructed by training several base
ELMs independently with the bootstrap sampling method. Second, the
base ELMs in the pool are pre-pruned using the diversity measure:
base ELMs with good diversity and prediction accuracy are retained
and those with poor overall performance are eliminated, which
reduces the computational complexity of the selective ensemble.
Third, reverse binary glowworm swarm optimization (RBGSO) is
proposed by improving the way glowworms update their positions and
by introducing reverse search, co-evolution, and random mutation
mechanisms. Finally, RBGSO selects the optimal sub-ensemble of base
ELMs from those remaining after pre-pruning. Experimental results on
25 UCI datasets indicate that RBGSODSEN outperforms other selective
ensemble approaches while using fewer base ELMs, and that it shows
relatively high stability, effectiveness, and significance.
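As a rough illustration of how a sub-ensemble can be encoded and scored, the sketch below represents a selection as a binary mask over the pool and uses majority-vote accuracy as its fitness. This is a minimal sketch under stated assumptions: the function names, the majority-vote combiner, and the toy data are hypothetical, and the exhaustive search at the end is a brute-force stand-in for a tiny pool, not the RBGSO update rules, which the abstract does not specify.

```python
import random
from collections import Counter
from itertools import product

def bootstrap_indices(n, rng):
    """Draw n sample indices with replacement (one bootstrap replicate)."""
    return [rng.randrange(n) for _ in range(n)]

def majority_vote(preds, mask):
    """Combine the predictions of the classifiers whose mask bit is 1."""
    chosen = [p for p, bit in zip(preds, mask) if bit]
    # zip(*chosen) yields, per sample, the labels voted by the chosen models.
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*chosen)]

def fitness(preds, mask, y):
    """Validation accuracy of the sub-ensemble encoded by a binary mask."""
    if not any(mask):
        return 0.0  # an empty sub-ensemble cannot predict anything
    combined = majority_vote(preds, mask)
    return sum(c == t for c, t in zip(combined, y)) / len(y)

# Toy validation labels and predictions of three hypothetical base models.
y = [0, 1, 1, 0, 1, 0]
preds = [
    [0, 1, 1, 0, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 0, 0, 0],
]

# Each base model would be trained on its own bootstrap replicate.
replicate = bootstrap_indices(len(y), random.Random(0))

# Brute-force stand-in for the swarm search: score every binary mask.
best = max(product([0, 1], repeat=len(preds)),
           key=lambda m: fitness(preds, m, y))
print(best, fitness(preds, best, y))
```

Exhaustive search is only feasible for very small pools, since the number of masks doubles with each added base model; that growth is the reason a heuristic search over binary positions is attractive.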
Keywords
selective ensemble, binary glowworm swarm optimization, reverse search, diversity measure, extreme learning machine.