
Divide and Conquer Algorithms for Model Averaging with Massive Data
With the rapid development of data collection techniques in recent years, traditional statistical methods face the challenge of "massive data". Divide and conquer is one of the most effective ways to handle massive data. Its basic idea is to partition the full data set into several smaller subsets, fit a statistical model on each subset separately, and then combine the results from all subsets to obtain the final result. Model averaging is a frontier method in modern statistics and econometrics, with wide applications in areas such as economics, finance, biology, and medicine. In this paper, we propose divide and conquer algorithms for massive data for Mallows model averaging (MMA) and jackknife model averaging (JMA) in linear models, and for model averaging in generalized linear models. Simulations and real data analyses illustrate the effectiveness and practicality of the proposed algorithms.
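The basic divide and conquer idea described above can be sketched in a few lines. This is only an illustration under simplifying assumptions, not the algorithms proposed in the paper: it fits a single ordinary least squares model on each block and combines the block estimates with equal weights, whereas the paper concerns model averaging across candidate models. The function name `divide_and_conquer_ols`, the number of blocks, and the simulated data are all hypothetical.

```python
import numpy as np

def divide_and_conquer_ols(X, y, n_blocks):
    """Fit OLS on each block of rows and average the block estimates.

    Equal-weight averaging is an assumption made for this sketch.
    """
    # Split the row indices into n_blocks nearly equal consecutive blocks.
    blocks = np.array_split(np.arange(X.shape[0]), n_blocks)
    estimates = []
    for idx in blocks:
        # Least-squares fit that only touches the rows of this block.
        beta_k, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        estimates.append(beta_k)
    # Combine step: simple average of the per-block coefficient vectors.
    return np.mean(estimates, axis=0)

# Simulated data (hypothetical): 10,000 observations, 3 regressors.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=10_000)

beta_hat = divide_and_conquer_ols(X, y, n_blocks=10)
print(beta_hat)
```

Each block is processed independently, so in practice the loop could run on separate machines and only the small coefficient vectors would need to be collected for the combine step.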