### Estimating the Size of an Open Population with Massive Datasets Based on a Generalized Varying-Coefficient Model

LI Haoqi¹˒², LI Yuan¹

1. School of Economics and Statistics, Guangzhou University, Guangzhou 510006, China; 2. School of Mathematics and Statistics, Yangtze Normal University, Chongqing 408100, China
• Received: 2020-09-17; Revised: 2021-01-07; Published: 2022-06-20
• Supported by:
This research was supported by the National Natural Science Foundation of China under Grant No. 62073096, the Science Center Program of National Natural Science Foundation of China under Grant No. 62188101 and the Heilongjiang Touyan Team Program.

LI Haoqi, LI Yuan. Estimating the Size of an Open Population with Massive Datasets Based on a Generalized Varying-Coefficient Model[J]. Journal of Systems Science and Complexity, 2022, 35(3): 1116-1136.

A generalized varying-coefficient model is proposed to estimate the size of an open population at a specific time from multiple lists. The datasets under study contain millions of records spanning a long period (38 years), which poses computational challenges. To overcome this difficulty, the authors develop a regularized iterative algorithm. The asymptotic distribution of the proposed estimators is derived, and simulation studies show that the procedure performs well. The method is applied to estimate the number of drug abusers in Hong Kong, China, over the period 1977-2014.
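The abstract does not spell out the model or the algorithm, so what follows is only a minimal sketch of the general idea under our own assumptions, not the authors' method. We assume K lists that capture independently given time t, list-specific capture probabilities p_k(t) = logistic(β_k(t)) whose varying coefficients β_k(t) are expanded in a hypothetical polynomial basis in time, a ridge penalty standing in for the regularization, and plain fixed-step gradient ascent on the likelihood conditional on being caught by at least one list, used in place of the paper's regularized iterative algorithm. The size at time t0 is then estimated by a Horvitz-Thompson sum, N̂(t0) = Σ 1/(1 − Π_k(1 − p_k(t0))), over the individuals observed at t0. The function names (`basis`, `fit`, `population_size`, `simulate`) and the synthetic data are illustrative only.

```python
import math
import random


def logistic(x):
    # Numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def basis(t, t_min, t_max, degree):
    # Hypothetical smooth basis: polynomials in rescaled time u in [0, 1]
    u = (t - t_min) / (t_max - t_min)
    return [u ** j for j in range(degree + 1)]


def list_probs(theta, b):
    # p_k(t) = logistic(beta_k(t)), beta_k(t) = sum_j theta[k][j] * b_j(t)
    return [logistic(sum(c * x for c, x in zip(row, b))) for row in theta]


def fit(records, n_lists, t_min, t_max, degree=2, lam=0.01, step=0.5, iters=200):
    """Ridge-penalized conditional-likelihood fit by fixed-step gradient ascent.

    A stand-in for the paper's regularized iterative algorithm, not a copy of it.
    """
    theta = [[0.0] * (degree + 1) for _ in range(n_lists)]
    bases = [basis(t, t_min, t_max, degree) for t, _ in records]
    n = len(records)
    for _ in range(iters):
        # Gradient of the penalty -lam/2 * ||theta||^2
        grad = [[-lam * c for c in row] for row in theta]
        for (_, hist), b in zip(records, bases):
            p = list_probs(theta, b)
            q = math.prod(1.0 - pk for pk in p)  # P(missed by every list)
            for k in range(n_lists):
                # d/d eta_k of the conditional log-likelihood: h_k - p_k / (1 - q)
                g = hist[k] - p[k] / (1.0 - q)
                for j, x in enumerate(b):
                    grad[k][j] += g * x / n
        for k in range(n_lists):
            for j in range(degree + 1):
                theta[k][j] += step * grad[k][j]
    return theta


def population_size(theta, records, t0, t_min, t_max, degree=2):
    # Horvitz-Thompson: each individual observed at t0 is weighted by
    # 1 / P(caught by at least one list at t0)
    b = basis(t0, t_min, t_max, degree)
    q = math.prod(1.0 - pk for pk in list_probs(theta, b))
    n_obs = sum(1 for t, _ in records if t == t0)
    return n_obs / (1.0 - q)


def simulate(rng, years, size_per_year, n_lists):
    # Synthetic open population: a fresh set of individuals each year,
    # with list capture probabilities drifting smoothly over time
    t_min, t_max = years[0], years[-1]
    records = []
    for t in years:
        u = (t - t_min) / (t_max - t_min)
        probs = [0.2 + 0.1 * k + 0.2 * u for k in range(n_lists)]
        for _ in range(size_per_year):
            hist = [1 if rng.random() < pk else 0 for pk in probs]
            if any(hist):  # only individuals on at least one list are recorded
                records.append((t, hist))
    return records


rng = random.Random(0)
years = list(range(2000, 2010))  # synthetic yearly time points
records = simulate(rng, years, size_per_year=200, n_lists=3)
theta = fit(records, n_lists=3, t_min=years[0], t_max=years[-1], iters=100)
estimate = population_size(theta, records, 2005, years[0], years[-1])
n_obs = sum(1 for t, _ in records if t == 2005)
print(f"observed in 2005: {n_obs}, estimated N(2005): {estimate:.1f} (true 200)")
```

The gradient h_k − p_k/(1 − Q) used in `fit` comes from differentiating the conditional log-likelihood with respect to the logit η_k, where Q = Π_k(1 − p_k) is the probability of being missed by every list. An analysis of millions of records would likely swap the polynomial basis for splines and the fixed-step ascent for a Newton-type or block-wise solver.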