Testing High-Dimensional Nonparametric Behrens-Fisher Problem

MENG Zhen (1, 2), LI Na (1, 2), YUAN Ao (3)

1. Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
3. Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University, Washington 20057, USA
• Received: 2020-10-12; Revised: 2020-11-02; Published: 2022-06-20
• Supported by: the National Natural Science Foundation of China under Grant No. 61903312, and the Huiyan Project for Research on Innovation and Application of Space Science and Technology under Grant No. CD2B65B6.

MENG Zhen, LI Na, YUAN Ao. Testing High-Dimensional Nonparametric Behrens-Fisher Problem[J]. Journal of Systems Science and Complexity, 2022, 35(3): 1098-1115.

For the high-dimensional nonparametric Behrens-Fisher problem, in which the data dimension is larger than the sample size, the authors propose two test statistics: a U-statistic Rank-based Test (URT) and a Cauchy Combination Test (CCT). CCT is analogous to a maximum-type test, while URT is built on the sum of squares of differences of ranked samples across dimensions, which makes it free of distributional shape and robust to outliers. The asymptotic distribution of URT is derived, and a closed form for computing the statistical significance of CCT is given. Extensive simulation studies evaluate the finite-sample power of the statistics in comparison with the existing method. The simulation results show that URT is robust and powerful, and its practicability and effectiveness are illustrated by an application to gene expression data.
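The abstract does not spell out either statistic, so the following Python sketch only illustrates the two ideas under stated assumptions. Each dimension is summarized by a Wilcoxon-Mann-Whitney rank-sum z-score (normal approximation, no tie correction). `sum_of_squares_stat` adds the squared z-scores over dimensions as a simplified stand-in for URT's sum of squared rank differences; it is not the authors' U-statistic, and the sketch does not reproduce the asymptotic null distribution derived in the paper. `cct_rank_test` combines the per-dimension two-sided p-values p_1, ..., p_d with the equal-weight Cauchy combination of Liu and Xie, T = (1/d) Σ tan{(0.5 - p_k)π}, whose p-value has the closed form 1/2 - arctan(T)/π. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]  # rows = subjects, columns = dimensions


def midranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks of the pooled values, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def column(data: Matrix, k: int) -> List[float]:
    """Extract dimension k from a subjects-by-dimensions matrix."""
    return [row[k] for row in data]


def wmw_z(x: Sequence[float], y: Sequence[float]) -> float:
    """Standardized Wilcoxon rank-sum statistic of x vs. y (no tie correction)."""
    m, n = len(x), len(y)
    r = midranks(list(x) + list(y))
    w = sum(r[:m])  # rank sum of the first sample in the pooled ranking
    mean = m * (m + n + 1) / 2
    var = m * n * (m + n + 1) / 12
    return (w - mean) / math.sqrt(var)


def sum_of_squares_stat(X: Matrix, Y: Matrix) -> float:
    """Hypothetical URT-style statistic: squared rank z-scores summed over dimensions."""
    return sum(wmw_z(column(X, k), column(Y, k)) ** 2 for k in range(len(X[0])))


def cauchy_combination(pvalues: Sequence[float]) -> float:
    """Equal-weight Cauchy combination with its closed-form p-value."""
    t = sum(math.tan((0.5 - p) * math.pi) for p in pvalues) / len(pvalues)
    return 0.5 - math.atan(t) / math.pi


def cct_rank_test(X: Matrix, Y: Matrix) -> float:
    """Combine per-dimension two-sided rank-test p-values with the Cauchy combination."""
    pvals = []
    for k in range(len(X[0])):
        z = wmw_z(column(X, k), column(Y, k))
        pvals.append(math.erfc(abs(z) / math.sqrt(2)))  # 2 * (1 - Phi(|z|))
    return cauchy_combination(pvals)


if __name__ == "__main__":
    random.seed(0)
    d, m, n = 200, 15, 20  # dimension larger than the total sample size
    X = [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)]
    # Shift only the first 10 coordinates of the second sample.
    Y = [[random.gauss(0.8 if k < 10 else 0, 1) for k in range(d)] for _ in range(n)]
    print("sum of squared rank z-scores:", sum_of_squares_stat(X, Y))
    print("Cauchy combination p-value:", cct_rank_test(X, Y))
```

In this sketch, the combined p-value is driven mainly by the smallest per-dimension p-values, so the Cauchy combination behaves like a maximum-type test, consistent with the abstract's description of CCT; the sum-of-squares form instead accumulates moderate evidence spread over many dimensions.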