In statistical machine learning research, the AUC (Area Under the ROC Curve) estimated by K-fold cross-validation is commonly used to evaluate the performance of classification algorithms. However, a point estimate carries no information about variance. For this reason, general-purpose symmetric confidence intervals (interval estimates) for the AUC, constructed from a K-fold cross-validated distribution under a normality assumption, have been proposed. These symmetric intervals often exhibit low degrees of confidence or long interval lengths, which can easily lead to liberal statistical inference. Through a theoretical analysis of the AUC measure, we find that its true distribution is in fact asymmetric, so approximating it with a symmetric distribution is inappropriate. Therefore, for the two-class classification problem, this paper proposes a new asymmetric confidence interval for the AUC based on a K-fold cross-validated Beta distribution. Experiments on simulated and real data show that the proposed interval is superior to the traditional symmetric confidence intervals based on the K-fold cross-validated distribution.
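The contrast between a symmetric and an asymmetric interval can be illustrated with a minimal sketch. It is not the construction proposed in this paper. It makes three assumptions: the fold-level AUC values are hypothetical, the variance of the cross-validated mean is taken naively as the sample variance divided by K (ignoring correlation between folds), and the Beta parameters are obtained by simple moment matching. The helper names `normal_interval`, `beta_quantile`, and `beta_interval` are introduced here for illustration only.

```python
import math
from statistics import NormalDist, mean, variance


def normal_interval(aucs, level=0.95):
    """Symmetric interval: mean +/- z * sqrt(s^2 / K) (naive variance, an assumption)."""
    k = len(aucs)
    m = mean(aucs)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * math.sqrt(variance(aucs) / k)
    return m - half, m + half


def beta_quantile(a, b, p, n=20000):
    """Approximate the p-quantile of Beta(a, b) by midpoint-rule integration on a grid."""
    xs = [(i + 0.5) / n for i in range(n)]
    log_w = [(a - 1) * math.log(x) + (b - 1) * math.log(1 - x) for x in xs]
    top = max(log_w)  # subtract the maximum before exponentiating, for stability
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    acc = 0.0
    for x, wi in zip(xs, w):
        acc += wi
        if acc >= p * total:
            return x
    return xs[-1]


def beta_interval(aucs, level=0.95):
    """Asymmetric interval from a Beta distribution moment-matched to the CV mean."""
    k = len(aucs)
    m = mean(aucs)
    v = variance(aucs) / k  # same naive variance as in normal_interval
    common = m * (1 - m) / v - 1
    if common <= 0:
        raise ValueError("variance too large for a Beta moment match")
    a, b = m * common, (1 - m) * common
    tail = (1 - level) / 2
    return beta_quantile(a, b, tail), beta_quantile(a, b, 1 - tail)


# Hypothetical AUC values from K = 5 folds (illustrative, not from the paper).
fold_aucs = [0.99, 0.80, 0.98, 0.90, 1.0]
print("normal:", normal_interval(fold_aucs))
print("beta:  ", beta_interval(fold_aucs))
```

For these illustrative values the symmetric interval extends above 1, which is impossible for an AUC, whereas the Beta-based interval stays inside (0, 1) and is longer on the lower side of the mean than on the upper side.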