
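Health Rumor Detection Based on Pre-Trained Language Model

XU Nuo, ZHAO Wei, SHANG Keyuan, CHEN Haoyu

  1. Communication University of China, Beijing 100024
  • Received: 2022-05-09 Revised: 2022-06-28 Published: 2022-11-04
  • Funding:
    Supported by the Fundamental Research Funds for the Central Universities, Communication University of China (CUC220C008, CUC220B013).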

XU Nuo, ZHAO Wei, SHANG Keyuan, CHEN Haoyu. Health Rumor Detection based on Pre-Trained Language Model[J]. Journal of Systems Science and Mathematical Sciences, 2022, 42(10): 2582-2589.

Most current rumor detection methods target social media data, where text sequences are short. When the input is a paragraph of several sentences or a long document, these methods fail to extract effective features, which degrades detection performance. To capture the information relevant to rumor detection, this paper proposes I-BERT-BiLSTM (Improved-BERT-BiLSTM), a method for detecting health rumors. The method first extracts a summary from the document-level long text, then feeds the summary into a deep neural network built on multi-layer self-attention for feature extraction, and finally passes the resulting features to a BiLSTM for rumor classification. Experimental results show that the proposed I-BERT-BiLSTM model achieves accuracies of 97.75% on a self-built health rumor dataset and 91.15% on a public dataset.
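The first stage of the pipeline, shortening a long document to a summary before encoding, can be illustrated with a small sketch. The abstract does not say which summarizer is used, so the TextRank-style sentence ranking below is an assumption chosen for illustration, and the names `split_sentences`, `tokenize`, `cosine`, and `textrank_summary` are hypothetical. The BERT-style encoder and BiLSTM classifier that would consume the summary are not shown.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Split text on common Chinese and English sentence terminators."""
    parts = re.findall(r"[^。！？.!?]+[。！？.!?]?", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """Latin words and digits as whole tokens, CJK characters one by one."""
    return re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-token Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def textrank_summary(text, max_sentences=3, damping=0.85, iterations=50):
    """Return the top-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= max_sentences:
        return sentences

    # Weighted sentence graph: edge weight = lexical similarity.
    bags = [Counter(tokenize(s)) for s in sentences]
    weights = [
        [cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]

    # Power iteration of the PageRank update on the sentence graph.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping
            * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]

    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:max_sentences]
    return [sentences[i] for i in sorted(top)]
```

Calling `textrank_summary(document, max_sentences=3)` returns a list of sentences that can be joined into a shorter input. The motivation for such a step is that BERT-style encoders accept a limited number of input tokens, so a long article has to be condensed or truncated before encoding.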
