Research on Network Public Opinion Classification Based on Text Mining
This paper applies K-means text clustering, an unsupervised machine learning algorithm, together with the public opinion classification standard of the Publicity Department of the CPC Central Committee, to empirically study the distribution of public opinion in posts on Tianya Zatan (天涯杂谈) from January 1, 2012 to December 31, 2015. Following the classification standard, posts are divided into five categories. During classification, word clouds are drawn to visualize the distribution of posts, and changes in public opinion across years are examined. A rank-sum test is used to check for significant differences in click counts and reply counts among the categories. The results are as follows: 1) the four years of sampled data consistently show that political network public opinion accounts for the largest share, followed by social network public opinion; cultural and economic network public opinion have similar shares and rank third and fourth respectively; composite network public opinion has the smallest share; 2) the share of each category remained largely stable over the four years; 3) the rank-sum test confirms significant differences in reply counts and click counts among posts of different categories.
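The abstract does not state which rank-sum procedure was used. With five categories, a Kruskal-Wallis test is one common choice, and the sketch below assumes it. It is a standard-library illustration, not the authors' code: the function names (`average_ranks`, `kruskal_wallis_h`, `chi2_sf_even_df`) and the reply counts are hypothetical, and the p-value helper only covers an even number of degrees of freedom, which happens to fit five groups (df = 4).

```python
import math
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with the usual tie correction."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum)^2 / group size over all groups.
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over tied-value counts t.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


def chi2_sf_even_df(x, df):
    """Chi-square survival function, closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this helper only supports positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Made-up reply counts for the five categories (illustration only).
replies = {
    "political": [120, 340, 95, 410, 230],
    "social": [80, 150, 60, 210, 130],
    "economic": [40, 75, 55, 90, 30],
    "cultural": [35, 70, 45, 85, 50],
    "composite": [20, 25, 15, 65, 10],
}
h_stat = kruskal_wallis_h(list(replies.values()))
p_value = chi2_sf_even_df(h_stat, df=len(replies) - 1)
print(f"H = {h_stat:.3f}, approximate p = {p_value:.4f}")
```

The chi-square approximation to the distribution of H is rough for groups this small; with the thousands of posts a study like this covers it is the usual choice. The same call would be repeated for click counts.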