Python makes it easy to count word frequencies, so doing a word-frequency analysis of an article is straightforward.

1. Add a custom dictionary (e.g. 超级赛亚人, 奥里给)

2. Segment the text with jieba

PS: just drop the article into the tf.txt file and the custom dictionary into the dict.txt file, and you're set.
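For reference, jieba's user-dictionary file expects one entry per line: the word itself, then an optional frequency and an optional part-of-speech tag, separated by spaces. A dict.txt for the example words above might look like this (the frequency and tag shown are illustrative, not required):

```text
超级赛亚人 3 n
奥里给
```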

import jieba

# Load the custom dictionary before segmenting
jieba.load_userdict("dict.txt")

with open("tf.txt", encoding="utf-8") as f:
    txt = f.read()

words = jieba.lcut(txt)

# Count occurrences of each token
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

# Sort by frequency, descending
items = sorted(counts.items(), key=lambda x: x[1], reverse=True)

# Print the 100 most frequent words with their counts
for word, count in items[:100]:
    print("{0:<10}{1:>5}".format(word, count))

print()

# Relative frequency: count divided by the total number of tokens
total = len(words)
for word, count in items[:100]:
    print("{0:<10}{1:>10.4f}".format(word, count / total))
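The counting-and-sorting part of the script can also be written with the standard library's collections.Counter, which handles both steps. A minimal sketch, assuming the token list has already been produced by jieba.lcut (the sample word list below is made up for illustration):

```python
from collections import Counter

def top_words(words, n=10):
    """Return the n most frequent tokens as (word, count, relative frequency)."""
    counts = Counter(words)
    total = sum(counts.values())
    return [(w, c, c / total) for w, c in counts.most_common(n)]

# Illustrative token list; in practice this comes from jieba.lcut(txt)
sample = ["python", "词频", "python", "统计", "python", "词频"]
for word, count, freq in top_words(sample, n=3):
    print("{0:<10}{1:>5}{2:>9.2%}".format(word, count, freq))
```

Counter.most_common already sorts by descending count, so the manual sort and the hard-coded total both go away.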

Example image: (screenshot of the word-frequency output from running the script above)
