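【Question Title】: Count frequency of word in text file in Python
【Posted】: 2016-12-07 07:19:13
【Question】:

I am trying to write a program that opens a file chosen by the user (by typing its name) and counts how often each word the user enters appears in it.

I have most of it working, but when I enter more than one word to search for, only the first word shows the correct frequency; the rest show "0 occurrences".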

file_name = input("What file would you like to open? ")
f = open(file_name, "r")
the_full_text = f.read()
words = the_full_text.split()
search_word = input("What words do you want to find? ").split(",")
len_list = len(search_word) 

word_number = 0
print()
print ('... analyzing ... hold on ...')
print()
print ('Frequency of word usage within', file_name+":")
for i in range(len_list):

    frequency = 0
    for word in words:
        word = word.strip(",.")
        if search_word[word_number].lower() == word.lower():
            frequency += 1
    print ("   ",format(search_word[word_number].strip(),'<20s'),"/", frequency, "occurrences")
    word_number = word_number + 1

Here is a sample run:

What file would you like to open? assignment_8.txt
What words do you want to find? wey, rights, dem

... analyzing ... hold on ...

Frequency of word usage within assignment_8.txt:
    wey                  / 96 occurrences
    rights               / 0 occurrences
    dem                  / 0 occurrences

What is wrong with my program? Please help :o

【Comments】:

  • If you split on ",", shouldn't your input be "wey,rights,dem", with no spaces?

Tags: python file text counter frequency


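【Answer 1】:

You need to strip the whitespace from the search words. Splitting "wey, rights, dem" on "," leaves a leading space on " rights" and " dem", so those never compare equal to a word from the file.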
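To make the cause concrete, here is a small illustration using the input string from the sample run in the question. It only shows what `str.split(",")` returns and how `str.strip()` cleans it up; the variable names are chosen for this sketch.

```python
raw = "wey, rights, dem"

# Splitting on "," keeps the space that follows each comma.
pieces = raw.split(",")
print(pieces)    # ['wey', ' rights', ' dem']

# Stripping each piece removes the stray whitespace.
cleaned = [piece.strip().lower() for piece in pieces]
print(cleaned)   # ['wey', 'rights', 'dem']
```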

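However, your current algorithm is very inefficient, because it rescans the entire text for every search word. Here is a more efficient approach. First, we clean up the search words and put them in a list. Then we build a dictionary from that list to hold the count for each word as we find it in the text file.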

file_name = input("What file would you like to open? ")
with open(file_name, "r") as f:
    words = f.read().split()

search_words = input("What words do you want to find? ").split(',')
search_words = [word.strip().lower() for word in search_words]
#print(search_words)
search_counts = dict.fromkeys(search_words, 0)

print ('\n... analyzing ... hold on ...')
for word in words:
    word = word.rstrip(",.").lower()
    if word in search_counts:
        search_counts[word] += 1

print ('\nFrequency of word usage within', file_name + ":")
for word in search_words:
    print("   {:<20s} / {} occurrences".format(word, search_counts[word]))
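The same single-pass idea can also be sketched with `collections.Counter` from the standard library. This is an alternative to the dictionary above, not part of the original answer; the helper name `count_search_words` and the sample text are made up for illustration.

```python
from collections import Counter

def count_search_words(text, search_words):
    """Return {search word: count}, matching whole words case-insensitively."""
    # Normalise every token the same way the search words are normalised.
    tokens = (token.strip(",.").lower() for token in text.split())
    counts = Counter(tokens)
    # A Counter returns 0 for missing keys, so absent words need no special case.
    return {word: counts[word] for word in search_words}

sample = "Wey rights dem get, wey dem. Wey."
result = count_search_words(sample, ["wey", "rights", "dem"])
print(result)   # {'wey': 3, 'rights': 1, 'dem': 2}
```

Note that this counts every word in the file, not only the search words, so it uses more memory than the dictionary version when the file is large and the search list is short.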

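【Comments】:

    【Answer 2】:

    There are many ways to do this. Below is a program that reads a .txt file and builds a dictionary of the words and their frequencies; it also splits the text into sentences.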

    """
    Created on Fri Jun 11 17:06:52 2021

    @author: Vijayendra Dwari
    """

    sentences = []
    wordlist = []

    # Characters to remove and common words to skip. Sets are used so that
    # membership tests match whole items rather than substrings of a string.
    digits = set("1234567890")
    punc = set("!@$%+^&*()<>{}[]#_-/,'’")
    drop = set("a,is,are,when,then,an,the,we,us,upto,them,their,from,for,"
               "in,of,at,to,out,and,into,any,but,also,too,that".split(","))

    file_name = input("Please enter the file name: ")
    # Open the name the user typed (not the literal string 'FileName').
    with open(file_name, "r") as f:
        for line in f:
            line = " ".join(line.split())  # collapse runs of whitespace
            line = "".join(c for c in line if c not in digits and c not in punc)
            sentences.append(line.split("."))  # "." is kept above for this split
            wordlist.append(line.split())

    word_dict = {}
    wordcount = 0
    for line_words in wordlist:
        for word in line_words:
            # Drop sentence-ending periods; lowercase so "The" and "the" match.
            word = word.strip(".").lower()
            if word and word not in drop:
                word_dict[word] = word_dict.get(word, 0) + 1
                wordcount += 1

    # List of (count, word) pairs.
    word_freq = [(value, key) for key, value in word_dict.items()]

    print(word_freq)
    print(wordlist)
    print(sentences)
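    Hand-written punctuation lists are easy to get wrong. As an alternative, here is a minimal sketch that tokenizes with the standard `re` module instead. It is not the method used in the answer above; the stop-word set, the function names, and the sample text are all assumptions made for illustration, and the sentence splitter is deliberately naive (it will break on abbreviations such as "e.g.").

```python
import re
from collections import Counter

# Hypothetical stop-word set for illustration; extend as needed.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and"}

def word_frequencies(text, stop_words=STOP_WORDS):
    """Count words in text, ignoring case and the given stop words."""
    # [a-z']+ keeps letters and apostrophes, so digits and punctuation drop out.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)

def split_sentences(text):
    """Naive sentence split: break after ., ! or ? when whitespace follows."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

text = "The cat sat. The cat ran! Is 1 cat enough?"
freq = word_frequencies(text)
found = split_sentences(text)
print(freq.most_common(1))   # [('cat', 3)]
print(found)                 # ['The cat sat.', 'The cat ran!', 'Is 1 cat enough?']
```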
    

    【Comments】:
