【Question title】: Word count from a txt file program
【Posted】: 2020-08-26 08:00:56
【Question description】:

I am using the following code to count the words in a txt file:

#!/usr/bin/python
file=open("D:\\zzzz\\names2.txt","r+")
wordcount={}
for word in file.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1
print (word,wordcount)
file.close();

This gives me output like this:

>>> 
goat {'goat': 2, 'cow': 1, 'Dog': 1, 'lion': 1, 'snake': 1, 'horse': 1, '': 1, 'tiger': 1, 'cat': 2, 'dog': 1}

But I want the output in the following form:

word  wordcount
goat    2
cow     1
dog     1.....

I also get an extra symbol () in the output. How can I remove it?

【Question discussion】:

Tags: python


【Solution 1】:

The funny symbols you are seeing are a UTF-8 BOM (Byte Order Mark). To get rid of them, open the file with the correct encoding (I am assuming you are on Python 3):

file = open(r"D:\zzzz\names2.txt", "r", encoding="utf-8-sig")
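
To see what the encoding changes, here is a minimal sketch. It assumes a small sample file written to a temporary directory; the file name `names_demo.txt` and its contents are made up for illustration and are not from the question.

```python
import codecs
import os
import tempfile

# Hypothetical sample file: a UTF-8 BOM followed by three words.
path = os.path.join(tempfile.mkdtemp(), "names_demo.txt")
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF8 + b"goat cow goat")

# Plain "utf-8" keeps the BOM as the character U+FEFF attached to the first word.
with open(path, "r", encoding="utf-8") as f:
    words_plain = f.read().split()

# "utf-8-sig" drops a leading BOM if one is present.
with open(path, "r", encoding="utf-8-sig") as f:
    words_sig = f.read().split()

print(words_plain)
print(words_sig)
```

With plain `utf-8` the first word carries an invisible extra character, so it would be counted separately from the other `goat`; with `utf-8-sig` both occurrences compare equal.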

Also, for counting, you can use collections.Counter:

from collections import Counter
wordcount = Counter(file.read().split())

To display them:

>>> for item in wordcount.items(): print("{}\t{}".format(*item))
...
snake   1
lion    2
goat    2
horse   3

【Discussion】:

  • But how do I sort them?
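
One possible answer to the sorting question, as a sketch rather than the answerer's own method: `Counter.most_common()` orders the pairs by count, and `sorted()` on the items orders them by word. The sample words are taken from the question's output; in practice they would come from `file.read().split()`.

```python
from collections import Counter

# Sample words based on the question's output (assumed for illustration).
words = "goat cow Dog lion snake horse tiger cat cat dog goat".split()
wordcount = Counter(words)

# most_common() returns (word, count) pairs from highest to lowest count.
by_count = wordcount.most_common()

# sorted() on the items orders the pairs alphabetically by word instead.
# Uppercase letters sort before lowercase ones.
by_word = sorted(wordcount.items())

for word, count in by_count:
    print("{}\t{}".format(word, count))
```

To ignore case when sorting by word, `sorted(wordcount.items(), key=lambda item: item[0].lower())` is one option.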
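【Solution 2】:
#!/usr/bin/env python3
# Count every whitespace-separated word, then print one "word count" pair per line.
with open("D:\\zzzz\\names2.txt", "r") as file:
    wordcount = {}
    for word in file.read().split():
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1
# print() is called as a function so that the code runs on Python 3.
for k, v in wordcount.items():
    print(k, v)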

【Discussion】:

    【Solution 3】:
    FILE_NAME = 'file.txt'
    
    wordCounter = {}
    
    with open(FILE_NAME,'r') as fh:
      for line in fh:
        # Strip commas, apostrophes, and periods, then lowercase the line.
        # split() breaks the line into a list of words.
        word_list = line.replace(',','').replace('\'','').replace('.','').lower().split()
        for word in word_list:
          # Add the word to the wordCounter dictionary.
          if word not in wordCounter:
            wordCounter[word] = 1
          else:
            # If the word is already in the dictionary, update its count.
            wordCounter[word] = wordCounter[word] + 1
    
    print('{:15}{:3}'.format('Word','Count'))
    print('-' * 18)
    
    # Print each word and its number of occurrences.
    for word, occurrence in wordCounter.items():
      print('{:15}{:3}'.format(word, occurrence))
    
    # Output:
        Word           Count
        ------------------
        of               6
        examples         2
        used             2
        development      2
        modified         2
        open-source      2
    

    【Discussion】:

      【Solution 4】:
      import sys

      # The file path is taken from the first command-line argument.
      wordcount = {}
      with open(sys.argv[1], "r") as file:
          for word in file.read().split():
              if word not in wordcount:
                  wordcount[word] = 1
              else:
                  wordcount[word] += 1
      for key in wordcount.keys():
          print("%s %s" % (key, wordcount[key]))
      

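      【Discussion】:

      • What is your Python version?
      • You can check that the example works at compileonline.com/execute_python_online.php
      • @user3068762: Regarding AttributeError: 'dict' object has no attribute 'key': the failing line should be for key in wordcount.keys(): -- note the "s" character at the end of keys.
      • sys.argv[0] returns this file, not the first argument. I don't see how that is what you want. Please revise your answer.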
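
The `sys.argv` point raised in the comments can be sketched as follows. The names `count_words_in` and `main` and the usage message are illustrative assumptions, not part of the original answer; the `utf-8-sig` encoding is borrowed from Solution 1.

```python
from collections import Counter


def count_words_in(path):
    """Return a Counter of the whitespace-separated words in the file at path."""
    with open(path, "r", encoding="utf-8-sig") as fh:
        return Counter(fh.read().split())


def main(argv):
    # argv[0] is the script itself; the first real argument is argv[1].
    if len(argv) < 2:
        print("usage: {} FILE".format(argv[0]))
        return 1
    for word, count in count_words_in(argv[1]).items():
        print("{} {}".format(word, count))
    return 0
```

In a script, this would typically be wired up with `import sys` and `sys.exit(main(sys.argv))`, so that a missing argument produces a usage message instead of an IndexError.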
      【Solution 5】:

      If you are using GraphLab, you can use this function. It is really powerful.

      products['word_count'] = graphlab.text_analytics.count_words(your_text)
      

      【Discussion】:

        【Solution 6】:
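        #!/usr/bin/env python3
        # Count every whitespace-separated word in the file.
        wordcount = {}
        with open("D:\\zzzz\\names2.txt", "r") as file:
            for word in file.read().split():
                if word not in wordcount:
                    wordcount[word] = 1
                else:
                    wordcount[word] += 1

        # Print one "word count" pair per line (Python 3 print function).
        for k, v in wordcount.items():
            print(k, v)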
        

        【Discussion】:

        • Please add some explanation.
        【Solution 7】:

        You can do it like this:

        # Note: this counts the number of distinct words, not how often each word occurs.
        with open(r'D:\zzzz\names2.txt') as file:
            file_split = set(file.read().split())
        print(len(file_split))
        

        【Discussion】:

        • This would be a better answer if you explained how the code you provided answers the question.
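
For context, building a `set` of the words gives the number of distinct words, which is different from the per-word frequency the question asks for. A small sketch of the difference, using made-up sample text:

```python
from collections import Counter

# Made-up sample text for illustration.
words = "goat cow goat cat cat dog".split()

distinct_words = len(set(words))  # how many different words there are
total_words = len(words)          # how many words there are overall
frequencies = Counter(words)      # how often each word occurs

print(distinct_words, total_words, frequencies["goat"])
```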
        【Solution 8】:

        The following code from "Python | How to Count the frequency of a word in the text file?" worked for me.

        import re

        frequency = {}
        # Open the sample text file in read mode.
        document_text = open('sample.txt', 'r')
        # Lowercase the whole document so that counting is case-insensitive.
        text = document_text.read().lower()
        document_text.close()
        # Match runs of 2 to 15 letters a-z delimited by word boundaries.
        pattern = re.findall(r'\b[a-z]{2,15}\b', text)
        for word in pattern:
            count = frequency.get(word, 0)
            frequency[word] = count + 1
        frequency_list = frequency.keys()
        for words in frequency_list:
            print(words, frequency[words])
        

        Output:

        【Discussion】:

          【Solution 9】:
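          from collections import Counter

          filename = "names2.txt"  # placeholder: path of the file to count

          with open(filename) as fh:
              fsplit = fh.read().split()

          user = Counter(fsplit)

          print("sorted counting values:-")
          # sorted() orders the pairs alphabetically by word; each line shows (count, word).
          for i, v in sorted(user.items()):
              print((v, i))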
          

          【Discussion】:

          • Please explain how your solution works and how or why it is better than, or different from, the existing solutions.