Counting unique words in Python: answers

【Question Title】: Counting unique words in python
【Posted】: 2012-08-07 15:36:26
【Question】:

So far my code looks like this:

from glob import glob
pattern = "D:\\report\\shakeall\\*.txt"
filelist = glob(pattern)
def countwords(fp):
    # Number of whitespace-separated words in one file
    with open(fp) as fh:
        return len(fh.read().split())
# Total word count over all files matching the pattern (Python 3 print)
print("There are", sum(map(countwords, filelist)), "words in the files. From directory", pattern)

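I want to add code that counts the unique words in the files matching the pattern (there are 42 txt files in this path), but I don't know how. Can anyone help me?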

【Question Comments】:

By "unique words", do you mean words that appear only once, or do you want a count for each word?

Tags: python word-count
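The comment above asks which meaning of "unique" is intended. As a minimal sketch, both readings can be computed from one collections.Counter. This assumes the words are already split into a list; the helper name `distinct_and_once` is hypothetical.

```python
from collections import Counter

def distinct_and_once(words):
    """Return (number of distinct words, number of words that occur exactly once)."""
    counts = Counter(words)
    # Reading 1: every different word counts once
    distinct = len(counts)
    # Reading 2: only words whose count is exactly 1
    once = sum(1 for n in counts.values() if n == 1)
    return distinct, once

words = "the cat saw the dog".split()
print(distinct_and_once(words))  # (4, 3)
```

Here "the" appears twice, so it counts toward the distinct total but not toward the "appears only once" total.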

【Solution 1】:

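The best way to count objects in Python is the collections.Counter class, which was built for exactly this purpose. It behaves like a Python dict but is a little easier to use for counting: you can pass it a list of objects and it counts them for you automatically.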

>>> from collections import Counter
>>> c = Counter(['hello', 'hello', 1])
>>> print(c)
Counter({'hello': 2, 1: 1})

Counter also has some useful methods, such as most_common; see the documentation for more.
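A short illustration of most_common on a list of words (the sample sentence is made up for the example):

```python
from collections import Counter

words = "the cat and the hat and the bat".split()
c = Counter(words)
# most_common(n) returns the n most frequent (element, count) pairs, highest count first
print(c.most_common(2))  # [('the', 3), ('and', 2)]
# len() of a Counter is the number of distinct elements
print(len(c))  # 5
```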

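One very useful Counter method is update(). After creating a Counter from a list of objects, you can pass another list to update() and it keeps counting, adding to the existing counts instead of discarding them: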

>>> from collections import Counter
>>> c = Counter(['hello', 'hello', 1])
>>> print(c)
Counter({'hello': 2, 1: 1})
>>> c.update(['hello'])
>>> print(c)
Counter({'hello': 3, 1: 1})
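Applied to the question, update() lets a single Counter accumulate words from all the files matched by the glob pattern. A minimal sketch, assuming the files are plain text readable with the default encoding; `count_words_in_files` is a hypothetical name. This version is case-sensitive, so `The` and `the` are counted separately.

```python
from collections import Counter
from glob import glob

def count_words_in_files(paths):
    """Accumulate word counts across several text files into one Counter."""
    total = Counter()
    for path in paths:
        with open(path) as fh:
            # update() adds this file's counts to the running totals
            total.update(fh.read().split())
    return total

if __name__ == "__main__":
    pattern = "D:\\report\\shakeall\\*.txt"  # path from the question
    counts = count_words_in_files(glob(pattern))
    print("Total words:", sum(counts.values()))
    print("Distinct words:", len(counts))
```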

【Comments】:

It looks like the answer I posted was very similar to yours, so I'm deleting mine, but I suggest you add a mention of the Counter object's update() method.

【Solution 2】:

If you want to count the occurrences of each unique word, use a dict:

words = ['Hello', 'world', 'world']
count = {}
for word in words:
    if word in count:
        count[word] += 1
    else:
        count[word] = 1

You get the dictionary

{'Hello': 1, 'world': 2}
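A common, shorter variant of the same loop uses dict.get with a default of 0 instead of the if/else. This is a swapped-in idiom, not part of the original answer; the number of unique words is then simply the number of keys:

```python
words = ['Hello', 'world', 'world']
count = {}
for word in words:
    # dict.get returns 0 for a word not seen yet, so no if/else is needed
    count[word] = count.get(word, 0) + 1

print(count)       # {'Hello': 1, 'world': 2}
print(len(count))  # 2 distinct words
```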

【Comments】:

Also, set() would be a better choice here.
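If only the number of distinct words is needed (not a count per word), a set is enough. A minimal sketch reusing the `words` list from Solution 2:

```python
words = ['Hello', 'world', 'world']
# A set keeps a single copy of each distinct element
unique_words = set(words)
print(len(unique_words))  # 2
```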

【Solution 3】:

print(len(set(w.lower() for w in open('filename.dat').read().split())))

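This reads the whole file into memory, splits it on whitespace, converts each word to lowercase, builds a set (which keeps only unique words) from the lowercased words, counts them, and prints the result.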
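The one-liner handles a single file. A minimal sketch extending the same idea to every file matched by the question's glob pattern; the function name `unique_words_in_files` is hypothetical. Each file's lowercased words are added to one shared set, and `with` closes each file after it is read. As with the one-liner, `split()` only splits on whitespace, so punctuation stays attached (`end.` and `end` count as different words).

```python
from glob import glob

def unique_words_in_files(paths):
    """Return the set of distinct lowercased words across all the given files."""
    unique = set()
    for path in paths:
        with open(path) as fh:  # closes each file after reading
            unique.update(w.lower() for w in fh.read().split())
    return unique

if __name__ == "__main__":
    pattern = "D:\\report\\shakeall\\*.txt"  # path from the question
    print("There are", len(unique_words_in_files(glob(pattern))), "unique words in the files")
```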

【Comments】: