Create a dictionary from a file: answers

【Question title】: Create a dictionary from a file
【Posted】: 2013-12-06 19:36:34
【Question】:

I am writing code that lets the user enter a .txt file of their choice. For example, if the text is:

"I am you. You are I."

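I want my code to create a dictionary similar to the following:
{I: 2, am: 1, you: 2, are: 1}

The words in the file are the keys and the number of times each word occurs is the value. Case should not matter, so are = ARE = ArE = arE = etc...

This is my code so far. Any suggestions/help?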

file = input("\n Please select a file")
name = open(file, 'r')
dictionary = {}
with name:
    for line in name:
        (key, val) = line.split()
        dictionary[int(key)] = val

【Comments】:

Tags: python-3.x

【Solution 1】:

Take a look at the example in this answer:

Python : List of dict, if exists increment a dict value, if not append a new dict

You can easily do what you want with collections.Counter(), but if for some reason you cannot use it, you can build the dictionary you want with a defaultdict or even a simple loop.
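For comparison, the two fallbacks mentioned above might look roughly like this. This is only a sketch: the `words` list is a hypothetical stand-in for the lower-cased words read from the file, and the variable names are illustrative.

```python
from collections import defaultdict

# Hypothetical sample input standing in for the words read from a file.
words = ["i", "am", "you", "you", "are", "i"]

# Option 1: defaultdict(int) supplies 0 the first time a key is seen,
# so the increment never raises KeyError.
counts_dd = defaultdict(int)
for word in words:
    counts_dd[word] += 1

# Option 2: a plain dict, using get() to fall back to 0 for new words.
counts_plain = {}
for word in words:
    counts_plain[word] = counts_plain.get(word, 0) + 1

print(dict(counts_dd))
print(counts_plain)
```

Both loops produce the same counts; Counter simply packages this pattern.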

Here is code that solves your problem. It works in Python 3.1 and later.

from collections import Counter
import string

def filter_punctuation(s):
    return ''.join(ch if ch not in string.punctuation else ' ' for ch in s)

def lower_case_words(f):
    for line in f:
        line = filter_punctuation(line)
        for word in line.split():
            yield word.lower()

def count_key(tup):
    """
    key function to make a count dictionary sort into descending order
    by count, then case-insensitive word order when counts are the same.
    tup must be a tuple in the form: (word, count)
    """
    word, count = tup
    return (-count, word.lower())

dictionary = {}

fname = input("\nPlease enter a file name: ")
with open(fname, "rt") as f:
    dictionary = Counter(lower_case_words(f))

print(sorted(dictionary.items(), key=count_key))

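From your example, I can see that you want punctuation stripped. Since we are going to split the string on whitespace, I wrote a function that replaces each punctuation character with a space. That way, if you have a string like hello,world, it gets split into the words hello and world when we split on whitespace.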
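To see the effect in isolation, here is a small check of filter_punctuation() on the hello,world string. The function is copied from the answer's code; the `sample` and `cleaned` names are only illustrative.

```python
import string

def filter_punctuation(s):
    # Replace every ASCII punctuation character with a space.
    return ''.join(ch if ch not in string.punctuation else ' ' for ch in s)

sample = "hello,world"
cleaned = filter_punctuation(sample)

print(repr(cleaned))    # 'hello world'
print(cleaned.split())  # ['hello', 'world']
```

Note that string.punctuation only covers ASCII punctuation and includes the apostrophe, so a word like don't would be split into don and t.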

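The function lower_case_words() is a generator: it reads the input file one line at a time and yields one word at a time from each line. This neatly puts our input processing into a tidy "black box", so we can simply call Counter(lower_case_words(f)) and it does the right thing for us.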
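Because the generator only iterates over lines, it can be tried without a real file. The sketch below assumes an in-memory io.StringIO object (named `fake_file` here purely for illustration) in place of the opened file; the two functions are copied from the answer's code.

```python
import io
import string
from collections import Counter

def filter_punctuation(s):
    return ''.join(ch if ch not in string.punctuation else ' ' for ch in s)

def lower_case_words(f):
    # Yield each word of each line, lower-cased, one at a time.
    for line in f:
        line = filter_punctuation(line)
        for word in line.split():
            yield word.lower()

# In-memory stand-in for a text file containing the question's sample text.
fake_file = io.StringIO("I am you.\nYou are I.\n")

words = list(lower_case_words(fake_file))
print(words)  # ['i', 'am', 'you', 'you', 'are', 'i']

# Rewind and feed the generator straight into Counter, as the answer does.
fake_file.seek(0)
counts = Counter(lower_case_words(fake_file))
print(counts["i"], counts["you"], counts["am"], counts["are"])  # 2 2 1 1
```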

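Of course, you don't have to print the dictionary sorted, but I think it looks better that way. I chose a sort order that puts the highest counts first and, where counts are equal, orders the words alphabetically.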
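The sort order described above comes from a key that returns a (negated count, word) tuple. A minimal sketch with hard-coded counts, where the `counts` dict and the `by_count_then_word` name are illustrative only:

```python
# Hypothetical counts, deliberately listed in an unsorted order.
counts = {"am": 1, "you": 2, "are": 1, "i": 2}

def by_count_then_word(item):
    word, count = item
    # Negating the count makes larger counts sort first;
    # the word breaks ties alphabetically, ignoring case.
    return (-count, word.lower())

ordered = sorted(counts.items(), key=by_count_then_word)
print(ordered)  # [('i', 2), ('you', 2), ('am', 1), ('are', 1)]
```

Tuples compare element by element, which is why the word is only consulted when two counts are equal.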

With your suggested input, this is the resulting output:

[('i', 2), ('you', 2), ('am', 1), ('are', 1)]

Because of the sorting, it always prints in the order shown above.

【Comments】: