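【Question title】: Sorting and counting words from a text file
【Posted】: 2016-11-15 08:43:58
【Question】:

I'm new to programming and I'm stuck on my current program. I have to read a story from a file, sort the words, and count how many times each word occurs. It counts the words, but it doesn't sort them, strip the punctuation, or remove duplicate words. I don't know why it isn't working. Any input would be helpful.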

ifile = open("Story.txt",'r')
fileout = open("WordsKAI.txt",'w')
lines = ifile.readlines()

wordlist = []
countlist = []

for line in lines:
    wordlist.append(line)
    line = line.split()
    # line.lower()

    for word in line:
        word = word.strip(". ,  ! ? :  ")
        # word = list(word)
        wordlist.sort()
        sorted(wordlist)
        countlist.append(word)

        print(word,countlist.count(word))

【Question comments】:

Tags: python python-3.x sorting count


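【Solution 1】:

The main problem in your code is on line 9:

    wordlist.append(line)

You are appending the whole line to wordlist, which I doubt is what you want. Because of this, what ends up in wordlist has never been split into words or passed through .strip().

What you need to do is add each word only after you have strip()ed it, and only after checking that the same word is not already in the list (no duplicates):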

ifile = open("Story.txt",'r')
lines = ifile.readlines()

wordlist = []
countlist = []

for line in lines:
    # Get all the words in the current line
    words = line.split()
    for word in words:
        # Perform whatever manipulation to the word here
        # Remove any punctuation from the word
        word = word.strip(".,!?:;'\"")
        # Make the word lowercase
        word = word.lower()

        # Add the word into wordlist only if it is not in wordlist
        if word not in wordlist:
            wordlist.append(word)

        # Add the word to countlist so that it can be counted later
        countlist.append(word)

# Sort the wordlist
wordlist.sort()

# Print the wordlist
for word in wordlist:
    print(word, countlist.count(word))
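
One detail from the question's code may be worth a small illustration: `list.sort()` sorts in place and returns `None`, while `sorted()` returns a new list and leaves the original alone, so a bare `sorted(wordlist)` whose result is never assigned has no visible effect. A minimal sketch, using a made-up list named `words` rather than the contents of Story.txt:

```python
# Hypothetical sample data, not taken from Story.txt
words = ["the", "cat", "sat", "the", "mat"]

# sorted() returns a new list; the original list is unchanged
ordered = sorted(words)
unchanged = words == ["the", "cat", "sat", "the", "mat"]

# list.sort() sorts the list in place and returns None
result = words.sort()

print(ordered)
print(unchanged)
print(words)
print(result)
```

This is also why the sort in the code above is done once, after the loop, instead of on every word.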

Another way to do this is with a dictionary, storing each word as a key and its number of occurrences as the value:

ifile = open("Story.txt", "r")
lines = ifile.readlines()

word_dict = {}

for line in lines:
    # Get all the words in the current line
    words = line.split()
    for word in words:
        # Perform whatever manipulation to the word here
        # Remove any punctuation from the word
        word = word.strip(".,!?:;'\"")
        # Make the word lowercase
        word = word.lower()

        # Add the word to word_dict
        word_dict[word] = word_dict.get(word, 0) + 1

# Create a wordlist to display the words sorted
word_list = list(word_dict.keys())
word_list.sort()

for word in word_list:
    print(word, word_dict[word])
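
The dictionary pattern above is roughly what `collections.Counter` from the standard library provides out of the box. The following is only a sketch under stated assumptions: `text` is a made-up in-memory string standing in for the contents of Story.txt, and `count_words` is a hypothetical helper name, not something from the original answer.

```python
from collections import Counter

# Hypothetical in-memory text standing in for the contents of Story.txt
text = "The cat sat. The dog sat, too!"


def count_words(text):
    """Return a Counter mapping each cleaned, lowercased word to its count."""
    cleaned = (word.strip(".,!?:;'\"").lower() for word in text.split())
    # Skip tokens that consisted of nothing but punctuation
    return Counter(word for word in cleaned if word)


counts = count_words(text)

# Iterating over sorted(counts) visits the words in alphabetical order
for word in sorted(counts):
    print(word, counts[word])
```

A `Counter` is a `dict` subclass, so the sorted-keys loop from the answer works on it unchanged.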

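【Comments】:

  • Thank you very much. One last question: when do I convert the words to lowercase?
  • @KennyI. You just need to do any manipulation on the word before you .append() it to wordlist. See the latest edit.
【Solution 2】: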

You have to give the sort a key function. Try this:

    r = sorted(wordlist, key=str.lower)
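
To make the effect of the key function concrete, here is a small sketch comparing the default ordering with `key=str.lower`. The list contents are made up for illustration; the point is only that the default comparison puts uppercase letters before lowercase ones.

```python
# Hypothetical sample words with mixed case
wordlist = ["banana", "Apple", "cherry", "Zebra"]

# Default ordering compares code points, so uppercase letters sort first
default_order = sorted(wordlist)

# key=str.lower compares lowercased copies; the original strings are kept
case_insensitive = sorted(wordlist, key=str.lower)

print(default_order)
print(case_insensitive)
```

If every word has already been lowercased before it is stored, the two orderings are the same and the key makes no difference.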

【Comments】:

  • You don't need to provide a key. It all depends on what you want.
【Solution 3】:
punctuation = ".,!?: "
counts = {}
with open("Story.txt",'r') as infile:
    for line in infile:
        for word in line.split():
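            # strip() takes a set of characters and removes any of them from both ends,
            # so one call handles mixed runs such as "word!?"
            word = word.strip(punctuation)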
            if word not in counts:
                counts[word] = 0
            counts[word] += 1

with open("WordsKAI.txt",'w') as outfile:
    for word in sorted(counts):  # if you want to sort by counts instead, use sorted(counts, key=counts.get)
        outfile.write("{}: {}\n".format(word, counts[word]))
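
The comment in the code above mentions sorting by count instead of alphabetically. As a hedged sketch of one way to do that, the snippet below lists the most frequent words first and breaks ties alphabetically. The `counts` dictionary is made-up sample data, not the result of reading Story.txt.

```python
# Hypothetical counts, standing in for the dictionary built from Story.txt
counts = {"the": 3, "cat": 1, "sat": 2, "a": 2}

# Sort by count, highest first; words with equal counts go alphabetically
by_count = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

for word, count in by_count:
    print("{}: {}".format(word, count))
```

Negating the count in the key gives descending counts while keeping the words themselves in ascending order.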

【Comments】:
