【Question title】: How do I count the number of words spoken by each character in a dialogue and store the count in a dictionary?
【Posted】: 2021-07-02 01:43:00
【Question】:

I am trying to count the number of words spoken by the characters "Michael" and "Jim" in the following dialogue, and store them in a dictionary like {"Michael:":15, "Jim:":10}.

string = "Michael: All right Jim. Your quarterlies look very good. How are things at the library? Jim: Oh, I told you. I couldn’t close it. So… Michael: So you’ve come to the master for guidance? Is this what you’re saying, grasshopper? Jim: Actually, you called me in here, but yeah. Michael: All right. Well, let me show you how it’s done."

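My idea was to create an empty dictionary with the character names as keys, split the string on " ", count the number of list elements between character names (using the keys as reference points), and then store the word counts as the values. This is the code I have so far: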

dict = {"Michael:" : 0,
        "Jim:" : 0}

list = string.split(" ")

indices = [i for i, x in enumerate(list) if x in dict.keys()]
nums = []
for i in range(1,len(indices)):
    nums.append(indices[i] - indices[i-1])
print(nums)

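The result is a list that prints as [15, 10, 15, 9]. Each number is the distance between two consecutive name tokens, so it includes the name token itself, and the final line of the dialogue is not counted at all.

I think I need help with the following:

  1. A better approach, if there is one
  2. A way to count the words spoken by a character when that line is the last line of the dialogue
  3. A way to update the dictionary automatically with the number of words each character speaks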

The last point is crucial, because I want to repeat this process on the quotes from an entire episode.

Thank you in advance!

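【Comments】:

  • Don't use built-in names as variable names.
  • What @Sujay means is that string is a standard library module, so using it as a variable name makes the module unavailable (yes, you could import string as still_available_string).
  • @JLPeyret, also list and dict.
  • Right, noted. I only reused the OP's string definition.
  • @beginnerprogrammerforever Well... accepting an answer, or upvoting the ones you found helpful, is the usual way to thank people here.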
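
The effect these comments warn about can be seen in a small sketch. The function name `shadowing_demo` and its contents are illustrative only and are not part of the original thread:

```python
def shadowing_demo():
    # Inside this function, `list` now refers to a list object,
    # not the built-in type (hypothetical example data).
    list = ["Michael:", "All", "right"]
    try:
        list("abc")  # would normally return ['a', 'b', 'c']
    except TypeError as exc:
        return str(exc)
    return None

message = shadowing_demo()
print(message)  # 'list' object is not callable

# Outside the function the built-in is untouched.
letters = list("abc")
print(letters)  # ['a', 'b', 'c']
```

The same thing happens at module level, except that the shadowing then lasts for the rest of the script unless the name is deleted.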

Tags: python string dictionary parsing


【Solution 1】:

Loop over the words and keep incrementing the count for whichever character is currently speaking.

dialogue_dict = {"Michael:" : 0, "Jim:" : 0}

words = string.split(" ")
current_character = None
for word in words:
    if word in dialogue_dict:
        current_character = word
    elif current_character:
        dialogue_dict[current_character] += 1

By the way, don't use list or dict as variable names: doing so shadows the built-ins with those names.
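
To reuse this loop on a whole episode, it could be wrapped in a function. The following is only a sketch: the name `count_words_by_speaker` and the shortened sample text are assumptions for illustration, and it uses `split()` without an argument (so runs of whitespace and newlines are treated as one separator) rather than `split(" ")`.

```python
def count_words_by_speaker(text, speakers):
    """Count the whitespace-separated words that follow each speaker tag."""
    counts = {speaker: 0 for speaker in speakers}
    current = None  # no speaker seen yet
    for word in text.split():
        if word in counts:
            current = word           # a tag such as "Jim:" switches the speaker
        elif current is not None:
            counts[current] += 1     # any other token belongs to the current speaker
    return counts

# Shortened, made-up sample in the same format as the question's dialogue.
sample = "Michael: All right Jim. Jim: Oh, I told you. Michael: So you called me?"
result = count_words_by_speaker(sample, ["Michael:", "Jim:"])
print(result)  # {'Michael:': 7, 'Jim:': 4}
```

Because the count is updated word by word, the last line of the dialogue needs no special handling.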

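【Comments】:

  • Thanks, Barmar. I have a follow-up question to make sure I understand this clearly: 1. Why didn't you use `if word in dialogue_dict.keys():`? Shouldn't we be looking only at the keys?
  • When a dict is used as an iterable, it yields only its keys. So in dialogue_dict and in dialogue_dict.keys() are equivalent.
  • You can see the same thing when you write for key in dialogue_dict: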
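
A quick check of this behaviour, as a minimal sketch (the variable names other than `dialogue_dict` are illustrative):

```python
dialogue_dict = {"Michael:": 0, "Jim:": 0}

# Membership on the dict itself and on .keys() give the same answer.
in_dict = "Jim:" in dialogue_dict
in_keys = "Jim:" in dialogue_dict.keys()
print(in_dict, in_keys)  # True True

# Values are not searched unless you ask for them explicitly.
value_in_dict = 0 in dialogue_dict
value_in_values = 0 in dialogue_dict.values()
print(value_in_dict, value_in_values)  # False True

# Iterating over the dict yields its keys, in insertion order.
keys_seen = [key for key in dialogue_dict]
print(keys_seen)  # ['Michael:', 'Jim:']
```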
【Solution 2】:
string_ = "Michael: All right Jim. Your quarterlies look very good. How are things at the library? Jim: Oh, I told you. I couldn’t close it. So… Michael: So you’ve come to the master for guidance? Is this what you’re saying, grasshopper? Jim: Actually, you called me in here, but yeah. Michael: All right. Well, let me show you how it’s done."

import re
from collections import defaultdict

#This assumes a character name has no blanks and is followed by a `:`
pat = re.compile("([A-Z][a-z'-]+:)")

#splitting on a pattern with a capturing group returns the delimiters (character names) as well
li = [v for v in pat.split(string_) if v]

# split 2 by 2
def chunks(l, n):
    n = max(1, n)
    return (l[i:i+n] for i in range(0, len(l), n))

#use a defaultdict to start new characters at 0
#collections.Counter could also work
counter = defaultdict(int)

pairs = chunks(li,2)
for character, line in pairs:
    counter[character.rstrip(":")] += len(line.split())
 
print(f"{counter=}")

Output:

counter=defaultdict(<class 'int'>, {'Michael': 38, 'Jim': 17})
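
The code comments above mention that collections.Counter could also work. A minimal sketch of that variant follows, on a shortened, made-up sample; it also pairs names with lines by slicing instead of a chunks helper. The names `sample`, `pattern`, `parts` and `word_counts` are assumptions for illustration.

```python
import re
from collections import Counter

# Shortened, made-up sample in the same format as the question's dialogue.
sample = "Michael: All right Jim. Jim: Oh, I told you. Michael: So you called me?"

# Same idea as above: the capturing group makes split() keep the names.
pattern = re.compile(r"([A-Z][a-z'-]+:)")
parts = [p for p in pattern.split(sample) if p]
# parts alternates name, line, name, line, ... provided the text starts with a name.

word_counts = Counter()
for name, line in zip(parts[0::2], parts[1::2]):
    word_counts[name.rstrip(":")] += len(line.split())

print(word_counts)  # Counter({'Michael': 7, 'Jim': 4})
```

Like defaultdict(int), a Counter starts missing keys at 0, so new characters need no setup.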

【Comments】:

【Solution 3】:

We can do this with regular expressions. There is no need to supply the speaker names in advance.

    import re
    
    string = "Michael: All right Jim. Your quarterlies look very good. How are things at the library? Jim: Oh, I told you. I couldn’t close it. So… Michael: So you’ve come to the master for guidance? Is this what you’re saying, grasshopper? Jim: Actually, you called me in here, but yeah. Michael: All right. Well, let me show you how it’s done."
    dialog_count = {}
    
    #extract speakers using regex
    speakers = re.findall(r'\w+:',string)
    #split sentences using regex
    sentencs = re.split(r'\w+:',string)
    speakers = filter(lambda x: x.strip()!='' ,speakers)
    sentencs = filter(lambda x: x.strip()!='' ,sentencs)
    
    #remap each speaker to its sentence
    dialogs = zip(list(speakers),list(sentencs))
    
    #count total words
    for speaker,dialog in dialogs:
        dialog = dialog.split(" ")
        dialog = list(filter(lambda x: x.strip()!='',dialog))
        dialog_count[speaker] = dialog_count.get(speaker,0) + len(dialog)
    print(dialog_count)
    
    
    {'Michael:': 38, 'Jim:': 17}
    

【Comments】:
