【Question title】: Count words in list of strings based on words array and making dictionary from it
【Posted】: 2019-07-08 06:48:21
【Question】:

I have a list of strings:

string_list=['philadelphia court excessive disappointed court hope','hope jurisdiction obscures acquittal court','mention hope maryland signal held problem internal reform life bolster level grievance']

And a list of words:

words=['hope','court','mention','maryland']

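Now I want to count how often the words from the word list occur in each string and store the counts in a separate dictionary. Each key should be 'doc_(index)', and each value should be a nested dictionary whose keys are the words that occur and whose values are their counts. The expected output is: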

words_dict={'doc_1':{'court':2,'hope':1},'doc_2':{'court':1,'hope':1},'doc_3':{'mention':1,'hope':1,'maryland':1}}

My first step was:

docs_dict={}
count=0
for i in string_list:
    count+=1
    docs_dic['doc_'+str(count)]=i
print (docs_dic)

{'doc_1': 'philadelphia court excessive disappointed court hope', 'doc_2': 'hope jurisdiction obscures acquittal court', 'doc_3': 'mention hope maryland signal held problem internal reform life bolster level grievance'}

After this, I can't figure out how to get the word counts. This is what I have tried so far:

docs={}
for k,v in words_dic.items():
    split_words=v.split()
    for i in words:
        if i in split_words:
            docs[k][i]+=1
        else:
            docs[k][i]=0

【Comments】:

    Tags: arrays python-3.x dictionary


    【Solution 1】:

    You can use str.count in Python to count how many times a word occurs in a sentence.

    Check this code:

    words_dict = {}
    string_list=['philadelphia court excessive disappointed court hope','hope jurisdiction obscures acquittal court','mention hope maryland signal held problem internal reform life bolster level grievance']
    words_list=['hope','court','mention','maryland']
    for i in range(len(string_list)): #iterate over string list
        helper = {} #temporary dictionary
        for word in words_list: #iterate over word list
            x = string_list[i].count(word) #count no. of occurrences of word in sentence
            if x > 0:
                helper[word]=x
        words_dict["doc_"+str(i+1)]=helper #add temporary dictionary into final dictionary
    
    #Print dictionary contents
    for i in words_dict:
        print(i + ": " + str(words_dict[i]))
    

    The output of the above code is:

    doc_3: {'maryland': 1, 'mention': 1, 'hope': 1}
    doc_2: {'court': 1, 'hope': 1}
    doc_1: {'court': 2, 'hope': 1}
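One caveat about counting with str.count: it counts substring matches, not whole words. The sketch below uses a made-up sentence (not from the question) to show the difference. Splitting the sentence first and counting in the resulting list is one possible way to restrict the count to whole words.

```python
# Hypothetical sentence, chosen only to illustrate the difference.
sentence = 'courthouse court hopeful hope'

# str.count counts non-overlapping substring matches,
# so 'court' is also found inside 'courthouse'.
substring_count = sentence.count('court')

# Splitting into words first and using list.count
# only matches items equal to the whole word.
whole_word_count = sentence.split().count('court')

print(substring_count, whole_word_count)
```

For the three sentences in the question both approaches give the same counts, because none of the listed words appears inside a longer word there.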
    

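    【Comments】:

    • Can you explain what I did wrong in my second block of code?
    • @Learner Your code is unclear, please correct it. In the first part you defined the dictionary as 'docs_dict' but then used 'docs_dic'. In the second part, at 'docs[k][i]+=1', you update the dictionary without initializing any value first. That is the problem.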
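The two problems named in the comment above (the inconsistent dictionary name and the missing initialization of the inner dictionary) can be addressed as in the following sketch. It is one possible correction of the asker's attempt, not the only one. Using list.count on the split words is an assumption about the intent, namely counting whole words only.

```python
string_list = ['philadelphia court excessive disappointed court hope',
               'hope jurisdiction obscures acquittal court',
               'mention hope maryland signal held problem internal reform life bolster level grievance']
words = ['hope', 'court', 'mention', 'maryland']

# Step 1: map 'doc_<n>' to each string, using one consistent name.
docs_dict = {}
for index, text in enumerate(string_list, start=1):
    docs_dict['doc_' + str(index)] = text

# Step 2: count the listed words per document.
docs = {}
for key, text in docs_dict.items():
    split_words = text.split()
    docs[key] = {}  # create the inner dict before writing into it
    for word in words:
        occurrences = split_words.count(word)  # whole-word matches
        if occurrences > 0:
            docs[key][word] = occurrences

print(docs)
```

Words that do not occur in a document are simply left out, which matches the expected output in the question.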
    【Solution 2】:

    Use Counter to get the word counts in each document.

    Try this,

    >>> from collections import Counter
    >>> string_list = ['philadelphia court excessive disappointed court hope', 'hope jurisdiction obscures acquittal court', 'mention hope maryland signal held problem internal reform life bolster level grievance']
    >>> words=['hope','court','mention','maryland']
    >>> d = {}
    >>> for i, doc in enumerate(string_list, 1):  # number documents from 1
    ...     for word, count in Counter(doc.split()).items():
    ...         if word in words:
    ...             d.setdefault("doc_{}".format(i), {})[word] = count
    ...

    Output:

    >>> d
    {'doc_1': {'court': 2, 'hope': 1}, 'doc_2': {'hope': 1, 'court': 1}, 'doc_3': {'mention': 1, 'hope': 1, 'maryland': 1}}
    

    【Comments】:

      【Solution 3】:

      It looks like the question here may help.

      Below is my attempt at code that should do what you need.

      from collections import Counter

      string_list = ['philadelphia court excessive disappointed court hope','hope jurisdiction obscures acquittal court','mention hope maryland signal held problem internal reform life bolster level grievance']
      words = ['hope','court','mention','maryland']

      result_dict = {}

      # Number documents from 1 so the keys match the expected 'doc_1', 'doc_2', ...
      for index, value in enumerate(string_list, start=1):
          cntr = Counter(value.split())  # word -> count for this document
          # Keep only listed words that actually occur; Counter returns 0 for missing keys
          result = {key: cntr[key] for key in words if key in cntr}
          result_dict['doc_' + str(index)] = result
      
      
      

      Hope you find it useful.
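One detail worth keeping in mind when looking up every word of the list in a Counter: a missing key yields 0 instead of raising KeyError. The sketch below uses the second sentence from the question to show how this affects the result. The variable names with_zeros and without_zeros are illustrative only.

```python
from collections import Counter

words = ['hope', 'court', 'mention', 'maryland']
cntr = Counter('hope jurisdiction obscures acquittal court'.split())

# Looking up every listed word keeps absent words with a count of 0.
with_zeros = {key: cntr[key] for key in words}

# Filtering on membership drops absent words entirely.
without_zeros = {key: cntr[key] for key in words if key in cntr}

print(with_zeros)
print(without_zeros)
```

Which form is preferable depends on whether zero counts are wanted. The expected output in the question omits them.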

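      【Comments】:

      • What is string_list_list = [x.split(" ") for x in string_list] for?
      • It is a list comprehension. In this case it creates "sub"-lists, one per sentence, each built from the words of that sentence. For example, the first sublist created is ['philadelphia', 'court', 'excessive', 'disappointed', 'court', 'hope']
      • You are right! Removed. Sorry, in the end it was easier to do this with enumerate, and I accidentally left that line in.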
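For readers unfamiliar with the list comprehension discussed in the comments above, here is a small sketch of what it produces, next to an equivalent explicit loop. The name loop_version is introduced here only for comparison.

```python
string_list = ['philadelphia court excessive disappointed court hope',
               'hope jurisdiction obscures acquittal court',
               'mention hope maryland signal held problem internal reform life bolster level grievance']

# List comprehension: one list of words per sentence.
string_list_list = [x.split(" ") for x in string_list]

# The same result written as an explicit loop.
loop_version = []
for x in string_list:
    loop_version.append(x.split(" "))

print(string_list_list[0])
```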
      【Solution 4】:

      Try this,

      from collections import Counter
      
      string_list = ['philadelphia court excessive disappointed court hope',
                     'hope jurisdiction obscures acquittal court',
                     'mention hope maryland signal held problem internal reform life bolster level grievance']
      words = ['hope', 'court', 'mention', 'maryland']
      
      result = {f'doc_{i + 1}': {key: value for key, value in Counter(string_list[i].split()).items() if key in words} for i in range(len(string_list))}
      print(result)
      

      Output:

      {'doc_1': {'court': 2, 'hope': 1}, 'doc_2': {'hope': 1, 'court': 1}, 'doc_3': {'mention': 1, 'hope': 1, 'maryland': 1}}
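If the one-liner above feels dense, the same idea can be wrapped in a small function. This is only a sketch: the function name count_words and its parameters are hypothetical, and converting the word list to a set is an optional choice that makes each membership test cheaper for long word lists.

```python
from collections import Counter


def count_words(docs, vocabulary):
    """Return {'doc_<n>': {word: count}} for the vocabulary words in each doc.

    Hypothetical helper; the name and signature are not from the thread.
    """
    vocab = set(vocabulary)  # set lookup instead of scanning a list
    return {
        f'doc_{i}': {word: n for word, n in Counter(doc.split()).items()
                     if word in vocab}
        for i, doc in enumerate(docs, start=1)  # keys start at doc_1
    }


string_list = ['philadelphia court excessive disappointed court hope',
               'hope jurisdiction obscures acquittal court',
               'mention hope maryland signal held problem internal reform life bolster level grievance']
words = ['hope', 'court', 'mention', 'maryland']

result = count_words(string_list, words)
print(result)
```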
      

      【Comments】:
