Count words in a list of strings based on a words array and build a dictionary from them: answers

【Question title】: Count words in list of strings based on words array and making dictionary from it
【Posted】: 2019-07-08 06:48:21
【Question description】:

I have a list of strings:

string_list=['philadelphia court excessive disappointed court hope','hope jurisdiction obscures acquittal court','mention hope maryland signal held problem internal reform life bolster level grievance']

And a list of words:

words=['hope','court','mention','maryland']

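Now I want to count only the words from the word list that occur in each string, and store the counts in a dictionary whose keys are 'doc_(index)' and whose values are nested dictionaries mapping each occurring word to its count. The expected output is: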

words_dict={'doc_1':{'court':2,'hope':1},'doc_2':{'court':1,'hope':1},'doc_3':{'mention':1,'hope':1,'maryland':1}}

My first step was:

docs_dict={}
count=0
for i in string_list:
    count+=1
    docs_dic['doc_'+str(count)]=i
print (docs_dic)

{'doc_1': 'philadelphia court excessive disappointed court hope', 'doc_2': 'hope jurisdiction obscures acquittal court', 'doc_3': 'mention hope maryland signal held problem internal reform life bolster level grievance'}
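The same first step can be written as a single dict comprehension. This is a minimal sketch assuming the same `string_list` as in the question; `enumerate(..., start=1)` yields the 1-based `doc_N` keys directly, so no manual counter variable is needed.

```python
string_list = ['philadelphia court excessive disappointed court hope',
               'hope jurisdiction obscures acquittal court',
               'mention hope maryland signal held problem internal reform life bolster level grievance']

# enumerate(..., start=1) yields (1, first_string), (2, second_string), ...
docs_dict = {f'doc_{i}': doc for i, doc in enumerate(string_list, start=1)}
print(docs_dict)
```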

After this, I can't figure out how to get the word counts. What I have tried so far:

docs={}
for k,v in words_dic.items():
    split_words=v.split()
    for i in words:
        if i in split_words:
            docs[k][i]+=1
        else:
            docs[k][i]=0

【Question discussion】:

Tags: arrays python-3.x dictionary

【Solution 1】:

You can use Python's str.count to count how many times each word occurs in a sentence.

Check this code:

words_dict = {}
string_list=['philadelphia court excessive disappointed court hope','hope jurisdiction obscures acquittal court','mention hope maryland signal held problem internal reform life bolster level grievance']
words_list=['hope','court','mention','maryland']
for i in range(len(string_list)): #iterate over string list
    helper = {} #temporary dictionary
    for word in words_list: #iterate over word list
        x = string_list[i].count(word) #count no. of occurrences of word in sentence
        if x > 0:
            helper[word]=x
    words_dict["doc_"+str(i+1)]=helper #add temporary dictionary into final dictionary

#Print dictionary contents
for i in words_dict:
    print(i + ": " + str(words_dict[i]))

The output of the above code is:

doc_3: {'maryland': 1, 'mention': 1, 'hope': 1}
doc_2: {'court': 1, 'hope': 1}
doc_1: {'court': 2, 'hope': 1}
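One caveat with this approach: `str.count` counts substring occurrences, not whole words, so a word that appears inside a longer word is counted too. The sentence below is a hypothetical example chosen to show the difference; splitting the sentence first and using `list.count` compares whole tokens only.

```python
# Hypothetical sentence: 'courtyard' contains 'court', 'hopeful' contains 'hope'
sentence = 'the courtyard near the court gives hope to the hopeful'
words_list = ['hope', 'court']

# str.count matches substrings, so 'courtyard' and 'hopeful' are counted as well
substring_counts = {w: sentence.count(w) for w in words_list}

# Splitting on whitespace first means only whole tokens are compared
tokens = sentence.split()
whole_word_counts = {w: tokens.count(w) for w in words_list}

print(substring_counts)   # {'hope': 2, 'court': 2}
print(whole_word_counts)  # {'hope': 1, 'court': 1}
```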

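【Discussion】:

Could you explain what I did wrong in the second piece of code?
@Learner Your code is unclear; please correct it. In the first part, you define the dictionary as 'docs_dict' but then use 'docs_dic'. In the second part, at 'docs[k][i]+=1', you update a nested dictionary entry without initializing it first: docs[k] does not exist yet, so the lookup fails. That is the problem.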
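A minimal sketch of the asker's second snippet with the issues above fixed, assuming `docs_dict` is the dictionary built in the first step (the question's snippet iterates over `words_dic`, which is never defined). The inner dictionary is created before anything is written into it, and each word's count is taken with `list.count` on the split words instead of incrementing a key that does not exist yet.

```python
docs_dict = {
    'doc_1': 'philadelphia court excessive disappointed court hope',
    'doc_2': 'hope jurisdiction obscures acquittal court',
    'doc_3': 'mention hope maryland signal held problem internal reform life bolster level grievance',
}
words = ['hope', 'court', 'mention', 'maryland']

docs = {}
for k, v in docs_dict.items():
    split_words = v.split()
    docs[k] = {}  # create the inner dict before assigning into it
    for i in words:
        if i in split_words:
            # whole-word count of i in this document
            docs[k][i] = split_words.count(i)
print(docs)
```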

【Solution 2】:

Use Counter to get the word counts in each document.

Try this:

>>> from collections import Counter
>>> string_list = ['philadelphia court excessive disappointed court hope', 'hope jurisdiction obscures acquittal court', 'mention hope maryland signal held problem internal reform life bolster level grievance']
>>> words=['hope','court','mention','maryland']
>>> d = {}
>>> for i,doc in enumerate(string_list):
...     for word,count in Counter(doc.split()).items():
...         if word in words:
...             d.setdefault("doc_{}".format(i), {})[word]=count
...

Output:

>>> d
{'doc_0': {'court': 2, 'hope': 1}, 'doc_1': {'hope': 1, 'court': 1}, 'doc_2': {'mention': 1, 'hope': 1, 'maryland': 1}}
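Two details of this answer are worth noting. `enumerate` starts at 0, so the keys are `doc_0`, `doc_1`, ... rather than the `doc_1`, ... the question asks for; passing `start=1` fixes that. And because the inner dictionary is created by `setdefault` only when a listed word is found, a document containing none of the listed words gets no key at all. The sketch below uses a hypothetical third document with no listed words to show that second effect.

```python
from collections import Counter

string_list = ['philadelphia court excessive disappointed court hope',
               'hope jurisdiction obscures acquittal court',
               'signal held problem']  # hypothetical document with none of the listed words
words = ['hope', 'court', 'mention', 'maryland']

d = {}
for i, doc in enumerate(string_list, start=1):  # 1-based keys: doc_1, doc_2, ...
    for word, count in Counter(doc.split()).items():
        if word in words:
            # setdefault creates the inner dict only when a listed word is found
            d.setdefault(f'doc_{i}', {})[word] = count

print(d)  # 'doc_3' is absent because its document matched nothing
```

If every document should appear, even with an empty dictionary, create `d[f'doc_{i}'] = {}` at the top of the outer loop instead.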

【Discussion】:

【Solution 3】:

It looks like the question here may help.

Below is my attempt at code that should do what you need.

from collections import Counter
string_list=['philadelphia court excessive disappointed court hope','hope jurisdiction obscures acquittal court','mention hope maryland signal held problem internal reform life bolster level grievance']
words=['hope','court','mention','maryland']


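result_dict = {}

for index, value in enumerate(string_list):
    string_split = value.split(" ")
    cntr = Counter(string_split)
    # keep only listed words that actually occur (a Counter returns 0 for missing keys)
    result = {key: cntr[key] for key in words if cntr[key] > 0}
    # 1-based keys to match the requested 'doc_1', 'doc_2', ...
    result_dict['doc_' + str(index + 1)] = result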

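Hope you find it useful.

【Discussion】:

What is string_list_list = [x.split(" ") for x in string_list] for?
That is a list comprehension. Here it creates "sub"-lists, one from the words of each sentence. For example, the first sublist created is ['philadelphia', 'court', 'excessive', 'disappointed', 'court', 'hope'].
You're right! Removed. Sorry, in the end it was easier to do this with enumerate, and I left that line in by accident.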
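For illustration, running the comprehension from the comment on the first two strings of the question (a trimmed input chosen here for brevity) gives one list of words per sentence:

```python
string_list = ['philadelphia court excessive disappointed court hope',
               'hope jurisdiction obscures acquittal court']

# One sub-list of words per sentence
string_list_list = [x.split(" ") for x in string_list]
print(string_list_list[0])
# ['philadelphia', 'court', 'excessive', 'disappointed', 'court', 'hope']
```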

【Solution 4】:

Try this:

from collections import Counter

string_list = ['philadelphia court excessive disappointed court hope',
               'hope jurisdiction obscures acquittal court',
               'mention hope maryland signal held problem internal reform life bolster level grievance']
words = ['hope', 'court', 'mention', 'maryland']

result = {f'doc_{i + 1}': {key: value
                           for key, value in Counter(string_list[i].split()).items()
                           if key in words}
          for i in range(len(string_list))}
print(result)

Output:

{'doc_1': {'court': 2, 'hope': 1}, 'doc_2': {'hope': 1, 'court': 1}, 'doc_3': {'mention': 1, 'hope': 1, 'maryland': 1}}
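A variation on this one-liner, sketched as an alternative rather than a correction: filtering the tokens before passing them to `Counter` means only the listed words are ever counted, and `enumerate(..., start=1)` replaces the `range(len(...))` indexing. Converting `words` to a set is an assumption made here for faster membership tests; the result is the same with the original list.

```python
from collections import Counter

string_list = ['philadelphia court excessive disappointed court hope',
               'hope jurisdiction obscures acquittal court',
               'mention hope maryland signal held problem internal reform life bolster level grievance']
words = {'hope', 'court', 'mention', 'maryland'}  # set: average O(1) membership tests

# Count only the tokens that are in `words`; dict() turns each Counter into a plain dict
result = {
    f'doc_{i}': dict(Counter(w for w in doc.split() if w in words))
    for i, doc in enumerate(string_list, start=1)
}
print(result)
```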

【Discussion】: