【Posted】: 2020-05-21 11:56:53
【Question】:
I have a JSON file containing some information about words. The structure is a list of dicts, as follows:
file = [{"index": "1", "text": "uhm", "eos": false}, {"index": "2", "text": "moeten", "eos": false}, {"index": "3", "text": "langs", "eos": false}, {"index": "4", "text": "uhm", "eos": true}, {"index": "1", "text": "uh", "eos": false}, {"index": "2", "text": "om", "eos": false}, {"index": "3", "text": "die", "eos": false}, {"index": "4", "text": "afsluiters", "eos": true}]
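The list above uses the JSON booleans `false`/`true`, which are not valid Python names. Once the data is parsed with the standard `json` module they become `False`/`True`. A minimal sketch, assuming the data is available as a JSON string (the names `raw` and `words` are illustrative, not from the original post):

```python
import json

# A shortened copy of the data, held as a JSON string (illustrative)
raw = '[{"index": "1", "text": "uhm", "eos": false}, {"index": "4", "text": "uhm", "eos": true}]'

# json.loads turns JSON false/true into Python False/True
words = json.loads(raw)
print(words[0]["eos"], words[1]["eos"])
```

For a file on disk, `json.load(f)` on an open file object does the same job.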
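I need to preprocess the data for further analysis, so I wrote the following function. It works, but it doesn't look very elegant. How can I improve it to make it more readable, less redundant, and nicer to look at? =)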
def prepare(file):
    # set up variables
    text = []
    sent_dict = {}
    sentence = ""
    chunks = []
    ngram = ""
    maxn = 5
    for word in file:
        if word["eos"] == False:
            # concatenate words
            sentence += word["text"] + " "
            # get last five elements of sentence excluding last space and make chunk
            chunk = " ".join(sentence.split(" ")[:-1][-maxn:])
            index = word["index"]
            chunks.append({index: {"ngram" : chunk}})
        else:
            # concatenate words without last space
            sentence += word["text"]
            # get last five elements of sentence and make chunk
            chunk = " ".join(sentence.split(" ")[-maxn:])
            index = word["index"]
            chunks.append({index: {"ngram" : chunk}})
            # make dict with sentence and list of chunks
            sent_dict["sentence"] = sentence
            sent_dict["chunks"] = chunks
            text.append(sent_dict)
            # set variables back to default
            sent_dict = {}
            sentence = ""
            chunks = []
    return(text)
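If you run prepare(file), it returns a list like the following (the first sentence is shown; the second follows the same pattern):
[{'sentence': 'uhm moeten langs uhm', 'chunks': [{'1': {'ngram': 'uhm'}}, {'2': {'ngram': 'uhm moeten'}}, {'3': {'ngram': 'uhm moeten langs'}}, {'4': {'ngram': 'uhm moeten langs uhm'}}]}, ...]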
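One possible way to reduce the duplication between the two branches is to collect the words of the current sentence in a list and join them only when needed. This is a rough sketch, not a definitive answer: the name `prepare_compact` and the `maxn` parameter are illustrative, and it assumes that no `text` value is empty or contains a space (under that assumption it should match the original function's output).

```python
def prepare_compact(words, maxn=5):
    """Group words into sentences and record the trailing n-gram at each word."""
    text = []
    tokens = []  # words of the sentence currently being built
    chunks = []  # one {index: {"ngram": ...}} entry per word
    for word in words:
        tokens.append(word["text"])
        # the n-gram is the last `maxn` words seen so far in this sentence
        chunks.append({word["index"]: {"ngram": " ".join(tokens[-maxn:])}})
        if word["eos"]:
            # end of sentence: store the result and start a fresh sentence
            text.append({"sentence": " ".join(tokens), "chunks": chunks})
            tokens, chunks = [], []
    return text


# Sample data in the shape described in the question (Python booleans)
words = [
    {"index": "1", "text": "uhm", "eos": False},
    {"index": "2", "text": "moeten", "eos": False},
    {"index": "3", "text": "langs", "eos": False},
    {"index": "4", "text": "uhm", "eos": True},
    {"index": "1", "text": "uh", "eos": False},
    {"index": "2", "text": "om", "eos": False},
    {"index": "3", "text": "die", "eos": False},
    {"index": "4", "text": "afsluiters", "eos": True},
]
result = prepare_compact(words)
```

Because the chunk is built the same way for every word, the only thing the `eos` flag has to decide is whether to close the sentence. As in the original function, trailing words with no final `eos` marker are dropped.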
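【Comments】:
-
Please show us sample output from the function you wrote, and also an example of the output you want.
-
If you pass the list at the top of the question to the function, it returns exactly the output I want. It is a working example.
-
Yes, but please post an example anyway. Many people can come up with a solution without reading the code.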
Tags: python list dictionary for-loop