【Posted】: 2020-05-02 04:58:11
【Question】:
I have a list of tuples, and I want to keep only the entities with unique numbers within each tuple:
dataset = [('made of iron oxide', {'entities': [(12, 16, 'PRODUCT'), (17, 20, 'PRODUCT'), (15, 24, 'PRODUCT'), (12, 19, 'PRODUCT')]}),('made of ferric oxide', {'entities': [(10, 15, 'PRODUCT'), (17, 20, 'PRODUCT'), (624, 651, 'PRODUCT'), (30, 15, 'PRODUCT'), (1937, 1956, 'PRODUCT')]})]
From this, the expected output is:
[('made of iron oxide', {'entities': [(12, 16, 'PRODUCT'), (17, 20, 'PRODUCT'), (15, 24, 'PRODUCT')]}), ('made of ferric oxide', {'entities': [(10, 15, 'PRODUCT'), (17, 20, 'PRODUCT'), (624, 651, 'PRODUCT'), (1937, 1956, 'PRODUCT')]})]
My code so far:
dataset = [('made of iron oxide', {'entities': [(12, 16, 'PRODUCT'), (17, 20, 'PRODUCT'), (15, 24, 'PRODUCT'), (12, 19, 'PRODUCT')]}),('made of ferric oxide', {'entities': [(10, 15, 'PRODUCT'), (17, 20, 'PRODUCT'), (624, 651, 'PRODUCT'), (30, 15, 'PRODUCT'), (1937, 1956, 'PRODUCT')]})]
seen_values = []
clean_data = []
# loop through each sentence and dict of values
for sentence, values in dataset:
    for value in values['entities']:
        if value[0] in seen_values:
            # remove if we have seen this before
            values['entities'].remove(value)
        else:
            # add to list if we have not seen this before
            seen_values.append(value[0])
    clean_data.append((sentence, values))
print(clean_data)
This gives: [('made of iron oxide', {'entities': [(12, 16, 'PRODUCT'), (17, 20, 'PRODUCT'), (15, 24, 'PRODUCT')]}), ('made of ferric oxide', {'entities': [(10, 15, 'PRODUCT'), (624, 651, 'PRODUCT'), (30, 15, 'PRODUCT'), (1937, 1956, 'PRODUCT')]})]
Can anyone help me solve this?
【Discussion】:
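The code in the question seems to go wrong in three places: seen_values is shared across all sentences instead of being reset for each one, only the start offset (value[0]) is checked, and values['entities'] is modified while it is being iterated, which makes the loop skip the element after each removal. Below is a minimal sketch of one possible fix. It assumes that "unique number" means neither the start nor the end offset has already appeared in an entity kept earlier in the same sentence, which is an interpretation inferred from the expected output rather than something the question states. The function name dedupe_entities is hypothetical.

```python
def dedupe_entities(dataset):
    """Keep only entities whose start and end offsets are new within their sentence.

    Assumption: an entity is dropped if either of its offsets already
    appeared in an entity kept earlier in the same sentence.
    """
    clean_data = []
    for sentence, values in dataset:
        seen = set()   # reset for every sentence, not shared globally
        kept = []      # build a new list instead of removing while iterating
        for start, end, label in values['entities']:
            if start in seen or end in seen:
                continue  # one of the offsets was already used
            seen.update((start, end))
            kept.append((start, end, label))
        # copy the dict so the original dataset is left unchanged
        clean_data.append((sentence, {**values, 'entities': kept}))
    return clean_data


dataset = [
    ('made of iron oxide', {'entities': [
        (12, 16, 'PRODUCT'), (17, 20, 'PRODUCT'),
        (15, 24, 'PRODUCT'), (12, 19, 'PRODUCT')]}),
    ('made of ferric oxide', {'entities': [
        (10, 15, 'PRODUCT'), (17, 20, 'PRODUCT'), (624, 651, 'PRODUCT'),
        (30, 15, 'PRODUCT'), (1937, 1956, 'PRODUCT')]}),
]

result = dedupe_entities(dataset)
print(result)
```

On the sample data this drops (12, 19, 'PRODUCT') from the first sentence and (30, 15, 'PRODUCT') from the second, which matches the expected output given in the question. Using a set for seen keeps the membership test cheap, and building a new list avoids mutating the input.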
Tags: python python-3.x list dictionary tuples