【Question Title】: Keeping duplicate tuple values if they are present in separate tuples, but only keeping unique tuple values if they are in the same tuple
【Posted】: 2020-05-02 04:58:11
【Question Description】:

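I have a list of tuples (shown below).

Within each tuple, I want to keep only the entities whose numbers are unique; the same number may still appear in a different tuple.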

dataset = [('made of iron oxide', {'entities': [(12, 16, 'PRODUCT'), (17, 20, 'PRODUCT'), (15, 24, 'PRODUCT'), (12, 19, 'PRODUCT')]}),('made of ferric oxide', {'entities': [(10, 15, 'PRODUCT'), (17, 20, 'PRODUCT'), (624, 651, 'PRODUCT'), (30, 15, 'PRODUCT'), (1937, 1956, 'PRODUCT')]})]

From this, the expected output is:

[('made of iron oxide', {'entities': [(12, 16, 'PRODUCT'), (17, 20, 'PRODUCT'), (15, 24, 'PRODUCT')]}), ('made of ferric oxide', {'entities': [(10, 15, 'PRODUCT'), (17, 20, 'PRODUCT'), (624, 651, 'PRODUCT'), (1937, 1956, 'PRODUCT')]})]

My code so far:

dataset = [('made of iron oxide', {'entities': [(12, 16, 'PRODUCT'), (17, 20, 'PRODUCT'), (15, 24, 'PRODUCT'), (12, 19, 'PRODUCT')]}),('made of ferric oxide', {'entities': [(10, 15, 'PRODUCT'), (17, 20, 'PRODUCT'), (624, 651, 'PRODUCT'), (30, 15, 'PRODUCT'), (1937, 1956, 'PRODUCT')]})]

seen_values = []
clean_data = []

# loop through each sentence and dict of values
for sentence, values in dataset:
    for value in values['entities']:

        if value[0] in seen_values:
            # remove if we have seen this before
            values['entities'].remove(value)
        else:
            # add to list if we have not seen this before
            seen_values.append(value[0])
    clean_data.append((sentence, values))

print(clean_data)

This gives the following, which wrongly drops (17, 20) from the second tuple and keeps (30, 15):

[('made of iron oxide', {'entities': [(12, 16, 'PRODUCT'), (17, 20, 'PRODUCT'), (15, 24, 'PRODUCT')]}), ('made of ferric oxide', {'entities': [(10, 15, 'PRODUCT'), (624, 651, 'PRODUCT'), (30, 15, 'PRODUCT'), (1937, 1956, 'PRODUCT')]})]

Can anyone help me fix this?

【Comments】:

    Tags: python python-3.x list dictionary tuples


    【Solution 1】:

    Try this:

    data = dataset = [('made of iron oxide', {'entities': [(12, 16, 'PRODUCT'), (17, 20, 'PRODUCT'), (15, 24, 'PRODUCT'), (12, 19, 'PRODUCT')]}),('made of ferric oxide', {'entities': [(10, 15, 'PRODUCT'), (17, 20, 'PRODUCT'), (624, 651, 'PRODUCT'), (30, 15, 'PRODUCT'), (1937, 1956, 'PRODUCT')]})]
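    res = []
    for x in data:
        # t collects the kept entities; flag holds numbers seen in this tuple only
        t, flag = [], set()
        for d in x[1]['entities']:
            # keep the entity only if neither of its two numbers was seen before
            if d[0] not in flag and d[1] not in flag:
                t.append(d)
            # record both numbers, whether or not the entity was kept
            flag.update(d[:2])
        res.append((x[0], {'entities': t}))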
    
    print(res)
    

    Output:

    [('made of iron oxide', {'entities': [(12, 16, 'PRODUCT'), (17, 20, 'PRODUCT'), (15, 24, 'PRODUCT')]}), ('made of ferric oxide', {'entities': [(10, 15, 'PRODUCT'), (17, 20, 'PRODUCT'), (624, 651, 'PRODUCT'), (1937, 1956, 'PRODUCT')]})]
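
For reference, here is a minimal sketch of the same idea wrapped in a reusable function. The name `dedupe_entities` and the variable names are illustrative assumptions, not part of the original answer. It builds new lists instead of calling `remove()` on the list being iterated (removing during iteration makes the loop skip the element right after each removal), and it resets the set of seen numbers for every sentence, so a number used in one tuple does not affect another.

```python
def dedupe_entities(dataset):
    """Return a new list; per sentence, keep only entities whose start and
    end numbers have not already appeared earlier in that same sentence.

    Hypothetical helper for illustration; the input is not modified.
    """
    cleaned = []
    for sentence, annotations in dataset:
        seen = set()  # reset for every sentence, so tuples stay independent
        kept = []
        for start, end, label in annotations['entities']:
            if start not in seen and end not in seen:
                kept.append((start, end, label))
            # record both numbers even when the entity is dropped,
            # matching the behaviour of the loop in the answer above
            seen.update((start, end))
        cleaned.append((sentence, {'entities': kept}))
    return cleaned


dataset = [
    ('made of iron oxide', {'entities': [
        (12, 16, 'PRODUCT'), (17, 20, 'PRODUCT'),
        (15, 24, 'PRODUCT'), (12, 19, 'PRODUCT')]}),
    ('made of ferric oxide', {'entities': [
        (10, 15, 'PRODUCT'), (17, 20, 'PRODUCT'), (624, 651, 'PRODUCT'),
        (30, 15, 'PRODUCT'), (1937, 1956, 'PRODUCT')]}),
]

result = dedupe_entities(dataset)
print(result)
```

Because the function only reads from `dataset`, the original list can be reused afterwards, for example to compare the raw and cleaned entity counts.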
    

    【Discussion】:
