How to efficiently insert key-value from one list of dictionaries to another based on a common key-value pair in Python?

[Posted]: 2021-05-14 00:41:18
[Question]:

I have a large list of dictionaries in Python. Here is an example:

big_list_dictionary = [{
    'name': 'test = 1',
    'id': 1,
    'value': 30
},{
    'name': 'apple = 1',
    'id': 2,
    'value': 70
},{
    'name': 'orange = 1',
    'id': 3,
    'value': 10
},{
    'name': 'balloon = 1',
    'id': 4,
    'value': 20
},{
    'name': 'airplane = 1',
    'id': 5,
    'value': 40
}]

I also have two lists of dictionaries that hold the totals:

total1 = [{
    'name': 'test',
    'total': 130
},{
    'name': 'apple',
    'total': 270
},{
    'name': 'orange',
    'total': 310
},{
    'name': 'balloon',
    'total': 420
},{
    'name': 'airplane',
    'total': 540
}]

total2 = [{
    'name': 'test',
    'total': 230
},{
    'name': 'apple',
    'total': 570
},{
    'name': 'orange',
    'total': 3210
},{
    'name': 'balloon',
    'total': 620
},{
    'name': 'airplane',
    'total': 940
}]

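Note that the name in total1 and total2 differs slightly from the name in big_list_dictionary: the trailing " = 1" is omitted.

How can I add the total values from total1 and total2 to big_list_dictionary so that the final result looks like this: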

[{
    'name': 'test = 1',
    'id': 1,
    'value': 30,
    'total2': 230,
    'total1': 130
},{
    'name': 'apple = 1',
    'id': 2,
    'value': 70,
    'total2': 570,
    'total1': 270
},{
    'name': 'orange = 1',
    'id': 3,
    'value': 10,
    'total2': 3210,
    'total1': 310
},{
    'name': 'balloon = 1',
    'id': 4,
    'value': 20,
    'total2': 620,
    'total1': 420
},{
    'name': 'airplane = 1',
    'id': 5,
    'value': 40,
    'total2': 940,
    'total1': 540
}]

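Currently, the way I am doing this is very slow.

    # For every item, scan every (total1, total2) pair: O(n * m) comparisons
    for item in big_list_dictionary:
        for t1,t2 in zip(total1,total2):
            if t1['name'] in item['name']:
                item['total1'] = t1['total']
                item['total2'] = t2['total']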

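How can I do this efficiently?

[Question discussion]:

Tags: python list dictionary optimization

[Solution 1]:

If the extra characters are always " = 1", you can build an intermediate mapping from the stripped name to each dictionary and then use the code below.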

big_list_dictionary = [{'name': 'test = 1','id': 1,'value': 30},{'name': 'apple = 1','id': 2,'value': 70},{'name': 'orange = 1','id': 3,'value': 10},{'name': 'balloon = 1','id': 4,'value': 20},{'name': 'airplane = 1','id': 5,'value': 40}]

total1 = [{ 'name': 'test','total': 130},{'name': 'apple','total': 270},
{'name': 'orange','total': 310},{'name': 'balloon','total': 420},{'name': 'airplane','total': 540}]
total2 = [{'name': 'test','total': 230},{'name': 'apple','total': 570},{'name': 'orange','total': 3210},{'name': 'balloon','total': 620},{'name': 'airplane','total': 940}]


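# Map the name without the ' = 1' suffix to the dictionary itself (same object, not a copy)
intermediate = {i['name'].split('=')[0].strip():i for i in big_list_dictionary}

# Updating through the mapping mutates the dictionaries inside big_list_dictionary
for t1, t2 in zip(total1, total2):
    intermediate[t1['name']]['total1'] = t1['total']
    intermediate[t1['name']]['total2'] = t2['total']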

print(big_list_dictionary)

Output

[{'name': 'test = 1', 'id': 1, 'value': 30, 'total1': 130, 'total2': 230},
 {'name': 'apple = 1', 'id': 2, 'value': 70, 'total1': 270, 'total2': 570},
 {'name': 'orange = 1', 'id': 3, 'value': 10, 'total1': 310, 'total2': 3210},
 {'name': 'balloon = 1', 'id': 4, 'value': 20, 'total1': 420, 'total2': 620},
 {'name': 'airplane = 1', 'id': 5, 'value': 40, 'total1': 540, 'total2': 940}]
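
The loop above pairs total1 and total2 by position through zip, so it relies on both lists listing the same names in the same order. As a minimal sketch of a variant that does not need that, each totals list can be indexed by name first. This assumes names are unique within each list; the helper name merge_totals and its suffix_sep parameter are hypothetical and not part of the original answer.

```python
def merge_totals(records, totals_by_label, suffix_sep="="):
    """Attach totals to records in place, matching on the name before suffix_sep.

    records: list of dicts whose 'name' looks like 'test = 1'
    totals_by_label: dict mapping an output key (e.g. 'total1') to a list
        of {'name': ..., 'total': ...} dicts
    """
    # Index every totals list by name once, so later lookups are O(1).
    lookups = {
        label: {t["name"]: t["total"] for t in totals}
        for label, totals in totals_by_label.items()
    }
    for record in records:
        key = record["name"].split(suffix_sep)[0].strip()
        for label, lookup in lookups.items():
            if key in lookup:  # records without a matching total stay untouched
                record[label] = lookup[key]
    return records


big_list_dictionary = [
    {"name": "test = 1", "id": 1, "value": 30},
    {"name": "apple = 1", "id": 2, "value": 70},
]
# The two totals lists are deliberately in different orders.
total1 = [{"name": "apple", "total": 270}, {"name": "test", "total": 130}]
total2 = [{"name": "test", "total": 230}, {"name": "apple", "total": 570}]

merge_totals(big_list_dictionary, {"total1": total1, "total2": total2})
print(big_list_dictionary)
```

The trade-off is one extra dictionary per totals list in exchange for not depending on list order.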

Benchmark

Measured with %%timeit -n10 -r10, with big_list_dictionary, total1 and total2 each of length 1000. The length was increased to show the difference in efficiency.

This solution

513 µs ± 17.2 µs per loop (mean ± std. dev. of 10 runs, 10 loops each)

Your solution

91.6 ms ± 1.19 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)

This solution is faster (513 µs < 91.6 ms). The difference between the two solutions will only grow as the length increases.
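
For anyone who wants to repeat the comparison outside a notebook, here is a rough sketch using the standard timeit module. The synthetic data (names such as item0042) and the function names nested_loop and with_index are assumptions made for illustration; the original benchmark data was not posted, so the timings will differ from the figures above.

```python
import copy
import timeit

N = 1000  # same length as in the benchmark above

# Synthetic data; zero-padded names keep the substring test unambiguous.
names = [f"item{i:04d}" for i in range(N)]
base = [{"name": f"{n} = 1", "id": i, "value": i} for i, n in enumerate(names)]
total1 = [{"name": n, "total": i} for i, n in enumerate(names)]
total2 = [{"name": n, "total": 2 * i} for i, n in enumerate(names)]


def nested_loop(records):
    # Approach from the question: len(records) * len(total1) comparisons.
    for item in records:
        for t1, t2 in zip(total1, total2):
            if t1["name"] in item["name"]:
                item["total1"] = t1["total"]
                item["total2"] = t2["total"]
    return records


def with_index(records):
    # Approach from this answer: build the index once, then O(1) lookups.
    intermediate = {r["name"].split("=")[0].strip(): r for r in records}
    for t1, t2 in zip(total1, total2):
        intermediate[t1["name"]]["total1"] = t1["total"]
        intermediate[t1["name"]]["total2"] = t2["total"]
    return records


# Sanity check: both approaches should produce identical results.
slow_result = nested_loop(copy.deepcopy(base))
fast_result = with_index(copy.deepcopy(base))
same = slow_result == fast_result

# Each timed call also pays for one deepcopy of the input.
for fn in (nested_loop, with_index):
    seconds = timeit.timeit(lambda: fn(copy.deepcopy(base)), number=10)
    print(f"{fn.__name__}: {seconds / 10 * 1e3:.3f} ms per call")
```

The deepcopy keeps each run starting from unmodified data, at the cost of adding the same copying overhead to both measurements.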

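Edit based on the comments:

I believe there are some elements in total1 and total2 that are not in big_list_dictionary. Assuming total1 and total2 contain the same elements in the same order, you can iterate over one of the lists and, whenever a name is not found, append a new dictionary to big_list_dictionary and add it to intermediate. This appends all new dictionaries to the end of big_list_dictionary; appending at the end is faster than inserting at arbitrary positions. However, if you do care about the order, then I believe your solution is as good as it gets.

Disclaimer: I have not tested this part of the code, because I have no input or output to check the desired behavior against.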

big_list_dictionary = [{'name': 'test = 1','id': 1,'value': 30},{'name': 'apple = 1','id': 2,'value': 70},{'name': 'orange = 1','id': 3,'value': 10},{'name': 'balloon = 1','id': 4,'value': 20},{'name': 'airplane = 1','id': 5,'value': 40}]
total1 = [{ 'name': 'test','total': 130},{'name': 'apple','total': 270},{'name': 'orange','total': 310},{'name': 'balloon','total': 420},{'name': 'airplane','total': 540}]
total2 = [{'name': 'test','total': 230},{'name': 'apple','total': 570},{'name': 'orange','total': 3210},{'name': 'balloon','total': 620},{'name': 'airplane','total': 940}]


intermediate = {i['name'].split('=')[0].strip():i for i in big_list_dictionary}


for t1 in total1:
    if t1['name'] not in intermediate:         
        temp_dict = {'name': t1['name'],'id': 0,'value': 0}
        big_list_dictionary.append(temp_dict)
        intermediate[t1['name']] = temp_dict


# insert rest of the answer from code above
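
Putting the two pieces together, a self-contained sketch could look like the following. The 'grape' entries are made-up test data standing in for a name that is missing from big_list_dictionary, and the placeholder id and value of 0 simply follow the snippet above; the desired values for such entries were never specified in the thread.

```python
big_list_dictionary = [
    {"name": "test = 1", "id": 1, "value": 30},
    {"name": "apple = 1", "id": 2, "value": 70},
]
# 'grape' is a made-up entry with no match in big_list_dictionary.
total1 = [
    {"name": "test", "total": 130},
    {"name": "apple", "total": 270},
    {"name": "grape", "total": 50},
]
total2 = [
    {"name": "test", "total": 230},
    {"name": "apple", "total": 570},
    {"name": "grape", "total": 90},
]

# Index the existing dictionaries by the name without the ' = 1' suffix.
intermediate = {d["name"].split("=")[0].strip(): d for d in big_list_dictionary}

# Assumes total1 and total2 hold the same names in the same order.
for t1, t2 in zip(total1, total2):
    record = intermediate.get(t1["name"])
    if record is None:
        # Unknown name: append a placeholder record and index it.
        record = {"name": t1["name"], "id": 0, "value": 0}
        big_list_dictionary.append(record)
        intermediate[t1["name"]] = record
    record["total1"] = t1["total"]
    record["total2"] = t2["total"]

print(big_list_dictionary)
```

Doing the missing-name check inside the same loop avoids a second pass over total1.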

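[Discussion]:

Thanks for the solution. This looks great. I am testing it now. So far it looks pretty fast. Currently running it on 187000 items.
If it works, please upvote and accept the solution
Will do. I have already upvoted the solution. I will select it as the answer as soon as I have the results.
OK .. I ran into other issues that are unrelated to your solution. Once I fix those and get the full results, I can add the time difference.
@Kuni The formatting got messed up in the comments; would you mind linking to a pastebin or dpaste?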