Merging two lists of nested dictionaries by similar values in Python

【Question title】: Merging two lists of nested dictionaries by the similar values in Python
【Posted】: 2018-09-08 03:43:38
【Question description】:

I have two lists of nested dictionaries:

lofd1 = [{'A': {'facebook':{'handle':'https://www.facebook.com/pages/New-Jersey/108325505857259','logo_id': None}, 'contact':{'emails':['nj@nj.gov','state@nj.gov']},'state': 'nj', 'population':'12345', 'capital':'Jersey','description':'garden state'}}]
lofd2 = [{'B':{'building_type':'ranch', 'city':'elizabeth', 'state':'nj', 'description':'the state close to NY'}}]

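I need to:

Merge similar dictionaries across the lists using the value of the 'state' key (e.g., merge all dictionaries where 'state' = 'nj' into a single dictionary).
Include once any key/value pair that appears in both dictionaries (e.g., 'state' should be 'nj' for both).
Include key/value pairs that are present in only one of the dictionaries (e.g., 'population' and 'capital' from lofd1, and 'building_type' and 'city' from lofd2).
Exclude certain values from the dictionaries, e.g., 'logo_id': None.
Put the 'description' values from both dictionaries into a list of strings, e.g. 'description': ['garden state', 'the state close to NY'].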

The final dataset should look like this:

lofd_final = [{'state': 'nj', 'facebook':{'handle':'https://www.facebook.com/pages/New-Jersey/108325505857259'},'population':'12345', 'capital':'Jersey', 'contact':{'emails':['nj@nj.gov','state@nj.gov']}, 'description': ['garden state','the state close to NY'],'building_type':'ranch', 'city':'elizabeth'}]

What would be an efficient solution?

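【Question discussion】:

Have you looked at any of the solutions proposed here: stackoverflow.com/questions/38987/…? They might help with what you are trying to achieve.
@Jesse I looked at a similar solution; the problem is that I have a list of dictionaries rather than a standalone dictionary.
What are the possible structures of the dictionaries? Does each dictionary have only one top-level key, as in your example, or can a single dictionary have several keys, e.g. 'A', 'B', 'C'?
@the-realtom lofd2 has a single level; lofd1 has some nested dictionaries and a list inside a dictionary (I updated the example).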

Tags: python-3.x list dictionary intersection

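【Solution 1】:

Here is a solution tailored to your specific case. Its time complexity is O(n*m), where n is the number of dictionaries in the lists and m is the number of keys per dictionary, because each key of each dictionary is visited exactly once.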

def extract_data(lofd, output):
    for d in lofd:
        for top_level_key in d: # This will be the A or B key from your example
            data = d[top_level_key] 
            state = data['state']
            if state not in output: # Create the state entry for the first time
                output[state] = {}
            # Now update the state entry with the data you care about
            for key in data:
                # Handle descriptions
                if key == 'description':
                    if 'description' not in output[state]:
                        output[state]['description'] = [data['description']]
                    else:
                        output[state]['description'].append(data['description'])
                # Handle all other keys
                else:
                    # Handle facebook key (exclude logo_id)
                    if key == 'facebook':
                        # pop() with a default avoids a KeyError when 'logo_id' is absent
                        data['facebook'].pop('logo_id', None)
                    output[state][key] = data[key]

output = {}
extract_data(lofd1, output)
extract_data(lofd2, output)
print(list(output.values()))

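output will be a dict of dicts whose top-level keys are the states. To convert it into the form you specified, just extract the values into a flat list: list(output.values()) (see the example above).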

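Note: I am assuming that a deep copy is not needed, i.e. that you will not go on to manipulate the values in lofd1 and lofd2 after extracting the data. Be aware that the function removes 'logo_id' from the input dictionaries in place, and that the merged result shares its nested objects with the inputs. The solution is also based strictly on the specification given; for example, if more nested keys need to be excluded, you will have to add the extra filters yourself.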
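If the inputs do need to stay untouched, or if None values can turn up under other nested keys, one possible variant is sketched below. The helper names strip_none and merge_by_state are hypothetical and not part of the answer above. The sketch assumes that every inner dictionary has a 'state' key, that nesting consists only of dicts and lists, and that any dict entry whose value is None should be dropped at any depth (a broader rule than excluding 'logo_id' alone).

```python
def strip_none(value):
    """Return a copy of value with None-valued dict entries removed, recursively."""
    if isinstance(value, dict):
        return {k: strip_none(v) for k, v in value.items() if v is not None}
    if isinstance(value, list):
        return [strip_none(v) for v in value]
    return value


def merge_by_state(*lists_of_dicts):
    """Merge the inner dicts of several lists, grouped by their 'state' value."""
    merged = {}
    for lofd in lists_of_dicts:
        for d in lofd:
            for data in d.values():  # inner dict stored under 'A', 'B', ...
                clean = strip_none(data)  # fresh containers; inputs are not mutated
                entry = merged.setdefault(clean['state'], {})
                for key, value in clean.items():
                    if key == 'description':
                        # Collect every description into a list of strings
                        entry.setdefault('description', []).append(value)
                    else:
                        entry[key] = value
    return list(merged.values())


lofd1 = [{'A': {'facebook': {'handle': 'https://www.facebook.com/pages/New-Jersey/108325505857259', 'logo_id': None}, 'contact': {'emails': ['nj@nj.gov', 'state@nj.gov']}, 'state': 'nj', 'population': '12345', 'capital': 'Jersey', 'description': 'garden state'}}]
lofd2 = [{'B': {'building_type': 'ranch', 'city': 'elizabeth', 'state': 'nj', 'description': 'the state close to NY'}}]

lofd_final = merge_by_state(lofd1, lofd2)
print(lofd_final)
```

Because strip_none rebuilds every dict and list it walks, later changes to lofd_final do not leak back into lofd1 or lofd2, at the cost of copying the data once.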

【Discussion】: