检查内容时展平嵌套字典答案

【问题标题】：Flatten nested dictionaries while checking content检查内容时展平嵌套字典
【发布时间】：2014-09-04 03:13:53
【问题描述】：

我有一本这样的字典：

source = {

    'Section 1' : {
        'range'       : [0, 200],
        'template'    : 'ID-LOA-XXX',
        'nomenclature': True
    },

    'Section 2' : {
        'range'       : [201, 800],
        'template'    : 'ID-EPI-XXX',
        'nomenclature': False,
        'Subsection 1' : {
            'range'       : [0, 400],
            'template'    : 'ID-EPI-S1-XXX',
            'nomenclature': False,
            'Subsubsection 1' : {
                'range'       : [0, 400],
                'template'    : 'ID-EPI-S12-XXX',
                'nomenclature': False
            }
        },
        'Subsection 2' : {
            'range'       : [0, 400],
            'template'    : 'ID-EPI-S2-XXX',
            'nomenclature': False
        }
    }, 

    # etc.

}

从 JSON 文件加载。我想将其“展平”到以下字典：

target = {

    'Section 1' : {
        'range'       : [0, 200],
        'template'    : 'ID-LOA-XXX',
        'nomenclature': True,
        'location'    : './Section 1/'
    },

    'Section 2' : {
        'range'       : [201, 800],
        'template'    : 'ID-EPI-XXX',
        'nomenclature': False,
        'location'    : './Section 2/'
    },

    'Subsection 1' : {
        'range'       : [0, 400],
        'template'    : 'ID-EPI-S1-XXX',
        'nomenclature': False,
        'location'    : './Section 2/Subsection 1/'
    },

    'Subsubsection 1' : {
        'range'       : [0, 400],
        'template'    : 'ID-EPI-S12-XXX',
        'nomenclature': False,
        'location'    : './Section 2/Subsection 1/Subsubsection 1'
    },

    'Subsection 2' : {
        'range'       : [0, 400],
        'template'    : 'ID-EPI-S2-XXX',
        'nomenclature': False,
        'location'    : './Section 2/Subsection 2/'
    },

    # etc.

}

我也许能够更改原始 JSON 文件的生成方式，但我不想去那里。

word 中的 JSON 文件：每个部分至少包含三个键，可能包含其他键。那些其他键被解释为包含在当前节中的子节，每个都是具有相同属性的dict。这种模式原则上可以无限深地递归。

我还想执行一些断言：

是否存在所有必填字段（'range'、'template' 和 'nomenclature'）
必填字段的值通过某些断言

到目前为止，我只设法进行了这些检查：

import json

key_requirements = {
    "nomenclature": lambda x : isinstance(x, bool),
    "template"    : lambda x : isinstance(x, str)  and "X" in x,
    "range"       : lambda x : isinstance(x, list) and len(x)==2 and all([isinstance(y,int) for y in x]) and x[1] > x[0]
}

def checkSection(section):

    for key in section:
        if key not in key_requirements:            
            checkSection(section[key])

        elif not key_requirements[key]( section[key] ): 
            # error: assertion failed
            pass

        else:      
            # error: key not present
            pass

for key in source # json.load(open(myJsonFile))
    checkSection(data[key])

但目前，再多的咖啡也无法让我想出一种高效、优雅、pythonic 的方式来将所需的转换编织到这个方案中......

有什么建议或想法吗？

【问题讨论】：

标签： python json dictionary map flatten

【解决方案1】：

这个问题需要递归遍历，除非你想要一些第三方库（是的，有解决方案），否则你需要一个简单的本地递归遍历

注意路径语义可能与我在 windows 上的不同

实施

def flatten(source):
    target = {}
    def helper(src, path ='.', last_key = None):
        if last_key: 
            target[last_key] = {}
            target[last_key]['location'] = path
        for key, value in src.items():
            if isinstance(value, dict):
                helper(value, os.path.join(path, key), key)

            else:
                target[last_key][key] = value

    helper(source)
    return target

输出

>>> pprint.pprint(source)
{'Section 1': {'nomenclature': True,
               'range': [0, 200],
               'template': 'ID-LOA-XXX'},
 'Section 2': {'Subsection 1': {'Subsubsection 1': {'nomenclature': False,
                                                    'range': [0, 400],
                                                    'template': 'ID-EPI-S12-XXX'},
                                'nomenclature': False,
                                'range': [0, 400],
                                'template': 'ID-EPI-S1-XXX'},
               'Subsection 2': {'nomenclature': False,
                                'range': [0, 400],
                                'template': 'ID-EPI-S2-XXX'},
               'nomenclature': False,
               'range': [201, 800],
               'template': 'ID-EPI-XXX'}}
>>> pprint.pprint(flatten(source))
{'Section 1': {'location': '\\Section 1',
               'nomenclature': True,
               'range': [0, 200],
               'template': 'ID-LOA-XXX'},
 'Section 2': {'location': '\\Section 2',
               'nomenclature': False,
               'range': [201, 800],
               'template': 'ID-EPI-XXX'},
 'Subsection 1': {'location': '\\Section 2\\Subsection 1',
                  'nomenclature': False,
                  'range': [0, 400],
                  'template': 'ID-EPI-S1-XXX'},
 'Subsection 2': {'location': '\\Section 2\\Subsection 2',
                  'nomenclature': False,
                  'range': [0, 400],
                  'template': 'ID-EPI-S2-XXX'},
 'Subsubsection 1': {'location': '\\Section 2\\Subsection 1\\Subsubsection 1',
                     'nomenclature': False,
                     'range': [0, 400],
                     'template': 'ID-EPI-S12-XXX'}}

【讨论】：

很好，很好...但是None: {} 在那里做什么？ :)
为什么不把path = '.\\'改成path = '.'？
@RodyOldenhuis：已修复，我忽略了一些错误。
是的，就是这样。比我的优雅多了。非常感谢！

【解决方案2】：

我最终得到了这个解决方案：

import os

key_requirements = {
    "nomenclature": lambda x : isinstance(x, bool),
    "template"    : lambda x : isinstance(x, str)  and "X" in x,
    "range"       : lambda x : isinstance(x, list) and len(x)==2 and all([isinstance(y,int) for y in x]) and x[1] > x[0]
}


def checkAndFlattenData(data):

    def merge_dicts(dict1,dict2):
        return dict(list(dict1.items()) + list(dict2.items()))


    def check_section(section, section_content):

        section_out = {
            'range'   : section_content['range'],
            'template': section_content['template'],
            'location': section
        }
        nested_section_out = {}

        for key,value in section_content.iteritems():

            if key not in key_requirements:
                if not isinstance(value,dict):
                    # error: invalid key
                    pass

                else:
                    nested_section_out[key], recurse_out = check_section(key,value)
                    nested_section_out = merge_dicts(nested_section_out, recurse_out)


            elif not key_requirements[key](value):
                print "ASSERTION FAILED!"# error: field assertion failed
                pass

        for key in nested_section_out:
            nested_section_out[key]['location'] = os.path.join(section, nested_section_out[key]['location'])

        return section_out, nested_section_out

    new_data = {}
    for key,value in data.iteritems():
        new_data[key], nested_data = check_section(key, value)
        new_data = merge_dicts(new_data, nested_data)

    for key,value in new_data.iteritems():
        new_data[key]['location'] = os.path.join('.', new_data[key]['location'])

    return new_data


target = checkAndFlattenData(source)

但我不禁觉得这一切都可以做得更 Pythonic（和/或更高效）......如果有人有任何建议，请不要犹豫，复制粘贴并进行改进在一个独立的答案中，所以我可以接受。

【讨论】：

【解决方案3】：

这适用于您的情况：

output = {}
for key, value in source.iteritems():
    item = {}
    for nested_key, nested_value in value.iteritems():
        if type(nested_value) == type({}):
            nested_item = {}
            for nested_key_2, nested_value_2 in nested_value.iteritems():
                nested_item[nested_key_2] = nested_value_2
            output[nested_key] = nested_item
        else:
            item[nested_key] = nested_value
    output[key] = item

【讨论】：

...除了 “原则上这种模式可以无限深递归。”，以及额外的 location 键。但这可以重新设计:)