【Question title】: Python parse nested JSON file and take out specific attributes
【Posted】: 2021-10-22 18:38:32
【Question】:

I have a large JSON file that looks like this:

data = {
    "Module1": {
        "Description": "",
        "Layer": "1",
        "SourceDir": "pathModule1",
        "Attributes": {
            "some",
        },
        "Vendor": "comp",
        "components":{
            "Component1": {
               "path": "something",
               "includes": [
                   "include1",
                   "include2",
                   "include3",
                   "include4",
                   "include5"
               ],
               "generated": "txt",
               "memory": "txt",
               # etc.
            },
            "Component2":{
               "path": "something",
               "includes": [
                   "include1",
                   "include2",
                   "include3",
                   "include4",
                   "include5"
               ],
               "generated": "txt",
               "memory": "txt",
               # etc.
            }
        }
    },
    "Module2": {
        "Description": "",
        "Layer": "2",
        "SourceDir": "pathModule2",
        "Attributes": {
            "some",
        },
        "Vendor": "comp",
        "components":{
            "Component1": {
               "path": "something",
               "includes": [
                   "include1",
                   "include2",
                   "include3",
                   "include4",
                   "include5"
               ],
               "generated": "txt",
               "memory": "txt",
               # etc.
            },
            "Component2":{
               "path": "something",
               "includes": [
                   "include1",
                   "include2",
                   "include3",
                   "include4",
                   "include5"
               ],
               "generated": "txt",
               "memory": "txt",
               # etc.
            }
        }
    },
    "Module3": {
        "Description": "",
        "Layer": "3",
        "SourceDir": "path",
        "Attributes": {
            "some",
        },
        "Vendor": "",
    },
    "Module4": {
        "Description": "",
        "Layer": "4",
        "SourceDir": "path",
        "Attributes": {
            "some",
        }
    }
}

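I have to go through it and pull out a few things, so that in the end I get this:

Whenever the Vendor field equals "comp", take that module into account: its SourceDir field, all of its components, and each component's path and includes.

So the output would be: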

Module1, "pathModule1", components: [Component1, path, [includes: include1, include2 ,include3 ,include4 ,include5 ]], [Component2, path, includes: [include1, include2 ,include3 ,include4 ,include5 ]]

Module2, "pathModule2", components: [Component1, path, [includes: include1, include2 ,include3 ,include4 ,include5 ]], [Component2, path, includes: [include1, include2 ,include3 ,include4 ,include5 ]]

I'm really struggling to access all the fields I need.

My current code looks like this:

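import json

with open("DB.json", "r") as f:
    modules = json.load(f)

list_components = []
sourceDirList = []
list_sw_objects = []

for k in modules.keys():
    try:
        if modules[k]["Vendor"] == "comp":
            list_components.append(k)
            sourceDirList.append(modules[k]["SourceDir"])
            for i in modules[k]["components"]:
                # Appends the whole components dict once per component.
                list_sw_objects.append(modules[k]["components"])
    except KeyError:
        # Skip modules that lack one of the expected keys.
        continue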

I managed to get only Module1 and its SourceDir, but not Component1, Component2 and their attributes. How can I do that?

Thanks!

【Comments】:

    Tags: python json attributes


    【Solution 1】:

    I would first filter out the items you are not interested in, like this:

    data2 = {k: v for k, v in data.items() if v.get("Vendor") == "comp"}
    

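    This drops every module you don't want. It is slightly inefficient, because you walk the dictionary a second time to get the data into the format you need, but keeping the filter as a separate first step makes the code easier to reason about, which helps!

    At this point you can iterate over the dictionary again if needed. The filtered dictionary looks like this: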
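
A small sketch of why `.get` is used in the filter, assuming a trimmed-down `sample` dict (hypothetical, modelled on the question's data): a module with no `Vendor` key at all, like Module4, is skipped instead of raising `KeyError`.

```python
# Trimmed-down stand-in for the question's data (hypothetical values).
sample = {
    "Module1": {"Vendor": "comp", "SourceDir": "pathModule1"},
    "Module3": {"Vendor": "", "SourceDir": "path"},
    "Module4": {"SourceDir": "path"},  # no "Vendor" key at all
}

# .get returns None for a missing key, so Module4 is filtered out
# rather than raising KeyError as sample["Module4"]["Vendor"] would.
filtered = {k: v for k, v in sample.items() if v.get("Vendor") == "comp"}
print(list(filtered))
```

With plain indexing (`v["Vendor"]`) the comprehension would fail on the first module that lacks the key, which is why the question's code needed a `try`/`except KeyError`.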

    {'Module1': {'Attributes': {'some'},
                 'Description': '',
                 'Layer': '1',
                 'SourceDir': 'pathModule1',
                 'Vendor': 'comp',
                 'components': {'Component1': {'includes': ['include1',
                                                            'include2',
                                                            'include3',
                                                            'include4',
                                                            'include5'],
                                               'path': 'something'},
                                'Component2': {'includes': ['include1',
                                                            'include2',
                                                            'include3',
                                                            'include4',
                                                            'include5'],
                                               'path': 'something'}}},
     'Module2': {'Attributes': {'some'},
                 'Description': '',
                 'Layer': '2',
                 'SourceDir': 'pathModule2',
                 'Vendor': 'comp',
                 'components': {'Component1': {'includes': ['include1',
                                                            'include2',
                                                            'include3',
                                                            'include4',
                                                            'include5'],
                                               'path': 'something'},
                                'Component2': {'includes': ['include1',
                                                            'include2',
                                                            'include3',
                                                            'include4',
                                                            'include5'],
                                               'path': 'something'}}}}
    

    To print only the source directory and the components, you can do:

    for k,v in data2.items():
        print(k, v["SourceDir"], v["components"])
    

    This gives you:

    Module1 pathModule1 {'Component1': {'path': 'something', 'includes': ['include1', 'include2', 'include3', 'include4', 'include5']}, 'Component2': {'path': 'something', 'includes': ['include1', 'include2', 'include3', 'include4', 'include5']}}
    Module2 pathModule2 {'Component1': {'path': 'something', 'includes': ['include1', 'include2', 'include3', 'include4', 'include5']}, 'Component2': {'path': 'something', 'includes': ['include1', 'include2', 'include3', 'include4', 'include5']}}
    

    Edit: to refine the output further, you can change the loop above to:

    for k,v in data2.items():
        components = [(comp_name, comp_data["path"], comp_data["includes"]) for comp_name, comp_data in v["components"].items()]
        print(k, v["SourceDir"], components)
    

    This gives you:

    Module1 pathModule1 [('Component1', 'something', ['include1', 'include2', 'include3', 'include4', 'include5']), ('Component2', 'something', ['include1', 'include2', 'include3', 'include4', 'include5'])]
    Module2 pathModule2 [('Component1', 'something', ['include1', 'include2', 'include3', 'include4', 'include5']), ('Component2', 'something', ['include1', 'include2', 'include3', 'include4', 'include5'])]
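
If the second pass over the dictionary is a concern, the filter and the extraction can be folded into one loop. This is only a sketch: the function name `extract_modules` and the `sample` data are made up for illustration, and the `.get` defaults are an assumption so that modules or components with missing keys do not raise `KeyError`.

```python
def extract_modules(data, vendor="comp"):
    """Yield (module name, SourceDir, components) for modules of one vendor."""
    for name, module in data.items():
        if module.get("Vendor") != vendor:
            continue  # skip other vendors and modules without a Vendor key
        # Keep only name, path and includes; other attributes are ignored.
        components = [
            (comp_name, comp.get("path"), comp.get("includes", []))
            for comp_name, comp in module.get("components", {}).items()
        ]
        yield name, module.get("SourceDir"), components


# Hypothetical, shortened version of the question's data.
sample = {
    "Module1": {
        "Vendor": "comp",
        "SourceDir": "pathModule1",
        "components": {
            "Component1": {
                "path": "something",
                "includes": ["include1"],
                "generated": "txt",
            },
        },
    },
    "Module3": {"Vendor": "", "SourceDir": "path"},
}

result = list(extract_modules(sample))
for name, source_dir, components in result:
    print(name, source_dir, components)
```

A generator keeps the result lazy, so it can be printed, collected into a list, or written out elsewhere without changing the extraction logic.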
    

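    【Discussion】:

    • This is a great solution, man, just 3 lines to get the job done. One more thing: I actually have more attributes inside Component1, Component2 and so on, but I only need the name, path and includes, as printed now, with all other attributes excluded. I'll update my question now, sorry about that.
    • Edited to add some more targeted filtering. Hopefully that's what you're after?
    • That's it! Thank you, man! Awesome. I had been trying with 50 lines, and you did it in 3.
    • I'm now running into a big problem trying to create some .txt files to store all of this. Can I update the question, or should I post a new one?
    • You should open a new question for that bit. Edits should only clarify the question, not extend it!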