【Question title】: Nested dictionary path find by value
【Posted】: 2021-11-30 21:04:50
【Question】:

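I'm using recursion to find the tree path of specific values in nested JSON. For example, in the JSON below I'm trying to find the full path to each src element. Note that I have 2 src elements with the same value. My current code works when the src values are different, but when two src values are identical the result is not what I expect.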

Current JSON:

{
  "imagepanel": {
    "image": [
      {
        "scaled_image": {
          "classes": "w-full",
          "aspect_ratios": "frame sm:4:3 xmed:4:3",
          "art_directions": [
            {
              "alt": "River waves",
              "src": "path/to/file/53339c03d67e6ee5-lesson-3.jpg",
              "type": "jpg",
              "media": "(min-width:900.1px)",
              "sizes": "50vw",
              "intrinsicwidth": "1411",
              "intrinsicheight": "1000"
            },
            {
              "alt": "River waves",
              "src": "path/to/file/53339c03d67e6ee5-lesson-3.jpg",
              "type": "jpg",
              "media": "(max-width:900px)",
              "sizes": "100vw",
              "intrinsicwidth": "0",
              "intrinsicheight": "0"
            }
          ]
        }
      }
    ],
    "title": "<p><span class=\"drop-cap\" drop-cap=\"true\">WHAT TO WATCH</span></p>",
    "cta_label": "SEE THE LIST",
    "left_image": true
  }
}

Current code:

import json
import pprint
def breadcrumb_finder(json_dict_or_list, value):
    if json_dict_or_list == value:
        return [json_dict_or_list]
    elif isinstance(json_dict_or_list, dict):
        for k, v in json_dict_or_list.items():
            child = breadcrumb_finder(v, value)
            if child:
                return [k] + child
    elif isinstance(json_dict_or_list, list):
        lst = json_dict_or_list
        for i in range(len(lst)):
            child = breadcrumb_finder(lst[i], value)
            if child:
                return [str(i)] + child

data = r'''{"imagepanel":{"image":[{"scaled_image":{"classes":"w-full","aspect_ratios":"frame sm:4:3 xmed:4:3","art_directions":[{"alt":"River waves","src":"path/to/file/53339c03d67e6ee5-lesson-3.jpg","type":"jpg","media":"(min-width:900.1px)","sizes":"50vw","intrinsicwidth":"1411","intrinsicheight":"1000"},{"alt":"River waves","src":"path/to/file/53339c03d67e6ee5-lesson-3.jpg","type":"jpg","media":"(max-width:900px)","sizes":"100vw","intrinsicwidth":"0","intrinsicheight":"0"}]}}],"title":"<p><span class=\"drop-cap\" drop-cap=\"true\">WHAT TO WATCH</span></p>","cta_label":"SEE THE LIST","left_image":true}}'''
data = json.loads(data)

all_src = ['path/to/file/53339c03d67e6ee5-lesson-3.jpg', 'path/to/file/53339c03d67e6ee5-lesson-3.jpg']
for src in all_src:
    nested_path = breadcrumb_finder(data, src)
    print(nested_path)

Current output:

['imagepanel', 'image', '0', 'scaled_image', 'art_directions', '0', 'src', 'path/to/file/53339c03d67e6ee5-lesson-3.jpg']
['imagepanel', 'image', '0', 'scaled_image', 'art_directions', '0', 'src', 'path/to/file/53339c03d67e6ee5-lesson-3.jpg']

                                                               ^^^ note index here

Expected output:

['imagepanel', 'image', '0', 'scaled_image', 'art_directions', '0', 'src', 'path/to/file/53339c03d67e6ee5-lesson-3.jpg']
['imagepanel', 'image', '0', 'scaled_image', 'art_directions', '1', 'src', 'path/to/file/53339c03d67e6ee5-lesson-3.jpg']

                                                               ^^^ note index here

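【Comments】:

  • I'm a bit confused here: if the src being searched for is the same one that was found first, why should the index be 1? Or do you just want to skip an index once it has already been found?
  • How about removing elements that have already been found?
  • @ZaidAlShatle I agree with you, but my requirement is to find all paths for a given src value. So if the src is found in multiple places, I need all of those paths :)
  • @ZhubeiFederer Seems like a good idea, but I'm not sure how to do that efficiently on a larger nested dict.

Tags: python dictionary recursion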


【Solution 1】:

After a bit of debugging, I found that the problem is in this part of the code:

elif isinstance(json_dict_or_list, list):
    lst = json_dict_or_list
    for i in range(len(lst)):
        child = breadcrumb_finder(lst[i], value)
        if child:
            return [str(i)] + child

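Because `return` runs as soon as `child` is non-empty, every element after `lst[i]` is ignored once a match is found. (The dict branch has the same early return, so keys after a matching key are skipped as well.)

So I changed your code slightly to use backtracking: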
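A minimal sketch of the same point, assuming you want to keep the question's return-value style (string indices, no shared state): instead of returning the first non-empty `child`, collect the paths from every child. The function name `all_breadcrumbs` is hypothetical, and the `data` below is a trimmed-down stand-in for the question's JSON.

```python
import json

def all_breadcrumbs(node, value):
    """Return a list of every path to `value`; each path ends with the value itself."""
    if node == value:
        return [[node]]
    if isinstance(node, dict):
        children = list(node.items())
    elif isinstance(node, list):
        # Keep the question's convention of string indices
        children = [(str(i), item) for i, item in enumerate(node)]
    else:
        return []  # a scalar that doesn't match contributes no paths
    paths = []
    for key, child in children:
        # No early return: every matching child contributes its paths
        for sub_path in all_breadcrumbs(child, value):
            paths.append([key] + sub_path)
    return paths

src = "path/to/file/53339c03d67e6ee5-lesson-3.jpg"
# Trimmed-down stand-in for the question's JSON (hypothetical sample)
data = json.loads(
    '{"imagepanel": {"image": [{"scaled_image": {"art_directions": ['
    '{"src": "path/to/file/53339c03d67e6ee5-lesson-3.jpg"},'
    '{"src": "path/to/file/53339c03d67e6ee5-lesson-3.jpg"}]}}]}}'
)
for path in all_breadcrumbs(data, src):
    print(path)
```

Each call returns a fresh list, so nothing has to be copied or popped; the trade-off is that intermediate lists are rebuilt at every level, which is usually fine for JSON of this size.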

import json

results = []

def breadcrumb_finder(json_dict_or_list, value, path, result):
    if json_dict_or_list == value:
        # Match: record a copy of the current path, ending with the value itself
        path.append(json_dict_or_list)
        result.append(path.copy())
        path.pop()
    elif isinstance(json_dict_or_list, dict):
        for k, v in json_dict_or_list.items():
            path.append(k)  # step into key k
            breadcrumb_finder(v, value, path, result)
            path.pop()      # backtrack; no early return, so every key is visited
    elif isinstance(json_dict_or_list, list):
        for i, item in enumerate(json_dict_or_list):
            path.append(i)  # step into index i (kept as an int)
            breadcrumb_finder(item, value, path, result)
            path.pop()      # backtrack; no early return, so every item is visited

data = r'''{"imagepanel":{"image":[{"scaled_image":{"classes":"w-full","aspect_ratios":"frame sm:4:3 xmed:4:3","art_directions":[{"alt":"River waves","src":"path/to/file/53339c03d67e6ee5-lesson-3.jpg","type":"jpg","media":"(min-width:900.1px)","sizes":"50vw","intrinsicwidth":"1411","intrinsicheight":"1000"},{"alt":"River waves","src":"path/to/file/53339c03d67e6ee5-lesson-3.jpg","type":"jpg","media":"(max-width:900px)","sizes":"100vw","intrinsicwidth":"0","intrinsicheight":"0"}]}}],"title":"<p><span class=\"drop-cap\" drop-cap=\"true\">WHAT TO WATCH</span></p>","cta_label":"SEE THE LIST","left_image":true}}'''
data = json.loads(data)

all_src = ['path/to/file/53339c03d67e6ee5-lesson-3.jpg']
for src in all_src:
    breadcrumb_finder(data, src, [], results)
    print(results)

This ensures every list is iterated to the end instead of stopping at the first match. Result:

[['imagepanel', 'image', 0, 'scaled_image', 'art_directions', 0, 'src', 'path/to/file/53339c03d67e6ee5-lesson-3.jpg'], ['imagepanel', 'image', 0, 'scaled_image', 'art_directions', 1, 'src', 'path/to/file/53339c03d67e6ee5-lesson-3.jpg']]

Edit: I updated the global variable `results` so it is no longer confused with the local variable `result`.

【Comments】:
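Because this version stores list indices as ints rather than strings, each path (minus its trailing value) can be fed straight back into the data to check it. A minimal sketch, assuming paths shaped like the result above; `resolve` is a hypothetical helper built on `functools.reduce` and `operator.getitem`, and `data` is a trimmed-down stand-in for the question's JSON.

```python
import functools
import operator

def resolve(data, path):
    """Follow a breadcrumb path (without its trailing value) back into the data."""
    # operator.getitem handles both dict keys and int list indices
    return functools.reduce(operator.getitem, path, data)

src = "path/to/file/53339c03d67e6ee5-lesson-3.jpg"
# Trimmed-down stand-in for the question's parsed JSON (hypothetical sample)
data = {"imagepanel": {"image": [{"scaled_image": {"art_directions": [
    {"src": src, "media": "(min-width:900.1px)"},
    {"src": src, "media": "(max-width:900px)"},
]}}]}}
# The two paths from the result above (int indices, value last)
results = [
    ["imagepanel", "image", 0, "scaled_image", "art_directions", 0, "src", src],
    ["imagepanel", "image", 0, "scaled_image", "art_directions", 1, "src", src],
]
for path in results:
    *keys, found = path
    assert resolve(data, keys) == found  # every path leads back to the value
    # Dropping the final "src" key gives the enclosing object, which tells the matches apart
    print(resolve(data, keys[:-1])["media"])
```

With the question's string indices, `operator.getitem` on a list would raise `TypeError`, so the list steps would need `int()` first.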

【Solution 2】:

import json

def breadcrumb_finder(json_dict_or_list, value):
    if json_dict_or_list == value:
        return [json_dict_or_list]
    elif isinstance(json_dict_or_list, dict):
        for k, v in json_dict_or_list.items():
            child = breadcrumb_finder(v, value)
            if child:
                return [k] + child
    elif isinstance(json_dict_or_list, list):
        lst = json_dict_or_list
        for i in range(len(lst)):
            child = breadcrumb_finder(lst[i], value)
            if child:
                if child[0] == "src":
                    # Skip a src index that an earlier call already reported
                    if str(i) in found_srcs:
                        continue
                    found_srcs.append(str(i))
                return [str(i)] + child

data = r'''{"imagepanel":{"image":[{"scaled_image":{"classes":"w-full","aspect_ratios":"frame sm:4:3 xmed:4:3","art_directions":[{"alt":"River waves","src":"path/to/file/53339c03d67e6ee5-lesson-3.jpg","type":"jpg","media":"(min-width:900.1px)","sizes":"50vw","intrinsicwidth":"1411","intrinsicheight":"1000"},{"alt":"River waves","src":"path/to/file/53339c03d67e6ee5-lesson-3.jpg","type":"jpg","media":"(max-width:900px)","sizes":"100vw","intrinsicwidth":"0","intrinsicheight":"0"}]}}],"title":"<p><span class=\"drop-cap\" drop-cap=\"true\">WHAT TO WATCH</span></p>","cta_label":"SEE THE LIST","left_image":true}}'''
data = json.loads(data)

all_src = ['path/to/file/53339c03d67e6ee5-lesson-3.jpg', 'path/to/file/53339c03d67e6ee5-lesson-3.jpg']
found_srcs = []  # global: indices of src entries already reported
for src in all_src:
    nested_path = breadcrumb_finder(data, src)
    print(nested_path)
    

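By storing the src entries you have already found, you can skip them. The simplest way I found is to check, right before adding the src index, whether it was already found (as in the code) and skip that iteration entirely. Note that `found_srcs` is global and shared by every list in the document, so this only works reliably when the duplicate values sit in the same list, as they do here.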

Output:

['imagepanel', 'image', '0', 'scaled_image', 'art_directions', '0', 'src', 'path/to/file/53339c03d67e6ee5-lesson-3.jpg']
['imagepanel', 'image', '0', 'scaled_image', 'art_directions', '1', 'src', 'path/to/file/53339c03d67e6ee5-lesson-3.jpg']
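An alternative to tracking `found_srcs` in a global list, sketched under the assumption that you only need the matches in document order: make the finder a generator and pull successive matches with `next()`. Resuming a generator naturally skips what was already found, with no shared state and no dependence on which list the duplicates live in. `iter_breadcrumbs` and `finders` are hypothetical names, and `data` is a trimmed-down stand-in for the question's JSON.

```python
import json

def iter_breadcrumbs(node, value, path=()):
    """Yield every path to `value` (a list ending with the value), in document order."""
    if node == value:
        yield list(path) + [node]
    elif isinstance(node, dict):
        for k, v in node.items():
            # Tuples are immutable, so each branch gets its own path without backtracking
            yield from iter_breadcrumbs(v, value, path + (k,))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            # String indices, to match the question's expected output
            yield from iter_breadcrumbs(item, value, path + (str(i),))

# Trimmed-down stand-in for the question's JSON (hypothetical sample)
data = json.loads(
    '{"imagepanel": {"image": [{"scaled_image": {"art_directions": ['
    '{"src": "path/to/file/53339c03d67e6ee5-lesson-3.jpg"},'
    '{"src": "path/to/file/53339c03d67e6ee5-lesson-3.jpg"}]}}]}}'
)
all_src = ['path/to/file/53339c03d67e6ee5-lesson-3.jpg', 'path/to/file/53339c03d67e6ee5-lesson-3.jpg']

finders = {}  # one generator per distinct src value
for src in all_src:
    if src not in finders:
        finders[src] = iter_breadcrumbs(data, src)
    # Each repeat of a value resumes its generator at the next match; None when exhausted
    print(next(finders[src], None))
```

If you simply want every path at once, `list(iter_breadcrumbs(data, src))` gives the same result as collecting them eagerly.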
    

【Comments】:
