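【Question title】: Python - How to assign/map non-sequential JSON fields onto a dict
【Posted】: 2021-08-18 00:35:21
【Question】:

I have a JSON document containing a dict whose keys are not always all present, or at least not always in the same position. For example, "producers" is not always at index [2] of the array, and "directors" is not always at [1]; it depends entirely on the JSON I pass to the function. Depending on what is available in ['plist']['dict']['key'], the contents are mapped to array entries 0, 1, 2, 3 (except studio)...

How do I find the array entry that belongs to cast, directors, producers, and so on, given that each of them is not always at the same array index? In the end I want to always extract the right data for the right field, even though ['plist']['dict']['key'] can differ from one document to the next.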

...
def get_plist_meta(element):
    if isinstance(element, dict):
        return element["string"]
    return ", ".join(i["string"] for i in element)

...
### Default map if all fields are present
# 0 = cast
# 1 = directors
# 2 = producers
# 3 = screenwriters
plist_metadata = json.loads(dump_json)
### make fields match the given sequence 0 = cast, 1 = directors etc. ()
if 'cast' in plist_metadata['plist']['dict']['key']:
    print("Cast: ", get_plist_meta(plist_metadata['plist']['dict']['array'][0]['dict']))
if 'directors' in plist_metadata['plist']['dict']['key']:
    print("Directors: ", get_plist_meta(plist_metadata['plist']['dict']['array'][1]['dict']))
if 'producers' in plist_metadata['plist']['dict']['key']:
    print("Producers: ", get_plist_meta(plist_metadata['plist']['dict']['array'][2]['dict']))
if 'screenwriters' in plist_metadata['plist']['dict']['key']:
    print("Screenwriters: ", get_plist_meta(plist_metadata['plist']['dict']['array'][3]['dict']))
if 'studio' in plist_metadata['plist']['dict']['key']:
    print("Studio: ", plist_metadata['plist']['dict']['string'])

JSON:

{
   "plist":{
      "@version":"1.0",
      "dict":{
         "key":[
            "cast",
            "directors",
            "screenwriters",
            "studio"
         ],
         "array":[
            {
               "dict":[
                  {
                     "key":"name",
                     "string":"Martina Piro"
                  },
                  {
                     "key":"name",
                     "string":"Ralf Stark"
                  }
               ]
            },
            {
               "dict":{
                  "key":"name",
                  "string":"Franco Camilio"
               }
            },
            {
               "dict":{
                  "key":"name",
                  "string":"Kai Meisner"
               }
            }
         ],
         "string":"Helix Films"
      }
   }
}

The JSON is also available here: https://pastebin.com/JCXRs3Rw

Thanks in advance

【Comments】:

Tags: python json mapping


【Solution 1】:

If you prefer a Pythonic solution, try this:

# We will use this function to extract the names from the subdicts. We put single items in a new array so the result is consistent, no matter how many names there were.
def get_names(name_dict):
    arrayfied = name_dict if isinstance(name_dict, list) else [name_dict]
    return [o["string"] for o in arrayfied]

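# Pair each key with its array entry (zip stops at the shorter input, so "studio" is left out)
plist_dict = plist_metadata['plist']['dict']  # named plist_dict to avoid shadowing the built-in dict
zipped = zip(plist_dict["key"], plist_dict["array"])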

# Get the names from the subdicts and put it into a new dict
result = {k: get_names(v["dict"]) for k, v in zipped}

This gives you a new dict that looks like this:

{'cast': ['Martina Piro', 'Ralf Stark'], 'directors': ['Franco Camilio'], 'screenwriters': ['Kai Meisner']}

The new dict only contains the keys that are present in the original dict.
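The `studio` value is missing from that result because it lives in `string` rather than in `array`. The following is a minimal sketch of one way to include it and print labelled, comma-separated lines. It assumes that every key except `studio` has exactly one entry in `array`, in the same order, as in the sample JSON; `build_credits` is a hypothetical helper name, not part of the original answer.

```python
import json

# The sample JSON from the question, on one line.
dump_json = """{"plist":{"@version":"1.0","dict":{"key":["cast","directors","screenwriters","studio"],"array":[{"dict":[{"key":"name","string":"Martina Piro"},{"key":"name","string":"Ralf Stark"}]},{"dict":{"key":"name","string":"Franco Camilio"}},{"dict":{"key":"name","string":"Kai Meisner"}}],"string":"Helix Films"}}}"""


def get_names(name_dict):
    """Return a list of names whether the input is one dict or a list of dicts."""
    items = name_dict if isinstance(name_dict, list) else [name_dict]
    return [item["string"] for item in items]


def build_credits(plist_dict):
    """Hypothetical helper: map each key that is present to its list of values."""
    # Assumption: "studio" is the only key stored outside "array".
    array_keys = [k for k in plist_dict["key"] if k != "studio"]
    credits = {k: get_names(v["dict"]) for k, v in zip(array_keys, plist_dict["array"])}
    if "studio" in plist_dict["key"]:
        credits["studio"] = [plist_dict["string"]]
    return credits


credits = build_credits(json.loads(dump_json)["plist"]["dict"])
for key, values in credits.items():
    print(f"{key.capitalize()}: {', '.join(values)}")
```

If the real data can store other keys outside `array`, the filter on `studio` would need to be extended accordingly.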

I suggest you look into zip, map and so on, as well as list comprehensions and dict comprehensions.
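As a small illustration of those building blocks, the sketch below pairs two lists with `zip` and feeds the pairs into a dict comprehension. The lists are simply the keys and names from the sample JSON written out by hand; the variable names are chosen for illustration only.

```python
keys = ["cast", "directors", "screenwriters", "studio"]
values = [["Martina Piro", "Ralf Stark"], ["Franco Camilio"], ["Kai Meisner"]]

# zip pairs items by position and stops at the shorter input,
# so the fourth key ("studio") has no partner and is dropped.
pairs = list(zip(keys, values))

# A dict comprehension turns the pairs into a dict in one expression.
joined = {key: ", ".join(names) for key, names in pairs}
print(joined)
```

This truncating behaviour of `zip` is why `studio` does not show up in the result dict above.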

【Comments】:

    【Solution 2】:

    I think this solves your problem:

    import json
    dump_json = """{"plist":{"@version":"1.0","dict":{"key":["cast","directors","screenwriters","studio"],"array":[{"dict":[{"key":"name","string":"Martina Piro"},{"key":"name","string":"Ralf Stark"}]},{"dict":{"key":"name","string":"Franco Camilio"}},{"dict":{"key":"name","string":"Kai Meisner"}}],"string":"Helix Films"}}}"""
    plist_metadata = json.loads(dump_json)
    
    roles = ['cast', 'directors', 'producers', 'screenwriters']    # all possible roles, in the order they are expected in the plist
    names = {role: [] for role in roles}                            # stores the final output

    plist_dict = plist_metadata['plist']['dict']
    j = 0                                                           # index of the next unread entry in plist_dict['array']
    for role in roles:                                              # cycle through all the possible roles
        if role in plist_dict['key']:                               # only roles listed in the keys have an array entry
            entry = plist_dict['array'][j]['dict']
            if isinstance(entry, dict):                             # a single name comes as a dict, so wrap it in a list
                entry = [entry]
            j += 1                                                  # move on to the next array entry
            names[role] = [item['string'] for item in entry]        # reduce {"key":"name","string":"Martina Piro"} to just "Martina Piro"
    print(names)
    
    def list_names(role_name):
        if role_name not in names.keys():
            return f'Invalid list request: Role name "{role_name}" not found.'
        return f'{role_name.capitalize()}: {", ".join(names[role_name])}'
        
    print(list_names('cast'))
    print(list_names('audience'))
    

    Output:

    {'cast': ['Martina Piro', 'Ralf Stark'], 'directors': ['Franco Camilio'], 'producers': [], 'screenwriters': ['Kai Meisner']}
    Cast: Martina Piro, Ralf Stark
    Invalid list request: Role name "audience" not found.
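Since the asker mentions in the comments that the data will eventually be written to a database, one possible intermediate step is to flatten `names` into `(role, name)` rows. This is only a sketch: the row layout is an assumption, and no particular database or table schema is implied by the thread.

```python
# The dict printed above, copied by hand so the snippet runs on its own.
names = {
    'cast': ['Martina Piro', 'Ralf Stark'],
    'directors': ['Franco Camilio'],
    'producers': [],
    'screenwriters': ['Kai Meisner'],
}

# One (role, name) tuple per person; roles with no names produce no rows.
rows = [(role, person) for role, people in names.items() for person in people]

for role, person in rows:
    print(role, person)
```

Rows in this shape can be passed to whatever bulk-insert call the chosen database driver provides.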
    

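    【Comments】:

    • Hey, thanks for your answer :D Much appreciated. The extraction itself works fine now, but how do I map the output strings back to cast, directors, and so on? Could you show this in a simple print statement? I will need to write all of the data to a database later. Thanks in advance
    • Where do you use "present_roles"?
    • Basically I need something like what I already showed in my question (print("Cast: ", get_plist_meta(plist_metadata['plist']['dict']['array'][0]['dict']))). A comma-separated list for each key (cast, producers, etc.) would be great. As mentioned, I eventually have to write it to a database.
    • Edited to add that. If anything else needs changing, please comment again.
    • Awesome, thank you very much, you saved my day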