【Question title】: Flattening nested JSON API dictionaries in Python
【Posted】: 2020-12-15 19:15:30
【Question】:

I received the following JSON response for a distance matrix, collected with the code below:

import requests
import json

payload = {
    "origins": [{"latitude": 54.6565153, "longitude": -1.6802816}, {"latitude": 54.6365153, "longitude": -1.6202816}], #surgery
    "destinations": [{"latitude": 54.6856522, "longitude": -1.2183634}, {"latitude": 54.5393295, "longitude": -1.2623914}, {"latitude": 54.5393295, "longitude": -1.2623914}], #oa - up to 625 entries
    "travelMode": "driving",
    "startTime": "2014-04-01T11:59:59+01:00",
    "timeUnit": "second"
}
headers = {"Content-Type": "application/json"}  # Content-Length is computed by requests, so it is not hard-coded here
paramtr = {"key": "INSERT_KEY_HERE"}
r = requests.post('https://dev.virtualearth.net/REST/v1/Routes/DistanceMatrix', data = json.dumps(payload), params = paramtr, headers = headers)
data = r.json()["resourceSets"][0]["resources"][0]

and I am trying to flatten it into the following columns:

destinations.latitude, destinations.longitude, origins.latitude, origins.longitude, departureTime, destinationIndex, originIndex, totalWalkDuration, travelDistance, travelDuration

from:

{'__type': 'DistanceMatrix:http://schemas.microsoft.com/search/local/ws/rest/v1',
 'destinations': [{'latitude': 54.6856522, 'longitude': -1.2183634},
  {'latitude': 54.5393295, 'longitude': -1.2623914},
  {'latitude': 54.5393295, 'longitude': -1.2623914}],
 'errorMessage': 'Request completed.',
 'origins': [{'latitude': 54.6565153, 'longitude': -1.6802816},
  {'latitude': 54.6365153, 'longitude': -1.6202816}],
 'results': [{'departureTime': '/Date(1396349159000-0700)/',
   'destinationIndex': 0,
   'originIndex': 0,
   'totalWalkDuration': 0,
   'travelDistance': 38.209,
   'travelDuration': 3082},
  {'departureTime': '/Date(1396349159000-0700)/',
   'destinationIndex': 1,
   'originIndex': 0,
   'totalWalkDuration': 0,
   'travelDistance': 40.247,
   'travelDuration': 2708},
  {'departureTime': '/Date(1396349159000-0700)/',
   'destinationIndex': 2,
   'originIndex': 0,
   'totalWalkDuration': 0,
   'travelDistance': 40.247,
   'travelDuration': 2708},
  {'departureTime': '/Date(1396349159000-0700)/',
   'destinationIndex': 0,
   'originIndex': 1,
   'totalWalkDuration': 0,
   'travelDistance': 34.857,
   'travelDuration': 2745},
  {'departureTime': '/Date(1396349159000-0700)/',
   'destinationIndex': 1,
   'originIndex': 1,
   'totalWalkDuration': 0,
   'travelDistance': 36.895,
   'travelDuration': 2377},
  {'departureTime': '/Date(1396349159000-0700)/',
   'destinationIndex': 2,
   'originIndex': 1,
   'totalWalkDuration': 0,
   'travelDistance': 36.895,
   'travelDuration': 2377}]}

The best I have managed so far is:

json_normalize(outtie, record_path="results", meta="origins")

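However, this leaves origins nested, and destinations refuses to append. I also tried dropping __type to see whether it made a difference, and explored max_level= and record_prefix='_', but to no avail.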

【Comments】:

    Tags: python json pandas flatten json-normalize


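    【Solution 1】:

    I have run into something like this before. The best approach I found was a recursive function that builds an OrderedDict, which I then loop over, so here it is.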

    import collections


    def flatten(data, sep="_"):
        """Flatten nested dicts and lists into a single-level OrderedDict."""
        obj = collections.OrderedDict()

        def recurse(temp, parent_key=""):
            if isinstance(temp, list):
                # list items are keyed by their position in the list
                for i, item in enumerate(temp):
                    recurse(item, parent_key + sep + str(i) if parent_key else str(i))
            elif isinstance(temp, dict):
                for key, value in temp.items():
                    recurse(value, parent_key + sep + key if parent_key else key)
            else:
                # leaf value: store it under the accumulated key
                obj[parent_key] = temp

        recurse(data)
        return obj
    

    When you iterate over it, your data will look like this:

    for key, value in flatten(a).items():
        print(key, value)
    
    destinations_0_latitude 54.6856522
    destinations_0_longitude -1.2183634
    destinations_1_latitude 54.5393295
    destinations_1_longitude -1.2623914
    destinations_2_latitude 54.5393295
    destinations_2_longitude -1.2623914
    

    The reason I use a separator is that it gives you flexibility, so you can use

    key.split("_")
    
    ['destinations', '0', 'latitude'] 54.6856522
    ['destinations', '0', 'longitude'] -1.2183634
    

    After that, you can easily write conditions such as

    if key.split("_")[2] == "latitude":
        ...  # do something
    
    if key.endswith("latitude"):
        ...  # do something
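
One caveat when splitting on the separator: a key that itself contains the separator, such as `__type` in the response above, splits into extra empty parts, and a short key such as `errorMessage` has no element at index 2. A minimal sketch of one way around this, assuming a hypothetical variant named `flatten_paths` (not part of the answer above) that stores each key as a tuple of path components instead of a joined string:

```python
def flatten_paths(data):
    """Flatten nested dicts/lists into {path_tuple: leaf_value}.

    Hypothetical variant of flatten(): keys are tuples, so no separator
    can collide with characters that appear inside a key.
    """
    out = {}

    def recurse(node, path=()):
        if isinstance(node, list):
            # list items are addressed by their integer position
            for i, item in enumerate(node):
                recurse(item, path + (i,))
        elif isinstance(node, dict):
            for key, value in node.items():
                recurse(value, path + (key,))
        else:
            # leaf value: store it under the accumulated path
            out[path] = node

    recurse(data)
    return out


# Trimmed sample in the shape of the response from the question.
sample = {
    "__type": "DistanceMatrix",
    "destinations": [
        {"latitude": 54.6856522, "longitude": -1.2183634},
        {"latitude": 54.5393295, "longitude": -1.2623914},
    ],
}

flat = flatten_paths(sample)

# Select leaves by their last path component; no string splitting is needed.
latitudes = {path: value for path, value in flat.items() if path[-1] == "latitude"}
for path, value in latitudes.items():
    print(path, value)
```

If a string key is still needed, for example as a column name, the tuple can be joined at the last moment with whatever separator suits the data.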
    

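    【Comments】:

    • Thanks - as a solution this is really interesting - I think Trenton's answer (using the index) is more efficient, but I may be wrong?
    • @JonathanFrancis I just checked it. It looks much cleaner than mine, and since you already seem to be using pandas, it is the better option.
    【Solution 2】:
    • I don't think this is an appropriate problem for flatten_json; however, it can be useful for JSON objects that have not been as thoughtfully constructed.
    • The list in destinations corresponds to the list in results, which means that, when they are normalized, they will have the same index.
    • The dataframes can be joined correctly because they will have corresponding indices.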
    import pandas as pd

    # create a dataframe for results and origins
    res_or = pd.json_normalize(data, record_path=['results'], meta=[['origins']])
    
    # create a dataframe for destinations
    dest = pd.json_normalize(data, record_path=['destinations'], record_prefix='dest_')
    
    # normalize the origins column in res_or
    orig = pd.json_normalize(res_or.origins).rename(columns={'latitude': 'origin_lat', 'longitude': 'origin_long'})
    
    # concat the dataframes
    df = pd.concat([res_or, orig, dest], axis=1).drop(columns=['origins'])
    
    # display(df)
                    departureTime  destinationIndex  originIndex  totalWalkDuration  travelDistance  travelDuration  origin_lat  origin_long  dest_latitude  dest_longitude
    0  /Date(1396349159000-0700)/                 0            0                  0          38.209            3082   54.656515    -1.680282      54.685652       -1.218363
    1  /Date(1396349159000-0700)/                 1            0                  0          40.247            2708   54.656515    -1.680282      54.539330       -1.262391
    2  /Date(1396349159000-0700)/                 2            0                  0          40.247            2708   54.656515    -1.680282      54.539330       -1.262391
    

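    Update for the new sample data

    • The records in results contain the index for destinations and origins, so it is easy to create a separate dataframe for each key and then .merge the dataframes.
      • The index of orig and dest corresponds to destinationIndex and originIndex in results.
    # create three separate dataframes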
    results = pd.json_normalize(data, record_path=['results'])
    dest = pd.json_normalize(data, record_path=['destinations'], record_prefix='dest_')
    orig = pd.json_normalize(data, record_path=['origins'], record_prefix='orig_')
    
    # merge them at the appropriate location
    df = pd.merge(results, dest, left_on='destinationIndex', right_index=True)
    df = pd.merge(df, orig, left_on='originIndex', right_index=True)
    
    # display(df)
                    departureTime  destinationIndex  originIndex  totalWalkDuration  travelDistance  travelDuration  dest_latitude  dest_longitude  orig_latitude  orig_longitude
    0  /Date(1396349159000-0700)/                 0            0                  0          38.209            3082      54.685652       -1.218363      54.656515       -1.680282
    1  /Date(1396349159000-0700)/                 1            0                  0          40.247            2708      54.539330       -1.262391      54.656515       -1.680282
    2  /Date(1396349159000-0700)/                 2            0                  0          40.247            2708      54.539330       -1.262391      54.656515       -1.680282
    3  /Date(1396349159000-0700)/                 0            1                  0          34.857            2745      54.685652       -1.218363      54.636515       -1.620282
    4  /Date(1396349159000-0700)/                 1            1                  0          36.895            2377      54.539330       -1.262391      54.636515       -1.620282
    5  /Date(1396349159000-0700)/                 2            1                  0          36.895            2377      54.539330       -1.262391      54.636515       -1.620282
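
The merge approach can also produce the dotted column names listed in the question. The following is a minimal, self-contained sketch, assuming a trimmed copy of the sample response (only three of the result fields are kept) and `record_prefix` values of `destinations.` and `origins.`. The `validate="many_to_one"` argument is an optional extra, not part of the answer above, that makes pandas raise an error if a lookup index is duplicated.

```python
import pandas as pd

# Trimmed copy of the response from the question: two origins, three destinations.
data = {
    "destinations": [
        {"latitude": 54.6856522, "longitude": -1.2183634},
        {"latitude": 54.5393295, "longitude": -1.2623914},
        {"latitude": 54.5393295, "longitude": -1.2623914},
    ],
    "origins": [
        {"latitude": 54.6565153, "longitude": -1.6802816},
        {"latitude": 54.6365153, "longitude": -1.6202816},
    ],
    "results": [
        {"destinationIndex": d, "originIndex": o, "travelDuration": t}
        for o, d, t in [
            (0, 0, 3082), (0, 1, 2708), (0, 2, 2708),
            (1, 0, 2745), (1, 1, 2377), (1, 2, 2377),
        ]
    ],
}

# One dataframe per key; the prefixes give the column names asked for in the question.
results = pd.json_normalize(data, record_path=["results"])
dest = pd.json_normalize(data, record_path=["destinations"], record_prefix="destinations.")
orig = pd.json_normalize(data, record_path=["origins"], record_prefix="origins.")

# Look up each row's destination and origin by position in the respective list.
df = results.merge(dest, left_on="destinationIndex", right_index=True, validate="many_to_one")
df = df.merge(orig, left_on="originIndex", right_index=True, validate="many_to_one")
df = df.sort_index()

print(df.columns.tolist())
```

Using the dotted prefixes keeps the output aligned with the names pandas itself generates when it flattens nested dictionaries, so the columns look the same as if the coordinates had been nested inside each result record.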
    
