How to map a complex Python object to a pandas dataframe?

【Question title】: How to map a complex Python object to a pandas dataframe?
【Posted】: 2021-12-01 06:07:18
【Question description】:

I have a JSON file that I read into Python as an object. The JSON file looks like this:

{
  "ID": "1",
  "Container": {
    "DistributionOptions": [
      {
        "OptionId": 1,
        "OptionSet": [
          {
            "Location": {
              "Number": "1"
            },
            "Lines": [
              {
                "OrderLineId": 0,
                "Quantity": 2
              },
              {
                "OrderLineId": 1,
                "Quantity": 4
              }
            ]
          },
          {
            "Location": {
              "Number": "2"
            },
            "Lines": [
              {
                "OrderLineId": 2,
                "Quantity": 5
              },
              {
                "OrderLineId": 3,
                "Quantity": 7
              }
            ]
          }
        ]
      },
      {
        "OptionId": 2,
        "OptionSet": [
          {
            "Location": {
              "Number": "3"
            },
            "Lines": [
              {
                "OrderLineId": 0,
                "Quantity": 2
              },
              {
                "OrderLineId": 1,
                "Quantity": 4
              }
            ]
          },
          {
            "Location": {
              "Number": "4"
            },
            "Lines": [
              {
                "OrderLineId": 2,
                "Quantity": 5
              },
              {
                "OrderLineId": 3,
                "Quantity": 7
              }
            ]
          }
        ]
      }
    ]
  }
}

I read the JSON file into a Python object as follows:

python_object = json.loads(raw, object_hook=lambda d: SimpleNamespace(**d))

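Now I want to convert "python_object" into a pandas dataframe, starting from the node "DistributionOptions", so that the higher-level attributes are repeated for every lower-level attribute or list element.

Because of efficiency constraints, I want to avoid unnecessary for loops. So far I have tried a map object with a custom function, but then I lose some of the higher-level attributes.

Any ideas on how to flatten/map this Python object (or the JSON directly), with its (nested) lists of objects, into the following dataframe?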

OptionId	Location	OrderlineId	Quantity
1	1	0	2
1	1	1	4
1	2	2	5
1	2	3	7
2	3	0	2
2	3	1	4
2	4	2	5
2	4	3	7

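【Comments】:

Is the sample JSON you shared the entire JSON file?
Any list in this JSON can contain an arbitrary number of elements. Apart from that, I have removed some nodes that are currently out of scope but that the project may need later.

Tags: python json pandas mapping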

【Solution 1】:

You might want to try this.

import json
from pathlib import Path

import pandas as pd

# path to your json file
p = Path(r'path/to/your/file.json')

# read json
with p.open('r', encoding='utf-8') as f:
    jsonRecord = json.load(f)

# Extract `DistributionOptions`, since you need to start from that node
data = jsonRecord["Container"]["DistributionOptions"]

# json_normalize: one row per entry in 'Lines'; 'OptionId' and the location number are repeated as meta columns
df = pd.json_normalize(data, record_path=['OptionSet', 'Lines'], meta=['OptionId', ['OptionSet', 'Location', 'Number']], errors='ignore')

Output:

   OrderLineId  Quantity  OptionId  OptionSet.Location.Number
0            0         2         1                          1
1            1         4         1                          1
2            2         5         1                          2
3            3         7         1                          2
4            0         2         2                          3
5            1         4         2                          3
6            2         5         2                          4
7            3         7         2                          4
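
To get the exact column layout asked for in the question, the flattened frame can be renamed and reordered afterwards. Below is a minimal, self-contained sketch of that step; it inlines a shortened copy of the sample JSON, and the names `raw`, `options`, `flat` and `result` are only illustrative. The column is kept as "OrderLineId" (the spelling used in the JSON) rather than the "OrderlineId" shown in the desired table.

```python
import json

import pandas as pd

# Shortened copy of the sample JSON from the question (fewer lines per location).
raw = """
{
  "ID": "1",
  "Container": {
    "DistributionOptions": [
      {"OptionId": 1, "OptionSet": [
        {"Location": {"Number": "1"},
         "Lines": [{"OrderLineId": 0, "Quantity": 2}, {"OrderLineId": 1, "Quantity": 4}]},
        {"Location": {"Number": "2"},
         "Lines": [{"OrderLineId": 2, "Quantity": 5}]}
      ]},
      {"OptionId": 2, "OptionSet": [
        {"Location": {"Number": "3"},
         "Lines": [{"OrderLineId": 0, "Quantity": 2}]}
      ]}
    ]
  }
}
"""

# Start from the 'DistributionOptions' node, as in the answer above.
options = json.loads(raw)["Container"]["DistributionOptions"]

flat = pd.json_normalize(
    options,
    record_path=["OptionSet", "Lines"],
    meta=["OptionId", ["OptionSet", "Location", "Number"]],
)

# Rename the dotted meta column and put the columns in the requested order.
result = flat.rename(columns={"OptionSet.Location.Number": "Location"})[
    ["OptionId", "Location", "OrderLineId", "Quantity"]
]
print(result)
```

"Location" stays a string here because the JSON stores "Number" as a string; result["Location"].astype(int) would convert it if integers are wanted.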

However, if your JSON contains multiple ID/Container pairs, a different approach is needed.
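
One possible way to handle the multiple ID/Container case is to start the record path at the top level and pull "ID" in as an extra meta field. The answer does not say what such a file would look like, so the sketch below assumes a list of top-level records that each carry their own "ID" and "Container"; the sample records and the names `records`, `prefix` and `df_all` are made up for illustration.

```python
import pandas as pd

# Hypothetical input: a list of top-level records, each with its own ID and Container.
records = [
    {"ID": "1", "Container": {"DistributionOptions": [
        {"OptionId": 1, "OptionSet": [
            {"Location": {"Number": "1"},
             "Lines": [{"OrderLineId": 0, "Quantity": 2},
                       {"OrderLineId": 1, "Quantity": 4}]},
        ]},
    ]}},
    {"ID": "2", "Container": {"DistributionOptions": [
        {"OptionId": 1, "OptionSet": [
            {"Location": {"Number": "7"},
             "Lines": [{"OrderLineId": 0, "Quantity": 3}]},
        ]},
    ]}},
]

# Path from a top-level record down to the list of distribution options.
prefix = ["Container", "DistributionOptions"]

df_all = pd.json_normalize(
    records,
    record_path=prefix + ["OptionSet", "Lines"],
    meta=[
        "ID",
        prefix + ["OptionId"],
        prefix + ["OptionSet", "Location", "Number"],
    ],
)

# Shorten the dotted meta column names and order the columns.
df_all = df_all.rename(columns={
    "Container.DistributionOptions.OptionId": "OptionId",
    "Container.DistributionOptions.OptionSet.Location.Number": "Location",
})[["ID", "OptionId", "Location", "OrderLineId", "Quantity"]]
print(df_all)
```

Each row then carries the "ID" of the record it came from, so lines from different containers remain distinguishable.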

【Comments】: