
【Question title】: Convert JSON with lists and dicts into a DataFrame
【Posted】: 2019-09-12 02:55:49
【Question】:

I want to convert a JSON file into a DataFrame that is easy to search for useful information. The JSON looks like this:

[
    {
        "testId": "test1",
        "testType": [
            {
                "value": "a",
                "startDate": "2019-01-01T08:00:00",
                "endDate": "2029-01-01T08:00:00"
            }
        ],
        "candidate": [
            {
                "value": {
                    "id": "b",
                    "name": "test"
                },
                "startDate": "2019-01-01T08:00:00",
                "endDate": "2029-01-01T08:00:00"
            }
        ],
        "testsite": [
            {
                "value": "c",
                "startDate": "2019-01-01T08:00:00",
                "endDate": "2029-01-01T08:00:00"
            }
        ]
    },
    {
        "testId": "test2",
        "testType": [
            {
                "value": "SG",
                "startDate": "2019-01-01T08:00:00",
                "endDate": "2029-01-01T08:00:00"
            }
        ],
        "maxcandidates": [
            {
                "value": {
                    "amount": "75"
                },
                "startDate": "2019-01-01T08:00:00",
                "endDate": "2029-01-01T08:00:00"
            }
        ],
        "candidate": [
            {
                "value": {
                    "id": "sei",
                    "name": "long island Limited"
                },
                "startDate": "2019-01-01T08:00:00",
                "endDate": "2029-01-01T08:00:00"
            }
        ],
        "mincandidates": [
            {
                "value": {
                    "amount": "5"
                },
                "startDate": "2018-04-01T08:00:00",
                "endDate": "2029-01-01T08:00:00"
            }
        ],
        "testSite": [
            {
                "value": "5227",
                "startDate": "2018-04-01T08:00:00",
                "endDate": "2029-01-01T08:00:00"
            }
        ]
    }
]

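This is only part of the JSON file. The file is a list of records, and some attributes contain dictionaries. (1) What is the most efficient way to normalize this data? (2) If I want to convert "testType" across the whole JSON file into a DataFrame that carries "testId" as metadata, how should I do it?

I tried this code: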

import json
import pandas as pd
from pandas.io.json import json_normalize

with open('test.json') as f:
    d=json.load(f)

type=json_normalize(data=d[:]['testType'], 
                            meta=['testId'])

It raises TypeError: list indices must be integers or slices, not str

Alternatively, if I use

import json
import pandas as pd
from pandas.io.json import json_normalize

with open('test.json') as f:
    d=json.load(f)

type=json_normalize(data=d[0]['testType'], 
                            meta=['testId'])

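I can convert it to a DataFrame, but it only gives me the first element of the array rather than all of them.

【Comments】:

What is your expected output? Could you show it?
Thanks. My expected output is that of json_normalize(data=d[0]['testType'], meta=['testId']), but aggregated over all rows.
Basically, I want the code to go through the JSON file, get every 'testType', convert them to a DataFrame, and attach 'testId'.
@Allen Hi Allen, how can I get all the records instead of just the first one? The code only works for the first record, d[0]; type=json_normalize(data=d[:]['testType'], meta=['testId']) fails.
OK, I just posted an answer. Please let me know if it is what you want.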

Tags: python json pandas dataframe

【Solution 1】:

You could probably do this:

pd.concat([json_normalize(data=e['testType'], meta=['testId']) for e in d])

  value            startDate              endDate
0     a  2019-01-01T08:00:00  2029-01-01T08:00:00
0    SG  2019-01-01T08:00:00  2029-01-01T08:00:00
【Discussion】:

Thanks, Allen. Does the size of the JSON file matter? I saved the JSON object above to 'test.json' and ran with open('test.json') as f: d=json.load(f) followed by json_normalize(data=d[:], record_path='testType', meta=['testId']), and there was no error message. But when I run the same code on my original JSON (which has more than 100 records), it fails with the error KeyError: 'testType'.
When I run testype=json_normalize(data=test[0:20], record_path='testType', meta=['testId']) it works. But when I increase the slice to 50 and run testype=json_normalize(data=test[0:50], record_path='testType', meta=['testId']), the same error appears.
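
A KeyError that appears only for a larger slice suggests the cause is probably not file size but at least one record between positions 20 and 49 that has no 'testType' key, since record_path requires the key in every record. The thread does not confirm this, so the following is only a sketch under that assumption. The records list is hypothetical data made up for illustration:

```python
import pandas as pd

# Hypothetical data: the second record has no "testType" key.
records = [
    {"testId": "test1", "testType": [{"value": "a"}]},
    {"testId": "test2"},
    {"testId": "test3", "testType": [{"value": "SG"}, {"value": "b"}]},
]

# Find out which records lack the key, to confirm the diagnosis.
missing = [r["testId"] for r in records if "testType" not in r]
print("records without testType:", missing)

# Keep only records whose "testType" is a list, then normalize as before.
usable = [r for r in records if isinstance(r.get("testType"), list)]
test_types = pd.json_normalize(usable, record_path="testType", meta=["testId"])
print(test_types)
```

Filtering drops the incomplete records entirely. If they should remain visible, the missing list can be reported or handled separately.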