Pandas Nested JSON - id as column name, value as value

【Question title】: Pandas Nested JSON - id as column name, value as value
【Posted】: 2018-11-12 09:55:54
【Question description】:

JSON example

{
    "tickets": [
        {
            "url": "https://domain.zendesk.com/api/v2/tickets/10001.json",
            "id": 10001,
            "custom_fields": [
                {
                    "id": 360007982393,
                    "value": "Some Value"
                },
                {
                    "id": 360008063134,
                    "value": "Foo"
                },
                {
                    "id": 360007982273,
                    "value": "Bar"
                },
                {
                    "id": 360007982293,
                    "value": null
                }
            ],
            "satisfaction_rating": null
        },
        {
            "url": "https://domain.zendesk.com/api/v2/tickets/10002.json",
            "id": 10002,
            "custom_fields": [
                {
                    "id": 360007982393,
                    "value": "Another value"
                },
                {
                    "id": 360008063134,
                    "value": "Bar"
                },
                {
                    "id": 360007982273,
                    "value": "Foo"
                },
                {
                    "id": 360007982293,
                    "value": null
                }
            ],
            "satisfaction_rating": null
        }
    ],
    "count": 2,
    "next_page": "https://domain.zendesk.com/api/v2/incremental/tickets.json?start_time=1541167467",
    "end_time": 1541167467
}

Python example

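import json
import pandas as pd
from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize

data = json.loads(raw_json)  # raw_json (placeholder name): the JSON text shown above
tickets_json = data['tickets']
result = json_normalize(data=tickets_json, sep='_')

df = pd.DataFrame(result)  # result is already a DataFrame, so this is just a copy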

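Question details

With the above, I get a DataFrame with these columns:

url, id, custom_fields, satisfaction_rating

custom_fields is where I am struggling, because I need to get:

url, id, custom_fields_360007982393, custom_fields_360008063134, custom_fields_360007982273, custom_fields_360007982293, satisfaction_rating

Or something similar to the above. Essentially, I need the id of each custom field to become part of a column name in the main DataFrame, or to be the column name itself.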

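I have tried using record_path together with meta, but that flips the DataFrame into a format that is unusable for what I am trying to achieve here. I have also tried pulling custom_fields out and appending it back, but all I get is an arbitrary number as the column name, with the same id/value pair as the value in every row.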
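For reference, the record_path/meta attempt described above can be sketched roughly as follows. This is an illustration under assumptions, not the original attempt: `tickets_json` is a trimmed, hypothetical copy of the sample data, `meta_prefix="ticket_"` is added because the ticket-level `id` would otherwise clash with the custom-field `id`, and the final `pivot` step is just one possible way to get back to one row per ticket.

```python
import pandas as pd  # pd.json_normalize assumes pandas >= 1.0

# Trimmed, hypothetical copy of the sample tickets (two custom fields each).
tickets_json = [
    {"url": "https://domain.zendesk.com/api/v2/tickets/10001.json", "id": 10001,
     "custom_fields": [{"id": 360007982393, "value": "Some Value"},
                       {"id": 360008063134, "value": "Foo"}],
     "satisfaction_rating": None},
    {"url": "https://domain.zendesk.com/api/v2/tickets/10002.json", "id": 10002,
     "custom_fields": [{"id": 360007982393, "value": "Another value"},
                       {"id": 360008063134, "value": "Bar"}],
     "satisfaction_rating": None},
]

# record_path yields the "flipped" long format: one row per (ticket, custom field).
long_df = pd.json_normalize(tickets_json, record_path="custom_fields",
                            meta=["id"], meta_prefix="ticket_")

# Pivot back to one row per ticket, with the custom-field ids as column names.
wide_df = long_df.pivot(index="ticket_id", columns="id", values="value")
print(wide_df)
```

The remaining ticket-level columns (url, satisfaction_rating) would still need to be joined back on the ticket id.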

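This data will be imported into MySQL and used to report on values by ID. In most cases the custom_fields come in the same order, but I cannot be sure they will stay that way forever.

The JSON comes from the ZenDesk API (https://developer.zendesk.com/rest_api)

Target output: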

url, id, 360007982393, 360008063134, 360007982273, 360007982293, satisfaction_rating
"https:.." , 10001, "Some Value", "Foo", "Bar", null, null
"https:.." , 10002, "Another value", "Bar", "Foo", null, null

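【Comments】:

Tags: python-3.x pandas

【Solution 1】:

How about writing a custom parser function for the JSON format above? Below is a script that converts a single ticket into a "flat" JSON object that can be used with json_normalize: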

def parseCustoms(ticket):
    # Copy the ticket-level fields as they are
    out = {'url': ticket['url'],
           'id': ticket['id'],
           'satisfaction_rating': ticket['satisfaction_rating']}
    # Give each custom field its own key, named 'cf_' + the field id
    for field in ticket['custom_fields']:
        out['cf_' + str(field['id'])] = field['value']
    return out

You would then build your list of parsed tickets like this:

parsed_tickets = [parseCustoms(ticket) for ticket in tickets_json]

Now json_normalize will behave as expected:

result = json_normalize(parsed_tickets)
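Putting the steps above together, a self-contained sketch might look like this. The sample `tickets_json` is a trimmed, hypothetical copy of the question's data, with the second ticket's custom fields deliberately reordered to show that keying by id does not depend on field order. The call to `pd.json_normalize` assumes pandas 1.0 or later.

```python
import pandas as pd

# Trimmed, hypothetical sample; note the reversed field order in the second ticket.
tickets_json = [
    {"url": "https://domain.zendesk.com/api/v2/tickets/10001.json", "id": 10001,
     "custom_fields": [{"id": 360007982393, "value": "Some Value"},
                       {"id": 360008063134, "value": "Foo"}],
     "satisfaction_rating": None},
    {"url": "https://domain.zendesk.com/api/v2/tickets/10002.json", "id": 10002,
     "custom_fields": [{"id": 360008063134, "value": "Bar"},
                       {"id": 360007982393, "value": "Another value"}],
     "satisfaction_rating": None},
]

def parseCustoms(ticket):
    """Flatten one ticket: each custom field becomes a 'cf_<id>' key."""
    out = {"url": ticket["url"],
           "id": ticket["id"],
           "satisfaction_rating": ticket["satisfaction_rating"]}
    for field in ticket["custom_fields"]:
        out["cf_" + str(field["id"])] = field["value"]
    return out

parsed_tickets = [parseCustoms(ticket) for ticket in tickets_json]
result = pd.json_normalize(parsed_tickets)  # one row per ticket
print(result.columns.tolist())
```

Because the columns are named after the field ids, values line up by id even when the API returns the fields in a different order. The resulting DataFrame could then be written to MySQL, for example with `DataFrame.to_sql` and a suitable connection.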

【Discussion】:

Sir, you are an absolute champion. Now I can finish the rest of my script.
You're welcome. Glad I could help. Good luck with your project :)