如何使用包含字典的字符串规范嵌套 JSON？答案

【问题标题】：How to normalize nested JSON with strings consisting dictionary?如何使用包含字典的字符串规范嵌套 JSON？
【发布时间】：2020-07-14 00:43:54
【问题描述】：

我想用包含另一个字典的字符串从嵌套 JSON 规范化和创建数据帧。我已经试过了

with open('/content/drive/My Drive/conversation_data.json', 'r') as f:
  data = json.load(f)

table = pd.json_normalize(data, 'conversations')
table

但它返回按行分隔的所有单个字符串。如何返回带有 conversation_id、author_id 等的数据框表？

这是 JSON：

[
  {
    "data_loaded": "2019-12-21 12:00:22.189441 UTC",
    "ticket_id": "222815",
    "ticket_created_at": "2019-12-21T12:07:52Z",
    "conversations": "{\"conversations\":[{\"conversation_id\":\"866229422292\",\"author_id\":\"391349919632\",\"body\":\"==========Write below this ...\",\"created_at\":\"2019-12-21T12:07:52Z\",\"via_channel\":\"email\"}]}"
  }
]

【问题讨论】：

标签： python json pandas dataframe nested

【解决方案1】：

在下面试试这个：

data = [
  {
    "data_loaded": "2019-12-21 12:00:22.189441 UTC",
    "ticket_id": "222815",
    "ticket_created_at": "2019-12-21T12:07:52Z",
    "conversations": "{\"conversations\":[{\"conversation_id\":\"866229422292\",\"author_id\":\"391349919632\",\"body\":\"==========Write below this ...\",\"created_at\":\"2019-12-21T12:07:52Z\",\"via_channel\":\"email\"}]}"
  }
]

conversations = json.loads(data[0]['conversations'])

table = pd.json_normalize(conversations, 'conversations')
print(table)

【讨论】：

【解决方案2】：

字符串本身似乎是一个 JSON 片段。它实际上不包含那些反斜杠（这些是字符串表示打印的一部分），所以您需要做的就是将其反馈给 JSON 解析器。

json.load 和 json.dump 用于文件；对字符串进行操作的对应函数是json.loads 和json.dumps（用“s”表示“s”字符串）。

例如：

# pull out the embedded JSON string from the parsed JSON, then re-parse it
conversations = json.loads(data[0]["conversations"])

【讨论】：