Pandas 'read_json' not working as expected: answers

【Question title】: Pandas 'read_json' not working as expected
【Posted】: 2021-05-07 12:29:37
【Question description】:

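I want to load a JSON file with pandas, but it is not working as I expected. I looked at this Stack Overflow answer, but my problem is a different one. The contents of the JSON file are reproduced further down in this question.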

Code to load the file:

import pandas as pd
df = pd.read_json("BrowserHistory.json")
print(df)

Output:

[Image: output pandas DataFrame]

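But I don't want a single column that holds every JSON element. I want 6 columns, namely 'favicon_url', 'page_transition', 'title', 'url', 'client_id' and 'time_usec', as they appear in the JSON file, with each column holding its value from every element.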

Like this:

favicon_url   page_transition   title   url   client_id   time_usec
    .                .            .      .        .           .
    .                .            .      .        .           .
    .                .            .      .        .           .
    .                .            .      .        .           .

JSON file:

{
    "Browser History": [
        {
            "favicon_url": "https://www.google.com/favicon.ico",
            "page_transition": "LINK",
            "title": "Google Takeout",
            "url": "https://takeout.google.com/",
            "client_id": "cliendid",
            "time_usec": 1620386529857946
        },
        {
            "favicon_url": "https://www.google.com/favicon.ico",
            "page_transition": "LINK",
            "title": "Google Takeout",
            "url": "https://takeout.google.com/",
            "client_id": "cliendid",
            "time_usec": 1620386514845201
        },
        {
            "favicon_url": "https://www.google.com/favicon.ico",
            "page_transition": "LINK",
            "title": "Google Takeout",
            "url": "https://takeout.google.com/",
            "client_id": "cliendid",
            "time_usec": 1620386499014063
        },
        {
            "favicon_url": "https://ssl.gstatic.com/ui/v1/icons/mail/rfr/gmail.ico",
            "page_transition": "LINK",
            "title": "Gmail",
            "url": "https://mail.google.com/mail/u/0/#inbox",
            "client_id": "cliendid",
            "time_usec": 1620386492788783
        }
    ]
}

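【Question comments】:

Please add a string representation of the JSON file so that your case can be reproduced.
Please do not post images of data. We cannot copy/paste from an image, nor reproduce the problem, which is often the first step toward a solution...
@SergeBallesta I have updated the question with the first 4 JSON elements! Sorry
@VictorErmakov Added!
@Sophia, see how many answers were added once you improved your question a little :)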

Tags: python json pandas

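【Solution 1】:

The problem is that your file is wrapped in {}, so pandas treats the first level of the JSON as the columns and therefore uses only 'Browser History' as a column. You can use this code to solve your problem: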

import json
import pandas as pd
# Adjust the encoding to match your file, e.g. 'utf8'.
with open('BrowserHistory.json', encoding='cp850') as f:
    df = pd.DataFrame(json.load(f)['Browser History'])
print(df)
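
To make the explanation above concrete, here is a minimal sketch that reproduces the behaviour without needing the file on disk. It assumes two of the records from the question held in an in-memory string (the name `text` is only for illustration) and compares what `read_json` builds with what you get after selecting the inner list first.

```python
import json
from io import StringIO

import pandas as pd

# Two records copied from the question's JSON file, kept in memory for the demo.
text = """
{
    "Browser History": [
        {
            "favicon_url": "https://www.google.com/favicon.ico",
            "page_transition": "LINK",
            "title": "Google Takeout",
            "url": "https://takeout.google.com/",
            "client_id": "cliendid",
            "time_usec": 1620386529857946
        },
        {
            "favicon_url": "https://ssl.gstatic.com/ui/v1/icons/mail/rfr/gmail.ico",
            "page_transition": "LINK",
            "title": "Gmail",
            "url": "https://mail.google.com/mail/u/0/#inbox",
            "client_id": "cliendid",
            "time_usec": 1620386492788783
        }
    ]
}
"""

# read_json maps the top-level key to a column, so each cell holds a whole record.
wrapped = pd.read_json(StringIO(text))
print(list(wrapped.columns))

# Selecting the inner list first yields one column per record key.
flat = pd.DataFrame(json.loads(text)["Browser History"])
print(list(flat.columns))
```

The first frame has the single 'Browser History' column described in the question; the second has the six columns the asker wanted.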

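【Comments】:

This gives me the error UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 297565: character maps to <undefined>
Try changing the encoding; this problem is caused by certain characters in the file.
I changed the encoding to 'utf8' and it works! Thanks for your help.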

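【Solution 2】:

Because your objects sit in a list at the second level of the JSON, you cannot read the file directly into a DataFrame with read_json. Instead, read the JSON into a variable and then create the DataFrame from it: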

import pandas as pd
import json

f = open("BrowserHistory.json")
js = json.load(f)
df = pd.DataFrame(js['Browser History'])
df
#                                          favicon_url page_transition  ... client_id         time_usec
# 0                 https://www.google.com/favicon.ico            LINK  ...  cliendid  1620386529857946
# 1                 https://www.google.com/favicon.ico            LINK  ...  cliendid  1620386514845201
# 2                 https://www.google.com/favicon.ico            LINK  ...  cliendid  1620386499014063
# 3  https://ssl.gstatic.com/ui/v1/icons/mail/rfr/g...            LINK  ...  cliendid  1620386492788783
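
As a possible alternative that is not part of this answer, `pd.json_normalize` can select the nested list by key through its `record_path` parameter. The sketch below assumes the JSON has already been parsed into a dict; the `data` variable and its shortened records are hypothetical stand-ins for the real file contents.

```python
import pandas as pd

# Hypothetical in-memory stand-in for the parsed contents of BrowserHistory.json.
data = {
    "Browser History": [
        {
            "title": "Google Takeout",
            "url": "https://takeout.google.com/",
            "time_usec": 1620386529857946,
        },
        {
            "title": "Gmail",
            "url": "https://mail.google.com/mail/u/0/#inbox",
            "time_usec": 1620386492788783,
        },
    ]
}

# record_path names the key whose list of records should become the rows.
df = pd.json_normalize(data, record_path="Browser History")
print(df)
```

For flat records like these, the result should match `pd.DataFrame(data['Browser History'])`; `json_normalize` mainly pays off when the records themselves contain nested objects.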

Note that you may need to specify the file encoding in the open call, e.g.

f = open("BrowserHistory.json", encoding="utf8")
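
To illustrate why the encoding matters, here is a small self-contained sketch. It assumes a hypothetical one-record file written as UTF-8 to a temporary path, containing the character "ď", which UTF-8 stores as the bytes C4 8F. Reading it with cp1252 is an assumption chosen for the demo because that codec has no mapping for byte 0x8f; your platform's default codec may differ.

```python
import json
import os
import tempfile

import pandas as pd

# Hypothetical sample record; "ď" (U+010F) is stored as bytes C4 8F in UTF-8.
sample = {"Browser History": [{"title": "ďakujem", "time_usec": 1620386529857946}]}

# Write the sample to a temporary file as UTF-8 without escaping non-ASCII characters.
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", encoding="utf8", delete=False
) as tmp:
    json.dump(sample, tmp, ensure_ascii=False)
    path = tmp.name

try:
    # Decoding with the wrong codec fails: byte 0x8f is undefined in cp1252.
    try:
        with open(path, encoding="cp1252") as f:
            json.load(f)
        wrong_codec_failed = False
    except UnicodeDecodeError:
        wrong_codec_failed = True

    # Decoding with the codec the file was written in succeeds.
    with open(path, encoding="utf8") as f:
        df = pd.DataFrame(json.load(f)["Browser History"])
finally:
    os.remove(path)

print(wrong_codec_failed)
print(df)
```

The same file loads cleanly once the codec passed to `open` matches the one it was written with.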

【Comments】:

This gives me the error UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 297565: character maps to <undefined>
You may need to specify the file encoding, e.g. f = open("BrowserHistory.json", encoding="utf8"), or whatever is appropriate for your file.