Python: json_normalize a pandas series gives TypeError

【Question title】: Python: json_normalize a pandas series gives TypeError
【Posted】: 2018-01-01 16:09:29
【Question description】:

I have JSON snippets like the following in a pandas Series df["json"]:

[{
    'IDs': [{
        'lotId': '1',
        'Id': '123456'
    }],
    'date': '2009-04-17',
    'bidsCount': 2,
}, {
    'IDs': [{
        'lotId': '2',
        'Id': '123456'
    }],
    'date': '2009-04-17',
    'bidsCount': 4,
}, {
    'IDs': [{
         'lotId': '3',
         'Id': '123456'
    }],
    'date': '2009-04-17',
    'bidsCount': 8,
}]

Sample of the original file:

{"type": "OPEN","title": "rainbow","json": [{"IDs": [{"lotId": "1","Id": "123456"}],"date": "2009-04-17","bidsCount": 2}, {"IDs": [{"lotId": "2","Id": "123456"}],"date": "2009-04-17","bidsCount": 4}, {"IDs": [{"lotId": "3","Id": "123456"}],"date": "2009-04-17","bidsCount": 8}]}
{"type": "CLOSED","title": "clouds","json": [{"IDs": [{"lotId": "1","Id": "23345"}],"date": "2009-05-17","bidsCount": 2}, {"IDs": [{"lotId": "2","Id": "23345"}],"date": "2009-05-17","bidsCount": 4}, {"IDs": [{"lotId": "3","Id": "23345"}],"date": "2009-05-17","bidsCount": 8}]}


df = pd.read_json("file.json", lines=True)

I am trying to turn them into a dataframe like

Id      lotId      bidsCount    date
123456  1          2            2009-04-17
123456  2          4            2009-04-17
123456  3          8            2009-04-17

by using

json_normalize(df["json"])

but I get

AttributeError: 'list' object has no attribute 'values'

I guess the JSON snippet is being treated as a list, but I can't figure out how to make it work. Thanks for your help!

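【Question comments】:

How did you create df in the first place?
Please paste the head of your dataframe here. Is your jsons column a string?
zufanka First, as the documentation says, df['jsons'] should be a dict or a list of dicts. Then you can get the result you want with result = json_normalize(data, 'IDs', ['date', 'bidsCount']). I did the same thing in my answer; I don't know why people like to downvote. Hope this helps.
I created df from a huge JSON file via pd.read_json("file.json", lines=True). The json column is one of the nested parts of the file, not a string. I can try to recreate the file if that helps, since the data is confidential.
zufanka, yes. Just run type(df['json']) to make sure it is a dict, or a list of dicts, which is what json_normalize() works with. It would help if you could say how you created df['json']. You don't need to recreate the whole data; a sample is enough.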
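
To make the suggestion in the comments above concrete, here is a minimal sketch. It assumes pandas 1.0 or later, where the function is exposed as pd.json_normalize, and it uses a hand-built `cell` as a stand-in for one element of df['json']. The names `cell`, `series`, `element_types` and `flat` are illustrative and do not come from the original post.

```python
import pandas as pd

# Hypothetical stand-in for one cell of df["json"]: a list of dicts.
cell = [
    {"IDs": [{"lotId": "1", "Id": "123456"}], "date": "2009-04-17", "bidsCount": 2},
    {"IDs": [{"lotId": "2", "Id": "123456"}], "date": "2009-04-17", "bidsCount": 4},
    {"IDs": [{"lotId": "3", "Id": "123456"}], "date": "2009-04-17", "bidsCount": 8},
]
series = pd.Series([cell])

# The Series is neither a dict nor a list of dicts, but each element is a list.
element_types = {type(x) for x in series}

# record_path expands the nested "IDs" list into rows;
# meta copies the parent-level fields onto each of those rows.
flat = pd.json_normalize(cell, record_path="IDs", meta=["date", "bidsCount"])
print(flat)
```

So the call works on a single cell (a list of dicts), while passing the whole Series is what goes wrong.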

Tags: python json pandas attributeerror normalize

【Solution 1】:

I think your df['json'] is a nested list. You can use a for loop and concatenate the dataframes to get one big dataframe, i.e.

Data:

{"type": "OPEN","title": "rainbow","json": [{"IDs": [{"lotId": "1","Id": "123456"}],"date": "2009-04-17","bidsCount": 2}, {"IDs": [{"lotId": "2","Id": "123456"}],"date": "2009-04-17","bidsCount": 4}, {"IDs": [{"lotId": "3","Id": "123456"}],"date": "2009-04-17","bidsCount": 8}]}
{"type": "CLOSED","title": "clouds","json": [{"IDs": [{"lotId": "1","Id": "23345"}],"date": "2009-05-17","bidsCount": 2}, {"IDs": [{"lotId": "2","Id": "23345"}],"date": "2009-05-17","bidsCount": 4}, {"IDs": [{"lotId": "3","Id": "23345"}],"date": "2009-05-17","bidsCount": 8}]}

df = pd.read_json("file.json", lines=True)

Dataframe:

new_df = pd.concat([pd.DataFrame(json_normalize(x)) for x in df['json']],ignore_index=True)

Output:

                                IDs  bidsCount        date
0  [{'Id': '123456', 'lotId': '1'}]          2  2009-04-17
1  [{'Id': '123456', 'lotId': '2'}]          4  2009-04-17
2  [{'Id': '123456', 'lotId': '3'}]          8  2009-04-17
3   [{'Id': '23345', 'lotId': '1'}]          2  2009-05-17
4   [{'Id': '23345', 'lotId': '2'}]          4  2009-05-17
5   [{'Id': '23345', 'lotId': '3'}]          8  2009-05-17

If you want the keys of IDs as columns, you can use

new_df['lotId'] = [x[0]['lotId'] for x in new_df['IDs']]
new_df['IDs'] = [x[0]['Id'] for x in new_df['IDs']]

      IDs  bidsCount        date lotId
0  123456          2  2009-04-17     1
1  123456          4  2009-04-17     2
2  123456          8  2009-04-17     3
3   23345          2  2009-05-17     1
4   23345          4  2009-05-17     2
5   23345          8  2009-05-17     3
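
For anyone who wants to run the steps of this answer end to end, here is a self-contained sketch. It assumes pandas 1.0 or later (pd.json_normalize) and feeds a shortened, valid JSON Lines sample through io.StringIO instead of the real file.json. The variable `raw` and the shortened data are illustrative only.

```python
import io

import pandas as pd

# Shortened two-line sample in JSON Lines form (an assumption, not the real file).
raw = (
    '{"type": "OPEN", "title": "rainbow", "json": ['
    '{"IDs": [{"lotId": "1", "Id": "123456"}], "date": "2009-04-17", "bidsCount": 2}, '
    '{"IDs": [{"lotId": "2", "Id": "123456"}], "date": "2009-04-17", "bidsCount": 4}]}\n'
    '{"type": "CLOSED", "title": "clouds", "json": ['
    '{"IDs": [{"lotId": "1", "Id": "23345"}], "date": "2009-05-17", "bidsCount": 2}]}\n'
)
df = pd.read_json(io.StringIO(raw), lines=True)

# Normalize each cell (a list of dicts) separately, then stack the results.
new_df = pd.concat([pd.json_normalize(x) for x in df["json"]], ignore_index=True)

# Pull the keys out of the first (here: only) dict in each "IDs" list.
new_df["lotId"] = [x[0]["lotId"] for x in new_df["IDs"]]
new_df["IDs"] = [x[0]["Id"] for x in new_df["IDs"]]
print(new_df)
```

Note that x[0] only reads the first entry of each IDs list. That matches the sample, where every list has exactly one entry, but it would silently drop data if a list held more.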

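【Comments】:

Exactly what I needed, thank you very much! I just had to add df['json'].dropna(), because some of the data was missing.
Glad it helped!
Is there a more efficient way?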
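
One possible alternative, sketched here under assumptions rather than benchmarked: drop the missing cells, chain the per-row lists into one flat list of dicts, and call json_normalize once with record_path and meta, as suggested in the question comments. This assumes pandas 1.0 or later (pd.json_normalize). The small hand-built `df` and the names `records` and `flat` are illustrative. Whether it is actually faster on a large file would need to be measured.

```python
from itertools import chain

import pandas as pd

# Hypothetical frame: two rows with nested lists and one row with missing data.
df = pd.DataFrame(
    {
        "title": ["rainbow", "clouds", "empty"],
        "json": [
            [
                {"IDs": [{"lotId": "1", "Id": "123456"}], "date": "2009-04-17", "bidsCount": 2},
                {"IDs": [{"lotId": "2", "Id": "123456"}], "date": "2009-04-17", "bidsCount": 4},
            ],
            [
                {"IDs": [{"lotId": "1", "Id": "23345"}], "date": "2009-05-17", "bidsCount": 2},
            ],
            None,
        ],
    }
)

# Skip missing cells, then flatten the per-row lists into one list of dicts.
records = list(chain.from_iterable(df["json"].dropna()))

# A single call: one output row per entry in "IDs", with the parent fields attached.
flat = pd.json_normalize(records, record_path="IDs", meta=["date", "bidsCount"])
flat = flat[["Id", "lotId", "bidsCount", "date"]]
print(flat)
```

Unlike the x[0] indexing above, record_path produces a row for every entry of each IDs list, so lists with more than one entry are kept in full.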