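[Posted]: 2021-12-21 08:08:54
[Question]:
I have a JSON sample that I want to flatten into a pandas DataFrame. Until now I have been applying a few methods I wrote myself, but I am wondering whether there is a better or shorter solution to this problem.
JSON sample: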
{
    "documentName": "test1.json",
    "time": "2020-10-10T08:00:00Z",
    "data": [
        {
            "name": "john",
            "scores": [
                {
                    "event": "one",
                    "score": 10
                },
                {
                    "event": "two",
                    "score": 10
                },
                {
                    "event": "three",
                    "score": 10
                }
            ]
        },
        {
            "name": "mary",
            "scores": [
                {
                    "event": "one",
                    "score": 10
                },
                {
                    "event": "two",
                    "score": 5
                }
            ]
        },
        {
            "name": "hope",
            "scores": []
        }
    ]
}
Desired output DataFrame:
| index | documentName | time | name | one | two | three |
|---|---|---|---|---|---|---|
| 0 | test1.json | 2020-10-10T08:00:00Z | john | 10 | 10 | 10 |
| 1 | test1.json | 2020-10-10T08:00:00Z | mary | 10 | 5 | Null |
| 2 | test1.json | 2020-10-10T08:00:00Z | hope | Null | Null | Null |
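So the event names become columns and are filled in accordingly. There are 4 events, but it would be a huge plus if the number and names of the events could be detected dynamically (so not hard-coded).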
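To make the target shape concrete, here is a minimal sketch that builds one row per person with a plain list comprehension and lets pandas take the union of the event keys as columns. It assumes every score item has exactly the `event` and `score` keys shown in the sample; the variable names `doc`, `rows`, and `df` are only illustrative.

```python
import pandas as pd

# The sample document from the question, written as a Python dict.
doc = {
    "documentName": "test1.json",
    "time": "2020-10-10T08:00:00Z",
    "data": [
        {"name": "john", "scores": [
            {"event": "one", "score": 10},
            {"event": "two", "score": 10},
            {"event": "three", "score": 10},
        ]},
        {"name": "mary", "scores": [
            {"event": "one", "score": 10},
            {"event": "two", "score": 5},
        ]},
        {"name": "hope", "scores": []},
    ],
}

# One dict per person: the document-level fields, the name, and one key per event.
rows = [
    {
        "documentName": doc["documentName"],
        "time": doc["time"],
        "name": person["name"],
        **{item["event"]: item["score"] for item in person["scores"]},
    }
    for person in doc["data"]
]

# pandas uses the union of all keys as columns; missing events become NaN.
df = pd.DataFrame(rows)
print(df)
```

Because the event names are read from the data, no event list has to be hard-coded, and a person with an empty `scores` list (such as hope) still gets a row whose event columns are all NaN.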
This is the approach I currently use:
import pandas as pd

def object_to_columns(df_row, column):
    # Expand a dict-valued cell into "<column>-<key>" columns.
    if isinstance(df_row[column], dict):
        for key, value in df_row[column].items():
            column_name = "{}-{}".format(column.lower(), key.lower())
            df_row[column_name] = value
    return df_row

def list_of_objects_to_columns(df_row, column):
    # Turn each {"event": ..., "score": ...} item into a column named after the event.
    if isinstance(df_row[column], list):
        for item in df_row[column]:
            column_name = f"{item['event']}"
            df_row[column_name] = item['score']
    return df_row

with open("test1.json") as file:
    df = pd.read_json(file)

df = df.apply(object_to_columns, column="data", axis=1)
df = df.apply(list_of_objects_to_columns, column="data-scores", axis=1)
### CODE TO REMOVE UNUSED COLUMNS AND RENAMING ###
Any ideas for a better, cleaner, or faster way to do this?
[Discussion]:
- Do you really need the row for hope?
- The hope row can be dropped :)
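If rows without scores may be dropped, as the discussion above suggests, one possible shorter route is `pd.json_normalize` followed by `DataFrame.pivot`. The sketch below is only one option, not a definitive answer. It assumes pandas 1.1 or newer (for a list-valued `index` in `pivot`) and that each name has at most one score per event; the names `flat`, `events`, and `wide` are illustrative.

```python
import pandas as pd

# A trimmed copy of the sample document from the question.
doc = {
    "documentName": "test1.json",
    "time": "2020-10-10T08:00:00Z",
    "data": [
        {"name": "john", "scores": [
            {"event": "one", "score": 10},
            {"event": "two", "score": 10},
            {"event": "three", "score": 10},
        ]},
        {"name": "mary", "scores": [
            {"event": "one", "score": 10},
            {"event": "two", "score": 5},
        ]},
        {"name": "hope", "scores": []},
    ],
}

# Long format: one row per (name, event). A person with an empty "scores"
# list contributes no records, so hope does not appear here.
flat = pd.json_normalize(
    doc,
    record_path=["data", "scores"],
    meta=["documentName", "time", ["data", "name"]],
)

# Event names in order of first appearance, discovered from the data.
events = list(dict.fromkeys(flat["event"]))

# Wide format: one column per event, one row per name.
wide = (
    flat.pivot(index=["documentName", "time", "data.name"],
               columns="event", values="score")
    .reset_index()
    .rename(columns={"data.name": "name"})
)
wide.columns.name = None  # drop the leftover "event" label on the column axis
wide = wide[["documentName", "time", "name", *events]]
print(wide)
```

`pivot` raises an error if the same name and event pair occurs twice; in that case `pivot_table` with an explicit aggregation function would be the usual alternative.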
Tags: python json pandas dataframe