【Posted】: 2021-04-18 17:26:38
【Question】:
Given the following sample data (11 records):
import ast
import numpy as np
import pandas as pd
test_df = pd.DataFrame({"PN_id": ["745d626b", "745d626b", "fce503fb", "df3d727e", "df3d727e", "56c00531", "72ebb2b3", "5d1bc5d3", "72ebb2b3", "5c32fc8a", "5c32fc8a"],
"PN_raw": ['{"audience":{"and":[{"segment":"67537044-27db-4a0b-b5b7-362c9c5b2ba7"},{"tag":"BR","group":"ua_locale_country"},{"tag":"90_P******_BR","group":"******_CRM"}]}}',
'{"audience":{"and":[{"segment":"67537044-27db-4a0b-b5b7-362c9c5b2ba7"},{"tag":"BR","group":"ua_locale_country"},{"tag":"90_P******_BR","group":"******_CRM"}]}}',
'{"audience":{"and":[{"and":[{"segment":"850c8d94-1236-45a1-93fc-08b0337b4059"}]},{"and":[{"tag":"All_S****_ES","group":"******_CRM"}]}]}}',
'{"audience":{"and":[{"segment":"67537044-27db-4a0b-b5b7-362c9c5b2ba7"},{"tag":"BR","group":"ua_locale_country"},{"tag":"All_S*****_BR","group":"******_CRM"}]}}',
'{"audience":{"and":[{"segment":"67537044-27db-4a0b-b5b7-362c9c5b2ba7"},{"tag":"BR","group":"ua_locale_country"},{"tag":"All_S*****_BR","group":"******_CRM"}]}}',
'{"audience":{"and":[{"and":[{"segment":"850c8d94-1236-45a1-93fc-08b0337b4059"}]},{"and":[{"tag":"All_S****_ES","group":"******_CRM"}]}]}}',
'{"audience":{"and":[{"segment":"67537044-27db-4a0b-b5b7-362c9c5b2ba7"},{"tag":"BR","group":"ua_locale_country"},{"tag":"P_90_or_S_90_BR","group":"******_CRM"}]}}',
'{"audience":{"and":[{"segment":"67537044-27db-4a0b-b5b7-362c9c5b2ba7"},{"tag":"P_90_or_S_90_ESLA","group":"******_CRM"}]}}',
'{"audience":{"and":[{"segment":"67537044-27db-4a0b-b5b7-362c9c5b2ba7"},{"tag":"BR","group":"ua_locale_country"},{"tag":"P_90_or_S_90_BR","group":"******_CRM"}]}}',
'{"audience":{"and":[{"and":[{"segment":"850c8d94-1236-45a1-93fc-08b0337b4059"}]},{"and":[{"tag":"P_90_or_S_90_ES","group":"******_CRM"}]}]}}',
'{"audience":{"and":[{"and":[{"segment":"850c8d94-1236-45a1-93fc-08b0337b4059"}]},{"and":[{"tag":"P_90_or_S_90_ES","group":"******_CRM"}]}]}}']})
How can I achieve the desired output below? (Either in the same DataFrame or in a separate one; I think a separate one is more likely):
test_df_desired = pd.DataFrame({"PN_id":["745d626b", "745d626b", "fce503fb", "df3d727e", "df3d727e", "56c00531", "72ebb2b3", "5d1bc5d3", "72ebb2b3", "5c32fc8a", "5c32fc8a"],
"segment":["67537044-27db-4a0b-b5b7-362c9c5b2ba7", "67537044-27db-4a0b-b5b7-362c9c5b2ba7", "850c8d94-1236-45a1-93fc-08b0337b4059", "67537044-27db-4a0b-b5b7-362c9c5b2ba7", "67537044-27db-4a0b-b5b7-362c9c5b2ba7", "850c8d94-1236-45a1-93fc-08b0337b4059", "67537044-27db-4a0b-b5b7-362c9c5b2ba7", "67537044-27db-4a0b-b5b7-362c9c5b2ba7", "67537044-27db-4a0b-b5b7-362c9c5b2ba7", "850c8d94-1236-45a1-93fc-08b0337b4059", "850c8d94-1236-45a1-93fc-08b0337b4059"],
"tag_1":["BR", "BR", "All_S****_ES", "BR", "BR", "All_S****_ES", "BR", "P_90_or_S_90_ESLA", "BR", "P_90_or_S_90_ES", "P_90_or_S_90_ES"],
"group_1":["ua_locale_country", "ua_locale_country", "******_CRM", "ua_locale_country", "ua_locale_country", "******_CRM", "ua_locale_country", "******_CRM", "ua_locale_country", "******_CRM", "******_CRM"],
"tag_2":["90_P******_BR", "90_P******_BR", np.nan, "All_S*****_BR", "All_S*****_BR", np.nan, "P_90_or_S_90_BR", np.nan, "P_90_or_S_90_BR", np.nan, np.nan],
"group_2":["******_CRM", "******_CRM", np.nan, "******_CRM", "******_CRM", np.nan, "******_CRM", np.nan, "******_CRM", np.nan, np.nan]})
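So far, using pd.json_normalize(test_df["PN_raw"].apply(ast.literal_eval), record_path = ["audience", "and"]), I have managed to unnest the records whose dict path is audience -> and. This does not work for records whose path is audience -> and -> and, and I could not hack my way around it by passing record_path = ["audience", "and", "and"], which I thought would work. I think the solution needs to loop over the Series and apply a different function depending on whether a record contains one or two levels of "and".
Besides the failure described above, the current output also leaves the problem of "transposing" the data into the correct rows: it returns one row per item inside "and" rather than one row per PN_id (if you run the line above, you will see what I mean).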
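One possible way to handle both nesting depths with a single function, rather than branching on one versus two levels of "and", is to walk the "and" lists recursively and collect the leaf dicts. The sketch below is only an illustration under some assumptions: the helper names flatten_and and audience_to_row are hypothetical, json.loads is used in place of ast.literal_eval because the strings are valid JSON, and each record is assumed to hold at most one "segment" plus any number of tag/group pairs. It uses two of the records from the sample data above.

```python
import json
import pandas as pd

def flatten_and(node):
    """Recursively collect the leaf dicts from arbitrarily nested "and" lists."""
    if isinstance(node, dict) and "and" in node:
        leaves = []
        for child in node["and"]:
            leaves.extend(flatten_and(child))
        return leaves
    return [node]

def audience_to_row(raw):
    """Turn one PN_raw JSON string into a flat dict: segment, tag_1, group_1, ..."""
    leaves = flatten_and(json.loads(raw)["audience"])
    row = {}
    tag_count = 0
    for leaf in leaves:
        if "segment" in leaf:
            row["segment"] = leaf["segment"]
        elif "tag" in leaf:
            # Number the tag/group pairs in the order they appear.
            tag_count += 1
            row[f"tag_{tag_count}"] = leaf["tag"]
            row[f"group_{tag_count}"] = leaf.get("group")
    return row

# Two records copied from the sample data: one flat, one with nested "and".
sample = pd.DataFrame({
    "PN_id": ["745d626b", "fce503fb"],
    "PN_raw": [
        '{"audience":{"and":[{"segment":"67537044-27db-4a0b-b5b7-362c9c5b2ba7"},{"tag":"BR","group":"ua_locale_country"},{"tag":"90_P******_BR","group":"******_CRM"}]}}',
        '{"audience":{"and":[{"and":[{"segment":"850c8d94-1236-45a1-93fc-08b0337b4059"}]},{"and":[{"tag":"All_S****_ES","group":"******_CRM"}]}]}}',
    ],
})

# One dict per input row, so the result stays aligned with PN_id;
# keys missing from a record (e.g. tag_2) become NaN.
flat = pd.DataFrame(sample["PN_raw"].map(audience_to_row).tolist(), index=sample.index)
result = pd.concat([sample[["PN_id"]], flat], axis=1)
print(result)
```

Because every input string yields exactly one dict, the output has one row per original record, which avoids the row "transposing" issue seen with record_path. Applying the same two lines to test_df instead of sample should work the same way, assuming all records follow the structure shown.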
【Comments】:
Tags: python json pandas normalize