【Question title】: Pandas: output csv data to nested json with sorting
【Posted】: 2021-04-20 02:08:14
【Question】:

Given the sample CSV data below, loaded into a pandas DataFrame, how can I get to_json to output the following?

as_of
  category
     type
        subtype
           log: [
            #sorted by timestamp
            {timestamp: 1618879229, action: add, stale_timestamp: true},
            {timestamp: 1608879229, action: remove, stale_timestamp: None},
           ]

20210415
  apples
     A
        big
           log: [
            {timestamp: 1618879229, action: add, stale_timestamp: None},
           ]
        small
           log: [
            {timestamp: 1618879229, action: add, stale_timestamp: None},
            {timestamp: 1608879229, action: remove, stale_timestamp: None},
            {timestamp: 1518879229, action: add, stale_timestamp: None},
           ]
     B
        big
           log: [
            {timestamp: 1618879229, action: add, stale_timestamp: None},
           ]

Bonus points if you can also help me get a DataFrame back from the nested JSON!

as_of category type sub_type action timestamp stale_timestamp
20210415 apples A big add 1618879229.6703315
20210415 apples A small add 1618879229.6703315
20210415 apples B big add 1618879229.6703315
20210415 apples B small add 1618879229.6703315
20210415 apples C big add 1618879229.6703315
20210415 apples C small add 1618879229.6703315
202103 oranges sweet add 1616892142.6703315
202103 oranges sweet remove 1616632942.6703315
202103 oranges sweet add 1616200942.6703315
202103 grapes sweet add 1616200942.6703315
202102 oranges sweet add 1616200942.6703315
202102 grapes sweet add 1616200942.6703315
20210115 apples A big add 1611103342.6703315
20210115 apples A small add 1611103342.6703315
20210115 apples B big add 1611103342.6703315
20210115 apples B small add 1611103342.6703315
20210115 apples C big add 1611103342.6703315
20210115 apples C small add 1611103342.6703315
202101 oranges sweet add 1608424942.6703315
202101 grapes sweet add 1608424942.6703315
202012 oranges sweet add 1608424942.6703315
202012 grapes sweet add 1608424942.6703315
202011 oranges sweet add 1608424942.6703315
202011 grapes sweet add 1608424942.6703315
20201015 apples A big add 1608424942.6703315 True
20201015 apples A small add 1608424942.6703315 True
20201015 apples B big add 1608424942.6703315 True
20201015 apples B small add 1608424942.6703315 True
20201015 apples C big add 1608424942.6703315 True
20201015 apples C small add 1608424942.6703315 True
202010 oranges sweet add 1608424942.6703315 True
202010 grapes sweet add 1608424942.6703315 True

【Comments】:

  • This is effectively a custom orient for Pandas' .to_json function. The table would be more useful if it were provided as CSV.

Tags: pandas csv to-json


【Solution 1】:

First I converted the table to CSV:

as_of,category,type,sub_type,action,timestamp,stale_timestamp
20210415,apples,A,big,add,1618879230,
20210415,apples,A,small,add,1618879230,
20210415,apples,B,big,add,1618879230,
20210415,apples,B,small,add,1618879230,
20210415,apples,C,big,add,1618879230,
20210415,apples,C,small,add,1618879230,
202103,oranges,sweet,,add,1616892143,
202103,oranges,sweet,,remove,1616632943,
202103,oranges,sweet,,add,1616200943,
202103,grapes,sweet,,add,1616200943,
202102,oranges,sweet,,add,1616200943,
202102,grapes,sweet,,add,1616200943,
20210115,apples,A,big,add,1611103343,
20210115,apples,A,small,add,1611103343,
20210115,apples,B,big,add,1611103343,
20210115,apples,B,small,add,1611103343,
20210115,apples,C,big,add,1611103343,
20210115,apples,C,small,add,1611103343,
202101,oranges,sweet,,add,1608424943,
202101,grapes,sweet,,add,1608424943,
202012,oranges,sweet,,add,1608424943,
202012,grapes,sweet,,add,1608424943,
202011,oranges,sweet,,add,1608424943,
202011,grapes,sweet,,add,1608424943,
20201015,apples,A,big,add,1608424943,TRUE
20201015,apples,A,small,add,1608424943,TRUE
20201015,apples,B,big,add,1608424943,TRUE
20201015,apples,B,small,add,1608424943,TRUE
20201015,apples,C,big,add,1608424943,TRUE
20201015,apples,C,small,add,1608424943,TRUE
202010,oranges,sweet,,add,1608424943,TRUE
202010,grapes,sweet,,add,1608424943,TRUE

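The missing entries cause problems in the JSON later on. This has to be fixed either in the input file or in the Python converter. Some of the dates also appear to be missing characters.

Since Pandas has no predefined orient option that fits this requirement, I built a custom dictionary and then converted the dictionary to JSON.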
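To make the problem with missing entries concrete, here is a minimal sketch using two rows from the CSV above. It assumes pandas' default missing-value handling, where an empty cell is read as NaN; the variable names are made up for illustration. Python's json module writes NaN as the bare token NaN, which is not valid JSON, whereas reading with keep_default_na=False keeps the cell as an empty string that can then be mapped to None.

```python
import io
import json
import pandas

# Two rows of the CSV above; the first has an empty stale_timestamp cell.
csv_text = """as_of,category,type,sub_type,action,timestamp,stale_timestamp
20210415,apples,A,big,add,1618879230,
20201015,apples,A,big,add,1608424943,TRUE
"""

# Default parsing: the empty cell is read as NaN (a float).
df_nan = pandas.read_csv(io.StringIO(csv_text), dtype=str)
nan_value = df_nan['stale_timestamp'].iloc[0]
# json.dumps emits the bare token NaN here, which strict JSON parsers reject.
bad_json = json.dumps({'stale_timestamp': nan_value})

# keep_default_na=False: the empty cell stays '' and is mapped to None (null).
df_str = pandas.read_csv(io.StringIO(csv_text), dtype=str, keep_default_na=False)
clean_value = df_str['stale_timestamp'].iloc[0] or None
good_json = json.dumps({'stale_timestamp': clean_value})

print(bad_json)
print(good_json)
```

Passing allow_nan=False to json.dumps is another option: it raises a ValueError instead of silently writing NaN.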

import json
import pandas

# The CSV above has a header row, so the columns can be addressed by name.
# dtype=str keeps as_of values such as "202103" as text, and
# keep_default_na=False reads empty cells as '' instead of NaN.
df = pandas.read_csv('sheet1.csv', dtype=str, keep_default_na=False)
df['timestamp'] = df['timestamp'].astype(int)

mydic = {}
for as_of in df['as_of'].unique():
    mydic[as_of] = {}

    sub_df = df[df['as_of'] == as_of]
    for category in sub_df['category'].unique():
        mydic[as_of][category] = {}

        sub_sub_df = sub_df[sub_df['category'] == category]
        for type_ in sub_sub_df['type'].unique():
            mydic[as_of][category][type_] = {}

            sub_sub_sub_df = sub_sub_df[sub_sub_df['type'] == type_]
            for sub_type in sub_sub_sub_df['sub_type'].unique():
                # Keep only the rows of this sub_type, newest timestamp first.
                # Rows without a sub_type end up under the key ''.
                rows = sub_sub_sub_df[sub_sub_sub_df['sub_type'] == sub_type]
                rows = rows.sort_values('timestamp', ascending=False)

                log = []
                for row in rows.itertuples(index=False):
                    log.append({'timestamp': int(row.timestamp),
                                'action': row.action,
                                # 'TRUE' becomes true, an empty cell becomes null
                                'stale_timestamp': True if row.stale_timestamp == 'TRUE' else None})
                mydic[as_of][category][type_][sub_type] = {'log': log}

with open('output.json', 'w') as file_handle:
    json.dump(mydic, file_handle, indent=2)
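The four nested loops can also be written more compactly with groupby. The following is only a sketch of that alternative, not the approach used above: build_nested is a hypothetical helper name, and the sample rows are taken from the CSV above with the oranges rows deliberately shuffled to show the sorting.

```python
import io
import json
import pandas

# Rows from the CSV above; the oranges rows are shuffled on purpose.
csv_text = """as_of,category,type,sub_type,action,timestamp,stale_timestamp
20210415,apples,A,big,add,1618879230,
202103,oranges,sweet,,add,1616200943,
202103,oranges,sweet,,remove,1616632943,
202103,oranges,sweet,,add,1616892143,
"""

def build_nested(df):
    """Nest rows as as_of -> category -> type -> sub_type -> {'log': [...]}."""
    keys = ['as_of', 'category', 'type', 'sub_type']
    nested = {}
    for key_values, group in df.groupby(keys, sort=False):
        # Walk down (and create) one dictionary level per grouping key.
        node = nested
        for key in key_values:
            node = node.setdefault(key, {})
        # Newest timestamp first within each log.
        group = group.sort_values('timestamp', ascending=False)
        node['log'] = [
            {'timestamp': int(ts),
             'action': action,
             'stale_timestamp': True if stale == 'TRUE' else None}
            for ts, action, stale in zip(group['timestamp'],
                                         group['action'],
                                         group['stale_timestamp'])
        ]
    return nested

df = pandas.read_csv(io.StringIO(csv_text), dtype=str, keep_default_na=False)
df['timestamp'] = df['timestamp'].astype(int)
nested = build_nested(df)
print(json.dumps(nested, indent=2))
```

Reading with keep_default_na=False matters here as well: groupby drops groups whose key is NaN by default, so the rows without a sub_type would otherwise disappear from the output.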

The sample output given by the asker does not match what the Python implementation actually produces.
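For the bonus part of the question (getting a DataFrame back from the nested JSON), one possible sketch is to walk the four levels and emit one row per log entry. The helper name flatten and the small nested dictionary below are illustrative assumptions; in practice the dictionary would come from json.load on the output file.

```python
import pandas

# A small nested structure in the shape produced above (illustrative values
# taken from the CSV; normally this would be loaded with json.load).
nested = {
    '20210415': {'apples': {'A': {'big': {'log': [
        {'timestamp': 1618879230, 'action': 'add', 'stale_timestamp': None},
    ]}}}},
    '202103': {'oranges': {'sweet': {'': {'log': [
        {'timestamp': 1616892143, 'action': 'add', 'stale_timestamp': None},
        {'timestamp': 1616632943, 'action': 'remove', 'stale_timestamp': None},
    ]}}}},
}

def flatten(nested):
    """Turn the nested dictionary back into one row per log entry."""
    rows = []
    for as_of, categories in nested.items():
        for category, types in categories.items():
            for type_, sub_types in types.items():
                for sub_type, leaf in sub_types.items():
                    for entry in leaf['log']:
                        rows.append({'as_of': as_of,
                                     'category': category,
                                     'type': type_,
                                     'sub_type': sub_type,
                                     **entry})
    columns = ['as_of', 'category', 'type', 'sub_type',
               'action', 'timestamp', 'stale_timestamp']
    return pandas.DataFrame(rows, columns=columns)

df = flatten(nested)
print(df)
```

The rows come back in the order they appear in the nested structure, so the original CSV row order is not necessarily preserved.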
