【Question title】: Convert a Python DataFrame object to JSON with conditions
【Posted】: 2021-06-21 14:01:15
【Question】:

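I previously followed the solution from an earlier question, but then realized it does not fit my case: I want to show some values for rows with the same as_of_date and ID in the display_rows section of the JSON file. I have a dataframe like this: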

     as_of_date create_date   ID  value_1   count   value_3
0    02/03/2021 02/03/2021  12345   5         2      55
1    02/03/2021 01/03/2021  12345   8         2      55
2    02/03/2021 01/03/2021  34567   9         1      66
3    02/03/2021 02/03/2021  78945   9         1      77
4    03/03/2021 02/03/2021  78945   9         1      22
5    03/03/2021 02/03/2021  12345   5         1      33

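The count column is the number of rows that share the same ID and as_of_date. For example, as_of_date=02/03/2021 with ID=12345 has two rows (each with a different create_date, but I don't care about create_date), so count is 2 for both of the first two rows.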
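For readers who want to reproduce the setup, the count column described above could be derived rather than typed by hand. This is a minimal sketch, assuming the dates are still plain strings as in the table; using `transform("size")` is one option, not necessarily how the original data was built.

```python
import pandas as pd

# Sample data copied from the table in the question (count is computed below).
df = pd.DataFrame({
    "as_of_date": ["02/03/2021", "02/03/2021", "02/03/2021",
                   "02/03/2021", "03/03/2021", "03/03/2021"],
    "create_date": ["02/03/2021", "01/03/2021", "01/03/2021",
                    "02/03/2021", "02/03/2021", "02/03/2021"],
    "ID": [12345, 12345, 34567, 78945, 78945, 12345],
    "value_1": [5, 8, 9, 9, 9, 5],
    "value_3": [55, 55, 66, 77, 22, 33],
})

# Size of each (as_of_date, ID) group, broadcast back onto every row of that group.
df["count"] = df.groupby(["as_of_date", "ID"])["value_1"].transform("size")

print(df[["as_of_date", "ID", "count"]])
```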

The expected JSON is:

{
    "examples": [
        {
            "Id": 12345,
            "as_of_date": "2021-03-02 00:00:00", # this field is datetime format
            "value_3": 55, 
            "count": 2,    # for the same 'ID=12345'&'as_of_date=02/03/2021'
            "display_rows": [
                {
                    "value_1": 5,
                    "type": "int" # 'type' field will always be 'int'
                },
                {
                    "value_1": 8,
                    "type": "int"
                }
            ]
        },
        {
            "Id": 34567,
            "as_of_date": "2021-03-02 00:00:00",
            "value_3": 66,
            "count": 1,
            "display_rows": [
                {
                    "value_1": 9,
                    "type": "int"
                }
            ]
        },
        {
            "Id": 78945,
            "as_of_date": "2021-03-02 00:00:00",
            "value_3": 77,
            "count": 1,
            "display_rows": [
                {
                    "value_1": 9,
                    "type": "int" 
                }
            ]
        },
        {
            "Id": 78945,
            "as_of_date": "2021-03-03 00:00:00",
            "value_3": 22,
            "count": 1,
            "display_rows": [
                {
                    "value_1": 9,
                    "type": "int" 
                }
            ]
        },
        {
            "Id": 12345,
            "as_of_date": "2021-03-03 00:00:00",
            "value_3": 33,
            "count": 1,
            "display_rows": [
                {
                    "value_1": 5,
                    "type": "int" 
                }
            ]
        }
    ]
}

I spent almost a whole day trying to figure this out, but nothing seems to work... Can anyone help? Thanks.

【Comments】:

    Tags: python arrays json pandas dataframe


    【Answer 1】:

    Use GroupBy.apply with a lambda function to process the value_1 column, like this:

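    import json
    import pandas as pd
    
    # Parse day-first date strings (02/03/2021 -> 2021-03-02); unparseable values become NaT
    df['as_of_date'] = pd.to_datetime(df['as_of_date'], dayfirst=True, errors='coerce')
    
    # Build the list of display_rows dicts from each group's value_1 values
    f = lambda x: [{"value_1": y, "type": "int"} for y in x]
    df = (df.groupby(['as_of_date','ID','value_3','count'])['value_1']
            .apply(f)
            .reset_index(name='display_rows'))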
    print (df)
      as_of_date     ID  value_3  count  \
    0 2021-03-02  12345       55      2   
    1 2021-03-02  34567       66      1   
    2 2021-03-02  78945       77      1   
    3 2021-03-03  12345       33      1   
    4 2021-03-03  78945       22      1   
    
                                            display_rows  
    0  [{'value_1': 5, 'type': 'int'}, {'value_1': 8,...  
    1                    [{'value_1': 9, 'type': 'int'}]  
    2                    [{'value_1': 9, 'type': 'int'}]  
    3                    [{'value_1': 5, 'type': 'int'}]  
    4                    [{'value_1': 9, 'type': 'int'}]  
    
    j = json.dumps({"examples":df.to_dict(orient='records')}, default=str)
    

    print (j)
    {"examples": [{"as_of_date": "2021-03-02 00:00:00", "ID": 12345, "value_3": 55, "count": 2, "display_rows": [{"value_1": 5, "type": "int"}, {"value_1": 8, "type": "int"}]}, {"as_of_date": "2021-03-02 00:00:00", "ID": 34567, "value_3": 66, "count": 1, "display_rows": [{"value_1": 9, "type": "int"}]}, {"as_of_date": "2021-03-02 00:00:00", "ID": 78945, "value_3": 77, "count": 1, "display_rows": [{"value_1": 9, "type": "int"}]}, {"as_of_date": "2021-03-03 00:00:00", "ID": 12345, "value_3": 33, "count": 1, "display_rows": [{"value_1": 5, "type": "int"}]}, {"as_of_date": "2021-03-03 00:00:00", "ID": 78945, "value_3": 22, "count": 1, "display_rows": [{"value_1": 9, "type": "int"}]}]}
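The output above differs from the expected JSON in two small ways: the key is `ID` rather than `Id`, and the groups come out sorted instead of in their original order. The following is a sketch of one way to close that gap, assuming both differences matter to the asker; `sort=False` and the `rename` call are additions that are not part of the answer, and the variable names `out` and `payload` are illustrative.

```python
import json
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "as_of_date": ["02/03/2021", "02/03/2021", "02/03/2021",
                   "02/03/2021", "03/03/2021", "03/03/2021"],
    "ID": [12345, 12345, 34567, 78945, 78945, 12345],
    "value_1": [5, 8, 9, 9, 9, 5],
    "count": [2, 2, 1, 1, 1, 1],
    "value_3": [55, 55, 66, 77, 22, 33],
})
df["as_of_date"] = pd.to_datetime(df["as_of_date"], format="%d/%m/%Y")

out = (
    # sort=False keeps groups in order of first appearance in the frame.
    df.groupby(["ID", "as_of_date", "value_3", "count"], sort=False)["value_1"]
      .apply(lambda s: [{"value_1": int(v), "type": "int"} for v in s])
      .reset_index(name="display_rows")
      .rename(columns={"ID": "Id"})  # match the key spelling in the expected JSON
)

payload = {"examples": out.to_dict(orient="records")}
j = json.dumps(payload, default=str, indent=4)
print(j)
```

Listing `ID` first in the groupby keys also puts `Id` first in each record, which mirrors the key order shown in the question.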
    

    Edit:

    # add another column
    df['value_7'] = 52
    print (df)
       as_of_date create_date     ID  value_1  count  value_3  value_7
    0  02/03/2021  02/03/2021  12345        5      2       55       52
    1  02/03/2021  01/03/2021  12345        8      2       55       52
    2  02/03/2021  01/03/2021  34567        9      1       66       52
    3  02/03/2021  02/03/2021  78945        9      1       77       52
    4  03/03/2021  02/03/2021  78945        9      1       22       52
    5  03/03/2021  02/03/2021  12345        5      1       33       52
    
    # add a constant type column so it becomes the last key in each dict
    df = (df.assign(type='int')
            .groupby(['as_of_date','ID','value_3','count'])[['value_1', 'value_7','type']]
            .apply(lambda x:  x.to_dict('records'))
            .reset_index(name='display_rows'))
    print (df)
       as_of_date     ID  value_3  count  \
    0  02/03/2021  12345       55      2   
    1  02/03/2021  34567       66      1   
    2  02/03/2021  78945       77      1   
    3  03/03/2021  12345       33      1   
    4  03/03/2021  78945       22      1   
    
                                            display_rows  
    0  [{'value_1': 5, 'value_7': 52, 'type': 'int'},...  
    1     [{'value_1': 9, 'value_7': 52, 'type': 'int'}]  
    2     [{'value_1': 9, 'value_7': 52, 'type': 'int'}]  
    3     [{'value_1': 5, 'value_7': 52, 'type': 'int'}]  
    4     [{'value_1': 9, 'value_7': 52, 'type': 'int'}]  
    
    j = json.dumps({"examples":df.to_dict(orient='records')}, default=str)
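A short note on `default=str` in the `json.dumps` call: it only matters for values the `json` module cannot serialize by itself, such as the pandas `Timestamp` objects produced by `pd.to_datetime`. In the edited example the printed dates are still strings like 02/03/2021, so the argument appears to have no effect there. The sketch below uses a hand-built `record` dict (illustrative, not taken from the answer) to show the difference.

```python
import json
import pandas as pd

# A single record with a Timestamp, similar to what to_dict(orient='records') yields
# once as_of_date has been converted with pd.to_datetime.
record = {"as_of_date": pd.Timestamp("2021-03-02"), "ID": 12345}

try:
    json.dumps(record)
except TypeError as exc:
    # json does not know how to encode Timestamp objects by default.
    print("without default:", exc)

# default=str is called for each non-serializable object and its result is emitted.
j = json.dumps(record, default=str)
print(j)
```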
    

    Edit:

    df = (df.assign(example_placeholder='xyz')
            .groupby(['as_of_date','ID','value_3','count'])[['value_1', 'value_7','example_placeholder']]
            .apply(lambda x:  x.to_dict('records'))
            .reset_index(name='display_rows'))
    print (df)
       as_of_date     ID  value_3  count  \
    0  02/03/2021  12345       55      2   
    1  02/03/2021  34567       66      1   
    2  02/03/2021  78945       77      1   
    3  03/03/2021  12345       33      1   
    4  03/03/2021  78945       22      1   
    
                                            display_rows  
    0  [{'value_1': 5, 'value_7': 52, 'example_placeh...  
    1  [{'value_1': 9, 'value_7': 52, 'example_placeh...  
    2  [{'value_1': 9, 'value_7': 52, 'example_placeh...  
    3  [{'value_1': 5, 'value_7': 52, 'example_placeh...  
    4  [{'value_1': 9, 'value_7': 52, 'example_placeh...  
    

    df = (df.assign(aa='xyz', type='int')
            .groupby(['as_of_date','ID','value_3','count'])[['value_1', 'value_7','aa', 'type']]
            .apply(lambda x:  x.to_dict('records'))
            .reset_index(name='display_rows'))
    print (df)
    
       as_of_date     ID  value_3  count  \
    0  02/03/2021  12345       55      2   
    1  02/03/2021  34567       66      1   
    2  02/03/2021  78945       77      1   
    3  03/03/2021  12345       33      1   
    4  03/03/2021  78945       22      1   
    
                                            display_rows  
    0  [{'value_1': 5, 'value_7': 52, 'aa': 'xyz', 't...  
    1  [{'value_1': 9, 'value_7': 52, 'aa': 'xyz', 't...  
    2  [{'value_1': 9, 'value_7': 52, 'aa': 'xyz', 't...  
    3  [{'value_1': 5, 'value_7': 52, 'aa': 'xyz', 't...  
    4  [{'value_1': 9, 'value_7': 52, 'aa': 'xyz', 't...  
    

    【Discussion】:

    • Hi, thanks. Just a quick follow-up question: what if I have multiple columns like value_1 that need to be added to the display_rows section? I tried: df = (df.groupby(['as_of_date','ID','value_3','count'])['value_1', 'value_7', 'value_8'] .apply(lambda x: [ {"value_1": a,'value_7' : b, 'value_8': c, "type": "int" } for a, b, c in x]) .reset_index(name='display_rows')) and it gives me the error .apply(lambda x: [{ ValueError: too many values to unpack (expected 3)
    • Thanks. If I change type='int' to another string, e.g. .assign(example_placeholder = 'xyz', it doesn't seem to work.
    • @Cecilia - Is example_placeholder added after the groupby like this: [['value_1', 'value_7','example_placeholder']]?
    • I found the problem: I should have used records instead of record. It was a silly mistake.
    • @Cecilia - I've never done that, so I don't know. Maybe try looking for a solution to that problem, or post a new question.
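
    On the ValueError reported in the first comment: a likely cause is that iterating over a DataFrame yields its column labels rather than its rows, so `for a, b, c in x` tries to unpack the 7-character string 'value_1' into three names. The sketch below reproduces this on a small stand-in group; the value_8 numbers are made up for illustration because the thread does not show them, and `itertuples` is offered as one alternative next to the `to_dict('records')` approach from the answer.

```python
import pandas as pd

# Stand-in for one group; value_8 values are hypothetical.
group = pd.DataFrame({"value_1": [5, 8], "value_7": [52, 52], "value_8": [1, 2]})

# Iterating a DataFrame yields column labels, not rows.
print(list(group))

try:
    rows = [{"value_1": a, "value_7": b, "value_8": c} for a, b, c in group]
except ValueError as exc:
    # Each item is a label such as 'value_1', which cannot be unpacked into 3 names.
    print("error:", exc)

# itertuples(index=False) yields one tuple per row, so three-way unpacking works.
rows = [
    {"value_1": a, "value_7": b, "value_8": c, "type": "int"}
    for a, b, c in group.itertuples(index=False)
]
print(rows)
```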