【Question title】: Get nested JSON from pandas dataframe grouped by multiple columns
【Posted】: 2021-04-09 12:32:40
【Question】:

I have a pandas dataframe:

import pandas as pd

d = {'key': ['foo', 'foo', 'foo', 'foo', 'bar', 'bar', 'bar', 'bar', 'crow', 'crow', 'crow', 'crow'], 
     'date': ['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02', '2021-01-01', '2021-01-01','2021-01-02', '2021-01-02', '2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02'], 
     'class': [1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2],
     'count': [12, 3, 5, 5, 3, 1, 4, 1, 7, 3, 8, 2],
     'percent': [.8, .2, .5, .5, .75, .25, .8, .2, .7, .3, .8, .2]
}
df = pd.DataFrame(data=d)
df  

     key        date  class  count  percent
0    foo  2021-01-01      1     12     0.80
1    foo  2021-01-01      2      3     0.20
2    foo  2021-01-02      1      5     0.50
3    foo  2021-01-02      2      5     0.50
4    bar  2021-01-01      1      3     0.75
5    bar  2021-01-01      2      1     0.25
6    bar  2021-01-02      1      4     0.80
7    bar  2021-01-02      2      1     0.20
8   crow  2021-01-01      1      7     0.70
9   crow  2021-01-01      2      3     0.30
10  crow  2021-01-02      1      8     0.80
11  crow  2021-01-02      2      2     0.20

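I want to create a nested JSON file grouped by `key` and `date`, where `count:` is a list containing the sum of the counts for that `key` on each day, and `percent:` (named `predictions` in the output below) is a list of lists holding each class's count divided by that day's total count (I need one list per day with the percentage for each class).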

[
  [
    {
      "key": "foo",
      "count": [
        15,
        10
      ],
      "predictions": [
        [
          0.80,
          0.20
        ],
        [
          0.50,
          0.50
        ]
      ]
    },
    {
      "key": "bar",
      "count": [
        4,
        5
      ],
      "predictions": [
        [
          0.75,
          0.25
        ],
        [
          0.80,
          0.20
        ]
      ]
    },
    {
      "key": "crow",
      "count": [
        10,
        10
      ],
      "predictions": [
        [
          0.70,
          0.30
        ],
        [
          0.80,
          0.20
        ]
      ]
    }
  ]
]
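The `percent` values described above appear to be each class's `count` divided by that day's total `count` for the same `key` (for example 12 / (12 + 3) = 0.8 for `foo` on 2021-01-01). A minimal sketch checking this with `groupby(...).transform('sum')`, which broadcasts each group's total back onto that group's rows; the dataframe is rebuilt here so the snippet runs on its own:

```python
import pandas as pd

# Same data as the question's dataframe, written more compactly.
df = pd.DataFrame({
    'key': ['foo'] * 4 + ['bar'] * 4 + ['crow'] * 4,
    'date': ['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02'] * 3,
    'class': [1, 2] * 6,
    'count': [12, 3, 5, 5, 3, 1, 4, 1, 7, 3, 8, 2],
    'percent': [.8, .2, .5, .5, .75, .25, .8, .2, .7, .3, .8, .2],
})

# Total count per (key, date), repeated on every row of that group.
daily_total = df.groupby(['key', 'date'])['count'].transform('sum')

# Each class's share of that day's total for its key.
derived = df['count'] / daily_total

# True if the derived shares match the given percent column (up to float rounding).
print((derived - df['percent']).abs().max() < 1e-9)
```

If this holds, `percent` could be recomputed from `count` rather than stored separately.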

So far I have:

import json
dfj = df.groupby(["key","date"]).apply(lambda x: x.to_dict("records")).to_json(orient="records")
print(json.dumps(json.loads(dfj), indent=2, sort_keys=True))

Which returns:

[
  [
    {
      "class": 1,
      "count": 3,
      "date": "2021-01-01",
      "key": "bar",
      "percent": 0.75
    },
    {
      "class": 2,
      "count": 1,
      "date": "2021-01-01",
      "key": "bar",
      "percent": 0.25
    }
  ],
  [
    {
      "class": 1,
      "count": 4,
      "date": "2021-01-02",
      "key": "bar",
      "percent": 0.8
    },
    {
      "class": 2,
      "count": 1,
      "date": "2021-01-02",
      "key": "bar",
      "percent": 0.2
    }
  ],
  [
    {
      "class": 1,
      "count": 7,
      "date": "2021-01-01",
      "key": "crow",
      "percent": 0.7
    },
    {
      "class": 2,
      "count": 3,
      "date": "2021-01-01",
      "key": "crow",
      "percent": 0.3
    }
  ],
  [
    {
      "class": 1,
      "count": 8,
      "date": "2021-01-02",
      "key": "crow",
      "percent": 0.8
    },
    {
      "class": 2,
      "count": 2,
      "date": "2021-01-02",
      "key": "crow",
      "percent": 0.2
    }
  ],
  [
    {
      "class": 1,
      "count": 12,
      "date": "2021-01-01",
      "key": "foo",
      "percent": 0.8
    },
    {
      "class": 2,
      "count": 3,
      "date": "2021-01-01",
      "key": "foo",
      "percent": 0.2
    }
  ],
  [
    {
      "class": 1,
      "count": 5,
      "date": "2021-01-02",
      "key": "foo",
      "percent": 0.5
    },
    {
      "class": 2,
      "count": 5,
      "date": "2021-01-02",
      "key": "foo",
      "percent": 0.5
    }
  ]
]

Any help would be greatly appreciated. Thanks.

【Discussion】:

    Tags: python json pandas dataframe


    【Answer 1】:

    You can use:

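    # named aggregations: output column -> (input column, aggregation function)
    d   = {'count': ('count', 'sum'), 'predictions': ('percent', list)}
    # aggregate per (key, date), then collect each column into a list per key
    g   = df.groupby(['key', 'date']).agg(**d).groupby(level=0).agg(list)
    dct = [{'key': k, **v} for k, v in g.to_dict('index').items()]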
    

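    Details:

    1. `groupby` the given dataframe on `key` and `date`, then `agg` it with the named aggregations unpacked from the dictionary `d`.

    2. `groupby` the aggregated frame from step 1 on `level=0` (the `key` level) and `agg` it with `list`.

    3. Finally, convert the frame from step 2 to a dictionary with `to_dict` using `orient='index'`, then use a list comprehension to add the `key` value to each row's dictionary.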
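The three steps can be run end to end as below. One deliberate difference from the answer, offered as a judgment call: passing `sort=False` to both `groupby` calls keeps the keys in order of first appearance (`foo`, `bar`, `crow`), as in the question's desired output, whereas the default sorting yields `bar`, `crow`, `foo`. The named aggregations are written inline instead of through the dictionary `d`; the behavior is the same.

```python
import pandas as pd

df = pd.DataFrame({
    'key': ['foo'] * 4 + ['bar'] * 4 + ['crow'] * 4,
    'date': ['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02'] * 3,
    'class': [1, 2] * 6,
    'count': [12, 3, 5, 5, 3, 1, 4, 1, 7, 3, 8, 2],
    'percent': [.8, .2, .5, .5, .75, .25, .8, .2, .7, .3, .8, .2],
})

# Step 1: one row per (key, date) with the summed count and the list of class percents.
# sort=False keeps groups in order of first appearance instead of sorting them.
per_day = df.groupby(['key', 'date'], sort=False).agg(
    count=('count', 'sum'),
    predictions=('percent', list),
)

# Step 2: collapse the date level, collecting each column into a list per key.
per_key = per_day.groupby(level=0, sort=False).agg(list)

# Step 3: one dict per key, with the key itself added as a field.
records = [{'key': k, **v} for k, v in per_key.to_dict('index').items()]
print(records)
```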

    Result:

    [{'key': 'bar', 'count': [4, 5], 'predictions': [[0.75, 0.25], [0.8, 0.2]]},
     {'key': 'crow', 'count': [10, 10], 'predictions': [[0.7, 0.3], [0.8, 0.2]]},
     {'key': 'foo', 'count': [15, 10], 'predictions': [[0.8, 0.2], [0.5, 0.5]]}]
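The result is a Python list, while the question asks for a JSON file. A minimal sketch of serializing it with the standard `json` module; the outer list wrapping `dct` is only there to mirror the `[[...]]` layout of the question's desired output and can be dropped if that extra nesting was not intended:

```python
import json

# The answer's result, copied literally so this snippet runs on its own.
dct = [
    {'key': 'bar', 'count': [4, 5], 'predictions': [[0.75, 0.25], [0.8, 0.2]]},
    {'key': 'crow', 'count': [10, 10], 'predictions': [[0.7, 0.3], [0.8, 0.2]]},
    {'key': 'foo', 'count': [15, 10], 'predictions': [[0.8, 0.2], [0.5, 0.5]]},
]

# Wrap in an outer list to mirror the [[...]] nesting of the desired output,
# then pretty-print with two-space indentation.
text = json.dumps([dct], indent=2)
print(text)
```

To write a file directly, `json.dump([dct], f, indent=2)` with a file object `f` opened for writing does the same.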
    
