【Question title】: How to aggregate a pandas pivot table across subrows and subcolumns
【Posted】: 2020-10-20 02:52:52
【Question description】:

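I'm using pandas in Python to pivot some data, and I'd like to perform two kinds of aggregation on parts of my pivot table. I know I can use margins to aggregate across all rows/columns. But I want to aggregate across several rows (not all of them) within a single column, or across several columns within a single row. What is the best way to aggregate subrows and subcolumns in pandas?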
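For reference, a minimal sketch of the margins behaviour mentioned above, using the same sample values as the setup in this question. The choices of `aggfunc="sum"` and `margins_name="All"` are assumptions made for the illustration. Margins add one grand-total row and one grand-total column, but no per-factory subtotals.

```python
import pandas as pd

# Same sample values as the question's dataset (id column omitted for brevity).
records = [
    ("Factory_1", "crusher", "electricity_usage", 15),
    ("Factory_1", "mixer", "electricity_usage", 11),
    ("Factory_1", "turner", "electricity_usage", 12),
    ("Factory_2", "crusher", "electricity_usage", 2),
    ("Factory_2", "mixer", "electricity_usage", 7),
    ("Factory_2", "turner", "electricity_usage", 13),
    ("Factory_1", "crusher", "running_hours", 6),
    ("Factory_1", "mixer", "running_hours", 5),
    ("Factory_1", "turner", "running_hours", 5),
    ("Factory_2", "crusher", "running_hours", 1),
    ("Factory_2", "mixer", "running_hours", 3),
    ("Factory_2", "turner", "running_hours", 6),
]
frame = pd.DataFrame(records, columns=["Location", "Type", "recorded_type", "value"])

# margins=True appends a single "All" row and a single "All" column
# that aggregate over every row / every column of the table.
with_margins = pd.pivot_table(
    frame,
    index=["Location", "Type"],
    columns="recorded_type",
    values="value",
    aggfunc="sum",
    margins=True,
    margins_name="All",
)
print(with_margins)
```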

Example code setup:

import pandas as pds

#Dataset
rows = [
    [1, 'Factory_1', 'crusher', 'electricity_usage', 15],
    [2, 'Factory_1', 'mixer', 'electricity_usage', 11],
    [3, 'Factory_1', 'turner', 'electricity_usage', 12],
    [4, 'Factory_2', 'crusher', 'electricity_usage', 2],
    [5, 'Factory_2', 'mixer', 'electricity_usage', 7],
    [6, 'Factory_2', 'turner', 'electricity_usage', 13],
    [7, 'Factory_1', 'crusher', 'running_hours', 6],
    [8, 'Factory_1', 'mixer', 'running_hours', 5],
    [9, 'Factory_1', 'turner', 'running_hours', 5],
    [10, 'Factory_2', 'crusher', 'running_hours', 1],
    [11, 'Factory_2', 'mixer', 'running_hours', 3],
    [12, 'Factory_2', 'turner', 'running_hours', 6]
]

dataFrame = pds.DataFrame(rows, columns=["id","Location","Type","recorded_type","value"])

#Pivot Table 1: Form multi row aggregation across a single column
ptable_1 = pds.pivot_table(data=dataFrame,index=['Location', 'Type'], columns=["recorded_type"], values=['value'])
print(ptable_1)

#Pivot Table 2: Form multi column aggregation across a single row
ptable_2 = pds.pivot_table(data=dataFrame,index=['recorded_type'], columns=["Location", "Type"], values=['value'])
print(ptable_2)

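Below is my attempt at aggregating pivot 1 across multiple rows within a single column. I'm trying to sum the recorded values of all machines at each location. Can this be done better?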

#Form aggregation across multiple rows in a single column

df1 = ptable_1.groupby(level=[0]).sum()
df1['Type'] = ["all", "all"]
#Reset index so Location is removed from the current index
df1.reset_index(inplace=True)
#Set multi-index of location and type
df1.set_index(['Location', 'Type'], inplace=True)
#Concat both dataframes
aggregated_table_1 = pds.concat([ptable_1.reset_index(),df1.reset_index()], ignore_index=True)
#Sort values by location, so appended table values are in the correct position
aggregated_table_1.sort_values('Location', inplace=True)

print(aggregated_table_1)

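For example, I'm trying to sum the electricity usage of all machine types in a given factory. The aggregate sits in the Type column, with type "all". Expected output for ptable_1: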

+---------------+-----------+---------+-------------------+---------------+
|               | Location  |  Type   |       value       |     value     |
+---------------+-----------+---------+-------------------+---------------+
| recorded_type |           |         | electricity_usage | running_hours |
|               | Factory_1 | crusher | 15                | 6             |
|               | Factory_1 | mixer   | 11                | 5             |
|               | Factory_1 | turner  | 12                | 5             |
|               | Factory_1 | all     | 38                | 16            |
|               | Factory_2 | crusher | 2                 | 1             |
|               | Factory_2 | mixer   | 7                 | 3             |
|               | Factory_2 | turner  | 13                | 6             |
|               | Factory_2 | all     | 22                | 10            |
+---------------+-----------+---------+-------------------+---------------+
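One way to get this layout, shown as a minimal sketch rather than a definitive answer: compute per-Location sums with `groupby(level=...)`, label them with a second index level `"all"`, concatenate, and reindex into the desired order. It assumes `aggfunc="sum"` and a scalar `values="value"` (so the columns stay single-level); the names `totals`, `order` and `result_1` are introduced only for this illustration.

```python
import pandas as pd

rows = [
    [1, "Factory_1", "crusher", "electricity_usage", 15],
    [2, "Factory_1", "mixer", "electricity_usage", 11],
    [3, "Factory_1", "turner", "electricity_usage", 12],
    [4, "Factory_2", "crusher", "electricity_usage", 2],
    [5, "Factory_2", "mixer", "electricity_usage", 7],
    [6, "Factory_2", "turner", "electricity_usage", 13],
    [7, "Factory_1", "crusher", "running_hours", 6],
    [8, "Factory_1", "mixer", "running_hours", 5],
    [9, "Factory_1", "turner", "running_hours", 5],
    [10, "Factory_2", "crusher", "running_hours", 1],
    [11, "Factory_2", "mixer", "running_hours", 3],
    [12, "Factory_2", "turner", "running_hours", 6],
]
data = pd.DataFrame(rows, columns=["id", "Location", "Type", "recorded_type", "value"])

# Assumption: sum instead of the default mean; scalar `values` keeps columns flat.
ptable_1 = pd.pivot_table(
    data, index=["Location", "Type"], columns="recorded_type",
    values="value", aggfunc="sum",
)

# Subtotal per Location, then give each subtotal row the Type label "all".
totals = ptable_1.groupby(level="Location").sum()
totals.index = pd.MultiIndex.from_product(
    [totals.index, ["all"]], names=["Location", "Type"]
)

# Explicit row order: the existing types first, then "all", within each Location.
locations = ptable_1.index.get_level_values("Location").unique()
types = list(ptable_1.index.get_level_values("Type").unique()) + ["all"]
order = pd.MultiIndex.from_product([locations, types], names=["Location", "Type"])

result_1 = pd.concat([ptable_1, totals]).reindex(order)
print(result_1)
```

Reindexing against an explicit order avoids relying on alphabetical sorting, which would place "all" before "crusher".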

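Secondly, I'm not sure how to aggregate across subcolumns so that, for ptable_2, the columns of all types are summed within each location. The aggregate is a new column with type "all".

Expected output for ptable_2: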

+-------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
|     Location      | Factory_1 | Factory_1 | Factory_1 | Factory_1 | Factory_2 | Factory_2 | Factory_2 | Factory_2 |
+-------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
| Type              | crusher   | mixer     | turner    | all       | crusher   | mixer     | turner    | all       |
| recorded_type     |           |           |           |           |           |           |           |           |
| electricity_usage | 15        | 11        | 12        | 38        | 2         | 7         | 13        | 22        |
| running_hours     | 6         | 5         | 5         | 16        | 1         | 3         | 6         | 10        |
+-------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
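The column-wise case can be sketched the same way, again under the assumptions of `aggfunc="sum"` and a scalar `values="value"`. The subtotal is computed on the transposed table (grouping on the `Location` level), transposed back, labelled `"all"`, and concatenated along `axis=1`. The names `col_totals`, `col_order` and `result_2` are hypothetical.

```python
import pandas as pd

rows = [
    [1, "Factory_1", "crusher", "electricity_usage", 15],
    [2, "Factory_1", "mixer", "electricity_usage", 11],
    [3, "Factory_1", "turner", "electricity_usage", 12],
    [4, "Factory_2", "crusher", "electricity_usage", 2],
    [5, "Factory_2", "mixer", "electricity_usage", 7],
    [6, "Factory_2", "turner", "electricity_usage", 13],
    [7, "Factory_1", "crusher", "running_hours", 6],
    [8, "Factory_1", "mixer", "running_hours", 5],
    [9, "Factory_1", "turner", "running_hours", 5],
    [10, "Factory_2", "crusher", "running_hours", 1],
    [11, "Factory_2", "mixer", "running_hours", 3],
    [12, "Factory_2", "turner", "running_hours", 6],
]
data = pd.DataFrame(rows, columns=["id", "Location", "Type", "recorded_type", "value"])

# Columns become a two-level MultiIndex (Location, Type).
ptable_2 = pd.pivot_table(
    data, index="recorded_type", columns=["Location", "Type"],
    values="value", aggfunc="sum",
)

# Sum the Type columns within each Location: group the transposed table
# by Location, then transpose back so Location is on the columns again.
col_totals = ptable_2.T.groupby(level="Location").sum().T
col_totals.columns = pd.MultiIndex.from_product(
    [col_totals.columns, ["all"]], names=["Location", "Type"]
)

# Explicit column order: existing types first, then "all", within each Location.
locations = ptable_2.columns.get_level_values("Location").unique()
types = list(ptable_2.columns.get_level_values("Type").unique()) + ["all"]
col_order = pd.MultiIndex.from_product([locations, types], names=["Location", "Type"])

result_2 = pd.concat([ptable_2, col_totals], axis=1).reindex(columns=col_order)
print(result_2)
```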

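Edit 1: This is my output straight from Python after applying melt() with default parameters, as Serge de Gosson de Varennes suggested. I've lost the record type for each row; it has been replaced by a NaN column. Should I try to aggregate from this to form my expected output?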

Df_ex1 = dfex1.melt() # Expected output 1
      NaN      recorded_type  value
0   value  electricity_usage     15
1   value  electricity_usage     11
2   value  electricity_usage     12
3   value  electricity_usage      2
4   value  electricity_usage      7
5   value  electricity_usage     13
6   value      running_hours      6
7   value      running_hours      5
8   value      running_hours      5
9   value      running_hours      1
10  value      running_hours      3
11  value      running_hours      6


Df_exp2 = dfex2.melt() # Expected output 2
      NaN   Location     Type  value
0   value  Factory_1  crusher     15
1   value  Factory_1  crusher      6
2   value  Factory_1    mixer     11
3   value  Factory_1    mixer      5
4   value  Factory_1   turner     12
5   value  Factory_1   turner      5
6   value  Factory_2  crusher      2
7   value  Factory_2  crusher      1
8   value  Factory_2    mixer      7
9   value  Factory_2    mixer      3
10  value  Factory_2   turner     13
11  value  Factory_2   turner      6

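【Comments】:

  • Could you share your expected output for both cases?
  • @MayankPorwal Hopefully this works for you. I tried to follow the pandas table style used when printing to the console.
  • For your expected output 2, you only need to melt the dataframe. You're almost there: df = pds.DataFrame(ptable_2); df.melt. The same goes for expected output 1.

Tags: python pandas pivot-table aggregate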


【Solution 1】:

You almost had it right: you need to melt your dataframe:

import pandas as pds
rows = [
    [1, 'Factory_1', 'crusher', 'electricity_usage', 15],
    [2, 'Factory_1', 'mixer', 'electricity_usage', 11],
    [3, 'Factory_1', 'turner', 'electricity_usage', 12],
    [4, 'Factory_2', 'crusher', 'electricity_usage', 2],
    [5, 'Factory_2', 'mixer', 'electricity_usage', 7],
    [6, 'Factory_2', 'turner', 'electricity_usage', 13],
    [7, 'Factory_1', 'crusher', 'running_hours', 6],
    [8, 'Factory_1', 'mixer', 'running_hours', 5],
    [9, 'Factory_1', 'turner', 'running_hours', 5],
    [10, 'Factory_2', 'crusher', 'running_hours', 1],
    [11, 'Factory_2', 'mixer', 'running_hours', 3],
    [12, 'Factory_2', 'turner', 'running_hours', 6]
]

dataFrame = pds.DataFrame(rows, columns=["id","Location","Type","recorded_type","value"])


ptable_1 = pds.pivot_table(data=dataFrame,index=['Location', 'Type'], columns=["recorded_type"], values=['value'])


ptable_2 = pds.pivot_table(data=dataFrame,index=['recorded_type'], columns=["Location", "Type"], values=['value'])
df = pds.DataFrame(ptable_1)

dfex1 = pds.DataFrame(ptable_1)
dfex2 = pds.DataFrame(ptable_2)

This gives you:

Df_ex1 = dfex1.melt() # Expected output 1
Df_exp2 = dfex2.melt() # Expected output 2

【Comments】:

  • I applied the melt() method with default parameters, but it didn't work. There are no total aggregates. See the edit above for my output.