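【Posted】: 2020-10-20 02:52:52
【Question】:
I'm using pandas in Python to pivot some data, and I'd like to perform two kinds of aggregation on parts of my pivot table. I know I can use margins to aggregate over all rows/columns, but I want to aggregate several rows (not all of them) within a single column, or several columns within a single row. What is the best way to aggregate sub-rows and sub-columns in pandas?
Sample code setup: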
import pandas as pds

#Dataset
rows = [
[1, 'Factory_1', 'crusher', 'electricity_usage', 15],
[2, 'Factory_1', 'mixer', 'electricity_usage', 11],
[3, 'Factory_1', 'turner', 'electricity_usage', 12],
[4, 'Factory_2', 'crusher', 'electricity_usage', 2],
[5, 'Factory_2', 'mixer', 'electricity_usage', 7],
[6, 'Factory_2', 'turner', 'electricity_usage', 13],
[7, 'Factory_1', 'crusher', 'running_hours', 6],
[8, 'Factory_1', 'mixer', 'running_hours', 5],
[9, 'Factory_1', 'turner', 'running_hours', 5],
[10, 'Factory_2', 'crusher', 'running_hours', 1],
[11, 'Factory_2', 'mixer', 'running_hours', 3],
[12, 'Factory_2', 'turner', 'running_hours', 6]
]
dataFrame = pds.DataFrame(rows, columns=["id","Location","Type","recorded_type","value"])
#Pivot Table 1: Form multi row aggregation across a single column
ptable_1 = pds.pivot_table(data=dataFrame,index=['Location', 'Type'], columns=["recorded_type"], values=['value'])
print(ptable_1)
#Pivot Table 2: Form multi column aggregation across a single row
ptable_2 = pds.pivot_table(data=dataFrame,index=['recorded_type'], columns=["Location", "Type"], values=['value'])
print(ptable_2)
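Below is my attempt to aggregate pivot 1 across multiple rows within a single column. I'm trying to sum the recorded values of all machines at each location. Can this be done better?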
#Form aggregation across multiple rows in a single column
df1 = ptable_1.groupby(level=[0]).sum()
#Label every subtotal row with the type "all" (a scalar works for any number of locations)
df1['Type'] = "all"
#Reset index so Location is removed from the current index
df1.reset_index(inplace=True)
#Set multi-index of location and type
df1.set_index(['Location', 'Type'], inplace=True)
#Concat both dataframes
aggregated_table_1 = pds.concat([ptable_1.reset_index(),df1.reset_index()], ignore_index=True)
#Sort values by location, so the appended rows are in the correct position
aggregated_table_1.sort_values('Location', inplace=True)
print(aggregated_table_1)
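For example, I'm trying to total the electricity usage across all machine types for a given factory, so the aggregate sits in the Type column with the type "all". Expected output for ptable_1: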
+---------------+-----------+---------+-------------------+---------------+
| | Location | Type | value | value |
+---------------+-----------+---------+-------------------+---------------+
| recorded_type | | | electricity_usage | running_hours |
| | Factory_1 | crusher | 15 | 6 |
| | Factory_1 | mixer | 11 | 5 |
| | Factory_1 | turner | 12 | 5 |
| | Factory_1 | all | 38 | 16 |
| | Factory_2 | crusher | 2 | 1 |
| | Factory_2 | mixer | 7 | 3 |
| | Factory_2 | turner | 13 | 6 |
| | Factory_2 | all | 22 | 10 |
+---------------+-----------+---------+-------------------+---------------+
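One possible way to get per-location subtotal rows like those in the table above is to walk over the `Location` groups of the pivot and append a summed row to each. This is only a minimal sketch, not necessarily the best approach: the compact `records` list and the names `pieces`, `block`, and `total` are assumptions made for illustration, and the data simply restates the question's dataset.

```python
import pandas as pds

# Same measurements as the question's dataset, written compactly:
# (Location, Type, electricity_usage, running_hours)
records = [
    ("Factory_1", "crusher", 15, 6),
    ("Factory_1", "mixer", 11, 5),
    ("Factory_1", "turner", 12, 5),
    ("Factory_2", "crusher", 2, 1),
    ("Factory_2", "mixer", 7, 3),
    ("Factory_2", "turner", 13, 6),
]
rows = []
for location, machine, electricity, hours in records:
    rows.append([location, machine, "electricity_usage", electricity])
    rows.append([location, machine, "running_hours", hours])
dataFrame = pds.DataFrame(rows, columns=["Location", "Type", "recorded_type", "value"])

ptable_1 = pds.pivot_table(data=dataFrame, index=["Location", "Type"],
                           columns=["recorded_type"], values=["value"])

pieces = []
for location, block in ptable_1.groupby(level="Location"):
    # Sum the rows of this location into a single-row frame.
    total = block.sum().to_frame().T
    # Give the subtotal row the same (Location, Type) index shape, with Type "all".
    total.index = pds.MultiIndex.from_tuples([(location, "all")],
                                             names=block.index.names)
    # Place the subtotal directly after the rows it summarises.
    pieces.append(pds.concat([block, total]))

aggregated_table_1 = pds.concat(pieces)
print(aggregated_table_1)
```

Building each block separately keeps the "all" row at the end of its factory without relying on a sort, since sorting the index alphabetically would put "all" before "crusher".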
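Secondly, I'm not sure how to aggregate across sub-columns so that ptable_2 gets a total over the columns of every type within each location. The aggregate is a new column with the type "all".
Expected output for ptable_2: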
+-------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
| Location | Factory_1 | Factory_1 | Factory_1 | Factory_1 | Factory_2 | Factory_2 | Factory_2 | Factory_2 |
+-------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
| Type | crusher | mixer | turner | all | crusher | mixer | turner | all |
| recorded_type | | | | | | | | |
| electricity_usage | 15 | 11 | 12 | 38 | 2 | 7 | 13 | 22 |
| running_hours | 6 | 5 | 5 | 16 | 1 | 3 | 6 | 10 |
+-------------------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
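The column-wise case can be sketched in the same spirit: select the columns of one location at a time, add a summed `all` column, and join the blocks side by side. Again, this is a hedged illustration under assumptions: the compact `records` list and the names `locations`, `pieces`, and `block` are made up for the example, and the top column level is assumed to be `"value"` because the pivot was built with `values=['value']`.

```python
import pandas as pds

# Same measurements as the question's dataset, written compactly:
# (Location, Type, electricity_usage, running_hours)
records = [
    ("Factory_1", "crusher", 15, 6),
    ("Factory_1", "mixer", 11, 5),
    ("Factory_1", "turner", 12, 5),
    ("Factory_2", "crusher", 2, 1),
    ("Factory_2", "mixer", 7, 3),
    ("Factory_2", "turner", 13, 6),
]
rows = []
for location, machine, electricity, hours in records:
    rows.append([location, machine, "electricity_usage", electricity])
    rows.append([location, machine, "running_hours", hours])
dataFrame = pds.DataFrame(rows, columns=["Location", "Type", "recorded_type", "value"])

ptable_2 = pds.pivot_table(data=dataFrame, index=["recorded_type"],
                           columns=["Location", "Type"], values=["value"])

locations = ptable_2.columns.get_level_values("Location")
pieces = []
for location in locations.unique():
    # Copy the columns belonging to this location so a column can be added safely.
    block = ptable_2.loc[:, locations == location].copy()
    # Sum across the machine types of this location, row by row.
    block[("value", location, "all")] = block.sum(axis=1)
    pieces.append(block)

# Join the per-location blocks side by side; each ends with its "all" column.
aggregated_table_2 = pds.concat(pieces, axis=1)
print(aggregated_table_2)
```

Since ptable_2 is the transpose of ptable_1, transposing a row-subtotalled version of ptable_1 would be another way to reach the same layout.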
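Edit 1: This is my output, straight from Python, after applying Serge de Gosson de Varennes's suggestion of melt() with default parameters. I lose the recorded type for each row; it is replaced by a NaN column. Should I try to aggregate from this to form my expected output?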
Df_ex1 = dfex1.melt() # Expected output 1
NaN recorded_type value
0 value electricity_usage 15
1 value electricity_usage 11
2 value electricity_usage 12
3 value electricity_usage 2
4 value electricity_usage 7
5 value electricity_usage 13
6 value running_hours 6
7 value running_hours 5
8 value running_hours 5
9 value running_hours 1
10 value running_hours 3
11 value running_hours 6
Df_exp2 = dfex2.melt() # Expected output 2
NaN Location Type value
0 value Factory_1 crusher 15
1 value Factory_1 crusher 6
2 value Factory_1 mixer 11
3 value Factory_1 mixer 5
4 value Factory_1 turner 12
5 value Factory_1 turner 5
6 value Factory_2 crusher 2
7 value Factory_2 crusher 1
8 value Factory_2 mixer 7
9 value Factory_2 mixer 3
10 value Factory_2 turner 13
11 value Factory_2 turner 6
【Comments】:
-
Could you share your expected output for both cases?
-
@MayankPorwal Hope this works for you. I tried to follow the pandas table style as printed to the console.
-
For your expected output 2, you just need to melt the dataframe. You're almost there:
df = pds.DataFrame(ptable_2) and df.melt. The same goes for expected output 1.
Tags: python pandas pivot-table aggregate