【Posted】: 2020-08-28 09:06:09
【Question】:
I have the following pandas DataFrame:
import pandas as pd

df = pd.DataFrame({
    "category": ["one", "one", "one", "one", "two", "two", "two", "three", "three", "three"],
    "value": [2, 4, 3, 2, 5, 6, 5, 7, 8, 6]
})
>>> df
  category  value
0      one      2
1      one      4
2      one      3
3      one      2
4      two      5
5      two      6
6      two      5
7    three      7
8    three      8
9    three      6
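I want to compute a new column called normalized by taking the per-group median (or any other groupby operation) and subtracting it from the corresponding value in the ungrouped DataFrame (or applying any other simple operation). In non-pandas code, this is what I mean: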
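new_column = []
# Groupby equivalent: handle one category at a time
for cat in df["category"].unique():
    curr_df = df[df["category"] == cat]
    # Take the median of the "value" column only, so the result is a scalar
    curr_median = curr_df["value"].median()
    # Calculation on groupby components
    for val in curr_df["value"]:
        normalized = val - curr_median
        new_column.append(normalized)
# Rows must already be contiguous per category for this assignment to line up
df["normalized"] = new_column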
This results in the following DataFrame:
df = pd.DataFrame({
    "category": ["one", "one", "one", "one", "two", "two", "two", "three", "three", "three"],
    "value": [2, 4, 3, 2, 5, 6, 5, 7, 8, 6],
    "normalized": [-0.5, 1.5, 0.5, -0.5, 0.0, 1.0, 0.0, 0.0, 1.0, -1.0]
})
>>> df
  category  value  normalized
0      one      2        -0.5
1      one      4         1.5
2      one      3         0.5
3      one      2        -0.5
4      two      5         0.0
5      two      6         1.0
6      two      5         0.0
7    three      7         0.0
8    three      8         1.0
9    three      6        -1.0
How can I write this in a more idiomatic pandas way? Thanks in advance :)
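For reference, one possible direction is sketched below. It assumes that `groupby(...).transform("median")` is an acceptable stand-in for the loop: `transform` returns one value per original row (that row's group median), aligned to the DataFrame's index, so the subtraction can be done on whole columns. The name `group_median` is introduced only for illustration and is not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["one", "one", "one", "one", "two", "two", "two", "three", "three", "three"],
    "value": [2, 4, 3, 2, 5, 6, 5, 7, 8, 6],
})

# Per-row median of the row's own category, aligned to df's index
# (group_median is a hypothetical helper name).
group_median = df.groupby("category")["value"].transform("median")

# Column-wise subtraction replaces the inner loop over values.
df["normalized"] = df["value"] - group_median
```

Because the result is aligned by index rather than by position, this sketch should not depend on the rows being sorted by category. Other aggregations could presumably be swapped in by changing the string passed to `transform`.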
【Comments】: