【Question title】: Make new column based on groupby calculation
【Posted】: 2020-08-28 09:06:09
【Question】:

I have the following pandas DataFrame:

import pandas as pd

df = pd.DataFrame({
    "category": ["one", "one", "one", "one", "two", "two", "two", "three", "three", "three"], 
    "value": [2, 4, 3, 2, 5, 6, 5, 7, 8, 6]
})

>>> df
  category  value
0      one      2
1      one      4
2      one      3
3      one      2
4      two      5
5      two      6
6      two      5
7    three      7
8    three      8
9    three      6

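I want to compute a new column called normalized by calculating the median of each category (or any other groupby operation) and subtracting it (or applying any other simple operation) from the corresponding values in the ungrouped DataFrame. In non-pandas code, this is what I mean: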

new_column = []

# Groupby equivalent
for cat in df["category"].unique():
    curr_df = df[df["category"] == cat]
    # Take the median of the numeric "value" column only; "category" holds strings
    curr_median = curr_df["value"].median()
    
    # Calculation on groupby components
    for val in curr_df["value"]:
        normalized = val - curr_median
        new_column.append(normalized)

df["normalized"] = new_column

This results in the following DataFrame:

df = pd.DataFrame({
    "category": ["one", "one", "one", "one", "two", "two", "two", "three", "three", "three"], 
    "value": [2, 4, 3, 2, 5, 6, 5, 7, 8, 6],
    "normalized": [-0.5, 1.5, 0.5, -0.5, 0.0, 1.0, 0.0, 0.0, 1.0, -1.0]
})

>>> df
  category  value  normalized
0      one      2        -0.5
1      one      4         1.5
2      one      3         0.5
3      one      2        -0.5
4      two      5         0.0
5      two      6         1.0
6      two      5         0.0
7    three      7         0.0
8    three      8         1.0
9    three      6        -1.0

How can I write this in a more idiomatic pandas way? Thanks in advance :)

Tags: python pandas dataframe


【Solution 1】:

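transform is your friend. I think of it as the version of apply to reach for when I want to keep the original DataFrame's shape: it computes the result per group and returns it with one entry per original row, aligned on the original index. You can use this: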

df["normalized"] = df.value - df.groupby("category").value.transform("median")

Output:

  category  value  normalized
0      one      2        -0.5
1      one      4         1.5
2      one      3         0.5
3      one      2        -0.5
4      two      5         0.0
5      two      6         1.0
6      two      5         0.0
7    three      7         0.0
8    three      8         1.0
9    three      6        -1.0
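
To make the "keeps the original shape" point concrete, here is a minimal sketch that contrasts agg with transform on the same data. It also shows the "any other groupby operation" case from the question, using the mean and a callable. The variable names (per_group, per_row, mean_centered, range_scaled, shuffled) are only illustrative and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "category": ["one", "one", "one", "one", "two", "two", "two", "three", "three", "three"],
    "value": [2, 4, 3, 2, 5, 6, 5, 7, 8, 6],
})

grouped = df.groupby("category")["value"]

# agg collapses each group to a single value: one row per category
per_group = grouped.agg("median")

# transform broadcasts the group result back: one row per original row,
# carrying the same index as df
per_row = grouped.transform("median")

# Other aggregations work the same way, by name or as a callable
mean_centered = df["value"] - grouped.transform("mean")
range_scaled = grouped.transform(lambda s: (s - s.min()) / (s.max() - s.min()))

df["normalized"] = df["value"] - per_row

# The result is aligned on the index, so the rows do not have to be
# sorted or contiguous by category
shuffled = df[["category", "value"]].sample(frac=1, random_state=0)
shuffled["normalized"] = (
    shuffled["value"] - shuffled.groupby("category")["value"].transform("median")
)

print(len(per_group), len(per_row))
print(df)
```

Note that the loop in the question only lines up with the DataFrame because the rows of each category happen to be contiguous and in the same order as unique() returns them. The transform version does not rely on that.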
