Randomly select a row from each group using pandas - Answers

[Question title]: Randomly select a row from each group using pandas
[Posted]: 2019-06-15 13:20:13
[Question description]:

I have a pandas DataFrame df that looks like this:

Month   Day mnthShape
1      1    1.016754224
1      1    1.099451003
1      1    0.963911929
1      2    1.016754224
1      1    1.099451003
1      2    0.963911929
1      3    1.016754224
1      3    1.099451003
1      3    1.783775568

I want to get the following from df:

Month   Day mnthShape
1       1   1.016754224
1       2   1.016754224
1       3   1.099451003

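The mnthShape value should be picked at random from the rows that share the same index. That is, for a query like df.loc[(1, 1)], it should look up all values for (1, 1) and randomly choose one of them to display, as shown above.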
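For a single key, the lookup described above can be sketched by filtering the rows for that (Month, Day) pair and sampling one of them. This is a minimal sketch that rebuilds df from the sample data in the question; the helper name `pick_random` and its `seed` parameter are hypothetical, and a boolean mask is used instead of a MultiIndex so the example does not depend on how the index is set up.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Month': [1] * 9,
    'Day': [1, 1, 1, 2, 1, 2, 3, 3, 3],
    'mnthShape': [1.016754224, 1.099451003, 0.963911929, 1.016754224,
                  1.099451003, 0.963911929, 1.016754224, 1.099451003,
                  1.783775568],
})


def pick_random(frame, month, day, seed=None):
    """Hypothetical helper: return one random mnthShape value for (month, day)."""
    # Boolean mask selects every row that shares the (Month, Day) key
    mask = (frame['Month'] == month) & (frame['Day'] == day)
    candidates = frame.loc[mask, 'mnthShape']
    # Series.sample draws one row; an integer random_state makes the draw reproducible
    return candidates.sample(n=1, random_state=seed).iloc[0]


print(pick_random(df, 1, 1, seed=0))
```

Passing `seed=None` falls back to NumPy's global random state, so repeated calls return different values from the same group.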

[Question comments]:

Tags: python pandas dataframe random

[Solution 1]:

Use groupby with apply to randomly pick one row from each group.

import numpy as np

np.random.seed(0)  # seed NumPy's global RNG so the random pick is reproducible
df.groupby(['Month', 'Day'])['mnthShape'].apply(np.random.choice).reset_index()

   Month  Day  mnthShape
0      1    1   1.016754
1      1    2   0.963912
2      1    3   1.099451
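The same per-group pick can also be sketched without touching NumPy's global seed by drawing from a local `numpy.random.Generator`. This is an alternative to the `np.random.seed` call above, not part of the original answer; `rng` is an illustrative name, and `agg` is used because the function returns exactly one scalar per group.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Month': [1] * 9,
    'Day': [1, 1, 1, 2, 1, 2, 3, 3, 3],
    'mnthShape': [1.016754224, 1.099451003, 0.963911929, 1.016754224,
                  1.099451003, 0.963911929, 1.016754224, 1.099451003,
                  1.783775568],
})

# A local Generator keeps this sampling independent of np.random.seed
rng = np.random.default_rng(0)

# agg calls the lambda once per (Month, Day) group; each call returns one value
res = (df.groupby(['Month', 'Day'])['mnthShape']
         .agg(lambda s: rng.choice(s.to_numpy()))
         .reset_index())
print(res)
```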

If you also want to know which index each sampled row came from, use pd.Series.sample with n=1:

import numpy as np
import pandas as pd

np.random.seed(0)
(df.groupby(['Month', 'Day'])['mnthShape']
   .apply(pd.Series.sample, n=1)  # n=1 is forwarded to Series.sample for each group
   .reset_index(level=[0, 1]))

   Month  Day  mnthShape
2      1    1   0.963912
3      1    2   1.016754
6      1    3   1.016754
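Newer pandas versions (1.1 and later) also provide `DataFrameGroupBy.sample`, which draws rows per group directly and keeps the original row labels, so the source index stays visible without `apply` or `reset_index`. A minimal sketch on the question's data; the seed value is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    'Month': [1] * 9,
    'Day': [1, 1, 1, 2, 1, 2, 3, 3, 3],
    'mnthShape': [1.016754224, 1.099451003, 0.963911929, 1.016754224,
                  1.099451003, 0.963911929, 1.016754224, 1.099451003,
                  1.783775568],
})

# Draw one row from each (Month, Day) group; the index shows the source row
res = df.groupby(['Month', 'Day']).sample(n=1, random_state=0)
print(res)
```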


[Solution 2]:

One approach is to use Series.sample() to draw a random row from each group:

import numpy as np

np.random.seed(1)  # pd.np is deprecated (removed in pandas 2.0); call NumPy directly

res = df.groupby(['Month', 'Day'])['mnthShape'].apply(lambda x: x.sample()).reset_index(level=[0, 1])

res
   Month  Day  mnthShape
0      1    1   1.099451
1      1    2   1.016754
2      1    3   1.016754
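A different technique that avoids groupby entirely is to shuffle the whole frame and keep the first row seen for each key; after a uniform shuffle, that first row is a uniformly random member of its group. This is a swapped-in alternative sketch rather than part of either answer, and the seed value is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    'Month': [1] * 9,
    'Day': [1, 1, 1, 2, 1, 2, 3, 3, 3],
    'mnthShape': [1.016754224, 1.099451003, 0.963911929, 1.016754224,
                  1.099451003, 0.963911929, 1.016754224, 1.099451003,
                  1.783775568],
})

# Shuffle all rows, keep the first occurrence of each (Month, Day) key,
# then sort so the groups appear in key order
res = (df.sample(frac=1, random_state=1)
         .drop_duplicates(subset=['Month', 'Day'])
         .sort_values(['Month', 'Day']))
print(res)
```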
