How to subtract a list of values from a column for every group of rows with pandas: answers

【Question title】: How to subtract a list of values from a column for every group of rows with pandas
【Posted】: 2021-09-10 05:53:31
【Question】:

I have a list of values:

T=['23','22','13','25','33','21','20']

and a file with one column and fourteen rows:

Code:

df - pd.Series(T)
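Why that attempt does not do it on its own: arithmetic between two pandas objects aligns on index labels, not on position, so a 7-element Series only matches rows 0 to 6 and the remaining rows become NaN. The strings in T also have to be converted to numbers first. A minimal sketch, using a hypothetical 14-row column built from the `col` values in the expected output:

```python
import pandas as pd

T = ['23', '22', '13', '25', '33', '21', '20']
# Hypothetical sample data: the 'col' values shown in the expected output
df = pd.DataFrame({'col': [24, 25, 15, 27, 35, 39, 40, 33, 44, 45, 27, 39, 35, 39]})

# T holds strings, so convert it to integers before subtracting
t = pd.Series(T).astype(int)

# Index alignment: labels 0..6 have a partner in t, labels 7..13 do not -> NaN
naive = df['col'] - t
print(naive)
```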

I want to subtract the list T (which has 7 values) from the file's column every 7 rows, but ignore the values -9999 and 0. How can I do this in Python with pandas?

df.mask(df.isin([-9999, 0]))
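`mask` replaces the values where the condition is True with NaN by default, and NaN propagates through subtraction, so masked rows stay NaN in the result. A minimal sketch on a single block of 7 rows, assuming hypothetical data where rows 2 and 3 hold -9999 and 0:

```python
import numpy as np
import pandas as pd

# Hypothetical block of 7 rows containing the values to ignore
col = pd.Series([24, 25, -9999, 0, 35, 39, 40])
t = np.array([23, 22, 13, 25, 33, 21, 20])

# Values in the ignore list become NaN; everything else is kept
masked = col.mask(col.isin([-9999, 0]))

# NaN minus a number is NaN, so ignored rows remain NaN
new = masked - t
print(new)
```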

The expected output is:

   col  new
0    24    1
1    25    3
2    15    0
3    27    0
4    35    2
5    39   18
6    40   20
7    33   10
8    44   22
9    45   32
10   27    2
11   39    6
12   35   14
13   39   19

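【Question discussion】:

Tags: python pandas subtraction

【Solution 1】:

The simplest and fastest solution is to repeat the list with numpy.tile, cut the result to the length of df, convert it to integers and subtract: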

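import numpy as np
import pandas as pd

T = ('23', '22', '13', '25', '33', '21', '20')
# sample data: the 'col' values from the output below
df = pd.DataFrame({'col': [24, 25, 15, 27, 35, 39, 40, 33, 44, 45, 27, 39, 35, 39]})

# if there are always exactly 14 rows (2 * len(T))
#df['new'] = df['col'].sub(np.tile(T, 2).astype(int))

# any number of rows: repeat T enough times, then cut it to len(df)
df['new'] = df['col'].sub(np.tile(T, len(df) // len(T) + 2)[:len(df)].astype(int))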
print (df)
    col  new
0    24    1
1    25    3
2    15    2
3    27    2
4    35    2
5    39   18
6    40   20
7    33   10
8    44   22
9    45   32
10   27    2
11   39    6
12   35   14
13   39   19
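`np.resize` is another way to build the repeated values: when the requested size is larger than the input, it fills the new array by repeating the input cyclically, so no repeat count has to be computed. This swaps in `np.resize` for `np.tile`; a minimal sketch on the same hypothetical 14-row sample:

```python
import numpy as np
import pandas as pd

T = ('23', '22', '13', '25', '33', '21', '20')
df = pd.DataFrame({'col': [24, 25, 15, 27, 35, 39, 40, 33, 44, 45, 27, 39, 35, 39]})

# Convert T to integers, then repeat it cyclically up to len(df) elements
repeated = np.resize(np.array(T).astype(int), len(df))
df['new'] = df['col'].sub(repeated)
print(df)
```

`np.resize` also handles row counts that are not a multiple of `len(T)`; the last cycle is simply cut short.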

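Alternatively, take the index modulo len(T) so that each row label becomes its position in T, then subtract a Series built from T; pandas aligns the two by these index values: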

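T = ('23', '22', '13', '25', '33', '21', '20')

# map each row label to its position in T (assumes a default RangeIndex)
df.index = df.index % len(T)
# .loc repeats the T values in the same order as df.index, so both indexes are equal
df['new'] = df['col'].sub(pd.Series(T).astype(int).loc[df.index])
# restore the original 0..n-1 index
df = df.reset_index(drop=True)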
print (df)
    col  new
0    24    1
1    25    3
2    15    2
3    27    2
4    35    2
5    39   18
6    40   20
7    33   10
8    44   22
9    45   32
10   27    2
11   39    6
12   35   14
13   39   19
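The same modulo idea also works on row positions instead of index labels, which leaves `df.index` untouched and does not require a default `RangeIndex`. A minimal sketch, assuming hypothetical data with a date index:

```python
import numpy as np
import pandas as pd

T = ('23', '22', '13', '25', '33', '21', '20')
t = np.array(T).astype(int)

# Hypothetical data with a non-default index
df = pd.DataFrame(
    {'col': [24, 25, 15, 27, 35, 39, 40, 33, 44, 45, 27, 39, 35, 39]},
    index=pd.date_range('2021-01-01', periods=14),
)

# Row position modulo len(t) gives each row's position in T
pos = np.arange(len(df)) % len(t)
# Subtract as plain arrays, so no index alignment takes place
df['new'] = df['col'].to_numpy() - t[pos]
print(df)
```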

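Or subtract group by group, which is the slowest solution:

# subtract T from each block of len(T) rows (every block must have exactly len(T) rows)
f = lambda x: x.sub(np.array(T).astype(int))
df['new'] = df.groupby(df.index // len(T))['col'].transform(f)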

print (df)
    col  new
0    24    1
1    25    3
2    15    2
3    27    2
4    35    2
5    39   18
6    40   20
7    33   10
8    44   22
9    45   32
10   27    2
11   39    6
12   35   14
13   39   19
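The groupby version subtracts the whole array from every group, so it raises an error when the number of rows is not a multiple of `len(T)` (the last group is shorter). A minimal sketch that cuts the array to each group's length, on a hypothetical 10-row column:

```python
import numpy as np
import pandas as pd

T = ('23', '22', '13', '25', '33', '21', '20')
t = np.array(T).astype(int)

# Hypothetical 10 rows: one full group of 7 and a partial group of 3
df = pd.DataFrame({'col': [24, 25, 15, 27, 35, 39, 40, 33, 44, 45]})

# Use only as many values of T as the group has rows
f = lambda x: x - t[:len(x)]
df['new'] = df.groupby(df.index // len(t))['col'].transform(f)
print(df)
```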

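Edit: to ignore the values, after any of my solutions set new to 0 where col is -9999 or 0, using loc or mask (in this sample, rows 2 and 3 of col hold -9999 and 0):

# rows where col is -9999 or 0 get new = 0
df.loc[df['col'].isin([-9999, 0]), 'new'] = 0
# alternative with mask
#df['new'] = df['new'].mask(df['col'].isin([-9999, 0]), 0)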

print (df)
     col  new
0     24    1
1     25    3
2  -9999    0
3      0    0
4     35    2
5     39   18
6     40   20
7     33   10
8     44   22
9     45   32
10    27    2
11    39    6
12    35   14
13    39   19
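If the ignored rows should be NaN (as `df.mask` in the question produces) rather than 0, the repetition and the masking can be combined into one step. A minimal sketch; the function name `subtract_cyclic` and its `ignore` and `fill` parameters are hypothetical, not part of pandas:

```python
import numpy as np
import pandas as pd

def subtract_cyclic(col, values, ignore=(-9999, 0), fill=np.nan):
    """Hypothetical helper: subtract `values` cyclically from `col`,
    writing `fill` wherever `col` holds one of the `ignore` values."""
    t = np.resize(np.asarray(values).astype(int), len(col))
    result = col - t
    # mask replaces entries where the condition is True with `fill`
    return result.mask(col.isin(ignore), other=fill)

T = ('23', '22', '13', '25', '33', '21', '20')
df = pd.DataFrame({'col': [24, 25, -9999, 0, 35, 39, 40, 33, 44, 45, 27, 39, 35, 39]})
df['new'] = subtract_cyclic(df['col'], T)            # ignored rows -> NaN
df['new0'] = subtract_cyclic(df['col'], T, fill=0)   # ignored rows -> 0
print(df)
```

With `fill=0` the result matches the edited answer's output; with the default NaN the column becomes float.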

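【Discussion】:

Thanks a lot, it works. What if I want to ignore specific values? How can I add a condition such as df.mask(df.isin([-9999, 0]))?
@Nat - For better understanding, could you change the data sample to include -9999 and 0 and add the expected output after subtraction?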