Edit:
The updated answer was missing the BW values in the final df; the version below puts them back.
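import pandas as pd
import numpy as np
BW = 999    # placeholder value standing in for the BW marker
txt = -999  # placeholder value standing in for the txt marker
A = [1,10,23,45,24,24,55,67,73,26,13,96,53,23,24,43,90]
B = [24,23,29, BW,49,59,72, BW,9,183,17, txt,2,49,BW,479,BW]
df = pd.DataFrame({'A': A, 'B': B})
# Number each run of non-BW rows; the BW rows themselves get NaN.
df = df.assign(group = (df[~df['B'].between(BW,BW)].index.to_series().diff() > 1).cumsum())
# Blank out C for the run that contains txt; everywhere else copy A.
df['C'] = np.where(df.group == df[df.B == txt].group.values[0], np.nan, df.A)
# Put BW back into C on the BW rows.
df['C'] = np.where(df['B'] == BW, df['B'], df['C'])
# Nullable integer dtype, so the missing values display as <NA>.
df['C'] = df['C'].astype('Int64')
df = df.drop('group', axis=1)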
In [435]: df
Out[435]:
A B C
0 1 24 1
1 10 23 10
2 23 29 23
3 45 999 999 <-- BW
4 24 49 24
5 24 59 24
6 55 72 55
7 67 999 999 <-- BW
8 73 9 <NA>
9 26 183 <NA>
10 13 17 <NA>
11 96 -999 <NA> <-- txt is in the middle of BW
12 53 2 <NA>
13 23 49 <NA>
14 24 999 999 <-- BW
15 43 479 43
16 90 999 999 <-- BW
You can do it like this, assuming BW and txt are specific values. I have just filled them with arbitrary numbers so they can be told apart.
In [277]: BW = 999
In [278]: txt = -999
In [293]: A = [1,10,23,45,24,24,55,67,73,26,13,96,53,23,24,43,90]
...: B = [24,23,29, BW,49,59,72, BW,9,183,17, txt,2,49,BW,479,BW]
In [300]: df = pd.DataFrame({'A': A, 'B': B})
In [301]: df
Out[301]:
A B
0 1 24
1 10 23
2 23 29
3 45 999
4 24 49
5 24 59
6 55 72
7 67 999
8 73 9
9 26 183
10 13 17
11 96 -999
12 53 2
13 23 49
14 24 999
15 43 479
16 90 999
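First, let's split the rows into groups. Each group gets its own number and holds the values of B that lie between one BW and the next BW; the BW rows themselves get NaN.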
In [321]: df = df.assign(group = (df[~df['B'].between(BW,BW)].index.to_series().diff() > 1).cumsum())
In [322]: df
Out[322]:
A B group
0 1 24 0.00000000
1 10 23 0.00000000
2 23 29 0.00000000
3 45 999 NaN
4 24 49 1.00000000
5 24 59 1.00000000
6 55 72 1.00000000
7 67 999 NaN
8 73 9 2.00000000
9 26 183 2.00000000
10 13 17 2.00000000
11 96 -999 2.00000000
12 53 2 2.00000000
13 23 49 2.00000000
14 24 999 NaN
15 43 479 3.00000000
16 90 999 NaN
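The one-liner above packs several steps together, so here is a rough breakdown of what it does. This is only a sketch: the intermediate names (`kept`, `starts_new_group`) are mine, `df['B'] != BW` stands in for `~df['B'].between(BW,BW)`, and it assumes the default 0..n-1 index.

```python
import pandas as pd

BW = 999
B = [24, 23, 29, BW, 49, 59, 72, BW, 9, 183, 17, -999, 2, 49, BW, 479, BW]
df = pd.DataFrame({'B': B})

# Index labels of the rows that are not BW separators.
kept = df.index[df['B'] != BW].to_series()

# A jump larger than 1 between consecutive kept labels means at least one
# BW row was skipped, so a new group starts at that label.
starts_new_group = kept.diff() > 1

# The running count of group starts is the group number. BW rows are not in
# `kept`, so they become NaN when the result is aligned back onto df.
df['group'] = starts_new_group.cumsum()
print(df)
```

Because of the NaN on the BW rows, the group column comes out as float, which is why it prints as 0.0, 1.0 and so on.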
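Next, with np.where() we can replace the values according to the condition you set: C becomes NaN for every row in the group that contains txt, and takes the value of B everywhere else.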
In [360]: df['C'] = np.where(df.group == df[df.B == txt].group.values[0], np.nan, df.B)
In [432]: df
Out[432]:
A B group C
0 1 24 0.00000000 24.00000000
1 10 23 0.00000000 23.00000000
2 23 29 0.00000000 29.00000000
3 45 999 NaN 999.00000000
4 24 49 1.00000000 49.00000000
5 24 59 1.00000000 59.00000000
6 55 72 1.00000000 72.00000000
7 67 999 NaN 999.00000000
8 73 9 2.00000000 NaN
9 26 183 2.00000000 NaN
10 13 17 2.00000000 NaN
11 96 -999 2.00000000 NaN
12 53 2 2.00000000 NaN
13 23 49 2.00000000 NaN
14 24 999 NaN 999.00000000
15 43 479 3.00000000 479.00000000
16 90 999 NaN 999.00000000
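Note that `df[df.B == txt].group.values[0]` only looks at the first txt row and raises an IndexError when txt does not occur in B at all. If that can happen in your data, a variant along these lines might be safer. It is a sketch, not part of the answer above; `txt_groups` is a name I made up.

```python
import numpy as np
import pandas as pd

BW, txt = 999, -999
B = [24, 23, 29, BW, 49, 59, 72, BW, 9, 183, 17, txt, 2, 49, BW, 479, BW]
df = pd.DataFrame({'B': B})
df['group'] = (df.index[df['B'] != BW].to_series().diff() > 1).cumsum()

# Every group that contains at least one txt row (may be empty).
txt_groups = df.loc[df['B'] == txt, 'group'].dropna().unique()

# isin() on an empty array is False everywhere, so nothing is blanked
# when txt is absent; BW rows have group NaN and are never matched.
df['C'] = np.where(df['group'].isin(txt_groups), np.nan, df['B'])
print(df)
```

For the sample data this should blank the same rows 8 to 13 as the np.where() call above.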
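Here we set C back to the value of B wherever B equals BW. (With C copied from B as above, those rows already hold BW, so the output below is unchanged. The step matters when C is filled from a different column, as in the edit at the top, which uses A.)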
In [488]: df['C'] = np.where(df['B'] == BW, df['B'], df['C'])
In [489]: df
Out[489]:
A B group C
0 1 24 0.00000000 24.00000000
1 10 23 0.00000000 23.00000000
2 23 29 0.00000000 29.00000000
3 45 999 NaN 999.00000000
4 24 49 1.00000000 49.00000000
5 24 59 1.00000000 59.00000000
6 55 72 1.00000000 72.00000000
7 67 999 NaN 999.00000000
8 73 9 2.00000000 NaN
9 26 183 2.00000000 NaN
10 13 17 2.00000000 NaN
11 96 -999 2.00000000 NaN
12 53 2 2.00000000 NaN
13 23 49 2.00000000 NaN
14 24 999 NaN 999.00000000
15 43 479 3.00000000 479.00000000
16 90 999 NaN 999.00000000
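Finally, just convert the float column to int and drop the group column, which we no longer need. If you would rather keep the missing values as np.nan, skip the conversion to Int64.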
In [396]: df.C = df.C.astype('Int64')
In [397]: df
Out[397]:
A B group C
0 1 24 0.00000000 24
1 10 23 0.00000000 23
2 23 29 0.00000000 29
3 45 999 NaN 999
4 24 49 1.00000000 49
5 24 59 1.00000000 59
6 55 72 1.00000000 72
7 67 999 NaN 999
8 73 9 2.00000000 <NA>
9 26 183 2.00000000 <NA>
10 13 17 2.00000000 <NA>
11 96 -999 2.00000000 <NA>
12 53 2 2.00000000 <NA>
13 23 49 2.00000000 <NA>
14 24 999 NaN 999
15 43 479 3.00000000 479
16 90 999 NaN 999
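In case the capital I looks like a typo: 'Int64' is pandas' nullable integer dtype, while lowercase 'int64' is the plain NumPy one and cannot hold missing values. A small illustration on a made-up three-element Series:

```python
import numpy as np
import pandas as pd

c = pd.Series([24.0, np.nan, 999.0])

# Capital I: nullable integers, the missing value is kept as <NA>.
nullable = c.astype('Int64')
print(nullable.dtype)
print(nullable.isna().tolist())

# Lowercase i: plain NumPy integers, which have no way to store a NaN.
try:
    c.astype('int64')
    cast_failed = False
except ValueError as exc:
    cast_failed = True
    print('cannot cast:', exc)
```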
In [398]: df = df.drop('group', axis=1)
In [435]: df
Out[435]:
A B C
0 1 24 24
1 10 23 23
2 23 29 29
3 45 999 999
4 24 49 49
5 24 59 59
6 55 72 72
7 67 999 999
8 73 9 <NA>
9 26 183 <NA>
10 13 17 <NA>
11 96 -999 <NA>
12 53 2 <NA>
13 23 49 <NA>
14 24 999 999
15 43 479 479
16 90 999 999
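If you need this more than once, the steps could be wrapped in a function. The sketch below is my own packaging, not the code from the answer: `mask_txt_group` and its `source` parameter are hypothetical names, and it numbers the blocks with a cumulative count of BW rows instead of the index diff used above. `source` chooses which column C is copied from (the walkthrough uses B, the edit at the top uses A). It assumes the source column holds whole numbers, otherwise the Int64 cast will fail.

```python
import numpy as np
import pandas as pd


def mask_txt_group(df, bw, txt, source='B'):
    """Return a copy of df with column C: NaN in the block holding txt."""
    is_bw = df['B'] == bw
    # Each BW row bumps the counter, so rows between two BW rows share a number.
    block = is_bw.cumsum()
    # Block numbers that contain txt among their non-BW rows.
    hit = block[(df['B'] == txt) & ~is_bw].unique()
    out = df.copy()
    out['C'] = df[source].astype('float64')
    out.loc[block.isin(hit) & ~is_bw, 'C'] = np.nan  # blank the txt block
    out.loc[is_bw, 'C'] = bw                         # BW rows keep BW
    out['C'] = out['C'].astype('Int64')
    return out


BW, txt = 999, -999
A = [1, 10, 23, 45, 24, 24, 55, 67, 73, 26, 13, 96, 53, 23, 24, 43, 90]
B = [24, 23, 29, BW, 49, 59, 72, BW, 9, 183, 17, txt, 2, 49, BW, 479, BW]
df = pd.DataFrame({'A': A, 'B': B})

from_b = mask_txt_group(df, BW, txt)              # C copied from B
from_a = mask_txt_group(df, BW, txt, source='A')  # C copied from A
print(from_b)
print(from_a)
```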