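Since fillna() cannot take the value and method parameters at the same time, I don't think there is a direct solution.
The solution described here has room for improvement.
Create the DataFrame: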
import pandas as pd
df = pd.DataFrame({"patient": ["A", "A", "A", "B", "B", "B", "B", "B"], "drug":["V", "W", "X", "V", "W", "X", "Y", "Z"], "start_day":[0, 4, 10, 0, 4, 4, 10, 11], "end_day":[3, None, 15, 3, None, None, 15, None]})
print(df)
patient drug start_day end_day
0 A V 0 3.0
1 A W 4 NaN
2 A X 10 15.0
3 B V 0 3.0
4 B W 4 NaN
5 B X 4 NaN
6 B Y 10 15.0
7 B Z 11 NaN
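Try fillna(method='bfill'):
The bfill method fills each NaN with the next non-NaN value below it in the same column. The trailing NaN (row 7) has nothing after it, so it stays NaN.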
df["end_day"].fillna(method='bfill')
0 3.0
1 15.0
2 15.0
3 3.0
4 15.0
5 15.0
6 15.0
7 NaN
Name: end_day, dtype: float64
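Side note: in recent pandas (2.1 and later) the method= keyword of fillna() is deprecated, and Series.bfill() is the documented replacement. A minimal sketch using only the end_day values from the DataFrame above:

```python
import pandas as pd

end_day = pd.Series([3, None, 15, 3, None, None, 15, None], name="end_day")

# Series.bfill() does the same as fillna(method="bfill"): each NaN takes the
# next non-NaN value below it; a trailing NaN has nothing below and stays NaN.
filled = end_day.bfill()
print(filled.tolist())
```

The result matches the fillna(method='bfill') output shown above.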
As described above, it only fills with values from the same column.
Try fillna() with the start_day column:
df["end_day"].fillna(df["start_day"])
0 3.0
1 4.0
2 15.0
3 3.0
4 4.0
5 4.0
6 15.0
7 11.0
Name: end_day, dtype: float64
As we can see, each NaN is filled with the start_day value at the same index.
Try combining the two:
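"Same index" here means the same index label, not the same position: when fillna() receives a Series, pandas aligns it on the index before filling. A small sketch with made-up values (not the DataFrame above), where the fill Series is stored in a different order:

```python
import pandas as pd

# Hypothetical values, chosen so that label order and position order differ.
end_day = pd.Series([None, 3.0, None], index=[0, 1, 2])
start_day = pd.Series([10, 4, 0], index=[2, 1, 0])  # label 0 -> 0, label 2 -> 10

# fillna(Series) matches by label: the NaN at label 0 gets start_day[0] == 0
# and the NaN at label 2 gets start_day[2] == 10.
filled = end_day.fillna(start_day)
print(filled.tolist())  # [0.0, 3.0, 10.0]
```

A purely positional fill would have produced [10, 3.0, 0] instead, which is why the index of the fill Series matters.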
df["end_day"].fillna(df["start_day"], method='bfill')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/XXXX/.pyenv/versions/3.8.1/lib/python3.8/site-packages/pandas/core/series.py", line 4517, in fillna
return super().fillna(
File "/Users/XXXX/.pyenv/versions/3.8.1/lib/python3.8/site-packages/pandas/core/generic.py", line 6012, in fillna
value, method = validate_fillna_kwargs(value, method)
File "/Users/XXXX/.pyenv/versions/3.8.1/lib/python3.8/site-packages/pandas/util/_validators.py", line 347, in validate_fillna_kwargs
raise ValueError("Cannot specify both 'value' and 'method'.")
ValueError: Cannot specify both 'value' and 'method'.
As expected, pandas rejects the combination.
So here is a rather "ugly" solution for this case:
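rows = len(df)  # number of rows; assumes the default RangeIndex 0..rows-1
df["new_end_day"] = df["end_day"].copy()
i = 0
while i < rows:
    if pd.isna(df.loc[i, "end_day"]):
        # find the next row j whose end_day is not NaN
        j = i + 1
        while j < rows and pd.isna(df.loc[j, "end_day"]):
            j += 1
        # fill rows i..j-1 with row j's start_day; if there is no such
        # row (j == rows), leave them NaN
        if j < rows:
            for n in range(i, j):
                df.loc[n, "new_end_day"] = df.loc[j, "start_day"]
    i += 1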
The result:
print(df)
patient drug start_day end_day new_end_day
0 A V 0 3.0 3.0
1 A W 4 NaN 10.0
2 A X 10 15.0 15.0
3 B V 0 3.0 3.0
4 B W 4 NaN 10.0
5 B X 4 NaN 10.0
6 B Y 10 15.0 15.0
7 B Z 11 NaN NaN
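Instead of creating the new_end_day column, you can run the same loop directly on the end_day column.
For a large DataFrame this may take a while, because the loop walks the rows one by one in Python.
Update (based on this): the following does the job: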
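A minimal sketch of the in-place variant, wrapped in a hypothetical helper named fill_from_next_start (not from the original answer). It assumes a default RangeIndex and also jumps past each NaN run once it is filled, instead of revisiting every row:

```python
import pandas as pd

def fill_from_next_start(df: pd.DataFrame) -> None:
    """Hypothetical helper: the loop above, writing into end_day directly.

    Assumes df has a default RangeIndex (0..n-1), so labels equal positions.
    """
    rows = len(df)
    i = 0
    while i < rows:
        if pd.isna(df.loc[i, "end_day"]):
            # find the first row after i whose end_day is set
            j = i + 1
            while j < rows and pd.isna(df.loc[j, "end_day"]):
                j += 1
            if j < rows:
                # every NaN in i..j-1 takes row j's start_day (loc slice is inclusive)
                df.loc[i:j - 1, "end_day"] = df.loc[j, "start_day"]
            i = j  # skip the run just handled
        else:
            i += 1

df = pd.DataFrame({
    "patient": ["A", "A", "A", "B", "B", "B", "B", "B"],
    "drug": ["V", "W", "X", "V", "W", "X", "Y", "Z"],
    "start_day": [0, 4, 10, 0, 4, 4, 10, 11],
    "end_day": [3, None, 15, 3, None, None, 15, None],
})
fill_from_next_start(df)
print(df["end_day"].tolist())
```

This should give the same values as the new_end_day column above, but it is still a Python-level loop, so the speed caveat applies.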
df['end_day_1'] = df['end_day'].fillna(df['start_day'].mask(df['end_day'].isna()).bfill())
The output will be:
print(df)
patient drug start_day end_day end_day_1
0 A V 0 3.0 3.0
1 A W 4 NaN 10.0
2 A X 10 15.0 15.0
3 B V 0 3.0 3.0
4 B W 4 NaN 10.0
5 B X 4 NaN 10.0
6 B Y 10 15.0 15.0
7 B Z 11 NaN NaN
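The one-liner packs three steps together. Splitting it up (only the two relevant columns are kept here) may make it easier to see why it reproduces the loop:

```python
import pandas as pd

df = pd.DataFrame({
    "start_day": [0, 4, 10, 0, 4, 4, 10, 11],
    "end_day": [3, None, 15, 3, None, None, 15, None],
})

missing = df["end_day"].isna()
# Step 1: blank out start_day on the rows that need filling, keeping it only
# on rows that already have an end_day.
candidates = df["start_day"].mask(missing)
# Step 2: back-fill, so each blanked row takes the start_day of the next row
# that has an end_day.
next_start = candidates.bfill()
# Step 3: use those values only where end_day is NaN (fillna aligns on index).
df["end_day_1"] = df["end_day"].fillna(next_start)
print(df)
```

Step 2 is the bfill from the first attempt and step 3 is the fillna(Series) from the second, so the mask is what lets the two be chained.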
As I wrote at the beginning, there is still room for improvement.
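One possible direction, assuming fills should not cross from one patient to the next (the question does not say so explicitly): both the loop and the one-liner back-fill across patient boundaries. Doing the back-fill per patient with groupby avoids that. A sketch on hypothetical data where patient A's last drug has no end_day:

```python
import pandas as pd

# Hypothetical data: patient A's last drug has no end_day.
df = pd.DataFrame({
    "patient": ["A", "A", "B", "B", "B"],
    "start_day": [0, 4, 0, 4, 10],
    "end_day": [3, None, 3, None, 15],
})

candidates = df["start_day"].mask(df["end_day"].isna())

# Ungrouped: row 1 (patient A) borrows start_day 0 from patient B's first row.
ungrouped = df["end_day"].fillna(candidates.bfill())

# Grouped: back-fill only within each patient, so row 1 stays NaN.
grouped = df["end_day"].fillna(candidates.groupby(df["patient"]).bfill())
print(ungrouped.tolist(), grouped.tolist())
```

On the original DataFrame both versions give the same end_day_1, since no patient ends with a missing end_day there.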