Using Boolean Logic to clean DF in pandas

【Question title】: Using Boolean Logic to clean DF in pandas
【Posted】: 2018-06-21 10:06:57
【Question description】:

shape   square
shape   circle
animal   NaN
NaN dog
NaN cat
NaN fish
color   red
color   blue

desired_df

shape   square
shape   circle
animal  dog
animal  cat
animal  fish
color   red
color   blue

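I have a df containing information that needs to be normalized.

I have noticed a pattern that indicates how the columns should be joined to normalize the data.

If in one row Col1 != NaN and Col2 == NaN, and in the row directly below Col1 == NaN and Col2 != NaN, then the values in Col1 and Col2 should be joined. This continues until a row is reached where Col1 != NaN and Col2 != NaN.

Is there a way to solve this in pandas?

The first step I am considering is to create an additional column of True/False values that marks which rows need to be joined. Once that is done, however, I am not sure how to assign the value in Col1 to all of the related values in Col2.

Any suggestions on how to achieve the desired result?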
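The rule described above can be sketched with a boolean mask. This is only one possible reading of the pattern, not the accepted answer: the column names `Col1` and `Col2` come from the question's wording, while the `DataFrame` construction and the names `is_header` and `result` are assumptions made for illustration. It also assumes that every NaN in `Col1` belongs to the nearest label above it.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's data.
df = pd.DataFrame({
    "Col1": ["shape", "shape", "animal", np.nan, np.nan, np.nan, "color", "color"],
    "Col2": ["square", "circle", np.nan, "dog", "cat", "fish", "red", "blue"],
})

# True where a row carries only a label: Col1 is set and Col2 is missing.
is_header = df["Col1"].notna() & df["Col2"].isna()

# Propagate each label down into the rows below it, then drop the label-only rows.
result = (
    df.assign(Col1=df["Col1"].ffill())
      .loc[~is_header]
      .reset_index(drop=True)
)
print(result)
```

Because the label-only rows are removed through the mask, this variant does not depend on removing duplicated rows afterwards.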

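【Question comments】:

It is better to provide an MCVE (data that can be copied and pasted) rather than a printed DataFrame.
The question has nothing to do with regular expressions, so I removed all mentions of regex from it.

Tags: python pandas boolean series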

【Solution 1】:

I find the pattern you identified hard to follow, but if it is only a heuristic, you can try pd.Series.ffill and pd.Series.bfill to get your desired result:

df[0] = df[0].ffill()
df[1] = df[1].bfill()

Then drop the duplicates:

df = df.drop_duplicates()

print(df)

        0       1
0   shape  square
1   shape  circle
2  animal     dog
4  animal     cat
5  animal    fish
6   color     red
7   color    blue
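For anyone who wants to run the answer end to end, here is a self-contained version. The answer never shows how `df` was built, so the construction below, with integer column labels `0` and `1`, is an assumption reconstructed from the printed data.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's data; columns default to labels 0 and 1.
df = pd.DataFrame([
    ["shape", "square"],
    ["shape", "circle"],
    ["animal", np.nan],
    [np.nan, "dog"],
    [np.nan, "cat"],
    [np.nan, "fish"],
    ["color", "red"],
    ["color", "blue"],
])

df[0] = df[0].ffill()   # carry each label down over the NaN rows below it
df[1] = df[1].bfill()   # pull the next value up into the label-only row
df = df.drop_duplicates()  # rows 2 and 3 are now identical; the first is kept
print(df)
```

After the two fills, rows 2 and 3 both read `animal, dog`, which is why index 3 is missing from the printed result. Note that `drop_duplicates` removes every repeated row, so this relies on the data containing no legitimately repeated pairs.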

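【Comments】:

Could you elaborate on what ffill() and bfill() do?
@Fozoro, sure. They stand for "forward fill" and "back fill". These methods fill null values (e.g. NaN) with the nearest non-null value before or after them, respectively.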
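A small illustration of the two methods on a toy Series (the values and the names `s`, `forward`, and `backward` are made up for this example):

```python
import numpy as np
import pandas as pd

s = pd.Series(["a", np.nan, np.nan, "b"])

forward = s.ffill()   # each NaN takes the last non-null value above it
backward = s.bfill()  # each NaN takes the next non-null value below it

print(forward.tolist())   # ['a', 'a', 'a', 'b']
print(backward.tolist())  # ['a', 'b', 'b', 'b']
```

A NaN at the very start of a Series has nothing above it, so ffill leaves it as NaN; the same holds for a trailing NaN with bfill.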
@jpp Thank you very much.