Retain the values only in those rows of a column based on a condition on other columns in pandas: answers

【Question title】: Retain the values only in those rows of the column based on the condition on other columns in pandas
【Posted】: 2021-08-01 15:28:11
【Question】:

I have a DataFrame df_in whose column names start with either pi or pm.

import pandas as pd

df_in = pd.DataFrame([[1,2,3,4,"",6,7,8,9],["",1,32,43,59,65,"",83,97],["",51,62,47,58,64,74,86,99],[73,51,42,67,54,65,"",85,92]], columns=["piabc","pmed","pmrde","pmret","pirtc","pmere","piuyt","pmfgf","pmthg"])

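If a row is blank in a column whose name starts with pi, I want to blank out the same row in the following columns whose names start with pm, up to the next column that starts with pi. The same process should then be repeated for each of the other pi columns.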
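
To pin down what is being asked, the rule can be spelled out as an explicit loop over the columns. This is only a minimal sketch: the helper name blank_pm_after_blank_pi is hypothetical (it is not from the post), and it assumes blanks are empty strings and that columns are processed in their existing left-to-right order.

```python
import pandas as pd

df_in = pd.DataFrame(
    [[1, 2, 3, 4, "", 6, 7, 8, 9],
     ["", 1, 32, 43, 59, 65, "", 83, 97],
     ["", 51, 62, 47, 58, 64, 74, 86, 99],
     [73, 51, 42, 67, 54, 65, "", 85, 92]],
    columns=["piabc", "pmed", "pmrde", "pmret", "pirtc",
             "pmere", "piuyt", "pmfgf", "pmthg"],
)


def blank_pm_after_blank_pi(df, blank=""):
    """Hypothetical helper: a literal, loop-based reading of the rule."""
    # Work on an object-dtype copy so "" can sit next to numbers.
    out = df.astype(object)
    current_pi = None
    for col in out.columns:
        if col.startswith("pi"):
            # A new block starts at every pi column.
            current_pi = col
        elif current_pi is not None:
            # Blank this pm column wherever the block's pi column is blank.
            out.loc[df[current_pi] == blank, col] = blank
    return out


result = blank_pm_after_blank_pi(df_in)
print(result)
```

A loop like this is easy to check by hand, but it touches each column separately, so a vectorized mask is usually preferable on wide frames.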

Expected output:

df_out = pd.DataFrame([[1,2,3,4,"","",7,8,9],["","","","",59,65,"","",""],["","","","",58,64,74,86,99],[73,51,42,67,54,65,"","",""]], columns=["piabc","pmed","pmrde","pmret","pirtc","pmere","piuyt","pmfgf","pmthg"])

How can I do this?

【Comments】:

Tags: python python-3.x pandas python-2.7 dataframe

【Solution 1】:

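You can create the groups by testing the column names with str.startswith and taking the cumulative sum, so that every pi column starts a new group. Then compare the values with the empty string and, inside a groupby over those groups, take each group's first column (the pi column) as the result for the whole group. That gives the mask that DataFrame.mask uses to set the blanks: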

g = df_in.columns.str.startswith('pi').cumsum()
df = df_in.mask(df_in.eq('').groupby(g, axis=1).transform(lambda x: x.iat[0]), '')

# transform('first') failed for me in pandas 1.2.3
# df = df_in.mask(df_in.eq('').groupby(g, axis=1).transform('first'), '')


print(df)
  piabc pmed pmrde pmret pirtc pmere piuyt pmfgf pmthg
0     1    2     3     4                 7     8     9
1                           59    65                  
2                           58    64    74    86    99
3    73   51    42    67    54    65
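
The same idea can also be sketched without groupby(..., axis=1), which is deprecated in recent pandas releases (2.1 and later). The variant below is an alternative, not the answer's own code: it builds the mask by reindexing the pi columns' blank flags onto the group ids. The names is_pi, pi_blank and mask are illustrative, and it assumes blanks are empty strings.

```python
import numpy as np
import pandas as pd

df_in = pd.DataFrame(
    [[1, 2, 3, 4, "", 6, 7, 8, 9],
     ["", 1, 32, 43, 59, 65, "", 83, 97],
     ["", 51, 62, 47, 58, 64, 74, 86, 99],
     [73, 51, 42, 67, 54, 65, "", 85, 92]],
    columns=["piabc", "pmed", "pmrde", "pmret", "pirtc",
             "pmere", "piuyt", "pmfgf", "pmthg"],
)

# True for every column whose name starts with "pi".
is_pi = np.asarray(df_in.columns.str.startswith("pi"), dtype=bool)

# Group id per column: it increases by one at each pi column.
g = is_pi.cumsum()
print(g)  # [1 1 1 1 2 2 3 3 3]

# Blank flags of the pi columns only, relabelled by their group id.
pi_blank = df_in.loc[:, is_pi].eq("")
pi_blank.columns = g[is_pi]

# Repeat each pi column's flags for every column of its group.
# Columns before the first pi column (group 0) are never blanked.
mask = pi_blank.reindex(columns=g, fill_value=False)
mask.columns = df_in.columns

# Cast to object first so "" can replace values in numeric columns.
df = df_in.astype(object).mask(mask, "")
print(df)
```

Here g plays the same role as in the answer; only the step that spreads the pi column's flags across its group is different.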

【Comments】: