【Question title】: Retain the values only in those rows of the column based on the condition on other columns in pandas
【Posted】: 2021-08-01 15:28:11
【Question description】:

I have a DataFrame df_in whose column names start with either pi or pm.

df_in = pd.DataFrame([[1,2,3,4,"",6,7,8,9],["",1,32,43,59,65,"",83,97],["",51,62,47,58,64,74,86,99],[73,51,42,67,54,65,"",85,92]], columns=["piabc","pmed","pmrde","pmret","pirtc","pmere","piuyt","pmfgf","pmthg"])

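If a row is blank in a column whose name starts with pi, I want to blank the same row in the pm columns that follow it, up to the next column whose name starts with pi. The same process should then be repeated for each of the remaining column groups.

Expected output: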

df_out = pd.DataFrame([[1,2,3,4,"","",7,8,9],["","","","",59,65,"","",""],["","","","",58,64,74,86,99],[73,51,42,67,54,65,"","",""]], columns=["piabc","pmed","pmrde","pmret","pirtc","pmere","piuyt","pmfgf","pmthg"])

How can I do this?

【Question comments】:

    Tags: python python-3.x pandas python-2.7 dataframe


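    【Solution 1】:

    You can build column groups by testing the column names with str.startswith and taking the cumulative sum. Then compare the values with the empty string and, within each group via groupby, broadcast the result for the leading pi column to get the mask that DataFrame.mask uses to set the blanks: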

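    # Number the column groups: each 'pi' column starts a new group
    g = df_in.columns.str.startswith('pi').cumsum()
    # Within each group, reuse the blank flag of the first ('pi') column for every column
    df = df_in.mask(df_in.eq('').groupby(g, axis=1).transform(lambda x: x.iat[0]), '')
    
    # transform('first') failed for me in pandas 1.2.3
    #df = df_in.mask(df_in.eq('').groupby(g, axis=1).transform('first'), '')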
    
    
    print (df)
      piabc pmed pmrde pmret pirtc pmere piuyt pmfgf pmthg
    0     1    2     3     4                 7     8     9
    1                           59    65                  
    2                           58    64    74    86    99
    3    73   51    42    67    54    65    
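
The answer above relies on groupby(..., axis=1), which is deprecated as of pandas 2.1. As a hedged alternative, the sketch below builds the same mask with plain NumPy indexing instead of groupby. It is not the answerer's method; the names is_pi, blank_pi and mask are introduced here for illustration, and it assumes the first column starts with pi so that every column belongs to a group.

```python
import pandas as pd

df_in = pd.DataFrame(
    [[1, 2, 3, 4, "", 6, 7, 8, 9],
     ["", 1, 32, 43, 59, 65, "", 83, 97],
     ["", 51, 62, 47, 58, 64, 74, 86, 99],
     [73, 51, 42, 67, 54, 65, "", 85, 92]],
    columns=["piabc", "pmed", "pmrde", "pmret", "pirtc",
             "pmere", "piuyt", "pmfgf", "pmthg"],
)

# Boolean array marking the columns that start a new group.
is_pi = df_in.columns.str.startswith("pi")
# Group number per column: each 'pi' column increments the counter.
g = is_pi.cumsum()

# Blank flags of the 'pi' columns only, shape (rows, number of groups).
blank_pi = df_in.loc[:, is_pi].eq("").to_numpy()
# Repeat each group's flag for every column in that group.
# Assumption: the first column starts with 'pi', so g - 1 is never negative.
mask = blank_pi[:, g - 1]

# Blank every cell whose group's 'pi' column is blank in that row.
df_out = df_in.mask(mask, "")
print(df_out)
```

Because the mask is a plain array with the same shape as df_in, no label alignment is involved, so duplicated or reordered column names would not affect the result.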
    

    【Comments】:
