【Question title】:Perform operation on columns based on values of another column in pandas
【Posted】:2021-06-07 00:24:06
【Question】:

I have a dataframe:

import pandas as pd
df = pd.DataFrame([["A",1,98,88,"",567,453,545,656,323,756], ["B",1,99,"","",231,232,234,943,474,345], ["C",1,97,67,23,543,458,456,876,935,876], ["B",1,"",79,84,895,237,678,452,545,453], ["A",1,45,"",58,334,778,234,983,858,657], ["C",1,23,55,"",183,565,953,565,234,234]], columns=["id","date","col1","col2","col3","col1_num","col1_deno","col3_num","col3_deno","col2_num","col2_deno"])

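I need to set NaN/blank values in the corresponding `_num` and `_deno` columns. For example: if a particular row of "col1" is blank, set "col1_num" and "col1_deno" in that row to NaN/blank. Repeat the same process for "col2_num" and "col2_deno" based on "col2", and for "col3_num" and "col3_deno" based on "col3".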

Expected output:

df_out = pd.DataFrame([["A",1,98,88,"",567,453,"","",323,756], ["B",1,99,"","",231,232,"","","",""], ["C",1,97,67,23,543,458,456,876,935,876], ["B",1,"",79,84,"","",678,452,545,453], ["A",1,45,"",58,334,778,234,983,"",""], ["C",1,23,55,"",183,565,"","",234,234]], columns=["id","date","col1","col2","col3","col1_num","col1_deno","col3_num","col3_deno","col2_num","col2_deno"])
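The rule can also be stated as an invariant that any correct result must satisfy: in every row where a base column is `""`, its `_num` and `_deno` columns must be `""` too. A minimal sketch of such a check, assuming blanks are empty strings as in the sample data; the helper name `satisfies_rule` is hypothetical:

```python
import pandas as pd

# Data copied from the question
columns = ["id", "date", "col1", "col2", "col3", "col1_num", "col1_deno",
           "col3_num", "col3_deno", "col2_num", "col2_deno"]
df = pd.DataFrame([["A",1,98,88,"",567,453,545,656,323,756], ["B",1,99,"","",231,232,234,943,474,345], ["C",1,97,67,23,543,458,456,876,935,876], ["B",1,"",79,84,895,237,678,452,545,453], ["A",1,45,"",58,334,778,234,983,858,657], ["C",1,23,55,"",183,565,953,565,234,234]], columns=columns)
df_out = pd.DataFrame([["A",1,98,88,"",567,453,"","",323,756], ["B",1,99,"","",231,232,"","","",""], ["C",1,97,67,23,543,458,456,876,935,876], ["B",1,"",79,84,"","",678,452,545,453], ["A",1,45,"",58,334,778,234,983,"",""], ["C",1,23,55,"",183,565,"","",234,234]], columns=columns)


def satisfies_rule(frame, bases=("col1", "col2", "col3"), suffixes=("_num", "_deno")):
    """Hypothetical helper: True if every blank base value has blank dependents."""
    for base in bases:
        blank = frame[base].eq("")  # rows where the base column is ""
        for suffix in suffixes:
            if not frame.loc[blank, base + suffix].eq("").all():
                return False
    return True


print(satisfies_rule(df))      # False: row 0 has a blank col3 but col3_num is 545
print(satisfies_rule(df_out))  # True
```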

How can I do this?

【Question discussion】:

    Tags: python python-3.x pandas python-2.7 dataframe


    【Solution 1】:

    Solution with a MultiIndex:

    import numpy as np
    import pandas as pd

    # move the columns that are neither processed nor tested (id, date) into the index
    df1 = df.set_index(['id','date'])
    cols = df1.columns
    # split column names on '_' to build a two-level MultiIndex:
    # 'col1' becomes ('col1', NaN), 'col1_num' becomes ('col1', 'num')
    df1.columns = df1.columns.str.split('_', expand=True)
    
    # select the base columns (NaN in the second level) and test them for empty strings
    m = df1.xs(np.nan, axis=1, level=1).eq('')
    # broadcast that mask to every column sharing the same first-level name
    mask = m.reindex(df1.columns, axis=1, level=0)
    # blank the masked values, restore the original column names, move id/date back to columns
    df1 = df1.mask(mask, '').set_axis(cols, axis=1).reset_index()
    print(df1)
      id  date col1 col2 col3 col1_num col1_deno col3_num col3_deno col2_num  \
    0  A     1   98   88           567       453                         323   
    1  B     1   99                231       232                               
    2  C     1   97   67   23      543       458      456       876      935   
    3  B     1        79   84                         678       452      545   
    4  A     1   45        58      334       778      234       983            
    5  C     1   23   55           183       565                         234   
    
      col2_deno  
    0       756  
    1            
    2       876  
    3       453  
    4            
    5       234  
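The key step above is that `str.split('_', expand=True)` on an `Index` returns a `MultiIndex`, and names without `_` are padded with a missing value in the second level; that is what `xs(np.nan, axis=1, level=1)` then selects. A small check on three column names taken from the question (a sketch, not part of the original answer):

```python
import pandas as pd

names = pd.Index(["col1", "col1_num", "col1_deno"])
mi = names.str.split("_", expand=True)  # Index + expand=True gives a MultiIndex

print(mi.nlevels)  # 2
print(mi[1])       # ('col1', 'num')
# the unsplit base name has a missing value in level 1, so isna() flags only 'col1'
print(mi.get_level_values(1).isna())
```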
    

    【Discussion】:

      【Solution 2】:

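      @shubham's answer is simple and clear, and I believe it is also faster; this is just an option for when you cannot (or do not want to) list all the columns explicitly.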

      Get the list of columns that need to change:

      cols = [col for col in df if col.startswith('col')]
      
      ['col1',
       'col2',
       'col3',
       'col1_num',
       'col1_deno',
       'col3_num',
       'col3_deno',
       'col2_num',
       'col2_deno']
      

      Create a dictionary that maps col1 to the columns it controls, and likewise for col2, etc.:

      from collections import defaultdict
      d = defaultdict(list)
      
      for col in cols:
          if "_" in col:
              d[col.split("_")[0]].append(col)
      
      d
      
      defaultdict(list,
                  {'col1': ['col1_num', 'col1_deno'],
                   'col3': ['col3_num', 'col3_deno'],
                   'col2': ['col2_num', 'col2_deno']})
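One caveat with `col.split("_")[0]`: it only works while the base names contain no `_`. For a hypothetical base column such as `sales_eu`, all three of `sales_eu`, `sales_eu_num` and `sales_eu_deno` would be grouped under `sales`, a column that does not exist. A sketch of a variant that matches only the suffixes and splits from the right; the function name and the `sales_eu` columns are illustrative assumptions:

```python
from collections import defaultdict


def group_dependent_columns(columns, suffixes=("_num", "_deno")):
    """Hypothetical helper: map each base column to the columns derived from it."""
    groups = defaultdict(list)
    for col in columns:
        if col.endswith(suffixes):          # str.endswith accepts a tuple
            base = col.rsplit("_", 1)[0]    # strip only the last '_suffix'
            groups[base].append(col)
    return dict(groups)


# "sales_eu".split("_")[0] would give "sales" for all three sales_eu* names
print(group_dependent_columns(["sales_eu", "sales_eu_num", "sales_eu_deno", "col1", "col1_num"]))
```

On the question's column list it produces the same grouping as the `defaultdict` above.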
      

      Iterate over the dict to assign the new values:

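      for key, val in d.items():
          # cast first: storing "" in an integer column upcasts it, which newer pandas versions warn about
          df[val] = df[val].astype(object)
          # blank the dependent columns in rows where the base column is ""
          df.loc[df[key].eq(""), val] = ""
      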
       id  date col1 col2 col3 col1_num col1_deno col3_num col3_deno col2_num col2_deno
      0  A     1   98   88           567       453                         323       756
      1  B     1   99                231       232                                      
      2  C     1   97   67   23      543       458      456       876      935       876
      3  B     1        79   84                         678       452      545       453
      4  A     1   45        58      334       778      234       983                   
      5  C     1   23   55           183       565                         234       234
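The same approach wrapped as a function that returns a new DataFrame instead of modifying `df` in place, checked against the question's expected output. This is a sketch; the name `blank_dependents` is hypothetical, and the `astype(object)` cast is an addition so that `""` can be stored in the originally integer columns without a dtype warning:

```python
from collections import defaultdict

import pandas as pd

columns = ["id", "date", "col1", "col2", "col3", "col1_num", "col1_deno",
           "col3_num", "col3_deno", "col2_num", "col2_deno"]
df = pd.DataFrame([["A",1,98,88,"",567,453,545,656,323,756], ["B",1,99,"","",231,232,234,943,474,345], ["C",1,97,67,23,543,458,456,876,935,876], ["B",1,"",79,84,895,237,678,452,545,453], ["A",1,45,"",58,334,778,234,983,858,657], ["C",1,23,55,"",183,565,953,565,234,234]], columns=columns)
df_out = pd.DataFrame([["A",1,98,88,"",567,453,"","",323,756], ["B",1,99,"","",231,232,"","","",""], ["C",1,97,67,23,543,458,456,876,935,876], ["B",1,"",79,84,"","",678,452,545,453], ["A",1,45,"",58,334,778,234,983,"",""], ["C",1,23,55,"",183,565,"","",234,234]], columns=columns)


def blank_dependents(frame):
    """Hypothetical helper: Solution 2 applied to a copy of `frame`."""
    out = frame.copy()
    groups = defaultdict(list)
    for col in out.columns:
        if "_" in col:
            groups[col.split("_")[0]].append(col)
    for base, dependents in groups.items():
        out[dependents] = out[dependents].astype(object)  # allow "" in int columns
        out.loc[out[base].eq(""), dependents] = ""
    return out


result = blank_dependents(df)
# compare as strings so that int vs object dtype differences do not matter
print((result.astype(str) == df_out.astype(str)).all().all())  # True
```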
      

      【Discussion】:

        【Solution 3】:

        Let's try boolean masking:

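        import pandas as pd

        # base columns whose emptiness drives the masking
        c = pd.Index(['col1', 'col2', 'col3'])
        
        # boolean mask of shape (n_rows, 3); converting to NumPy makes it apply
        # by position instead of aligning on column labels
        m = df[c].eq('').to_numpy()
        
        # blank the matching `_num` and `_deno` columns wherever the base column is empty
        df[c + '_num'] = df[c + '_num'].mask(m, '')
        df[c + '_deno'] = df[c + '_deno'].mask(m, '')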
        

        >>> df
        
          id  date col1 col2 col3 col1_num col1_deno col3_num col3_deno col2_num col2_deno
        0  A     1   98   88           567       453                         323       756
        1  B     1   99                231       232                                      
        2  C     1   97   67   23      543       458      456       876      935       876
        3  B     1        79   84                         678       452      545       453
        4  A     1   45        58      334       778      234       983                   
        5  C     1   23   55           183       565                         234       234
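The question asks about "NaN/blank", while the answers only test for `""`. A sketch of the boolean-mask approach that treats both a real missing value and an empty string as blank, and writes `NaN` into the dependent columns. The function name and the small sample frame are hypothetical:

```python
import numpy as np
import pandas as pd


def blank_dependents_nan_aware(frame, bases=("col1", "col2", "col3")):
    """Hypothetical helper: NaN out <base>_num/<base>_deno where <base> is NaN or ""."""
    out = frame.copy()
    c = pd.Index(bases)
    # True where the base value counts as blank (missing value or empty string)
    m = (out[c].isna() | out[c].eq("")).to_numpy()
    for suffix in ("_num", "_deno"):
        targets = c + suffix
        # m is a NumPy array, so column i of m masks column i of targets
        out[targets] = out[targets].mask(m, np.nan)
    return out


# Hypothetical sample: col1 marks blanks with NaN, col2 with ""
sample = pd.DataFrame({
    "col1": [98, np.nan, 45],
    "col2": ["", 88, 55],
    "col1_num": [567, 231, 334], "col1_deno": [453, 232, 778],
    "col2_num": [323, 474, 858], "col2_deno": [756, 345, 657],
})
res = blank_dependents_nan_aware(sample, bases=("col1", "col2"))
print(res)
```

Using `NaN` rather than `""` keeps the `_num`/`_deno` columns numeric (float64), so they can still be used in arithmetic afterwards.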
        

        【Discussion】:
