【Question title】: Pandas count on columns
【Posted】: 2017-04-21 11:18:15
【Question】:

I am converting SPSS code to pandas, and I am trying to find a Pythonic way to express this:

COUNT WBbf = M1 M26 M38 M50 M62 M74 M85 M97 M109 
         M121 M133 M144 (1). 

COUNT SPbf = M2 M15 M39 M51 M75 M87 M110 (1) 
           M63 M98 M122 M134 M145 (0).

COUNT ACbf = M3 M16 M27 M52 M76 M88 M111 M123 M135 M146 (1) 
            M64 M99 (0).

COUNT SCbf = M5 M17 M40 M77 M112 (1) 
            M28 M65 M89 M100 M124 M136 M148 (0).

My DataFrame looks like this:

In [90]: data[b]
Out[90]: 
                               M1   M2   M3   M4   M5   M6   M7   M8   M9  \
case_id                                                                     
ERAB_S1_LR_Q1_261016          1.0  1.0  0.0  1.0  1.0  1.0  0.0  0.0  1.0   
ERAB_AS_011116                1.0  1.0  0.0  1.0  1.0  1.0  1.0  1.0  0.0   
ERAB_S2_LR_Q1_021116AFTERNOO  1.0  1.0  1.0  1.0  0.0  1.0  0.0  0.0  1.0   
ERAB_S2_AS031116MORNING       1.0  1.0  0.0  1.0  0.0  1.0  0.0  0.0  1.0   
ERAB_S3_AS031116AFTERNOON     1.0  0.0  0.0  1.0  1.0  1.0  0.0  0.0  1.0   
ERAB_S1_AS041116              1.0  0.0  0.0  1.0  1.0  1.0  0.0  0.0  1.0   
ERAB_LOH__S3_021116           1.0  1.0  1.0  1.0  1.0  1.0  0.0  0.0  1.0   
ERAB_LR_081116                1.0  1.0  0.0  1.0  1.0  1.0  0.0  0.0  1.0   
ERAB_S1_AS_111116             1.0  1.0  0.0  1.0  0.0  0.0  0.0  0.0  1.0   
ERAB_S1_141116AFTERNOON       1.0  1.0  0.0  1.0  1.0  1.0  0.0  0.0  1.0   
ERAB_S1_LOH_151116            1.0  0.0  1.0  1.0  1.0  0.0  1.0  0.0  1.0   
ERAB_S1_161116                1.0  1.0  1.0  1.0  1.0  1.0  0.0  0.0  1.0   

And so on. I want to count these values and, for each case ID, store the result in a new column.

【Question comments】:

    Tags: python pandas count spss


    【Solution 1】:

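    I believe you can first select the columns (with [] or loc), compare them to the target value with eq, and then sum the True values in each row with sum(axis=1):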

    # replace these strings with the column names from your own data
    SPbf1 = 'M2 M5 M8'.split()
    SPbf0 = 'M6 M9'.split()
    print (SPbf1)
    ['M2', 'M5', 'M8']
    
    print (SPbf0)
    ['M6', 'M9']
    
    df['SPbf'] = df[SPbf1].eq(1).sum(axis=1) + df[SPbf0].eq(0).sum(axis=1)
    
    print (df)
                                   M1   M2   M3   M4   M5   M6   M7   M8   M9  \
    case_id                                                                     
    ERAB_S1_LR_Q1_261016          1.0  1.0  0.0  1.0  1.0  1.0  0.0  0.0  1.0   
    ERAB_AS_011116                1.0  1.0  0.0  1.0  1.0  1.0  1.0  1.0  0.0   
    ERAB_S2_LR_Q1_021116AFTERNOO  1.0  1.0  1.0  1.0  0.0  1.0  0.0  0.0  1.0   
    ERAB_S2_AS031116MORNING       1.0  1.0  0.0  1.0  0.0  1.0  0.0  0.0  1.0   
    ERAB_S3_AS031116AFTERNOON     1.0  0.0  0.0  1.0  1.0  1.0  0.0  0.0  1.0   
    ERAB_S1_AS041116              1.0  0.0  0.0  1.0  1.0  1.0  0.0  0.0  1.0   
    ERAB_LOH__S3_021116           1.0  1.0  1.0  1.0  1.0  1.0  0.0  0.0  1.0   
    ERAB_LR_081116                1.0  1.0  0.0  1.0  1.0  1.0  0.0  0.0  1.0   
    ERAB_S1_AS_111116             1.0  1.0  0.0  1.0  0.0  0.0  0.0  0.0  1.0   
    ERAB_S1_141116AFTERNOON       1.0  1.0  0.0  1.0  1.0  1.0  0.0  0.0  1.0   
    ERAB_S1_LOH_151116            1.0  0.0  1.0  1.0  1.0  0.0  1.0  0.0  1.0   
    ERAB_S1_161116                1.0  1.0  1.0  1.0  1.0  1.0  0.0  0.0  1.0   
    
                                  SPbf  
    case_id                             
    ERAB_S1_LR_Q1_261016             2  
    ERAB_AS_011116                   4  
    ERAB_S2_LR_Q1_021116AFTERNOO     1  
    ERAB_S2_AS031116MORNING          1  
    ERAB_S3_AS031116AFTERNOON        1  
    ERAB_S1_AS041116                 1  
    ERAB_LOH__S3_021116              2  
    ERAB_LR_081116                   2  
    ERAB_S1_AS_111116                2  
    ERAB_S1_141116AFTERNOON          2  
    ERAB_S1_LOH_151116               2  
    ERAB_S1_161116                   2  
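
The same select, eq, sum pattern can be wrapped in a small helper so that every SPSS COUNT statement becomes one dictionary entry. This is only a sketch: the helper name `spss_count`, the `scales` dictionary, and the three-row sample frame are assumptions made for illustration, and `demo_scale` is a made-up definition, not one of the scales from the question.

```python
import pandas as pd


def spss_count(df, spec):
    """Count, per row, how many cells equal their target value.

    spec maps a target value to the list of columns that should equal it,
    mirroring one SPSS COUNT statement (hypothetical helper).
    """
    total = pd.Series(0, index=df.index)
    for value, cols in spec.items():
        # eq() gives a boolean frame; summing along axis=1 counts the True cells
        total = total + df[cols].eq(value).sum(axis=1)
    return total


# First three rows of the question's data, restricted to the columns used above
df = pd.DataFrame(
    {
        "M2": [1.0, 1.0, 1.0],
        "M5": [1.0, 1.0, 0.0],
        "M6": [1.0, 1.0, 1.0],
        "M8": [0.0, 1.0, 0.0],
        "M9": [1.0, 0.0, 1.0],
    },
    index=["ERAB_S1_LR_Q1_261016", "ERAB_AS_011116", "ERAB_S2_LR_Q1_021116AFTERNOO"],
)

# One entry per new column: {target value: [columns]}
scales = {
    "SPbf": {1: ["M2", "M5", "M8"], 0: ["M6", "M9"]},
    "demo_scale": {1: ["M2", "M6"]},  # made-up scale, for illustration only
}

for name, spec in scales.items():
    df[name] = spss_count(df, spec)

print(df[list(scales)])
```

With this layout, adding another scale means adding one dictionary entry rather than another hand-written expression.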
    

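    If some of the column names may be missing from the DataFrame, select them with reindex(columns=...) instead (older pandas versions spelled this reindex_axis, which was removed in pandas 1.0). Missing columns come back filled with NaN, which equals neither 1 nor 0, so they add nothing to the count:

    SPbf1 = 'M2 M15 M39 M51 M75 M87 M110'.split()
    SPbf0 = 'M63 M98 M122 M134 M145'.split()
    print (SPbf1)
    ['M2', 'M15', 'M39', 'M51', 'M75', 'M87', 'M110']
    
    print (SPbf0)
    ['M63', 'M98', 'M122', 'M134', 'M145']
    
    df['SPbf'] = (df.reindex(columns=SPbf1).eq(1).sum(axis=1) +
                  df.reindex(columns=SPbf0).eq(0).sum(axis=1))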
    

    print (df)
                                   M1   M2   M3   M4   M5   M6   M7   M8   M9  \
    case_id                                                                     
    ERAB_S1_LR_Q1_261016          1.0  1.0  0.0  1.0  1.0  1.0  0.0  0.0  1.0   
    ERAB_AS_011116                1.0  1.0  0.0  1.0  1.0  1.0  1.0  1.0  0.0   
    ERAB_S2_LR_Q1_021116AFTERNOO  1.0  1.0  1.0  1.0  0.0  1.0  0.0  0.0  1.0   
    ERAB_S2_AS031116MORNING       1.0  1.0  0.0  1.0  0.0  1.0  0.0  0.0  1.0   
    ERAB_S3_AS031116AFTERNOON     1.0  0.0  0.0  1.0  1.0  1.0  0.0  0.0  1.0   
    ERAB_S1_AS041116              1.0  0.0  0.0  1.0  1.0  1.0  0.0  0.0  1.0   
    ERAB_LOH__S3_021116           1.0  1.0  1.0  1.0  1.0  1.0  0.0  0.0  1.0   
    ERAB_LR_081116                1.0  1.0  0.0  1.0  1.0  1.0  0.0  0.0  1.0   
    ERAB_S1_AS_111116             1.0  1.0  0.0  1.0  0.0  0.0  0.0  0.0  1.0   
    ERAB_S1_141116AFTERNOON       1.0  1.0  0.0  1.0  1.0  1.0  0.0  0.0  1.0   
    ERAB_S1_LOH_151116            1.0  0.0  1.0  1.0  1.0  0.0  1.0  0.0  1.0   
    ERAB_S1_161116                1.0  1.0  1.0  1.0  1.0  1.0  0.0  0.0  1.0   
    
                                  SPbf  
    case_id                             
    ERAB_S1_LR_Q1_261016             1  
    ERAB_AS_011116                   1  
    ERAB_S2_LR_Q1_021116AFTERNOO     1  
    ERAB_S2_AS031116MORNING          1  
    ERAB_S3_AS031116AFTERNOON        0  
    ERAB_S1_AS041116                 0  
    ERAB_LOH__S3_021116              1  
    ERAB_LR_081116                   1  
    ERAB_S1_AS_111116                1  
    ERAB_S1_141116AFTERNOON          1  
    ERAB_S1_LOH_151116               0  
    ERAB_S1_161116                   1  
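
The result above contains only 0 and 1 because M2 is the only listed column present in the sample data. A minimal sketch of why absent columns drop out of the count, using `reindex(columns=...)` (the current pandas spelling) on a made-up three-row frame; the variable names are illustrative only:

```python
import pandas as pd

# Made-up frame: M2 and M5 exist, M15 does not
df = pd.DataFrame({"M2": [1.0, 0.0, 1.0], "M5": [1.0, 1.0, 0.0]})

wanted = ["M2", "M15"]

# reindex keeps M2 and creates M15 as an all-NaN column instead of raising KeyError
picked = df.reindex(columns=wanted)

# NaN compares unequal to everything, so M15 contributes to neither count
ones = picked.eq(1).sum(axis=1)
zeros = picked.eq(0).sum(axis=1)

# Alternative: restrict the list to the columns that actually exist
present = df.columns.intersection(wanted)
ones_alt = df[present].eq(1).sum(axis=1)

print(picked)
print(ones.tolist(), zeros.tolist(), ones_alt.tolist())
```

Note that this silently ignores misspelled column names as well, so it may be worth checking `present` against `wanted` when a typo is a realistic risk.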
    

    【Comments】:
