【Question title】: Get the last column of a pd.DataFrame and add it to another pd.DataFrame
【Posted】: 2017-07-21 23:51:42
【Question】:

I have an Excel file that looks like this:

CompanyName    High Priority     QualityIssue
Customer1         Yes             User
Customer1         Yes             User
Customer2         No              User
Customer3         No              Equipment
Customer1         No              Neither
Customer3         No              User
Customer3         Yes             User
Customer3         Yes             Equipment
Customer4         No              User

I want to count how many times each CompanyName occurs with each kind of QualityIssue, sorted by count in descending order.

For example, using the following code:

df.groupby(["CompanyName", "QualityIssue"]).size().to_frame('Count')

I get:

Out:

CompanyName       QualityIssue    Count
Customer2         User            1
Customer1         Neither         1
Customer4         User            1
Customer1         User            2
Customer3         Equipment       2
Customer3         User            2
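The output above is not yet sorted by count, which the question asks for. A minimal sketch of that step, assuming the sample data is typed in by hand rather than loaded with pd.read_excel, and that the column is named CompanyName with no trailing space: sort_values on the Count column puts the largest counts first.

```python
import pandas as pd

# Sample data from the question, typed in by hand (assumption: in practice
# it would come from pd.read_excel on the user's file)
df = pd.DataFrame({
    'CompanyName': ['Customer1', 'Customer1', 'Customer2', 'Customer3', 'Customer1',
                    'Customer3', 'Customer3', 'Customer3', 'Customer4'],
    'High Priority': ['Yes', 'Yes', 'No', 'No', 'No', 'No', 'Yes', 'Yes', 'No'],
    'QualityIssue': ['User', 'User', 'User', 'Equipment', 'Neither',
                     'User', 'User', 'Equipment', 'User'],
})

# Count rows per (CompanyName, QualityIssue) pair, then sort by that count, largest first
counts = (df.groupby(['CompanyName', 'QualityIssue'])
            .size()
            .to_frame('Count')
            .sort_values('Count', ascending=False))
print(counts)
```

Rows with equal counts may appear in any relative order; add a secondary sort key if a stable order matters.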

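Now suppose I have another copy of the above result in memory.

What I want is to append the last column of the second result to the end of the first one (in practice it won't be a copy; this is just an example):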

CompanyName       QualityIssue    Count1    Count2
Customer2         User            1        1
Customer1         Neither         1        1
Customer4         User            1        1
Customer1         User            2        2
Customer3         Equipment       2        2
Customer3         User            2        2      

The problem is that if I do

df['Count'] 

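it doesn't print just that column; it prints everything, the same as doing

print(df)

So I can't find a way to take only the last column of one DataFrame and add it to another.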
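For context, a minimal sketch assuming the sample data is typed in by hand and the second result is simulated by copying the first, as the question does (the names first, second and last_col are illustrative only). After groupby(...).size().to_frame(...), CompanyName and QualityIssue live in the index, so df['Count'] already returns just that one column, but as a Series whose printout still shows the index labels. Selecting the last column by position with iloc[:, -1] and assigning it to the first frame aligns rows on that shared index.

```python
import pandas as pd

df = pd.DataFrame({
    'CompanyName': ['Customer1', 'Customer1', 'Customer2', 'Customer3', 'Customer1',
                    'Customer3', 'Customer3', 'Customer3', 'Customer4'],
    'QualityIssue': ['User', 'User', 'User', 'Equipment', 'Neither',
                     'User', 'User', 'Equipment', 'User'],
})

# Two result frames sharing the (CompanyName, QualityIssue) MultiIndex;
# the second is just a renamed copy, as in the question's example
first = df.groupby(['CompanyName', 'QualityIssue']).size().to_frame('Count1')
second = first.rename(columns={'Count1': 'Count2'})

# Last column by position: a single Series whose index still carries
# CompanyName and QualityIssue, which is why printing it shows them
last_col = second.iloc[:, -1]

# Column assignment aligns on the index, so each count lands on its matching row
first['Count2'] = last_col
print(first.reset_index())
```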

【Question discussion】:

    Tags: python pandas


    【Answer 1】:

    A quick and simple way is to use groupby and size:

    df.groupby(['CompanyName', 'QualityIssue']).size()
    
    CompanyName  QualityIssue
    Customer1    Neither         1
                 User            2
    Customer2    User            1
    Customer3    Equipment       2
                 User            2
    Customer4    User            1
    dtype: int64
    

    Suppose we also have another one in memory:

    c1 = df.groupby(['CompanyName', 'QualityIssue']).size()
    c2 = c1.copy()
    

    Then use pd.concat:

    pd.concat([c1, c2], keys=['Count1', 'Count2']).unstack(0, fill_value=0)
    
                              Count1  Count2
    CompanyName QualityIssue                
    Customer1   Neither            1       1
                User               2       2
    Customer2   User               1       1
    Customer3   Equipment          2       2
                User               2       2
    Customer4   User               1       1
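To make the concat/unstack step more concrete, a minimal sketch under the assumption that the second result is missing one (CompanyName, QualityIssue) pair, which is hypothetical (in the answer c2 is an exact copy). pd.concat with keys adds the keys as a new outermost index level (level 0); unstack(0) pivots that level into columns, and fill_value=0 fills pairs that appear in only one of the Series.

```python
import pandas as pd

df = pd.DataFrame({
    'CompanyName': ['Customer1', 'Customer1', 'Customer2', 'Customer3', 'Customer1',
                    'Customer3', 'Customer3', 'Customer3', 'Customer4'],
    'QualityIssue': ['User', 'User', 'User', 'Equipment', 'Neither',
                     'User', 'User', 'Equipment', 'User'],
})
c1 = df.groupby(['CompanyName', 'QualityIssue']).size()

# Hypothetical second result lacking one pair, to show what fill_value does
c2 = c1.drop(index=[('Customer4', 'User')])

# keys= becomes a new outermost index level: (key, CompanyName, QualityIssue)
stacked = pd.concat([c1, c2], keys=['Count1', 'Count2'])
print(stacked.index.nlevels)

# unstack(0) turns that outer level into columns; the missing pair gets 0 instead of NaN
wide = stacked.unstack(0, fill_value=0)
print(wide)
```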
    

    Add reset_index if you want the index levels turned back into regular DataFrame columns:

    pd.concat([c1, c2], keys=['Count1', 'Count2']).unstack(0, fill_value=0) \
        .reset_index()
    
      CompanyName QualityIssue  Count1  Count2
    0   Customer1      Neither       1       1
    1   Customer1         User       2       2
    2   Customer2         User       1       1
    3   Customer3    Equipment       2       2
    4   Customer3         User       2       2
    5   Customer4         User       1       1
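A possible alternative, sketched here as a different technique from the answer's keys/unstack approach: name each Series, then concatenate along axis=1 so pandas aligns them side by side on the shared MultiIndex. Pairs missing from one Series would come out as NaN, so this sketch fills them with 0 and restores the integer dtype before resetting the index.

```python
import pandas as pd

df = pd.DataFrame({
    'CompanyName': ['Customer1', 'Customer1', 'Customer2', 'Customer3', 'Customer1',
                    'Customer3', 'Customer3', 'Customer3', 'Customer4'],
    'QualityIssue': ['User', 'User', 'User', 'Equipment', 'Neither',
                     'User', 'User', 'Equipment', 'User'],
})
c1 = df.groupby(['CompanyName', 'QualityIssue']).size()
c2 = c1.copy()

# Series names become column headers; axis=1 aligns rows on the shared index
both = pd.concat([c1.rename('Count1'), c2.rename('Count2')], axis=1)

# Fill any unmatched pairs with 0, restore ints, and move index levels into columns
both = both.fillna(0).astype(int).reset_index()
print(both)
```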
    
