【Question title】: Transform with group by in Pandas
【Posted】: 2017-08-16 05:08:05
【Question】:

I am creating a DataFrame:

import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

df1.groupby(["City"])["Name"].transform(lambda x: ",".join(x)).drop_duplicates()

I want the output to be:

Name                     City
Alice,Bob,Mallory,Bob    Seattle
Mallory,Mallory          Portland

but I am getting only:

Name
Alice,Bob,Mallory,Bob
Mallory,Mallory

This is an example with a small number of columns, but in my actual problem I have too many columns, so I cannot use

df1['Name'] = df1.groupby(['City'])['Name'].transform(lambda x: ','.join(x))
df1.groupby(['City', 'Name'], as_index=False)
df1.drop_duplicates()

because I would have to write the same code for every column.
Is there a way to do this without writing a separate transform for each column?

【Comments】:

    Tags: python python-3.x pandas pandas-groupby


    【Solution 1】:

    1. Aggregating a single column

    I think you need apply with ','.join, followed by double brackets [[]] to change the column order:

    df = df1.groupby(["City"])['Name'].apply(','.join).reset_index()
    df = df[['Name','City']]
    print (df)
                        Name      City
    0        Mallory,Mallory  Portland
    1  Alice,Bob,Mallory,Bob   Seattle
    

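    Your attempt returns only the Name column because transform does not reduce the groups; it creates a new column with the aggregated value repeated on every row: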

    df1['new'] = df1.groupby("City")['Name'].transform(','.join)
    print (df1)
           City     Name                    new
    0   Seattle    Alice  Alice,Bob,Mallory,Bob
    1   Seattle      Bob  Alice,Bob,Mallory,Bob
    2  Portland  Mallory        Mallory,Mallory
    3   Seattle  Mallory  Alice,Bob,Mallory,Bob
    4   Seattle      Bob  Alice,Bob,Mallory,Bob
    5  Portland  Mallory        Mallory,Mallory
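
The difference in output shape can be checked directly. The following is a minimal sketch that assumes the two-column df1 from the question; the variable names transformed and aggregated are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
})

# transform broadcasts the joined string back to every row of its group,
# so the result keeps the length and index of df1.
transformed = df1.groupby("City")["Name"].transform(",".join)

# agg reduces each group to a single value, so the result has one row per
# city and is indexed by City.
aggregated = df1.groupby("City")["Name"].agg(",".join)

print(len(transformed))  # 6
print(len(aggregated))   # 2
```

Because transform preserves the original index, it suits adding a column to the existing frame, while agg (or apply) suits building a one-row-per-group summary.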
    

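    2. Multiple columns and aggregations

    For more columns use agg, either listing the columns in [] or, if none are listed, joining all string columns: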

    df1 = pd.DataFrame( {     
    "Name" : ["Alice", "Bob", "Mallory", "Mallory", "Bob" , "Mallory"] ,  
    "Name2":   ["Alice1", "Bob1", "Mallory1", "Mallory1", "Bob1" , "Mallory1"],      
    "City" : ["Seattle", "Seattle", "Portland", "Seattle", "Seattle",     
    "Portland"]   } )   
    print (df1)
           City     Name     Name2
    0   Seattle    Alice    Alice1
    1   Seattle      Bob      Bob1
    2  Portland  Mallory  Mallory1
    3   Seattle  Mallory  Mallory1
    4   Seattle      Bob      Bob1
    5  Portland  Mallory  Mallory1
    
    df = df1.groupby('City')[['Name', 'Name2']].agg(','.join).reset_index()
    print (df)
           City                   Name                      Name2
    0  Portland        Mallory,Mallory          Mallory1,Mallory1
    1   Seattle  Alice,Bob,Mallory,Bob  Alice1,Bob1,Mallory1,Bob1
    

    And if you need to aggregate all columns:

    df = df1.groupby('City').agg(','.join).reset_index()
    print (df)
           City                   Name                      Name2
    0  Portland        Mallory,Mallory          Mallory1,Mallory1
    1   Seattle  Alice,Bob,Mallory,Bob  Alice1,Bob1,Mallory1,Bob1
    

    To use a different aggregation function for each column, pass a dictionary to agg:

    df1 = pd.DataFrame({
        "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
        "Name2": ["Alice1", "Bob1", "Mallory1", "Mallory1", "Bob1", "Mallory1"],
        "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
        "Numbers": [1, 5, 4, 3, 2, 1],
    })
    print (df1)
           City     Name     Name2  Numbers
    0   Seattle    Alice    Alice1        1
    1   Seattle      Bob      Bob1        5
    2  Portland  Mallory  Mallory1        4
    3   Seattle  Mallory  Mallory1        3
    4   Seattle      Bob      Bob1        2
    5  Portland  Mallory  Mallory1        1
    
    
    df = df1.groupby('City').agg({'Name': ','.join, 
                                  'Name2': ','.join, 
                                  'Numbers': 'max'}).reset_index()
    print (df)
           City                   Name                      Name2  Numbers
    0  Portland        Mallory,Mallory          Mallory1,Mallory1        4
    1   Seattle  Alice,Bob,Mallory,Bob  Alice1,Bob1,Mallory1,Bob1        5
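
With many columns, the dictionary does not have to be typed out by hand. The sketch below builds it from the column dtypes, under the assumption that every numeric column should use 'max' and every other column should be joined as strings. That rule, and the names group_col and agg_map, are illustrative choices and not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "Name2": ["Alice1", "Bob1", "Mallory1", "Mallory1", "Bob1", "Mallory1"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    "Numbers": [1, 5, 4, 3, 2, 1],
})

group_col = "City"  # hypothetical name for the grouping column

# Build the aggregation mapping once for all non-grouping columns:
# numeric columns get 'max', everything else is joined with commas.
agg_map = {
    col: "max" if pd.api.types.is_numeric_dtype(df1[col]) else ",".join
    for col in df1.columns
    if col != group_col
}

df = df1.groupby(group_col).agg(agg_map).reset_index()
print(df)
```

Selecting the function by dtype also avoids the TypeError that ','.join would raise if it were applied to a numeric column.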
    

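    【Discussion】:

    • OK, thanks, this works. One more thing: suppose I also have a column with numbers and need to compute its max or min as part of the operation above. How can I combine the two agg functions into one statement?
    • Glad I could help. Good luck!
    • @vatsal You can also upvote this answer and the other answers to show additional appreciation.
    【Solution 2】: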

    You can do:

    In [42]: df1.groupby('City')['Name'].agg(','.join).reset_index(name='Name')
    Out[42]:
           City                   Name
    0  Portland        Mallory,Mallory
    1   Seattle  Alice,Bob,Mallory,Bob
    

    Or,

    In [49]: df1.groupby('City', as_index=False).agg({'Name': ','.join})
    Out[49]:
           City                   Name
    0  Portland        Mallory,Mallory
    1   Seattle  Alice,Bob,Mallory,Bob
    

    For multiple aggregations:

    df1.groupby('City', as_index=False).agg(
          {'Name': ','.join, 'Name2': ','.join, 'Numbers': 'max'})
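
If both the minimum and the maximum of the same column are needed, one option is named aggregation (available in pandas 0.25 and later), where each keyword becomes an output column. This is a sketch that assumes the numeric column is called Numbers; the output names Numbers_min and Numbers_max are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Alice", "Bob", "Mallory", "Mallory", "Bob", "Mallory"],
    "City": ["Seattle", "Seattle", "Portland", "Seattle", "Seattle", "Portland"],
    "Numbers": [1, 5, 4, 3, 2, 1],
})

# Each keyword is an output column: (source column, aggregation function).
df = df1.groupby("City", as_index=False).agg(
    Name=("Name", ",".join),
    Numbers_min=("Numbers", "min"),
    Numbers_max=("Numbers", "max"),
)
print(df)
```

Named aggregation keeps the result columns flat, whereas passing a list such as ['min', 'max'] inside a dictionary produces MultiIndex columns.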
    

    【Discussion】:

    • If I have one more column, Name2, how can I get the same string-aggregated result for it using the functions above?
    • Check my answer.