【Question】: How to group by columns (ignoring order) in a Pandas DataFrame?
【Posted】: 2020-11-16 08:03:15
【Description】:

I have a pandas DataFrame (4 of its 8 columns are shown here):

import pandas as pd

df = pd.DataFrame({
    "departure_country": ["Mexico", "Mexico", "United States", "United States", "United States", "United States", "Japan", "United States", "United States", "United States"],
    "departure_city": ["Guadalajara", "Guadalajara", "New York", "Chicago", "Los Angeles", "Michigan", "Tokyo", "New York", "New York", "Chicago"],
    "destination_country": ["United States", "United States", "United States", "United States", "Mexico", "United States", "United States", "Mexico", "United States", "Japan"],
    "destination_city": ["Los Angeles", "Los Angeles", "Chicago", "New York", "Guadalajara", "New York", "Chicago", "Guadalajara", "Michigan", "Tokyo"],
})

df
    departure_country   departure_city  destination_country destination_city
0   Mexico              Guadalajara     United States       Los Angeles
1   Mexico              Guadalajara     United States       Los Angeles
2   United States       New York        United States       Chicago
3   United States       Chicago         United States       New York
4   United States       Los Angeles     Mexico              Guadalajara
5   United States       Michigan        United States       New York
6   Japan               Tokyo           United States       Chicago
7   United States       New York        Mexico              Guadalajara
8   United States       New York        United States       Michigan
9   United States       Chicago         Japan               Tokyo

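I want to analyze the data within each group, so I first want to group the rows by the "same pair" of departure and destination, regardless of direction, for example: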

    departure_country   departure_city  destination_country destination_city
0   Mexico              Guadalajara     United States       Los Angeles
1   Mexico              Guadalajara     United States       Los Angeles
2   United States       Los Angeles     Mexico              Guadalajara
3   United States       New York        United States       Chicago
4   United States       Chicago         United States       New York
5   United States       Michigan        United States       New York
6   United States       New York        United States       Michigan
7   Japan               Tokyo           United States       Chicago
8   United States       Chicago         Japan               Tokyo
9   United States       New York        Mexico              Guadalajara

Is it possible to do this in a DataFrame? I tried groupby and a key-value approach, but neither worked. Any help would be much appreciated, thanks!

【Comments】:

    Tags: python pandas dataframe sorting group-by


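    【Solution 1】:

    I'm sure someone will come up with a better-optimized solution, but one way is to create sorted tuples of your country and city pairs and sort by them: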

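    # Build order-independent keys: sort each row's two countries and two cities,
    # then sort by those keys so both directions of a route end up adjacent.
    print(df.assign(country=[tuple(sorted(i)) for i in df.filter(like="country").to_numpy()],
                    city=[tuple(sorted(i)) for i in df.filter(like="city").to_numpy()])
            .sort_values(["country", "city"], ascending=False)
            .filter(like="_"))  # keep only the original columns (their names contain "_")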
    
      departure_country departure_city destination_country destination_city
    5     United States       Michigan       United States         New York
    8     United States       New York       United States         Michigan
    2     United States       New York       United States          Chicago
    3     United States        Chicago       United States         New York
    7     United States       New York              Mexico      Guadalajara
    0            Mexico    Guadalajara       United States      Los Angeles
    1            Mexico    Guadalajara       United States      Los Angeles
    4     United States    Los Angeles              Mexico      Guadalajara
    6             Japan          Tokyo       United States          Chicago
    9     United States        Chicago               Japan            Tokyo
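
The sorted-tuple idea above orders the rows but does not produce groups as such. If an actual group key is needed, one possible variation is sketched below. It is not part of the original answer, and it makes one assumption: each endpoint is treated as a (country, city) pair, and the two endpoints of a row are sorted together so that A to B and B to A receive the same key. The names `keys`, `route_id`, `country_a`, `city_a`, `country_b`, and `city_b` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "departure_country": ["Mexico", "Mexico", "United States", "United States", "United States",
                          "United States", "Japan", "United States", "United States", "United States"],
    "departure_city": ["Guadalajara", "Guadalajara", "New York", "Chicago", "Los Angeles",
                       "Michigan", "Tokyo", "New York", "New York", "Chicago"],
    "destination_country": ["United States", "United States", "United States", "United States", "Mexico",
                            "United States", "United States", "Mexico", "United States", "Japan"],
    "destination_city": ["Los Angeles", "Los Angeles", "Chicago", "New York", "Guadalajara",
                         "New York", "Chicago", "Guadalajara", "Michigan", "Tokyo"],
})

# Pair each country with its city, then sort the two endpoints of every row
# so that the direction of travel no longer matters.
departures = zip(df["departure_country"], df["departure_city"])
destinations = zip(df["destination_country"], df["destination_city"])
ordered = [sorted(pair) for pair in zip(departures, destinations)]

# Hypothetical helper frame holding the canonical (direction-free) endpoints.
keys = pd.DataFrame(
    [(*a, *b) for a, b in ordered],
    columns=["country_a", "city_a", "country_b", "city_b"],
    index=df.index,
)

# One integer per distinct route; sort=False numbers routes by first appearance.
route_id = keys.groupby(list(keys.columns), sort=False).ngroup()

# Rows of the same route become adjacent; a stable sort keeps their original order.
grouped = df.assign(route_id=route_id).sort_values("route_id", kind="stable")
print(grouped)

# The same key can be passed straight to groupby for per-route analysis.
sizes = df.groupby(route_id).size()
print(sizes)
```

With the sample data this should place the rows in the order 0, 1, 4, 2, 3, 5, 8, 6, 9, 7, which corresponds to the layout requested in the question. Pairing country with city avoids a theoretical mix-up that can occur when countries and cities are sorted independently, at the cost of a slightly longer key-building step.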
    

    【Comments】:

    • Hi, @Henry. Thanks, that helped!