【Question title】: Pandas - How to group by both brackets and unique column values?
【Posted】: 2018-06-26 18:15:01
【Question】:

I came across an interesting bar chart and found the underlying data here. I am trying to recreate how the data was grouped, both by range bins (I used pd.cut) and by country.

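Here is the code I have tried so far. It raises errors; each failing attempt is preceded by a comment showing the error it produces.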

import pandas as pd

## csv file in zip http://ec.europa.eu/eurostat/cache/GISCO/geodatafiles/GEOSTAT-grid-POP-1K-2011-V2-0-1.zip

url="C:/Users/Simon/Downloads/GEOSTAT-grid-POP-1K-2011-V2-0-1/Version 2_0_1/GEOSTAT_grid_POP_1K_2011_V2_0_1.csv"
whole=pd.read_csv(url, low_memory=False)

populationDensity=whole[['TOT_P','CNTR_CODE']]


## trying to replicate graph here http://www.centreforcities.org/wp-content/uploads/2018/04/18-04-16-Square-kilometre-units-of-land-by-population.png
## which aggregates the records by brackets


# https://stackoverflow.com/questions/25010215/pandas-groupby-how-to-compute-counts-in-ranges#answer-25010952
ranges = [0,10000,15000,20000,25000,30000,35000,40000,45000,1000000]
bins=pd.cut(populationDensity['TOT_P'],ranges)



#print(bins)

## the following fails with error :
## AttributeError: Cannot access callable attribute 'groupby' of 'DataFrameGroupBy' objects, try using the 'apply' method
#print (populationDensity.groupby(['CNTR_CODE']).groupby(bins).count())

## the following fails with error :
## TypeError: 'Series' objects are mutable, thus they cannot be hashed
print (populationDensity.groupby(['CNTR_CODE'],pd.cut(populationDensity['TOT_P'],ranges)).count())

#relevant https://stackoverflow.com/questions/21441259/pandas-groupby-range-of-values#answer-21441621

I have only just started using pandas. I will try again tomorrow; in the meantime, if anyone knows...

【Comments】:

    Tags: python pandas


    【Solution 1】:

    Change:

    print (populationDensity.groupby(['CNTR_CODE'],pd.cut(populationDensity['TOT_P'],ranges)).count())

    to:

    print (populationDensity.groupby(['CNTR_CODE', pd.cut(populationDensity['TOT_P'],ranges)]).count())
                                                ^                                           ^
    

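    because the by parameter of groupby takes all of the grouping keys in a single list: several column names, a mix of column names and Series, or several Series:

    by : mapping, function, label, or list of labels

    Used to determine the groups for the groupby. If by is a function, it's called on each value of the object's index. If a dict or Series is passed, the Series or dict VALUES will be used to determine the groups (the Series' values are first aligned; see .align() method). If an ndarray is passed, the values are used as-is to determine the groups. A label or list of labels may be passed to group by the columns in self. Notice that a tuple is interpreted as a (single) key.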
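
To see the corrected call in isolation, here is a minimal sketch on a small made-up frame. The six rows below are hypothetical stand-ins for the GEOSTAT grid cells (the real CSV is not loaded), and the snake_case variable names are illustrative only. It uses size() rather than count() to get one number per group, and passes observed=True so that only the country/bin combinations actually present are listed; both are choices made for this sketch, not something the answer requires.

```python
import pandas as pd

# Hypothetical sample data standing in for the GEOSTAT grid: one row per 1 km cell.
population_density = pd.DataFrame({
    "CNTR_CODE": ["UK", "UK", "UK", "ES", "ES", "FR"],
    "TOT_P": [500, 12000, 13000, 42000, 800, 26000],
})

# Same bin edges as in the question.
ranges = [0, 10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000, 1000000]

# pd.cut returns a categorical Series that shares the frame's index.
bins = pd.cut(population_density["TOT_P"], ranges)

# Both keys go inside ONE list: a column label and an index-aligned Series.
counts = (
    population_density
    .groupby(["CNTR_CODE", bins], observed=True)
    .size()
)

print(counts)
```

The result is a Series with a two-level index (country, then population bracket). Calling counts.unstack() on it would pivot the brackets into columns, which is closer to the shape a grouped bar chart needs.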

    【Comments】:
