【Question title】: Count number of counties per state using python {census}
【Posted】: 2017-03-24 05:17:31
【Question】:

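I am having a hard time counting the number of counties with the well-known census.csv data.

Task: count the number of counties in each state.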

I think the problem is in how I do the comparison; please read below.

I tried:

import pandas as pd

df = pd.read_csv('census.csv')
dfd = df['STNAME'].unique()  # array of state names

serr = pd.Series(dfd)  # convert the array to a Series

After this, I tried two approaches:

1:

    df[df['STNAME'] == serr]  # ERROR: series length must match

2:

i = 0
for name in serr:                        # this raises an error: 'Alabama'
    df['STNAME'] == name
    for i in serr:
        serr[i] == serr[name]
        print(serr[name].count)
        i+=1

Please guide me; I have been stuck on this for three days.

【Comments】:

    Tags: python pandas dataset data-science


    【Solution 1】:

    Use groupby and aggregate COUNTY with nunique:

    In [1]: import pandas as pd
    
    In [2]: df = pd.read_csv('census.csv')
    
    In [3]: unique_counties = df.groupby('STNAME')['COUNTY'].nunique()
    

    Now the result:

    In [4]: unique_counties
    Out[4]: 
    STNAME
    Alabama                  68
    Alaska                   30
    Arizona                  16
    Arkansas                 76
    California               59
    Colorado                 65
    Connecticut               9
    Delaware                  4
    District of Columbia      2
    Florida                  68
    Georgia                 160
    Hawaii                    6
    Idaho                    45
    Illinois                103
    Indiana                  93
    Iowa                    100
    Kansas                  106
    Kentucky                121
    Louisiana                65
    Maine                    17
    Maryland                 25
    Massachusetts            15
    Michigan                 84
    Minnesota                88
    Mississippi              83
    Missouri                116
    Montana                  57
    Nebraska                 94
    Nevada                   18
    New Hampshire            11
    New Jersey               22
    New Mexico               34
    New York                 63
    North Carolina          101
    North Dakota             54
    Ohio                     89
    Oklahoma                 78
    Oregon                   37
    Pennsylvania             68
    Rhode Island              6
    South Carolina           47
    South Dakota             67
    Tennessee                96
    Texas                   255
    Utah                     30
    Vermont                  15
    Virginia                134
    Washington               40
    West Virginia            56
    Wisconsin                73
    Wyoming                  24
    Name: COUNTY, dtype: int64
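
To see what the groupby/nunique combination does on its own, here is a minimal sketch on a small in-memory frame. The rows are invented for illustration; only the column names STNAME and COUNTY come from the question, and `toy` and `per_state` are hypothetical names.

```python
import pandas as pd

# Invented stand-in for census.csv; these rows are NOT real census data.
toy = pd.DataFrame({
    "STNAME": ["Alabama", "Alabama", "Alabama", "Alaska", "Alaska"],
    "COUNTY": [1, 3, 3, 13, 16],
})

# groupby builds one group per state; nunique() counts the distinct
# COUNTY codes inside each group, so repeated codes are counted once.
per_state = toy.groupby("STNAME")["COUNTY"].nunique()

# Alabama has three rows but only two distinct codes (1 and 3).
print(per_state)
```

The result is a Series indexed by state name, so a single state can be looked up with `per_state["Alabama"]`.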
    

    【Comments】:

      【Solution 2】:

      juanpa.arrivillaga has a nice solution. However, the code needs a small modification.

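      The "county" rows with 'SUMLEV' == 40 or 'COUNTY' == 0 should be filtered out. Otherwise, the count for every state is too large by one.

      So the correct answer should be: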

      unique_counties = census_df[census_df['SUMLEV'] == 50].groupby('STNAME')['COUNTY'].nunique()
      

      The result is as follows:

      STNAME
      Alabama                  67
      Alaska                   29
      Arizona                  15
      Arkansas                 75
      California               58
      Colorado                 64
      Connecticut               8
      Delaware                  3
      District of Columbia      1
      Florida                  67
      Georgia                 159
      Hawaii                    5
      Idaho                    44
      Illinois                102
      Indiana                  92
      Iowa                     99
      Kansas                  105
      Kentucky                120
      Louisiana                64
      Maine                    16
      Maryland                 24
      Massachusetts            14
      Michigan                 83
      Minnesota                87
      Mississippi              82
      Missouri                115
      Montana                  56
      Nebraska                 93
      Nevada                   17
      New Hampshire            10
      New Jersey               21
      New Mexico               33
      New York                 62
      North Carolina          100
      North Dakota             53
      Ohio                     88
      Oklahoma                 77
      Oregon                   36
      Pennsylvania             67
      Rhode Island              5
      South Carolina           46
      South Dakota             66
      Tennessee                95
      Texas                   254
      Utah                     29
      Vermont                  14
      Virginia                133
      Washington               39
      West Virginia            55
      Wisconsin                72
      Wyoming                  23
      Name: COUNTY, dtype: int64
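
The effect of the filter can be checked on a small example. This is only a sketch: the rows below are invented, and it assumes the layout this answer describes, where each state has one summary row (SUMLEV 40, COUNTY 0) alongside its county rows (SUMLEV 50). The names `toy`, `unfiltered` and `filtered` are hypothetical.

```python
import pandas as pd

# Invented rows, NOT real census data: one state-level summary row
# (SUMLEV 40, COUNTY 0) per state plus county-level rows (SUMLEV 50).
toy = pd.DataFrame({
    "SUMLEV": [40, 50, 50, 40, 50],
    "STNAME": ["Alabama", "Alabama", "Alabama", "Alaska", "Alaska"],
    "COUNTY": [0, 1, 3, 0, 13],
})

# Without the filter, COUNTY code 0 is counted as if it were a county.
unfiltered = toy.groupby("STNAME")["COUNTY"].nunique()

# With the filter, only county-level rows reach the groupby.
filtered = toy[toy["SUMLEV"] == 50].groupby("STNAME")["COUNTY"].nunique()

# The difference per state is the summary row that was miscounted.
print((unfiltered - filtered).to_dict())  # {'Alabama': 1, 'Alaska': 1}
```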
      

      【Comments】:

        【Solution 3】:

        @Bakhtawar - here is a very simple approach:

        df.groupby(df['STNAME']).count().COUNTY
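
One thing to keep in mind: count() tallies the non-null COUNTY entries in each group, so it counts rows rather than distinct counties. The sketch below uses invented rows and assumes, as the previous answer describes, that the data contains state-level summary rows marked SUMLEV 40; `toy`, `row_counts` and `county_counts` are hypothetical names.

```python
import pandas as pd

# Invented rows, NOT real census data.
toy = pd.DataFrame({
    "SUMLEV": [40, 50, 50, 40, 50],
    "STNAME": ["Alabama", "Alabama", "Alabama", "Alaska", "Alaska"],
    "COUNTY": [0, 1, 3, 0, 13],
})

# count() on the raw frame includes the state-level summary row.
row_counts = toy.groupby("STNAME")["COUNTY"].count()

# Keeping only county-level rows first gives the county tally.
counties_only = toy[toy["SUMLEV"] == 50]
county_counts = counties_only.groupby("STNAME")["COUNTY"].count()

print(row_counts.to_dict())     # {'Alabama': 3, 'Alaska': 2}
print(county_counts.to_dict())  # {'Alabama': 2, 'Alaska': 1}
```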
        

        【Comments】:
