【Question title】: Jupyter Notebook IPython: Groupby based on the values alphabetically
【Posted】: 2018-04-22 09:51:06
【Question】:

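I am using Jupyter Notebook for the first time. I am trying to group a CSV file by one column and get the count of each value. With the following code I got this result: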

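import pandas
# Read the CSV into a DataFrame named df
df = pandas.read_csv('a.csv', sep=',')
# Count how many rows share each value of the 'name' column
df.groupby('name').name.count()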
name
>Aa</TOPONYM>                 4
>Aachen</TOPONYM>             5
>Aartselaar</TOPONYM>         1
>Abadan</TOPONYM>             1
>Abaya</TOPONYM>              1
>Abba</TOPONYM>              12
>Abbey                        2
>Abbeydale</TOPONYM>          1
>Abbot</TOPONYM>              2
>Abbots                       3
>Abbotsford</TOPONYM>        22
>Abbotsinch</TOPONYM>         5
>Abbotts                      1
>Abel</TOPONYM>               1
>Aberchirder</TOPONYM>        2
>Aberdare</TOPONYM>           3
>Aberdeen                     1
>Aberdeen</TOPONYM>         163
>Aberdeenshire</TOPONYM>    286
>Aberdour</TOPONYM>           9
>Aberfan</TOPONYM>            1
>Aberfeldy</TOPONYM>         16
>Abergavenny</TOPONYM>        4
>Aberlady                     1
>Aberlady</TOPONYM>           3
>Abernethy</TOPONYM>          1
>Abertay                      1
>Abertillery</TOPONYM>        6
>Abha</TOPONYM>               2
>Abidjan</TOPONYM>           10
                           ... 
>Zakho</TOPONYM>             20
>Zakopane</TOPONYM>           1
>Zambezi                      2
>Zambezi</TOPONYM>            8
>Zambia</TOPONYM>            19
>Zamboanga</TOPONYM>          4
>Zandak</TOPONYM>             3
>Zanzibar</TOPONYM>          11
>Zaragosa</TOPONYM>           1
>Zaragoza</TOPONYM>           4
>Zeebrugge</TOPONYM>         28
>Zeeland</TOPONYM>            2
>Zemun</TOPONYM>              1
>Zenica</TOPONYM>            12
>Zermatt</TOPONYM>            5
>Zetland</TOPONYM>            1
>Zhizhong</TOPONYM>           1
>Zhongshan</TOPONYM>          2
>Zhuhai</TOPONYM>             1
>Zimbabwe</TOPONYM>         377
>Znamenskoye</TOPONYM>        1
>Zoetermeer</TOPONYM>         1
>Zola</TOPONYM>               1
>Zomba</TOPONYM>              3
>Zulu</TOPONYM>               1
>Zululand</TOPONYM>           2
>Zuni</TOPONYM>               2
>Zurich</TOPONYM>            86
>Zvornik</TOPONYM>            3
>Zwolle</TOPONYM>             1
Name: name, Length: 8585, dtype: int64

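Is it possible to get the counts alphabetically? That is, I would first run the command for the letter a and it should give all the values starting with a, then the next letter b, and so on. Or is it possible to get the values while skipping the first 100?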

My real data looks like this:

<TOPONYM    geonameid="2657540" lat="51.24827"  lon="-0.76389"  >Aldershot</TOPONYM>    
<TOPONYM    geonameid="3037854" lat="49.9"  lon="2.3"   >Amiens</TOPONYM>   
<TOPONYM    geonameid="6216857" lat="-43.59832" lon="171.55011" >Alaska</TOPONYM>   
<TOPONYM    geonameid="3037854" lat="49.9"  lon="2.3"   >Amiens</TOPONYM>   
<TOPONYM    geonameid="2759794" lat="52.37403"  lon="4.88969"   >Amsterdam</TOPONYM>    
<TOPONYM    geonameid="7216668" lat="28.0106"   lon="-82.1184"  >Alabama</TOPONYM>  
<TOPONYM    geonameid="5884078" lat="48.98339"  lon="-73.34907" >Ally</TOPONYM> 
<TOPONYM    geonameid="2507480" lat="36.7525"   lon="3.04197"   >Algiers</TOPONYM>  
<TOPONYM    geonameid="2759794" lat="52.37403"  lon="4.88969"   >Amsterdam</TOPONYM>    
<TOPONYM    geonameid="2759794" lat="52.37403"  lon="4.88969"   >Amsterdam</TOPONYM>    
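Because each line is a <TOPONYM> element rather than plain comma-separated fields, the name column ends up holding fragments such as >Aa</TOPONYM>. One option is to parse the lines with a regular expression into separate columns. This is a minimal sketch, assuming every line follows exactly the attribute order shown above (geonameid, lat, lon, then the element text); the sample lines are copied from the data above.

```python
import re
import pandas

# Sample lines copied from the data shown above
lines = [
    '<TOPONYM    geonameid="2657540" lat="51.24827"  lon="-0.76389"  >Aldershot</TOPONYM>',
    '<TOPONYM    geonameid="3037854" lat="49.9"  lon="2.3"   >Amiens</TOPONYM>',
    '<TOPONYM    geonameid="2759794" lat="52.37403"  lon="4.88969"   >Amsterdam</TOPONYM>',
    '<TOPONYM    geonameid="2759794" lat="52.37403"  lon="4.88969"   >Amsterdam</TOPONYM>',
]

# Named groups for each attribute plus the text between '>' and '</TOPONYM>'
# (assumes the attribute order geonameid, lat, lon seen in the sample)
pattern = re.compile(
    r'geonameid="(?P<geonameid>\d+)"\s+lat="(?P<lat>[-\d.]+)"\s+'
    r'lon="(?P<lon>[-\d.]+)"\s*>(?P<name>[^<]*)</TOPONYM>'
)

# Keep only lines that match, one dict per line
rows = [m.groupdict() for m in map(pattern.search, lines) if m]
df = pandas.DataFrame(rows)

# Coordinates come out of the regex as strings; convert them to floats
df[['lat', 'lon']] = df[['lat', 'lon']].astype(float)

# Counts per clean place name
print(df['name'].value_counts())
```

With the real file, lines would come from something like open('a.csv', encoding='utf-8').readlines() instead, assuming the file holds one element per line and is UTF-8 encoded.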

【Comments】:

    Tags: python pandas jupyter-notebook pandas-groupby


    【Solution 1】:

    You can select the first letter with str[0] and then use value_counts:

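    import pandas
    df = pandas.read_csv('a.csv')
    
    # First character of each name, counted: 'alph' holds the letter, 'count' the number of rows
    a = df['name'].str[0].value_counts().rename_axis('alph').reset_index(name='count')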
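In the question's output every name still carries markup (for example >Aa</TOPONYM>), so str[0] would return '>' for those values rather than a letter. A minimal sketch that strips that markup before taking the first letter and then sorts by letter instead of by count; it assumes the stray text is always a leading '>' and an optional trailing '</TOPONYM>', and the sample below is a small hypothetical one built from names in the output above.

```python
import pandas

# Hypothetical sample mimicking the 'name' column in the question's output
df = pandas.DataFrame({'name': ['>Aa</TOPONYM>', '>Aachen</TOPONYM>', '>Abbey',
                                '>Zurich</TOPONYM>', '>Zurich</TOPONYM>']})

# Trim whitespace, then drop a leading '>' and a trailing '</TOPONYM>'
clean = df['name'].str.strip().str.replace(r'^>|</TOPONYM>$', '', regex=True)

# Count rows per first letter; sort_index orders the letters alphabetically
a = (clean.str[0]
          .value_counts()
          .sort_index()
          .rename_axis('alph')
          .reset_index(name='count'))
print(a)
```

In a notebook cell, an assignment such as a = ... displays nothing on its own; end the cell with a (or call print(a)) to see the table.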
    

    Another solution, using groupby on the first letter:

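    # Group the names by their first character and count the rows in each group
    a = df['name'].groupby(df['name'].str[0]).count().reset_index(name='count')
    
    # Equivalent here, using size() (number of rows per group)
    a = df['name'].groupby(df['name'].str[0]).size().reset_index(name='count')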
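The question also asks to go through the names one letter at a time, or to skip the first 100 values. A sketch on a hypothetical sample of already-cleaned names: the per-name counts are filtered with index.str.startswith, and rows are skipped by position with iloc. The variable skip_first is illustrative only.

```python
import pandas

# Hypothetical sample of cleaned place names
names = pandas.Series(['Aberdeen', 'Aberdeen', 'Abidjan', 'Bath', 'Zurich'], name='name')

# Per-name counts; groupby sorts the keys, so the index is alphabetical
counts = names.groupby(names).size()

# One letter at a time: only names starting with 'A' (repeat with 'B', 'C', ...)
a_counts = counts[counts.index.str.startswith('A')]
print(a_counts)

# Skip the first N names by position and keep the rest
skip_first = 100
rest = counts.iloc[skip_first:]
print(rest)
```

With this five-name sample, rest is empty; on the real data with 8585 names it would hold everything after the first 100.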
    

    【Comments】:

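    • 'SeriesGroupBy' object has no attribute 'counts'. This is the error I get.
    • I added my solution in place of df.groupby('name').name.count(), so try removing that line.
    • I ran it, and it gives neither an error nor any result.
    • The first means there is no data; for the second you need df = df[df['name'].str.contains('Amsterdam', na=False)]
    • Typo, it needs to be pandas.set_option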
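The comments mention filtering with str.contains(..., na=False) and pandas.set_option. A small sketch of both, assuming the purpose of set_option here is to show every row instead of the truncated "..." view; 'display.max_rows' is a standard pandas display option, and the sample DataFrame is hypothetical.

```python
import pandas

# Show every row instead of the truncated "..." view (None means no limit)
pandas.set_option('display.max_rows', None)

# Hypothetical sample, including a missing name
df = pandas.DataFrame({'name': ['Amsterdam', 'Amiens', 'Amsterdam', None]})

# Keep rows whose name contains 'Amsterdam'; na=False treats missing names as non-matches
amsterdam = df[df['name'].str.contains('Amsterdam', na=False)]
print(amsterdam)
```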