Using Python to count the exact distribution of values in a DataFrame column (answers)

【Question title】: Python to calculate the exact distribution counts of a column in a DataFrame
【Posted】: 2017-12-21 00:41:32
【Question description】:

I am writing a Python program that loads a pandas DataFrame, "pre_data_matrix". It has a column named "PostTextPolarity" whose values lie between -1 and 1. I want to count the "PostTextPolarity" values by range: for example, how many are > 0 (say, 10000) and how many are = 0.

    import pandas as pd

    # `cur` is a database cursor opened earlier (connection code not shown)
    select_sql = "select userID,userName,userURL,postTime,postText,postTextLength,likesCount,sharesCount,commentsCount,postTextPolarity,postTextSubjectivity from fb_pre_davi_group_members_posts"
    cur.execute(select_sql)

    pre_data = cur.fetchall()  # all result rows
    pre_data_list = list(pre_data)
    ...
    pre_data_matrix = pd.DataFrame(pre_data_list, columns=['userId', 'UserName', 'UserURL', 'PostTime', 'PostText', 'PostTextLength', 'LikesCount', 'SharesCount', 'CommentsCount', 'PostTextPolarity', 'PostTextSubjectivity'])
    print(pre_data_matrix)
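As a possible alternative to fetchall() plus a hand-written column list, pandas can build the DataFrame straight from a query with pd.read_sql_query, taking the column names from the result set. The sketch below assumes an in-memory SQLite database as a stand-in for the asker's database (the real driver and connection are not shown), with a reduced table holding three of the columns and three rows copied from the printed output.

```python
import sqlite3

import pandas as pd

# Stand-in database: an in-memory SQLite table with a subset of the
# question's columns (the real database and driver are unknown).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table fb_pre_davi_group_members_posts "
    "(userID integer, likesCount integer, postTextPolarity real)"
)
conn.executemany(
    "insert into fb_pre_davi_group_members_posts values (?, ?, ?)",
    [(1, 0, 0.0), (2, 0, 0.3571428571428571), (3, 3, 1.0)],
)

select_sql = ("select userID, likesCount, postTextPolarity "
              "from fb_pre_davi_group_members_posts")

# read_sql_query runs the query and names the columns after the
# selected fields, so no separate column list is needed.
pre_data_matrix = pd.read_sql_query(select_sql, conn)
conn.close()

# Rename to the capitalisation used in the question's DataFrame.
pre_data_matrix = pre_data_matrix.rename(columns={
    "userID": "userId",
    "likesCount": "LikesCount",
    "postTextPolarity": "PostTextPolarity",
})
print(pre_data_matrix)
```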

This prints:

        LikesCount  SharesCount  CommentsCount     PostTextPolarity  \
     0           0            0              0                  0.0
     1           0            0              0   0.3571428571428571
     2           3            0              0                  1.0
     3          11            0              0                  0.0
     4          11            0              0  0.46909090909090906
     5           0            0              0                  0.9
     6          11            0              1                0.625
     7          11            0              1                  0.0
     8          11            0              0              0.56875
     9          11            0              0                  0.0
    10           0            0              1  0.08333333333333333
    11          20            0              2                  0.0
    12           4            0              1                  0.0
    13           7            0              1                  0.0
    14          11            0              1                 0.25
   ...

Could you please tell me how to get the number of rows with PostTextPolarity > 0, = 0 and < 0?
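A minimal sketch of the most direct approach, assuming PostTextPolarity can be treated as numeric: comparing the column with 0 gives a boolean Series, and summing it counts the True values. The sample values are the 15 rows printed above; the pd.to_numeric call is only a precaution in case the database driver returned the column as strings or objects.

```python
import pandas as pd

# The 15 PostTextPolarity values shown in the printed DataFrame above.
pre_data_matrix = pd.DataFrame({"PostTextPolarity": [
    0.0, 0.3571428571428571, 1.0, 0.0, 0.46909090909090906,
    0.9, 0.625, 0.0, 0.56875, 0.0,
    0.08333333333333333, 0.0, 0.0, 0.0, 0.25,
]})

# Precaution: make sure the comparisons below are numeric, not string-based.
polarity = pd.to_numeric(pre_data_matrix["PostTextPolarity"])

# Each comparison yields a boolean Series; sum() counts the True values.
positive = int((polarity > 0).sum())
zero = int((polarity == 0).sum())
negative = int((polarity < 0).sum())

print(positive, zero, negative)  # 8 7 0
```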

【Question discussion】:

Take some time to watch this talk and work through the concepts/examples; the solution should then be obvious: pandas.pydata.org/talks.html#pycon-us-2015
Please read How to Ask and minimal reproducible example.

Tags: python pandas dataframe

【Solution 1】:

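Use NumPy's np.where to label each row, then group by the labels and count (older pandas also exposed NumPy as pd.np, but pd.np was deprecated in pandas 1.0 and later removed, so import NumPy directly):

import numpy as np
# df is the DataFrame from the question (pre_data_matrix)
g = np.where(df.PostTextPolarity == 0, 'Equals 0', np.where(df.PostTextPolarity < 0, '< 0', '> 0'))

df.groupby(g)['PostTextPolarity'].count().rename_axis('Category').reset_index()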
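A self-contained run of the same approach on the 15 PostTextPolarity values printed in the question (only that column is rebuilt, so this df is a stand-in for pre_data_matrix). It gives 8 rows for '> 0' and 7 for 'Equals 0', matching the counts in the output below. Because the labels are a plain NumPy array, groupby creates groups only for labels that actually occur, and sorts them.

```python
import numpy as np
import pandas as pd

# The 15 PostTextPolarity values from the question's printed output.
df = pd.DataFrame({"PostTextPolarity": [
    0.0, 0.3571428571428571, 1.0, 0.0, 0.46909090909090906,
    0.9, 0.625, 0.0, 0.56875, 0.0,
    0.08333333333333333, 0.0, 0.0, 0.0, 0.25,
]})

# Label every row as 'Equals 0', '< 0' or '> 0'.
g = np.where(df.PostTextPolarity == 0, 'Equals 0',
             np.where(df.PostTextPolarity < 0, '< 0', '> 0'))

# Group rows by label and count them; group keys come out sorted.
result = (df.groupby(g)['PostTextPolarity'].count()
            .rename_axis('Category')
            .reset_index())
print(result)
```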

Output:

   Category  PostTextPolarity
0       > 0                 8
1  Equals 0                 7
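The output above has no '< 0' row because none of the sample values is negative. If every category should appear, including empty ones, one option (a variation, not part of the original answer) is to build the labels with np.select and reindex value_counts() with fill_value=0. The sample data are again the 15 values from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"PostTextPolarity": [
    0.0, 0.3571428571428571, 1.0, 0.0, 0.46909090909090906,
    0.9, 0.625, 0.0, 0.56875, 0.0,
    0.08333333333333333, 0.0, 0.0, 0.0, 0.25,
]})

s = df["PostTextPolarity"]
categories = ["< 0", "Equals 0", "> 0"]

# np.select gives each row the label of the first condition that is True;
# rows matching none of them (e.g. NaN) get "missing".
labels = np.select([s < 0, s == 0, s > 0], categories, default="missing")

# value_counts() lists only labels that occur; reindex adds absent
# categories with a count of 0, fixes the order and drops "missing".
counts = (pd.Series(labels).value_counts()
            .reindex(categories, fill_value=0)
            .rename_axis("Category")
            .reset_index(name="Count"))
print(counts)
```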

【Discussion】: