【Question title】: How can I get a count of percent negative and order by most negative?
【Posted】: 2020-02-11 04:36:59
【Question description】:

I worked out a way to group by two fields and get the counts:

df.groupby(['brand','result']).size()
df.groupby(['brand','result']).count()

Both give me the same result. My data now looks like this.

Johnson's Baby Powder   negative         21  
                        neutral          5  
                        positive         121

Estee Lauder            negative         7  
                        positive         23

Calvin Klein            negative         10  
                        neutral          3  
                        positive         29
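
A side note on the two calls at the top of the question: they agree only when no column has missing values. `size()` counts rows per group, while `count()` counts non-null values in each remaining column. A minimal sketch with made-up data (the `text` column is hypothetical and not from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data; "text" is an invented extra column with one missing value.
df = pd.DataFrame({
    "brand": ["A", "A", "B"],
    "result": ["negative", "negative", "positive"],
    "text": ["bad", np.nan, "good"],
})

# Series: number of rows in each (brand, result) group.
sizes = df.groupby(["brand", "result"]).size()

# DataFrame: number of non-null values per remaining column in each group.
counts = df.groupby(["brand", "result"]).count()

print(sizes)
print(counts)
```

For plain row counts, `size()` is usually the safer choice of the two.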

I want the percentage of each result within each brand, like this.

Johnson's Baby Powder   negative    21  0.142857143
                        neutral     5   0.034013605
                        positive    121 0.823129252

Estee Lauder            negative    7   0.233333333
                        positive    23  0.766666667

Calvin Klein            negative    10  0.238095238
                        neutral     3   0.071428571
                        positive    29  0.69047619
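
One way the count-plus-percentage layout above might be produced is sketched here. It assumes a long-format frame with one row per review and columns named `brand` and `result`; the frame is rebuilt from the counts shown in the question, and the names `summary` and `pct` are made up for illustration.

```python
import pandas as pd

# Hypothetical long-format data rebuilt from the counts in the question:
# one row per review, with a brand and a result label.
counts = {
    ("Johnson's Baby Powder", "negative"): 21,
    ("Johnson's Baby Powder", "neutral"): 5,
    ("Johnson's Baby Powder", "positive"): 121,
    ("Estee Lauder", "negative"): 7,
    ("Estee Lauder", "positive"): 23,
    ("Calvin Klein", "negative"): 10,
    ("Calvin Klein", "neutral"): 3,
    ("Calvin Klein", "positive"): 29,
}
df = pd.DataFrame(
    [{"brand": b, "result": r} for (b, r), n in counts.items() for _ in range(n)]
)

# Count rows per (brand, result), then divide by each brand's total.
summary = df.groupby(["brand", "result"]).size().to_frame("count")
brand_totals = summary.groupby(level="brand")["count"].transform("sum")
summary["pct"] = summary["count"] / brand_totals

print(summary)
```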

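Ultimately, though, I only want to show the brands whose negative 'result' share is > 20%.

So I'd like to see this (plus any other brands that meet the business rule).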

Estee Lauder            negative    7   0.233333333
                        positive    23  0.766666667

How can I do that?

【Question discussion】:

Tags: python python-3.x pandas pandas-groupby

【Solution 1】:

Try

x = df.groupby(['brand'])['result'].value_counts(normalize=True)

Output on sample data

>>> y = x.loc[(x.index.get_level_values(1) == 'negative')]

>>> y[y>0.2]
airline         airline_sentiment
American        negative             0.710402
Delta           negative             0.429793
Southwest       negative             0.490083
US Airways      negative             0.776862
United          negative             0.688906
Virgin America  negative             0.359127
Name: airline_sentiment, dtype: float64

>>> y[y>0.2].index.get_level_values(0)
Index(['American', 'Delta', 'Southwest', 'US Airways', 'United',
       'Virgin America'],
      dtype='object', name='airline')
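
The sample output above uses `airline` / `airline_sentiment` columns, whereas the question uses `brand` / `result`. A possible way to carry the same idea through to the end (keep only brands above 20% negative, most negative first, then show all of their result rows) is sketched below. The data and the names `neg`, `flagged`, and `report` are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data: one row per review.
df = pd.DataFrame({
    "brand": ["A"] * 10 + ["B"] * 10 + ["C"] * 10,
    "result": (
        ["negative"] * 1 + ["positive"] * 9                       # A: 10% negative
        + ["negative"] * 5 + ["positive"] * 5                     # B: 50% negative
        + ["negative"] * 3 + ["neutral"] * 2 + ["positive"] * 5   # C: 30% negative
    ),
})

# Share of each result within each brand, as in the answer.
x = df.groupby("brand")["result"].value_counts(normalize=True)

# Negative share per brand, indexed by brand only.
neg = x.xs("negative", level=1)

# Brands above the 20% threshold, sorted with the most negative first.
flagged = neg[neg > 0.2].sort_values(ascending=False)

# All result rows (negative, neutral, positive) for the flagged brands.
report = x.loc[flagged.index.tolist()]

print(flagged)
print(report)
```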

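【Discussion】:

Sweet!! That gives me the percentage for each one. How can I show only the brands with negative > 20%? Maybe I have to put it into a dataframe and then apply a filter. Is that the way to go? Ty.
Can you add some sample data, @ASH?
So, using your code, I can see: Polo positive 0.783237 negative 0.210983 neutral 0.00578 Proactiv positive 0.677419 negative 0.322581 Pure positive 0.709524 negative 0.271429 neutral 0.019048 Olay positive 0.803497 negative 0.1… see Olay, because the negative percentage calculation is…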

【Solution 2】:

Adding to @Vishnudev's answer, use:

print(df[df.groupby(['brand'])['result'].value_counts(normalize=True).ge(0.5).tolist()])

Output:

          brand    result  number
3  Estee Lauder  negative       7
4  Estee Lauder  positive      23

【Discussion】:

This looks promising, FowardVickel. But when I run your code, I get this error: IndexError: Boolean index has wrong length: 443 instead of 35907
@ASH Edited, try it now.
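
The length mismatch in that error is consistent with `value_counts` returning one entry per (brand, result) pair rather than one per row of `df`, so its boolean list cannot be used to index `df` directly. A hedged alternative, not taken from either answer, is to broadcast each brand's negative share back onto the rows with `transform`. The data and the names `neg_share` and `filtered` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data: one row per review.
df = pd.DataFrame({
    "brand": ["A"] * 10 + ["B"] * 4,
    "result": (
        ["negative"] * 1 + ["positive"] * 9     # A: 10% negative
        + ["negative"] * 2 + ["positive"] * 2   # B: 50% negative
    ),
})

# Fraction of negative rows per brand, repeated on every row of that brand,
# so the resulting mask has the same length as df.
is_negative = df["result"].eq("negative")
neg_share = is_negative.groupby(df["brand"]).transform("mean")

# Keep every row belonging to a brand with more than 20% negative results.
filtered = df[neg_share > 0.2]

print(filtered)
```

Because the mask is aligned with `df` row by row, this works regardless of how many rows each brand has.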