Error 'AttributeError: 'DataFrameGroupBy' object has no attribute' while groupby functionality on dataframe: answers

【Question title】: Error 'AttributeError: 'DataFrameGroupBy' object has no attribute' while groupby functionality on dataframe
【Posted】: 2018-03-14 01:32:18
【Question】:

I have a dataframe, news_count. Below are its column names, taken from the output of news_count.columns.values:

 [('date', '') ('EBIX UW Equity', 'NEWS_SENTIMENT_DAILY_AVG') ('Date', '')
  ('day', '') ('month', '') ('year', '')]

I need to group by year and month and compute the sum of 'NEWS_SENTIMENT_DAILY_AVG'. Below is the code I have tried; neither attempt works:

Attempt 1

news_count.groupby(['year','month']).NEWS_SENTIMENT_DAILY_AVG.values.sum()

'AttributeError: 'DataFrameGroupBy' object has no attribute'

Attempt 2

news_count.groupby(['year','month']).iloc[:,1].values.sum()

AttributeError: Cannot access callable attribute 'iloc' of 'DataFrameGroupBy' objects, try using the 'apply' method

Input data:

      ticker       date           EBIX UW Equity    month    year
      field             NEWS_SENTIMENT_DAILY_AVG
         0      2007-05-25                   0.3992      5       2007
         1      2007-11-06                   0.3936      11      2007 
         2      2007-11-07                   0.2039      11      2007
         3      2009-01-14                   0.2881       1      2014

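【Comments】:

Have you tried news_count.groupby(['year','month']).NEWS_SENTIMENT_DAILY_AVG.sum()?
The problem is that it does not recognize the NEWS_SENTIMENT_DAILY_AVG column. Error message: AttributeError: 'DataFrameGroupBy' object has no attribute 'NEWS_SENTIMENT_DAILY_AVG'
Are you using multi-level column labels (a MultiIndex on the columns)?
reset_index applies to the index, not to the columns...
I'm not sure I can help, because I'm not 100% sure I understand the structure of your dataframe; those columns look wrong. Try reassigning them explicitly: df.columns = ['date', 'avg', 'day', 'month', 'year', ...] and so on. If that works, update your dataframe and try the suggestion from my first comment again.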
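The column-reassignment suggestion above can be sketched as follows. This is a minimal illustration, not the asker's actual code: the frame is rebuilt by hand from the sample rows in the question, the 'Date' and 'day' columns are omitted, and the variable names `flat` and `totals` are made up for the example. It assumes the columns are a two-level MultiIndex whose second level is empty for every column except the sentiment one.

```python
import pandas as pd

# Rebuild a small frame with two-level column labels like those in the question.
cols = pd.MultiIndex.from_tuples([
    ("date", ""),
    ("EBIX UW Equity", "NEWS_SENTIMENT_DAILY_AVG"),
    ("month", ""),
    ("year", ""),
])
news_count = pd.DataFrame(
    [
        ["2007-05-25", 0.3992, 5, 2007],
        ["2007-11-06", 0.3936, 11, 2007],
        ["2007-11-07", 0.2039, 11, 2007],
        ["2009-01-14", 0.2881, 1, 2014],
    ],
    columns=cols,
)

# Flatten the labels: keep the second level when it is non-empty, else the first.
flat = news_count.copy()
flat.columns = [lvl1 if lvl1 else lvl0 for lvl0, lvl1 in flat.columns]

# With plain string labels, selecting the column on the groupby object works.
totals = flat.groupby(["year", "month"])["NEWS_SENTIMENT_DAILY_AVG"].sum()
print(totals)
```

Once the labels are flat strings, attribute access such as `.NEWS_SENTIMENT_DAILY_AVG` on the grouped object should resolve as well, which is what the first comment was attempting.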

Tags: python pandas dataframe group-by pandas-groupby

【Solution 1】:

Extract the required columns from the dataframe into the news_count_res variable, then apply the aggregation function

news_count_res = news_count[['year','month','NEWS_SENTIMENT_DAILY_AVG']]
news_count_res.groupby(['year','month']).sum()
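If the columns really are a two-level MultiIndex, as the column listing in the question suggests, selecting 'NEWS_SENTIMENT_DAILY_AVG' by its second-level name alone will probably fail with a KeyError. A hedged variant of the same extract-then-aggregate idea, using full tuple keys, might look like this. The sample frame is reconstructed from the question's input data, and the flat names given to the extracted columns are a choice made for this sketch.

```python
import pandas as pd

# Sample frame with the two-level column labels shown in the question.
cols = pd.MultiIndex.from_tuples([
    ("date", ""),
    ("EBIX UW Equity", "NEWS_SENTIMENT_DAILY_AVG"),
    ("month", ""),
    ("year", ""),
])
news_count = pd.DataFrame(
    [
        ["2007-05-25", 0.3992, 5, 2007],
        ["2007-11-06", 0.3936, 11, 2007],
        ["2007-11-07", 0.2039, 11, 2007],
        ["2009-01-14", 0.2881, 1, 2014],
    ],
    columns=cols,
)

# Pull each needed column out by its full (level 0, level 1) key
# and give it a plain string name in a new, flat frame.
news_count_res = pd.DataFrame({
    "year": news_count[("year", "")],
    "month": news_count[("month", "")],
    "NEWS_SENTIMENT_DAILY_AVG": news_count[("EBIX UW Equity", "NEWS_SENTIMENT_DAILY_AVG")],
})

# Aggregate exactly as in the solution above.
result = news_count_res.groupby(["year", "month"]).sum()
print(result)
```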

【Comments】:

Thank you... but I get this error at "df_sample = df.groupby("persons").sample(frac= percent_to_flag, random_state=random_state)". If I can figure out why, maybe it will work for me...
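One possible cause, offered as an assumption since the comment does not state a pandas version: `DataFrameGroupBy.sample` was added in pandas 1.1, so on an older installation the quoted call would raise this AttributeError. The sketch below checks for the method and otherwise falls back to sampling each group's rows through `apply`. The toy frame and the values of `percent_to_flag` and `random_state` are invented for the example.

```python
import pandas as pd

# Toy data: 20 rows for one person, 50 for another.
df = pd.DataFrame({
    "persons": ["Alex"] * 20 + ["Bob"] * 50,
    "data": ["xyz"] * 70,
})
percent_to_flag = 0.10  # hypothetical fraction to sample per person
random_state = 0        # hypothetical seed

grouped = df.groupby("persons")
if hasattr(grouped, "sample"):
    # pandas >= 1.1: per-group sampling is built in.
    df_sample = grouped.sample(frac=percent_to_flag, random_state=random_state)
else:
    # Older pandas: sample one column per group, then look the rows up by index.
    picked = df.groupby("persons", group_keys=False)["data"].apply(
        lambda s: s.sample(frac=percent_to_flag, random_state=random_state)
    )
    df_sample = df.loc[picked.index]

print(df_sample["persons"].value_counts())
```

Note that `sample(frac=...)` rounds the per-group count to the nearest integer, whereas Solution 2 rounds up with `math.ceil`, so the two approaches can differ by one row per group.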

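【Solution 2】:

Thanks for the answers so far (I have left comments on them, because I could not get those solutions to work; maybe I am misunderstanding something). In the meantime I came up with another approach, although I still suspect it is not Pythonic. It does get the job done and does not take too long for my purposes, but it would be great if I could figure out how to adapt the approaches suggested above so that they work... any ideas are welcome!

Here is what I have: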

    import math

    import numpy as np
    import pandas as pd

    y = ['Alex'] * 2321 + ['Doug'] * 34123 + ['Chuck'] * 2012 + ['Bob'] * 9281
    z = ['xyz'] * len(y)
    df = pd.DataFrame({'persons': y, 'data': z})
    percent = 10  # CHANGE AS NEEDED

    # add a 'helper' column with random numbers
    df['rand'] = np.random.random(df.shape[0])
    df = df.sample(frac=1)  # optional: this shuffles data, just to show order doesn't matter

    # CREATE A HELPER LIST: one [person, row count] entry per person
    helper = pd.DataFrame(df.groupby('persons')['rand'].count()).reset_index().values.tolist()
    for row in helper:
        df_temp = df[df['persons'] == row[0]][['persons', 'rand']]
        lim = math.ceil(len(df_temp) * percent * 0.01)
        # append the cutoff: the smallest 'rand' among this person's top `lim` rows
        row.append(df_temp.nlargest(lim, 'rand').iloc[-1]['rand'])

    def flag(name, num):
        # 'yes' if num reaches the cutoff stored for this person, else 'no'
        for row in helper:
            if row[0] == name:
                if num >= row[2]:
                    return 'yes'
                else:
                    return 'no'

    df['flag'] = df.apply(lambda x: flag(x['persons'], x['rand']), axis=1)

And to check the result:

piv = df.pivot_table(index="persons", columns="flag", values="data", aggfunc='count', fill_value=0)
# DataFrame.append was removed in pandas 2.0, so the totals row is added with .loc
piv.loc['Total'] = piv.sum()
piv = piv.assign(Total=lambda x: x.sum(axis=1))
piv['% selected'] = 100 * piv.yes / piv.Total
print(piv)

OUTPUT:
flag        no   yes  Total  % selected
persons                                
Alex      2088   233   2321   10.038776
Bob       8352   929   9281   10.009697
Chuck     1810   202   2012   10.039761
Doug     30710  3413  34123   10.002051
Total    42960  4777  47737   10.006913

It seems to work with different percentages and different numbers of people... but I think it would be better to make it simpler.
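One way the flagging step might be simplified, offered as a sketch rather than as the author's method: rank the random numbers within each person and compare each rank with that person's quota, which avoids the helper list and the row-wise `apply`. The smaller two-person data set, the seeded generator, and the names `rank` and `quota` are choices made for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
df = pd.DataFrame({"persons": ["Alex"] * 2321 + ["Bob"] * 9281, "data": "xyz"})
df["rand"] = rng.random(len(df))
percent = 10  # same meaning as in the code above

grouped = df.groupby("persons")["rand"]
# Rank within each person: 1 is the largest 'rand'; ties are broken by row order.
rank = grouped.rank(method="first", ascending=False)
# Rows to flag per person, rounded up as in the original loop.
quota = np.ceil(grouped.transform("count") * percent * 0.01)
df["flag"] = np.where(rank <= quota, "yes", "no")

print(df.groupby(["persons", "flag"]).size())
```

Barring exact ties in `rand`, this should flag the same rows as the threshold comparison in the original code, since the top `lim` values per person are exactly those with rank at most `lim`.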

【Comments】: