pivot_table: No numeric types to aggregate

【Question title】: pivot_table: No numeric types to aggregate
【Posted】: 2017-01-06 19:52:13
【Question description】:

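I want to create a pivot table from the following DataFrame that contains the columns sales and rep. The pivot table shows sales but not rep. When I try rep alone, I get the error DataError: No numeric types to aggregate. How can I fix this so that I see both the numeric field sales and the string field rep?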

import pandas as pd

data = {'year': ['2016', '2016', '2015', '2014', '2013'],
        'country': ['uk', 'usa', 'fr', 'fr', 'uk'],
        'sales': [10, 21, 20, 10, 12],
        'rep': ['john', 'john', 'claire', 'kyle', 'kyle']
        }

print(pd.DataFrame(data).pivot_table(index='country', columns='year', values=['rep','sales']))

        sales               
year     2013 2014 2015 2016
country                     
fr        NaN   10   20  NaN
uk         12  NaN  NaN   10
usa       NaN  NaN  NaN   21


print(pd.DataFrame(data).pivot_table(index='country', columns='year', values=['rep']))
DataError: No numeric types to aggregate

【Comments】:

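It depends on what you are trying to do. The default aggfunc is 'mean', and you cannot take the mean of rep. Either change the aggfunc or pass a different column as values. If you only want to reshape the data without aggregating, use pivot instead of pivot_table.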
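A minimal sketch of the pivot route mentioned in the comment above, reusing the question's data. It assumes a pandas version in which pivot accepts a list for values, and it only works because no (country, year) pair is repeated:

```python
import pandas as pd

data = {
    'year': ['2016', '2016', '2015', '2014', '2013'],
    'country': ['uk', 'usa', 'fr', 'fr', 'uk'],
    'sales': [10, 21, 20, 10, 12],
    'rep': ['john', 'john', 'claire', 'kyle', 'kyle'],
}
df = pd.DataFrame(data)

# pivot only reshapes and never aggregates, so string columns are fine.
# It raises ValueError if an (index, columns) pair occurs more than once.
reshaped = df.pivot(index='country', columns='year', values=['rep', 'sales'])
print(reshaped)
```

If the data can contain repeated (country, year) pairs, pivot is not an option and an explicit aggfunc is needed instead.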

Tags: python pandas

【Solution 1】:

You can use set_index and unstack:

df = pd.DataFrame(data)
df.set_index(['year','country']).unstack('year')

yields

          rep                     sales                  
year     2013  2014    2015  2016  2013  2014  2015  2016
country                                                  
fr       None  kyle  claire  None   NaN  10.0  20.0   NaN
uk       kyle  None    None  john  12.0   NaN   NaN  10.0
usa      None  None    None  john   NaN   NaN   NaN  21.0

Alternatively, use pivot_table with aggfunc='first':

df.pivot_table(index='country', columns='year', values=['rep','sales'], aggfunc='first')

yields

          rep                     sales                  
year     2013  2014    2015  2016  2013  2014  2015  2016
country                                                  
fr       None  kyle  claire  None  None    10    20  None
uk       kyle  None    None  john    12  None  None    10
usa      None  None    None  john  None  None  None    21

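With aggfunc='first', each (country, year, rep) or (country, year, sales) group is aggregated by taking the first value found. In your case there appear to be no duplicates, so the first value is the same as the only value.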
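To illustrate what 'first' does when duplicates exist, here is a small sketch. The DataFrame dup is hypothetical and not part of the question; it contains two rows for the pair ('uk', '2016'):

```python
import pandas as pd

# Hypothetical data: two rows share the same (country, year) pair.
dup = pd.DataFrame({
    'year': ['2016', '2016', '2015'],
    'country': ['uk', 'uk', 'fr'],
    'sales': [10, 99, 20],
    'rep': ['john', 'anna', 'claire'],
})

table = dup.pivot_table(index='country', columns='year',
                        values=['rep', 'sales'], aggfunc='first')

# Only the first row of the duplicated group survives:
# rep is 'john' and sales is 10; 'anna' and 99 are silently dropped.
print(table.loc['uk', ('rep', '2016')])
print(table.loc['uk', ('sales', '2016')])
```

Because the later rows are discarded without warning, 'first' is best suited to data where each (index, columns) pair is known to be unique.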

【Discussion】:

aggfunc='first' is great. It is exactly what worked for me.

【Solution 2】:

The problem seems to come from the different types of the rep and sales columns. If you convert sales to str and specify sum as the aggfunc, it works fine:

df.sales = df.sales.astype(str)

pd.pivot_table(df, index=['country'], columns=['year'], values=['rep', 'sales'], aggfunc='sum')

#           rep                     sales
# year     2013  2014    2015  2016  2013  2014  2015  2016
# country
# fr       None  kyle  claire  None  None    10    20  None
# uk       kyle  None    None  john    12  None  None    10
# usa      None  None    None  john  None  None  None    21
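A possible variant that avoids turning sales into strings: pivot_table also accepts a dict for aggfunc that maps each value column to its own function. The sketch below is not from the answer; pairing 'first' with rep and 'sum' with sales is an assumption about what is wanted:

```python
import pandas as pd

data = {
    'year': ['2016', '2016', '2015', '2014', '2013'],
    'country': ['uk', 'usa', 'fr', 'fr', 'uk'],
    'sales': [10, 21, 20, 10, 12],
    'rep': ['john', 'john', 'claire', 'kyle', 'kyle'],
}
df = pd.DataFrame(data)

# Aggregate each value column with a function that suits its type:
# 'first' for the string column, 'sum' for the numeric one.
table = df.pivot_table(index='country', columns='year',
                       values=['rep', 'sales'],
                       aggfunc={'rep': 'first', 'sales': 'sum'})
print(table)
```

Note that summing a str column concatenates the strings when a group has more than one row, so the astype(str) approach only gives sensible results when there are no duplicates.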

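【Discussion】:

The problem also occurs when there are no duplicates in the index column, 'country' in this case. An error is raised if one of the value columns ('rep' and 'sales') has mixed types.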
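One way to check for the mixed-type situation described in this comment is to count the Python types present in each column. The DataFrame mixed below is hypothetical, and pd.to_numeric is just one possible way to normalize the column:

```python
import pandas as pd

# Hypothetical data: one sales value was read in as a string.
mixed = pd.DataFrame({
    'country': ['uk', 'usa', 'fr'],
    'sales': [10, '21', 20],
})

# Count the distinct Python types found in each column.
type_counts = {col: mixed[col].map(type).nunique() for col in mixed.columns}
mixed_columns = [col for col, n in type_counts.items() if n > 1]
print(mixed_columns)  # ['sales']

# Convert the offending column to a single numeric dtype before pivoting.
mixed['sales'] = pd.to_numeric(mixed['sales'])
```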