'Could not interpret input' error with Seaborn when plotting groupbys

【Question title】: 'Could not interpret input' error with Seaborn when plotting groupbys
【Posted】: 2015-12-30 17:33:34
【Question】:

Suppose I have this DataFrame

from pandas import DataFrame

d = {'Path':    ['abc', 'abc', 'ghi', 'ghi', 'jkl', 'jkl'],
     'Detail':  ['foo', 'bar', 'bar', 'foo', 'foo', 'foo'],
     'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
     'Value':   [30, 20, 10, 40, 40, 50],
     'Field':   [50, 70, 10, 20, 30, 30]}

df = DataFrame(d)
df.set_index(['Path', 'Detail'], inplace=True)
df

               Field Program  Value
Path Detail                      
abc  foo        50   prog1     30
     bar        70   prog1     20
ghi  bar        10   prog1     10
     foo        20   prog2     40
jkl  foo        30   prog3     40
     foo        30   prog3     50

I can aggregate it without any problem (if there is a better way to do this, by the way, I would like to know!)

df_count = df.groupby('Program').count().sort_values('Value', ascending=False)[['Value']]
df_count

Program   Value
prog1    3
prog3    2
prog2    1

df_mean = df.groupby('Program').mean().sort_values('Value', ascending=False)[['Value']]
df_mean

Program  Value
prog3    45
prog2    40
prog1    20

I can plot it from Pandas without any problem...

df_mean.plot(kind='bar')

But why do I get this error when I try the same thing in seaborn?

sns.factorplot('Program',data=df_mean)
    ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-23c2921627ec> in <module>()
----> 1 sns.factorplot('Program',data=df_mean)

C:\Anaconda3\lib\site-packages\seaborn\categorical.py in factorplot(x, y, hue, data, row, col, col_wrap, estimator, ci, n_boot, units, order, hue_order, row_order, col_order, kind, size, aspect, orient, color, palette, legend, legend_out, sharex, sharey, margin_titles, facet_kws, **kwargs)
   2673     # facets to ensure representation of all data in the final plot
   2674     p = _CategoricalPlotter()
-> 2675     p.establish_variables(x_, y_, hue, data, orient, order, hue_order)
   2676     order = p.group_names
   2677     hue_order = p.hue_names

C:\Anaconda3\lib\site-packages\seaborn\categorical.py in establish_variables(self, x, y, hue, data, orient, order, hue_order, units)
    143                 if isinstance(input, string_types):
    144                     err = "Could not interperet input '{}'".format(input)
--> 145                     raise ValueError(err)
    146 
    147             # Figure out the plotting orientation

ValueError: Could not interperet input 'Program'

【Comments on the question】:

Tags: python pandas grouping aggregate seaborn

【Solution 1】:

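The reason you get the exception is that Program becomes the index of the DataFrames df_mean and df_count after your groupby operation. It is no longer a column, so seaborn cannot find it by name in data.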
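A quick way to confirm this is to check where the name Program lives after the aggregation. The following is a minimal sketch using the data from the question; the variable names `program_is_column` and `index_name` are only illustrative.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'Path':    ['abc', 'abc', 'ghi', 'ghi', 'jkl', 'jkl'],
    'Detail':  ['foo', 'bar', 'bar', 'foo', 'foo', 'foo'],
    'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
    'Value':   [30, 20, 10, 40, 40, 50],
    'Field':   [50, 70, 10, 20, 30, 30],
}).set_index(['Path', 'Detail'])

df_mean = df.groupby('Program').mean()

# After groupby, the group key is the name of the index, not a column.
program_is_column = 'Program' in df_mean.columns
index_name = df_mean.index.name

print(program_is_column)      # False
print(index_name)             # Program
print(list(df_mean.columns))  # only the aggregated columns remain
```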

If you want to get a factorplot from df_mean, a simple solution is to add the index as a column,

In [7]:

df_mean['Program'] = df_mean.index

In [8]:

%matplotlib inline
import seaborn as sns
sns.factorplot(x='Program', y='Value', data=df_mean)

But you can, even more simply, let factorplot do the computation for you,

sns.factorplot(x='Program', y='Value', data=df)

You will get the same result, because factorplot computes the mean of Value for each Program by default.
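For readers on a newer seaborn: factorplot was renamed to catplot in seaborn 0.9, and the default kind changed from "point" to "strip". A rough equivalent of the call above, assuming seaborn 0.9 or later, might look like the sketch below. The frame `df_flat` is a hypothetical name for a copy of the question's data restricted to the two columns being plotted, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs

import pandas as pd
import seaborn as sns

# Program and Value are ordinary columns here, so seaborn can find them by name.
df_flat = pd.DataFrame({
    'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
    'Value':   [30, 20, 10, 40, 40, 50],
})

# kind='point' reproduces the old factorplot default; the per-Program mean
# of Value is computed by seaborn itself, so no groupby is needed first.
g = sns.catplot(data=df_flat, x='Program', y='Value', kind='point')

ax = g.axes.flat[0]
x_label = ax.get_xlabel()
y_label = ax.get_ylabel()
```

In a notebook you would drop the `matplotlib.use("Agg")` line and let the figure render inline.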

Edit after the comments

Indeed, you make a very good point about the as_index parameter; by default it is set to True, in which case Program becomes part of the index, as in your question.

In [14]:

df_mean = df.groupby('Program', as_index=True).mean().sort_values('Value', ascending=False)[['Value']]
df_mean

Out[14]:
        Value
Program 
prog3   45
prog2   40
prog1   20

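To be clear, this way Program is no longer a column; it becomes the index. The trick df_mean['Program'] = df_mean.index actually leaves the index unchanged and adds a new column holding the index values, so Program is now duplicated.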

In [15]:

df_mean['Program'] = df_mean.index
df_mean

Out[15]:
        Value   Program
Program     
prog3   45  prog3
prog2   40  prog2
prog1   20  prog1
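If the duplication is unwanted, one alternative is `reset_index()`, which moves the index into a regular column and replaces it with a default integer index. This is a minimal sketch on a reduced copy of the question's data, not part of the original answer; `df_flat` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
    'Value':   [30, 20, 10, 40, 40, 50],
})

# Program is the index of the aggregated frame.
df_mean = df.groupby('Program')[['Value']].mean()

# reset_index() returns a new frame: Program is now a column (once),
# and the index is a fresh 0..n-1 range.
df_flat = df_mean.reset_index()

print(df_flat.columns.tolist())  # ['Program', 'Value']
```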

However, if you set as_index to False, you get Program as a column, plus a new auto-incrementing integer index,

In [16]:

df_mean = df.groupby('Program', as_index=False).mean().sort_values('Value', ascending=False)[['Program', 'Value']]
df_mean

Out[16]:
    Program Value
2   prog3   45
1   prog2   40
0   prog1   20

This way you can feed it directly to seaborn. Still, you could just use df and get the same result.
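On the question's aside about a better way to aggregate: one option is to compute the count and the mean in a single groupby pass and flatten the result afterwards. The sketch below assumes only Value needs summarizing; the name `summary` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Program': ['prog1', 'prog1', 'prog1', 'prog2', 'prog3', 'prog3'],
    'Value':   [30, 20, 10, 40, 40, 50],
})

summary = (
    df.groupby('Program')['Value']
      .agg(['count', 'mean'])               # both statistics in one pass
      .sort_values('mean', ascending=False)
      .reset_index()                        # Program becomes a plain column
)

print(summary)
```

Because Program ends up as a column, `summary` can be passed as `data` to a plotting call with `x='Program'`.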

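【Discussion】:

Thank you very much for your reply. At first I thought it was an index problem. But according to the documentation, the as_index parameter defaults to True, so the group labels (i.e. Program) are already the index. df_mean.index returns Index(['prog3', 'prog2', 'prog1'], dtype='object', name='Program'). I tried the second approach and got the same error.
I'm not sure we understand each other. In any case, you make a good point about the as_index parameter, and I am updating the answer. I hope it is clearer now.
Sorry, I just realized we were saying the same thing about the index. I had assumed factorplot could use the index for the x axis by default, so I was confused that your second solution returned the same error.
Sorry, I made a typo. The second solution is sns.factorplot(x='Program', y='Value', data=df), which means you can use df directly. I hope it makes more sense now.
Thanks a lot. I see that my mistake was that the x values need to be a column, not the index.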