groupby pandas: incompatible index of inserted column with frame index (answers)

【Question title】: groupby pandas: incompatible index of inserted column with frame index
【Posted】: 2017-01-16 00:30:28
【Question】:

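I ran a groupby in pandas and want to apply a complex function that takes several inputs and returns a pandas Series, which I then want to write back into the original DataFrame. I have done this many times and it normally works well, except in this last case (apologies for not being able to post the full code). Basically I get TypeError: incompatible index of inserted column with frame index. However, as shown below, I shouldn't be getting it.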

The groupby part:

all_in_data_risk['weights_of_the_sac'] = all_in_data_risk.groupby(['ptf', 'ac'])[['sac', 'unweighted_weights_by_sac', 'instrument_id', 'risk_budgets_sac']].apply(lambda x: wrapper_new_risk_budget(x, temp_fund_all_ret, method_compute_cov))

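where the function is:

def wrapper_new_risk_budget(x, temp_fund_all_ret, method_compute_cov):
    # x is the sub-DataFrame for one ('ptf', 'ac') group
    print(x.index)
    ...
    print(result.index)
    return result.loc[:, 'res']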

This raises the error:

    raise TypeError('incompatible index of inserted column '
TypeError: incompatible index of inserted column with frame index

The puzzling part is that this:

print(np.array_equal(result.index, x.index))

prints True for every group. That should guarantee that the indexes match, so the problem shouldn't exist at all.

I know it's hard to tell from the information I've given, but do you have any insight into what the problem might be?

PS: I have already tried converting the result to a DataFrame, and recasting the output as pd.Series(result.loc[:, 'res'].values, index=result.index).

【Discussion】:

Tags: python pandas indexing group-by

【Answer 1】:

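I ran into this problem and found a workaround. In my case I needed to do df.groupby('id').apply(func), which returns an n×1 DataFrame whose number of rows is exactly df.shape[0], yet the same error occurs.

This is because the groupby result comes back with a MultiIndex (the group key is prepended to the original row labels), which differs from df's index.

You can solve the problem by resetting the index and restoring the original one, for example: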
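The mismatch can be reproduced on toy data. The sketch below assumes pandas 2.x (where apply prepends group keys by default, even when the function returns a result indexed like its input); the frame and the demean function are hypothetical. It shows why the per-group check in the question passes while the combined result still carries a different index.

```python
import pandas as pd

# Hypothetical toy frame: 'id' is the grouping key
df = pd.DataFrame({"id": ["a", "b", "a", "b"], "val": [1.0, 2.0, 3.0, 4.0]})

def demean(s: pd.Series) -> pd.Series:
    # Hypothetical function: returns a Series indexed exactly like the group it receives
    return s - s.mean()

# Per group, output and input indexes match (like the question's np.array_equal check)...
for _, grp in df.groupby("id")["val"]:
    assert demean(grp).index.equals(grp.index)

# ...but apply() concatenates the pieces under the group key, giving a MultiIndex
out = df.groupby("id", group_keys=True)["val"].apply(demean)
print(out.index.nlevels)           # 2: the 'id' key plus the original row label
print(out.index.equals(df.index))  # False: this is the index mismatch behind the error
```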

df['a']=df.groupby('id').apply(lambda x:func(x)).reset_index().set_index('level_1').drop('id',axis=1)

By the way, be very careful with this approach: the returned DataFrame must have the same index as df.

【Comments】:

Solved stackoverflow.com/questions/68555285/… with this answer!

【Answer 2】:

The problem, simplified:

In the original question, the goal is:

df['new_column'] = df.groupby(...).aggregationfunction()

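This usually works without raising an error if at least one of the following conditions holds:

The groupby is over only one column (note that the aggregated values are then aligned by group label against the row labels, so they may land on the wrong rows or become NaN).
The groupby aggregation function does not reduce the number of rows (e.g. cumcount()).

If neither condition holds, you may get the error "TypeError: incompatible index of inserted column with frame index".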
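A small sketch of both conditions on hypothetical data. It also illustrates the caveat on the first condition: a single-key aggregation does not raise, but its index holds the group labels, which pandas aligns against the row labels when assigning.

```python
import pandas as pd

# Hypothetical data with a single grouping column 'foo'
df = pd.DataFrame({"foo": [0, 1, 0, 1], "bar": [0, 1, 2, 3]})

# Condition 2: cumcount() returns one value per original row, indexed like df
df["n_in_group"] = df.groupby("foo").cumcount()

# Condition 1: a single-key aggregation is indexed by the group labels (0 and 1);
# assignment aligns those labels against df's row labels 0..3, so rows 2 and 3 get NaN
df["bar_max_single"] = df.groupby("foo")["bar"].max()
print(df)
```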

Example that raises the error

See the following example:

import numpy as np
import pandas as pd
df = pd.DataFrame({'foo': [0, 1] * 2, 'foo2': np.zeros(4).astype(int), 'bar': np.arange(4)})
df

>     foo    foo2     bar
> 0     0       0       0
> 1     1       0       1
> 2     0       0       2
> 3     1       0       3

df['bar_max'] = df.groupby(['foo','foo2'])['bar'].max()
> TypeError: incompatible index of inserted column with frame index

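Solution

By passing as_index=False to groupby you get a regular DataFrame (the group keys become columns), which you can then merge back into the original DataFrame: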

df_grouped = df.groupby(['foo', 'foo2'], as_index=False)['bar'].max().rename(columns={'bar': 'bar_max'})
df = df.merge(df_grouped, on=['foo', 'foo2'])
df

>     foo    foo2     bar     bar_max
> 0     0       0       0       2
> 1     0       0       2       2
> 2     1       0       1       3
> 3     1       0       3       3
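A commonly used alternative that avoids the merge is GroupBy.transform, which returns a result aligned to the input rows. This is a swapped-in technique rather than part of the answer above; a minimal sketch on the same toy data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"foo": [0, 1] * 2, "foo2": np.zeros(4).astype(int), "bar": np.arange(4)})

# transform("max") broadcasts each group's max back onto every row of that group,
# so the result is indexed exactly like df and can be assigned directly
df["bar_max"] = df.groupby(["foo", "foo2"])["bar"].transform("max")
print(df)
```

Unlike the merge output above, the rows keep their original order and index labels.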

【Comments】:

【Answer 3】:

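OK, for reasons I don't understand, when I performed a merge in my code, the indexes differed in some other respect in pandas' eyes, even though their numpy representations were equivalent. I worked around the merge (a longer and less efficient approach), and now, with a more conventional method, it works.

I won't be able to post a full example today because I'm short on time and my deadline is looming, but I'll do it as soon as I can, out of respect for those who answered or tried to, and for any other users who may find it helpful in solving this problem.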

【Comments】:

So where is the solution?
I have this problem too... hmm