Using polyfit on a pandas dataframe and then adding the results to new columns (answers)

【Question Title】: Using polyfit on pandas dataframe and then adding the results to new columns
【Posted】: 2018-12-10 22:12:28
【Question】:

I have a dataframe like this. For each Id, I have (x1, x2) and (y1, y2). I want to feed these to polyfit(), get the slope and the x-intercept, and add them as new columns.

    Id        x         y
    1     0.79978   0.018255
    1     1.19983   0.020963
    2     2.39998   0.029006
    2     2.79995   0.033004
    3     1.79965   0.021489
    3     2.19969   0.024194
    4     1.19981   0.019338
    4     1.59981   0.022200
    5     1.79971   0.025629
    5     2.19974   0.028681

I really need help grouping the right rows and feeding them to polyfit. I have been struggling with this for a while. Any help would be welcome.
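For anyone who wants to try the answers, the sample table above can be typed in as a DataFrame like this. It is a plain transcription of the posted data; the only assumption is that `Id` is an ordinary column (not the index) at the start, which is what the answer's `set_index('Id')` step expects.

```python
import pandas as pd

# Sample data from the question, one row per (x, y) point.
# Each Id contributes exactly two points.
df = pd.DataFrame({
    'Id': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    'x': [0.79978, 1.19983, 2.39998, 2.79995, 1.79965,
          2.19969, 1.19981, 1.59981, 1.79971, 2.19974],
    'y': [0.018255, 0.020963, 0.029006, 0.033004, 0.021489,
          0.024194, 0.019338, 0.022200, 0.025629, 0.028681],
})

# Sanity check: every Id should have exactly two rows.
print(df.groupby('Id').size())
```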

【Question Discussion】:

Tags: python pandas numpy dataframe linear-regression

【Solution 1】:

You can use groupby and apply the fit within each group. First, set the index, so that you can avoid a merge later.

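import pandas as pd
import numpy as np

# Make Id the index so the per-group results align with the rows by label
df = df.set_index('Id')
# np.polyfit(x, y, 1) returns [slope, intercept] for each Id; assigning the
# per-Id Series broadcasts it onto every row that has that Id
df['fit'] = df.groupby('Id').apply(lambda g: np.polyfit(g['x'], g['y'], 1))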

df is now:

          x         y                                           fit
Id                                                                 
1   0.79978  0.018255  [0.0067691538557680215, 0.01284116612923385]
1   1.19983  0.020963  [0.0067691538557680215, 0.01284116612923385]
2   2.39998  0.029006   [0.00999574968122608, 0.005016400680051043]
2   2.79995  0.033004   [0.00999574968122608, 0.005016400680051043]
3   1.79965  0.021489  [0.006761823817618233, 0.009320083766623343]
3   2.19969  0.024194  [0.006761823817618233, 0.009320083766623343]
...
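A side note, not part of the original answer: because every Id has exactly two points, the degree-1 least-squares fit is simply the straight line through them, so the same numbers can be obtained without polyfit: slope = (y2 - y1) / (x2 - x1) and intercept = y1 - slope * x1. The sketch below assumes exactly two rows per Id with distinct x values, computes this closed form with groupby `first`/`last`, and cross-checks it against `np.polyfit`.

```python
import numpy as np
import pandas as pd

# Sample data from the question (two points per Id).
df = pd.DataFrame({
    'Id': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    'x': [0.79978, 1.19983, 2.39998, 2.79995, 1.79965,
          2.19969, 1.19981, 1.59981, 1.79971, 2.19974],
    'y': [0.018255, 0.020963, 0.029006, 0.033004, 0.021489,
          0.024194, 0.019338, 0.022200, 0.025629, 0.028681],
})

g = df.groupby('Id')

# Closed form for the line through two points (assumes x1 != x2 in every group).
x1, x2 = g['x'].first(), g['x'].last()
y1, y2 = g['y'].first(), g['y'].last()
slope = (y2 - y1) / (x2 - x1)
intercept = y1 - slope * x1
closed = pd.DataFrame({'slope': slope, 'intercept': intercept})

# The same fit via np.polyfit, which returns the highest-degree coefficient
# first, i.e. [slope, intercept] for deg=1.
fitted = g[['x', 'y']].apply(
    lambda grp: pd.Series(np.polyfit(grp['x'], grp['y'], 1),
                          index=['slope', 'intercept'])
)

print(np.allclose(closed.values, fitted.values))  # True
```

The row order within a group does not matter here: swapping the two points leaves both the slope and the intercept unchanged.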

If you want the slope and the intercept in separate columns, you can apply pd.Series to the fit column.

# Expand each [slope, intercept] array into two columns, then drop the array column
df[['slope', 'intercept']] = df['fit'].apply(pd.Series)
df = df.drop(columns='fit')

Or, starting from the initial DataFrame, return a pd.Series from inside the apply and concatenate the result.

# From initial DataFrame
df = df.set_index('Id')
res = df.groupby('Id').apply(lambda x: pd.Series(np.polyfit(x.x, x.y, 1), 
                                                 index=['slope', 'intercept']))
df = pd.concat([df, res], axis=1)

df is now:

          x         y     slope  intercept
Id                                        
1   0.79978  0.018255  0.006769   0.012841
1   1.19983  0.020963  0.006769   0.012841
2   2.39998  0.029006  0.009996   0.005016
2   2.79995  0.033004  0.009996   0.005016
3   1.79965  0.021489  0.006762   0.009320
3   2.19969  0.024194  0.006762   0.009320
4   1.19981  0.019338  0.007155   0.010753
4   1.59981  0.022200  0.007155   0.010753
5   1.79971  0.025629  0.007629   0.011898
5   2.19974  0.028681  0.007629   0.011898

【Discussion】: