Wrong output in DataFrame: copying a calculation for every row separately

【Question title】: Wrong output in dataframe: Copying a calculation for every row separately
【Posted】: 2018-12-08 13:49:30
【Question description】:

I'm looking for a change to the following function so that the autocorrelation is computed for every row, not just the first one.

Here is the function I use:

import pandas as pd
import numpy as np
df = pd.read_excel("directory\\file.xlsx")

def autocorr(x, t):
    y = np.corrcoef(np.array([x[0:len(x)-t], x[t:len(x)]]))
    return y

df1 = df.copy(deep=True)

for index, row in df1.iterrows():
    df1["output1"] = autocorr(df.T[0], 1)[0, 1]
    df1["output2"] = autocorr(df.T[0], 2)[0, 1]
    df1["output3"] = autocorr(df.T[0], 3)[0, 1]
    df1["output4"] = autocorr(df.T[0], 4)[0, 1]
    df1["output5"] = autocorr(df.T[0], 5)[0, 1]
    df1["output6"] = autocorr(df.T[0], 6)[0, 1]
    df1["output7"] = autocorr(df.T[0], 7)[0, 1]
    df1["output8"] = autocorr(df.T[0], 8)[0, 1]
    df1["output9"] = autocorr(df.T[0], 9)[0, 1]
    df1["output10"] = autocorr(df.T[0], 10)[0, 1]
    df1["output11"] = autocorr(df.T[0], 11)[0, 1]
    df1["output12"] = autocorr(df.T[0], 12)[0, 1]

df1

But it keeps giving the same result for every row: the values computed for the first row are copied into the second, third, ... rows.

I've tried everything, but I can't get it to compute each row separately.
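For reference, autocorr(x, t)[0, 1] in the function above is the Pearson correlation between the first n - t values of a series x = (x_1, ..., x_n) and its last n - t values, i.e. x paired with a copy of itself shifted by t positions. Written out, with \bar{a} the mean of x_1, ..., x_{n-t} and \bar{b} the mean of x_{1+t}, ..., x_n:

```latex
r_t = \frac{\sum_{i=1}^{n-t} (x_i - \bar{a})(x_{i+t} - \bar{b})}
           {\sqrt{\sum_{i=1}^{n-t} (x_i - \bar{a})^2}\,\sqrt{\sum_{i=1}^{n-t} (x_{i+t} - \bar{b})^2}}
```

Nothing in this formula involves other rows, so computing it per row only requires passing each row in as x.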

【Question discussion】:

Tags: python python-3.x pandas math correlation

【Solution 1】:

Have you tried pandas' built-in autocorr method (Series.autocorr)?

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.autocorr.html

import pandas as pd
import numpy as np

df = pd.DataFrame(np.array([np.arange(1, 10), np.arange(10, 1, -1)]), index=['a', 'b'])
print(df)
#     0  1  2  3  4  5  6  7  8
# a   1  2  3  4  5  6  7  8  9
# b  10  9  8  7  6  5  4  3  2
df.loc['a'].autocorr(lag=1)

or df.T['a'].autocorr(lag=1)
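To get the built-in autocorrelation for every row at once, one option (a sketch, not the only way) is DataFrame.apply with axis=1, which passes each row to a function as a Series. The frame below is made-up data, chosen so the two rows give different results, and row_autocorrs is a hypothetical helper name:

```python
import pandas as pd

# Made-up data: row 'a' rises steadily, row 'b' alternates between 1 and -1
df = pd.DataFrame([[1, 2, 3, 4, 5, 6, 7, 8, 9],
                   [1, -1, 1, -1, 1, -1, 1, -1, 1]], index=['a', 'b'])

def row_autocorrs(row, max_lag):
    # One lag-t autocorrelation per output column, named output1, output2, ... as in the question
    return pd.Series({f"output{t}": row.autocorr(lag=t) for t in range(1, max_lag + 1)})

# axis=1 hands each row (as a Series) to row_autocorrs separately;
# the extra keyword argument max_lag is forwarded to the function
result = df.apply(row_autocorrs, axis=1, max_lag=3)
print(result)
```

With the question's data you would use max_lag=12 and could attach the result with df1 = df.join(result). Keep in mind that a lag-t correlation needs at least t + 2 values per row to be defined at all.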

In your code, it looks like you pass the same row every time:

df.loc[0] == df.T[0]  # The first row of the DataFrame

You are iterating over the rows of the DataFrame, but you never use the loop variables inside the loop:

autocorr(df.T[0], 1)[0, 1]
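A small check with made-up data shows why: df.T[0] is simply row 0 of df, and the expression uses neither index nor row, so every pass through the loop evaluates exactly the same thing.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])  # made-up data

# Column 0 of the transpose is row 0 of the original frame
assert df.T[0].equals(df.loc[0])

# The loop variables are never used, so each iteration sees row 0 again
seen = [df.T[0].tolist() for index, row in df.iterrows()]
print(seen)  # [[1, 2, 3], [1, 2, 3]]
```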

If you prefer to keep your own function, try changing that call to

autocorr(row, 1)[0, 1]
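Combining that change with writing the result through .loc (see the note on column assignment further down), the question's loop could be sketched as below. Assumptions: a small made-up frame stands in for the Excel file; rows are read from the untouched df instead of df1, so the output columns added to df1 never become part of the series being correlated; each row is converted to a float array so the slicing inside autocorr is purely positional; and only lags 1 to 3 are used because the toy rows have just 9 values.

```python
import numpy as np
import pandas as pd

def autocorr(x, t):
    # Same function as in the question: 2x2 correlation matrix of x[0:n-t] vs x[t:n]
    y = np.corrcoef(np.array([x[0:len(x) - t], x[t:len(x)]]))
    return y

# Hypothetical stand-in for the Excel file: each row is one series of 9 values
df = pd.DataFrame([[1, 2, 3, 4, 5, 6, 7, 8, 9],
                   [1, -1, 1, -1, 1, -1, 1, -1, 1]])
df1 = df.copy(deep=True)

max_lag = 3  # the question uses 12; these toy rows are too short for that
for index, row in df.iterrows():        # read from the untouched df, so the new
    values = row.to_numpy(dtype=float)  # output columns never end up in the series
    for t in range(1, max_lag + 1):
        # .loc[index, column] writes one cell for this row only
        df1.loc[index, f"output{t}"] = autocorr(values, t)[0, 1]

print(df1)
```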

Alternatively, you can use:

row.autocorr(lag=t)
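row.autocorr(lag=t) should give the same number as autocorr(row, t)[0, 1]: pandas documents Series.autocorr as the Pearson correlation between the Series and its shifted self, which pairs exactly the values x[0:len(x)-t] and x[t:len(x)]. A quick check on a made-up row:

```python
import numpy as np
import pandas as pd

def autocorr(x, t):
    # The question's function: correlation matrix of x[0:n-t] vs x[t:n]
    return np.corrcoef(np.array([x[0:len(x) - t], x[t:len(x)]]))

row = pd.Series([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0])  # made-up row

pairs = []
for t in range(1, 4):
    mine = autocorr(row.to_numpy(), t)[0, 1]
    builtin = row.autocorr(lag=t)
    pairs.append((t, mine, builtin))
    print(t, mine, builtin)  # the two numbers should agree up to rounding
```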

Since you are iterating over the rows of the DataFrame, the index variable holds the label of the current row, and the row variable holds that entire row as a Series.
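To see what iterrows hands you, here is a made-up frame with string row labels and column names:

```python
import pandas as pd

# Made-up frame with string row labels and column names
df = pd.DataFrame({"jan": [1, 4], "feb": [2, 5], "mar": [3, 6]}, index=["x", "y"])

seen = []
for index, row in df.iterrows():
    # index is the row label; row is a Series whose index is the column names
    seen.append((index, list(row.index), row.tolist()))
print(seen)
```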

Another problem is this line:

df1['output1'] = value

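This assigns the value to the entire column, i.e. to every row at once.
If the column already exists, you can use loc to write a single cell instead: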

df.loc[row_index, col_index] = value
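A small made-up example contrasting the two forms inside a loop: plain column assignment overwrites the whole column on every iteration, so only the last value survives, while .loc with the row label fills one cell per iteration. (In the question every iteration computes the same value from df.T[0], so all rows simply show the first row's result.)

```python
import numpy as np
import pandas as pd

# Made-up data: one value per row
df = pd.DataFrame({"value": [10, 20, 30]})

# 1) Column assignment inside the loop: each iteration overwrites the whole column
df1 = df.copy()
for index, row in df.iterrows():
    df1["doubled"] = row["value"] * 2
print(df1["doubled"].tolist())  # [60, 60, 60]

# 2) .loc with the row label writes one cell per iteration
df2 = df.copy()
df2["doubled"] = np.nan  # create the column before the loop
for index, row in df.iterrows():
    df2.loc[index, "doubled"] = row["value"] * 2
print(df2["doubled"].tolist())  # [20.0, 40.0, 60.0]
```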

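If the column does not exist yet, you can either compute the whole column first, store it in a Series and assign it to the column in one go, or add the column before running the loop: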

df.insert(loc=0, column='output1', value=np.nan)
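The first option (build the whole column as a Series, then assign it once) could be sketched like this, using made-up data and Series.autocorr. The Series is built from a dict keyed by row label, so pandas aligns it to df1's index on assignment. The sketch also reads rows from df and writes into df1, so the new columns never become part of the series being correlated.

```python
import pandas as pd

# Made-up data: one series per row
df = pd.DataFrame([[1, 2, 3, 4, 5, 6, 7, 8, 9],
                   [1, -1, 1, -1, 1, -1, 1, -1, 1]])
df1 = df.copy(deep=True)

for t in range(1, 4):  # lags 1..3 for these short toy rows
    # Collect one value per row, keyed by the row label ...
    col = pd.Series({index: row.autocorr(lag=t) for index, row in df.iterrows()})
    # ... then assign the whole column at once; pandas aligns col on df1's index
    df1[f"output{t}"] = col

print(df1)
```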

【Discussion】:

I tried both autocorr(row, 1)[0, 1] and row.autocorr(lag=t) in my code, but it still gives the same value for every row.
I've edited my answer; I forgot to mention that df1['output1'] = value assigns the value to the entire column.
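I just don't get it. I changed the call to autocorr(row, 1)[0, 1], and before the line for index, ... I now put df.insert(loc=0, column='output1', value=np.nan), because it's a new column I want to add. It still gives me the same value for every row.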
Did you also change the line df1['output1'] = autocorr(row, 1)[0, 1] to df1.loc[index, 'output1'] = autocorr(row, 1)[0, 1]?