【Posted】: 2021-06-23 21:24:35
【Problem description】:
X = df.drop(columns="Math")
y = df.iloc[:, 4]
theta = np.array([0]*len(X.columns))

def hypothesis(theta, X):
    return theta*X

def computeCost(X, y, theta):
    y1 = hypothesis(theta, X)
    y1 = np.sum(y1, axis=1)
    return sum(np.sqrt((y1-y)**2))/(2*47)

def gradientDescent(X, y, theta, alpha, i):
    J = []  # cost at each iteration
    k = 0
    while k < i:
        y1 = hypothesis(theta, X)
        y1 = np.sum(y1, axis=1)
        for c in range(0, len(X.columns)):
            theta[c] = theta[c] - alpha*(sum((y1-y)*X.iloc[:,c])/len(X))
        j = computeCost(X, y, theta)
        J.append(j)
        k += 1
    return J, j, theta

J, j, theta = gradientDescent(X, y, theta, 0.05, 10000)
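The dataset consists of five columns. The first is the bias-term column. The second through the last columns are int64 and hold values from 1 to 100. The second column is the physics score, the third is the science score, the fourth is the statistics score, and the last is the math score. I am trying to use columns 1 through 4 to predict column 5 (Math).
This produces the following error: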
OverflowError                             Traceback (most recent call last)
<ipython-input-26-d17a8fb83984> in <module>()
----> 1 J, j, theta = gradientDescent(X, y, theta, 0.05, 10000)

<ipython-input-25-bfec0d0edcfa> in gradientDescent(X, y, theta, alpha, i)
      6         y1 = np.sum(y1, axis=1)
      7         for c in range(0, len(X.columns)):
----> 8             theta[c] = theta[c] - alpha*(sum((y1-y)*X.iloc[:,c])/len(X))
      9         j = computeCost(X, y, theta)
     10         J.append(j)

OverflowError: Python int too large to convert to C
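
For reference, two things in the code above stand out. First, `np.array([0]*len(X.columns))` creates an integer array, so every update written into `theta[c]` has to fit an integer type. Second, the features are unscaled values in the 1-100 range, and with `alpha = 0.05` the updates very likely grow without bound instead of converging. The cost function also takes `np.sqrt((y1-y)**2)`, which is the absolute error rather than the squared error, and hard-codes 47 as the sample count. Below is a minimal sketch of one way to set this up, assuming a float `theta` and standardized features. The synthetic data, the column names other than `Math`, and the helper names `compute_cost` and `gradient_descent` are illustrative assumptions, not part of the original post.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the dataset described in the question (assumption:
# the real data is not shown). Column names other than "Math" are hypothetical.
rng = np.random.default_rng(0)
n = 47
df = pd.DataFrame({
    "Bias": np.ones(n, dtype=np.int64),
    "Physics": rng.integers(1, 101, n),
    "Science": rng.integers(1, 101, n),
    "Statistics": rng.integers(1, 101, n),
})
df["Math"] = rng.integers(1, 101, n)

# Work with float NumPy arrays so nothing is forced into an integer dtype.
X = df.drop(columns="Math").to_numpy(dtype=float)
y = df["Math"].to_numpy(dtype=float)

# Standardize every feature except the bias column (column 0).
mu = X[:, 1:].mean(axis=0)
sigma = X[:, 1:].std(axis=0)
X[:, 1:] = (X[:, 1:] - mu) / sigma


def compute_cost(X, y, theta):
    """Half the mean squared error of the linear prediction X @ theta."""
    residual = X @ theta - y
    return (residual @ residual) / (2 * len(y))


def gradient_descent(X, y, theta, alpha, iterations):
    """Batch gradient descent; returns the cost history and the final theta."""
    costs = []
    for _ in range(iterations):
        gradient = X.T @ (X @ theta - y) / len(y)
        theta = theta - alpha * gradient  # all coefficients updated together
        costs.append(compute_cost(X, y, theta))
    return costs, theta


theta0 = np.zeros(X.shape[1])  # float64, unlike np.array([0] * k)
costs, theta = gradient_descent(X, y, theta0, alpha=0.05, iterations=1000)
print(costs[0], costs[-1], theta)
```

Because the features are standardized, the fitted coefficients apply to the scaled inputs; new rows would need the same `mu` and `sigma` applied before prediction.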
【Discussion】:
Tags: python machine-learning integer regression gradient-descent