Simple Linear Regression in plain Python: answers

【Question title】: Simple Linear Regression in plain python
【Posted】: 2021-08-02 12:52:06
【Question description】:

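I recently started learning machine learning through Andrew Ng's Machine Learning course on Coursera. I tried to implement simple linear regression in plain Python, without using any ML libraries. The code does not work: the cost function increases as the loop iterates and reaches very high values. What am I doing wrong here?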

def cost_function(train_set, theta0, theta1):
  total_error = 0
  for i in range(len(train_set)):
    x = train_set[i][0]
    y = train_set[i][1]
    total_error += ((theta0 + theta1 * x) - y) ** 2
  return total_error / 2 * len(train_set)

def gradient_descent(train_set, learning_rate, theta0, theta1):
  theta0_der, theta1_der = 0, 0
  for i in range(len(train_set)):
    x = train_set[i][0]
    y = train_set[i][1]
    theta0_der += ((theta0 + theta1 * x) - y)
    theta1_der += ((theta0 + theta1 * x)- y) * x
  new_theta0 = theta0 - (1/len(train_set) * learning_rate * theta0_der)
  new_theta1 = theta1 - (1/len(train_set) * learning_rate * theta1_der)
  return new_theta0, new_theta1

def main():
  theta0, theta1 = 0, 0
  learning_rate = 0.001
  iterations = 100
  x_train = data_frame.iloc[:,0]
  y_train = data_frame.iloc[:,1]
  train_set = list(zip(x_train, y_train))[:280] # [(1, 2.444), (2, 3.555), (3, 6.444) ..... ]
  print('Initial cost: ' + str(cost_function(train_set, theta0, theta1)))
  for i in range(iterations):
    x = train_set[i][0]
    y = train_set[i][1]
    new_theta0, new_theta1 = gradient_descent(train_set, learning_rate, theta0, theta1)
    theta0 = new_theta0
    theta1 = new_theta1
    print([theta0, theta1])
  print('Final cost: ' + str(cost_function(train_set, theta0, theta1)))

main()

【Question discussion】:

Tags: machine-learning linear-regression gradient-descent

【Solution 1】:

Your learning rate is set too high; try changing it to 0.0001.
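
To see how much the learning rate matters, here is a minimal sketch in plain Python that runs the same batch gradient descent at three rates. The data (`y = 2x + 1` for `x = 1..100`) and the helper names `cost`, `gradient_step` and `run` are made up for illustration; the rate at which your own data starts to diverge depends on the scale of your `x` values. The sketch also writes the cost as `total / (2 * m)` with explicit parentheses, because in Python `total_error / 2 * len(train_set)` evaluates as `(total_error / 2) * len(train_set)`.

```python
def cost(train_set, theta0, theta1):
    """Squared-error cost J = 1/(2m) * sum((theta0 + theta1*x - y)^2)."""
    m = len(train_set)
    total = sum(((theta0 + theta1 * x) - y) ** 2 for x, y in train_set)
    return total / (2 * m)  # note: total / 2 * m would multiply by m instead


def gradient_step(train_set, learning_rate, theta0, theta1):
    """One batch gradient descent update of both parameters."""
    m = len(train_set)
    d0 = sum((theta0 + theta1 * x) - y for x, y in train_set) / m
    d1 = sum(((theta0 + theta1 * x) - y) * x for x, y in train_set) / m
    return theta0 - learning_rate * d0, theta1 - learning_rate * d1


def run(train_set, learning_rate, iterations=100):
    """Start from (0, 0), take `iterations` steps, return thetas and final cost."""
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        theta0, theta1 = gradient_step(train_set, learning_rate, theta0, theta1)
    return theta0, theta1, cost(train_set, theta0, theta1)


# Hypothetical noise-free data, only for illustration: y = 2x + 1
train_set = [(float(x), 2.0 * x + 1.0) for x in range(1, 101)]

print('Initial cost:', cost(train_set, 0.0, 0.0))
for lr in (0.001, 0.0001, 0.00001):
    theta0, theta1, final_cost = run(train_set, lr)
    print(lr, theta0, theta1, final_cost)
```

On this synthetic data the largest rate makes the cost grow on every step, while the two smaller rates bring it down. With unscaled inputs the usable step size shrinks as the `x` values get larger.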

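However, you can also implement simple linear regression directly using its closed-form equation:

$$\hat{\beta} = (X^{T} X)^{-1} X^{T} y$$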

Implementing this in Python is easy; you can do it like this:

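import numpy as np

class LinearRegression:
    def fit(self, X, y):
        # Prepend a column of ones so the first coefficient is the intercept.
        ones = np.ones(len(X)).reshape(-1, 1)
        X = np.concatenate((ones, X), axis=1)

        # Normal equation; pinv (pseudo-inverse) also copes with a singular X.T @ X.
        B = np.matmul(np.linalg.pinv(np.matmul(X.T, X)), np.matmul(X.T, y))

        self.slope = B[1:]      # one coefficient per feature column
        self.intercept = B[0]   # coefficient of the ones column

    def predict(self, X):
        # X is a 2-D array of shape (n_samples, n_features).
        self.predicted = np.dot(X, self.slope) + self.intercept
        return self.predicted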

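The fit function takes the X and y values and computes Beta (using NumPy, via the formula above). Beta is an array whose first entry is the intercept; the remaining entries are the slopes.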
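
Since the question asks for plain Python, it may help to note that with a single feature the least-squares solution reduces to slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The following is only a sketch of that special case; the function name `fit_simple` and the sample data are assumptions made for illustration.

```python
def fit_simple(xs, ys):
    """Closed-form least squares for one feature; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy: sum of co-deviations of x and y; Sxx: sum of squared deviations of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx  # undefined (division by zero) if all x values are equal
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical noise-free data on the line y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
intercept, slope = fit_simple(xs, ys)
print(intercept, slope)
```

No learning rate or iteration count is involved here, so there is nothing to tune.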

The predict function takes a 2-D array and computes the predictions.
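
As a usage sketch, the class can be exercised like this. The class body is repeated so the snippet runs on its own; the sample arrays are hypothetical and lie exactly on `y = 2x + 1`. Note that `X` has to be 2-D (one row per sample, one column per feature), both for `fit` and for `predict`.

```python
import numpy as np


class LinearRegression:
    def fit(self, X, y):
        # Prepend a column of ones so the first coefficient is the intercept.
        ones = np.ones(len(X)).reshape(-1, 1)
        X = np.concatenate((ones, X), axis=1)
        B = np.matmul(np.linalg.pinv(np.matmul(X.T, X)), np.matmul(X.T, y))
        self.slope = B[1:]
        self.intercept = B[0]

    def predict(self, X):
        self.predicted = np.dot(X, self.slope) + self.intercept
        return self.predicted


# Hypothetical data: a single feature, stored as a column (shape (5, 1))
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

model = LinearRegression()
model.fit(X, y)
print(model.intercept, model.slope)

# Predict for two new samples, again as a 2-D array
pred = model.predict(np.array([[6.0], [7.0]]))
print(pred)
```

With a pandas column like the one in the question, something along the lines of `data_frame.iloc[:, [0]].to_numpy()` should give the 2-D shape that `fit` expects.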

【Discussion】:

Yes... you are right. The learning rate value has to be very small. I had to set it to 0.00001 to get accurate results.