Putting a Linear regression solution together

【Question title】: Putting a Linear regression solution together
【Posted】: 2015-11-18 03:14:47
【Question】:

https://gist.github.com/marcelcaraciolo/1321585

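Starting from this code, I'm trying to find the theta coefficients for a dataset I currently hold in a numpy array. I've saved the training array to a csv named "foo.csv". I adapted my code from a different csv file using the pandas library, and my training set is currently 10886 rows x 12 columns. The first column is Y, the value I want to predict, and all the other columns are the variables I want theta values for.

This should mean that I end up with a 12 x 1 matrix of theta values, as there are 12 dependent variables.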
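
For reference, the shapes described above can be checked on a small synthetic array. This is only a sketch: the random data is a stand-in for foo.csv, and it assumes the 12 theta values come from the 11 feature columns plus one intercept column, which is how the gist builds its design matrix.

```python
import numpy as np

# Synthetic stand-in for foo.csv: 5 rows x 12 columns (the real file has 10886 rows).
rng = np.random.default_rng(0)
data = rng.normal(size=(5, 12))

y = data[:, 0].reshape(-1, 1)   # first column is the target
X = data[:, 1:]                 # remaining 11 columns are the features

# Prepend a column of ones for the intercept term.
X_design = np.hstack([np.ones((X.shape[0], 1)), X])
theta = np.zeros((X_design.shape[1], 1))

print(y.shape, X.shape, X_design.shape, theta.shape)
```

By contrast, a slice such as data[:, 1:11] keeps only 10 columns, so it would not fill the 11 feature slots of a 12-column design matrix.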

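I'm relatively new to Python. I'm currently running IPython and want to feed in my test array, which I've saved as a csv file named "foo.csv". I'd like to be able to write [1] MVLR.calctheta(foo.csv) and get a 12 x 1 matrix as output, but I can't work out how. I keep getting: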

AttributeError: 'module' object has no attribute 'calctheta'

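But I have clearly saved calctheta as a function, and I don't understand why I can't call it. Am I declaring this method incorrectly? I assume that I can evaluate the theta values and then run a for loop that evaluates each test row using those theta values and the dependent variables.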
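
The prediction step mentioned above does not strictly need a for loop; a single matrix product covers every test row. The sketch below assumes the test rows are scaled with the mean and standard deviation returned for the training set (mean_r and std_r in the gist). The function name predict and all the numbers are made up for illustration.

```python
import numpy as np

def predict(X_test, theta, mean_r, std_r):
    """Predict targets for raw test rows using training-set mean/std."""
    # Scale the test features exactly as the training features were scaled.
    X_norm = (X_test - np.asarray(mean_r)) / np.asarray(std_r)
    # Add the intercept column so the shapes line up with theta.
    X_design = np.hstack([np.ones((X_norm.shape[0], 1)), X_norm])
    return X_design.dot(theta)

# Hypothetical values: 2 features, so theta holds an intercept plus 2 weights.
theta = np.array([[10.0], [2.0], [-1.0]])
mean_r = [1.0, 5.0]
std_r = [2.0, 1.0]
X_test = np.array([[3.0, 5.0], [1.0, 7.0]])

predictions = predict(X_test, theta, mean_r, std_r)
print(predictions)  # one prediction per test row: 12.0 and 8.0
```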

The part I'm having trouble with is this calctheta function, which I adapted from the gist above. I want to be able to call calctheta with the name of a csv file:

def calctheta(name):
    data = genfromtxt (name, delimiter=",")
    y = data[:,0]
    X = data[:,1:11]


    #number of training samples
    m = y.size

    y.shape = (m, 1)

    #Scale features and set them to zero mean
    x, mean_r, std_r = feature_normalize(X)

    #Add a column of ones to X (interception data)
    it = ones(shape=(m, 12))
    it[:, 1:12] = x

    #Some gradient descent settings
    iterations = 100
    alpha = 0.01

    #Init Theta and Run Gradient Descent
    theta = zeros(shape=(11, 1))

    theta, J_history = gradient_descent(it, y, theta, alpha, iterations)
    print theta
    plot(arange(iterations), J_history)
    xlabel('Iterations')
    ylabel('Cost Function')
    show()

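On a separate note, this multivariate linear regression problem has many dependent variables. Some of my variables are defined on a ranking scale that runs from 0 up to however many options there are.

For example, if a column has 3 options, the distribution is determined by the training set, whereas other columns hold raw values, so the mean is just the mean of those values (e.g. a temperature column).

My question is whether the fact that the variables rank their options on different scales prevents the use of multivariate linear regression when computing the theta values. I don't think it does, if we assume that the final thing you are trying to measure is normally distributed with respect to its inputs.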
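
One common way to handle a column whose values are option codes rather than magnitudes is one-hot encoding. Whether it is needed here depends on whether the codes have a meaningful order, which the question does not say. Least-squares fitting itself places no requirement on how the inputs are distributed. A minimal sketch, using a made-up column with 3 options:

```python
import numpy as np

# Hypothetical categorical column with 3 options coded 0, 1, 2.
codes = np.array([0, 2, 1, 2])
n_options = 3

# Each row becomes an indicator vector with a 1 in the column of its code.
one_hot = np.eye(n_options)[codes]

# Drop the first indicator column so the remaining ones are not
# redundant with the intercept column of ones.
one_hot_reduced = one_hot[:, 1:]
print(one_hot_reduced)
```

Indicator columns like these are usually left unscaled, while raw-valued columns such as a temperature are normalized as in feature_normalize.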

Edit:

I added this to the top of my code and indented the rest of my code:

class MVLR:

I now get

NameError: name 'calctheta' is not defined
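
A likely reason for this NameError, assuming the function is still called by its bare name somewhere: once a function is indented under a class, it is no longer a module-level name. It has to be reached through an instance, and sibling methods have to be reached through self. The method bodies below are placeholders for illustration only.

```python
class MVLR(object):
    def feature_normalize(self, X):
        return [x * 2 for x in X]   # placeholder body for illustration

    def calctheta(self, X):
        # Sibling methods are reached through self, not as bare names.
        return self.feature_normalize(X)

model = MVLR()
result = model.calctheta([1, 2, 3])
print(result)

# A bare call fails because calctheta now lives inside the class namespace.
try:
    calctheta([1, 2, 3])
except NameError as err:
    message = str(err)
    print(message)
```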

Edit 2:

My code

class MVLR:

from numpy import loadtxt, zeros, ones, array, genfromtxt, linspace, logspace, mean, std, arange
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
from pylab import plot, show, xlabel, ylabel

#Evaluate the linear regression

def __init__(self, name):
    self.name = name

def feature_normalize(self.X):
    mean_r = []
    std_r = []
    X_norm = X
    n_c = X.shape[1]
    for i in range(n_c):
        m = mean(X[:, i])
        s = std(X[:, i])
        mean_r.append(m)
        std_r.append(s)
        X_norm[:, i] = (X_norm[:, i] - m) / s
    return X_norm, mean_r, std_r


def compute_cost(self, X, y, theta):
    '''
    Comput cost for linear regression
    '''
    #Number of training samples
    m = y.size

    predictions = X.dot(theta)

    sqErrors = (predictions - y)

    J = (1.0 / (2 * m)) * sqErrors.T.dot(sqErrors)

    return J


def gradient_descent(self, X, y, theta, alpha, num_iters):
    '''
    Performs gradient descent to learn theta
    by taking num_items gradient steps with learning
    rate alpha
    '''
    m = y.size
    J_history = zeros(shape=(num_iters, 1))

    for i in range(num_iters):

        predictions = X.dot(theta)

        theta_size = theta.size

        for it in range(theta_size):

            temp = X[:, it]
            temp.shape = (m, 1)

            errors_x1 = (predictions - y) * temp

            theta[it][0] = theta[it][0] - alpha * (1.0 / m) * errors_x1.sum()

        J_history[i, 0] = compute_cost(X, y, theta)

    return theta, J_history

#Load the dataset



#Plot the data
'''
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
n = 100
for c, m, zl, zh in [('r', 'o', -50, -25)]:
    xs = data[:, 0]
    ys = data[:, 1]
    zs = data[:, 2]
    ax.scatter(xs, ys, zs, c=c, marker=m)
ax.set_xlabel('Size of the House')
ax.set_ylabel('Number of Bedrooms')
ax.set_zlabel('Price of the House')
plt.show()
'''

def calctheta(self, name):
    data = genfromtxt (name, delimiter=",")
    y = data[:,0]
    X = data[:,1:11]


    #number of training samples
    m = y.size

    y.shape = (m, 1)

    #Scale features and set them to zero mean
    x, mean_r, std_r = feature_normalize(X)

    #Add a column of ones to X (interception data)
    it = ones(shape=(m, 12))
    it[:, 1:12] = x

    #Some gradient descent settings
    iterations = 100
    alpha = 0.01

    #Init Theta and Run Gradient Descent
    theta = zeros(shape=(11, 1))

    theta, J_history = gradient_descent(it, y, theta, alpha, iterations)
    print theta
    plot(arange(iterations), J_history)
    xlabel('Iterations')
    ylabel('Cost Function')
    show()

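【Comments】:

Can you post the whole file? In order to do something like 'MVLR.calctheta(foo.cv)', you actually need a class named MVLR that has a method calctheta. Sadly, just putting the method in the file isn't enough.
I added this to the top of my code and indented the rest of my code: class MVLR: I'm now getting NameError: name 'calctheta' is not defined
What is the name of the source file? If it is multlin.py, you should be able to type this in IPython: import multlin; multlin.calctheta('foo.csv').
@yuzeh That's what I did, and I still get the same AttributeError: 'module' object has no attribute 'calctheta'

Tags: python arrays csv numpy pandas

【Solution 1】:

You should consider using classes to design your code. You could make your file look like this (part of the code is taken from your question):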

from numpy import loadtxt, zeros, ones, array, genfromtxt, linspace, logspace, mean, std, arange
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
from pylab import plot, show, xlabel, ylabel

class MyClass(object):
    def __init__(self, name):
        self.name = name

    def calculate_theta(self, name):
        # code calculating theta here
        return theta

    # self comes first; X is an ordinary second parameter
    def feature_normalize(self, X):
        mean_r = []
        std_r = []
        X_norm = X
        n_c = X.shape[1]
        for i in range(n_c):
            m = mean(X[:, i])
            s = std(X[:, i])
            mean_r.append(m)
            std_r.append(s)
            X_norm[:, i] = (X_norm[:, i] - m) / s
        return X_norm, mean_r, std_r

if __name__ == '__main__':
    my_class = MyClass(some_input_x)
    my_class.calculate_theta(some_input_y)

Here you can find a better example of how to create a class.
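
To make the outline above concrete, here is a minimal sketch of how the placeholder in calculate_theta might be filled in. It is not the gist's code: the class name LinearRegressionGD, the alpha and iteration defaults, and the synthetic data are all assumptions, and the per-coefficient loop from the question is replaced by an equivalent vectorized update.

```python
import numpy as np


class LinearRegressionGD(object):
    """Hypothetical name; batch gradient descent on normalized features."""

    def __init__(self, alpha=0.1, iterations=2000):
        self.alpha = alpha
        self.iterations = iterations

    def feature_normalize(self, X):
        # Work on a float copy so the caller's array is not modified.
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0)
        return (X - self.mean_) / self.std_

    def compute_cost(self, X, y, theta):
        # Half the mean squared error, as in the gist's cost function.
        errors = X.dot(theta) - y
        return (errors ** 2).sum() / (2 * y.size)

    def calculate_theta(self, X, y):
        y = np.asarray(y, dtype=float).reshape(-1, 1)
        X_norm = self.feature_normalize(X)
        m = y.size
        # Intercept column of ones plus the normalized features.
        X_design = np.hstack([np.ones((m, 1)), X_norm])
        theta = np.zeros((X_design.shape[1], 1))
        self.cost_history = []
        for _ in range(self.iterations):
            # Update every coefficient at once from the same predictions.
            gradient = X_design.T.dot(X_design.dot(theta) - y) / m
            theta = theta - self.alpha * gradient
            self.cost_history.append(self.compute_cost(X_design, y, theta))
        return theta


if __name__ == '__main__':
    # Synthetic data standing in for the csv file.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = 4.0 + X.dot(np.array([1.5, -2.0, 0.5]))
    model = LinearRegressionGD()
    theta = model.calculate_theta(X, y)
    print(theta.ravel())
```

Loading the csv is deliberately left outside the class, so the fitting code can be exercised on any array.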

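【Comments】:

I added this to the top of my code and indented the rest of my code: class MVLR: I'm now getting NameError: name 'calctheta' is not defined
Strange, I don't get that error. Is there more to your code than what you posted here? Maybe calctheta is being called somewhere else? (Note: I made some small edits to my answer)
So in the terminal I now call my function with Ninja = MVLR('foo.csv'), and it gives me a TypeError: 'module' object is not callable
Hey @user3042850, thanks for editing your question. I added more code. I hope it is clear. You need to put the functions inside a class so that you can instantiate it and run its methods. Of course, you can just do MyClass.some_method(). Also check out the link I posted. I hope it helps! ;)
I don't have the if _name_ == '_main_': my_class= MyClass(some_input_x) my_class.calculate_theta(some_input_y ) but I'm still getting the attribute error and I don't know why - I have calctheta as a defined function, I'm passing it one argument, the file I want to analyze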
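
On the TypeError reported in the comments above: one plausible cause, assuming the file is named MVLR.py and was loaded with import MVLR, is that the name MVLR then refers to the module rather than to the class inside it. The sketch below imitates that situation with a module object built by hand, so it runs without any file on disk.

```python
import types

# Stand-in for a file named MVLR.py that defines a class also named MVLR.
mvlr_module = types.ModuleType("MVLR")

class MVLR(object):
    def __init__(self, name):
        self.name = name

mvlr_module.MVLR = MVLR

# Calling the module itself (what `import MVLR; MVLR('foo.csv')` does) fails.
try:
    mvlr_module('foo.csv')
except TypeError as err:
    message = str(err)
    print(message)

# Reaching the class through the module (or `from MVLR import MVLR`) works.
ninja = mvlr_module.MVLR('foo.csv')
print(ninja.name)
```

A persistent AttributeError after editing the file may have a related cause: IPython keeps the first imported copy of a module, so changes only show up after importlib.reload or a restart of the session.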