【Question title】:Logistic Regression from scratch with a tf-idf sparse matrix in Python
【Posted】:2020-02-13 06:56:08
【Question】:

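I am trying to write logistic regression from scratch and get the error below. After cleaning and tokenizing the data, I used sklearn's TfidfVectorizer to build a sparse tf-idf matrix from the tweet tokens. Can someone help me resolve this?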

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-36-98e5051d04b6> in <module>()
      3                   fprime=gradient,args=(x, y.values.flatten()))
      4     return opt_weights[0]
----> 5 parameters = fit(X, y, theta)

3 frames
/usr/local/lib/python3.6/dist-packages/scipy/optimize/tnc.py in func_and_grad(x)
    369     else:
    370         def func_and_grad(x):
--> 371             f = fun(x, *args)
    372             g = jac(x, *args)
    373             return f, g

TypeError: cost_function() missing 1 required positional argument: 'y'

Code:

import numpy as np
from scipy.optimize import fmin_tnc

X = tfidf_train
y = train['Sentiment']
theta = np.zeros((X.shape[1], 1))

def sigmoid(x):
    # Activation function used to map any real value between 0 and 1
    return 1 / (1 + np.exp(-x))

def net_input(theta, x):
    # Computes the weighted sum of inputs
    return np.dot(x, theta)

def probability(theta, x):
    # Returns the probability after passing through sigmoid
    return sigmoid(net_input(theta, x))

def cost_function(self, theta, x, y):
    # Computes the cost function for all the training samples
    m = x.shape[0]
    total_cost = -(1 / m) * np.sum(
        y * np.log(probability(theta, x)) + (1 - y) * np.log(
            1 - probability(theta, x)))
    return total_cost

def gradient(self, theta, x, y):
    # Computes the gradient of the cost function at the point theta
    m = x.shape[0]
    return (1 / m) * np.dot(x.T, sigmoid(net_input(theta,   x)) - y)

def fit(x, y, theta):
    opt_weights = fmin_tnc(func=cost_function, x0=theta,
                  fprime=gradient,args=(x, y.values.flatten()))
    return opt_weights[0]
parameters = fit(X, y, theta)

tfidf_train.get_shape

X is <bound method spmatrix.get_shape of <89988x49526 sparse matrix of type '<class 'numpy.float64'>'   with 987177 stored elements in Compressed Sparse Row format>>

y has shape (89988,)

【Comments】:

    Tags: python machine-learning logistic-regression sentiment-analysis tf-idf


    【Solution 1】:
    TypeError: cost_function() missing 1 required positional argument: 'y'
    

        opt_weights = fmin_tnc(func=cost_function, x0=theta,
    

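    The documentation for fmin_tnc() gives no indication that func should take a self parameter, so the self in def cost_function(self, theta, x, y): is simply out of place; drop it. As the traceback shows, fmin_tnc calls fun(x, *args), and with args=(x, y) that is only three positional arguments: the weights are bound to self, x to theta, y to x, and nothing is left for y. gradient(self, theta, x, y) has the same problem.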
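
    The fix described above can be sketched as follows. This is a minimal illustration, not the asker's exact pipeline: X_demo, y_demo and true_theta are hypothetical toy data standing in for tfidf_train and train['Sentiment']. It also makes two changes beyond dropping self, both assumptions about what the asker would need next: it uses the @ operator instead of np.dot, because np.dot is not aware of scipy sparse matrices, and it starts from a 1-D theta and clips probabilities to avoid log(0).

```python
import numpy as np
from scipy import sparse
from scipy.optimize import fmin_tnc


def sigmoid(z):
    # Map any real value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))


def probability(theta, x):
    # `@` works for dense arrays and scipy sparse matrices alike
    return sigmoid(x @ theta)


def cost_function(theta, x, y):
    # No `self`: fmin_tnc calls func(theta, *args) with args=(x, y)
    p = np.clip(probability(theta, x), 1e-12, 1 - 1e-12)  # guard log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


def gradient(theta, x, y):
    # Gradient of the mean log-loss with respect to theta
    m = x.shape[0]
    return (x.T @ (probability(theta, x) - y)) / m


def fit(x, y, theta):
    result = fmin_tnc(func=cost_function, x0=theta,
                      fprime=gradient, args=(x, y), disp=0)
    return result[0]  # optimized weights


# Hypothetical toy data: a small sparse matrix with noisy binary labels
rng = np.random.default_rng(0)
X_demo = sparse.random(200, 5, density=0.5, format="csr", random_state=0)
true_theta = np.array([3.0, -2.0, 1.0, 0.0, -1.0])
y_demo = (rng.random(200) < sigmoid(X_demo @ true_theta)).astype(float)

theta0 = np.zeros(X_demo.shape[1])  # 1-D, matching what fmin_tnc passes back
parameters = fit(X_demo, y_demo, theta0)
print(parameters)
```

    With the real data, the call would presumably be fit(tfidf_train, train['Sentiment'].values, np.zeros(tfidf_train.shape[1])).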

    【Comments】:
