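【Posted】: 2020-09-02 16:03:52
【Question】:
I need help reshaping my input to match my output.
I want to build a model that vectorizes and classifies the 'All information' column so that the label "Fall" can be classified as 0 or 1. However, I keep getting the error [ValueError: Found input variables with inconsistent numbers of samples: [2552, 1]]. The shapes look fine to me, but I don't know how to fix it.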
## Linear Regression
import pandas as pd
import numpy as np
from tqdm import tqdm
#instance->fit->predict
from sklearn.linear_model import LinearRegression
model=LinearRegression(fit_intercept=True)
data=pd.read_csv("Fall_test_0826.csv", encoding='cp949', header=0)
data.head(2)
X=data.drop(["fall"], axis=1)
y= data.fall
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state = 0)
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf_vect=TfidfVectorizer()
tfidf_vect.fit(X_train)  # build the vocabulary
X_train_tfidf_vect = tfidf_vect.fit_transform(X_train['All information']).toarray()
X_test_tfidf_vect = tfidf_vect.transform(X_test)
lr_clf=LinearRegression()
lr_clf.fit(X_train_tfidf_vect, y_train)
pred = lr_clf.predict(X_test_tfidf_vect)
from sklearn.metrics import accuracy_score
print('Logisitic Regression _ {0:.3f}'.format(accuracy_score(y_test, pred)))
Error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-85-bec6ead862c8> in <module>
----> 1 print('{0:.3f}'.format(accuracy_score(y_test, pred)))
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
71 FutureWarning)
72 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 73 return f(**kwargs)
74 return inner_f
75
~\anaconda3\lib\site-packages\sklearn\metrics\_classification.py in accuracy_score(y_true, y_pred, normalize, sample_weight)
185
186 # Compute accuracy for each possible representation
--> 187 y_type, y_true, y_pred = _check_targets(y_true, y_pred)
188 check_consistent_length(y_true, y_pred, sample_weight)
189 if y_type.startswith('multilabel'):
~\anaconda3\lib\site-packages\sklearn\metrics\_classification.py in _check_targets(y_true, y_pred)
79 y_pred : array or indicator matrix
80 """
---> 81 check_consistent_length(y_true, y_pred)
82 type_true = type_of_target(y_true)
83 type_pred = type_of_target(y_pred)
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
254 uniques = np.unique(lengths)
255 if len(uniques) > 1:
--> 256 raise ValueError("Found input variables with inconsistent numbers of"
257 " samples: %r" % [int(l) for l in lengths])
258
ValueError: Found input variables with inconsistent numbers of samples: [2552, 1]
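The second number in the message, 1, is the length of pred. A likely cause, assuming X_test is still a DataFrame whose only feature column is 'All information': scikit-learn vectorizers iterate over whatever they are given, and iterating over a DataFrame yields its column labels rather than its rows, so tfidf_vect.transform(X_test) sees a single "document". The sketch below reproduces this on a toy frame; the sample sentences are invented for illustration and are not the asker's data.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-in for X_test (hypothetical data): one text column, three rows.
X_test = pd.DataFrame({
    "All information": [
        "patient fell at night",
        "no fall reported",
        "patient slipped in bathroom",
    ]
})

vect = TfidfVectorizer()
vect.fit(X_test["All information"])

# Iterating a DataFrame yields its column labels, so the whole frame
# is treated as ONE document (the string "All information").
wrong = vect.transform(X_test)

# Passing the column (a Series of strings) gives one row per sample.
right = vect.transform(X_test["All information"])

print(list(X_test))                      # the column labels
print(wrong.shape[0], right.shape[0])    # number of rows in each matrix
```

If this is what happens in the question, the prediction array ends up with one row while y_test keeps all 2552, which matches the two numbers in the error.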
【Comments】:
-
Side note: you only need fit here, tfidf_vect.fit_transform(X_train['All information']).toarray(), rather than fit_transform
-
Could you share the shapes of the input dataframes/arrays?
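-
Could you edit your question to show the CSV you are using, or the shapes of y_test and pred?
-
@yatu Yes, I have inserted an image. Could you check it?
-
@DiegoRueda Sure. I have inserted an image related to your suggestion.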
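For reference, here is a minimal sketch of how the pipeline in the question might be arranged so that the train and test matrices keep one row per sample. It rests on several assumptions: the DataFrame below is a made-up stand-in for Fall_test_0826.csv, the text column is selected before splitting, and LinearRegression is swapped for LogisticRegression, because accuracy_score expects discrete class labels and the asker's print statement already mentions logistic regression. It is an illustration, not a confirmed fix for the asker's data.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for Fall_test_0826.csv: a text column and a 0/1 label.
data = pd.DataFrame({
    "All information": [
        "patient fell at night", "no fall reported",
        "slipped and fell in bathroom", "routine check, stable",
        "fell from bed", "walking normally",
        "found on floor after fall", "no incident today",
    ] * 5,
    "fall": [1, 0, 1, 0, 1, 0, 1, 0] * 5,
})

X = data["All information"]   # select the text column up front (a Series)
y = data["fall"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

tfidf_vect = TfidfVectorizer()
X_train_tfidf = tfidf_vect.fit_transform(X_train)  # learn vocabulary on train only
X_test_tfidf = tfidf_vect.transform(X_test)        # reuse that vocabulary on test

clf = LogisticRegression()
clf.fit(X_train_tfidf, y_train)
pred = clf.predict(X_test_tfidf)

# The three lengths should agree before calling accuracy_score.
print(X_test_tfidf.shape[0], len(y_test), len(pred))
print("Logistic Regression: {0:.3f}".format(accuracy_score(y_test, pred)))
```

Printing the three lengths before scoring is a quick way to answer the shape questions raised in the comments above.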
Tags: scikit-learn