【Posted】:2019-11-11 01:26:10
【Question】:
My code works fine:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# predictors, bow_vector and classifier are defined elsewhere in my script
df_amazon = pd.read_csv("datasets/amazon_alexa.tsv", sep="\t")
X = df_amazon['variation']  # the features we want to analyze
ylabels = df_amazon['feedback']  # the labels, or answers, we want to test against
X_train, X_test, y_train, y_test = train_test_split(X, ylabels, test_size=0.3)
# Create pipeline using Bag of Words
pipe = Pipeline([('cleaner', predictors()),
                 ('vectorizer', bow_vector),
                 ('classifier', classifier)])
pipe.fit(X_train, y_train)
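But if I try to add one more feature to the model, by replacing
X = df_amazon['variation']
with
X = df_amazon[['variation','verified_reviews']]
then calling fit gives me this error from sklearn:
ValueError: Found input variables with inconsistent numbers of samples: [2, 2205]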
So fit works when X_train and y_train have shapes
(2205,) and (2205,),
but not when the shapes change to (2205, 2) and (2205,).
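The 2 in [2, 2205] may come from the DataFrame being treated as an iterable of documents. Here is a minimal sketch of that, using a made-up three-row frame in place of the real data, and assuming the cleaner and vectorizer steps simply iterate over X (as CountVectorizer does):

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

# Toy stand-in for the real dataset (hypothetical values).
df = pd.DataFrame({
    "variation": ["Charcoal Fabric", "Walnut Finish", "Charcoal Fabric"],
    "verified_reviews": ["Love my Echo", "Sound is great", "Not working well"],
})
X = df[["variation", "verified_reviews"]]

# Iterating over a DataFrame yields its column names, not its rows.
docs_seen = list(X)

# The vectorizer therefore sees one "document" per column:
# 2 output rows here, even though there are 3 samples.
n_rows = CountVectorizer().fit_transform(X).shape[0]

print(docs_seen, n_rows, len(df))
```

If this is what happens inside the pipeline, the classifier receives 2 rows of features against 2205 labels, which would match the error message.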
What is the best way to handle this?
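One possible direction, sketched here rather than tested on the real data, is to vectorize each text column separately with sklearn's ColumnTransformer and stack the results. The toy frame, the step names, and the use of LogisticRegression are assumptions for illustration; the custom cleaner step is left out, and the model is fit on the full toy frame without a train/test split.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Toy stand-in for amazon_alexa.tsv (hypothetical values).
df = pd.DataFrame({
    "variation": ["Charcoal Fabric", "Walnut Finish", "Charcoal Fabric",
                  "Black Dot", "White Dot", "Black Dot"],
    "verified_reviews": ["Love my Echo", "Sound is great", "Stopped working",
                         "Works as expected", "Terrible device", "Very happy"],
    "feedback": [1, 1, 0, 1, 0, 1],
})
X = df[["variation", "verified_reviews"]]
y = df["feedback"]

# A plain string (not a list) as the column selector hands each vectorizer
# a 1-D column of text, which is what CountVectorizer expects.
features = ColumnTransformer([
    ("variation_bow", CountVectorizer(), "variation"),
    ("review_bow", CountVectorizer(), "verified_reviews"),
])

pipe = Pipeline([
    ("features", features),
    ("classifier", LogisticRegression()),  # assumed classifier
])
pipe.fit(X, y)

n_feature_rows = pipe.named_steps["features"].transform(X).shape[0]
preds = pipe.predict(X)
```

With this layout the feature matrix has one row per sample, so X and y stay the same length when fit is called.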
【Comments】:
-
Did you use CountVectorizer?
-
Yes, I did. Maybe the problem is related to the pipeline.
Tags: python pandas machine-learning scikit-learn dataset