[Posted]: 2019-02-10 02:19:09
[Question]:
Here is my code. Does anyone have any idea what is wrong with it? The error occurs when I call fit:
import pandas as pd
import numpy as np
from sklearn.ensemble import (RandomTreesEmbedding, RandomForestClassifier,
                              GradientBoostingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
n_estimators = 10
d = {'f1': [1, 2], 'f2': ['foo goo', 'goo zoo'], 'target':[0, 1]}
df = pd.DataFrame(data=d)
X_train, X_test, y_train, y_test = train_test_split(df, df['target'], test_size=0.1)
X_train['f2'] = CountVectorizer().fit_transform(X_train['f2'])
X_test['f2'] = CountVectorizer().fit_transform(X_test['f2'])
grd = GradientBoostingClassifier(n_estimators=n_estimators, max_depth=10)
grd.fit(X_train.values, y_train.values)
[Discussion]:
-
I think the problem is CountVectorizer. It returns a sparse matrix (so you end up mixing two types of matrices). Try converting it with .todense(). -
@Lucas, it works! Cool!
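For later readers, here is a minimal sketch of the fix suggested above. It is not the asker's final code. It makes several assumptions: the toy data is enlarged (hypothetical rows) so that both classes end up in the training split, the target column is left out of the features, one vectorizer is fitted on the training text and reused on the test text so the columns line up, and .toarray() is used in place of .todense() because it returns a plain NumPy array rather than a numpy.matrix.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical toy data, enlarged from the question so that both
# classes are present in the training split.
df = pd.DataFrame({
    'f1': [1, 2, 3, 4, 5, 6, 7, 8],
    'f2': ['foo goo', 'goo zoo', 'foo foo', 'zoo zoo',
           'foo goo', 'goo zoo', 'foo bar', 'zoo bar'],
    'target': [0, 1, 0, 1, 0, 1, 0, 1],
})

# Assumption: only f1 and f2 are used as features (the target is excluded).
X_train, X_test, y_train, y_test = train_test_split(
    df[['f1', 'f2']], df['target'], test_size=0.25,
    stratify=df['target'], random_state=0)

# Fit the vocabulary on the training text only, then reuse it for the test text.
vec = CountVectorizer()
text_train = vec.fit_transform(X_train['f2']).toarray()  # sparse -> dense ndarray
text_test = vec.transform(X_test['f2']).toarray()

# Stack the numeric column next to the dense word counts.
train_features = np.hstack([X_train[['f1']].values, text_train])
test_features = np.hstack([X_test[['f1']].values, text_test])

grd = GradientBoostingClassifier(n_estimators=10, max_depth=10)
grd.fit(train_features, y_train.values)
predictions = grd.predict(test_features)
```

Densifying is fine for a tiny vocabulary like this one. With a large vocabulary the dense array can get very big, so stacking the columns in sparse form (for example with scipy.sparse.hstack) may be a better fit.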
Tags: python pandas dataframe scikit-learn gradient-descent