【Question title】: "setting an array element with a sequence" error in scikit-learn GradientBoostingClassifier
【Posted】: 2019-02-10 02:19:09
【Question】:

Here is my code. Does anyone have an idea what is wrong? The error occurs when I call fit:

import pandas as pd
import numpy as np
from sklearn.ensemble import (RandomTreesEmbedding, RandomForestClassifier,
                              GradientBoostingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer

n_estimators = 10
d = {'f1': [1, 2], 'f2': ['foo goo', 'goo zoo'], 'target':[0, 1]}
df = pd.DataFrame(data=d)
X_train, X_test, y_train, y_test = train_test_split(df, df['target'], test_size=0.1)

X_train['f2'] = CountVectorizer().fit_transform(X_train['f2'])
X_test['f2'] = CountVectorizer().fit_transform(X_test['f2'])

grd = GradientBoostingClassifier(n_estimators=n_estimators, max_depth=10)
grd.fit(X_train.values, y_train.values)

【Comments】:

  • I think the problem is with CountVectorizer. It returns a sparse matrix, so you end up mixing two kinds of matrices. Try converting it with .todense()
  • @Lucas, it works! Cool!

Tags: python pandas dataframe scikit-learn gradient-descent


【Solution 1】:

The problem is with CountVectorizer:

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

d = {'f1': [1, 2], 'f2': ['foo goo', 'goo zoo'], 'target':[0, 1]}
df = pd.DataFrame(data=d)
df['f2'] = CountVectorizer().fit_transform(df['f2'])

df.values is:

array([[1,
        <2x3 sparse matrix of type '<class 'numpy.int64'>'
    with 4 stored elements in Compressed Sparse Row format>,
        0],
       [2,
        <2x3 sparse matrix of type '<class 'numpy.int64'>'
    with 4 stored elements in Compressed Sparse Row format>,
        1]], dtype=object)

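We can see that each cell of the f2 column holds the entire sparse matrix, so sparse and dense data are being mixed in one object array. You can convert the result to a dense matrix with todense():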

dense_count = CountVectorizer().fit_transform(df['f2']).todense()

dense_count looks like:

matrix([[1, 1, 0],
        [0, 1, 1]], dtype=int64)
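
The answer stops at the dense conversion, so here is a minimal sketch of one way to feed the result to the classifier. It makes several assumptions that are not in the original post: the toy data is extended to four rows so that the fit has more than one sample, toarray() is used instead of todense() to get a plain ndarray rather than a numpy.matrix, the count columns are stacked next to f1 with np.hstack, and max_depth and random_state are arbitrary illustrative values. It also reuses the fitted vectorizer with transform() for new data, since fitting a second CountVectorizer (as the question does for X_test) can produce a different vocabulary.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Toy data, extended to four rows (assumption: the original had only two).
d = {'f1': [1, 2, 3, 4],
     'f2': ['foo goo', 'goo zoo', 'foo zoo', 'zoo zoo'],
     'target': [0, 1, 0, 1]}
df = pd.DataFrame(data=d)

# Fit the vectorizer once and convert the sparse counts to a dense ndarray.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(df['f2']).toarray()

# Stack the numeric column and the count columns into one numeric 2-D array,
# instead of assigning the sparse matrix to a single DataFrame column.
X = np.hstack([df[['f1']].to_numpy(), counts])
y = df['target'].to_numpy()

grd = GradientBoostingClassifier(n_estimators=10, max_depth=3, random_state=0)
grd.fit(X, y)

# New data must go through the same fitted vectorizer (transform, not fit_transform).
new_counts = vectorizer.transform(['goo zoo']).toarray()
X_new = np.hstack([np.array([[2]]), new_counts])
pred = grd.predict(X_new)
```

Because X is now an ordinary numeric array with one column per feature, fit no longer receives an object array whose cells contain whole matrices.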

