【Question title】:Trouble implementing Bernoulli Naive Bayes Classifier
【Posted】:2018-06-11 15:55:15
【Question】:

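I am trying to implement a Bernoulli Naive Bayes classifier from the scikit-learn library for text classification, but I am stuck on this error:

ValueError: Expected 2D array, got 1D array instead:

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Full error: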

Traceback (most recent call last):
  File "BNB.py", line 27, in <module>
    clf.fit(train_data, train_labels)
  File "/home/atinesh/.local/lib/python3.6/site-packages/sklearn/naive_bayes.py", line 579, in fit
    X, y = check_X_y(X, y, 'csr')
  File "/home/atinesh/.local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 573, in check_X_y
    ensure_min_features, warn_on_dtype, estimator)
  File "/home/atinesh/.local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 441, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=['Apple' 'Banana' 'Cherry' 'Grape' 'Guava' 'Lemon' 'Mangos' 'Orange'
 'Strawberry' 'Watermelon' 'Potato' 'Spinach' 'Carrot' 'Onion' 'Cabbage'
 'Barccoli' 'Tomatoe' 'Pea' 'Cucumber' 'Eggplant'].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

"BNB.py"

from sklearn.naive_bayes import BernoulliNB

dataPos = ['Apple', 'Banana', 'Cherry', 'Grape', 'Guava', 'Lemon', 'Mangos',
            'Orange', 'Strawberry', 'Watermelon']

dataNeg = ['Potato', 'Spinach', 'Carrot', 'Onion', 'Cabbage', 'Barccoli', 
            'Tomatoe', 'Pea', 'Cucumber', 'Eggplant']

def get_data():
    examples = []
    labels   = []

    for item in dataPos:
        examples.append(item)
        labels.append('positive')

    for item in dataNeg:
        examples.append(item)
        labels.append('negative')

    return examples, labels

train_data, train_labels = get_data()

# Train
clf = BernoulliNB()
clf.fit(train_data, train_labels)

# Predict
print(clf.predict('Apple Banana'))
print(clf.predict_proba('Apple Banana'))

【Comments】:

    Tags: python scikit-learn text-classification naivebayes


    【Solution 1】:

    I suggest using LabelBinarizer from sklearn:

    from sklearn.naive_bayes import BernoulliNB
    import numpy as np
    from sklearn import preprocessing

    dataPos = ['Apple', 'Banana', 'Cherry', 'Grape', 'Guava', 'Lemon', 'Mangos',
               'Orange', 'Strawberry', 'Watermelon']

    dataNeg = ['Potato', 'Spinach', 'Carrot', 'Onion', 'Cabbage', 'Barccoli',
               'Tomatoe', 'Pea', 'Cucumber', 'Eggplant']

    # Labels as a 1-D array: 0 for dataPos, 1 for dataNeg
    Y = np.array([0] * 10 + [1] * 10)

    # One-hot encode each word, giving a 2-D matrix with one row per sample
    lb = preprocessing.LabelBinarizer()
    X = lb.fit_transform(dataPos + dataNeg)
    clf = BernoulliNB()
    clf.fit(X, Y)

    # Encode the test words with the same fitted binarizer
    test_sample = lb.transform(['Apple', 'Banana', 'Spinach'])
    print(clf.predict(test_sample))
    

    Your code fails because X must be a 2-D array when you call clf.fit(X, Y). Each row corresponds to one feature vector.
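
The question's test input is the two-word string 'Apple Banana', so a bag-of-words encoding may be a closer fit than one-hot labels. The sketch below is not from this thread: it swaps in scikit-learn's CountVectorizer with binary=True (an assumption about what the asker wants) to build the 2-D, one-row-per-sample matrix described above. The names data_pos, data_neg, texts and vectorizer are illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

data_pos = ['Apple', 'Banana', 'Cherry', 'Grape', 'Guava', 'Lemon', 'Mangos',
            'Orange', 'Strawberry', 'Watermelon']
data_neg = ['Potato', 'Spinach', 'Carrot', 'Onion', 'Cabbage', 'Barccoli',
            'Tomatoe', 'Pea', 'Cucumber', 'Eggplant']

texts = data_pos + data_neg
labels = ['positive'] * len(data_pos) + ['negative'] * len(data_neg)

# binary=True records only whether a word occurs (0 or 1), which matches the
# Bernoulli model; the result is a 2-D matrix of shape (n_samples, n_words).
vectorizer = CountVectorizer(binary=True)
X = vectorizer.fit_transform(texts)

clf = BernoulliNB()
clf.fit(X, labels)

# New text goes through the same fitted vectorizer, as a list of documents.
X_new = vectorizer.transform(['Apple Banana'])
prediction = clf.predict(X_new)
probabilities = clf.predict_proba(X_new)
print(prediction)
print(probabilities)
```

Passing a list of documents to transform, rather than a bare string, keeps one row per sample, which is the shape predict and predict_proba expect.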

    【Comments】:

    • Everything works, thanks. But I get this warning message: <path>/lib/python3.6/site-packages/sklearn/utils/validation.py:578: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel(). y = column_or_1d(y, warn=True)
    • @OliviaBrown, I edited my solution. It no longer needs to reshape Y.
    【Solution 2】:

    If you pass plain Python lists to scikit-learn, they are interpreted as arrays of shape (n, ). What you probably want is to convert your lists of examples and labels into numpy arrays and reshape/resize them into column vectors of shape (n, 1). For example:

    import numpy as np
    
    examples = np.array(['Apple', 'Banana', 'Cherry', 'Grape', 'Guava', 'Lemon', 'Mangos','Orange', 'Strawberry', 'Watermelon'])
    examples.shape  # returns (10, ), a 1D-array
    examples.resize((10,1))
    examples.shape  # returns (10, 1), which is a 2-D array
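
As a small illustration of the two reshapes named in the error message, the sketch below uses reshape, which returns a new array and leaves the original untouched, whereas resize in the snippet above changes the array in place. The three-word array is made up for brevity.

```python
import numpy as np

words = np.array(['Apple', 'Banana', 'Cherry'])

# Single feature: every element becomes its own sample (one row each).
one_feature = words.reshape(-1, 1)

# Single sample: all elements become features of one row.
one_sample = words.reshape(1, -1)

# reshape did not modify the original 1-D array.
print(words.shape, one_feature.shape, one_sample.shape)
```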
    

    Or, for a simpler solution, you can pass this to the fit method:

    clf.fit([train_data], [train_labels])
    

    But since you already have a dedicated function for formatting the data, why not use numpy inside it and return arrays with the correct dimensions?
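
One way that suggestion could look is sketched below, assuming the question's get_data is rewritten to return numpy arrays; data_pos and data_neg stand in for the question's dataPos and dataNeg. The labels are kept 1-D here, in line with the DataConversionWarning quoted in the comments on Solution 1. This only addresses the shape: the estimator most likely still needs the strings encoded as numeric features before fit will succeed.

```python
import numpy as np

data_pos = ['Apple', 'Banana', 'Cherry', 'Grape', 'Guava', 'Lemon', 'Mangos',
            'Orange', 'Strawberry', 'Watermelon']
data_neg = ['Potato', 'Spinach', 'Carrot', 'Onion', 'Cabbage', 'Barccoli',
            'Tomatoe', 'Pea', 'Cucumber', 'Eggplant']


def get_data():
    """Return examples as an (n, 1) column and labels as a 1-D (n,) array."""
    examples = np.array(data_pos + data_neg).reshape(-1, 1)
    labels = np.array(['positive'] * len(data_pos)
                      + ['negative'] * len(data_neg))
    return examples, labels


train_data, train_labels = get_data()
print(train_data.shape)
print(train_labels.shape)
```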

    Hope this helps.

    【Comments】:
