【Question Title】: Why do my coefficients only have 1 dimension?
【Posted】: 2023-03-28 15:52:01
【Question】:

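I'm trying to run sentiment analysis on some review data. The response variable is either "positive" or "negative". When I run my model, the coefficients have only one dimension. I believe there should be two, since there are two response classes. Any help figuring out why this happens would be appreciated.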

from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in 0.20
from sklearn.metrics import classification_report
import numpy as np
from sklearn.metrics import accuracy_score



# toy review data and their sentiment labels
comments = ['happy', 'sad', 'this is negative', 'this is positive', 'i like this', 'why do i hate this']
classes = ['positive', 'negative', 'negative', 'positive', 'positive', 'negative']


# Build the term-frequency matrix for the review data set.
# (NLTK's stop-word list must be downloaded once: nltk.download('stopwords'))
stop = stopwords.words('english')
count_vectorizer = CountVectorizer(analyzer='word', stop_words=stop, ngram_range=(1, 3))
term_counts = count_vectorizer.fit_transform(comments)
# Re-weight the raw counts with TF-IDF.
tfidf_comments = TfidfTransformer(use_idf=True).fit_transform(term_counts)


# Split the data for validation: 80% training, 20% test.
data_train, data_test, target_train, target_test = train_test_split(tfidf_comments, classes, test_size=0.2, random_state=43)
classifier = BernoulliNB().fit(data_train, target_train)

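# Note: coef_ on the naive Bayes models was deprecated in scikit-learn 0.24 and removed in 1.1;
# feature_log_prob_ (one row per class) is the attribute to inspect in current versions.
print(classifier.coef_.shape)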

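The last line prints (1L, 6L). I'm trying to find the most informative features for the negative and positive classes, but because the first dimension is 1L, I get the same information for both classes.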

Thanks!

【Comments】:

  • For me the last line actually prints (2, 6). Are you sure it's (1, 6)?
  • There is a typo in test_size; it should be "0.2". Could you try again?

Tags: python scikit-learn naivebayes


【Solution 1】:

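In the source code for the scikit-learn preprocessing module, the LabelBinarizer class implements a one-vs-all scheme for multiclass classification. There you can see that when only two classes are present, it encodes the labels as a single binary column instead of two, so a single set of coefficients is learned: it predicts whether a sample belongs to class "1", and if not, the classifier predicts "0".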
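A small sketch of the convention described above: LabelBinarizer produces one column for two classes but one column per class for three or more, and binary linear models follow the same pattern with a single coefficient row. LogisticRegression and the random feature matrix are used here only as an illustration; they are not part of the original question.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelBinarizer

# Two classes: a single 0/1 column, where 1 means the second (sorted) class.
binary = LabelBinarizer().fit_transform(['positive', 'negative', 'negative', 'positive'])
print(binary.shape)        # -> (4, 1)

# Three classes: one column per class (one-vs-all).
multi = LabelBinarizer().fit_transform(['positive', 'negative', 'neutral'])
print(multi.shape)         # -> (3, 3)

# Binary linear models expose one coefficient row; multiclass ones, one per class.
rng = np.random.RandomState(0)
X = rng.rand(6, 4)
y_binary = ['positive', 'negative', 'negative', 'positive', 'positive', 'negative']
y_three = ['positive', 'negative', 'neutral', 'positive', 'negative', 'neutral']

binary_coef = LogisticRegression().fit(X, y_binary).coef_
three_coef = LogisticRegression().fit(X, y_three).coef_
print(binary_coef.shape)   # -> (1, 4)
print(three_coef.shape)    # -> (3, 4)
```

For the naive Bayes model in the question, the full per-class information is still available in feature_log_prob_, which keeps one row for each class even in the binary case.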

【Discussion】:
