【发布时间】:2017-09-11 07:47:22
【问题描述】:
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
您好,我有以下 cmets 列表:
comments = ['I am very agry','this is not interesting','I am very happy']
这些是对应的标签:
sents = ['angry','indiferent','happy']
我正在使用 tfidf 对这些 cmets 进行矢量化,如下所示:
tfidf_vectorizer = TfidfVectorizer(analyzer='word')
tfidf = tfidf_vectorizer.fit_transform(comments)
from sklearn import preprocessing
我正在使用标签编码器对标签进行矢量化:
le = preprocessing.LabelEncoder()
le.fit(sents)
labels = le.transform(sents)
print(labels.shape)
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.model_selection import train_test_split
with open('tfidf.pickle','wb') as idxf:
pickle.dump(tfidf, idxf, pickle.HIGHEST_PROTOCOL)
with open('tfidf_vectorizer.pickle','wb') as idxf:
pickle.dump(tfidf_vectorizer, idxf, pickle.HIGHEST_PROTOCOL)
这里我使用被动攻击来拟合模型:
clf2 = PassiveAggressiveClassifier()
with open('passive.pickle','wb') as idxf:
pickle.dump(clf2, idxf, pickle.HIGHEST_PROTOCOL)
with open('passive.pickle', 'rb') as infile:
clf2 = pickle.load(infile)
with open('tfidf_vectorizer.pickle', 'rb') as infile:
tfidf_vectorizer = pickle.load(infile)
with open('tfidf.pickle', 'rb') as infile:
tfidf = pickle.load(infile)
在这里,我尝试使用三个新的 cmets 及其相应的标签来测试部分拟合的用法,如下所示:
new_comments = ['I love the life','I hate you','this is not important']
new_labels = [1,0,2]
vec_new_comments = tfidf_vectorizer.transform(new_comments)
print(clf2.predict(vec_new_comments))
clf2.partial_fit(vec_new_comments, new_labels)
问题是部分拟合后我没有得到正确的结果,如下所示:
print('AFTER THIS UPDATE THE RESULT SHOULD BE 1,0,2??')
print(clf2.predict(vec_new_comments))
但是我得到了这个输出:
[2 2 2]
因此,我非常感谢您提供的支持,如果我使用与过去训练过的相同示例对其进行测试,那么为什么模型没有更新,所需的输出应该是:
[1,0,2]
感谢您对调整超参数以查看所需输出的支持。
这是完整的代码,显示部分拟合:
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
import sys
from sklearn.metrics.pairwise import cosine_similarity
import random
comments = ['I am very agry','this is not interesting','I am very happy']
sents = ['angry','indiferent','happy']
tfidf_vectorizer = TfidfVectorizer(analyzer='word')
tfidf = tfidf_vectorizer.fit_transform(comments)
#print(tfidf.shape)
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit(sents)
labels = le.transform(sents)
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.model_selection import train_test_split
with open('tfidf.pickle','wb') as idxf:
pickle.dump(tfidf, idxf, pickle.HIGHEST_PROTOCOL)
with open('tfidf_vectorizer.pickle','wb') as idxf:
pickle.dump(tfidf_vectorizer, idxf, pickle.HIGHEST_PROTOCOL)
clf2 = PassiveAggressiveClassifier()
clf2.fit(tfidf, labels)
with open('passive.pickle','wb') as idxf:
pickle.dump(clf2, idxf, pickle.HIGHEST_PROTOCOL)
with open('passive.pickle', 'rb') as infile:
clf2 = pickle.load(infile)
with open('tfidf_vectorizer.pickle', 'rb') as infile:
tfidf_vectorizer = pickle.load(infile)
with open('tfidf.pickle', 'rb') as infile:
tfidf = pickle.load(infile)
new_comments = ['I love the life','I hate you','this is not important']
new_labels = [1,0,2]
vec_new_comments = tfidf_vectorizer.transform(new_comments)
clf2.partial_fit(vec_new_comments, new_labels)
print('AFTER THIS UPDATE THE RESULT SHOULD BE 1,0,2??')
print(clf2.predict(vec_new_comments))
但是我得到了:
AFTER THIS UPDATE THE RESULT SHOULD BE 1,0,2??
[2 2 2]
【问题讨论】:
-
你如何适应
clf2。请将整个代码作为一个代码 sn-p 发布。现在一次又一次地复制粘贴非常烦人。 -
@VivekKumar 我已经更新了问题,我添加了完整的代码来重现我的问题,感谢支持
标签: machine-learning scikit-learn