Posted: 2020-11-30 00:14:57
Question:
Suppose I need to apply CountVectorizer() to the following data:
words = [
    'A am is',
    'This the a',
    'the am is',
    'this a am',
]
I did the following:
from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer()  # default settings
X = vectorizer.fit_transform(words)  # learn the vocabulary and build the document-term matrix
print(X.toarray())
It returns the following:
[[1 1 0 0]
 [0 0 1 1]
 [1 1 1 0]
 [1 0 0 1]]
For reference, print(vectorizer.get_feature_names()) prints ['am', 'is', 'the', 'this'].
Why isn't "a" being picked up?
Does CountVectorizer() not count single-letter words as words?
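The scikit-learn documentation for CountVectorizer gives the default token_pattern as r"(?u)\b\w\w+\b", which only matches tokens of two or more word characters, so "a" (and "A", after lowercasing) is never produced as a token. The sketch below, assuming a recent scikit-learn release, uses build_analyzer() to show which tokens the default settings extract from each document, then passes a relaxed token_pattern that also accepts single-character tokens. It reads the vocabulary through vocabulary_ rather than get_feature_names(), because newer releases replaced that method with get_feature_names_out().

```python
from sklearn.feature_extraction.text import CountVectorizer

words = [
    'A am is',
    'This the a',
    'the am is',
    'this a am',
]

# Default settings: lowercase=True and token_pattern=r"(?u)\b\w\w+\b",
# so a token must contain at least two word characters.
default_vec = CountVectorizer()
default_vec.fit(words)
analyzer = default_vec.build_analyzer()
for doc in words:
    # 'A am is' is lowercased and split into ['am', 'is']; 'a' never matches
    print(repr(doc), '->', analyzer(doc))
print(sorted(default_vec.vocabulary_))  # ['am', 'is', 'the', 'this']

# Relaxed pattern: \w+ accepts tokens of one or more word characters,
# so single-letter words such as 'a' are kept.
single_vec = CountVectorizer(token_pattern=r"(?u)\b\w+\b")
X = single_vec.fit_transform(words)
print(sorted(single_vec.vocabulary_))  # ['a', 'am', 'is', 'the', 'this']
print(X.toarray())
```

With the relaxed pattern, the first column (for 'a') is 1 in the first, second, and fourth documents.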
Tags: python machine-learning scikit-learn countvectorizer