【Posted】: 2021-01-26 00:54:43
【Question】:
I have a list of sentences in a pandas column:
sentence
I am writing on Stackoverflow because I cannot find a solution to my problem.
I am writing on Stackoverflow.
I need to show some code.
Please see the code below
I would like to do some text mining and analysis on them, such as getting word frequencies. For this I am using the following approach:
from sklearn.feature_extraction.text import CountVectorizer
# list of text documents
text = ["I am writing on Stackoverflow because I cannot find a solution to my problem."]
vectorizer = CountVectorizer()
# tokenize and build vocab
vectorizer.fit(text)
How can I apply this to my column, and remove extra stop words after building the vocabulary?
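For reference, here is a minimal sketch of one way this could look. It assumes the column is named `sentence` (as in the sample above) and that the "extra" stop words are a custom list added on top of scikit-learn's built-in English list. The word `stackoverflow` is used as an extra stop word purely for illustration. Note that the stop words are removed while the vocabulary is built, not afterwards.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS

# Sample data mirroring the question; the column name "sentence" is assumed.
df = pd.DataFrame({
    "sentence": [
        "I am writing on Stackoverflow because I cannot find a solution to my problem.",
        "I am writing on Stackoverflow.",
        "I need to show some code.",
        "Please see the code below",
    ]
})

# Built-in English stop words plus a hypothetical extra one (illustrative only).
extra_stop_words = {"stackoverflow"}
stop_words = list(ENGLISH_STOP_WORDS.union(extra_stop_words))

# Fit on the whole column; each row is treated as one document.
vectorizer = CountVectorizer(stop_words=stop_words)
counts = vectorizer.fit_transform(df["sentence"])

# Sum each vocabulary column to get the total frequency per word.
totals = np.asarray(counts.sum(axis=0)).ravel()
word_freq = pd.Series(
    {word: totals[idx] for word, idx in vectorizer.vocabulary_.items()}
).sort_values(ascending=False)

print(word_freq)
```

If the stop words really must be dropped after fitting, filtering the resulting Series (for example `word_freq.drop(labels=..., errors="ignore")`) is another option, though passing them to `CountVectorizer` keeps the vocabulary itself clean.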
【Comments】:
Tags: python pandas scikit-learn nlp countvectorizer