【Posted】: 2017-09-24 12:51:37
【Question】:
After importing stopwords from the corpus and downloading everything with nltk.download(), I ran the following:
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# reading from a .txt file
list = []
with open("positive.txt", "r") as file:
    for words in file:
        words = words.strip()
        list.append(words)

# tokenizing words
pos_words = []
for i in list:
    pos_words.append(word_tokenize(i))

stop_words = [stopwords.words('english')]
print(stop_words)

final_pos_words = []
for i in pos_words:
    if i not in stop_words:
        final_pos_words.append(i)
print(final_pos_words)
But this did not remove anything. When I then ran the inverse check:
final_pos_words = []
for i in pos_words:
    if i in stop_words:
        final_pos_words.append(i)
print(final_pos_words)
the output was [].
【Comments】:
-
Try:
stop_words = set(stopwords.words('english'))
-
I tried that, but it always raises TypeError: unhashable type: 'list' at the line "if i in stop_words:" in the final_pos_words loop.
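Both symptoms in this thread can be reproduced without NLTK. The sketch below is only an illustration: pos_words and english_stop_words are small stand-in values assumed for the demo, not the asker's data. It suggests why the nested list filters nothing and why the set version raises the TypeError.

```python
# Stand-in data (assumed): each element is a token list, like word_tokenize() output.
pos_words = [["good"], ["the"], ["very", "nice"]]
english_stop_words = ["the", "very"]  # hypothetical short stop-word list

# Case 1: the question's original form, a list wrapped inside another list.
nested_stop_words = [english_stop_words]
kept = [tokens for tokens in pos_words if tokens not in nested_stop_words]
# Nothing is filtered: no token list equals the whole stop-word list.

# Case 2: the suggested set. Membership testing hashes the item,
# and a list cannot be hashed, so Python raises TypeError.
stop_word_set = set(english_stop_words)
try:
    ["the"] in stop_word_set
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(kept)
print(error_message)
```

In both cases the items being tested are lists of tokens rather than individual strings, which is the underlying problem.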
-
Your mistake is here:
pos_words.append(word_tokenize(i)). The word_tokenize() function returns a list (possibly containing just one word), so pos_words contains lists, not words.
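One possible fix, following the point above, is to flatten the tokens with extend() instead of appending each token list, and to compare individual tokens against a set. This is a minimal sketch, not the only way to do it: remove_stop_words is a hypothetical helper name, the tokenizer and stop-word list are passed in as parameters so the example runs without NLTK data, and lowercasing tokens before comparison is an added assumption.

```python
def remove_stop_words(lines, tokenize, stop_words):
    """Return a flat list of tokens from `lines` that are not stop words.

    `tokenize` is any callable mapping a string to a list of tokens;
    `stop_words` is any iterable of strings. Both are supplied by the caller.
    """
    # A set gives fast membership tests; lowercasing is an assumption of this sketch.
    stop_set = {word.lower() for word in stop_words}
    kept = []
    for line in lines:
        # extend() adds the individual tokens, so `kept` holds strings, not lists.
        kept.extend(tok for tok in tokenize(line) if tok.lower() not in stop_set)
    return kept


# Demo with stand-in inputs: str.split as the tokenizer and a tiny stop-word list.
lines = ["good", "the", "very nice"]
result = remove_stop_words(lines, str.split, ["the", "very"])
print(result)  # ['good', 'nice']
```

With NLTK installed and its data downloaded, the same helper could presumably be called as remove_stop_words(list, word_tokenize, stopwords.words('english')), using the imports from the question.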
Tags: python nltk stop-words