【Posted】: 2019-07-27 18:22:25
【Question】:
I want to remove lines from Myfile.txt when a line consists solely of one of the stop words and nothing else.
For example, a sample of Myfile.txt is:
Adh Dhayd
Abu Dhabi is # here is "is" stopword but this line should not be removed because line contain #Abu Dhabi is
Zaranj
of # this line contains just stop word, this line should be removed
on # this line contains just stop word, this line should be removed
Taloqan
Shnan of # here is "of" stopword but this line should not be removed because line contain #Shnan of
is # this line contains just stop word, this line should be removed
Shibirghn
Shahrak
from # this line contains just stop word, this line should be removed
I am using this code as an example:
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
example_sent = "This is a sample sentence, showing off the stop words filtration."
stop_words = set(stopwords.words('english'))
word_tokens = word_tokenize(example_sent)
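# Keep only the tokens that are not stop words.
filtered_sentence = [w for w in word_tokens if w not in stop_words]
# Equivalent explicit loop; it rebuilds the same list.
filtered_sentence = []
for w in word_tokens:
    if w not in stop_words:
        filtered_sentence.append(w)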
print(word_tokens)
print(filtered_sentence)
So what would the solution code be for the Myfile.txt described above?
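To make the intended behaviour concrete, here is a minimal sketch of the filtering rule, not a definitive solution. It assumes that a line should be dropped only when its whole text (after trimming whitespace and lower-casing) is exactly one stop word, and that the `# ...` notes in the sample above are explanations rather than file content. The names `drop_stopword_only_lines` and `filter_file` are hypothetical, and the small `stop_words` set stands in for `set(stopwords.words('english'))`.

```python
def drop_stopword_only_lines(lines, stop_words):
    """Return the lines that are not made up of exactly one stop word.

    A line is dropped only when its stripped, lower-cased text equals an
    entry of stop_words, so "Abu Dhabi is" is kept while "is" is removed.
    """
    kept = []
    for line in lines:
        if line.strip().lower() in stop_words:
            continue  # the whole line is a single stop word
        kept.append(line)
    return kept


def filter_file(src_path, dst_path, stop_words):
    """Hypothetical helper: copy src_path to dst_path without stop-word-only lines."""
    with open(src_path, encoding="utf-8") as src:
        kept = drop_stopword_only_lines(src.readlines(), stop_words)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(kept)


# Illustrative stand-in (an assumption); with NLTK installed and its corpus
# downloaded, set(stopwords.words('english')) could be used instead.
stop_words = {"of", "on", "is", "from"}

sample = [
    "Adh Dhayd", "Abu Dhabi is", "Zaranj", "of", "on", "Taloqan",
    "Shnan of", "is", "Shibirghn", "Shahrak", "from",
]
result = drop_stopword_only_lines(sample, stop_words)
print(result)
```

Comparing the whole line against the set, rather than tokenizing it and filtering word by word, is what keeps lines such as "Shnan of" intact.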
【Discussion】:
Tags: python python-3.x text nltk stop-words