【Posted】:2021-12-31 06:48:16
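【Question】:
I'm writing a program that builds a word cloud. I want a list of words with the punctuation and common words removed. I strip the punctuation with the function removepunc, and that works fine. Now I'm writing a second function to remove the common words. (I didn't reuse the previous character-replacement logic, because it removed the letter i everywhere in the text along with the pronoun I.) After converting the file's contents to a list, I get IndexError: list index out of range.
Code: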
def removepunc(z):
    test_str=z
    punc = '''!()-[]{};:'""\,<>./?@#$%^&*_~'''
    for ele in test_str:
        if ele in punc:
            test_str = test_str.replace(ele, "")
    return test_str
def removebad(f):
    print(type(f))
    z=[]
    badword2 = ["the", "a", "to", "if", "is", "it", "of", "and", "or", "an", "as", "i", "me",
                "my","we", "our", "ours", "you", "your", "yours", "he", "she", "him", "his", "her",
                "hers", "its", "they","them","their", "what", "which", "who", "whom", "this", "that",
                "am", "are", "was", "were", "be", "been","being","have", "has", "had", "do", "does",
                "did", "but", "at", "by", "with", "from", "here", "when", "where","how","all", "any",
                "both", "each", "few", "more", "some", "such", "no", "nor", "too", "very", "can",
                "will","just"]
    for i in range (len(f)-1):
        if f[i] in badword2:
            x=f.pop(i)
            z.append(x)
        else:
            continue
    return f
file=open("openfile.txt")
a=file.read()
a=a.lower()
unqword=removepunc(a)
ab=unqword.split()
print(type(ab))
unqword1=removebad(ab)
print(unqword1)
Output:
C:\Users\Nitin\PycharmProjects\pythonProject1\venv\Scripts\python.exe C:/Users/Nitin/PycharmProjects/pythonProject1/prjt.py
<class 'list'>
<class 'list'>
Traceback (most recent call last):
  File "C:/Users/Nitin/PycharmProjects/pythonProject1/prjt.py", line 29, in <module>
    unqword1=removebad(ab)
  File "C:/Users/Nitin/PycharmProjects/pythonProject1/prjt.py", line 14, in removebad
    if f[i] in badword2:
IndexError: list index out of range
Process finished with exit code 1
I haven't written the word-cloud logic yet; I'll do that once I get past this error.
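The failure can probably be reproduced on a much smaller input. The sketch below is not the asker's program: the four-word list and the short stop-word list are made up for illustration, and the try/except is only there so the result can be inspected instead of crashing.

```python
# Hypothetical four-word list and stop-word list, used only to reproduce the error.
words = ["the", "cat", "is", "here"]
stop_words = ["the", "is", "here"]

visited = []  # every index the loop tried to read
try:
    # range() is evaluated once: len(words) is 4 here, so it yields 0, 1, 2.
    for i in range(len(words) - 1):
        visited.append(i)
        if words[i] in stop_words:
            words.pop(i)  # the list shrinks, but the range keeps its old length
except IndexError as exc:
    error = exc
else:
    error = None

print(visited)      # indices the loop attempted
print(words)        # what was left in the list when the loop stopped
print(repr(error))  # the exception that ended the loop, if any
```

After two pops the list holds only two items, so reading index 2 fails. Note also that "here" is a stop word but is never removed: each pop shifts the remaining items left, so some items are skipped.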
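【Comments】:
- You can't safely modify a list while you are iterating over it.
- The range object is created once, when the for loop starts, and it covers the length the list has at that moment. When you pop items, the list gets shorter, but the range doesn't know that. The best plan here is to build two brand-new lists: one for the words to keep and one for the words to throw away. Then you don't even need range(len()); you can use for item in f:.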
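A minimal sketch of the two-list approach suggested above, under a few assumptions: the function name split_stop_words and the sample sentence are made up, the stop-word list is a shortened version of badword2, and it is stored as a set (a swap from the question's list) so membership tests are fast.

```python
# Assumed stop words: a shortened version of badword2 from the question,
# stored as a set rather than a list.
STOP_WORDS = {"the", "a", "to", "is", "it", "of", "and", "i", "me", "my"}


def split_stop_words(words, stop_words=STOP_WORDS):
    """Return (kept, removed) as two new lists; the input list is not modified."""
    kept = []
    removed = []
    for word in words:  # iterate over the items directly, no range(len(...))
        if word in stop_words:
            removed.append(word)
        else:
            kept.append(word)
    return kept, removed


# Hypothetical sample text, already lower-cased and free of punctuation.
sample = "i think the indigo sky is a sight to see".split()
kept, removed = split_stop_words(sample)
print(kept)
print(removed)
```

Because the comparison is made on whole words, the pronoun "i" is dropped while the letter i inside "indigo" and "think" is left alone, which was the concern raised in the question.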
Tags: python python-3.x list