Python - name for file not correct - loop error: answers

【Question title】: Python - name for file not correct - loop error
【Posted】: 2018-05-10 13:49:43
【Question】:

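I have a script that reads HTML files and extracts the relevant lines from each one. However, I have a problem printing the file names. The files are named source1.html, source2.html and source3.html, but the script prints source2.html, source3.html and source4.html instead.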

from bs4 import BeautifulSoup
import re
import os.path

n = 1
filename = "source"+str(n)+".html"
savefile = open('OUTPUT.csv', 'w')

while os.path.isfile(filename):
    n = n+1
    strjpgs = "Extracted Layers: \n \n"
    file = open(filename, "r")
    filename = "source"+str(n)+".html"


    soup = BeautifulSoup (file, "html.parser")

    thedata = soup.find("div", class_="cplayer")
    strdata = str(thedata)

    DoRegEx = re.compile('/([^/]+)\.jpg')
    jpgs = DoRegEx.findall(strdata)
    strjpgs = strjpgs + "\n".join(jpgs) + "\n \n"
    savefile.write(filename + '\n')
    savefile.write(strjpgs)

    print(filename)
    print(strjpgs)

savefile.close()
print "done"

【Comments on the question】:

Increment n (and update filename) at the end of the loop instead of at the start.

Tags: python python-2.7 loops

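【Solution 1】:

You define n as 1 and then immediately increment it to 2 inside the while loop. By the time you reach print(filename), n is 2 and filename has already been changed to "source2.html". Either move the print or move the increment.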
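
To see the effect without touching the disk, the two orderings can be replayed on a plain set of names. This is only an illustrative sketch: `trace_buggy`, `trace_fixed` and the `existing` set are names made up here, and set membership stands in for `os.path.isfile`.

```python
def trace_buggy(existing):
    """Replay the question's order: increment first, report the name last."""
    seen = []
    n = 1
    filename = "source" + str(n) + ".html"
    while filename in existing:          # stand-in for os.path.isfile(filename)
        n = n + 1
        opened = filename                # the file that is actually read
        filename = "source" + str(n) + ".html"
        seen.append((opened, filename))  # (name read, name printed)
    return seen


def trace_fixed(existing):
    """Report the name first, then advance n and filename together."""
    seen = []
    n = 1
    filename = "source" + str(n) + ".html"
    while filename in existing:
        opened = filename
        seen.append((opened, filename))  # both refer to the same file
        n = n + 1
        filename = "source" + str(n) + ".html"
    return seen


files = {"source1.html", "source2.html", "source3.html"}
print(trace_buggy(files))
print(trace_fixed(files))
```

In the first trace every printed name is one ahead of the file that was read, which matches the source2/source3/source4 output described in the question.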

【Comments】:

That's it. Just open the file before doing n += 1 (or n = n+1).

【Solution 2】:

You just need to move the print statement to the start of the loop and the increment to the end, ready for the next iteration:

while os.path.isfile(filename):
    # Report the name while it still refers to the file being processed.
    print(filename)

    strjpgs = "Extracted Layers: \n \n"
    file = open(filename, "r")
    soup = BeautifulSoup(file, "html.parser")

    thedata = soup.find("div", class_="cplayer")
    strdata = str(thedata)

    DoRegEx = re.compile('/([^/]+)\.jpg')
    jpgs = DoRegEx.findall(strdata)
    strjpgs = strjpgs + "\n".join(jpgs) + "\n \n"
    savefile.write(filename + '\n')
    savefile.write(strjpgs)

    print(strjpgs)

    # Advance to the next file only after the current one is fully handled.
    n = n+1
    filename = "source"+str(n)+".html"

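【Comments】:

Doesn't he need to move his open() call as well?
I completely understand your reasoning; it will solve the problem and should print what he wants. But I believe the OP may be planning to collect more information, and if the value of n is wrong and you merely print the correct value before incrementing it, other functionality will run into problems.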
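
One way to address the concern in the comment above is to derive the file name from n in exactly one place, so the two can never drift apart. The following is a minimal sketch, not the OP's code: `iter_source_files` and its `template`, `start` and `exists` parameters are hypothetical names, and `exists` defaults to `os.path.isfile` but can be swapped for any predicate.

```python
import itertools
import os.path


def iter_source_files(template="source{}.html", start=1, exists=os.path.isfile):
    """Yield (n, filename) pairs until the first missing file."""
    for n in itertools.count(start):
        filename = template.format(n)   # n and filename always agree
        if not exists(filename):
            break
        yield n, filename


# Demonstration with a set instead of the real file system.
present = {"source1.html", "source2.html", "source3.html"}
found = list(iter_source_files(exists=present.__contains__))
print(found)
```

Any later code that needs n (for example to label output rows) then receives the same value that produced the file name.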

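【Solution 3】:

You made a logic error: you increment the variable n before you use it. The simplest fix is to initialise the variable to 0 instead of 1. The next mistake is that you never close the HTML file, so use with open(filename, "r") as file: which closes the file automatically when it goes out of scope and is more Pythonic.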

from bs4 import BeautifulSoup
import re
import os.path

n = 1
filename = "source"+str(n)+".html"
savefile = open('OUTPUT.csv', 'w')

if os.path.isfile(filename):

    while True:
        # Build the name first so the file opened and the name reported match.
        filename = "source"+str(n)+".html"
        strjpgs = "Extracted Layers: \n \n"
        with open(filename, "r") as file:

            # parsing things...

            savefile.write(filename + '\n')
            savefile.write(strjpgs)

            print(filename)
            print(strjpgs)

        # The last file name is hard-coded, so source1..source3 must all exist.
        if filename == "source3.html":
            break
        else:
            n+=1

savefile.close()
print ("done")
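
The same idea can be sketched without a hard-coded last file name, with both the output file and each HTML file managed by `with`. This is an assumption-laden illustration for Python 3, not the answer's code: `extract_layers`, `JPG_NAME` and the sample HTML are invented here, and the BeautifulSoup step is deliberately skipped so the block runs with the standard library only (the regex from the question is applied to the raw text instead).

```python
import os
import re
import tempfile

# Same pattern as in the question, written as a raw string.
JPG_NAME = re.compile(r'/([^/]+)\.jpg')


def extract_layers(directory, output_name="OUTPUT.csv"):
    """Process source1.html, source2.html, ... until one is missing."""
    processed = []
    n = 1
    with open(os.path.join(directory, output_name), "w") as savefile:
        while True:
            filename = "source" + str(n) + ".html"
            path = os.path.join(directory, filename)
            if not os.path.isfile(path):
                break
            # The HTML file is closed as soon as this block ends.
            with open(path, "r") as html_file:
                text = html_file.read()
            jpgs = JPG_NAME.findall(text)
            savefile.write(filename + "\n")
            savefile.write("\n".join(jpgs) + "\n")
            processed.append((filename, jpgs))
            n += 1   # advance only after the current file is done
    return processed


# Small demonstration on made-up files in a temporary directory.
with tempfile.TemporaryDirectory() as demo_dir:
    samples = {
        "source1.html": '<div class="cplayer"><img src="/img/a.jpg"><img src="/img/b.jpg"></div>',
        "source2.html": '<div class="cplayer"><img src="/img/c.jpg"></div>',
    }
    for name, html in samples.items():
        with open(os.path.join(demo_dir, name), "w") as handle:
            handle.write(html)
    result = extract_layers(demo_dir)
    with open(os.path.join(demo_dir, "OUTPUT.csv")) as handle:
        report = handle.read()

print(result)
```

Because the name is built at the top of each iteration and n is incremented at the bottom, the name written to the report always belongs to the file that was just read.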

【Comments】: