【Question title】:Problem closing files when writing with Python pyPdf: ValueError: I/O operation on closed file
【Posted】:2011-10-10 01:25:26
【Question description】:

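I can't figure this out. This function (part of a class that scrapes websites to PDF) is supposed to use pyPdf to merge the PDF files generated from the web pages.

Here is the method code: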

def mergePdf(self,mainname,inputlist=0):
    """merging the pdf pages
    getting an inputlist to merge or defaults to the class instance self.pdftomerge list"""
    from pyPdf import PdfFileWriter, PdfFileReader
    self._mergelist = inputlist or self.pdftomerge
    self.pdfoutput = PdfFileWriter()

    for name in self._mergelist:
        print "merging %s into main pdf file: %s" % (name,mainname)
        self._filestream = file(name,"rb")
        self.pdfinput = PdfFileReader(self._filestream)
        for p in self.pdfinput.pages:
            self.pdfoutput.addPage(p)
        self._filestream.close()

    self._pdfstream = file(mainname,"wb")
    self._pdfstream.open()
    self.pdfoutput.write(self._pdfstream)
    self._pdfstream.close()

I keep getting this error:

  File "c:\tmp\easy_install-iik9vj\pyPdf-1.13-py2.7-win32.egg.tmp\pyPdf\pdf.py", line 264, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "c:\tmp\easy_install-iik9vj\pyPdf-1.13-py2.7-win32.egg.tmp\pyPdf\pdf.py", line 339, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "c:\tmp\easy_install-iik9vj\pyPdf-1.13-py2.7-win32.egg.tmp\pyPdf\pdf.py", line 315, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "c:\tmp\easy_install-iik9vj\pyPdf-1.13-py2.7-win32.egg.tmp\pyPdf\pdf.py", line 339, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "c:\tmp\easy_install-iik9vj\pyPdf-1.13-py2.7-win32.egg.tmp\pyPdf\pdf.py", line 315, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "c:\tmp\easy_install-iik9vj\pyPdf-1.13-py2.7-win32.egg.tmp\pyPdf\pdf.py", line 324, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "c:\tmp\easy_install-iik9vj\pyPdf-1.13-py2.7-win32.egg.tmp\pyPdf\pdf.py", line 339, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "c:\tmp\easy_install-iik9vj\pyPdf-1.13-py2.7-win32.egg.tmp\pyPdf\pdf.py", line 315, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "c:\tmp\easy_install-iik9vj\pyPdf-1.13-py2.7-win32.egg.tmp\pyPdf\pdf.py", line 345, in _sweepIndirectReferences
    newobj = data.pdf.getObject(data)
  File "c:\tmp\easy_install-iik9vj\pyPdf-1.13-py2.7-win32.egg.tmp\pyPdf\pdf.py", line 645, in getObject
    self.stream.seek(start, 0)
ValueError: I/O operation on closed file

But when I check the state of self._pdfstream, I get:

<open file 'c:\python27\learn\dive.pdf', mode 'wb' at 0x013B2020>

What am I doing wrong?

I'd be glad for any help.

【Question discussion】:

    Tags: python pypdf


    【Solution 1】:

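    OK, I found your problem. You were right to call file(). Don't call open() on the resulting file object at all: file objects have no such method.

    Your problem is that the input files still need to be open when you call self.pdfoutput.write(self._pdfstream), so you need to remove the line self._filestream.close() from the loop.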
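As a rough sketch of that advice, the merge could keep every input stream open until the writer has finished and only then close them. The function name `merge_pdfs` and its parameters are hypothetical, not part of the asker's class; the pyPdf calls (`PdfFileReader`, `PdfFileWriter`, `.pages`, `addPage`, `write`) are the ones already used in the question, and the built-in `open()` is used here in place of `file()`.

```python
def merge_pdfs(output_name, input_names):
    """Merge the pages of input_names into output_name (hypothetical helper)."""
    # Imported inside the function, as in the question's method.
    from pyPdf import PdfFileWriter, PdfFileReader

    writer = PdfFileWriter()
    streams = []  # input files must stay open until write() has finished
    try:
        for name in input_names:
            stream = open(name, "rb")
            streams.append(stream)
            for page in PdfFileReader(stream).pages:
                writer.addPage(page)

        out = open(output_name, "wb")
        try:
            # Page data is pulled from the still-open input streams here.
            writer.write(out)
        finally:
            out.close()
    finally:
        # Only now is it safe to close the inputs.
        for stream in streams:
            stream.close()
```

Going by the last comment in this thread, the output name should also differ from every input name, since opening the output with "wb" would truncate an input file that is still being read.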

    Edit: the following script triggers the problem. The first write succeeds, the second fails.

    from pyPdf import PdfFileReader as PfR, PdfFileWriter as PfW
    
    input_filename = 'in.PDF' # replace with a real file
    output_filename = 'out.PDF' # something that doesn't exist
    
    infile = file(input_filename, 'rb')
    reader = PfR(infile)
    writer = PfW()
    
    writer.addPage(reader.getPage(0))
    outfile = file(output_filename, 'wb')
    writer.write(outfile)
    print "First Write Successful!"
    infile.close()
    outfile.close()
    
    infile = file(input_filename, 'rb')
    reader = PfR(infile)
    writer = PfW()
    
    writer.addPage(reader.getPage(0))
    outfile = file(output_filename, 'wb')
    infile.close() # BAD!
    
    writer.write(outfile)
    print "You'll get a ValueError before this line"
    outfile.close()
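The same failure can be reproduced without pyPdf. The sketch below is not pyPdf's code: `LazyPage` and `write_pages` are hypothetical stand-ins that only mimic what the traceback suggests, namely that the page data is fetched from the input stream (via `seek`) at write time rather than when the page is added.

```python
import io


class LazyPage(object):
    """Hypothetical stand-in for a page that reads its data only on demand."""

    def __init__(self, stream, offset, length):
        self.stream = stream
        self.offset = offset
        self.length = length

    def data(self):
        # Same kind of call that ends the traceback: a seek on the input stream.
        self.stream.seek(self.offset, 0)
        return self.stream.read(self.length)


def write_pages(pages, out):
    """Copy each page's bytes to out; the input streams are touched only now."""
    for page in pages:
        out.write(page.data())


source = io.BytesIO(b"page-one")
page = LazyPage(source, 0, 8)

# Input still open: the write succeeds.
first = io.BytesIO()
write_pages([page], first)

# Input closed before the write: seek() raises ValueError.
source.close()
try:
    write_pages([page], io.BytesIO())
    error = None
except ValueError as exc:
    error = str(exc)
```

This is why the output stream looks perfectly open when inspected: the closed file named in the error is the input, not the output.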
    

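    【Discussion】:

    • Hey agf, as I wrote, my problem is with self._pdfstream. I changed it to open, but that didn't help. When I try to write from pyPdf I still get the error, and when I check the object it still shows as an open file. wtf?!
    • @alonisser You're right, calling open() was wrong! But your problem isn't with self._pdfstream, it's with the input streams. I've edited my answer.
    • That seems to have solved the problem, thanks a lot! But now there's another problem! I get the same long traceback with a different ending: line 693, in readObjectHeader return int(idnum), int(generation) ValueError: invalid literal for int() with base 10: '' Any ideas?
    • Sounds like one of your PDFs has a field that should be an integer but isn't. Beyond that, you'll probably have to dig into the pyPdf source code to find out what's wrong.
    • OK, I solved that one. It seems the problem was calling pyPdf to add pages to an already existing file. Changing the output file's name to something like "output.pdf" solved it. Thanks again @agf for all the help.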