【Posted】: 2020-08-14 00:49:58
【Question】:
I am trying to merge PDFs downloaded from Google Drive, but I get this error:
ValueError: invalid literal for int() with base 10: b'F-1.4'
This does not happen when I merge PDFs generated with Keynote.
The full error looks like this:
Traceback (most recent call last):
  File "weekly_meeting.py", line 36, in <module>
    file_path = sort_pdf(path)
  File "weekly_meeting.py", line 15, in sort_pdf
    pdf_merger.append(file)
  File "/usr/local/lib/python3.6/site-packages/PyPDF2/merger.py", line 203, in append
    self.merge(len(self.pages), fileobj, bookmark, pages, import_bookmarks)
  File "/usr/local/lib/python3.6/site-packages/PyPDF2/merger.py", line 151, in merge
    outline = pdfr.getOutlines()
  File "/usr/local/lib/python3.6/site-packages/PyPDF2/pdf.py", line 1346, in getOutlines
    lines = catalog["/Outlines"]
  File "/usr/local/lib/python3.6/site-packages/PyPDF2/generic.py", line 516, in __getitem__
    return dict.__getitem__(self, key).getObject()
  File "/usr/local/lib/python3.6/site-packages/PyPDF2/generic.py", line 178, in getObject
    return self.pdf.getObject(self).getObject()
  File "/usr/local/lib/python3.6/site-packages/PyPDF2/pdf.py", line 1599, in getObject
    idnum, generation = self.readObjectHeader(self.stream)
  File "/usr/local/lib/python3.6/site-packages/PyPDF2/pdf.py", line 1667, in readObjectHeader
    return int(idnum), int(generation)
ValueError: invalid literal for int() with base 10: b'F-1.4'
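What I have tried:
- Opening the PDF files: they are valid, working PDFs
- Re-exporting them as PDF with Preview: they still produce the error
- Other PDFs: they seem to work fine
Here is my code. The problem seems to be pdf_merger.append(file):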
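import glob
import os

from PyPDF2 import PdfFileMerger


def sort_pdf(path):
    pdf_merger = PdfFileMerger()
    if os.path.isdir(path):
        head, file_name = os.path.split(path)
        os.chdir(path)
        # Merge order: files are grouped by filename prefix.
        chronology = ["OVERVIEW", "CUSTOMER", "PROJECT", "PERSONAL"]
        for prefix in chronology:
            for file in glob.glob(prefix + "*.pdf"):
                pdf_merger.append(file)  # the ValueError is raised here
        # The merged file is named after the directory and written into it.
        file_path = path + "/" + file_name + ".pdf"
        with open(file_path, 'wb') as result:
            pdf_merger.write(result)
        return file_path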
I expect the output to be a sorted, combined PDF, which I have already achieved with other documents.
【Comments】:
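-
It looks like your input PDF is corrupted. This b'F-1.4' should be b'%PDF-1.4'.
-
I suppose this is something I can fix programmatically, right? Check the header and repair it before I try to sort the PDFs? Any idea how to change the file header?
-
"Can fix programmatically, right?": No. Verify that you can open the PDF with a PDF reader. Then open it in a text editor, e.g. Leafpad, and check that the first characters are '%PDF-1.4'.
-
I solved it by simply writing the header: pdf_reader._header = b_("%PDF-1.4")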
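
The header check suggested in the comments can also be done in code rather than in a text editor. The following is a minimal sketch using only the standard library. The names `PDF_MAGIC`, `has_pdf_header`, and `find_suspect_pdfs` are hypothetical helpers and not part of PyPDF2. It only tests whether a file begins with the bytes `%PDF-`; a file that passes may still fail to parse for other reasons, for example a broken cross-reference table.

```python
import glob
import os

# Leading bytes of every PDF header, e.g. b"%PDF-1.4" (hypothetical constant name).
PDF_MAGIC = b"%PDF-"


def has_pdf_header(path):
    """Return True if the file at `path` begins with the bytes b'%PDF-'."""
    with open(path, "rb") as handle:
        return handle.read(len(PDF_MAGIC)) == PDF_MAGIC


def find_suspect_pdfs(directory):
    """Return the sorted paths of *.pdf files in `directory` that lack the header."""
    pattern = os.path.join(directory, "*.pdf")
    return sorted(p for p in glob.glob(pattern) if not has_pdf_header(p))
```

Calling `find_suspect_pdfs(path)` before the merge loop would list the files worth inspecting by hand, assuming the header really is what is damaged.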