How to combine multiple PDFs? Answers

【Question title】: How to combine multiple PDFs?
【Posted】: 2018-06-11 03:37:57
【Question】:

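I want to write a script that reads all the pdf files in a directory, copies the second page of each one, and writes them to a single output pdf (containing all the second pages).
I have already written some code, but it gives me a pdf with blank pages. This is really strange, because I have another piece of code that takes the second page of each pdf and makes a new pdf for each second page, and that code works. I think my problem may be related to addPage().
I am using the PyPDF2 library to work with pdf files.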

import pathlib
from PyPDF2 import PdfFileReader, PdfFileWriter

files_list = [file for file in pathlib.Path(__file__).parent.iterdir() if (file.is_file() and not str(file).endswith(".py"))]
total = len(files_list)    
writer = PdfFileWriter()    
for file in files_list:
    with open(file, 'rb') as infile:
        reader = PdfFileReader(infile)
        reader.decrypt("")
        writer.addPage(reader.getPage(1))            
with open('Output.pdf', 'wb') as outfile:
    writer.write(outfile)    
print('Done.')

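【Comments on the question】:

Why don't you use the code that does what you want? Because it doesn't merge the pages?
@PatrickArtner, that code doesn't merge. It just copies each old pdf with only its second page, but it doesn't merge them into one pdf.
Added an example (modified) from the other answer to my answer. Credit goes to the other answer below.
Not strictly a duplicate, but this particular case is covered by the answer at stackoverflow.com/questions/22795091/…

Tags: python python-3.x pdf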

【Answer 1】:

Have you tried the following code: https://www.randomhacks.co.uk/how-to-split-a-pdf-every-2-pages-using-python/

# Python 3 port: PyPDF2 replaces the obsolete pyPdf, and open() replaces file().
from PyPDF2 import PdfFileWriter, PdfFileReader
import glob

pdfs = glob.glob("*.pdf")

for pdf in pdfs:
    # Keep the input file open until every chunk has been written.
    with open(pdf, "rb") as infile:
        inputpdf = PdfFileReader(infile)

        # One output file per pair of pages.
        for i in range(inputpdf.numPages // 2):
            output = PdfFileWriter()
            output.addPage(inputpdf.getPage(i * 2))

            if i * 2 + 1 < inputpdf.numPages:
                output.addPage(inputpdf.getPage(i * 2 + 1))

            newname = pdf[:7] + "-" + str(i) + ".pdf"

            with open(newname, "wb") as outputStream:
                output.write(outputStream)

【Comments】:

Sorry, but your answer doesn't answer my question. You split a pdf every two pages; I asked how to make one new pdf from the second page of every pdf. Sorry.
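
For what the comment above actually asks for (one output pdf holding the second page of every input), here is a minimal sketch using the same legacy PyPDF2 classes as the question. A commonly reported cause of blank pages with PyPDF2 is that the input files are closed before writer.write() runs, since page content is read from the source stream at write time. Whether that is the cause in this particular case is an assumption, not something confirmed in the thread. The names find_pdfs and collect_second_pages are hypothetical helpers made up for this example, and the empty password passed to decrypt() is taken from the question's code.

```python
from contextlib import ExitStack
from pathlib import Path


def find_pdfs(directory):
    """Return the PDF files directly inside directory, sorted by name."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def collect_second_pages(directory, output_path):
    """Write the second page of every PDF in directory to output_path."""
    # Imported lazily so find_pdfs() can be used without PyPDF2 installed.
    # Assumes the legacy PyPDF2 API (PdfFileReader / PdfFileWriter).
    from PyPDF2 import PdfFileReader, PdfFileWriter

    output_path = Path(output_path)
    writer = PdfFileWriter()

    # ExitStack keeps every input file open until the output is written.
    with ExitStack() as stack:
        for pdf_path in find_pdfs(directory):
            if pdf_path.resolve() == output_path.resolve():
                continue  # do not read a previous output back in
            infile = stack.enter_context(open(pdf_path, "rb"))
            reader = PdfFileReader(infile)
            if reader.isEncrypted:
                # Assumption: empty user password, as in the question.
                reader.decrypt("")
            if reader.getNumPages() < 2:
                continue  # this file has no second page
            writer.addPage(reader.getPage(1))  # page indices are zero-based

        with open(output_path, "wb") as outfile:
            writer.write(outfile)
```

Calling collect_second_pages(".", "Output.pdf") would mirror the question's setup. Files with fewer than two pages are skipped rather than raising an error, which is a design choice of this sketch and not something the question specifies.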

【Answer 2】:

Take a look at PdfFileMerger.append: it lets you merge pages from several pdfs into one result file.

append(fileobj, bookmark=None, pages=None, import_bookmarks=True)

Identical to the merge() method, but assumes you want to concatenate all pages onto the end of the file instead of specifying a position.

Parameters:
fileobj                  A File Object or an object that supports the standard read
                         and seek methods similar to a File Object. Could also be a
                         string representing a path to a PDF file.
bookmark (str)           Optionally, you may specify a bookmark to be applied at the
                         beginning of the included file by supplying the text of
                         the bookmark.
pages                    Can be a Page Range or a (start, stop[, step]) tuple to merge
                         only the specified range of pages from the source document into
                         the output document.
import_bookmarks (bool)  You may prevent the source document's bookmarks
                         from being imported by specifying this as False.

This seems better suited to the task you are currently doing with PdfFileWriter.

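from PyPDF2 import PdfFileMerger, PdfFileReader

# ...

merger = PdfFileMerger()

# open() replaces the Python 2 built-in file().
# pages must be a zero-based (start, stop[, step]) tuple; (1, 2) selects only the second page.
merger.append(PdfFileReader(open(filename1, 'rb')), None, (1, 2))
merger.append(PdfFileReader(open(filename2, 'rb')), None, (1, 2))

merger.write("document-output.pdf")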

Example adapted from this answer: https://stackoverflow.com/a/29871560/7505395
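
The pages tuple described in the parameter list follows range() semantics: indices are zero-based and stop is exclusive. A rough sketch of how a 1-based page number could be turned into that tuple and applied to a list of files follows. page_to_range and merge_single_page are hypothetical names introduced here, and the sketch assumes the legacy PyPDF2 PdfFileMerger API and unencrypted inputs.

```python
def page_to_range(page_number):
    """Convert a 1-based page number to a zero-based (start, stop) tuple."""
    if page_number < 1:
        raise ValueError("page_number must be 1 or greater")
    return (page_number - 1, page_number)


def merge_single_page(pdf_paths, output_path, page_number=2):
    """Append one page from each input PDF to a single output file."""
    # Imported lazily so page_to_range() can be used without PyPDF2 installed.
    # Assumes the legacy PyPDF2 API (PdfFileMerger).
    from PyPDF2 import PdfFileMerger

    merger = PdfFileMerger()
    for path in pdf_paths:
        # A path string is accepted as fileobj; the merger opens the file itself.
        merger.append(str(path), pages=page_to_range(page_number))
    merger.write(str(output_path))
    merger.close()
```

For example, merge_single_page(["a.pdf", "b.pdf"], "document-output.pdf") would request pages=(1, 2) from each input. An input with fewer pages than page_number makes the append fail, and encrypted inputs are not handled here.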

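【Comments】:

Your answer seems useful, but I have two problems. 1) The file() command doesn't work on my computer, but I fixed that with these 3 lines: file = PdfFileReader(infile), file.decrypt("") and merger.append(file, None, [2]). 2) If the pages argument is [2], I get TypeError: "pages" must be a tuple of (start, stop[, step]). If the argument is a tuple range, for example (0,2,1), I get PyPDF2.utils.PdfReadError: file has not been decrypted. But if the argument is None, "it works": it appends all the pages of the pdf... but at least it doesn't raise an error.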