How to merge PDF files using Python without storing them in a local directory: answers

【Question title】: How to merge pdf files using python without storing them into the local directory
【Posted】: 2023-02-07 17:23:48
【Question description】:

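I have some PDF files uploaded to a remote server. I have the URL of each file, and the PDFs can be downloaded by visiting those URLs.

My question is:

I want to merge all of the PDF files into a single file, but without storing the individual files in a local directory. How can I do that with the Python module "PyPDF2"?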

【Question comments】:

Tags: python django pycharm pypdf pdfmerger

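【Solution 1】:

Please move to pypdf. It is essentially the same as PyPDF2, but development will continue there (I am the maintainer of both projects).

Your question is answered in the docs:

https://pypdf.readthedocs.io/en/latest/user/streaming-data.html

Instead of writing to a file, you write to an io.BytesIO stream: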

from io import BytesIO

from pypdf import PdfWriter

writer = PdfWriter()
# ... do what you want to do with the PDFs

with BytesIO() as bytes_stream:
    writer.write(bytes_stream)
    bytes_stream.seek(0)
    data = bytes_stream.read()  # that is now the "bytes" representation
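
To connect the stream idea to the actual merge, here is a minimal sketch that takes PDFs already held in memory as bytes and returns the merged document as bytes. The function name `merge_pdf_bytes` and its optional `writer` parameter are hypothetical, chosen for illustration. It assumes a recent pypdf release in which `PdfWriter.append` accepts a file-like object.

```python
from io import BytesIO
from typing import Iterable


def merge_pdf_bytes(pdf_blobs: Iterable[bytes], writer=None) -> bytes:
    """Merge PDFs given as bytes and return the merged PDF as bytes."""
    if writer is None:
        # Third-party dependency, imported lazily (assumed installed: pip install pypdf).
        from pypdf import PdfWriter

        writer = PdfWriter()

    for blob in pdf_blobs:
        # Wrap each document in a file-like object; nothing is written to disk.
        writer.append(BytesIO(blob))

    with BytesIO() as merged:
        writer.write(merged)
        # getvalue() returns the whole buffer regardless of the stream position.
        return merged.getvalue()
```

Since the question is tagged django, one plausible use is to hand the returned bytes straight to a response, for example `HttpResponse(data, content_type="application/pdf")`, so the merged file never touches the server's disk. Passing your own `writer` lets you append to a `PdfWriter` you have already configured.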

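【Comments】:

【Solution 2】:

To merge PDF files without saving them locally, you can download each file's content with the requests library and then pass that content to the PdfFileReader class from the PyPDF2 library.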

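import requests
import PyPDF2
from io import BytesIO

def merge_pdfs_remotely(urls, output_filename):
    # Create a list of file-like objects from the URLs (held in memory only)
    file_streams = [BytesIO(requests.get(url).content) for url in urls]

    # Create the PDF merger object
    # (PdfFileMerger and PdfFileReader only work in PyPDF2 releases before 3.0)
    merger = PyPDF2.PdfFileMerger()

    # Add each PDF file to the merger
    for stream in file_streams:
        merger.append(PyPDF2.PdfFileReader(stream))

    # Assumed final step, since output_filename is otherwise unused:
    # write the merged PDF (pass a BytesIO to merger.write() to stay in memory)
    with open(output_filename, "wb") as output_file:
        merger.write(output_file)
    merger.close()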
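
The download step above accepts whatever the server returns, so an HTML error page would only surface later as a confusing parse error. As a hedged sketch, the helper below builds the in-memory streams and rejects any payload that does not start with the `%PDF-` header. The names `fetch_bytes` and `pdf_streams_from_urls` are hypothetical, and `urllib.request` is used here only as a standard-library stand-in for `requests.get(url).content`. The header check is deliberately strict and is an assumption about the files being well formed.

```python
from io import BytesIO
from typing import Callable, Iterable, List
from urllib.request import urlopen


def fetch_bytes(url: str, timeout: float = 30.0) -> bytes:
    """Download a URL fully into memory; urlopen raises on HTTP 4xx/5xx."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def pdf_streams_from_urls(
    urls: Iterable[str],
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> List[BytesIO]:
    """Return one in-memory stream per URL, rejecting payloads that are not PDFs."""
    streams = []
    for url in urls:
        payload = fetch(url)
        # A well-formed PDF begins with the "%PDF-" marker; an HTML error page does not.
        if not payload.startswith(b"%PDF-"):
            raise ValueError(f"{url} did not return a PDF")
        streams.append(BytesIO(payload))
    return streams
```

The `fetch` parameter is there so the download can be swapped out, for instance with `lambda url: requests.get(url, timeout=30).content`, or with a stub in tests. The returned streams can be passed to the merger one by one in place of `file_streams`.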

【Comments】:

PdfFileMerger and PdfFileReader are deprecated.