Trying to loop through multiple PDF files and save the individual pages of those PDFs as images

【Question title】: Trying to loop through multiple PDF files and save the individual pages of those PDFs as images
【Posted】: 2022-01-20 10:18:23
【Question】:

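I'm working on a Python project that requires me to go through several PDFs, one after another, stored in a folder named sample/ in my current directory, and save the individual pages of those PDFs as images in another directory named convert_images/. Can anyone help? The PDF files all have random names, but every one of them has the ".pdf" extension.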

【Question comments】:

这能回答你的问题吗？ Extract a page from a pdf as a jpeg

Tags: python python-3.x pdf2image

【Answer 1】:

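You can do this with pdf2image. It relies on the poppler utilities (pdftoppm), which have to be installed separately from the Python package:

    pip install pdf2image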

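    from pdf2image import convert_from_path

    # render every page of the PDF at 500 dpi; returns a list of PIL images
    pages = convert_from_path('path/to/your_doc.pdf', 500)
    for i, page in enumerate(pages, start=1):
        # one file per page; saving to 'out.jpg' each time would overwrite it
        page.save(f'out_{i}.jpg', 'JPEG')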
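That snippet handles a single file, while the question asks for every PDF in sample/ to end up as images in convert_images/. A minimal sketch of that loop with pdf2image is below. The function names convert_folder and output_name, the 200 dpi default and the "<name>_page<N>.jpg" naming scheme are hypothetical choices for illustration, not part of pdf2image. The pdf2image import sits inside the function so the naming helper can be used without poppler installed.

```python
from pathlib import Path


def output_name(pdf_path: Path, page_number: int) -> str:
    """Hypothetical naming scheme: 'report.pdf', page 1 -> 'report_page1.jpg'."""
    return f"{pdf_path.stem}_page{page_number}.jpg"


def convert_folder(src_dir: str = "sample", out_dir: str = "convert_images", dpi: int = 200) -> int:
    """Render every page of every .pdf in src_dir to a JPEG in out_dir.

    Returns the number of images written.
    """
    # Imported here so output_name() works even where pdf2image/poppler is missing.
    from pdf2image import convert_from_path

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create convert_images/ if needed
    written = 0
    # sorted() processes the randomly named files in a stable, alphabetical order
    for pdf_path in sorted(Path(src_dir).glob("*.pdf")):
        pages = convert_from_path(str(pdf_path), dpi=dpi)  # list of PIL images
        for number, page in enumerate(pages, start=1):
            page.save(str(out / output_name(pdf_path, number)), "JPEG")
            written += 1
    return written
```

Called as convert_folder() from the directory that contains sample/, it would write files such as convert_images/report_page1.jpg for a hypothetical sample/report.pdf. Note that convert_from_path keeps all pages of one PDF in memory; for very long documents its first_page and last_page arguments can be used to render in batches.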

Or, using pypdfium2:

    import pypdfium2 as pdfium

    pdffile = 'path/to/your_doc.pdf'
    pdf = pdfium.PdfDocument(pdffile)  # pypdfium2 4.x API

    # render all pages; scale=1 corresponds to 72 dpi
    for i in range(len(pdf)):
        image = pdf[i].render(scale=2).to_pil()
        image.save(f'output_{i}.jpg')

    # render a single page (in this case: the first one)
    image = pdf[0].render(scale=2).to_pil()
    image.save('output.jpg')
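The same sample/ to convert_images/ loop with pypdfium2 could look like the sketch below, assuming pypdfium2 4.x (PdfDocument, PdfPage.render, PdfBitmap.to_pil). The names page_filename and convert_folder_pdfium, the scale of 2 and the zero-padded, 1-based file names are hypothetical choices: the padding keeps report_002.jpg ahead of report_010.jpg when files are listed alphabetically.

```python
from pathlib import Path


def page_filename(pdf_path: Path, index: int, digits: int = 3) -> str:
    """Hypothetical naming: 0-based page index 0 of 'report.pdf' -> 'report_001.jpg'."""
    return f"{pdf_path.stem}_{index + 1:0{digits}d}.jpg"


def convert_folder_pdfium(src_dir: str = "sample", out_dir: str = "convert_images",
                          scale: float = 2.0) -> int:
    """Render every page of every .pdf in src_dir to a JPEG in out_dir; returns the count."""
    # Imported here so page_filename() works without pypdfium2 installed (assumes 4.x).
    import pypdfium2 as pdfium

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = 0
    for pdf_path in sorted(Path(src_dir).glob("*.pdf")):
        pdf = pdfium.PdfDocument(str(pdf_path))
        try:
            for index in range(len(pdf)):
                # scale=1 is 72 dpi, so scale=2 renders at roughly 144 dpi
                image = pdf[index].render(scale=scale).to_pil()
                image.save(str(out / page_filename(pdf_path, index)))  # format from ".jpg"
                written += 1
        finally:
            pdf.close()  # release each document before opening the next one
    return written
```

The try/finally closes each document once its pages are saved, so a folder with many PDFs does not keep them all open at the same time.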

【Comments】: