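【Question title】: Any way to optimize a PDF programmatically?
【Posted】: 2014-06-09 01:08:02
【Question】:

I want to programmatically optimize a series of PDF files (that is, do what "Save As Reduced Size PDF" does in Acrobat Pro 10). If possible, I would prefer to do this from Python 2.7.5; failing that, from Word VBA; my last preference is some other programming mechanism.

Any ideas?

【Comments】:

  • For my needs I used a batch file with nconvert (non-commercial) + GhostScript: extract every page as a PNG file, then pack all the PNG files into one PDF. By reducing the color depth and the dimensions, the result becomes very small (but still readable).

Tags: vba python-2.7 pdf ms-word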


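【Solution 1】:

I suggest looking at pdfsizeopt.

It is a Python program designed to act as a PDF file size optimizer. It converts large PDFs into smaller ones and provides a command-line interface that you can call.

Details:

pdfsizeopt is a program for converting large PDF files to small ones. More specifically, pdfsizeopt is a free, cross-platform command-line application (for Linux, Mac OS X, Windows and Unix) and a collection of best practices to optimize the size of PDF files, with focus on PDFs created from TeX and LaTeX documents. pdfsizeopt is written in Python, so it is a bit slow, but it offloads some of the heavy work to its faster (C, C++ and Java) dependencies. pdfsizeopt was developed on a Linux system, and it depends on existing tools such as Python 2.4, Ghostscript 8.50, jbig2enc (optional), sam2p, pngtopnm, pngout (optional), and the Multivalent PDF compressor (optional) written in Java.

Reference: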

http://code.google.com/p/pdfsizeopt/
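
Since the question asks for a Python route, here is a minimal sketch of calling pdfsizeopt from a script. It assumes that a `pdfsizeopt` executable is on the PATH and that it is invoked as `pdfsizeopt input.pdf output.pdf`; the helper names `build_pdfsizeopt_command` and `optimize` are made up for illustration and are not part of pdfsizeopt.

```python
import subprocess


def build_pdfsizeopt_command(src, dst, exe="pdfsizeopt"):
    """Return the argument list for one run: executable, input, output."""
    # Assumption: the tool takes the input path first and the output path second.
    return [exe, src, dst]


def optimize(src, dst, exe="pdfsizeopt"):
    """Run pdfsizeopt on one file; raises CalledProcessError on failure."""
    subprocess.check_call(build_pdfsizeopt_command(src, dst, exe))
```

Calling `optimize("input.pdf", "output.pdf")` in a loop over a folder would then cover a series of files. Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.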

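【Comments】:

    【Solution 2】:

    Another option is Aspose.PDF Cloud SDK for Python. It is a paid REST API, but it offers 150 free API calls per month. Currently it compresses PDF documents held in cloud storage (Aspose default storage / Amazon S3 / Google Drive / Azure Storage / Dropbox / FTP storage). In the near future we plan to support compressing a PDF sent in the request body (stream).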

    import os
    import asposepdfcloud
    from asposepdfcloud.apis.pdf_api import PdfApi
    from shutil import copyfile
    
    # Get App key and App SID from https://cloud.aspose.com
    pdf_api_client = asposepdfcloud.api_client.ApiClient(
        app_key='xxxxxxxxxxxxxxxxxxxxxxxxxx',
        app_sid='xxxxx-xxxx-xxxx-xxxx-xxxxxxxx')
    
    pdf_api = PdfApi(pdf_api_client)
    temp_folder="Temp"
    
    #upload PDF file to storage
    
    data_file = "C:/Temp/02_pages.pdf"
    remote_name="02_pages.pdf"
    result_name="02_pages_compressed.pdf"
    
    pdf_api.upload_file(temp_folder + '/' + remote_name,data_file)
    
    optimize_options = asposepdfcloud.models.OptimizeOptions(
                    allow_reuse_page_content=False,
                    compress_images=True,
                    image_quality=100,
                    link_duplcate_streams=True,
                    remove_unused_objects=True,
                    remove_unused_streams=True,            
                    unembed_fonts=True)
    opts = {
                "options" : optimize_options,
                "folder" : temp_folder
            }
    
    response = pdf_api.post_optimize_document(remote_name, **opts)
    
    #download PDF file from storage
    response_download = pdf_api.download_file(temp_folder + '/' + remote_name)
    copyfile(response_download, 'C:/Temp/' + result_name)
    print(response)
    

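    P.S.: I am a developer evangelist at Aspose.

    【Comments】:

      【Solution 3】:

      I use Ghostscript to batch-process PDFs. This VBA works in both Word and Excel. It asks for a Source directory and a Target directory. A .bat file is created and stored in the Source folder, and you can then execute it. I may make this script more robust and will update it here when I do.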

      Ghostscript

      Sub gsPDF_Bat()
        
          'Summary of -dPDFSETTINGS:
      
          '-dPDFSETTINGS=/screen lower quality, smaller size. (72 dpi)
          '-dPDFSETTINGS=/ebook for better quality, but slightly larger pdfs. (150 dpi)
          '-dPDFSETTINGS=/prepress output similar to Acrobat Distiller "Prepress Optimized" setting (300 dpi)
          '-dPDFSETTINGS=/printer selects output similar to the Acrobat Distiller "Print Optimized" setting (300 dpi)
          '-dPDFSETTINGS=/default selects output intended to be useful across a wide variety of uses, possibly at the expense of a larger output file
            
          Dim ProofsFolder As String
          Dim CompressFolder As String
          Dim exePath As String
      
          exePath = "C:\Program Files\gs\gs9.54.0\bin\"
      
          ' Open the select folder prompt
          With Application.FileDialog(msoFileDialogFolderPicker)
              If .Show = -1 Then ' if OK is pressed
                  ProofsFolder = .SelectedItems(1)
              End If
          End With
          
          With Application.FileDialog(msoFileDialogFolderPicker)
              If .Show = -1 Then ' if OK is pressed
                  CompressFolder = .SelectedItems(1)
              End If
          End With
              
          Dim fso As Object
          Dim folder As Object
          Dim CurrFile As Object
      
        
          Set fso = CreateObject("Scripting.FileSystemObject")
          Set folder = fso.GetFolder(ProofsFolder)
             
          Open ProofsFolder & "\gsPDF-Compress.bat" For Output As #1
             
          For Each CurrFile In folder.Files
              FName = CurrFile.Name
              CurrFileExt = LCase(Right(FName, 4)) ' lower-case so ".PDF" files also match
                  Debug.Print CurrFileExt
      
                  If CurrFileExt = ".pdf" Then
      
                      backNum = InStrRev(CurrFile, "\", -1)
                      FName = Mid(CurrFile, (backNum + 1))
      
                      ' Quote the executable path: "C:\Program Files" contains a space
                      Print #1, """" & exePath & "gswin64"" -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dAutoRotatePages=/None -r300 -dUseCIEColor -sOutputFile=""" & CompressFolder & "\" & FName & """ """ & CurrFile & """"
                  End If
          Next
          Close #1
      
          Set fso = Nothing
          Set folder = Nothing
      End Sub
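
For readers who would rather stay in Python, the same batch job can be sketched without a .bat file. This is an illustration, not a port of the macro above: it uses the `-dPDFSETTINGS` presets listed in the VBA comments instead of the macro's `-r300 -dUseCIEColor` switches, and the helper names and the `gs_exe` default of `"gs"` are assumptions (on Windows you would pass the full path to the Ghostscript executable).

```python
import os
import subprocess

# Presets accepted by Ghostscript's -dPDFSETTINGS switch.
PRESETS = ("screen", "ebook", "printer", "prepress", "default")


def build_gs_command(src, dst, preset="ebook", gs_exe="gs"):
    """Return the Ghostscript argument list that rewrites src into dst."""
    if preset not in PRESETS:
        raise ValueError("unknown preset: %r" % (preset,))
    return [
        gs_exe,
        "-dNOPAUSE",
        "-dBATCH",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        "-dPDFSETTINGS=/" + preset,
        "-sOutputFile=" + dst,
        src,
    ]


def compress_folder(src_dir, dst_dir, preset="ebook", gs_exe="gs"):
    """Compress every .pdf in src_dir into dst_dir; return the output paths."""
    written = []
    for name in sorted(os.listdir(src_dir)):
        if not name.lower().endswith(".pdf"):
            continue  # skip non-PDF files, matching case-insensitively
        dst = os.path.join(dst_dir, name)
        subprocess.check_call(
            build_gs_command(os.path.join(src_dir, name), dst, preset, gs_exe))
        written.append(dst)
    return written
```

Because the arguments are passed as a list, paths containing spaces need no extra quoting, which is the main thing a generated .bat file has to get right.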
      

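      【Comments】:

        【Solution 4】:

        Same as my previous answer, still using Ghostscript. I noticed that when we select 1,000 or so PDFs for batch optimization, Excel takes several minutes to finish writing the .bat file. I wrote a different version that creates a new worksheet, assembles the .bat file there, and then saves it. Even with 1,000 records this takes only a few seconds.

        This script will not run in Word, because it needs to create a new Excel worksheet. The script could be updated to use a Word document instead. The .prn format has a line limit, so I needed to break the command across lines with "^".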

        Sub gsPDF_Bat()
         'https://www.ghostscript.com/doc/current/VectorDevices.htm#distillerparams
            
            Dim ProofsFolder As String
            Dim CompressFolder As String
            Dim OrigSheet As String
            Dim exePath As String
            Dim CmdLine As String, CmdLine2 As String, CmdLine3 As String
            
            exePath = "C:\Program Files\gs\gs9.54.0\bin\"
        
            ' Open the select folder prompt
            With Application.FileDialog(msoFileDialogFolderPicker)
                If .Show = -1 Then ' if OK is pressed
                    ProofsFolder = .SelectedItems(1)
                End If
            End With
            
            If ProofsFolder <> "" Then ' if a file was chosen
                Debug.Print ProofsFolder
            End If
            With Application.FileDialog(msoFileDialogFolderPicker)
                If .Show = -1 Then ' if OK is pressed
                    CompressFolder = .SelectedItems(1)
                End If
            End With
            
            If CompressFolder <> "" Then ' if a file was chosen
                Debug.Print CompressFolder
            End If
            
            Dim fso As Object
            Dim folder As Object
            Dim CurrFile As Object
        
          
            Set fso = CreateObject("Scripting.FileSystemObject")
            Set folder = fso.GetFolder(ProofsFolder)
               
            cell = 0
            OrigSheet = ActiveSheet.Name
            
            Sheets.Add(After:=Sheets(Sheets.Count)).Name = "temp"
            Application.DisplayAlerts = False
            
            For Each CurrFile In folder.Files
                FName = CurrFile.Name
                CurrFileExt = LCase(Right(FName, 4)) ' lower-case so ".PDF" files also match
                    Debug.Print CurrFileExt
        
                    If CurrFileExt = ".pdf" Then
                        cell = cell + 1
                        
                        Debug.Print "CurrFile Found: " & CurrFile
        
                        backNum = InStrRev(CurrFile, "\", -1)
                        Debug.Print "backNum: " & backNum
                        FName = Mid(CurrFile, (backNum + 1))
                        Debug.Print FName
                        
                        ' ^ allows a line break on a DOS command
                        ' Quote the executable path: "C:\Program Files" contains a space
                        CmdLine = """" & exePath & "gswin64"" -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -dAutoRotatePages=/None -dPDFSETTINGS=/prepress -dUseCIEColor -^"
                        CmdLine2 = "sOutputFile=""" & CompressFolder & "\z" & FName & """^"
                        CmdLine3 = " " & """" & CurrFile & """"
        
                        Sheets("temp").Range("A" & cell).value = CmdLine
                        cell = cell + 1
                        Sheets("temp").Range("A" & cell).value = CmdLine2
                        cell = cell + 1
                        Sheets("temp").Range("A" & cell).value = CmdLine3
                    End If
            Next
        
            Sheets("temp").Select
            Sheets("temp").Copy
         
            ActiveWorkbook.SaveAs FileName:= _
                ProofsFolder & "\gsPDF Compress.bat", FileFormat:=xlTextPrinter, _
                CreateBackup:=False
            
            ActiveWorkbook.Close
        
            Sheets("temp").Delete
            Sheets(OrigSheet).Select
        
            Application.DisplayAlerts = True
            Set fso = Nothing
            Set folder = Nothing
        
        End Sub
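
If the goal is only to produce the batch file quickly, writing it as plain text avoids the worksheet and the .prn line limit altogether. The sketch below is a hypothetical Python alternative, not the author's method: the function names are invented, `gswin64c` (Ghostscript's console executable on 64-bit Windows) is an assumed choice, and `ntpath` is used so the paths get backslashes even when the script runs on another OS.

```python
import ntpath
import os


def gs_batch_line(gs_exe, src, dst, preset="prepress"):
    """Return one batch-file line; every path is quoted so spaces survive."""
    return ('"%s" -dNOPAUSE -dBATCH -sDEVICE=pdfwrite '
            '-dCompatibilityLevel=1.4 -dPDFSETTINGS=/%s '
            '-sOutputFile="%s" "%s"' % (gs_exe, preset, dst, src))


def write_batch_file(bat_path, gs_exe, src_dir, dst_dir, preset="prepress"):
    """Write one Ghostscript command per PDF in src_dir; return the count."""
    names = [n for n in sorted(os.listdir(src_dir))
             if n.lower().endswith(".pdf")]
    with open(bat_path, "w") as bat:
        for name in names:
            src = ntpath.join(src_dir, name)  # Windows-style separators
            dst = ntpath.join(dst_dir, name)
            bat.write(gs_batch_line(gs_exe, src, dst, preset) + "\n")
    return len(names)
```

Each command stays on a single line, so no "^" continuation is needed. File names containing "%" would still need escaping in a batch file, which this sketch does not handle.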
        

        【Comments】:
