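Converting PDF Pages to JPG on Java-GAE

【Question title】: Converting PDF Pages to JPG on Java-GAE
【Posted】: 2012-08-26 14:34:00
【Question description】:

I am looking for an open-source Java library that lets me render single pages of a PDF as JPG or PNG on the server side.

Unfortunately, it must not use any java.awt.* classes other than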

java.awt.datatransfer.DataFlavor
java.awt.datatransfer.MimeType
java.awt.datatransfer.Transferable

If there is any way to do this, a small code snippet would be fantastic.

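【Question discussion】:

stackoverflow.com/questions/11513841/… shows how to do it using the Google Conversions API. There is one problem, though: this API will be removed in November. Maybe you can ask Google for a hint about alternatives.
Yes, I have already seen that. But as you wrote, support will end soon. Otherwise it would be perfect. I will try to get some information from Google.
Hi, did you ever find anything else that can do the same conversion? I am looking for similar functionality too. I know I can use Google Drive to request images from PDFs smaller than 25 MB, but I need it to handle larger files.

Tags: java image google-app-engine pdf

【Solution 1】:

I believe icepdf might have what you are looking for.

I once used this open-source project to turn uploaded PDF files into images for use in an online catalog.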

import java.awt.image.BufferedImage;
import java.io.FileNotFoundException;
import java.io.IOException;

import org.icepdf.core.exceptions.PDFException;
import org.icepdf.core.exceptions.PDFSecurityException;
import org.icepdf.core.pobjects.Document;
import org.icepdf.core.pobjects.Page;
import org.icepdf.core.util.GraphicsRenderingHints;


public byte[][] convert(byte[] pdf, String format) {

    Document document = new Document();
    try {
        document.setByteArray(pdf, 0, pdf.length, null);

    } catch (PDFException ex) {
        System.out.println("Error parsing PDF document " + ex);
    } catch (PDFSecurityException ex) {
        System.out.println("Error encryption not supported " + ex);
    } catch (FileNotFoundException ex) {
        System.out.println("Error file not found " + ex);
    } catch (IOException ex) {
        System.out.println("Error handling PDF document " + ex);
    }
    byte[][] imageArray = new byte[document.getNumberOfPages()][];
    // save page captures to bytearray.
    float scale = 1.75f;
    float rotation = 0f;

    // Paint each page's content to an image and store the encoded bytes in imageArray
    for (int i = 0; i < document.getNumberOfPages(); i++) {
        BufferedImage image = (BufferedImage)
                document.getPageImage(i,
                                      GraphicsRenderingHints.SCREEN,
                                      Page.BOUNDARY_CROPBOX, rotation, scale);
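        try {
            // Get the picture util object. PictureUtilLocal is a helper from the
            // answerer's own application (not part of ICEpdf), looked up by name.
            PictureUtilLocal pum = (PictureUtilLocal) Component
                    .getInstance("pictureUtil");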
            //load image into util
            pum.loadBuffered(image);

            //write image in desired format
            imageArray[i] = pum.imageToByteArray(format, 1f);

            System.out.println("\t capturing page " + i);

        } catch (IOException e) {
            e.printStackTrace();
        }
        image.flush();
    }
    // clean up resources
    document.dispose();
    return imageArray;
}

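But be warned: I had trouble with this library throwing a SegFault on open-jdk. It worked fine on Sun's JDK. I am not sure what it would do on GAE. I can't remember which version had the problem, so just be aware.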
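
The snippet above depends on `PictureUtilLocal`, a helper from the answerer's application that is not shown. As a rough stand-in, assuming the helper simply encodes the image in the requested format at a given quality, something like the following could be used. The names `ImageEncoder` and `toByteArray` are hypothetical, and the sketch relies on `java.awt.image.BufferedImage` and `javax.imageio`, so it is no more GAE-friendly than the original.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;
import javax.imageio.stream.MemoryCacheImageOutputStream;

// Hypothetical stand-in for the PictureUtilLocal helper; not part of ICEpdf.
final class ImageEncoder {

    private ImageEncoder() {}

    // Encodes the image as "jpg", "png", etc. and returns the bytes.
    // The quality value (0..1) is only applied to JPEG output.
    static byte[] toByteArray(BufferedImage image, String format, float quality)
            throws IOException {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName(format);
        if (!writers.hasNext()) {
            throw new IOException("No ImageIO writer for format: " + format);
        }
        ImageWriter writer = writers.next();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ImageOutputStream out = new MemoryCacheImageOutputStream(bytes)) {
            writer.setOutput(out);
            ImageWriteParam param = writer.getDefaultWriteParam();
            boolean jpeg = "jpg".equalsIgnoreCase(format) || "jpeg".equalsIgnoreCase(format);
            if (jpeg && param.canWriteCompressed()) {
                param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
                param.setCompressionQuality(quality);
            }
            writer.write(null, new IIOImage(image, null, null), param);
        } finally {
            writer.dispose();
        }
        // The image stream is closed (and flushed) before the bytes are read.
        return bytes.toByteArray();
    }
}
```

With this in place, the two `pum` calls inside the loop could be replaced by `imageArray[i] = ImageEncoder.toByteArray(image, format, 1f);`.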

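【Discussion】:

No clue. But while people vote this down, I have been running it in production for the past 4 years, with no problems at all.
Out of curiosity, have you used pdf-renderer? I was having trouble converting a one-page PDF to PNG with Apache PdfBox, but pdf-renderer seems to fix it, doing something similar to this post. I haven't heard it talked about much, so I am worried that I am missing some issue or drawback with it.
I have not. I don't know. I actually wrote the first revision of the code above in 2010, and pdf-renderer did not start until a year later. It could be a good project. I am a programmer, so I am always interested in a better way. "pdf-renderer is a subproject of Swinglabs, started in January 2011, with 571 members. The project administrators are rbair, tomoke, joshy and Jan Haderka."
Hey, so I take it you have no opinion on this?

【Solution 2】:

You can use the Apache PDFBox API for this purpose, with the following code to convert a PDF to JPG page by page.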

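import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.List;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

// Written against the PDFBox 1.x API (getAllPages, convertToImage).
public void convertPDFToJPG(String src, String folderPath) {
    PDDocument doc = null;
    try {
        File folder = new File(folderPath);
        // comparePDF is the answerer's own helper class (not part of PDFBox);
        // its rmdir method presumably clears the output folder.
        comparePDF cmp = new comparePDF();
        cmp.rmdir(folder);

        // Load the PDF file into the document object
        doc = PDDocument.load(new FileInputStream(src));
        // Get all pages from the document and store them in a list
        @SuppressWarnings("unchecked")
        List<PDPage> pages = doc.getDocumentCatalog().getAllPages();
        int count = 1; // used to give each image file a unique name
        System.out.println("Please wait...");
        // Convert every page of the PDF document to its own image file
        for (PDPage page : pages) {
            BufferedImage bi = page.convertToImage();
            ImageIO.write(bi, "jpg", new File(folder, "Page" + count + ".jpg"));
            count++;
        }
        System.out.println("Conversion complete");
    } catch (IOException ie) {
        ie.printStackTrace();
    } finally {
        // Release the document even if rendering failed
        if (doc != null) {
            try { doc.close(); } catch (IOException ignored) { }
        }
    }
}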

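【Discussion】:

The OP clearly stated that he needs a solution for "Google App Engine" (GAE). Current PDFBox versions are known not to work in the GAE environment, because they use AWT classes that are not available there.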
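
One way to see which classes a library would trip over is to probe for them at runtime before committing to it. The following is only a minimal sketch: it assumes that a class missing from the sandbox surfaces as a `ClassNotFoundException`, a `LinkageError` (such as `NoClassDefFoundError`) or a `SecurityException` when it is looked up, and the names `ClassProbe`, `isAvailable` and `probe` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reports whether classes can be loaded in the current runtime.
final class ClassProbe {

    private ClassProbe() {}

    // Returns true if the named class can be located without initializing it.
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, ClassProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError | SecurityException e) {
            return false;
        }
    }

    // Probes several class names and keeps them in the order given.
    static Map<String, Boolean> probe(String... classNames) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : classNames) {
            result.put(name, isAvailable(name));
        }
        return result;
    }
}
```

For example, `ClassProbe.probe("java.awt.datatransfer.DataFlavor", "java.awt.image.BufferedImage")` could be logged from a test servlet to compare the classes named in the question with the ones a rendering library needs.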