【问题标题】:Extract Multiple Embedded Images from a single PDF Page using PDFBox使用 PDFBox 从单个 PDF 页面中提取多个嵌入图像
【发布时间】:2018-01-15 22:56:30
【问题描述】:

朋友们,我使用的是 PDFBox 2.0.6。我已经成功地从 pdf 文件中提取图像,但现在它正在为单个 pdf 页面创建图像。但问题是可能有任何不。 pdf 页面中的图像,我希望每个嵌入的图像都应该被提取为单个图像本身。

这是代码,

import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;

public class DemoPdf {

    public static void main(String args[]) throws Exception {
        //Loading an existing PDF document
        File file = new File("C:/Users/ADMIN/Downloads/Vehicle_Photographs.pdf");
        PDDocument document = PDDocument.load(file);
        //Instantiating the PDFRenderer class
        PDFRenderer renderer = new PDFRenderer(document);
        File imageFolder = new File("C:/Users/ADMIN/Desktop/image");

        for (int page = 0; page < document.getNumberOfPages(); ++page) {
            //Rendering an image from the PDF document
            BufferedImage image = renderer.renderImage(page);
            //Writing the image to a file
            ImageIO.write(image, "JPEG", new File(imageFolder+"/" + page +".jpg"));
            System.out.println("Image created"+ page);
        }
        //Closing the document
        document.close();
    }

}   

是否可以在 PDFBox 中将所有嵌入的图像提取为单独的图像,谢谢

【问题讨论】:

标签: java image pdf pdfbox


【解决方案1】:

是的。可以从pdf中的所有页面中提取所有图像。

您可以参考此链接,extract images from pdf using PDFBox

这里的基本思想是,用 PDFStreamEngine 扩展类,并覆盖 processOperator 方法。为所有页面调用 PDFStreamEngine.processPage。而如果传给processOperator的对象是Image Object,则从该对象中获取BufferedImage,并保存。

【讨论】:

  • 谢谢,Malikarjun。它解释了一切
  • 请注意,虽然示例写得很好,答案也很好,但它没有官方工具的功能(请参阅我的其他评论中的链接)。官方工具避免重复,还可以检测内联图片,将jpg文件1:1保存为jpg。
  • 链接有错误。它无用地创建了一个 BufferedImage 对象,然后直接分配一个新对象。
  • 谢谢@Daniel。我已经更正了 BufferedImage 的链接和无用分配。
【解决方案2】:

扩展 PDFStreamEngine 并覆盖 processOperator 之类的东西

 @Override
protected void processOperator( Operator operator, List<COSBase> operands) throws IOException
{
    String operation = operator.getName();
    if( "Do".equals(operation) )
    {
        COSName objectName = (COSName) operands.get( 0 );
        PDXObject xobject = getResources().getXObject( objectName );
        if( xobject instanceof PDImageXObject)
        {
            PDImageXObject image = (PDImageXObject)xobject;
            int imageWidth = image.getWidth();
            int imageHeight = image.getHeight();

            // same image to local
            BufferedImage bImage = new BufferedImage(imageWidth,imageHeight,BufferedImage.TYPE_INT_ARGB);
            bImage = image.getImage();
            ImageIO.write(bImage,"PNG",new File("c:\\temp\\image_"+imageNumber+".png"));

            imageNumber++;

        }
        else 
        {

        }
    }
    else
    {
        super.processOperator( operator, operands);
    }
}

【讨论】:

    【解决方案3】:

    这个答案与@jprism 类似。但这适用于那些只想复制和粘贴这个准备使用的代码和演示的人。

    import org.apache.pdfbox.contentstream.PDFStreamEngine;
    import org.apache.pdfbox.contentstream.operator.Operator;
    import org.apache.pdfbox.cos.COSBase;
    import org.apache.pdfbox.cos.COSName;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.graphics.PDXObject;
    import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
    import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
    
    import javax.imageio.ImageIO;
    import java.awt.image.BufferedImage;
    import java.io.File;
    import java.io.IOException;
    import java.util.List;
    import java.util.UUID;
    
    public class ExtractImagesUseCase extends PDFStreamEngine{
        private final String filePath;
        private final String outputDir;
    
        // Constructor
        public ExtractImagesUseCase(String filePath,
                                    String outputDir){
            this.filePath = filePath;
            this.outputDir = outputDir;
        }
    
        // Execute
        public void execute(){
            try{
                File file = new File(filePath);
                PDDocument document = PDDocument.load(file);
    
                for(PDPage page : document.getPages()){
                    processPage(page);
                }
    
            }catch(IOException e){
                e.printStackTrace();
            }
        }
    
        @Override
        protected void processOperator(Operator operator, List<COSBase> operands) throws IOException{
            String operation = operator.getName();
    
            if("Do".equals(operation)){
                COSName objectName = (COSName) operands.get(0);
                PDXObject pdxObject = getResources().getXObject(objectName);
    
                if(pdxObject instanceof PDImageXObject){
                    // Image
                    PDImageXObject image = (PDImageXObject) pdxObject;
                    BufferedImage bImage = image.getImage();
    
                    // File
                    String randomName = UUID.randomUUID().toString();
                    File outputFile = new File(outputDir,randomName + ".png");
    
                    // Write image to file
                    ImageIO.write(bImage, "PNG", outputFile);
    
                }else if(pdxObject instanceof PDFormXObject){
                    PDFormXObject form = (PDFormXObject) pdxObject;
                    showForm(form);
                }
            }
    
            else super.processOperator(operator, operands);
        }
    }
    

    演示

    public class ExtractImageDemo{
        public static void main(String[] args){
            String filePath = "C:\\Users\\John\\Downloads\\Documents\\sample-file.pdf";
            String outputDir = "C:\\Users\\John\\Downloads\\Documents\\Output";
    
            ExtractImagesUseCase useCase = new ExtractImagesUseCase(
                    filePath,
                    outputDir
            );
            useCase.execute();
        }
    }
    

    【讨论】:

      猜你喜欢
      • 1970-01-01
      • 2012-02-01
      • 2018-05-05
      • 2011-07-25
      • 1970-01-01
      • 1970-01-01
      • 2014-04-16
      • 2023-03-05
      相关资源
      最近更新 更多