Extract image from pdf using PDFSharp with /JBIG2 filter: answers

【Question title】: Extract image from pdf using PDFSharp with /JBIG2 filter
【Posted】: 2019-08-02 14:51:32
【Question description】:

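I am trying to extract images from a PDF file using PDFsharp. The test file I run the code against reports the image filter type as /JBIG2. If it is possible with PDFsharp, I would like to know how to decode and save this image.

The code I use to extract and then save the images is as follows: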

```csharp
// Top-level program (C# 9+); requires the PDFsharp package.
using System;
using System.Collections.Generic;
using System.IO;
using PdfSharp.Pdf;
using PdfSharp.Pdf.Advanced;
using PdfSharp.Pdf.IO;

const string filename = "../../../test.pdf";
PdfDocument document = PdfReader.Open(filename);
int imageCount = 0;

foreach (PdfPage page in document.Pages) { // Iterate pages
  // Get resources dictionary
  PdfDictionary resources = page.Elements.GetDictionary("/Resources");

  if (resources != null) {
    // Get external objects dictionary
    PdfDictionary xObjects = resources.Elements.GetDictionary("/XObject");

    if (xObjects != null) {
      ICollection<PdfItem> items = xObjects.Elements.Values;

      foreach (PdfItem item in items) { // Iterate references to external objects
        PdfReference reference = item as PdfReference;

        if (reference != null) {
          PdfDictionary xObject = reference.Value as PdfDictionary;

          // Is external object an image?
          if (xObject != null && xObject.Elements.GetString("/Subtype") == "/Image") {
            ExportImage(xObject, ref imageCount);
          }
        }
      }
    }
  }
}

static void ExportImage(PdfDictionary image, ref int count) {
  string filter = image.Elements.GetName("/Filter");

  switch (filter) {
    case "/DCTDecode":
      ExportJpegImage(image, ref count);
      break;
    case "/FlateDecode":
      // ExportAsPngImage comes from the PDFsharp Export Images sample (not shown here).
      ExportAsPngImage(image, ref count);
      break;
    // Any other filter (e.g. /JBIG2Decode, /CCITTFaxDecode) is silently skipped.
  }
}

static void ExportJpegImage(PdfDictionary image, ref int count) {
  // Fortunately, JPEG has native support in PDF, so exporting the image is just writing the raw stream to a file.
  byte[] stream = image.Stream.Value;
  using (FileStream fs = new FileStream(
    String.Format("Image{0}.jpeg", count++), FileMode.Create, FileAccess.Write))
  using (BinaryWriter bw = new BinaryWriter(fs)) {
    bw.Write(stream);
  }
}
```

In the code above, the filter type I get is /JBIG2, which the switch does not handle. The code is taken from the PDFsharp: Export Images sample.

【Question discussion】:

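Please post the code you use for the extraction and, if possible, the PDF in question (or a link to it).
I've edited the question to include the code. Sharing the file would be difficult, but I can add that it is a PDF that was produced when I scanned a document and emailed it to myself. @AgiHammerthief
To answer your question I would have to read Adobe's PDF Reference manual, and I don't have time for that right now. If you look at the reference yourself, you may be able to answer your own question.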

Tags: c# image pdf pdfsharp jbig2

【Solution 1】:

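JBIG2 is used most widely inside PDFs; outside of PDF it is a different story. Although .jbig2 is a raster image format, image viewers offer very little support for it. Your best bet is to export it the way Acrobat does, as a CCITT Group 4 (CCITT4) compressed TIFF.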
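The answer does not show how to get at the JBIG2 data itself. One common approach, sketched here under stated assumptions and not part of the PDFsharp API, is to save the raw stream as a standalone .jb2 file that a JBIG2 decoder such as jbig2dec can read, and convert to TIFF or PNG from there. Per the PDF specification, a /JBIG2Decode stream (the standard name of the filter the question calls /JBIG2) stores its segments without the JBIG2 file header, and any shared segments live in a separate stream referenced by /JBIG2Globals in the image's /DecodeParms. The helper below (the name `Jbig2File.WrapEmbeddedStream` is hypothetical) prepends a file header declaring sequential organization and one page, then appends the globals, if any, followed by the page segments. It assumes you have already read both byte arrays, e.g. the page data via `image.Stream.Value` as in the question's code, with the globals stream already decoded if it carries its own /Filter. Some decoders may also expect end-of-page and end-of-file segments, which this sketch does not add.

```csharp
using System;
using System.IO;

static class Jbig2File
{
    // JBIG2 file header ID string (ITU-T T.88, Annex D).
    static readonly byte[] IdString = { 0x97, 0x4A, 0x42, 0x32, 0x0D, 0x0A, 0x1A, 0x0A };

    // Hypothetical helper: wraps the segments of a PDF /JBIG2Decode stream
    // (plus optional /JBIG2Globals segments, may be null) into a standalone
    // single-page JBIG2 file in sequential organization.
    public static byte[] WrapEmbeddedStream(byte[] globals, byte[] pageData)
    {
        if (pageData == null) throw new ArgumentNullException(nameof(pageData));

        using var ms = new MemoryStream();
        ms.Write(IdString, 0, IdString.Length);

        // Header flags: bit 0 = 1 (sequential organization), bit 1 = 0 (page count known).
        ms.WriteByte(0x01);

        // Number of pages as a 4-byte big-endian integer: 1.
        ms.Write(new byte[] { 0x00, 0x00, 0x00, 0x01 }, 0, 4);

        // Global segments (e.g. shared symbol dictionaries) must precede the page segments.
        if (globals != null)
            ms.Write(globals, 0, globals.Length);

        ms.Write(pageData, 0, pageData.Length);
        return ms.ToArray();
    }
}
```

Usage, assuming `data` and `globalsBytes` were read as described above: `File.WriteAllBytes("Image0.jb2", Jbig2File.WrapEmbeddedStream(globalsBytes, data));`. The resulting file can then be decoded with a JBIG2-capable tool and re-saved as a CCITT4 TIFF, as the answer suggests.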

【Discussion】: