【Question title】:Rotated image extracted from pdfsharp
【Posted】:2011-11-23 10:15:45
【Question】:

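I am able to extract images from a PDF successfully using pdfsharp. The images are CCITTFaxDecode. However, in the TIFF that gets created, the image comes out rotated. Any idea what might be going wrong?

Here is the code I used: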

byte[] data = xObject.Stream.Value;
Tiff tiff = BitMiracle.LibTiff.Classic.Tiff.Open("D:\\clip_TIFF.tif", "w");
tiff.SetField(TiffTag.IMAGEWIDTH, (uint)(width));
tiff.SetField(TiffTag.IMAGELENGTH, (uint)(height));
tiff.SetField(TiffTag.COMPRESSION, (uint)BitMiracle.LibTiff.Classic.Compression.CCITTFAX4);
tiff.SetField(TiffTag.BITSPERSAMPLE, (uint)(bpp));
tiff.WriteRawStrip(0,data,data.Length);
tiff.Close();

【Comments】:

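  • No PDF, no TIFF, no extraction code - how are we supposed to know what went wrong? Maybe the image is drawn in the PDF with a rotation transformation? Or the PDF page is rotated? Maybe nothing is wrong and everything works as designed.
  • Oh, you mean that if the image is drawn onto the PDF with a rotation transformation, the extracted image will be rotated too? Is the image's rotation related to the coordinate systems of the PDF and the TIFF image?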
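
To make the "rotation transformation" idea from the comments concrete: a PDF image is painted through a matrix [a b c d e f], and for a plain rotation plus uniform scaling the angle can be read off its first two entries. The helper below is only a minimal sketch under that assumption (no skew, no mirroring); the class and method names are hypothetical. With iTextSharp the matrix values would presumably come from `ImageRenderInfo.GetImageCTM()`, assuming your version exposes it.

```csharp
using System;

// Hypothetical helper, not part of PDFsharp or iTextSharp.
public static class ImageCtm
{
    // a and b are the first two entries of the PDF matrix [a b c d e f]
    // in effect when the image is painted. For a counterclockwise rotation
    // by theta combined with a uniform scale s: a = s*cos(theta), b = s*sin(theta).
    // Assumption: the matrix contains no skew or mirroring.
    public static double RotationDegrees(double a, double b)
    {
        double degrees = Math.Atan2(b, a) * 180.0 / Math.PI;
        // Normalize the result to the range [0, 360).
        return (degrees % 360.0 + 360.0) % 360.0;
    }
}
```

Note that the stored image stream itself is not rotated by this matrix; the matrix only affects how a viewer paints it, which is why an extracted image can look rotated relative to the rendered page.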

Tags: image itextsharp pdfsharp


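【Solution 1】:

Since the question is still tagged with iTextSharp, here is some code for it, even though it doesn't look like you're using that library here. PDF parsing support was added starting with iText[Sharp] 5.

I don't have a test PDF with the image type you're using, but found one here (see the attachment). Here is a very simple working example as an ASP.NET HTTP handler (.ashx) that uses that test PDF to help you get going: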

<%@ WebHandler Language="C#" Class="CCITTFaxDecodeExtract" %>
using System;
using System.Collections.Generic;
using System.IO;
using System.Web;
using iTextSharp.text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;
using Dotnet = System.Drawing.Image;
using System.Drawing.Imaging;

public class CCITTFaxDecodeExtract : IHttpHandler {
  public void ProcessRequest (HttpContext context) {
    HttpServerUtility Server = context.Server;
    HttpResponse Response = context.Response;
    string file = Server.MapPath("~/app_data/CCITTFaxDecode.pdf");
    PdfReader reader = new PdfReader(file);
    PdfReaderContentParser parser = new PdfReaderContentParser(reader);
    MyImageRenderListener listener = new MyImageRenderListener();
    for (int i = 1; i <= reader.NumberOfPages; i++) {
      parser.ProcessContent(i, listener);
    } 
    for (int i = 0; i < listener.Images.Count; ++i) {
      string path = Server.MapPath("~/app_data/" + listener.ImageNames[i]);
      using (FileStream fs = new FileStream(
        path, FileMode.Create, FileAccess.Write
      ))
      {
        fs.Write(listener.Images[i], 0, listener.Images[i].Length);
      }
    }         
  }
  public bool IsReusable { get { return false; } }
/*
 * see: TextRenderInfo & RenderListener classes here:
 * http://api.itextpdf.com/itext/
 * 
 * and Google "itextsharp extract images"
 */
  public class MyImageRenderListener : IRenderListener {
    public void RenderText(TextRenderInfo renderInfo) { }
    public void BeginTextBlock() { }
    public void EndTextBlock() { }

    public List<byte[]> Images = new List<byte[]>();
    public List<string> ImageNames = new List<string>();
    public void RenderImage(ImageRenderInfo renderInfo) {
      PdfImageObject image = renderInfo.GetImage();
      PdfName filter = image.Get(PdfName.FILTER) as PdfName;
      if (filter == null) {
        // /Filter may also be an array of names (or be absent); use the last entry
        PdfArray pa = image.Get(PdfName.FILTER) as PdfArray;
        if (pa != null && pa.Size > 0) {
          filter = pa[pa.Size - 1] as PdfName;
        }
      }
      if (PdfName.CCITTFAXDECODE.Equals(filter)) {
        using (Dotnet dotnetImg = image.GetDrawingImage()) {
          if (dotnetImg != null) {
            ImageNames.Add(string.Format(
              "{0}.tiff", renderInfo.GetRef().Number)
            );
            using (MemoryStream ms = new MemoryStream()) {
              dotnetImg.Save(
              ms, ImageFormat.Tiff);
              Images.Add(ms.ToArray());
            }
          }
        }
      }
    }
  }
}

If the image comes out rotated, see this thread on the iText mailing list; some pages in the PDF document may have been rotated.
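
If page rotation does turn out to be the cause, one possible workaround is to rotate the extracted bitmap by the page's /Rotate value before saving it. The sketch below only maps a rotation in degrees to a `System.Drawing.RotateFlipType`; the `PageRotation` class is hypothetical, and it assumes the rotation is a multiple of 90 (anything else is left unrotated).

```csharp
using System;
using System.Drawing;

// Hypothetical helper, not part of iTextSharp or PDFsharp.
public static class PageRotation
{
    // Maps a PDF page rotation (clockwise degrees, expected to be a
    // multiple of 90) to the matching clockwise RotateFlipType.
    public static RotateFlipType ToRotateFlip(int pageRotation)
    {
        // Normalize negative or >= 360 values into [0, 360).
        int normalized = ((pageRotation % 360) + 360) % 360;
        switch (normalized)
        {
            case 90: return RotateFlipType.Rotate90FlipNone;
            case 180: return RotateFlipType.Rotate180FlipNone;
            case 270: return RotateFlipType.Rotate270FlipNone;
            default: return RotateFlipType.RotateNoneFlipNone;
        }
    }
}
```

In the handler above, this could be used as `dotnetImg.RotateFlip(PageRotation.ToRotateFlip(rotation))` just before `dotnetImg.Save(...)`, where `rotation` would come from `reader.GetPageRotation(i)`; the listener would need to be told the current page's rotation, which the example does not do as written.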

【Comments】:

    【Solution 2】:

    Here is the complete code. It extracts the image from the PDF, but rotates it. Sorry for the long code.

    PdfDocument document = PdfReader.Open("D:\\Sample.pdf");
    PdfDictionary resources = document.Pages.Elements.GetDictionary("/Resources");
    PdfDictionary xObjects = resources.Elements.GetDictionary("/XObject");
    if (xObjects != null)
    {
        ICollection<PdfItem> items = xObjects.Elements.Values;
        // Iterate references to external objects
        foreach (PdfItem item in items)
        {
            PdfReference reference = item as PdfReference;
            if (reference != null)
            {
                PdfDictionary xObject = reference.Value as PdfDictionary;
                // Is external object an image?
    
                if (xObject != null && xObject.Elements.GetString("/Subtype") == "/Image")
                {
                    string filter = xObject.Elements.GetName("/Filter");
    
                    if (filter.Equals("/CCITTFaxDecode"))
                    {
                        int width = xObject.Elements.GetInteger(PdfImage.Keys.Width);
                        int height = xObject.Elements.GetInteger(PdfImage.Keys.Height);
                        int bpp = xObject.Elements.GetInteger(PdfImage.Keys.BitsPerComponent);
    
                        byte[] data = xObject.Stream.Value;
                        Tiff tiff = BitMiracle.LibTiff.Classic.Tiff.Open("D:\\sample.tif", "w");
                        tiff.SetField(TiffTag.IMAGEWIDTH, (uint)(width));
                        tiff.SetField(TiffTag.IMAGELENGTH, (uint)(height));
                        tiff.SetField(TiffTag.COMPRESSION, (uint)BitMiracle.LibTiff.Classic.Compression.CCITTFAX4);
                        tiff.SetField(TiffTag.BITSPERSAMPLE, (uint)(bpp));
                        tiff.SetField(TiffTag.STRIPOFFSETS, 187);
    
                        tiff.WriteRawStrip(0,data,data.Length);
                        tiff.Close();
                    }
                }
            }
        }
    }
    

    【Comments】:
