Including an image in an RDLC results in an unexpectedly large output size

【Question title】: Including an image in an RDLC results in an unexpectedly large output size
【Posted】: 2012-03-02 17:14:25
【Question description】:

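I have an RDLC report that contains some data and, optionally, an image. The report is rendered to PDF.

I may have a container (package) file that stores 100 results of the same kind. The problem is that when I include the image, the output grows by far more than expected.

For example: my RDLC report is an invoice that can show a picture of a signature at the bottom. I may have 100 invoices in one customer's package file.

If the total output package (100 invoices) is 2 MB without the image, and the image is 15 KB, I would expect the package with images to be around 3.5 MB (2 MB + 15 KB * 100). Instead, I get a total output package of more than 8 MB.

Are there any techniques for reducing the size of this output, or any other way to get an output size closer to what I expect?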
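To make the gap concrete, here is a small sketch of the arithmetic using the figures quoted above. The helper names are made up for illustration, and 8 MB is used as a lower bound because the real package is "more than 8 MB".

```python
KB = 1024
MB = 1024 * KB


def expected_package_size(base_bytes, image_bytes, invoices):
    """Size we would expect if each invoice grew by exactly one image."""
    return base_bytes + image_bytes * invoices


def growth_per_invoice(base_bytes, actual_bytes, invoices):
    """Average number of bytes each invoice actually gained."""
    return (actual_bytes - base_bytes) / float(invoices)


expected = expected_package_size(2 * MB, 15 * KB, 100)
growth = growth_per_invoice(2 * MB, 8 * MB, 100)

print("expected package: %.2f MB" % (expected / float(MB)))
print("actual growth per invoice: %.1f KB" % (growth / KB))
print("growth relative to the 15 KB image: %.1fx" % (growth / (15 * KB)))
```

So each invoice gains roughly four times the size of the image file, which is what the rest of the thread tries to explain.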

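【Question discussion】:

I don't know what RDLC is, but I think a 15 KB image does not necessarily stay 15 KB when it is rendered into a PDF. A typical image made for the web has a resolution of 72 dpi. When it is included in a PDF, the software usually converts it to 200-300 dpi for the best print quality. So a 100x100 px image becomes a ~278x278 px image at 200 dpi; 10,000 pixels turn into roughly 77,000 pixels. You do the math.
It is silly for a PDF renderer to store the upsampled image, since no new information is added; upsampling can wait until print time. But plenty of software does silly things...
Can you tell me your image type (jpg, png, tif), color depth (1bpp, 8bpp, 24bpp, etc.) and size (width and height in pixels)?
AFAIK PDF stores all embedded images in TIFF format. Depending on your PDF generator's capabilities, it may well save them with little or no compression.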
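The upsampling estimate from the first comment can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function names are illustrative, and the 24 bpp figure is an assumption, since the asker has not stated the image's color depth.

```python
def resample(width_px, height_px, src_dpi, dst_dpi):
    """Pixel dimensions after resampling to a new resolution at the same physical size."""
    scale = dst_dpi / float(src_dpi)
    return int(round(width_px * scale)), int(round(height_px * scale))


def raw_size_bytes(width_px, height_px, bits_per_pixel):
    """Bytes needed to store the pixels with no compression at all."""
    # Each row of samples is padded to a whole number of bytes.
    row_bytes = (width_px * bits_per_pixel + 7) // 8
    return row_bytes * height_px


width, height = resample(100, 100, 72, 200)
print("resampled size: %dx%d px (%d pixels)" % (width, height, width * height))
print("raw 24 bpp before: %d bytes" % raw_size_bytes(100, 100, 24))
print("raw 24 bpp after:  %d bytes" % raw_size_bytes(width, height, 24))
```

If the generator both upsamples and stores the pixels with weak or no compression, the two effects multiply, which would be enough to account for growth well beyond the size of the original image file.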

Tags: image optimization pdf reporting-services rdlc

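【Solution 1】:

Depending on your PDF generator's capabilities, images may be saved with weak compression, lossless compression, or no compression at all. You can use the method below to extract image information from the PDF and check whether that is your case. If it is, some "PDF compression" software can be used to fix it.

(It may seem strange, but I really could not find any ready-made software that does this.)

Install Python 2.x and the PDFMiner package (see PDFMiner manual#cmap for installation steps), then use the following code to list all images in the document along with their sizes and compression. For a list and description of the compression algorithms PDF uses, see the PDF specification, page 23 (the "Standard filters" table).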

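from pdfminer.pdfparser import PDFParser, PDFDocument
# PDFTextExtractionNotAllowed is raised further down; in the legacy PDFMiner
# releases this script targets, it is assumed to live in pdfminer.pdfinterp.
from pdfminer.pdfinterp import (PDFResourceManager, PDFPageInterpreter,
                                PDFTextExtractionNotAllowed)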

# Open a PDF file.
fp = open('Reader.pdf', 'rb')
# Create a PDF parser object associated with the file object.
parser = PDFParser(fp)
# Create a PDF document object that stores the document structure.
doc = PDFDocument()
# Connect the parser and document objects.
parser.set_document(doc)
doc.set_parser(parser)
# Supply the password for initialization.
# (If no password is set, give an empty string.)
doc.initialize('')
# Check if the document allows text extraction. If not, abort.
if not doc.is_extractable:
    raise PDFTextExtractionNotAllowed
# Create a PDF resource manager object that stores shared resources.
rsrcmgr = PDFResourceManager()

from pdfminer.layout import LAParams, LTImage
from pdfminer.converter import PDFPageAggregator

# Set parameters for analysis.
laparams = LAParams()
# Create a PDF page aggregator object.
device = PDFPageAggregator(rsrcmgr, laparams=laparams)
interpreter = PDFPageInterpreter(rsrcmgr, device)

#Build layout trees of all pages
layouts=[]
for page in doc.get_pages():
    interpreter.process_page(page)
    # receive the LTPage object for the page.
    layouts.append(device.get_result())

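from pdfminer.pdftypes import int_value

# Walk the layout trees and report every distinct image once.
# An image reused on several pages shares one stream object, so images are
# de-duplicated by the stream's object number (None for inline images).
known_ids = set()
count = 0
size = 0

def lsimages(obj):
    global count, size
    # Layout containers keep their children in _objs.
    for child in getattr(obj, '_objs', []):
        if isinstance(child, LTImage):
            stream = child.stream
            if stream.objid is None or stream.objid not in known_ids:
                attrs = stream.attrs
                print attrs
                count += 1
                # Length may be an indirect reference; int_value resolves it.
                size += int_value(attrs.get('Length', 0))
                if stream.objid is not None:
                    known_ids.add(stream.objid)
        lsimages(child)

for layout in layouts:
    lsimages(layout)
print "Total: %d images, %d bytes" % (count, size)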

Credits: the boilerplate code is taken from the Programming with PDFMiner article.
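If installing PDFMiner is not an option, a rough first check is possible with the standard library alone. The sketch below is not a PDF parser: it scans the raw bytes for objects whose dictionary says /Subtype /Image and tallies their /Filter entries. It assumes an unencrypted file, can be fooled by binary data that happens to contain "endobj", and will not see inline images. The function name `scan_image_filters` and the sample bytes are made up for illustration; the filter names are the standard ones from the PDF specification.

```python
import re
from collections import Counter

# Short summaries of the standard filters most relevant to images.
FILTER_NOTES = {
    "DCTDecode": "JPEG, lossy",
    "JPXDecode": "JPEG 2000",
    "FlateDecode": "zlib/deflate, lossless",
    "LZWDecode": "LZW, lossless",
    "RunLengthDecode": "run-length, lossless",
    "CCITTFaxDecode": "CCITT fax, 1 bpp",
    "JBIG2Decode": "JBIG2, 1 bpp",
}

_OBJECT = re.compile(br"\d+\s+\d+\s+obj\b(.*?)\bendobj\b", re.DOTALL)
_IMAGE = re.compile(br"/Subtype\s*/Image\b")
_FILTER = re.compile(br"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)")
_NAME = re.compile(br"/([A-Za-z0-9]+)")


def scan_image_filters(pdf_bytes):
    """Return (Counter of filter chains, approximate stream bytes) for image objects."""
    filters = Counter()
    total = 0
    for match in _OBJECT.finditer(pdf_bytes):
        # Everything before the "stream" keyword is the object's dictionary.
        head, keyword, rest = match.group(1).partition(b"stream")
        if not keyword or not _IMAGE.search(head):
            continue
        found = _FILTER.search(head)
        names = _NAME.findall(found.group(1)) if found else []
        chain = "+".join(n.decode("ascii") for n in names) or "(none)"
        filters[chain] += 1
        # Approximate the payload as the bytes between "stream" and "endstream".
        payload = rest.rpartition(b"endstream")[0]
        total += len(payload.strip(b"\r\n"))
    return filters, total


# Tiny synthetic input; for a real file use open(path, "rb").read().
sample = (
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"2 0 obj\n<< /Type /XObject /Subtype /Image /Width 2 /Height 2 "
    b"/Filter /DCTDecode /Length 4 >>\nstream\nABCD\nendstream\nendobj\n"
    b"3 0 obj\n<< /Type /XObject /Subtype /Image /Width 1 /Height 1 "
    b"/Length 3 >>\nstream\nxyz\nendstream\nendobj\n"
)
found_filters, image_bytes = scan_image_filters(sample)
for chain, n in found_filters.items():
    print("%-12s x%d  %s" % (chain, n, FILTER_NOTES.get(chain, "")))
print("approximate image bytes: %d" % image_bytes)
```

Many images reported as "(none)" or FlateDecode, where DCTDecode would be expected for photographic content, would point to the weak-compression case described in this answer.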
