Updating existing markup (FreeText Callout) PDF using itext7 .NET

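【Question title】: Updating existing markup (FreeText Callout) PDF using itext7 .NET
【Posted】: 2021-06-12 14:09:54
【Question description】:

I have the code below to update an existing markup (a FreeText callout) in a PDF using itext7 .NET. The result is not displayed correctly, but once I edit it in Bluebeam it shows the correct content, as in the picture below. What am I missing?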

public void UpdateMarkupCallout()
{
    string inPDF = @"C:\in PDF.pdf";
    string outPDF = @"C:\out PDF.pdf";
    PdfDocument pdfDoc = new PdfDocument(new PdfReader(inPDF), new PdfWriter(outPDF));
    int numberOfPages = pdfDoc.GetNumberOfPages();
    for (int i = 1; i <= numberOfPages; i++)
    {
        PdfDictionary page = pdfDoc.GetPage(i).GetPdfObject();
        PdfArray annotArray = page.GetAsArray(PdfName.Annots);
        if (annotArray == null)
        {
            continue;
        }
        int size = annotArray.Size();
        for (int x = 0; x < size; x++)
        {
            PdfDictionary curAnnot = annotArray.GetAsDictionary(x);
            if (curAnnot.GetAsString(PdfName.Contents) != null)
            {
                string contents = curAnnot.GetAsString(PdfName.Contents).ToString();
                if (contents != "" && contents.Contains("old content"))
                {
                    curAnnot.Put(PdfName.Contents, new PdfString("new content"));
                }
            }
        }
    }
    pdfDoc.Close();
}

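Attachment: here

【Comments】:

You also need to generate an appearance for that annotation. Please attach the source PDF.
Hi @AlexeySubach, I have attached the file above, thanks.

Tags: c# .net itext7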

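【Solution 1】:

The answer is in Java, but converting it to C# should only take some simple capitalization changes and minor adjustments.

Unfortunately, there is no silver-bullet solution here, at least not without considerable effort.

1. Partially proper solution

There are several issues here. First, you are only updating the /Contents key, while the annotation you are editing also has an /RC key, which stands for "A rich text string (see Adobe XML Architecture, XML Forms Architecture (XFA) Specification, version 3.3) that shall be used to generate the appearance of the annotation." (ISO 32000).

Beyond that, the appearance (the /AP entry) must be regenerated, as the specification mandates. This is not something iText can currently do for you, so you have to do it yourself.

You need to determine the area in which the text must be drawn, taking the /RD (rect diff) entry into account.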
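
To make the /RD arithmetic concrete, here is a minimal sketch in plain Java with no iText dependency. It assumes /Rect is normalized as [llx, lly, urx, ury] and /RD is ordered [left, top, right, bottom], as in the code comment of the answer; the class and method names are made up for illustration.

```java
import java.util.Arrays;

class RectDiffSketch {
    /**
     * rect: [llx, lly, urx, ury] taken from /Rect (assumed normalized).
     * rd:   [left, top, right, bottom] taken from /RD, or null when the entry is absent.
     * Returns the inner (text) rectangle as [llx, lly, urx, ury].
     */
    static float[] innerRect(float[] rect, float[] rd) {
        if (rd == null) {
            return rect.clone(); // no /RD: the text area is the whole /Rect
        }
        return new float[] {
            rect[0] + rd[0], // left edge moves right
            rect[1] + rd[3], // bottom edge moves up
            rect[2] - rd[2], // right edge moves left
            rect[3] - rd[1]  // top edge moves down
        };
    }

    public static void main(String[] args) {
        float[] rect = {100f, 200f, 300f, 260f};
        float[] rd = {50f, 5f, 2f, 3f};
        // prints [150.0, 203.0, 298.0, 255.0]
        System.out.println(Arrays.toString(innerRect(rect, rd)));
    }
}
```

For a callout, the left or right difference is typically large because /Rect also has to cover the arrow, so ignoring /RD would place the text over the callout line.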

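To create the appearance, you can use the pdfHTML add-on, which processes the rich text representation from /RC into layout elements that you can then transfer to an XObject that goes into /AP.

The code looks similar to the following: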

PdfDocument pdfDocument = new PdfDocument(new PdfReader("in PDF.pdf"),
        new PdfWriter("out PDF.pdf"));

int numberOfPages = pdfDocument.getNumberOfPages();
for (int i = 1; i <= numberOfPages; i++) {
    PdfDictionary page = pdfDocument.getPage(i).getPdfObject();
    PdfArray annotArray = page.getAsArray(PdfName.Annots);
    if (annotArray == null) {
        continue;
    }
    int size = annotArray.size();
    for (int x = 0; x < size; x++) {
        PdfDictionary curAnnot = annotArray.getAsDictionary(x);
        if (curAnnot.getAsString(PdfName.Contents) != null) {
            String contents = curAnnot.getAsString(PdfName.Contents).toString();
            if (!contents.isEmpty() && contents.contains("old content")) //set layer for a FreeText with this content
            {
                curAnnot.put(PdfName.Contents, new PdfString("new content"));
                String richText = curAnnot.getAsString(PdfName.RC).toUnicodeString();
                Document document = Jsoup.parse(richText);
                for (Element element : document.select("p")) {
                    element.html("new content");
                }
                curAnnot.put(PdfName.RC, new PdfString(document.body().outerHtml()));

                Rectangle bbox = curAnnot.getAsRectangle(PdfName.Rect);

                Rectangle textBbox = bbox.clone();
                // left, top, right, bottom
                PdfArray rectDiff = curAnnot.getAsArray(PdfName.RD);
                if (rectDiff != null) {
                    textBbox.applyMargins(rectDiff.getAsNumber(1).floatValue(),
                            rectDiff.getAsNumber(2).floatValue(),
                            rectDiff.getAsNumber(3).floatValue(),
                            rectDiff.getAsNumber(0).floatValue(), false);
                }
                float leftRectDiff = rectDiff != null ? rectDiff.getAsNumber(0).floatValue() : 0;
                float topRectDiff = rectDiff != null ? rectDiff.getAsNumber(1).floatValue() : 0;

                List<IElement> elements = HtmlConverter.convertToElements(document.body().outerHtml());
                PdfFormXObject appearance = new PdfFormXObject(
                        new Rectangle(0, 0, bbox.getWidth(), bbox.getHeight()));
                Canvas canvas = new Canvas(new PdfCanvas(appearance, pdfDocument),
                        new Rectangle(leftRectDiff, topRectDiff, textBbox.getWidth(), textBbox.getHeight()));
                canvas.setProperty(Property.RENDERING_MODE, RenderingMode.HTML_MODE);
                for (IElement ele : elements) {
                    if (ele instanceof IBlockElement) {
                        canvas.add((IBlockElement) ele);
                    }
                }
                curAnnot.getAsDictionary(PdfName.AP).put(PdfName.N, appearance.getPdfObject());
            }
        }
    }
}

pdfDocument.close();

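You will get a result like the one shown below:

You can see that the new text is displayed as expected, but the overall visual representation is far from what we want: the background fill, the border, and the arrow are missing. So, to generate the appearance properly, you have to explore further PDF properties such as /CL (the callout line, i.e. the arrow descriptor), /BS (border style), /C (background color), and so on. This takes quite some time: reading the specification, parsing the relevant entries, and applying them to your drawing operations. You can get some inspiration from the implementation of the PdfFormField class.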
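
As a rough idea of what "applying the entries to drawing operations" can look like, here is a plain-Java sketch (no iText dependency) that turns a /CL array into path operators. It assumes /CL holds 4 numbers (start, end) or 6 numbers (start, knee, end) in default user space, and that the appearance XObject has its BBox origin at (0, 0) as in the code above, so the lower-left corner of /Rect is subtracted. The class and method names are hypothetical, and the sketch strokes only the line itself: no arrowhead, color, or border width.

```java
import java.util.Locale;

class CalloutLineSketch {
    /**
     * cl: values of /CL in default user space, 4 or 6 numbers.
     * originX, originY: lower-left corner of /Rect, subtracted to move the
     * points into the coordinate system of an appearance whose BBox starts at (0, 0).
     * Returns content stream operators: one "m", then "l" per further point, then "S".
     */
    static String calloutPath(float[] cl, float originX, float originY) {
        if (cl.length != 4 && cl.length != 6) {
            throw new IllegalArgumentException("/CL must hold 4 or 6 numbers");
        }
        StringBuilder ops = new StringBuilder();
        for (int i = 0; i < cl.length; i += 2) {
            float x = cl[i] - originX;
            float y = cl[i + 1] - originY;
            // Locale.ROOT keeps the decimal separator a dot, as PDF syntax requires
            ops.append(String.format(Locale.ROOT, "%.2f %.2f %s ", x, y, i == 0 ? "m" : "l"));
        }
        return ops.append("S").toString();
    }

    public static void main(String[] args) {
        float[] cl = {110f, 210f, 140f, 230f, 150f, 230f};
        // prints 10.00 10.00 m 40.00 30.00 l 50.00 30.00 l S
        System.out.println(calloutPath(cl, 100f, 200f));
    }
}
```

With iText you would more likely issue the equivalent moveTo, lineTo, and stroke calls on the PdfCanvas of the appearance instead of building the operator string by hand; the string form is used here only to keep the sketch self-contained.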

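2. Simple solution without any guarantees

If you expect the text in the annotation to be a single line of plain Latin text, and the variability of your input documents is generally low, you can take the current appearance and assume that the text string is written in one chunk (which is the case for your input document).

Note that this is a hacky approach, prone to many potential errors.

Example code: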

PdfDocument pdfDocument = new PdfDocument(new PdfReader("in PDF.pdf"),
        new PdfWriter("out PDF.pdf"));

int numberOfPages = pdfDocument.getNumberOfPages();
for (int i = 1; i <= numberOfPages; i++) {
    PdfDictionary page = pdfDocument.getPage(i).getPdfObject();
    PdfArray annotArray = page.getAsArray(PdfName.Annots);
    if (annotArray == null) {
        continue;
    }
    int size = annotArray.size();
    for (int x = 0; x < size; x++) {
        PdfDictionary curAnnot = annotArray.getAsDictionary(x);
        if (curAnnot.getAsString(PdfName.Contents) != null) {
            String contents = curAnnot.getAsString(PdfName.Contents).toString();
            String oldContent = "old content";
            if (!contents.isEmpty() && contents.contains(oldContent)) {
                String newContent = "new content";
                curAnnot.put(PdfName.Contents, new PdfString(newContent));
                String richText = curAnnot.getAsString(PdfName.RC).toUnicodeString();
                Document document = Jsoup.parse(richText);
                for (Element element : document.select("p")) {
                    element.html(newContent);
                }
                curAnnot.put(PdfName.RC, new PdfString(document.body().outerHtml()));

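                // Hack: patch the text-showing operator directly in the existing appearance stream.
                // ISO-8859-1 maps every byte to one char, so non-text bytes survive the round trip.
                PdfStream currentAppearance = curAnnot.getAsDictionary(PdfName.AP).getAsStream(PdfName.N);
                String currentBytes = new String(currentAppearance.getBytes(), StandardCharsets.ISO_8859_1);
                currentBytes = currentBytes.replace("(" + oldContent + ") Tj", "(" + newContent + ") Tj");
                currentAppearance.setData(currentBytes.getBytes(StandardCharsets.ISO_8859_1));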
            }
        }
    }
}

pdfDocument.close();

Visual result (as you can see, this is what we want):
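
One fragile spot in the replacement above is that the new text is spliced into a PDF literal string without escaping, so a backslash or an unbalanced parenthesis in it would corrupt the content stream. Below is a minimal plain-Java sketch of the escaping step. The class and method names are made up for illustration, and it assumes the producer of the existing stream escaped the old text the same way, which is not guaranteed (balanced parentheses may legally appear unescaped).

```java
class PdfLiteralEscapeSketch {
    /** Escapes backslashes and parentheses so the text is safe inside "( ... )". */
    static String escapeLiteral(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c == '\\' || c == '(' || c == ')') {
                sb.append('\\'); // prefix the special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Replaces "(old) Tj" with "(new) Tj" in a decoded content stream. */
    static String replaceShownText(String content, String oldText, String newText) {
        String oldOp = "(" + escapeLiteral(oldText) + ") Tj";
        String newOp = "(" + escapeLiteral(newText) + ") Tj";
        return content.replace(oldOp, newOp);
    }

    public static void main(String[] args) {
        String stream = "BT /F1 12 Tf (old content) Tj ET";
        // prints BT /F1 12 Tf (new \(v2\) content) Tj ET
        System.out.println(replaceShownText(stream, "old content", "new (v2) content"));
    }
}
```

This still only covers text shown with a single Tj operator in a simple one-byte encoding; text written with TJ arrays, hex strings, or a subset font would not be matched.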

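3. Non-compliant solution

Another approach, which does not comply with the PDF specification, is to remove the /AP entry. You can do this with curAnnot.remove(PdfName.AP); in the same loop. Most major PDF viewers will then regenerate the appearance themselves. However, the appearance generated by my viewer is not the most attractive one:

As you can see, the result depends on the PDF viewer, which illustrates well why the PDF specification requires /AP to be present. To stress it again, this approach does not comply with the PDF specification.

【Comments】:

Thanks @Alexey Subach, I will try to fix it based on option 1.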