How to delete an optional content group along with its content from a PDF using PDFBox? Answer

[Question title]: How to delete an optional content group along with its content from a PDF using PDFBox?
[Posted]: 2017-12-16 15:50:15
[Question]:

I have implemented deleting a layer from a PDF, but the content I drew on that layer is not deleted. This is the code I use to delete the layer:

using org.apache.pdfbox.cos;
using org.apache.pdfbox.pdmodel;
using org.apache.pdfbox.pdmodel.graphics.optionalcontent;

PDDocumentCatalog documentCatalog = doc.getDocumentCatalog();
PDOptionalContentProperties ocgProps = documentCatalog.getOCProperties();
PDOptionalContentGroup ocg = ocgProps.getGroup(markupLayerName);

COSDictionary ocgsDict = (COSDictionary)ocgProps.getCOSObject();
// getDictionaryObject resolves an indirect /OCGs array; getItem would not
COSArray ocgs = (COSArray)ocgsDict.getDictionaryObject(COSName.OCGS);
int indexToBeDeleted = -1;
for (int index = 0; index < ocgs.size(); index++)
{
    // /OCGs holds indirect references to the groups; getObject dereferences them
    COSDictionary ocgDict = ocgs.getObject(index) as COSDictionary;
    if (ocgDict != null && ocgDict.getString(COSName.NAME) == markupLayerName)
    {
        indexToBeDeleted = index;
        break;
    }
}
if (indexToBeDeleted >= 0)
{
    ocgs.remove(indexToBeDeleted);
    ocgsDict.setItem(COSName.OCGS, ocgs);
    documentCatalog.setOCProperties(new PDOptionalContentProperties(ocgsDict));
}

[Discussion]:

Doesn't your other question answer this one?

Tags: c# pdf pdfbox ocg

[Solution 1]:

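To delete the markup I had to modify the content of the PDPage. I searched the page content for BDC ... EMC pairs (marked-content sections), checked whether each pair belongs to the layer in question, and if it does, removed that part of the content. Below is the C# code I used: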

// PDFBox 2.x used from C# through a Java-to-.NET bridge (e.g. IKVM)
using System;
using System.Collections.Generic;
using java.io;
using org.apache.pdfbox.contentstream.@operator;
using org.apache.pdfbox.cos;
using org.apache.pdfbox.pdfparser;
using org.apache.pdfbox.pdfwriter;
using org.apache.pdfbox.pdmodel;
using org.apache.pdfbox.pdmodel.common;
using org.apache.pdfbox.pdmodel.documentinterchange.markedcontent;
using org.apache.pdfbox.pdmodel.graphics.optionalcontent;

PDPage page = doc.getDocumentCatalog().getPages().get(pageNum);
PDResources resources = page.getResources();
PDFStreamParser parser = new PDFStreamParser(page);
parser.parse();
java.util.List tokens = parser.getTokens();
object[] tokensArray = tokens.toArray();
List<Tuple<int, int>> deletionIndexList = new List<Tuple<int, int>>();

for (int index = 0; index + 2 < tokensArray.Length; index++)
{
    // A section drawn on a layer looks like: /OC /<propertyName> BDC ... EMC
    if (!(tokensArray[index] is COSName tag && tag == COSName.OC)) continue;
    if (!(tokensArray[index + 1] is COSName propertyName)) continue;
    if (!(tokensArray[index + 2] is Operator bdc && bdc.getName() == "BDC")) continue;

    // Resolve the name through the page's /Properties resources and check
    // whether it is the layer that contains the markup to be deleted.
    PDPropertyList prop = resources.getProperties(propertyName);
    if (!(prop is PDOptionalContentGroup group && group.getName() == markupLayerName)) continue;

    // Find the EMC that closes this BDC; nested BMC/BDC ... EMC pairs inside
    // the section raise and lower the depth so they do not end it early.
    int depth = 1;
    int endIndex = -1;
    for (int j = index + 3; j < tokensArray.Length && endIndex < 0; j++)
    {
        if (!(tokensArray[j] is Operator op)) continue;
        string opName = op.getName();
        if (opName == "BDC" || opName == "BMC") depth++;
        else if (opName == "EMC" && --depth == 0) endIndex = j;
    }
    if (endIndex < 0) break; // unbalanced content: leave the rest untouched

    deletionIndexList.Add(Tuple.Create(index, endIndex));
    index = endIndex; // continue scanning after the removed section
}

// Copy every token that lies outside the recorded (start, end) ranges.
java.util.List newTokens = new java.util.ArrayList();
int tokensListIndex = 0;
foreach (Tuple<int, int> indexes in deletionIndexList)
{
    while (tokensListIndex < indexes.Item1)
    {
        newTokens.add(tokensArray[tokensListIndex]);
        tokensListIndex++;
    }
    tokensListIndex = indexes.Item2 + 1;
}
while (tokensListIndex < tokensArray.Length)
{
    newTokens.add(tokensArray[tokensListIndex]);
    tokensListIndex++;
}

// Write the remaining tokens to a new Flate-compressed stream and use it as the page content.
PDStream newContents = new PDStream(doc);
OutputStream output = newContents.createOutputStream(COSName.FLATE_DECODE);
ContentStreamWriter writer = new ContentStreamWriter(output);
writer.writeTokens(newTokens);
output.close();
page.setContents(newContents);
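The scan-and-rebuild idea can be tried without a PDF. The following is a minimal, self-contained sketch, not PDFBox code: tokens are modelled as plain strings (names start with `/`, operators are bare words such as `BDC` or `EMC`), and the hypothetical `isTargetLayer` callback stands in for resolving the property name through the page resources and comparing the OCG's name. It counts nesting depth so that an inner `BMC`/`BDC` ... `EMC` pair does not end a layer section early, and it keeps an unbalanced section unchanged.

```csharp
using System;
using System.Collections.Generic;

public static class LayerSectionRemover
{
    // Simplified token model (an assumption, not PDFBox types): names start
    // with '/', operators are bare words, and isTargetLayer is a hypothetical
    // stand-in for the resources.getProperties(...) lookup plus name check.
    public static List<string> RemoveLayerSections(
        IReadOnlyList<string> tokens, Func<string, bool> isTargetLayer)
    {
        var kept = new List<string>(tokens.Count);
        int i = 0;
        while (i < tokens.Count)
        {
            bool sectionStart = i + 2 < tokens.Count
                && tokens[i] == "/OC"
                && tokens[i + 1].StartsWith("/", StringComparison.Ordinal)
                && tokens[i + 2] == "BDC"
                && isTargetLayer(tokens[i + 1]);
            if (!sectionStart)
            {
                kept.Add(tokens[i]);
                i++;
                continue;
            }

            // Find the EMC that closes this BDC; nested BMC/BDC ... EMC pairs
            // raise and lower the depth so they do not end the section early.
            int depth = 1;
            int j = i + 3;
            while (j < tokens.Count)
            {
                if (tokens[j] == "BMC" || tokens[j] == "BDC") depth++;
                else if (tokens[j] == "EMC" && --depth == 0) break;
                j++;
            }

            if (j == tokens.Count)
            {
                // Unbalanced section: keep the remaining tokens unchanged.
                for (int k = i; k < tokens.Count; k++) kept.Add(tokens[k]);
                break;
            }
            i = j + 1; // skip the whole "/OC /name BDC ... EMC" section
        }
        return kept;
    }
}
```

For example, with the tokens `q /OC /oc1 BDC /Span BMC EMC 0 0 m EMC Q` and a callback that matches `/oc1`, the whole section is dropped and only `q Q` remains. A scan that stopped at the first `EMC` would instead keep `0 0 m EMC`, leaving an unmatched `EMC` in the content stream.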

[Discussion]:

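Your use of == for string comparison makes me wonder whether this works in general...
@mkl This is C# code, and it works; I have tested it many times. Maybe you are thinking of Java.
Ah, OK. You should probably point that out somewhere, as Java is the more common context for using PDFBox.
@mkl Added the c# tag.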
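The exchange above turns on a language difference: in C#, `==` between two expressions whose static type is `string` compares their characters, while in Java `==` on `String` compares object references (Java code would use `equals`). A minimal illustration follows; the class and method names are hypothetical. It assumes, as the answer's code implies, that PDFBox's `getName()` surfaces in C# as a .NET `string`. The one caveat is that if either operand is typed as `object`, C# falls back to reference comparison.

```csharp
using System;

public static class StringEqualityDemo
{
    // A literal plus a copy with the same characters but a different object
    // identity (new string(char[]) allocates a fresh instance).
    public static (string Literal, string Copy) MakePair() =>
        ("Markup", new string("Markup".ToCharArray()));

    // string == string: C# uses the string operator, which compares characters.
    public static bool ValueEquals(string a, string b) => a == b;

    // object == object: no string overload applies, so references are compared,
    // which matches what Java's == does for String.
    public static bool ObjectEquals(object a, object b) => a == b;
}
```

With the pair from `MakePair()`, `ValueEquals` returns `true` and `ObjectEquals` returns `false`.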