【Question title】: pdf form filled with PyPDF2 does not show in print
【Posted】: 2018-04-27 13:35:56
【Question description】:

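I need to batch-fill pdf forms, so I tried to write Python code that does it for me from a csv file. I used the second answer in this question, and it fills the forms fine. However, when I open the filled forms, the answers do not show unless the corresponding field is selected. The answers also do not show when the form is printed. I looked through the PyPDF2 documentation to see whether I could flatten the generated forms, but this feature has not been implemented yet, even though it was requested about a year ago. I would prefer not to use pdftk, so that I can compile the script without further dependencies. With the original code from the mentioned question, some fields show in print and some do not, which leaves me confused about how they work. Any help is appreciated.

Here is the code.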

# -*- coding: utf-8 -*-

from collections import OrderedDict
from PyPDF2 import PdfFileWriter, PdfFileReader


def _getFields(obj, tree=None, retval=None, fileobj=None):
    """
    Extracts field data if this PDF contains interactive form fields.
    The *tree* and *retval* parameters are for recursive use.

    :param fileobj: A file object (usually a text file) to write
    a report to on all interactive form fields found.
    :return: A dictionary where each key is a field name, and each
    value is a :class:`Field<PyPDF2.generic.Field>` object. By
    default, the mapping name is used for keys.
    :rtype: dict, or ``None`` if form data could not be located.
    """
    fieldAttributes = {'/FT': 'Field Type', '/Parent': 'Parent', '/T': 'Field Name', '/TU': 'Alternate Field Name',
                   '/TM': 'Mapping Name', '/Ff': 'Field Flags', '/V': 'Value', '/DV': 'Default Value'}
    if retval is None:
        retval = {} #OrderedDict()
        catalog = obj.trailer["/Root"]
        # get the AcroForm tree
        if "/AcroForm" in catalog:
            tree = catalog["/AcroForm"]
        else:
            return None
    if tree is None:
        return retval

    obj._checkKids(tree, retval, fileobj)
    for attr in fieldAttributes:
        if attr in tree:
            # Tree is a field
            obj._buildField(tree, retval, fileobj, fieldAttributes)
            break

    if "/Fields" in tree:
        fields = tree["/Fields"]
        for f in fields:
            field = f.getObject()
            obj._buildField(field, retval, fileobj, fieldAttributes)

    return retval


def get_form_fields(infile):
    infile = PdfFileReader(open(infile, 'rb'))
    fields = _getFields(infile)
    return {k: v.get('/V', '') for k, v in fields.items()}


def update_form_values(infile, outfile, newvals=None):
    pdf = PdfFileReader(open(infile, 'rb'))
    writer = PdfFileWriter()

    for i in range(pdf.getNumPages()):
        page = pdf.getPage(i)
        try:
            if newvals:
                writer.updatePageFormFieldValues(page, newvals)
            else:
                writer.updatePageFormFieldValues(
                    page,
                    {k: f'#{i} {k}={v}'
                     for i, (k, v) in enumerate(get_form_fields(infile).items())})
            writer.addPage(page)
        except Exception as e:
            print(repr(e))
            writer.addPage(page)

    with open(outfile, 'wb') as out:
        writer.write(out)


if __name__ == '__main__':
    import csv    
    import os
    from glob import glob
    cwd=os.getcwd()
    outdir=os.path.join(cwd,'output')
    csv_file_name=os.path.join(cwd,'formData.csv')
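    pdf_files=glob(os.path.join(cwd,'*.pdf'))
    if not pdf_files:
        # glob returns an empty list when nothing matches; stop before indexing it
        raise SystemExit('No pdf file found')
    pdf_file_name=pdf_files[0]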
    if not os.path.isdir(outdir):
        os.mkdir(outdir)
    if not os.path.isfile(csv_file_name):
        fields=get_form_fields(pdf_file_name)
        with open(csv_file_name,'w',newline='') as csv_file:
            csvwriter=csv.writer(csv_file,delimiter=',')
            csvwriter.writerow(['user label'])
            csvwriter.writerow(['fields']+list(fields.keys()))
            csvwriter.writerow(['Mr. X']+list(fields.values()))
    else:
        with open(csv_file_name,'r',newline='') as csv_file:
            csvreader=csv.reader(csv_file,delimiter=',')
            csvdata=list(csvreader)
        fields=csvdata[1][1:]
        for frmi in csvdata[2:]:
            frmdict=dict(zip(fields,frmi[1:]))
            outfile=os.path.join(outdir,frmi[0]+'.pdf')
            update_form_values(pdf_file_name, outfile,frmdict)
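
The CSV layout that the script writes and reads back may be easier to see in isolation. Below is a minimal sketch using only the standard library; the field names `Name` and `City` and the helper `rows_to_form_dicts` are made up for illustration and do not come from any real form. Row 0 is a header, row 1 lists the field names, and every later row holds a label (used as the output file name) followed by one value per field.

```python
import csv
import io

# Hypothetical contents of formData.csv; the field names are made up.
SAMPLE_CSV = (
    "user label\r\n"
    "fields,Name,City\r\n"
    "Mr. X,Xavier,Paris\r\n"
    "Ms. Y,Yvonne,Lyon\r\n"
)


def rows_to_form_dicts(csv_text):
    """Map each data row to (label, {field name: value})."""
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    field_names = rows[1][1:]          # row 1: 'fields' + the field names
    forms = []
    for row in rows[2:]:               # rows 2+: label + one value per field
        forms.append((row[0], dict(zip(field_names, row[1:]))))
    return forms


forms = rows_to_form_dicts(SAMPLE_CSV)
for label, values in forms:
    print(label, values)
```

Each dictionary produced this way has the shape that `update_form_values` passes on as `newvals`.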

【Comments】:

    Tags: python-3.x pdf-form pypdf2


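    【Solution 1】:

    I had the same problem, and apparently adding the "/NeedAppearances" attribute to the AcroForm of the PdfFileWriter object fixed it (see https://github.com/mstamy2/PyPDF2/issues/355). With a lot of help from ademidun (https://github.com/ademidun), I was able to fill a pdf form and have the field values display properly. Here is an example: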

    from PyPDF2 import PdfFileWriter, PdfFileReader
    from PyPDF2.generic import BooleanObject, NameObject, IndirectObject
    
    def set_need_appearances_writer(writer):
        # See 12.7.2 and 7.7.2 for more information:
        # http://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/PDF32000_2008.pdf
        try:
            catalog = writer._root_object
            # get the AcroForm tree and add the "/NeedAppearances" attribute
            if "/AcroForm" not in catalog:
                writer._root_object.update({
                    NameObject("/AcroForm"): IndirectObject(len(writer._objects), 0, writer)})
    
            need_appearances = NameObject("/NeedAppearances")
            writer._root_object["/AcroForm"][need_appearances] = BooleanObject(True)
            return writer
    
        except Exception as e:
            print('set_need_appearances_writer() catch : ', repr(e))
            return writer
    
    infile = "myInputPdf.pdf"
    outfile = "myOutputPdf.pdf"
    
    inputStream = open(infile, "rb")
    pdf_reader = PdfFileReader(inputStream, strict=False)
    if "/AcroForm" in pdf_reader.trailer["/Root"]:
        pdf_reader.trailer["/Root"]["/AcroForm"].update(
            {NameObject("/NeedAppearances"): BooleanObject(True)})
    
    pdf_writer = PdfFileWriter()
    set_need_appearances_writer(pdf_writer)
    if "/AcroForm" in pdf_writer._root_object:
        pdf_writer._root_object["/AcroForm"].update(
            {NameObject("/NeedAppearances"): BooleanObject(True)})
    
    field_dictionary = {"Field1": "Value1", "Field2": "Value2"}
    
    pdf_writer.addPage(pdf_reader.getPage(0))
    pdf_writer.updatePageFormFieldValues(pdf_writer.getPage(0), field_dictionary)
    
    outputStream = open(outfile, "wb")
    pdf_writer.write(outputStream)
    
    inputStream.close()
    outputStream.close()
    

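    【Comments】:

    • This works great. Thanks. Just as general practice, always close a file after opening it :-)
    • Nice one, @tromar. I wonder whether you could suggest a modification for me? I would like to replace the input stream with a byte (or string?) stream. In fact, I am loading infile in your code from a PostgreSQL bytea (a string of hex bytes).
    • This actually works quite well, but in my case non-ASCII letters appear in Arial until the field is clicked (the font used in the form is Helvetica). At least they show up, though, so after changing the default font your solution seems to work like a charm. Thanks!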
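
On the byte-stream question in the comments: `PdfFileReader` accepts any binary file-like object that supports `read` and `seek`, so an `io.BytesIO` should be usable in place of `inputStream`. A minimal sketch, assuming the value arrives as text in PostgreSQL's hex format (digits prefixed with `\x`); the helper name `bytea_hex_to_stream` is hypothetical, and many database drivers already return `bytes` or a `memoryview`, in which case `io.BytesIO(bytes(value))` would be enough.

```python
import io


def bytea_hex_to_stream(hex_text):
    """Turn a PostgreSQL hex-format bytea string into a binary stream."""
    # PostgreSQL's hex output format prefixes the digits with "\x".
    if hex_text.startswith("\\x"):
        hex_text = hex_text[2:]
    return io.BytesIO(bytes.fromhex(hex_text))


# "%PDF-1.4" encoded as hex, standing in for a real document.
stream = bytea_hex_to_stream("\\x255044462d312e34")
print(stream.read(8))
stream.seek(0)  # rewind so a reader starts from the beginning
# Assumption: `stream` could then replace `inputStream` in the answer above,
# e.g. PdfFileReader(stream, strict=False).
```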
    【Solution 2】:

    This works for me with Python 3.8 and PyPDF4 (but I think it would also work with PyPDF2):

    #!/usr/bin/env python3
    from PyPDF4.generic import NameObject
    from PyPDF4.generic import TextStringObject
    from PyPDF4.pdf import PdfFileReader
    from PyPDF4.pdf import PdfFileWriter
    
    import random
    import sys
    
    reader = PdfFileReader(sys.argv[1])
    
    writer = PdfFileWriter()
    # Try to "clone" the original document by hand. (The library has
    # cloneDocumentFromReader, but the PDF it renders is blank.)
    writer.appendPagesFromReader(reader)
    writer._info = reader.trailer["/Info"]
    reader_trailer = reader.trailer["/Root"]
    writer._root_object.update(
        {
            key: reader_trailer[key]
            for key in reader_trailer
            if key in ("/AcroForm", "/Lang", "/MarkInfo")
        }
    )
    
    page = writer.getPage(0)
    
    params = {"Foo": "Bar"}
    
    # Inspired by updatePageFormFieldValues but also handle checkboxes
    for annot in page["/Annots"]:
        writer_annot = annot.getObject()
        field = writer_annot["/T"]
        if writer_annot["/FT"] == "/Btn":
            value = params.get(field, random.getrandbits(1))
            if value:
                writer_annot.update(
                    {
                        NameObject("/AS"): NameObject("/On"),
                        NameObject("/V"): NameObject("/On"),
                    }
                )
        elif writer_annot["/FT"] == "/Tx":
            value = params.get(field, field)
            writer_annot.update(
                {
                    NameObject("/V"): TextStringObject(value),
                }
            )
    
    with open(sys.argv[2], "wb") as f:
        writer.write(f)
    

    This updates both text fields and checkboxes.
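
One caveat on the checkbox branch: `/On` is only one possible name for the checked state. Many forms use `/Yes` or a custom name, which appears as a key of the widget's `/AP` `/N` dictionary next to `/Off`. Below is a minimal sketch of looking that name up. It uses plain dicts as stand-ins for the annotation objects, the helper name `find_on_state` is hypothetical, and it has not been run against PyPDF4 objects.

```python
def find_on_state(annot, default="/On"):
    """Return the name of the checked appearance state of a checkbox widget."""
    try:
        normal_states = annot["/AP"]["/N"]
    except (KeyError, TypeError):
        # No appearance dictionary to inspect; fall back to the default name.
        return default
    for state in normal_states:
        if state != "/Off":
            return str(state)
    return default


# Plain dicts standing in for annotation dictionaries.
checkbox = {"/FT": "/Btn", "/AP": {"/N": {"/Off": None, "/Yes": None}}}
print(find_on_state(checkbox))
print(find_on_state({"/FT": "/Btn"}))
```

In the loop above, `NameObject(find_on_state(writer_annot))` would then take the place of `NameObject("/On")` for both `/AS` and `/V`.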

    I think the key part is copying a few entries over from the original file:

    reader_trailer = reader.trailer["/Root"]
    writer._root_object.update(
        {
            key: reader_trailer[key]
            for key in reader_trailer
            if key in ("/AcroForm", "/Lang", "/MarkInfo")
        }
    )
    

    Note: feel free to share this solution elsewhere; I went through a lot of SO questions on this topic.

    【Comments】:
