"Not implemented" exception when using pywin32 to control Adobe Acrobat: answers

【Question title】: "Not implemented" exception when using pywin32 to control Adobe Acrobat
【Posted】: 2012-03-12 02:27:54
【Question】:

I wrote a Python script that uses pywin32 to save PDF files as text, and until recently it worked fine. I use a similar approach with Excel. The code is as follows:

def __pdf2Txt(self, pdf, fileformat="com.adobe.acrobat.accesstext"):
    outputLoc = os.path.dirname(pdf)
    outputLoc = os.path.join(outputLoc, os.path.splitext(os.path.basename(pdf))[0] + '.txt')

    try:
        win32com.client.gencache.EnsureModule('{E64169B3-3592-47d2-816E-602C5C13F328}', 0, 1, 1)
        adobe = win32com.client.DispatchEx('AcroExch.App')
        pdDoc = win32com.client.DispatchEx('AcroExch.PDDoc')
        pdDoc.Open(pdf)
        jObject = pdDoc.GetJSObject()
        jObject.SaveAs(outputLoc, "com.adobe.acrobat.accesstext")
    except:
        traceback.print_exc()
        return False
    finally:
        del jObject
        pdDoc.Close()
        del pdDoc
        adobe.Exit()
        del adobe

But this code suddenly stopped working, and I now get the following output:

Traceback (most recent call last):
  File "C:\Documents and Settings\ablishen\workspace\HooverKeyCreator\src\HooverKeyCreator.py", line 38, in __pdf2Txt
    jObject.SaveAs(outputLoc, "com.adobe.acrobat.accesstext")
  File "C:\Python27\lib\site-packages\win32com\client\dynamic.py", line 505, in __getattr__
    ret = self._oleobj_.Invoke(retEntry.dispid,0,invoke_type,1)
com_error: (-2147467263, 'Not implemented', None, None)
False

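I have similar code written in VB that works correctly, so I am guessing it has something to do with the COM interface not binding properly to the appropriate functions? (My COM knowledge is patchy.)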
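For readers trying to look up the error, the signed number in the traceback can be converted to the usual hexadecimal HRESULT form. A minimal sketch; `hresult_to_hex` is a hypothetical helper name, not part of pywin32:

```python
def hresult_to_hex(code):
    """Return the unsigned 32-bit hexadecimal form of a signed HRESULT."""
    # Masking with 0xFFFFFFFF reinterprets the negative value as unsigned.
    return "0x%08X" % (code & 0xFFFFFFFF)


# The code reported in the traceback above.
print(hresult_to_hex(-2147467263))  # prints 0x80004001
```

0x80004001 is the standard E_NOTIMPL code, which matches the 'Not implemented' text in the com_error.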

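【Question comments】:

Does this PDF have save usage rights? (A guess based on the docs: "This method is available in Adobe Reader for documents that have Save usage rights.")
It didn't seem to, but I enabled them and I still get the same error. Also, I am running the code with Adobe Acrobat.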

Tags: python com acrobat pywin32 win32com

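【Solution 1】:

Blish, this thread holds the key to the solution you are looking for: https://mail.python.org/pipermail/python-win32/2002-March/000260.html

I admit the post above is not the easiest to find (probably because Google ranks it low given the age of the content?).

Specifically, applying this piece of advice will get things working: https://mail.python.org/pipermail/python-win32/2002-March/000265.html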
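To illustrate why that advice helps, here is a toy model of late-bound attribute lookup, based on my reading of the traceback in the question: the dynamic dispatcher first tries to read `SaveAs` as a property, and only falls back to treating it as a method when the error code is in a list of tolerated codes. All class names below are hypothetical stand-ins; this is not win32com's actual code.

```python
E_NOTIMPL = -2147467263              # HRESULT from the traceback (0x80004001)
DISP_E_MEMBERNOTFOUND = -2147352573  # example of a code tolerated by default


class FakeComError(Exception):
    """Hypothetical stand-in for pythoncom.com_error."""

    def __init__(self, hresult):
        Exception.__init__(self, hresult)
        self.hresult = hresult


class FakeJSObject(object):
    """Hypothetical stand-in for Acrobat's JSObject."""

    def property_get(self, name):
        # Reading a method as if it were a property is refused.
        raise FakeComError(E_NOTIMPL)

    def call(self, name, *args):
        return "called %s%r" % (name, args)


class LateBound(object):
    """Toy dispatcher: try a property read, fall back to a method."""

    def __init__(self, target, bad_context):
        self._target = target
        self._bad_context = bad_context

    def __getattr__(self, name):
        try:
            return self._target.property_get(name)
        except FakeComError as exc:
            if exc.hresult in self._bad_context:
                # Tolerated error: assume the name is a method.
                return lambda *args: self._target.call(name, *args)
            raise


tolerated = [DISP_E_MEMBERNOTFOUND]
proxy = LateBound(FakeJSObject(), tolerated)

try:
    proxy.SaveAs                     # fails: E_NOTIMPL is not tolerated yet
except FakeComError as exc:
    print("lookup failed with", exc.hresult)

tolerated.append(E_NOTIMPL)          # the equivalent of the suggested patch
print(proxy.SaveAs("out.txt", "com.adobe.acrobat.accesstext"))
```

In the toy model the same lookup succeeds once E_NOTIMPL is in the tolerated list, which is the effect the mailing-list post describes for `ERRORS_BAD_CONTEXT`.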

For reference, here is the complete code, which does not require you to patch dynamic.py by hand (the snippet should run as-is):

# Gets all files under ROOT_INPUT_PATH that match INPUT_FILE_EXTENSION and tries to extract their text into ROOT_OUTPUT_PATH,
# keeping each input filename but replacing its extension with OUTPUT_FILE_EXTENSION
from win32com.client import Dispatch
from win32com.client.dynamic import ERRORS_BAD_CONTEXT

import winerror

# try importing scandir and, if found, use its walk, which is considerably faster than the stock os.walk on older Python versions
try:
    from scandir import walk
except ImportError:
    from os import walk

import fnmatch

import sys
import os

ROOT_INPUT_PATH = None
ROOT_OUTPUT_PATH = None
INPUT_FILE_EXTENSION = "*.pdf"
OUTPUT_FILE_EXTENSION = ".txt"

def acrobat_extract_text(f_path, f_path_out, f_basename, f_ext):
    avDoc = Dispatch("AcroExch.AVDoc") # Connect to Adobe Acrobat

    # Open the input file (as a pdf)
    ret = avDoc.Open(f_path, f_path)
    assert ret  # FIXME: Documentation says "-1 if the file was opened successfully, 0 otherwise", but this is a bool in practice?

    pdDoc = avDoc.GetPDDoc()

    dst = os.path.join(f_path_out, ''.join((f_basename, f_ext)))

    # Adobe documentation says "For that reason, you must rely on the documentation to know what functionality is available through the JSObject interface. For details, see the JavaScript for Acrobat API Reference"
    jsObject = pdDoc.GetJSObject()

    # Here you can save as many other types by using, for instance: "com.adobe.acrobat.xml"
    jsObject.SaveAs(dst, "com.adobe.acrobat.accesstext")

    pdDoc.Close()
    avDoc.Close(True) # We want this to close Acrobat, as otherwise Acrobat is going to refuse processing any further files after a certain threshold of open files are reached (for example 50 PDFs)
    del pdDoc

if __name__ == "__main__":
    assert(5 == len(sys.argv)), sys.argv # <script name>, <script_file_input_path>, <script_file_input_extension>, <script_file_output_path>, <script_file_output_extension>

    #$ python get.txt.from.multiple.pdf.py 'C:\input' '*.pdf' 'C:\output' '.txt'

    ROOT_INPUT_PATH = sys.argv[1]
    INPUT_FILE_EXTENSION = sys.argv[2]
    ROOT_OUTPUT_PATH = sys.argv[3]
    OUTPUT_FILE_EXTENSION = sys.argv[4]

    # tuples are of schema (path_to_file, filename)
    matching_files = ((os.path.join(_root, filename), os.path.splitext(filename)[0]) for _root, _dirs, _files in walk(ROOT_INPUT_PATH) for filename in fnmatch.filter(_files, INPUT_FILE_EXTENSION))

    # patch ERRORS_BAD_CONTEXT as per https://mail.python.org/pipermail/python-win32/2002-March/000265.html
    # (the list is mutated in place, so the change is visible inside win32com.client.dynamic)
    ERRORS_BAD_CONTEXT.append(winerror.E_NOTIMPL)

    for filename_with_path, filename_without_extension in matching_files:
        print("Processing '{}'".format(filename_without_extension))
        acrobat_extract_text(filename_with_path, ROOT_OUTPUT_PATH, filename_without_extension, OUTPUT_FILE_EXTENSION)

I have tested this on WinPython x64 2.7.6.3 with Acrobat X Pro.
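If the patching step might run more than once in the same process (for example when the script is imported as a module), the append can be made idempotent. A minimal sketch; `allow_hresult` is a hypothetical helper name, and the import is guarded only so the snippet also loads on machines without pywin32:

```python
E_NOTIMPL = -2147467263  # same value as winerror.E_NOTIMPL (0x80004001)


def allow_hresult(bad_context, hresult=E_NOTIMPL):
    """Append hresult to the list only if missing; return True if added."""
    if hresult in bad_context:
        return False
    bad_context.append(hresult)
    return True


try:
    # Only importable where pywin32 is installed (Windows).
    from win32com.client import dynamic
except ImportError:
    dynamic = None

if dynamic is not None:
    # Mutates the list that the late-bound dispatcher consults.
    allow_hresult(dynamic.ERRORS_BAD_CONTEXT)
```

The patch has to be applied before the first `jsObject.SaveAs(...)` lookup, as the script above does.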

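【Comments】:

Adding winerror.E_NOTIMPL to the ERRORS_BAD_CONTEXT list in dynamic.py worked. Thank you very much!
Hi, I am using Python and Acrobat Reader Pro to do the same thing, and even after doing what the previous commenter did, this code currently gives me the following error: "NotAllowedError: Security settings prevent access to this property or method". Do you know what causes this? Thanks
I can't upvote you enough for the line ERRORS_BAD_CONTEXT.append(winerror.E_NOTIMPL).
This is really great. It also explains why the JSObject works in VBA but not in PowerShell...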

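【Solution 2】:

makepy.py is a script that ships with the win32com Python package.

Running it for your installation "hooks up" Python to the COM/OLE objects in Windows. Below is an excerpt of some code I use to talk to Excel and do things in it. This example gets the name of sheet 1 in the current workbook. If an exception occurs, it runs makepy automatically: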

# Python 3 syntax
import os
import re
import sys

import win32com.client
from win32com.client import selecttlb

def attachExcelCOM():
    # Path to makepy.py; adjust it to match your Python installation.
    makepyExe = r'python C:\Python25\Lib\site-packages\win32com\client\makepy.py'
    # Run makepy for every registered type library whose description looks like Excel.
    for tl in selecttlb.EnumTlbs():
        if re.match('^Microsoft.*Excel.*', tl.desc, re.IGNORECASE):
            makepyCmd = '%s -d "%s"' % (makepyExe, tl.desc)
            os.system(makepyCmd)

def getSheetName(sheetNum):
    try:
        xl = win32com.client.Dispatch("Excel.Application")
        # Note: Workbooks.Item(n) returns the n-th open workbook.
        wb = xl.Workbooks.Item(sheetNum)
    except Exception as detail:
        print('There was a problem attaching to Excel, refreshing connect config...')
        print(type(detail).__name__, str(detail))
        attachExcelCOM()
        try:
            xl = win32com.client.Dispatch("Excel.Application")
            wb = xl.Workbooks.Item(sheetNum)
        except Exception:
            print('Could not attach to Excel...')
            sys.exit(-1)

    wsName = wb.Name
    if wsName == 'PERSONAL.XLS':
        return None
    print('The target worksheet is:')
    print('      ', wsName)
    answer = input('Is this correct? [Y/N] ').strip().upper()
    if answer != 'Y':
        print('Sheet not identified correctly.')
        return None
    return (wb, wsName)

# -- Main --
sheetNum = 1  # assumption: the excerpt did not define sheetNum; the text above mentions sheet 1
sheetInfo = getSheetName(sheetNum)
if sheetInfo is None:
    print('Sheet not found')
    sys.exit(-1)
else:
    (wb, wsName) = sheetInfo
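The makepy call above depends on a hard-coded installation path. One possible alternative, sketched under the assumption that makepy can be launched as a module of the running interpreter with `-m win32com.client.makepy`, is to build the command as an argument list. `makepy_command` and `run_makepy` are hypothetical helper names:

```python
import subprocess
import sys


def makepy_command(typelib_desc, python=sys.executable):
    """Build the argument list that runs makepy for one type library."""
    # -d is the same flag the code above passes to makepy.py.
    return [python, "-m", "win32com.client.makepy", "-d", typelib_desc]


def run_makepy(typelib_desc):
    """Run makepy and return its exit code (requires pywin32 on Windows)."""
    return subprocess.call(makepy_command(typelib_desc))
```

Passing a list to `subprocess.call` avoids the manual quoting that `os.system` needs when the type library description contains spaces.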

【Comments】: