Load a worksheet with openpyxl and ignore other worksheets that contain pivot tables

【Question title】: Load a worksheet with openpyxl and ignore other worksheet which contains pivot table
【Posted】: 2020-06-14 19:52:26
【Question】:

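I have an .xlsx file that contains two worksheets. The first holds regular data (nothing fancy), while the second contains a pivot table. I only need the data from the first worksheet and want to ignore the second, but when I call openpyxl.load_workbook, the pivot table raises an error: TypeError: expected <type 'basestring'>.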

The error occurs in openpyxl/reader/excel.py, at the line: pivot_caches = parser.pivot_caches.

I tried openpyxl versions 2.6.4 and 2.5.1. I am using Python 2.7.

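After I delete the second worksheet, the error goes away and the data in the first worksheet is read correctly. However, these files are uploaded by users, and although I don't need the pivot table, I would like to avoid forcing users to delete the unneeded worksheet if possible.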

Sample code:

import os
from io import BytesIO

import openpyxl

pivot = os.path.expanduser('~/Downloads/file_with_pivot_tables.xlsx')

# Open in binary mode: an .xlsx file is a zip archive, not text
with open(pivot, 'rb') as fin:
    content = BytesIO(fin.read())

wb = openpyxl.load_workbook(content)  # this line fails
ws = wb['Sheet1']

Full traceback:

  File "/Users/gi/lib/openpyxl/reader/excel.py", line 224, in load_workbook
    pivot_caches = parser.pivot_caches
  File "/Users/gi/lib/openpyxl/packaging/workbook.py", line 125, in pivot_caches
    cache = get_rel(self.archive, self.rels, id=c.id, cls=CacheDefinition)
  File "/Users/gi/lib/openpyxl/packaging/relationship.py", line 162, in get_rel
    obj.deps = get_dependents(archive, rels_path)
  File "/Users/gi/lib/openpyxl/packaging/relationship.py", line 130, in get_dependents
    rels = RelationshipList.from_tree(node)
  File "/Users/gi/lib/openpyxl/descriptors/serialisable.py", line 84, in from_tree
    obj = desc.expected_type.from_tree(el)
  File "/Users/gi/lib/openpyxl/descriptors/serialisable.py", line 100, in from_tree
    return cls(**attrib)
  File "/Users/gi/lib/openpyxl/packaging/relationship.py", line 50, in __init__
    self.Target = Target
  File "/Users/gi/lib/openpyxl/descriptors/base.py", line 44, in __set__
    raise TypeError('expected ' + str(self.expected_type))
TypeError: expected <type 'basestring'>
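
The traceback ends while openpyxl is building a Relationship object from a .rels part, where the Target value is rejected as not being a string. One plausible cause (an assumption, not confirmed in the question) is a Relationship element with no Target attribute in one of the .rels parts that belong to the pivot cache. The sketch below uses only the standard library to list such entries; the function name find_relationships_without_target is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by relationship parts (*.rels) inside an .xlsx package.
RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def find_relationships_without_target(xlsx_file):
    """Return (rels_part, relationship_id) pairs that lack a Target attribute.

    xlsx_file may be a path or a binary file-like object such as BytesIO.
    This is a diagnostic sketch, not part of openpyxl.
    """
    missing = []
    with zipfile.ZipFile(xlsx_file) as archive:
        for name in archive.namelist():
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(archive.read(name))
            for rel in root.iter("{%s}Relationship" % RELS_NS):
                if rel.get("Target") is None:
                    missing.append((name, rel.get("Id")))
    return missing
```

Because zipfile.ZipFile accepts file-like objects, the same BytesIO content from the sample code can be passed in. An empty result would suggest the cause lies elsewhere.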

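【Comments】:

I've never used BytesIO, so I'm not sure what you're trying to do with it. I don't think content is a readable file path for load_workbook, so that may be causing the error. Why not just reference the file path directly?
Please share the full error message. Why are you using BytesIO?
@AMC I've added the error stack trace. I'm using BytesIO because I can't read directly from a file: the environment I work in is GCP.
@giliev Oh, so this isn't what the original code looks like?
Right, but the error is the same, i.e. it can be reproduced with the code I've provided here.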

Tags: python excel python-2.7 openpyxl xlsx

【Solution 1】:

You can specify which worksheet to operate on:

import openpyxl

# Load the workbook, then pick the worksheet by name
wb = openpyxl.load_workbook('H:\\myfile.xlsx')
ws = wb['sheet1']
ws['E1'] = 'The sky is gray.'  # write to a single cell
wb.save('H:\\myfile.xlsx')
wb.close()

If you need to check first, you can also get a list of all worksheet names:

print(wb.sheetnames)
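
wb.sheetnames is only available once load_workbook has returned, which is the step that fails in the question. As a rough alternative for inspecting an upload, the sheet names can be read straight from the package with the standard library. This sketch assumes the workbook part lives at xl/workbook.xml and uses the transitional SpreadsheetML namespace, which is the common layout but is not guaranteed; list_sheet_names is a hypothetical helper, not an openpyxl function.

```python
import zipfile
import xml.etree.ElementTree as ET

# Transitional SpreadsheetML namespace (assumed; strict files use another one).
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def list_sheet_names(xlsx_file):
    """Return worksheet names in workbook order without using openpyxl.

    xlsx_file may be a path or a binary file-like object such as BytesIO.
    """
    with zipfile.ZipFile(xlsx_file) as archive:
        # Assumption: the workbook part is stored at the conventional path.
        root = ET.fromstring(archive.read("xl/workbook.xml"))
    sheets = root.find("{%s}sheets" % MAIN_NS)
    if sheets is None:
        return []
    return [s.get("name") for s in sheets.findall("{%s}sheet" % MAIN_NS)]
```

This only reports the names; it does not read cell data or work around the pivot cache error.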

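【Comments】:

@giliev Without seeing your code, it's hard to determine what is causing the error. If you edit your post to include the code, I may be able to help.
I've added a code snippet. I'll try to provide the file, but I need to create a new pivot table first to avoid leaking real data.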