How do I find the formatting for a subset of text in an Excel document cell: answers

【Question title】: How do I find the formatting for a subset of text in an Excel document cell
【Posted】: 2016-06-13 01:13:01
【Question description】:

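Using Python, I need to find all substrings in a given Excel worksheet cell that are bold or italic.

My question is similar to this one:

Using XLRD module and Python to determine cell font style (italics or not)

..but that solution does not work for me, because I cannot assume that all of the content in a cell has the same formatting. The value in a single cell can look like this:

1. Some bold text Some normal text. Some italic text.

Is there a way to find the formatting of a range of characters in a cell using xlrd (or any other Python Excel module)?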

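【Question comments】:

Tags: python xlrd

【Solution 1】:

Thanks to @Vyassa for all the right pointers, I have been able to write the following code. It iterates over the rows in an XLS file and, for each non-empty cell in the chosen column, prints either the cell's "single" style information (e.g. the whole cell is italic) or its style "segments" (e.g. part of the cell is italic and part is not).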

import xlrd

# accessing Column 'C' in this example
COL_IDX = 2

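# formatting_info=True is required to populate rich_text_runlist_map;
# xlrd supports it only for .xls files, not .xlsx
book = xlrd.open_workbook('your-file.xls', formatting_info=True)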
first_sheet = book.sheet_by_index(0)

for row_idx in range(first_sheet.nrows):
  text_cell = first_sheet.cell_value(row_idx, COL_IDX)
  text_cell_xf = book.xf_list[first_sheet.cell_xf_index(row_idx, COL_IDX)]

  # skip rows where cell is empty
  if not text_cell:
    continue
  print(text_cell, end=' ')

  text_cell_runlist = first_sheet.rich_text_runlist_map.get((row_idx, COL_IDX))
  if text_cell_runlist:
    print('(cell multi style) SEGMENTS:')
    segments = []
    for segment_idx in range(len(text_cell_runlist)):
      start = text_cell_runlist[segment_idx][0]
      # the last segment starts at given 'start' and ends at the end of the string
      end = None
      if segment_idx != len(text_cell_runlist) - 1:
        end = text_cell_runlist[segment_idx + 1][0]
      segment_text = text_cell[start:end]
      segments.append({
        'text': segment_text,
        'font': book.font_list[text_cell_runlist[segment_idx][1]]
      })
    # segments did not start at beginning, assume cell starts with text styled as the cell
    if text_cell_runlist[0][0] != 0:
      segments.insert(0, {
        'text': text_cell[:text_cell_runlist[0][0]],
        'font': book.font_list[text_cell_xf.font_index]
      })

    for segment in segments:
      print(segment['text'], end=' ')
      print('italic:', segment['font'].italic, end=' ')
      print('bold:', segment['font'].bold)

  else:
    # no run list: the whole cell uses the font from the cell's XF record
    cell_font = book.font_list[text_cell_xf.font_index]
    print('(cell single style)', end=' ')
    print('italic:', cell_font.italic, end=' ')
    print('bold:', cell_font.bold)

【Comments】:

【Solution 2】:

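xlrd can do this. You must call open_workbook() with the kwarg formatting_info=True; the sheet objects will then have an attribute rich_text_runlist_map, which is a dictionary mapping cell coordinates ((row, col) tuples) to the runlist for that cell. A runlist is a sequence of (offset, font_index) pairs, where offset tells you where in the cell the font begins and font_index indexes into the workbook object's font_list attribute (the workbook object is what open_workbook() returns). That gives you a Font object describing the properties of the font, including bold, italic, typeface, size, etc.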
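To make the runlist structure concrete, here is a minimal sketch of how a sequence of (offset, font_index) pairs could be turned into text segments. The helper name split_runs and its cell_font_index parameter are hypothetical and not part of xlrd. The sketch assumes that any text before the first offset uses the cell's own font, and it works on plain Python values, so it runs without a workbook.

```python
def split_runs(text, runlist, cell_font_index):
    """Split `text` into (substring, font_index) pairs.

    `runlist` is a sequence of (offset, font_index) pairs. Text before the
    first offset is assumed to use the cell's own font (`cell_font_index`).
    """
    if not runlist:
        # No rich-text runs: the whole cell shares one font.
        return [(text, cell_font_index)] if text else []

    segments = []
    first_offset = runlist[0][0]
    if first_offset > 0:
        segments.append((text[:first_offset], cell_font_index))

    for i, (offset, font_index) in enumerate(runlist):
        # Each run ends where the next one starts; the last run ends with the text.
        end = runlist[i + 1][0] if i + 1 < len(runlist) else len(text)
        if offset < end:
            segments.append((text[offset:end], font_index))
    return segments


# Possible wiring to xlrd (needs an .xls file opened with formatting_info=True):
# book = xlrd.open_workbook('your-file.xls', formatting_info=True)
# sheet = book.sheet_by_index(0)
# runlist = sheet.rich_text_runlist_map.get((row, col), [])
# xf = book.xf_list[sheet.cell_xf_index(row, col)]
# for piece, idx in split_runs(sheet.cell_value(row, col), runlist, xf.font_index):
#     font = book.font_list[idx]
#     print(piece, 'bold:', font.bold, 'italic:', font.italic)
```

Returning font indices rather than Font objects keeps the splitting logic independent of the workbook; the lookup in font_list can happen afterwards, as in the commented wiring.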

【Comments】:

This is a bit manual, but I think it is the only approach that works.

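【Solution 3】:

I don't know whether you can do this with xlrd, but since you asked about any other Python Excel module: openpyxl cannot do this as of version 1.6.1.

Rich text gets reconstructed away (flattened to a plain string) in the function get_string() in openpyxl/reader/strings.py. It would be relatively easy to set up a second table with the "raw" strings in that module.

【Comments】: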