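A quick monkey patch, without converters or anything like that, for the case where you want every cell that has a hyperlink to be read as its hyperlink. A more elaborate solution would at least let you choose which columns are read as hyperlinks and which as data, or would somehow keep both the data and the hyperlink in the same DataFrame cell, maybe using converters; I'm not sure. (By the way, I also played with data_only and keep_links, which didn't help. Only changing read_only gave the correct result, and I suspect that slows your code down.)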
P.S.: This only works for xlsx, i.e. when the engine is openpyxl.
P.P.S.: If you read this answer later and issue https://github.com/pandas-dev/pandas/issues/13439 is still open, don't forget to check _convert_cell and load_workbook in pandas.io.excel._openpyxl for changes and update the patched versions accordingly.
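One hedged way to notice such changes before patching is to compare the parameter names of the private methods with the ones the replacement functions below expect. This is only a heuristic sketch: patch_signature_matches is a hypothetical helper (not part of pandas), and matching names do not guarantee that the method bodies are unchanged.

```python
import inspect

from pandas.io.excel._openpyxl import OpenpyxlReader


def patch_signature_matches() -> bool:
    """Hypothetical helper: True if the private methods still have the
    parameter names that the patched replacements assume."""
    convert_params = list(inspect.signature(OpenpyxlReader._convert_cell).parameters)
    load_params = list(inspect.signature(OpenpyxlReader.load_workbook).parameters)
    # The patch expects _convert_cell(self, cell, convert_float) and
    # load_workbook(self, filepath_or_buffer, ...).
    return (
        convert_params == ["self", "cell", "convert_float"]
        and load_params[:2] == ["self", "filepath_or_buffer"]
    )
```

If it returns False, read the current pandas source before assigning the patched methods.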
import numpy as np
import pandas
from pandas._typing import FilePathOrBuffer, Scalar
from pandas.io.excel._openpyxl import OpenpyxlReader


def _convert_cell(self, cell, convert_float: bool) -> Scalar:
    from openpyxl.cell.cell import TYPE_BOOL, TYPE_ERROR, TYPE_NUMERIC

    # Added: hyperlink support. A cell with a hyperlink target is returned
    # as that target instead of its displayed value.
    if cell.hyperlink and cell.hyperlink.target:
        return cell.hyperlink.target
        # Just as an example, you can return both the value and the hyperlink:
        # comment out the return above and uncomment the return below.
        # This can hurt later parsing if "|||" occurs in the value or the link.
        # return f"{cell.value}|||{cell.hyperlink.target}"
    # The original pandas code starts here; only the first "if" became "elif".
    elif cell.is_date:
        return cell.value
    elif cell.data_type == TYPE_ERROR:
        return np.nan
    elif cell.data_type == TYPE_BOOL:
        return bool(cell.value)
    elif cell.value is None:
        return ""  # compat with xlrd
    elif cell.data_type == TYPE_NUMERIC:
        # GH5394
        if convert_float:
            val = int(cell.value)
            if val == cell.value:
                return val
        else:
            return float(cell.value)
    return cell.value


def load_workbook(self, filepath_or_buffer: FilePathOrBuffer):
    from openpyxl import load_workbook

    # read_only had to be changed to False (the original passes True);
    # with read_only=True the hyperlinks did not come through.
    return load_workbook(
        filepath_or_buffer, read_only=False, data_only=True, keep_links=False
    )


# Replace the reader's methods with the patched versions.
OpenpyxlReader._convert_cell = _convert_cell
OpenpyxlReader.load_workbook = load_workbook
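After adding the code above to your Python file, you can simply call df = pandas.read_excel(input_file), and hyperlinked cells will come back as their link targets.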
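If you switched to the commented-out alternative that returns f"{cell.value}|||{cell.hyperlink.target}", the combined strings have to be split after reading. A minimal sketch, assuming exactly that "|||" separator; split_value_and_link is a hypothetical helper, not part of pandas:

```python
import pandas as pd

SEP = "|||"  # assumption: must match the separator used in the patched _convert_cell


def split_value_and_link(df: pd.DataFrame, column: str, sep: str = SEP) -> pd.DataFrame:
    """Return a copy of df where `column` holds the displayed value and a new
    `<column>_link` column holds the hyperlink target (None if there was none)."""
    values, links = [], []
    for item in df[column]:
        if isinstance(item, str) and sep in item:
            # rpartition splits on the LAST separator, so a value that itself
            # contains "|||" stays intact (a link containing it would not).
            value, _, link = item.rpartition(sep)
            values.append(value)
            links.append(link)
        else:
            values.append(item)
            links.append(None)
    out = df.copy()
    out[column] = values
    out[column + "_link"] = links
    return out
```

Splitting on the last separator is a design choice: URLs rarely contain "|||", so values that do still survive. Note that a hyperlinked number comes back as a string such as "42", because the patch formats it into the combined string.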
After writing all this, it occurred to me that it might be easier and cleaner to just use openpyxl directly ^_^
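A minimal sketch of that openpyxl-only route, assuming the first row of the sheet holds the column headers. It also covers the "choose which columns" idea from the top: only the columns listed in link_columns get a companion <col>_link column, so the displayed value and the link sit side by side instead of in one cell. read_sheet_with_links and its parameters are hypothetical names for illustration; load_workbook, iter_rows and cell.hyperlink.target are openpyxl APIs.

```python
from typing import Iterable, Optional

import pandas as pd
from openpyxl import load_workbook


def read_sheet_with_links(
    path_or_buffer,
    sheet_name: Optional[str] = None,
    link_columns: Optional[Iterable[str]] = None,
) -> pd.DataFrame:
    """Hypothetical helper: read a sheet into a DataFrame and add a
    "<col>_link" column with hyperlink targets for the chosen columns.
    If link_columns is None, every column gets a "_link" companion."""
    # read_only=False, as found above: with read_only=True the hyperlinks
    # were not available on the cells.
    wb = load_workbook(path_or_buffer, read_only=False, data_only=True)
    ws = wb[sheet_name] if sheet_name else wb.active
    rows = ws.iter_rows()
    header = [cell.value for cell in next(rows)]  # first row = column names
    wanted = set(header if link_columns is None else link_columns)

    records = []
    for row in rows:
        record = {}
        for name, cell in zip(header, row):
            record[name] = cell.value
            if name in wanted:
                link = cell.hyperlink
                record[f"{name}_link"] = link.target if link is not None else None
        records.append(record)
    return pd.DataFrame(records)
```

Usage would be something like df = read_sheet_with_links(input_file, link_columns=["name"]), where "name" stands for whichever header your hyperlink column has.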