PDFkit cannot read html to convert to pdf (the html file exists): answers

[Question title]: PDFkit cannot read html to convert to pdf (the html file exists)
[Posted]: 2021-12-12 07:33:35
[Question]:

Note: this is in Python. I am trying to convert html to pdf with pdfkit.from_file. These are my inputs:

html_path="abfss://container@DataLakeName.dfs.core.windows.net/user/trusted-service-user/for_html/htmltest.html"
pdf_path = "abfss://container@DataLakeName.dfs.core.windows.net/user/trusted-service-user/for_html/htmltest.pdf"

The command I am using:

        pdfkit.from_file(html_path, pdf_path, options = myoptions)

My output is:

 No such file: abfss://container@DataLakeName.dfs.core.windows.net/user/trusted-service-user/for_html/htmltest.html

For context: before this step I used mssparkutils.fs.put() to place the html file at that location. So the system can write there but cannot read? That is confusing.

Other things I have tried:

pdfkit.from_string()
pdfkit.from_url()

The error they give:

No wkhtmltopdf executable found: "b''"

[Comments]:

Tags: python-3.x apache-spark-sql pdfkit azure-synapse azure-data-lake-gen2

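[Answer 1]:

As you mentioned, you can write the file but cannot read it. That suggests there is no problem with mounting the storage account on Databricks. The error No wkhtmltopdf executable found: "b''" means the WKHTMLTOPDF binary needs to be installed: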

https://wkhtmltopdf.org/downloads.html

For more details, see https://github.com/JazzCore/python-pdfkit/wiki/Installing-wkhtmltopdf
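
To check whether the binary is actually visible to the Python process, something like the following sketch may help. It looks for a `wkhtmltopdf` executable on the PATH and, if one is found, passes its location to pdfkit explicitly through `pdfkit.configuration`. The helper names `find_wkhtmltopdf` and `make_pdfkit_config` are made up for this illustration. Note that a pip package listed in requirements.txt does not necessarily ship the native executable, so this check can fail even when the pip install succeeded.

```python
import shutil


def find_wkhtmltopdf(candidates=("wkhtmltopdf",)):
    """Return the full path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def make_pdfkit_config():
    """Build a pdfkit configuration that points at the located binary.

    Hypothetical helper: raises if no executable is found, instead of
    letting pdfkit fail later with 'No wkhtmltopdf executable found'.
    """
    import pdfkit  # imported lazily so the lookup above works without pdfkit

    path = find_wkhtmltopdf()
    if path is None:
        raise RuntimeError("wkhtmltopdf binary not found on PATH")
    return pdfkit.configuration(wkhtmltopdf=path)


if __name__ == "__main__":
    # Prints the executable's path, or None if it is not installed.
    print(find_wkhtmltopdf())
```

The returned object can then be passed as `configuration=make_pdfkit_config()` to `pdfkit.from_file`, `pdfkit.from_string`, or `pdfkit.from_url`.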

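[Discussion]:

Thanks for the reply. WKHTMLTOPDF is part of the requirements.txt file in the workspace, so it is installed when the cluster starts in Azure Synapse. Also note: I am not using Databricks and have not mounted anything. I am using Azure Synapse, which has a default storage account on Azure Data Lake Storage gen2. Any other suggestions?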
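
On the original "No such file" error: pdfkit.from_file checks for the input on the local file system of the machine running the code, and wkhtmltopdf itself only understands local paths and http(s) URLs. An abfss:// URI is neither, which would explain why mssparkutils can write to it while pdfkit cannot read it. One possible workaround, sketched below under that assumption, is to render to a local temporary file and copy the result to the lake afterwards. The names `is_local_path`, `html_to_pdf_locally`, and the `upload` callback are hypothetical and not part of pdfkit or mssparkutils.

```python
import os
import tempfile
from urllib.parse import urlparse


def is_local_path(path):
    """True if the path has no URI scheme (or file:), so the OS can open it.

    A one-letter "scheme" is treated as a Windows drive letter.
    """
    scheme = urlparse(path).scheme
    return len(scheme) <= 1 or scheme == "file"


def html_to_pdf_locally(html, upload, remote_pdf_path, options=None):
    """Render an HTML string to a local temp PDF, then hand it to `upload`.

    `upload(local_path, remote_path)` is supplied by the caller and is
    responsible for copying the file to remote storage.
    """
    import pdfkit  # imported lazily; requires the wkhtmltopdf binary

    with tempfile.TemporaryDirectory() as tmp:
        local_pdf = os.path.join(tmp, "out.pdf")
        pdfkit.from_string(html, local_pdf, options=options)
        upload(local_pdf, remote_pdf_path)


if __name__ == "__main__":
    print(is_local_path("/tmp/htmltest.html"))
    print(is_local_path("abfss://container@DataLakeName.dfs.core.windows.net/x.html"))
```

In a Synapse notebook the `upload` callback could plausibly wrap `mssparkutils.fs.cp` with a `file:` prefixed source path, but that detail is an assumption and should be checked against the Synapse documentation. This only addresses the path problem; the wkhtmltopdf binary still has to be present on the node.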