[Posted]: 2018-07-19 12:59:55
[Question]:
I am trying to extract text from a PDF using pdfminer.six, and I am following the code below, as mentioned here:
import pdfminer
import io

def extract_raw_text(pdf_filename):
    output = io.StringIO()
    laparams = pdfminer.layout.LAParams()

    with open(pdf_filename, "rb") as pdffile:
        pdfminer.high_level.extract_text_to_fp(pdffile, output, laparams=laparams)

    return output.getvalue()

print(extract_raw_text('simple1.pdf'))
But it produces the following error:
Traceback (most recent call last):
  File "extract.py", line 13, in <module>
    print(extract_raw_text('simple1.pdf'))
  File "extract.py", line 6, in extract_raw_text
    laparams = pdfminer.layout.LAParams()
AttributeError: module 'pdfminer' has no attribute 'layout'
I just want to extract all of the text from the PDF. Any help would be appreciated.
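For reference, the AttributeError is consistent with how Python packages work: `import pdfminer` loads only the top-level package, and submodules such as `pdfminer.layout` and `pdfminer.high_level` are not attached to it until they are imported explicitly. A minimal sketch of the same function with explicit submodule imports, assuming pdfminer.six is installed, might look like the following. The imports sit inside the function only so the snippet can be loaded without the library present; moving them to the top of the file should behave the same.

```python
import io


def extract_raw_text(pdf_filename):
    """Return the text of a PDF as one string (sketch, assumes pdfminer.six)."""
    # Import the submodules explicitly; a bare `import pdfminer` does not
    # make `pdfminer.layout` or `pdfminer.high_level` available.
    from pdfminer.high_level import extract_text_to_fp
    from pdfminer.layout import LAParams

    output = io.StringIO()
    laparams = LAParams()  # default layout-analysis parameters

    # PDF files must be opened in binary mode.
    with open(pdf_filename, "rb") as pdffile:
        extract_text_to_fp(pdffile, output, laparams=laparams)

    return output.getvalue()
```

With this version, `print(extract_raw_text('simple1.pdf'))` should print the extracted text rather than raising the AttributeError, provided the file exists and the library is installed.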
[Discussion]:
Tags: python-3.x pdf text-extraction