【Question title】: Convert PDF to DOCX format in Python
【Posted】: 2021-01-28 14:37:35
【Question】:

How can I convert a PDF to DOCX? I tried using pdfminer to convert the PDF to HTML and extract the text, but the result still isn't good enough.

【Comments】:

  • Could you explain in more detail what problem you are trying to solve?

Tags: pdf docx python-docx pdfminer


【Solution 1】:

pdf2docx

  1. Install the pdf2docx package.

Installation

  • Install from PyPI, or clone or download pdf2docx and install it from source

     pip install pdf2docx
     # or download the package and install it into your environment
     python setup.py install
    
  • Option 1

    from pdf2docx import Converter
    
    pdf_file  = r'C:\Users\ABCD\Desktop\XYZ\Document1.pdf'  # source file
    docx_file = r'C:\Users\ABCD\Desktop\XYZ\sample.docx'    # destination file
    
    # convert pdf to docx
    cv = Converter(pdf_file)
    cv.convert(docx_file, start=0, end=None)
    cv.close()
    
    # output
    
    Parsing Page 53: 53/53...
    Creating Page 53: 53/53...
    --------------------------------------------------
    Terminated in 6.258919400000195s.
    
  • Option 2

    from pdf2docx import parse
    
    pdf_file  = r'C:\Users\ABCD\Desktop\XYZ\Document2.pdf'  # source file
    docx_file = r'C:\Users\ABCD\Desktop\XYZ\sample_2.docx'  # destination file
    
    # convert pdf to docx
    parse(pdf_file, docx_file, start=0, end=None)
    
    # output
    Parsing Page 53: 53/53...
    Creating Page 53: 53/53...
    --------------------------------------------------
    Terminated in 5.883666100000482s.
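
Both options above hard-code Windows paths and convert the whole document. The following is a minimal sketch, not part of the original answer, of how the same `Converter` call might be wrapped to derive the output path with `pathlib` and to convert only selected pages. The helper names `docx_path_for`, `convert_pages`, and `convert_folder` are hypothetical, and the `pages` keyword (a list of zero-based page indices) is assumed to be available in the installed pdf2docx version.

```python
from pathlib import Path


def docx_path_for(pdf_path):
    """Return the output path: same folder and file name, with a .docx suffix."""
    return Path(pdf_path).with_suffix(".docx")


def convert_pages(pdf_path, pages=None):
    """Convert a PDF to a DOCX file placed next to the source file.

    pages: optional list of zero-based page indices; None converts all pages.
    """
    # Imported lazily so the path helper works even without pdf2docx installed.
    from pdf2docx import Converter

    docx_path = docx_path_for(pdf_path)
    cv = Converter(str(pdf_path))
    try:
        cv.convert(str(docx_path), pages=pages)
    finally:
        cv.close()  # release the PDF even if the conversion fails
    return docx_path


def convert_folder(folder):
    """Convert every *.pdf directly inside `folder` (hypothetical helper)."""
    return [convert_pages(p) for p in sorted(Path(folder).glob("*.pdf"))]
```

Under these assumptions, `convert_pages("Document1.pdf", pages=[0, 2])` would convert only the first and third pages, and `convert_folder(".")` would convert each PDF in the current directory. Building paths with `pathlib` avoids mixing `\` and `/` separators by hand.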
    

【Comments】:

  • Is this supported on Linux?