【Question Title】: Find text within a PDF file, copy, and paste it in a spreadsheet
【Posted】: 2016-07-05 19:42:36
【Question】:

I am using VBA to extract text from a PDF file into an xls spreadsheet.

The text is always the same: "X price", "Y price", "Z price".

I need to find them, copy them, and paste them into the spreadsheet.

I haven't found any similar topic.
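As a rough illustration of the matching step only, here is a minimal sketch in Python (not VBA, so it can run on its own). It assumes each label is followed on the same line by a plain number, e.g. "X price: 12.50"; the real layout of the PDF text is not shown in the question, and extract_prices is a hypothetical helper name.

```python
import re

# The three labels from the question. Assumption: each one is followed on the
# same line by optional spaces, tabs, ":", "=" or "$", then a number like 12.50.
LABELS = ["X price", "Y price", "Z price"]


def extract_prices(text):
    """Return {label: float} for every label found in text; missing labels are left out."""
    prices = {}
    for label in LABELS:
        # [ \t:=$]* deliberately excludes newlines, so the number must be on the label's line.
        pattern = re.escape(label) + r"[ \t:=$]*([0-9]+(?:\.[0-9]+)?)"
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            prices[label] = float(match.group(1))
    return prices


sample = "Invoice 42\nX price: 12.50\nY price 7\nZ price: $3.25\n"
print(extract_prices(sample))
# {'X price': 12.5, 'Y price': 7.0, 'Z price': 3.25}
```

The same search logic maps onto VBA's InStr or a VBScript.RegExp object once the PDF text is available as a string.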

【Comments】:

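  • There is plenty of information on searching text in PDF files; have a look here, here and here. For more specific help, you'll need to write your own code, post it, and ask about the particular part that isn't working.
  • This will be my first time automating PDFs with VBA, and so far I haven't tried anything specific. I'll check the links and write my code.
  • Which references should be enabled in the VBA project?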

Tags: excel vba pdf


【Solution 1】:

I think the best approach is to convert the PDF into a text file (Save As text) and then import the text file into Excel.

You can google how to do this; it's easy and will be a good learning exercise for you. Reply here if you have further questions.
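For the roughly 400 files mentioned in the comments below, the import step can be batched once the PDFs have been saved as .txt files. A minimal Python sketch, assuming all the text files sit in one folder, are UTF-8 or plain ASCII, and put each price label on the same line as its number; collect_prices and find_price are hypothetical names. It writes a CSV file, which Excel opens directly.

```python
import csv
import re
from pathlib import Path

# Labels from the question; the "label, then a number on the same line" layout is assumed.
LABELS = ["X price", "Y price", "Z price"]


def find_price(text, label):
    """Return the number after label as a string, or "" if the label is not found."""
    pattern = re.escape(label) + r"[ \t:=$]*([0-9]+(?:\.[0-9]+)?)"
    match = re.search(pattern, text, re.IGNORECASE)
    return match.group(1) if match else ""


def collect_prices(folder, out_csv):
    """Write a header plus one row per .txt file in folder; return the number of files read."""
    count = 0
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["File"] + LABELS)
        for txt_file in sorted(Path(folder).glob("*.txt")):
            # errors="replace" keeps going if a file is not valid UTF-8.
            text = txt_file.read_text(encoding="utf-8", errors="replace")
            writer.writerow([txt_file.name] + [find_price(text, label) for label in LABELS])
            count += 1
    return count


# Example with hypothetical paths:
# collect_prices(r"C:\pdf_text", r"C:\pdf_text\prices.csv")
```

Writing one row per file keeps the file name next to its prices, so a missing label shows up as an empty cell instead of shifting the other values.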

【Comments】:

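  • I think that's a good idea. The only problem is the number of PDF files to convert (about 400) and then importing the text into Excel. I'll google how to do it and try to find a simpler way.
  • I've been looking into the API; it's a hard approach to code. I still think the API could import the text into Excel easily.
  • I still think the API could import the text into Excel easily using the FindWindow, SetForegroundWindow, SendMessage and PostMessage functions. Do you have an example of these functions working with a PDF file? Sorry for the extra comment; I couldn't edit the previous one.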

【Solution 2】:

If you have Adobe Acrobat Professional installed (the free Adobe Reader is not enough), you can convert all your PDF files into Excel files.

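Besides the main procedure, I also wrote a loop so that several PDF files can be converted at once. So if you have a folder full of PDF files, you can use this tool to get their file paths, and then use the attached workbook to convert them to a different format. The code uses the Save As command of Adobe Acrobat Professional to save each file in the desired format. The available formats are: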

eps
html and htm
jpeg, jpg and jpe
jpf, jpx, jp2, j2k, j2c and jpc
docx
doc
png
ps
rtf
xlsx
xls
txt
tiff and tif
xml
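Each extension in the list corresponds to an Acrobat export format ID. As a rough, runnable restatement of the Select Case and the output-file naming in the VBA code below (written in Python only so it can run without Acrobat), the lookup could look like this. The IDs are taken from that code, with rtf meaning Rich Text Format; export_target is a hypothetical name.

```python
from pathlib import PureWindowsPath

# Extension -> Acrobat export format ID, as listed in the VBA Select Case.
_JP2K = ("jpf", "jpx", "jp2", "j2k", "j2c", "jpc")
FORMAT_IDS = {
    "eps": "com.adobe.acrobat.eps",
    "html": "com.adobe.acrobat.html",
    "htm": "com.adobe.acrobat.html",
    "jpeg": "com.adobe.acrobat.jpeg",
    "jpg": "com.adobe.acrobat.jpeg",
    "jpe": "com.adobe.acrobat.jpeg",
    **{ext: "com.adobe.acrobat.jp2k" for ext in _JP2K},
    "docx": "com.adobe.acrobat.docx",
    "doc": "com.adobe.acrobat.doc",
    "png": "com.adobe.acrobat.png",
    "ps": "com.adobe.acrobat.ps",
    "rtf": "com.adobe.acrobat.rtf",
    "xlsx": "com.adobe.acrobat.xlsx",
    "xls": "com.adobe.acrobat.spreadsheet",
    "txt": "com.adobe.acrobat.accesstext",
    "tiff": "com.adobe.acrobat.tiff",
    "tif": "com.adobe.acrobat.tiff",
    "xml": "com.adobe.acrobat.xml-1-00",
}


def export_target(pdf_path, extension):
    """Return (format_id, output_path) for pdf_path, raising ValueError on bad input."""
    path = PureWindowsPath(pdf_path)
    ext = extension.lower()
    if path.suffix.lower() != ".pdf":
        raise ValueError("not a PDF file: " + pdf_path)
    if ext not in FORMAT_IDS:
        raise ValueError("unsupported format: " + extension)
    # As in the VBA code, "xls" output gets an .xml extension because Acrobat writes XML.
    out_ext = "xml" if ext == "xls" else ext
    return FORMAT_IDS[ext], str(path.with_suffix("." + out_ext))
```

with_suffix only replaces the final extension, so an upper-case ".PDF" or a folder name that contains ".pdf" does not produce a wrong output path.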

VBA code

Option Explicit
Option Private Module

Sub SavePDFAsOtherFormat(PDFPath As String, FileExtension As String)

    'Saves a PDF file as another format using Adobe Professional.

    'By Christos Samaras
    'http://www.myengineeringworld.net

    'In order to use the macro you must enable the Acrobat library from VBA editor:
    'Go to Tools -> References -> Adobe Acrobat xx.0 Type Library, where xx depends
    'on your Acrobat Professional version (i.e. 9.0 or 10.0) you have installed to your PC.

    'Alternatively you can find it Tools -> References -> Browse and check for the path
    'C:\Program Files\Adobe\Acrobat xx.0\Acrobat\acrobat.tlb
    'where xx is your Acrobat version (i.e. 9.0 or 10.0 etc.).

    Dim objAcroApp      As Acrobat.AcroApp
    Dim objAcroAVDoc    As Acrobat.AcroAVDoc
    Dim objAcroPDDoc    As Acrobat.AcroPDDoc
    Dim objJSO          As Object
    Dim boResult        As Boolean
    Dim ExportFormat    As String
    Dim NewFilePath     As String

    'Check if the file exists.
    If Dir(PDFPath) = "" Then
        MsgBox "Cannot find the PDF file!" & vbCrLf & "Check the PDF path and retry.", _
                vbCritical, "File Path Error"
        Exit Sub
    End If

    'Check if the input file is a PDF file.
    If LCase(Right(PDFPath, 3)) <> "pdf" Then
        MsgBox "The input file is not a PDF file!", vbCritical, "File Type Error"
        Exit Sub
    End If

    'Initialize Acrobat by creating App object.
    Set objAcroApp = CreateObject("AcroExch.App")

    'Set AVDoc object.
    Set objAcroAVDoc = CreateObject("AcroExch.AVDoc")

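    'Open the PDF file and stop if Acrobat cannot open it.
    boResult = objAcroAVDoc.Open(PDFPath, "")
    If Not boResult Then
        boResult = objAcroApp.Exit
        MsgBox "Acrobat could not open the PDF file:" & vbNewLine & PDFPath, vbCritical, "Open Error"
        Exit Sub
    End If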

    'Set the PDDoc object.
    Set objAcroPDDoc = objAcroAVDoc.GetPDDoc

    'Set the JS Object (JavaScript object).
    Set objJSO = objAcroPDDoc.GetJSObject

    'Check the type of conversion.
    Select Case LCase(FileExtension)
        Case "eps": ExportFormat = "com.adobe.acrobat.eps"
        Case "html", "htm": ExportFormat = "com.adobe.acrobat.html"
        Case "jpeg", "jpg", "jpe": ExportFormat = "com.adobe.acrobat.jpeg"
        Case "jpf", "jpx", "jp2", "j2k", "j2c", "jpc": ExportFormat = "com.adobe.acrobat.jp2k"
        Case "docx": ExportFormat = "com.adobe.acrobat.docx"
        Case "doc": ExportFormat = "com.adobe.acrobat.doc"
        Case "png": ExportFormat = "com.adobe.acrobat.png"
        Case "ps": ExportFormat = "com.adobe.acrobat.ps"
        Case "rtf": ExportFormat = "com.adobe.acrobat.rtf"
        Case "xlsx": ExportFormat = "com.adobe.acrobat.xlsx"
        Case "xls": ExportFormat = "com.adobe.acrobat.spreadsheet"
        Case "txt": ExportFormat = "com.adobe.acrobat.accesstext"
        Case "tiff", "tif": ExportFormat = "com.adobe.acrobat.tiff"
        Case "xml": ExportFormat = "com.adobe.acrobat.xml-1-00"
        Case Else: ExportFormat = "Wrong Input"
    End Select

    'Check if the format is correct and there are no errors.
    If ExportFormat <> "Wrong Input" And Err.Number = 0 Then

        'Format is correct and no errors.

        'Set the path of the new file. Note that Adobe uses xml files instead of xls,
        'so the xls extension is changed to xml. Only the final ".pdf" is replaced
        '(in any letter case); Substitute is case-sensitive and would also change
        'a ".pdf" that appears elsewhere in the path.
        If LCase(FileExtension) <> "xls" Then
            NewFilePath = Left(PDFPath, Len(PDFPath) - 4) & "." & LCase(FileExtension)
        Else
            NewFilePath = Left(PDFPath, Len(PDFPath) - 4) & ".xml"
        End If

        'Save PDF file to the new format.
        boResult = objJSO.SaveAs(NewFilePath, ExportFormat)

        'Close the PDF file without saving the changes.
        boResult = objAcroAVDoc.Close(True)

        'Close the Acrobat application.
        boResult = objAcroApp.Exit

        'Inform the user that the conversion was successful.
        MsgBox "The PDF file:" & vbNewLine & PDFPath & vbNewLine & vbNewLine & _
        "Was saved as: " & vbNewLine & NewFilePath, vbInformation, "Conversion finished successfully"

    Else

        'Something went wrong, so close the PDF file and the application.

        'Close the PDF file without saving the changes.
        boResult = objAcroAVDoc.Close(True)

        'Close the Acrobat application.
        boResult = objAcroApp.Exit

        'Inform the user that something went wrong.
        MsgBox "Something went wrong!" & vbNewLine & "The conversion of the following PDF file FAILED:" & _
        vbNewLine & PDFPath, vbCritical, "Conversion failed"

    End If

    'Release the objects.
    Set objAcroPDDoc = Nothing
    Set objAcroAVDoc = Nothing
    Set objAcroApp = Nothing

End Sub

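This macro loops through all the file paths in column "B" of the worksheet "Paths" and converts each PDF file to a different file type. ExportAllPDFs calls SavePDFAsOtherFormatNoMsg (not shown here), which is similar to SavePDFAsOtherFormat but without the message boxes.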

Sub ExportAllPDFs()

    'Convert all the PDF files whose paths are in column B of
    'the worksheet "Paths" into a different file format.
    'shPaths is the code name of that worksheet (set in the VBA editor's Properties window).
    'By Christos Samaras
    'http://www.myengineeringworld.net

    Dim FileFormat As String
    Dim LastRow As Long
    Dim i As Long

    'Change this according to your own needs.
    'Available formats: eps, html, htm, jpeg, jpg, jpe, jpf, jpx, jp2,
    'j2k, j2c, jpc, docx, doc, png, ps, rtf, xlsx, xls, txt, tiff, tif and xml.
    'In this example the PDF file will be saved as text file.
    FileFormat = "txt"

    'Stop if no output format was given. (Selecting B2 here would fail,
    'because the Paths sheet is not activated yet.)
    If FileFormat = "" Then
        MsgBox "No output file format was specified!", vbInformation, "File format missing"
        Exit Sub
    End If

    shPaths.Activate

    'Find the last row.
    With shPaths
        LastRow = .Cells(.Rows.Count, "B").End(xlUp).Row
    End With

    'Check that there are available file paths.
    If LastRow < 2 Then
        shPaths.Range("B2").Select
        MsgBox "There are no file paths to convert!", vbInformation, "File paths missing"
        Exit Sub
    End If

    'For each cell in the range "B2:B" & last row convert the pdf file
    'into different format (here to text - txt).
    For i = 2 To LastRow
        SavePDFAsOtherFormatNoMsg CStr(shPaths.Cells(i, 2).Value), FileFormat
    Next i

    'Inform the user that conversion finished.
    MsgBox "All files were converted successfully!", vbInformation, "Finished"

End Sub
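ExportAllPDFs assumes column B of the Paths sheet is already filled from B2 downwards. The separate path-listing tool mentioned in this answer is not shown here; as a stand-in, a minimal Python sketch (list_pdf_paths is a hypothetical name) that lists the PDF files in one folder, matching the extension in any letter case, so the paths can be pasted into column B:

```python
from pathlib import Path


def list_pdf_paths(folder):
    """Return sorted absolute paths of the regular files in folder with a .pdf extension (any case)."""
    return sorted(
        str(p.resolve())
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


# Example with a hypothetical folder; paste the printed lines into B2 downwards:
# for path in list_pdf_paths(r"C:\Invoices"):
#     print(path)
```

Only the top-level folder is scanned; subfolders are skipped because they are not regular files.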

http://www.myengineeringworld.net/2013/03/vba-macro-to-convert-pdf-files-into.html

【Comments】:
