Error: module 'pandas' has no attribute 'read_pdf'

[Question title]: Error: module 'pandas' has no attribute 'read_pdf'
[Posted]: 2021-12-10 16:04:13
[Question description]:

When I try to use the read_pdf method from pandas, imported with

import pandas as pd

as shown in the example, it displays the following error message:

AttributeError: module 'pandas' has no attribute 'read_pdf'

Environment

python --version: Python 3.8.8
OS and its version: Windows 10
Anaconda (version 1.7.2)

I am trying to read files of type .pdf / .docx / .txt from an existing file system.

Sample code:

import pandas as pd
import os  # os module for operating-system operations
import mimetypes

# Change the current working directory with os.chdir("directory path").
# `directory` is also used inside the loop below, so it is defined here.
directory = "C:\\Users\\adity\\Documents\\Parent"
os.chdir(directory)

# List the files and folders in the current working directory
# (os.listdir returns their names as a list)
fid = os.listdir()  # files in directory

# Check whether Child1, Child2 and Child3 all exist
def checkChild(d):
    return all(name in d for name in ('Child1', 'Child2', 'Child3'))

# dc1: directory child one; dof: dictionary file system
# nof: name of file; ext: extension

if checkChild(fid):  # if the folders are there, then read the respective files
    for folder in fid:
        fileDir = folder
        os.chdir(directory+f"\\{fileDir}") # Changing directory to respective child
        dc = os.listdir()[0] # dc contains the name of file with extension
        nof,ext = os.path.splitext(dc)        
        if ext =='':
            ext = mimetypes.guess_extension(os.getcwd())
        
        if ext == '.pdf' and fileDir == 'Child1':
            child1Pdf = pd.read_pdf(f'{dc}')  # <-- error line

Error output:

AttributeError                            Traceback (most recent call last)
      9         print(dc)
     10         if ext == '.pdf' and fileDir == 'Child1':
---> 11             child1Pdf = pd.read_pdf(f'{dc}')
     12
     13

~\anaconda3\lib\site-packages\pandas\__init__.py in __getattr__(name)
    242
    243
--> 244     raise AttributeError(f"module 'pandas' has no attribute '{name}'")
    245
    246

AttributeError: module 'pandas' has no attribute 'read_pdf'

I have not found any way to resolve this error.

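[Question discussion]:

Does pandas have a read_pdf function? I don't think so.
This might help: stackoverflow.com/a/50053943/4985099
Pandas itself does not support reading from PDF. Here is the list of all available IO operations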
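
As the comments point out, pandas exposes no PDF reader. One quick way to confirm this against an installed version is to list a module's public `read_*` functions. The following is a minimal sketch; `list_readers` and `has_reader` are hypothetical helper names introduced here, and the code assumes the named module can be imported in the current environment.

```python
import importlib


def list_readers(module_name="pandas"):
    """Return the sorted names of the callable read_* attributes of a module."""
    module = importlib.import_module(module_name)
    return sorted(
        name
        for name in dir(module)
        if name.startswith("read_") and callable(getattr(module, name))
    )


def has_reader(reader_name, module_name="pandas"):
    """Check whether a given read_* function exists on the module."""
    return reader_name in list_readers(module_name)
```

With pandas installed, `has_reader("read_csv")` should be true and `has_reader("read_pdf")` false, which matches the AttributeError in the question.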

Tags: python python-3.x pandas attributeerror

[Solution 1]:

If you are importing the PDF data as tabular data:

import tabula
import pandas as pd

# Declare the path of your file
file_path = "/path/to/pdf_file/data.pdf"

# Convert your file (tabula-py returns a list of DataFrames by default)
df = tabula.read_pdf(file_path)
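
Because `tabula.read_pdf` returns a list of DataFrames by default, and reads only the first page unless `pages` is set, a small wrapper can make the "one table" case explicit. This is only a sketch: `load_first_table` and its `reader` parameter are hypothetical names introduced here, and real use assumes `tabula-py` and a Java runtime are installed.

```python
def load_first_table(path, reader=None, **options):
    """Return the first table extracted from a PDF, or raise if there is none.

    `reader` defaults to tabula.read_pdf. It is a parameter only so that the
    function can be exercised without tabula-py (an assumption of this sketch).
    """
    if reader is None:
        # Imported lazily: tabula-py needs a Java runtime to do real work.
        from tabula import read_pdf as reader

    tables = reader(path, **options)  # a list of DataFrames by default
    if not tables:
        raise ValueError(f"no tables were extracted from {path!r}")
    return tables[0]
```

A call might look like `df = load_first_table("/path/to/pdf_file/data.pdf", pages="all")`, where `pages` is passed straight through to `tabula.read_pdf`.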

[Discussion]:

[Solution 2]:

If you just want to read a PDF as a pandas DataFrame, take a look here; the answer clearly shows how to use tabula to read a PDF into a pandas DataFrame.

import tabula

df = tabula.read_pdf(
    "https://github.com/chezou/tabula-py/raw/master/tests/resources/data.pdf")

df_0 = df[0]

print("type of df :", type(df))
print("type of df_0", type(df_0))

Output

type of df : <class 'list'>
type of df_0 <class 'pandas.core.frame.DataFrame'>

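My mistake: I just explored the actual tabula package in Python and realized that it has no read_pdf, so after digging deeper and reproducing the problem, my final experiment produced the following result.

Use the read_pdf function from tabula.io.
To install it, run the command pip install tabula-py

Use the following code sample: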

from tabula.io import read_pdf
df = read_pdf("https://github.com/chezou/tabula-py/raw/master/tests/resources/data.pdf")
df_0 = df[0]
print("type of df :", type(df))

Output

[Discussion]:

I get an AttributeError: module 'tabula' has no attribute 'read_pdf'
You're right, I just reproduced the problem and updated my answer.
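
The AttributeError reported in this discussion is consistent with a different distribution providing the `tabula` module than the one that ships `read_pdf` (`tabula-py`). A small diagnostic along these lines can show what is actually installed. The helper names are hypothetical, and the sketch relies only on the standard library (`importlib.metadata`, available from Python 3.8).

```python
import importlib
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_report(module_name, attr):
    """Report where a module is loaded from and whether it exposes `attr`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return {"importable": False, "file": None, "has_attr": False}
    return {
        "importable": True,
        "file": getattr(module, "__file__", None),
        "has_attr": hasattr(module, attr),
    }
```

For example, comparing `installed_version("tabula-py")` with `installed_version("tabula")` and then inspecting `module_report("tabula", "read_pdf")` shows which package the import resolves to. If only `tabula` is installed, uninstalling it and installing `tabula-py` is the likely fix.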