【Question title】: Cannot read CSV with pandas in Azure Functions with Python
【Posted】: 2022-11-28 11:28:02
【Question】:

I created an Azure Blob Storage trigger in an Azure Function written in Python. When a CSV file is added to blob storage, I try to read it with pandas.

import logging
import pandas as pd

import azure.functions as func


def main(myblob: func.InputStream):
    logging.info(f"Python blob trigger function processed blob \n"
                 f"Name: {myblob.name}\n"
                 f"Blob Size: {myblob.length} bytes")

    df_new = pd.read_csv(myblob)
    print(df_new.head())

If I pass myblob to pd.read_csv, I get UnsupportedOperation: read1

Python blob trigger function processed blob 
Name: samples-workitems/Data_26112022_080027.csv
Blob Size: None bytes
[2022-11-27T16:19:25.650Z] Executed 'Functions.BlobTrigger1' (Failed, Id=2df388f5-a8dc-4554-80fa-f809cfaeedfe, Duration=1472ms)
[2022-11-27T16:19:25.655Z] System.Private.CoreLib: Exception while executing function: Functions.BlobTrigger1. System.Private.CoreLib: Result: Failure
Exception: UnsupportedOperation: read1

If I pass myblob.read() instead

df_new = pd.read_csv(myblob.read())

it raises TypeError: Expected file path name or file-like object, got <class 'bytes'> type

Python blob trigger function processed blob 
Name: samples-workitems/Data_26112022_080027.csv
Blob Size: None bytes
[2022-11-27T16:09:56.513Z] Executed 'Functions.BlobTrigger1' (Failed, Id=e3825c28-7538-4e30-bad2-2526f9811697, Duration=1468ms)
[2022-11-27T16:09:56.518Z] System.Private.CoreLib: Exception while executing function: Functions.BlobTrigger1. System.Private.CoreLib: Result: Failure
Exception: TypeError: Expected file path name or file-like object, got <class 'bytes'> type

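From the Azure Functions docs:

InputStream is a file-like object representing the input blob.

From the pandas read_csv docs:

read_csv takes filepath_or_buffer: str, path object, or file-like object

So technically I should be able to read this object. What piece of the puzzle am I missing here?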

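【Comments】:

  • pd.read_csv should be given a file name with its path. What does myblob contain?
  • I uploaded Data_26112022_080027.csv
  • Python blob trigger function processed blob Name: samples-workitems/Data_26112022_080027.csv Blob Size: None bytes
  • That is the output before the exception occurs.
  • I have added the output to the question as well :)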

Tags: python pandas azure


【Solution 1】:

According to this article, the code below works. Because the entire file is read into memory, it is recommended only for smaller files, not for larger ones.

import logging
from io import BytesIO

import pandas as pd

import azure.functions as func


def main(myblob: func.InputStream):
    logging.info(f"Python blob trigger function processed blob \n"
                 f"Name: {myblob.name}\n"
                 f"Blob Size: {myblob.length} bytes")
    # read() returns the whole blob as bytes; BytesIO wraps them in a file-like object.
    df_new = pd.read_csv(BytesIO(myblob.read()))
    print(df_new.head())
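
To illustrate why the BytesIO wrapper helps, here is a minimal standard-library sketch that runs without Azure or pandas. It rests on two assumptions that are not stated in the thread: that func.InputStream derives from io.BufferedIOBase without overriding read1(), and that pandas wraps a binary handle in io.TextIOWrapper before parsing. ReadOnlyStream and first_line are hypothetical names made up for this demonstration.

```python
import io

CSV_BYTES = b"a,b\n1,2\n"


class ReadOnlyStream(io.BufferedIOBase):
    """Hypothetical stand-in for func.InputStream: it implements read() only."""

    def __init__(self, data: bytes) -> None:
        self._inner = io.BytesIO(data)

    def readable(self) -> bool:
        return True

    def read(self, size=-1):
        return self._inner.read(size)


def first_line(binary_stream) -> str:
    # Wrap the binary stream for text decoding, roughly as a CSV reader would.
    text = io.TextIOWrapper(binary_stream, encoding="utf-8")
    return text.readline()


try:
    first_line(ReadOnlyStream(CSV_BYTES))
    direct_ok = True
except io.UnsupportedOperation:
    # The read1() inherited from io.BufferedIOBase is a placeholder that raises.
    direct_ok = False

# Copying the bytes into BytesIO yields a stream with a working read1().
buffered_line = first_line(io.BytesIO(ReadOnlyStream(CSV_BYTES).read()))
print(direct_ok, repr(buffered_line))
```

If these assumptions hold, the same reasoning covers both errors in the question: the raw stream is file-like but lacks a usable read1(), while the result of read() is plain bytes and not file-like at all. BytesIO supplies both properties at the cost of holding the whole blob in memory.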

【Discussion】:
