Saving a matplotlib image to an S3 bucket from an EC2 instance: answers

【Question title】: Saving matplotlib image to S3 bucket from EC2 instance
【Posted】: 2020-12-22 17:34:20
【Question】:

I am trying to save a matplotlib figure to my S3 bucket on AWS. I am using the savefig() function like this:

import matplotlib.pyplot as plt

f = plt.figure()
plt.plot([1, 2, 3])  # placeholder data; the original post wrote "some figure" here
f.savefig("s3://bucketpath/foo.pdf", bbox_inches='tight')

But I get a "path not found" error. If I don't specify a path, it seems to work, but I don't know where the file is saved.
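As a side note on where a path-less save ends up: a relative filename passed to savefig() is resolved against the current working directory of the Python process, as with any other file write. A minimal sketch for checking that location follows; the helper name resolve_save_path is hypothetical and only used for illustration.

```python
import os


def resolve_save_path(filename):
    """Return the absolute location a relative filename would be written to."""
    # Relative paths are interpreted against the current working directory.
    return os.path.abspath(filename)


# Directory the notebook kernel (or script) is currently running in
print(os.getcwd())
# Where a call such as savefig("foo.pdf") would write the file
print(resolve_save_path("foo.pdf"))
```

The same reasoning suggests why the s3:// string fails: savefig() treats it as an ordinary local path, and no local directory by that name exists.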

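I am running my code (in pyspark) from SageMaker JupyterLab, so it runs on one of the EC2 instances. Is there a way to specify the path where the PDF is saved, similar to how the write() function is used when saving a DataFrame to an S3 bucket?

I saw this post on this site, but it is about uploading from a local client to S3 in the cloud using boto. Is there a way to save directly to S3 without using AWS access keys and so on?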

【Comments】:

Is this actually not possible from EC2?

Tags: matplotlib amazon-s3 amazon-ec2 pyspark

【Solution 1】:

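I ran into a similar problem in a Jupyter Notebook running on AWS EMR while trying to save another binary file format (png) to S3. I solved it by using the s3fs library to connect to S3.

Using your example, it should look like this: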

import io

import matplotlib.pyplot as plt
import s3fs

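plt.plot([1, 2, 3])  # placeholder data; the original post wrote "some figure" here

# Render the figure into an in-memory buffer instead of a local file
img_data = io.BytesIO()
plt.savefig(img_data, format='pdf', bbox_inches='tight')
img_data.seek(0)

# anon=False makes s3fs use the default credentials available in the environment
s3 = s3fs.S3FileSystem(anon=False)
with s3.open('s3://bucketpath/foo.pdf', 'wb') as f:
    f.write(img_data.getbuffer())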

I noticed you are working in SageMaker JupyterLab, but judging from the s3fs documentation, I believe it will work there too.

My solution is based on the answer you mentioned in your question and the s3fs documentation.
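For comparison, the same in-memory approach can probably be done with boto3 instead of s3fs. The following is only a sketch under assumptions: the helper names figure_to_pdf_bytes and upload_figure, the bucket, and the key are hypothetical, and it assumes the instance has an IAM role (or other default credentials) that allows writing to the bucket, so no access keys appear in the code.

```python
import io


def figure_to_pdf_bytes(fig, **savefig_kwargs):
    """Render a figure-like object (anything with savefig) to an in-memory PDF buffer."""
    buf = io.BytesIO()
    fig.savefig(buf, format="pdf", **savefig_kwargs)
    buf.seek(0)  # rewind so the uploader reads from the start
    return buf


def upload_figure(fig, bucket, key, client=None):
    """Upload the rendered figure to s3://bucket/key and return that URI.

    If no client is passed, a boto3 S3 client is created; boto3 then looks up
    credentials through its default chain (for example an attached IAM role).
    """
    if client is None:
        import boto3  # imported lazily so the helper can be exercised without boto3
        client = boto3.client("s3")
    buf = figure_to_pdf_bytes(fig, bbox_inches="tight")
    client.upload_fileobj(buf, bucket, key)
    return f"s3://{bucket}/{key}"
```

With a matplotlib figure f, a call might look like upload_figure(f, "bucketpath", "foo.pdf"), where the bucket and key are placeholders. Passing the client as a parameter keeps the helper easy to test with a stand-in object.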

【Comments】: