Boto3 S3 resource gets stuck on the "Object.get" method

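【Question title】: Boto3 S3 resource gets stuck on the "Object.get" method
【Posted】: 2019-08-22 18:54:09
【Question】:

I am trying to fetch a pickle file from S3 with the boto3 "Object.get()" method, from several processes at the same time. This causes my program to hang in one of the processes: no exception is raised and execution never continues to the next line.

I tried adding a "Config" object to the S3 connection. That did not help.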

import os
import pickle
import boto3
from botocore.client import Config

s3_item = _get_s3_name(descriptor_key)  # Returns a path string of the desired file
config = Config(connect_timeout=5, retries={'max_attempts': 0})
s3 = boto3.resource('s3', config=config)
bucket_uri = os.environ.get(*ct.S3_MICRO_SERVICE_BUCKET_URI)  # Returns a string of the bucket URI
estimator_factory_logger.debug(f"Calling s3 with item {s3_item} from URI {bucket_uri}")
model_file_from_s3 = s3.Bucket(bucket_uri).Object(s3_item)
estimator_factory_logger.debug("Loading bytes...")
model_content = model_file_from_s3.get()['Body'].read()  # <- Program gets stuck here
estimator_factory_logger.debug("Loading from pickle...")
est = pickle.loads(model_content)

No error message is raised. The "get" method appears to be deadlocked.

Any help would be appreciated.

【Question discussion】:

Tags: python amazon-web-services amazon-s3 boto3

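【Solution 1】:

Is it possible that one of the files in the bucket is very large and the program simply takes a long time to read it?

If so, as a debugging step I would look at the model_file_from_s3.get()['Body'] object, which is a botocore.response.StreamingBody, and call set_socket_timeout() on it to try to force a timeout.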

https://botocore.amazonaws.com/v1/documentation/api/latest/reference/response.html
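
As a rough sketch of that debugging step, the helper below sets a socket timeout on the streaming body before reading it. The function name read_body_with_timeout and the 10-second default are assumptions made for illustration; only get(), set_socket_timeout() and read() come from boto3/botocore.

```python
def read_body_with_timeout(s3_object, timeout_seconds=10):
    """Read an S3 object's body, failing instead of hanging on a stalled socket.

    `s3_object` is expected to behave like a boto3 S3 Object resource,
    e.g. s3.Bucket(bucket_name).Object(key). The helper name and the
    default timeout are illustrative assumptions, not part of boto3.
    """
    # get() returns a dict whose 'Body' is a botocore StreamingBody.
    body = s3_object.get()["Body"]
    # Apply a timeout (in seconds) to the underlying socket so a stalled
    # read raises an error instead of blocking forever.
    body.set_socket_timeout(timeout_seconds)
    return body.read()
```

With the question's code this would be called as read_body_with_timeout(model_file_from_s3). If the read then fails with a timeout error rather than hanging, the stall is at the network level; if it still hangs, the cause is probably elsewhere, such as a lock inherited from the parent process.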

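【Discussion】:

【Solution 2】:

The problem was that we created a child process after the main process had already started several threads. Apparently this is a big no-no on Linux. We fixed it by using "spawn" instead of "fork".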
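
A minimal sketch of what switching to "spawn" might look like, assuming the workers are started through the standard multiprocessing module (the answer does not say how the processes were created). The names get_spawn_context, load_pickle_from_s3 and load_all are hypothetical. With "fork", the child copies the parent's memory but only the calling thread, so a lock held by another thread at fork time can stay locked forever in the child; "spawn" starts a fresh interpreter instead.

```python
import multiprocessing
import pickle


def get_spawn_context():
    # A context object scopes the start method to the pools/processes
    # created from it, without changing the global default.
    return multiprocessing.get_context("spawn")


def load_pickle_from_s3(bucket_name, key):
    # Hypothetical worker. boto3 is imported and the resource is created
    # inside the child process, so no connection or lock state is
    # inherited from the parent.
    import boto3

    s3 = boto3.resource("s3")
    body = s3.Bucket(bucket_name).Object(key).get()["Body"]
    return pickle.loads(body.read())


def load_all(bucket_name, keys, processes=2):
    # Run the worker in spawned (not forked) child processes.
    ctx = get_spawn_context()
    with ctx.Pool(processes=processes) as pool:
        return pool.starmap(load_pickle_from_s3, [(bucket_name, k) for k in keys])
```

Because "spawn" re-imports the main module in every child, load_all should be called from under an if __name__ == "__main__": guard, and the worker must be defined at module level so that it can be pickled.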

【Discussion】: