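【Posted】: 2021-07-20 06:03:15
【Question】:
I am trying to fetch a file from S3 and read it into Python. The object comes back as a botocore.response.StreamingBody. Normally this can be read with the StreamingBody.read() method, but when I call read() it throws OverflowError: Python int too large to convert to C long.
All the other solutions available on the internet suggest converting the int to int64 or float64. But because of this error I cannot even call .read() in the first place. We even tried pickling the CSV and sending that, but it did not work either.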
import io

import boto3
import pandas as pd


def get_cx_data():
    """Get CX data.

    Returns:
        Pandas DataFrame: CX index DataFrame
    """
    client = boto3.client('s3',
                          aws_access_key_id='key_id_here',
                          aws_secret_access_key='secret_key_here',
                          region_name='us-east-2')
    obj = client.get_object(
        Bucket='bucket name',
        Key='key_here')
    print(type(obj))
    print(obj['Body'])
    file_ = obj['Body'].read()  # throws OverflowError
    with open('training_data.csv', 'wb') as file:  # read() returns bytes, so open in binary mode
        file.write(obj['Body'].read())  # throws OverflowError
    # combine_inde_dep_vars_featools.pkl
    # Read data from the S3 object
    # data = pandas.read_csv(obj['Body'])
    # df_cx_index = pd.read_pickle("combine_inde_dep_vars_featools.pkl")
    df_cx_index = pd.read_csv(io.BytesIO(obj['Body'].read()))  # throws OverflowError
    print(df_cx_index.head())
    return df_cx_index
The traceback is shown below:
<class 'dict'>
<botocore.response.StreamingBody object at 0x0000027EB0533A60>
Traceback (most recent call last):
  File "C:/my_folder/git repos/collections_completed_checklist_items/save_csv.py", line 22, in <module>
    get_cx_data()
  File "C:/my_folder/git repos/collections_completed_checklist_items/save_csv.py", line 18, in get_cx_data
    file_ = obj['Body'].read()
  File "C:\CX_codes\environments\collections_completed_checklist_items\lib\site-packages\botocore\response.py", line 77, in read
    chunk = self._raw_stream.read(amt)
  File "C:\CX_codes\environments\collections_completed_checklist_items\lib\site-packages\urllib3\response.py", line 515, in read
    data = self._fp.read() if not fp_closed else b""
  File "C:\Users\a.mundachal\AppData\Local\Programs\Python\Python38\lib\http\client.py", line 468, in read
    s = self._safe_read(self.length)
  File "C:\Users\a.mundachal\AppData\Local\Programs\Python\Python38\lib\http\client.py", line 609, in _safe_read
    data = self.fp.read(amt)
  File "C:\Users\a.mundachal\AppData\Local\Programs\Python\Python38\lib\socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "C:\Users\a.mundachal\AppData\Local\Programs\Python\Python38\lib\ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "C:\Users\a.mundachal\AppData\Local\Programs\Python\Python38\lib\ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
OverflowError: Python int too large to convert to C long
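One possible reading of this traceback (an assumption, since the post does not state the object's size) is that the whole content length is passed down to the SSL layer as a single read size, and that this number does not fit in a C long. On Windows builds of Python a C long is normally 32 bits, so any length above 2**31 - 1 bytes (about 2 GiB) would overflow. The sketch below only checks the width of a C long on the current interpreter; the helper name fits_in_c_long is hypothetical and used for illustration.

```python
import ctypes

# Width of a C long on this interpreter/platform, in bits.
c_long_bits = ctypes.sizeof(ctypes.c_long) * 8
# Largest value a signed C long of that width can hold.
c_long_max = 2 ** (c_long_bits - 1) - 1


def fits_in_c_long(n):
    """Return True if the integer n fits in a signed C long on this platform."""
    return -c_long_max - 1 <= n <= c_long_max


print("C long is", c_long_bits, "bits; max value", c_long_max)
# A hypothetical 3 GiB content length, used only as an example value.
print("3 GiB fits in a C long:", fits_in_c_long(3 * 1024 ** 3))
```

If the first line reports 32 bits and the S3 object is larger than about 2 GiB, a single unbounded read would be consistent with the error above.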
Is there another way to read a botocore.response.StreamingBody object, or save it as a CSV, without using .read()? Or is there a workaround that lets me use .read() without getting the OverflowError?
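One workaround that may be worth trying is to avoid a single unbounded read and pull the body in bounded pieces instead, since StreamingBody.read() accepts an optional byte count. The following is a minimal sketch, not a confirmed fix for this case: the helper name copy_stream_in_chunks and the 8 MiB chunk size are assumptions made for illustration, and an io.BytesIO object stands in for obj['Body'] so the example runs without AWS credentials.

```python
import io

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB per read; an arbitrary, assumed value


def copy_stream_in_chunks(body, destination, chunk_size=CHUNK_SIZE):
    """Copy a readable binary stream to a writable one using bounded reads.

    `body` only needs a read(n) method, which StreamingBody provides.
    Returns the total number of bytes copied.
    """
    total = 0
    while True:
        chunk = body.read(chunk_size)  # never asks for more than chunk_size bytes
        if not chunk:  # an empty bytes object signals end of stream
            break
        destination.write(chunk)
        total += len(chunk)
    return total


# Stand-in for obj['Body']; with boto3 one would pass the real StreamingBody
# and a file opened with open('training_data.csv', 'wb') as the destination.
fake_body = io.BytesIO(b"a,b\n1,2\n3,4\n")
buffer = io.BytesIO()
copied = copy_stream_in_chunks(fake_body, buffer, chunk_size=4)
buffer.seek(0)  # rewind so a reader such as pd.read_csv could consume it
print(copied, "bytes copied")
```

Once the data is on disk, pd.read_csv('training_data.csv') can read it from the file path. boto3 clients also provide download_file(Bucket, Key, Filename), which writes the object to a local file and may be a simpler route if it behaves correctly in the same environment.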
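【Comments】:
Is there a reproducible sample of the data in training_data.csv? What is in the files that cause the problem?
The CSV mostly contains numeric values, plus some categorical values, used to train a machine learning model. It has 228,945 rows and more than 60 columns.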
Tags: python amazon-s3 boto3 buffer-overflow integer-overflow