【Question title】: Memory Error while trying to read csv on AWS
【Posted】: 2018-06-03 00:09:14
【Question description】:

When I run the following code, I get an error:

import os
import boto3
import pandas as pd
import sys

if sys.version_info[0] < 3: 
    from StringIO import StringIO # Python 2.x
else:
    from io import StringIO # Python 3.x

# get your credentials from environment variables
aws_id = 'XX'
aws_secret = 'YY'

client = boto3.client('s3', aws_access_key_id=aws_id,
        aws_secret_access_key=aws_secret)

bucket_name = 'arpbhatnagar'

object_key = 'application_train.csv'
csv_obj = client.get_object(Bucket=bucket_name, Key=object_key)
body = csv_obj['Body']
csv_string = body.read().decode('utf-8')

train = pd.read_csv(StringIO(csv_string))

This is the error I get:

MemoryError                               Traceback (most recent call last)
in <module>()
     21 csv_obj = client.get_object(Bucket=bucket_name, Key=object_key)
     22 body = csv_obj['Body']
---> 23 csv_string = body.read().decode('utf-8')
     24
     25 train = pd.read_csv(StringIO(csv_string), low_memory=True, engine='python')

/usr/lib/python2.7/encodings/utf_8.pyc in decode(input, errors)
     14
     15 def decode(input, errors='strict'):
---> 16     return codecs.utf_8_decode(input, errors, True)
     17
     18 class IncrementalEncoder(codecs.IncrementalEncoder):

MemoryError:

【Comments】:

  • How big is application_train.csv?

Tags: python amazon-ec2


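【Solution 1】:

It looks like you are running out of memory while downloading or decoding application_train.csv. The traceback points at body.read().decode('utf-8'): at that moment the raw bytes and the decoded copy are both in memory, before Pandas has parsed anything. To work around this, you can first download the file to disk and then pass the filename to Pandas: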

tmp_filename = "/tmp/application_train.csv"
client.download_file(bucket_name, object_key, tmp_filename)
training_set = pd.read_csv(tmp_filename)
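
If you already hold the get_object response, a similar effect can probably be had by copying the body to disk in fixed-size chunks instead of calling body.read() on the whole object. The following is a minimal sketch, assuming the body exposes read(size) like an ordinary file object (botocore's StreamingBody does). The helper name stream_to_file and the sample data are made up for illustration, and an in-memory buffer stands in for csv_obj['Body'] so the snippet runs without AWS credentials.

```python
import io
import os
import shutil
import tempfile


def stream_to_file(body, path, chunk_size=1024 * 1024):
    """Copy a file-like object to `path` in chunks of `chunk_size` bytes.

    Hypothetical helper: roughly one chunk is held in memory at a time,
    unlike body.read(), which loads the whole object at once.
    """
    with open(path, "wb") as out:
        shutil.copyfileobj(body, out, chunk_size)
    return path


# Stand-in for csv_obj['Body'] (assumption: any object with read(size) works).
sample_bytes = b"id,target\n1,0\n2,1\n"
fake_body = io.BytesIO(sample_bytes)

tmp_dir = tempfile.mkdtemp()
tmp_path = stream_to_file(fake_body, os.path.join(tmp_dir, "sample.csv"))

# Read the file back to confirm the copy is byte-for-byte identical.
with open(tmp_path, "rb") as f:
    copied = f.read()
```

With a real response this would be called as stream_to_file(csv_obj['Body'], tmp_filename), followed by pd.read_csv(tmp_filename). If the parsed DataFrame itself turns out to be too large for the instance, the chunksize parameter of pd.read_csv returns an iterator of smaller DataFrames instead of one big one.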
