How to handle memory errors in large Excel files: answers

【Question title】: How to handle memory error in large Excel files
【Posted】: 2020-08-10 19:40:17
【Question description】:

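My task is to read this file and create a pivot table. When I read the file and select by column name, I get a memory error. Is there a way to handle it?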

    import os
    import xlrd
    import pandas as pd
    import openpyxl as xl
    ##get current directory
    cwd= os.getcwd()
    print("This script is to update")
    print("Starting to populate the Data Validation File")
    filename_1 =[os.path.join(root, f) for root, _, files in os.walk(os.getcwd())
                           for f in files
                           if f.startswith('1') and f.endswith('.xlsx')]
    filename_1=filename_1[0]

    book_1 = xl.load_workbook(filename_1)
    ws = book_1["J"]
    print("Loaded the filename:")
    for filename in os.listdir(cwd):
        if filename.endswith('.xlsx') and filename.startswith(('Seeep_')):
            book = pd.ExcelFile(filename)
            for sheet in book.sheet_names:
                # I get the memory error on the next line; the file is really big. How can I handle this?
                df = book.parse(sheet)
                if 'comp' in df.columns:
                    df.columns = df.columns.str.replace(' ', '')
                    df['comp1']=df['com']
                    print("For "+filename+"\n"+" Creating the Pivot Table")
                    book2=df[(df.Iy!='SE')].pivot_table(values='comp',index=['os','sv'], columns='comp', aggfunc='count',margins=True)
                    print(book2)
                    book3=df.pivot_table(values='compl',index=['Ring','os','sv'], columns='comp', aggfunc='count',margins=True)

【Comments】:

Any help on this? I'm still struggling.

Tags: python-3.x excel pandas openpyxl

【Solution 1】:

I suggest:

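Convert the Excel sheets to separate CSV files. For example, use openpyxl's read-only mode (https://openpyxl.readthedocs.io/en/stable/optimized.html#read-only-mode) to iterate over the rows, and write them out with Python's CSV writer API (https://docs.python.org/3/library/csv.html#csv.writer).
Then load the CSV with Pandas, making sure to load only the columns you actually need and to set column types that use less memory. https://pythonspeed.com/articles/pandas-load-less-data/ covers how to do this.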
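
The two steps above could be sketched roughly as follows. This is only a minimal illustration, not the answerer's own code: the helper names `sheet_to_csv` and `load_needed_columns` are hypothetical, and it assumes the first row of each sheet holds the column names.

```python
import csv

import pandas as pd
from openpyxl import load_workbook


def sheet_to_csv(xlsx_path, sheet_name, csv_path):
    """Stream one worksheet to a CSV file, one row at a time."""
    # read_only=True makes openpyxl read rows lazily instead of
    # building the whole sheet in memory.
    wb = load_workbook(xlsx_path, read_only=True)
    try:
        ws = wb[sheet_name]
        with open(csv_path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            # values_only=True yields plain tuples of cell values.
            for row in ws.iter_rows(values_only=True):
                writer.writerow(row)
    finally:
        # Read-only workbooks keep the file handle open until closed.
        wb.close()


def load_needed_columns(csv_path, columns, dtypes=None):
    """Load only the listed columns, optionally with compact dtypes."""
    return pd.read_csv(csv_path, usecols=columns, dtype=dtypes)
```

For the question's data, one might call `sheet_to_csv(filename, sheet, "sheet.csv")` and then `load_needed_columns("sheet.csv", ["comp", "os", "sv"], {"comp": "category", "os": "category", "sv": "category"})`. The `category` dtype is a guess that these columns hold a small number of repeated labels; it only saves memory when that is true.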

【Discussion】:

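No need to go via CSV: create the dataframe directly in read-only mode.
Creating the dataframe directly may still use too much memory. But yes, that could be an option.
There is really no need to go via CSV here; it doesn't let you do anything that read-only mode can't.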
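
The alternative raised in this discussion, building the dataframe straight from openpyxl's read-only mode, might look something like the sketch below. The function name `read_sheet_columns` is hypothetical, and the code assumes the first row of the sheet is the header. As the second comment points out, the selected cells are still all held in memory, so this helps mainly when only a few columns are needed.

```python
import pandas as pd
from openpyxl import load_workbook


def read_sheet_columns(xlsx_path, sheet_name, wanted):
    """Build a DataFrame from one sheet, keeping only the wanted columns."""
    wb = load_workbook(xlsx_path, read_only=True)
    try:
        rows = wb[sheet_name].iter_rows(values_only=True)
        header = next(rows, None)
        if header is None:
            # Empty sheet: nothing to read.
            return pd.DataFrame(columns=list(wanted))
        # Positions of the wanted columns in the header row.
        idx = [i for i, name in enumerate(header) if name in wanted]
        names = [header[i] for i in idx]
        # Keep only the selected cells; pad short rows with None.
        data = [
            [row[i] if i < len(row) else None for i in idx]
            for row in rows
        ]
    finally:
        wb.close()
    return pd.DataFrame(data, columns=names)
```

Columns come back in sheet order, regardless of the order in `wanted`.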