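【Question Title】: Concat a single Sheet from Multiple Excel Files whilst handling files with missing sheets
【Posted】: 2020-08-04 21:54:50
【Question】:

Good evening, Stack!

I'm having an issue with the pandas library in Python. I'm trying to automate a bulk append/concat over multiple Excel files (each with multiple sheets). However, I can't figure out how to simply skip the files that do not contain the specified sheet_name. Any ideas? My code is below:

PS1: Because the code was reading every sheet in every xlsx file, I had to insert a break to end the iteration.

PS2: The error is: "XLRDError: No sheet named ".

PS3: I figured out a way: putting a try after the for loop and an except for the error. But for that, I need the rest of the code to work.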

    import pandas as pd
    import os

    path = r'C:/Users/Thiago/Desktop/Backup/Python/Files test append xlsx'
    files = os.listdir(path)

    df = pd.DataFrame()
    xlsx_files = [path + '\\' + f for f in files if f[-4:] == 'xlsx']

    for i in xlsx_files:
        xlsx = pd.ExcelFile(i)
        for name in xlsx.sheet_names:
            data = pd.read_excel(i, header = 1, sheet_name = "CSNSC 2020")
            data['File'] = i
            print(i)
            df = df.append(data)
            break

    df = df[['Dt. Ref.','Convênio','Tipo de Atendimento','Venc.']]
    df.head()

    df = df.dropna(subset=['Convênio'])
    df.head()

    df.to_excel(r'C:/Users/Thiago/Desktop/Backup/Python/Files test append xlsx/out.xlsx')

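Thanks!!

【Question Discussion】:

    Tags: python pandas


    【Solution 1】:

    I wrote this simple function to concatenate Excel files and handle the files that are missing the sheet. Feel free to adapt it to your own use case.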

    The main thing to note is the try/except used to handle the errors.
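Alongside the error handling, the concatenation step decides whether the sheets end up stacked on top of each other or placed side by side. The following is a minimal in-memory sketch: the two frames and their column names are made up for illustration and simply stand in for the same sheet read from two different files.

```python
import pandas as pd

# Two made-up frames standing in for the same sheet read from two files.
first = pd.DataFrame({"id": [1, 2], "value": ["a", "b"]})
second = pd.DataFrame({"id": [3], "value": ["c"]})

# axis=0 (the default) appends rows; ignore_index renumbers them 0..n-1.
stacked = pd.concat([first, second], axis=0, ignore_index=True)

# axis=1 aligns on the index and places the frames side by side instead.
side_by_side = pd.concat([first, second], axis=1)

print(stacked.shape)       # (3, 2)
print(side_by_side.shape)  # (2, 4)
```

For an append across files, the row-wise form is usually the one that is wanted.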

    Modules.

    import pandas as pd
    from pathlib import Path
    from xlrd import XLRDError
    

    In action

    concat_excels(src,'Sheet2',trg)
    No sheet named <'Sheet2'> in file_0.xlsx, skipping
    No sheet named <'Sheet2'> in file_1.xlsx, skipping
    No sheet named <'Sheet2'> in file_2.xlsx, skipping
    No sheet named <'Sheet2'> in file_3.xlsx, skipping
    No sheet named <'Sheet2'> in file_4.xlsx, skipping
    No sheet named <'Sheet2'> in file_5.xlsx, skipping
    No sheet named <'Sheet2'> in file_6.xlsx, skipping
    No sheet named <'Sheet2'> in file_7.xlsx, skipping
    No sheet named <'Sheet2'> in file_8.xlsx, skipping
    No sheet named <'Sheet2'> in file_9.xlsx, skipping
    File Saved to C:\Users\DataNovice\OneDrive\Documents\2020\python\file_io_ops\move_files_test
    

    Function.

    def concat_excels(source_path, sheet_name, target_path):
    
        """
        A simple script to find excel files in a source
        location and merge them into a single file.
        You need Python installed along with Pandas.
        pathlib is available in Python 3.4 +
        error handling added.
        """
    
        # create list for excel files.
        excel_files = list(Path(source_path).glob("*.xlsx"))
    
        # create empty list to store each individual dataframe.
        excel_dataframe = []
    
        # loop through our files to read each one and append it to our list.
    
        for file in excel_files:
            try:
                df = pd.read_excel(file, sheet_name=sheet_name)
                # lowercase all columns and remove any trailing or leading white space.
                df.columns = df.columns.str.lower().str.strip()
                excel_dataframe.append(df)
            # a missing sheet raises XLRDError with xlrd; some pandas
            # versions/engines raise ValueError instead.
            except (XLRDError, ValueError) as err:
                print(f"{err} in {file.name}, skipping")
    
        try:
            # axis=0 stacks the sheets row-wise (an append); axis=1 would place them side by side.
            final_dataframe = pd.concat(excel_dataframe, axis=0, ignore_index=True)
            final_dataframe.to_excel(Path(target_path) / "master_file.xlsx", index=False)
    
            print(f"File Saved to {target_path}")
    
        except ValueError:
            # pd.concat raises ValueError when the list of dataframes is empty.
            print(
                f"No Sheets Matched in any of your excel files, are you sure {sheet_name} is correct?"
            )
        return excel_dataframe
    
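An alternative to catching the exception is to look at each workbook's sheet names first and only parse the sheet when it is present. That sidesteps the question of which exception type a given pandas version or reader engine raises for a missing sheet. The sketch below is not the answer's method: `concat_matching_sheets` is a hypothetical helper name, and the `File` column mirrors the one in the question's code.

```python
from pathlib import Path

import pandas as pd


def concat_matching_sheets(source_path, sheet_name, **read_kwargs):
    """Stack one named sheet from every .xlsx file that contains it.

    Hypothetical helper for illustration; files without the sheet are skipped.
    Extra keyword arguments (e.g. header=1) are passed on to ExcelFile.parse.
    """
    frames = []
    for file in sorted(Path(source_path).glob("*.xlsx")):
        # ExcelFile exposes the sheet names before any sheet is parsed
        # into a DataFrame, so the check needs no exception handling.
        with pd.ExcelFile(file) as workbook:
            if sheet_name not in workbook.sheet_names:
                print(f"No sheet named {sheet_name!r} in {file.name}, skipping")
                continue
            data = workbook.parse(sheet_name, **read_kwargs)
        data["File"] = file.name  # remember which file each row came from
        frames.append(data)

    if not frames:
        # pd.concat raises ValueError on an empty list, so return early.
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

For the question's files, a call might look like `concat_matching_sheets(path, "CSNSC 2020", header=1)`, assuming an Excel reader engine such as openpyxl is installed.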

    【Comments】:

    • Thanks a lot, Datanovice! I edited my post to say that I had already come up with a solution like this: using try and except. =) You nailed it!