【Question title】: How to delete the first 4 rows from multiple Excel files using python pandas (DataFrames)
【Posted】: 2020-01-13 22:36:57
【Question description】:

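I am currently writing a program that combines multiple Excel spreadsheets.

I would like to know how to delete the first 4 rows from each spreadsheet before merging them. Below is the specific statement that tries to delete the first 4 rows, but it produces an error.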

frames[0:] = [df.drop(df.index[[0,3]]) for df in frames[0:]]

Below is the full program:

import tkinter as tk
from tkinter import filedialog
from pathlib import Path
import pandas as pd

root = tk.Tk()
root.withdraw()

files = filedialog.askopenfilenames()
print("--------------")
print(files)
ExcelFileNames = [Path(x).name for x in files]
print("--------------")
print(type(ExcelFileNames))
print("--------------")
print(ExcelFileNames)
print("--------------")
print (ExcelFileNames[0])
print("--------------")
print("Number of files is:", len(ExcelFileNames))

# read them in
excels = [pd.ExcelFile(name) for name in ExcelFileNames]

# turn them into dataframes
frames = [x.parse(x.sheet_names[0], header=None,index_col=None) for x in excels]


frames[0:] = [df.drop(df.index[[0,3]]) for df in frames[0:]]
# delete the first row for all frames except the first
# i.e. remove the header row -- assumes it's the first
frames[1:] = [df[1:] for df in frames[1:]]

# concatenate them..
combined = pd.concat(frames)

# write it out
combined.to_excel("DNcombined.xlsx", header=False, index=False)

【Comments】:

  • Why not df[3:]?

Tags: python pandas dataframe


【Solution 1】:

IIUC,

You can pass skiprows to parse so that those rows are skipped as each sheet is read while looping over your list.

# read them in
excels = [pd.ExcelFile(name) for name in ExcelFileNames]

# turn them into dataframes
frames = [x.parse(x.sheet_names[0], header=None,index_col=None, skiprows=4) for x in excels]

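【Comments】:

  • This works. I changed skiprows=14 to skiprows=4 to skip the first 4 rows. I will keep trying to get drop working across multiple dataframes, but I really appreciate your help!
  • @JhangirAwan You could use .iloc to filter them out, but it is hard to say without seeing what your sample data looks like. Sorry, I thought I read 14!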
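
For anyone who wants the drop / .iloc route mentioned above, here is a minimal sketch. It assumes the frames were parsed with header=None, so each has a default 0..n-1 index. The two small frames are made-up stand-ins for the real sheets. As far as I can tell, df.index[[0,3]] picks only the labels at positions 0 and 3, so it drops two rows rather than the first four.

```python
import pandas as pd

# Hypothetical stand-ins for the parsed sheets (6 rows each, no header row).
frames = [
    pd.DataFrame({0: [f"a{i}" for i in range(6)], 1: list(range(6))}),
    pd.DataFrame({0: [f"b{i}" for i in range(6)], 1: list(range(10, 16))}),
]

# The statement from the question: [[0, 3]] selects positions 0 and 3 only,
# so just those two rows are removed from each frame.
two_dropped = [df.drop(df.index[[0, 3]]) for df in frames]

# Positional slice: keep everything from the fifth row onward.
by_position = [df.iloc[4:] for df in frames]

# Same result with drop: df.index[:4] is the labels of the first four rows.
by_label = [df.drop(df.index[:4]) for df in frames]

# Concatenate as in the original program; ignore_index renumbers the rows.
combined = pd.concat(by_position, ignore_index=True)
print(combined)
```

If the rows never need to be loaded at all, skiprows=4 at parse time is still simpler. The slicing versions are mainly useful when the frames already exist in memory.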