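【Posted】:2021-09-23 22:19:08
【Question】:
I have 10 csv files, and in each of them I want to delete the rows whose UID column contains one of the following numbers: 1002, 1007, 1008.
Note that all 10 csv files have the same column names.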
# one of the csv files looks like this
import pandas as pd
df = {
    'UID': [1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010],
    'Name': ['Ray', 'James', 'Juelz', 'Cam', 'Jim', 'Jones', 'Bleek', 'Shawn', 'Beanie', 'Amil'],
    'Income': [100.22, 199.10, 191.13, 199.99, 230.6, 124.2, 122.9, 128.7, 188.12, 111.3],
    'Age': [24, 32, 27, 54, 23, 41, 44, 29, 30, 68]
}
df = pd.DataFrame(df)
df = df[['UID', 'Name', 'Age', 'Income']]
df
What I tried
# I know I need a for loop or glob to iterate over the folder and filter out the unwanted UIDs.
# My problem is that I don't know how to incorporate Steps II and III into Step I.
# Step I: loop through the .csv files in the folder
import os
directory = r'C:\Users\admin'
for filename in os.listdir(directory):
    if filename.endswith(".csv"):
        print(os.path.join(directory, filename))
# Step II: UIDs to be removed - 1002, 1007, 1008
df2 = df[~(df.UID.isin([1002, 1007, 1008]))]
# Step III: export the new dataframes as .csv files (10 csv files)
df2.to_csv(r'mypath\data.csv')
Thanks
【Discussion】:
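One possible way to fold Steps II and III into the loop from Step I is to read, filter, and export each file inside the same loop iteration. The following is only a minimal sketch under stated assumptions: the helper name `drop_uids`, its parameters, and the idea of writing to a separate output folder are hypothetical choices made for illustration, not something from the original post. It also assumes every file has a `UID` column that pandas parses as integers.

```python
from pathlib import Path

import pandas as pd


def drop_uids(src_dir, dst_dir, uids, column="UID"):
    """Filter every .csv file in src_dir and save it to dst_dir under the same name.

    Hypothetical helper: rows whose `column` value is in `uids` are dropped.
    Returns the list of paths that were written.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)  # create the output folder if needed

    written = []
    # Step I: iterate over the .csv files (non-recursive, sorted for a stable order)
    for csv_path in sorted(src_dir.glob("*.csv")):
        df = pd.read_csv(csv_path)
        # Step II: keep only the rows whose UID is NOT in the removal list
        kept = df[~df[column].isin(uids)]
        # Step III: export under the original file name, without the index column
        out_path = dst_dir / csv_path.name
        kept.to_csv(out_path, index=False)
        written.append(out_path)
    return written


# Example call (the output folder name is an assumption):
# drop_uids(r'C:\Users\admin', r'C:\Users\admin\filtered', [1002, 1007, 1008])
```

Writing to a separate output folder leaves the original files untouched, and `index=False` keeps pandas from adding an extra index column to each exported file.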
Tags: python pandas csv for-loop glob