【Question title】: In Python, I want to loop through multiple csv files and remove specific rows
【Posted】: 2021-09-23 22:19:08
【Question】:

I have 10 csv files, and in each file I want to remove the rows whose UID column contains one of the following numbers: 1002, 1007, 1008

Note that all 10 csv files have the same column names

# one of the csv files looks like this

import pandas as pd

df = { 
        'UID':[1001,1002,1003,1004,1005,1006,1007,1008,1009,1010],
        'Name':['Ray','James','Juelz','Cam','Jim','Jones','Bleek','Shawn','Beanie','Amil'],
        'Income':[100.22,199.10, 191.13,199.99,230.6,124.2,122.9,128.7,188.12,111.3],
        'Age':[24,32,27,54,23,41,44,29,30,68]
}
 
df = pd.DataFrame(df)
df = df[['UID','Name','Age','Income']]
df 



Attempt

#I know I need a for loop or glob to iterate through the folder and filter out the desired UIDs. My dilemma is I don't know how to incorporate steps II & III  in I

#Step I: looping through the .csv files in the folder

import os
directory = r'C:\Users\admin'
for filename in os.listdir(directory):
    if filename.endswith(".csv"):
        print(os.path.join(directory, filename))

# StepII: UID to be removed - 1002,1007,1008 

df2 = df[~(df.UID.isin([1002,1007,1008]))] 

# Step III: Export the new dataframes as .csv files (10 csv files)
df2.to_csv(r'mypath\data.csv')
  

Thanks

【Comments】:

    Tags: python pandas csv for-loop glob


    【Solution 1】:

    Try this:

    import os
    import pandas as pd

    directory = r'C:\Users\admin'
    for filename in os.listdir(directory):
        if filename.endswith(".csv"):
            filepath = os.path.join(directory, filename)
            df = pd.read_csv(filepath)
            # keep only the rows whose UID is not in the list
            df2 = df[~df['UID'].isin([1002,1007,1008])]
            # write next to the original as <name>_mod.csv
            root, ext = filepath.rsplit('.', maxsplit=1)
            # index=False keeps the row index out of the output file
            df2.to_csv(f'{root}_mod.{ext}', index=False)
    

    Note: @TimRoberts is right that pandas is somewhat overkill here, but if you are here to learn, this is one potential solution.
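
If the original files should be left untouched, the same loop can be wrapped in a function that writes into a separate folder. This is only a minimal sketch, assuming every file has a `UID` column; the function name `remove_uids` and its `out_dir` parameter are hypothetical and not part of the answer above.

```python
from pathlib import Path

import pandas as pd


def remove_uids(directory, uids, out_dir):
    """Filter every .csv in `directory`, writing the results into `out_dir`."""
    directory, out_dir = Path(directory), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # glob("*.csv") is not recursive; sorted() makes the order deterministic
    for path in sorted(directory.glob("*.csv")):
        df = pd.read_csv(path)
        # keep only the rows whose UID is NOT in the removal list
        kept = df[~df["UID"].isin(uids)]
        target = out_dir / path.name
        kept.to_csv(target, index=False)  # index=False avoids an extra unnamed column
        written.append(target)
    return written
```

It could be called as `remove_uids(r'C:\Users\admin', [1002, 1007, 1008], r'C:\Users\admin\fixed')`. Because the output keeps the original file names in a different folder, running it a second time does not pick up its own results.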

    【Discussion】:

      【Solution 2】:

      You don't need to write a program for this, and you certainly don't need pandas. If you have Linux tools:

      grep -v -e 1002, -e 1007, -e 1008, incoming.csv > fixed.csv
      

      Windows:

      findstr /v /c:1002, /c:1007, /c:1008, incoming.csv > fixed.csv
      

      So, in a batch file:

      cd C:\Users\admin
      mkdir fixed
      for %%i in (*.csv) do findstr /v /c:1002, /c:1007, /c:1008, %%i > fixed\%%i
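
The text-matching commands above drop any line that contains `1002,` anywhere in it, not only in the UID column. If the match has to be restricted to that column and pandas is still not wanted, Python's standard `csv` module is enough. A minimal sketch, assuming the header row names the column `UID`; the function name `filter_csv` is hypothetical.

```python
import csv


def filter_csv(src, dst, uids, column="UID"):
    """Copy src to dst, skipping rows whose `column` value is in `uids`."""
    unwanted = {str(u) for u in uids}  # csv reads every field as a string
    removed = 0
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row[column].strip() in unwanted:
                removed += 1  # count the row and do not write it
                continue
            writer.writerow(row)
    return removed
```

The return value is the number of rows that were dropped, which gives a quick check that each file was actually changed.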
      

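      【Discussion】:

      • Unfortunately, I don't have Linux tools.
      • That's why I gave you the Windows recipe.
      【Solution 3】:

      Sorry for my bad English

      Step II:

      If I haven't misunderstood, you want to remove the values [1002,1007,1008] from the list [1001,1002,1003,1004,1005,1006,1007,1008,1009,1010] in the df dictionary. That's simple; you can iterate over the dict's keys like this: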

      values = [1002,1007,1008] 
      
      for key in df.keys():
          ...  # the loop body is filled in in the next snippet
      

      Then check whether the list stored under that key contains any of the values to remove:

      values = [1002,1007,1008] 
      for key in df.keys():
          for value in values:
              if value in df[key]:
                  df[key].remove(value)
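
Note that `list.remove` only changes the list in which the value is found, so after this loop the `UID` list is shorter than the `Name`, `Income` and `Age` lists. To drop whole records from a dict of column lists, one option is to work out which row positions to keep and rebuild every column from them. A minimal sketch on a shortened version of the question's data; the helper name `drop_rows` is hypothetical.

```python
def drop_rows(columns, key, values):
    """Return a new dict of lists without the rows where columns[key] is in values."""
    unwanted = set(values)
    # positions of the rows that survive the filter
    keep = [i for i, v in enumerate(columns[key]) if v not in unwanted]
    # rebuild every column from the same positions so rows stay aligned
    return {name: [col[i] for i in keep] for name, col in columns.items()}


data = {
    'UID': [1001, 1002, 1003, 1007],
    'Name': ['Ray', 'James', 'Juelz', 'Bleek'],
    'Age': [24, 32, 27, 44],
}
filtered = drop_rows(data, 'UID', [1002, 1007, 1008])
print(filtered)
# {'UID': [1001, 1003], 'Name': ['Ray', 'Juelz'], 'Age': [24, 27]}
```

The input dict is left unchanged, so the original data is still available for comparison.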
      

      Step III

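      import csv
      
      # newline='' stops the csv module from writing blank lines on Windows
      with open('my_file.csv', mode='w', newline='') as file:
          file_writer = csv.writer(file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
      
          file_writer.writerow(df.keys())           # header: the column names
          file_writer.writerows(zip(*df.values()))  # one row per record, pairing the i-th entry of every column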
          
      

      【Discussion】:
