【Question title】: How to load only the most recent file from a directory where the filenames start with the date?
【Posted】: 2022-12-01 18:41:31
【Question description】:

I have files in one directory/folder named:

  1. 2022-07-31_DATA_GVAX_ARPA_COMBINED.csv
  2. 2022-08-31_DATA_GVAX_ARPA_COMBINED.csv
  3. 2022-09-30_DATA_GVAX_ARPA_COMBINED.csv

    The folder will be updated with each month's file in the same format as above, e.g.:

    • 2022-10-31_DATA_GVAX_ARPA_COMBINED.csv
    • 2022-11-30_DATA_GVAX_ARPA_COMBINED.csv

    I want to load only the most recent month's .csv into a pandas DataFrame, not all the files. How can I do this (maybe using glob)?

    I have seen this done for filenames with a fixed prefix, using:

    from pathlib import Path

    dir_files = r'/path/to/folder'
    
    dico={}
    
    for file in Path(dir_files).glob('DATA_GVAX_COMBINED_*.csv'):
        dico[file.stem.split('_')[-1]] = file
    
    max_date = max(dico) 
    

【Comments on the question】:

  • With that file naming convention you only need a list of all files in the directory which you can then naturally sort. Are there any other files in the directory apart from ones with this naming structure?
  • Yes, there will be other files with different naming conventions, @Cobra.
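
The point made in the first comment can be checked quickly: because each name starts with a zero-padded YYYY-MM-DD date, plain string ordering is also chronological ordering, and a pattern filter keeps unrelated files out. A minimal sketch on an in-memory list of names; `notes.txt` and `2021_summary.csv` are made-up examples of "other" files, not names from the question.

```python
from fnmatch import fnmatch

# Names as they might appear in the folder (the two unrelated ones are hypothetical).
names = [
    "2022-09-30_DATA_GVAX_ARPA_COMBINED.csv",
    "notes.txt",
    "2022-11-30_DATA_GVAX_ARPA_COMBINED.csv",
    "2022-07-31_DATA_GVAX_ARPA_COMBINED.csv",
    "2021_summary.csv",
]

PATTERN = "*_DATA_GVAX_ARPA_COMBINED.csv"

# Keep only the files that follow the naming convention.
candidates = [name for name in names if fnmatch(name, PATTERN)]

# ISO dates compare correctly as strings, so max() picks the newest file.
latest = max(candidates)
print(latest)
```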

Tags: python pandas dataframe csv glob


【Solution 1】:

You could try something like this:


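import pandas as pd
from pathlib import Path


# Point at the folder itself; the glob pattern below does the matching.
dir_files = r'/path/to/folder'

dico = {}

for file in Path(dir_files).glob('*DATA_GVAX_ARPA_COMBINED*.csv'):
    # The date is the part of the filename before the first underscore.
    date_value = pd.to_datetime(file.name.split('_')[0], errors="coerce")
    if pd.notna(date_value):  # skip names that do not start with a date
        dico[date_value] = file

max_date = max(dico)  # raises ValueError if no file matched
filepath = dico[max_date]
print(f'{max_date} -> {filepath}')
# Example output when 2022-10-31 is the newest file:
#
# 2022-10-31 00:00:00 -> /path/to/folder/2022-10-31_DATA_GVAX_ARPA_COMBINED.csv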
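
The snippet above stops once the path is found, while the question asks for the file to end up in a DataFrame. One way to wrap both steps is sketched below. The helper names `newest_dated_file` and `load_latest` are hypothetical, and the date prefix is parsed with the standard library's `datetime.strptime` instead of `pd.to_datetime`, which is a swapped-in choice and not part of the original answer.

```python
from datetime import datetime
from pathlib import Path


def newest_dated_file(folder, pattern="*_DATA_GVAX_ARPA_COMBINED.csv"):
    """Return the matching file with the latest YYYY-MM-DD prefix, or None."""
    dated = {}
    for path in Path(folder).glob(pattern):
        prefix = path.name.split("_")[0]
        try:
            dated[datetime.strptime(prefix, "%Y-%m-%d")] = path
        except ValueError:
            continue  # name does not start with a valid date; skip it
    return dated[max(dated)] if dated else None


def load_latest(folder):
    """Read the newest dated CSV in `folder` into a pandas DataFrame."""
    import pandas as pd  # local import: only this step needs pandas

    path = newest_dated_file(folder)
    if path is None:
        raise FileNotFoundError(f"no dated CSV found in {folder}")
    return pd.read_csv(path)
```

Calling `load_latest(r'/path/to/folder')` would then return the DataFrame for the most recent month, assuming the folder contains at least one file that follows the naming convention.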

【Comments】:

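    【Solution 2】:

    Glob the directory with a pattern that matches only the files of interest, then sort on the basename. Because each name starts with a zero-padded YYYY-MM-DD date, plain string order is also chronological order, so the last name is the most recent file.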

    from glob import glob
    from os.path import basename, join

    def get_latest(directory):
        # Match only the dated files of interest; other files are ignored.
        all_files = glob(join(directory, '*_DATA_GVAX_ARPA_COMBINED.csv'))
        # The largest basename is the newest file; None if nothing matched.
        return max(all_files, key=basename, default=None)

    print(get_latest('/Users/Cobra'))
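
Since the asker mentions that other files with different naming conventions share the folder, a stricter filter than a `*` wildcard may be worth considering: a name such as `old_DATA_GVAX_ARPA_COMBINED.csv` would match the glob and sort after every dated name. The sketch below uses a regular expression that requires a date-shaped prefix; `get_latest_strict` and `DATED_NAME` are hypothetical names, and the approach is an alternative, not part of the original answer.

```python
import re
from pathlib import Path

# Requires exactly a YYYY-MM-DD-shaped prefix followed by the fixed suffix.
DATED_NAME = re.compile(r"\d{4}-\d{2}-\d{2}_DATA_GVAX_ARPA_COMBINED\.csv")


def get_latest_strict(directory):
    """Return the Path of the newest dated file, or None if there is none."""
    matches = [
        path
        for path in Path(directory).iterdir()
        if path.is_file() and DATED_NAME.fullmatch(path.name)
    ]
    # Compare by filename only, so the directory part never affects ordering.
    return max(matches, key=lambda path: path.name, default=None)
```

The regular expression only checks the shape of the prefix (digits and hyphens), not whether it is a real calendar date, which is usually enough when the files are produced by one process.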
    

    【Comments】:
