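【Posted】: 2022-01-14 17:55:38
【Question】:
I have an existing loop that iterates over a large number of file paths and eventually sends each file through a cloud processing pipeline. I need to update the loop so that it matches each file name against a dataframe column (fileName), then takes the associated value from a second column (date) and stores it as a variable inside the loop.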
import pandas as pd

# dataframe that I need to extract 'date' from
df = pd.DataFrame({'id': ['dat1', 'dat2', 'dat3'],
                   'date': [2019, 2021, 2015],
                   'fileName': ['dat1.file', 'dat2.file', 'dat3.file']})
# list of file paths that I need the fileName from to match with my dataframe
gs_files = ['path/dat1.file', 'path/dat2.file']
bucket = 'path/'
for f in gs_files:
    # get file path
    print('Path: ', f)
    # get file name (need to keep this for later processing steps)
    fbname = f.replace(bucket, '')
    print('Image name: ', fbname)
    # match fbname with df['fileName']. Store associated 'date' as a separate variable (not as a column in df)
    if fbname in df['fileName']:
        year = df['date']
        print('Collection date: ', year)
    # Extra processing steps will be executed below.
# Resulting output from the above code:
Path: path/dat1.file
Image name: dat1.file
Path: path/dat2.file
Image name: dat2.file
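A likely reason the 'Collection date' line never prints: the `in` operator on a pandas Series tests the index labels (0, 1, 2 here), not the values. The short check below illustrates this with the same dataframe; the variable names `in_index`, `label_in_index` and `in_values` are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'id': ['dat1', 'dat2', 'dat3'],
                   'date': [2019, 2021, 2015],
                   'fileName': ['dat1.file', 'dat2.file', 'dat3.file']})

# `in` on a Series looks at the index labels, so a file name is never found.
in_index = 'dat1.file' in df['fileName']
# The integer label 0 is part of the default index, so this one is found.
label_in_index = 0 in df['fileName']
# Testing against the underlying values gives the membership check intended.
in_values = 'dat1.file' in df['fileName'].values

print(in_index, label_in_index, in_values)  # False True True
```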
# Desired output:
Path: path/dat1.file
Image name: dat1.file
Collection date: 2019
Path: path/dat2.file
Image name: dat2.file
Collection date: 2021
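One possible way to get the desired output is sketched below. It assumes every value in fileName is unique and builds a plain dict from fileName to date once, before the loop, so each iteration is a cheap dictionary lookup. The names `date_by_file` and `years` are hypothetical and only used for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'id': ['dat1', 'dat2', 'dat3'],
                   'date': [2019, 2021, 2015],
                   'fileName': ['dat1.file', 'dat2.file', 'dat3.file']})
gs_files = ['path/dat1.file', 'path/dat2.file']
bucket = 'path/'

# Build the fileName -> date lookup once (assumes fileName values are unique;
# with duplicates, the last row for a given name would win).
date_by_file = df.set_index('fileName')['date'].to_dict()

years = []  # collected only so the result can be inspected after the loop
for f in gs_files:
    print('Path: ', f)
    # Keep the bare file name for later processing steps.
    fbname = f.replace(bucket, '')
    print('Image name: ', fbname)
    # .get returns None when the file name has no matching row.
    year = date_by_file.get(fbname)
    if year is not None:
        print('Collection date: ', year)
        years.append(year)
    # Extra processing steps would go here.
```

If a dict is not wanted, a boolean mask such as `df.loc[df['fileName'] == fbname, 'date']` selects the same value as a Series, at the cost of scanning the column on every iteration.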
【Discussion】:
Tags: python pandas dataframe loops