【Posted】:2018-04-20 13:50:45
【Question】:
I have a CSV file (VV_AL_3T3_P3.csv) in which each row corresponds to a TIFF image of plankton. It looks like this:
Particle_ID  Diameter  Image_File                  Lenght  ....etc
1            15.36     VV_AL_3T3_P3_R3_000001.tif  18.09
2            17.39     VV_AL_3T3_P3_R3_000001.tif  19.86
3            17.21     VV_AL_3T3_P3_R3_000001.tif  21.77
4            9.42      VV_AL_3T3_P3_R3_000001.tif  9.83
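The images are all stored in one folder and, inside it, sorted into subfolders by shape. The name of each TIFF image is made up of Image_File + Particle ID; for example, for the first row: VV_AL_3T3_P3_R3_000001_1.tiff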
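To make the naming rule concrete, here is a minimal sketch that builds the expected image name from one CSV row. The helper name `expected_image_name` is made up for illustration, and it assumes the ".tif" extension in the CSV is simply swapped for "_<Particle ID>.tiff", as in the example above.

```python
import os


def expected_image_name(image_file, particle_id):
    """Build the per-particle TIFF name from the CSV's Image_File and Particle_ID values.

    Hypothetical helper: assumes the name is <Image_File without extension>_<Particle_ID>.tiff.
    """
    stem, _ext = os.path.splitext(image_file)  # drop the ".tif" extension
    return f"{stem}_{int(particle_id)}.tiff"


# First row of the table above
print(expected_image_name("VV_AL_3T3_P3_R3_000001.tif", 1))
```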
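Now I want to use Python to add a new column called "Class" to the CSV file I already have (VV_AL_3T3_P3.csv), containing the name of the folder (the class) in which each .tiff file is located; like this: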
Particle_ID  Diameter  Image_File                  Lenght  Class
1            15.36     VV_AL_3T3_P3_R3_000001.tif  18.09   Spherical
2            17.39     VV_AL_3T3_P3_R3_000001.tif  19.86   Elongated
3            17.21     VV_AL_3T3_P3_R3_000001.tif  21.77   Pennates
4            9.42      VV_AL_3T3_P3_R3_000001.tif  9.83    Others
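So far I have a list containing the name of the folder where each TIFF file is located. This is the list that will become the new column. But how can I make each folder name end up on the right row? In other words, how do I match the "Class" with the "Particle ID" and the "Image File"?
What I have so far: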
## Load modules:
import os
import pandas as pd
import numpy as np
import cv2
## Function to recursively list files in dir by extension
def file_match(path, extension):
    cfiles = []
    # Walk the directory that was passed in (was hard-coded to './')
    for root, dirs, files in os.walk(path):
        for file in files:
            if file.endswith(extension):
                cfiles.append(os.path.join(root, file))
    return cfiles
## Load all image files in all folders:
image_files = file_match(path='./',extension='.tiff')
## List of directories where each image was found:
img_dir = [os.path.dirname(one_img)[2:] for one_img in image_files]
len(img_dir)
## List of images:
# Image file column in csv files:
img_file = [os.path.basename(one_img)[:22] for one_img in image_files]
len(img_file)
# Particle id column in csv files:
part_id = [os.path.basename(one_img)[23:][:-5] for one_img in image_files]
len(part_id)
## I now have the collage picture name, the particle id and the classification folder for each image.
# Now I need to create a loop where this information is merged...
## Load csv file:
data = pd.read_csv('VV_AL_3T3.csv')
sample_file = data['Image File'] # Column name
sample_id = data['Particle ID'] # Particle ID
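The fixed slices above ([:22], [23:], [:-5]) only work while every file name has exactly the same length. As a rough alternative sketch, the path can be split on its last underscore instead. The function name `parse_image_path` is hypothetical, and it assumes the class is the immediate parent folder and that the CSV stores the collage name with a ".tif" extension.

```python
import os


def parse_image_path(path):
    """Split an image path into (class folder, CSV-style image file, particle id).

    Hypothetical helper; assumes names like <collage>_<particle id>.tiff.
    """
    folder = os.path.basename(os.path.dirname(path))       # immediate parent folder = class
    stem, _ext = os.path.splitext(os.path.basename(path))  # drop ".tiff"
    collage, particle = stem.rsplit("_", 1)                # split on the last underscore only
    return folder, collage + ".tif", int(particle)


print(parse_image_path("./Spherical/VV_AL_3T3_P3_R3_000001_1.tiff"))
```

Splitting on the last underscore keeps working when the particle id grows to two or more digits, which a fixed slice would not handle.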
I have seen a similar case here: Create new column in dataframe with match values from other dataframe
But I don't really understand how to use "map.set_index", and that question has two dataframes while I only have one.
【Discussion】:
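For reference, one way the "only one dataframe" problem might be approached is to turn the three lists into a second DataFrame and merge on both key columns. This is a minimal sketch with toy data standing in for the real CSV and folder walk. It assumes the CSV columns are named "Image File" and "Particle ID" as in the code above, that the sliced image names need ".tif" appended to match the CSV, and that the particle ids parsed from file names must be converted to integers.

```python
import pandas as pd

# Toy stand-in for pd.read_csv(...); values taken from the table in the question
data = pd.DataFrame({
    "Particle ID": [1, 2, 3, 4],
    "Diameter": [15.36, 17.39, 17.21, 9.42],
    "Image File": ["VV_AL_3T3_P3_R3_000001.tif"] * 4,
})

# Toy stand-ins for the lists built from the folder walk (in arbitrary order)
img_dir = ["Others", "Spherical", "Pennates", "Elongated"]
img_file = ["VV_AL_3T3_P3_R3_000001"] * 4
part_id = ["4", "1", "3", "2"]

# Second DataFrame: one row per image found on disk
classes = pd.DataFrame({
    "Image File": [name + ".tif" for name in img_file],  # assumed to match the CSV spelling
    "Particle ID": [int(p) for p in part_id],            # strings from file names -> integers
    "Class": img_dir,
})

# A left merge keeps every CSV row; rows without an image on disk get NaN in "Class"
merged = data.merge(classes, on=["Image File", "Particle ID"], how="left")
print(merged)
```

Merging on both columns matters because the same Particle ID can presumably appear under different Image File values; the result could then be written back with `merged.to_csv(...)`.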