Posted: 2020-06-10 08:34:35
Question:
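I'm assembling a dataset in Python. After a series of shell commands that download the archive, extract it, and delete the tar file, I use the following code to organize the images: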
```
!rm -rf sample_data  # This is just a default directory in colab that includes some starter datasets. I remove it to maintain disk space.
!rm -rf food-101
!wget https://s3.amazonaws.com/fast-ai-imageclas/food-101.tgz
!tar -xvf food-101.tgz
!rm -rf food-101.tgz food-101/h5/
!mkdir food-101/meta food-101/images/train food-101/images/test
!mv food-101/*.txt food-101/*.json food-101/meta/

import os
from pathlib import Path

# Moves all test images to the food-101/images/test directory and renames them
with open('food-101/meta/test.txt') as test_file:
    for line in test_file:
        name_of_folder = line.split('/')[0]
        name_of_file = line.split('/')[1].rstrip()
        Path('food-101/images/' + name_of_folder + '/' + name_of_file + '.jpg').rename('food-101/images/test/' + name_of_folder + '_' + name_of_file + '.jpg')

# Moves all training images to the food-101/images/train directory and renames them
with open('food-101/meta/train.txt') as train_file:
    for line in train_file:
        name_of_folder = line.split('/')[0]
        name_of_file = line.split('/')[1].rstrip()
        Path('food-101/images/' + name_of_folder + '/' + name_of_file + '.jpg').rename('food-101/images/train/' + name_of_folder + '_' + name_of_file + '.jpg')

# Removes empty directories inside food-101/images
with open('food-101/meta/train.txt') as train_file:
    for folder in train_file:
        name_of_folder = folder.split('/')[0]
        if os.path.exists('food-101/images/' + name_of_folder):
            os.rmdir('food-101/images/' + name_of_folder)
```
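If the last loop only exists to delete the class folders once they are empty, re-reading train.txt is not strictly necessary. A minimal sketch, assuming the layout above; the function name `remove_empty_class_dirs` and its `keep` parameter are hypothetical. It scans food-101/images and removes every empty subdirectory except the train and test folders:

```python
from pathlib import Path


def remove_empty_class_dirs(images_dir, keep=("train", "test")):
    """Delete every empty subdirectory of images_dir except those named in keep.

    Hypothetical helper. Returns the sorted names of the directories removed.
    """
    removed = []
    # Materialize the listing first so we are not deleting while iterating.
    for child in list(Path(images_dir).iterdir()):
        # Only consider directories that are not the split folders we created.
        if child.is_dir() and child.name not in keep:
            # any(child.iterdir()) is False only when the directory has no entries.
            if not any(child.iterdir()):
                child.rmdir()
                removed.append(child.name)
    return sorted(removed)
```

Usage would be `remove_empty_class_dirs('food-101/images')`. Because it only removes directories that are actually empty, a class folder that still holds images (for example, if a move failed) is left in place rather than silently deleted.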
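So the dataset ships with every image (training and test alike) stored in a folder for its class. The images themselves are not marked as training or test; that information is in train.txt and test.txt, which list the images belonging to the training set and the test set respectively. To rename them, I go through each txt file and move every listed image into the corresponding directory.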
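The mapping described above can be isolated in one small function. This is a sketch, assuming each line of train.txt/test.txt looks like `apple_pie/1005649` (class folder, a slash, then the image id without `.jpg`), as the question's `split('/')` implies; `split_paths` is a hypothetical name:

```python
from pathlib import Path


def split_paths(root, split, entry):
    """Map one listing entry such as 'apple_pie/1005649' to (source, destination).

    Assumes images live at <root>/images/<class>/<id>.jpg and should end up at
    <root>/images/<split>/<class>_<id>.jpg, mirroring the question's renaming.
    """
    # strip() removes the trailing newline that comes with each line of the file.
    class_name, image_id = entry.strip().split("/")
    images = Path(root) / "images"
    src = images / class_name / f"{image_id}.jpg"
    dst = images / split / f"{class_name}_{image_id}.jpg"
    return src, dst
```

For example, `split_paths("food-101", "test", "apple_pie/1005649")` gives food-101/images/apple_pie/1005649.jpg as the source and food-101/images/test/apple_pie_1005649.jpg as the destination. Only `split` differs between the training and test cases.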
This basically repeats the same code twice. I tried to merge the two `with open()` blocks into one, but couldn't figure out how.
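The two blocks differ only in the listing file (test.txt vs. train.txt) and the destination folder (test vs. train), and both can be derived from the split name. A minimal sketch of merging them, assuming the same directory layout as the code above; `move_split` and `organize` are hypothetical names, not part of any library:

```python
from pathlib import Path


def move_split(root, split):
    """Move every image listed in <root>/meta/<split>.txt into <root>/images/<split>/.

    Each file is renamed <class>_<id>.jpg, as in the question's two loops.
    Returns the number of images moved.
    """
    root = Path(root)
    images = root / "images"
    # The shell step already creates these folders; exist_ok makes this a no-op then.
    (images / split).mkdir(parents=True, exist_ok=True)
    moved = 0
    with open(root / "meta" / f"{split}.txt") as listing:
        for line in listing:
            line = line.strip()
            if not line:
                continue  # skip blank lines, e.g. a trailing empty line
            class_name, image_id = line.split("/")
            src = images / class_name / f"{image_id}.jpg"
            src.rename(images / split / f"{class_name}_{image_id}.jpg")
            moved += 1
    return moved


def organize(root, splits=("train", "test")):
    """Run move_split once per split; returns a dict {split: number of images moved}."""
    return {split: move_split(root, split) for split in splits}
```

After the shell commands, a single call such as `organize("food-101")` would replace both loops. Passing the split name (rather than two separate paths) keeps the listing file and the destination folder from getting out of sync.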
Comments:
- Why not put the moving code in a function and pass the file path as an argument?
- I tried, but I couldn't figure out how. @cyboashu
Tags: python google-colaboratory