【Question Title】: CelebA dataset inaccessible using tfds.load()
【Posted】: 2021-01-24 04:02:16
【Question Description】:

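I am trying to use the CelebA dataset in a deep learning project. I have the zipped folder from Kaggle. I wanted to unzip it and then split the images into training, test, and validation sets, but found that this is not feasible on my not-so-powerful system.

So, to avoid wasting time, I wanted to load the CelebA dataset through TensorFlow Datasets (tfds) instead. Unfortunately, the dataset cannot be accessed and the call fails with the following error:

(Code first)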

ds = tfds.load('celeb_a', split='train', download=True)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-69-d7b9371eb674> in <module>
----> 1 ds = tfds.load('celeb_a', split='train', download=True)

c:\users\aman\appdata\local\programs\python\python38\lib\site-packages\tensorflow_datasets\core\load.py in load(name, split, data_dir, batch_size, shuffle_files, download, as_supervised, decoders, read_config, with_info, builder_kwargs, download_and_prepare_kwargs, as_dataset_kwargs, try_gcs)
    344   if download:
    345     download_and_prepare_kwargs = download_and_prepare_kwargs or {}
--> 346     dbuilder.download_and_prepare(**download_and_prepare_kwargs)
    347 
    348   if as_dataset_kwargs is None:

c:\users\aman\appdata\local\programs\python\python38\lib\site-packages\tensorflow_datasets\core\dataset_builder.py in download_and_prepare(self, download_dir, download_config)
    383           self.info.read_from_directory(self._data_dir)
    384         else:
--> 385           self._download_and_prepare(
    386               dl_manager=dl_manager,
    387               download_config=download_config)

c:\users\aman\appdata\local\programs\python\python38\lib\site-packages\tensorflow_datasets\core\dataset_builder.py in _download_and_prepare(self, dl_manager, download_config)
   1020   def _download_and_prepare(self, dl_manager, download_config):
   1021     # Extract max_examples_per_split and forward it to _prepare_split
-> 1022     super(GeneratorBasedBuilder, self)._download_and_prepare(
   1023         dl_manager=dl_manager,
   1024         max_examples_per_split=download_config.max_examples_per_split,

c:\users\aman\appdata\local\programs\python\python38\lib\site-packages\tensorflow_datasets\core\dataset_builder.py in _download_and_prepare(self, dl_manager, **prepare_split_kwargs)
    959     split_generators_kwargs = self._make_split_generators_kwargs(
    960         prepare_split_kwargs)
--> 961     for split_generator in self._split_generators(
    962         dl_manager, **split_generators_kwargs):
    963       if str(split_generator.split_info.name).lower() == "all":

c:\users\aman\appdata\local\programs\python\python38\lib\site-packages\tensorflow_datasets\image\celeba.py in _split_generators(self, dl_manager)
    137     all_images = {
    138         os.path.split(k)[-1]: img for k, img in
--> 139         dl_manager.iter_archive(downloaded_dirs["img_align_celeba"])
    140     }
    141 

c:\users\aman\appdata\local\programs\python\python38\lib\site-packages\tensorflow_datasets\core\download\download_manager.py in iter_archive(self, resource)
    559     if isinstance(resource, six.string_types):
    560       resource = resource_lib.Resource(path=resource)
--> 561     return extractor.iter_archive(resource.path, resource.extract_method)
    562 
    563   def extract(self, path_or_paths):

c:\users\aman\appdata\local\programs\python\python38\lib\site-packages\tensorflow_datasets\core\download\extractor.py in iter_archive(path, method)
    221     An iterator of `(path_in_archive, f_obj)`
    222   """
--> 223   return _EXTRACT_METHODS[method](path)

KeyError: <ExtractMethod.NO_EXTRACT: 1>

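Can anyone explain what I am doing wrong?

As a side note, if this does not work, is there a way to convert the zipped file already downloaded from Kaggle into the required format without unzipping it and then iterating over every image individually? Basically, for a dataset this large, I cannot go the unzip-then-split route...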

TIA!


Edit: I tried the same thing on Colab but got a similar error.

【Comments】:

    Tags: python-3.x image deep-learning tensorflow2.0 tensorflow-datasets


    【Solution 1】:

    Upgrading tfds to the nightly version worked for me.
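
A minimal sketch of that upgrade path, assuming the nightly build is installed from PyPI as `tfds-nightly` in place of the stable `tensorflow-datasets` package. The helper name `load_celeba_train` and the constant `DATASET_NAME` are illustrative and not part of the original answer; the `tfds.load` arguments are the ones from the question.

```python
# Assumed install step, run in a shell before starting Python:
#   pip uninstall -y tensorflow-datasets
#   pip install -U tfds-nightly

DATASET_NAME = "celeb_a"  # dataset name used in the question


def load_celeba_train(data_dir=None):
    """Retry the question's tfds.load call after upgrading tfds."""
    # Imported inside the function so this file can be read or imported
    # even on a machine where tfds is not installed yet.
    import tensorflow_datasets as tfds

    # Printing the version helps confirm the upgrade actually took effect.
    print("tensorflow_datasets version:", tfds.__version__)
    return tfds.load(DATASET_NAME, split="train", download=True, data_dir=data_dir)


# Usage (triggers the download, so it is left commented out):
# ds = load_celeba_train()
```

Restarting the Python kernel after the install is usually needed, since an already-imported `tensorflow_datasets` module keeps the old version in memory.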

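    【Discussion】:

      【Solution 2】:

      Downloading from GDrive seems to be subject to some kind of quota limit. Go to the Google Drive link shown in the error, then make a copy of the file in your own Drive. You can also download that copy with a library such as gdown or google_drive_downloader.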
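
A rough sketch of the gdown route, assuming the copy in your own Drive is shared by link and that you substitute its real file ID. `FILE_ID`, the output file name, and the helper names `drive_download_url` and `download_drive_copy` are placeholders introduced here, not values from the original answer.

```python
def drive_download_url(file_id):
    """Build a direct-download URL for a Google Drive file ID."""
    return f"https://drive.google.com/uc?id={file_id}"


def download_drive_copy(file_id, output_path):
    """Download a Drive file to output_path using the third-party gdown package."""
    import gdown  # assumed to be installed: pip install gdown

    # gdown.download returns the output path on success, or None on failure.
    return gdown.download(url=drive_download_url(file_id), output=output_path, quiet=False)


# Usage (FILE_ID is a placeholder for the ID of the copy in your own Drive):
# download_drive_copy("FILE_ID", "img_align_celeba.zip")
```

Whether this avoids the quota error depends on the copy being counted against your own Drive rather than the original shared file.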

      【Discussion】:

      • As of October 2021, the Google Drive link returns a 403 Forbidden error :(
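
If the Drive download stays unavailable, the side question about the Kaggle archive can be approached by reading images straight out of the zip file. The sketch below is one possible approach, not something proposed in the thread: it streams each image's bytes with the standard-library `zipfile` module and assigns a split from a hash of the file name. The function names, the `.jpg` suffix, and the 80/10/10 percentages are assumptions; the hash-based split is a stand-in and does not reproduce CelebA's official partition list.

```python
import zipfile
import zlib


def assign_split(name, val_pct=10, test_pct=10):
    """Deterministically map a file name to 'train', 'val' or 'test'."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    bucket = zlib.crc32(name.encode("utf-8")) % 100
    if bucket < test_pct:
        return "test"
    if bucket < test_pct + val_pct:
        return "val"
    return "train"


def iter_zip_images(zip_path, suffix=".jpg"):
    """Yield (split, file_name, raw_bytes) per image without extracting to disk."""
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            # Skip directories and non-image members such as CSV annotation files.
            if info.is_dir() or not info.filename.lower().endswith(suffix):
                continue
            name = info.filename.rsplit("/", 1)[-1]
            yield assign_split(name), name, archive.read(info)
```

Each item holds the encoded JPEG bytes of one image, so only a single file is in memory at a time; the generator could then be filtered by split and wrapped with something like `tf.data.Dataset.from_generator`, decoding the bytes with `tf.io.decode_jpeg`.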