【Title】: NonMatchingSplitsSizesError loading huggingface BookCorpus
【Posted】: 2021-12-17 20:47:20
【Question】:

I want to load bookcorpus like this:

from datasets import load_dataset
train_ds, test_ds = load_dataset('bookcorpus', split=['train', 'test'])

However, I get the following error:

Traceback (most recent call last):             
  File "<stdin>", line 1, in <module>
  File "/home/marcelbraasch/.local/lib/python3.8/site-packages/datasets/load.py", line 1627, in load_dataset
    builder_instance.download_and_prepare(
  File "/home/marcelbraasch/.local/lib/python3.8/site-packages/datasets/builder.py", line 607, in download_and_prepare
    self._download_and_prepare(
  File "/home/marcelbraasch/.local/lib/python3.8/site-packages/datasets/builder.py", line 709, in _download_and_prepare
    verify_splits(self.info.splits, split_dict)
  File "/home/marcelbraasch/.local/lib/python3.8/site-packages/datasets/utils/info_utils.py", line 74, in verify_splits
    raise NonMatchingSplitsSizesError(str(bad_splits))
datasets.utils.info_utils.NonMatchingSplitsSizesError: [{'expected': SplitInfo(name='train', num_bytes=4853859824, num_examples=74004228, dataset_name='bookcorpus'), 'recorded': SplitInfo(name='train', num_bytes=2982081448, num_examples=45726619, dataset_name='bookcorpus')}]

I would then like to save the dataset to disk, because I don't want to download it every time I use it. What causes this error?

【Comments】:

    Tags: python dataset huggingface-transformers huggingface-datasets


    【Solution 1】:

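    BookCorpus is no longer publicly available. The error itself says that the train split that was actually prepared (45,726,619 examples) is smaller than the size recorded in the dataset's metadata (74,004,228 examples), which is consistent with part of the source data no longer being downloadable.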
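
To make the mismatch concrete, here is a minimal sketch of the kind of comparison that fails. It is a simplified illustration, not the library's actual `verify_splits` code; `find_mismatched_splits` is a hypothetical helper, and the numbers are copied from the `expected` and `recorded` entries in the traceback.

```python
from typing import Dict, List


def find_mismatched_splits(expected: Dict[str, int], recorded: Dict[str, int]) -> List[dict]:
    """Hypothetical helper: list the splits whose example counts differ."""
    return [
        {"split": name, "expected": expected[name], "recorded": recorded[name]}
        for name in expected
        if name in recorded and expected[name] != recorded[name]
    ]


# Example counts taken from the SplitInfo objects in the error message.
expected = {"train": 74_004_228}
recorded = {"train": 45_726_619}

bad_splits = find_mismatched_splits(expected, recorded)
missing_examples = expected["train"] - recorded["train"]
recorded_fraction = recorded["train"] / expected["train"]

print(bad_splits)
print(f"missing examples: {missing_examples:,}")
print(f"share of expected examples actually prepared: {recorded_fraction:.1%}")
```

Roughly six tenths of the expected examples were prepared, so this is a substantial shortfall in the data itself rather than a small bookkeeping difference.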

    Here is a workaround:

    https://github.com/soskek/bookcorpus
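
Once the books have been collected locally as plain-text files (for example with the repository above), one possible way to get train/test splits and avoid rebuilding them every time is sketched below. This is a sketch under assumptions, not part of the original answer: the directory names are placeholders, `collect_text_files` and `load_or_build` are hypothetical helpers, and the random 90/10 split is an arbitrary choice. It uses the generic `text` builder of the `datasets` library together with `train_test_split`, `save_to_disk`, and `load_from_disk`.

```python
from pathlib import Path
from typing import List


def collect_text_files(root: str) -> List[str]:
    """Return all .txt files under root, sorted for a reproducible order."""
    return sorted(str(p) for p in Path(root).rglob("*.txt"))


def load_or_build(text_root: str, cache_dir: str, test_size: float = 0.1, seed: int = 42):
    """Load cached splits from cache_dir if present, else build them from local text files."""
    # Imported lazily so collect_text_files works even without the library installed.
    from datasets import load_dataset, load_from_disk

    if Path(cache_dir).exists():
        # Reuse the splits written by an earlier run; nothing is re-downloaded or re-parsed.
        return load_from_disk(cache_dir)

    files = collect_text_files(text_root)
    if not files:
        raise FileNotFoundError(f"no .txt files found under {text_root}")

    # The generic "text" builder yields one example per line in a "text" column.
    full = load_dataset("text", data_files={"train": files}, split="train")
    # Assumption: a random split is acceptable; it does not keep each book in one split.
    splits = full.train_test_split(test_size=test_size, seed=seed)
    splits.save_to_disk(cache_dir)
    return splits


# Usage (placeholder paths, adjust to your setup):
# splits = load_or_build("path/to/book_txts", "path/to/bookcorpus_cached")
# train_ds, test_ds = splits["train"], splits["test"]
```

The first call builds the splits and writes them to the cache directory; later calls only read that directory back.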
