Divide the data into multiple sets using ShuffleSplit, then store each in a set

【Question title】: Divide the data into multiple sets using ShuffleSplit, then store each in a set
【Posted】: 2021-05-14 06:54:01
【Question】:

I have loaded the CIFAR10 dataset, but I want to split it into multiple parts. This is how I downloaded the dataset:

from tensorflow.keras.datasets import cifar10  # assumed import; the original post omits it
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

Then I used ShuffleSplit to create a generator that splits the data, as follows:

from sklearn.model_selection import ShuffleSplit
rs = ShuffleSplit(n_splits=3, test_size=0.1, random_state=0)
splits = rs.split(x_train)

I know I can iterate over the generated splits like this:

for train_index, test_index in splits:
    # train_index and test_index are NumPy arrays that hold the sample indices
    print("TRAIN:", train_index, "TEST:", test_index)

Suppose that in the end I want to have:

x_train1, y_train1, x_train2, y_train2, x_train3, y_train3

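How can I divide the data according to the generated indices, so that a single training split contains both the training and the test indices?

I tried combining the indices into a list and concatenating the arrays, but without success.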
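
For reference, a minimal sketch of what combining the two index arrays looks like. It uses small toy arrays (`x_toy` and `y_toy` are hypothetical stand-ins for `x_train` and `y_train`) and indexes samples and labels with the same combined array so they stay aligned:

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

# Toy stand-ins for x_train / y_train (assumption: 20 samples of shape 2x2).
x_toy = np.arange(20 * 4).reshape(20, 2, 2)
y_toy = np.arange(20).reshape(20, 1)

rs = ShuffleSplit(n_splits=3, test_size=0.1, random_state=0)

x_parts, y_parts = [], []
for train_index, test_index in rs.split(x_toy):
    # Join the two index arrays, then index x and y with the same array
    # so that samples and labels stay aligned.
    combined = np.concatenate([train_index, test_index])
    x_parts.append(x_toy[combined])
    y_parts.append(y_toy[combined])

x_train1, x_train2, x_train3 = x_parts
y_train1, y_train2, y_train3 = y_parts
```

Note that when only `test_size` is given, the train and test indices of one split together cover every sample, so each combined part here is a reshuffled copy of the whole array rather than a disjoint subset.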

【Comments】:

Tags: python numpy scikit-learn

【Solution 1】:

A better approach :)

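from sklearn.utils import shuffle  # assumed source of shuffle(); the original omits the import

num_shreds = 10
shred_size = len(x_train) // num_shreds

# Shuffle samples and labels with the same permutation so they stay aligned
x_train, y_train = shuffle(x_train, y_train)
# Cut into num_shreds equal slices; any leftover samples at the end are dropped
shred_X = [x_train[i:i + shred_size] for i in range(0, shred_size * num_shreds, shred_size)]
shred_y = [y_train[i:i + shred_size] for i in range(0, shred_size * num_shreds, shred_size)]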
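
The slicing above discards the trailing samples whenever the dataset length is not a multiple of `num_shreds`. As a hedged variant (not part of the original answer), `np.array_split` keeps every sample at the cost of slightly uneven shard sizes. The sketch below uses toy arrays (`X_toy`, `y_toy` are hypothetical) so the sizes are easy to check:

```python
import numpy as np
from sklearn.utils import shuffle

# Toy data (assumption): 23 samples, which does not divide evenly by 5.
X_toy = np.arange(23 * 2).reshape(23, 2)
y_toy = np.arange(23)

num_shreds = 5

# shuffle() applies one shared permutation to both arrays.
X_shuf, y_shuf = shuffle(X_toy, y_toy, random_state=0)

# array_split allows uneven pieces: the first shards get one extra sample.
shred_X = np.array_split(X_shuf, num_shreds)
shred_y = np.array_split(y_shuf, num_shreds)

print([len(s) for s in shred_X])
```

With 23 samples and 5 shards, the shards have 5, 5, 5, 4 and 4 samples, and no sample is lost.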

【Comments】:

【Solution 2】:

I was able to solve the problem with a different approach. The code is below:

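import numpy as np

# Example value, not in the original post; it must divide both dataset sizes
# evenly, otherwise np.split raises a ValueError.
num_partitions = 5

partitions_train_x = []
partitions_train_y = []
partitions_test_x = []
partitions_test_y = []

# Shuffle the training indices, then cut them into equal, disjoint chunks
train_indices = np.arange(len(y_train))
np.random.shuffle(train_indices)
for data_indices in np.split(train_indices, num_partitions):
    partitions_train_x.append(x_train[data_indices])
    partitions_train_y.append(y_train[data_indices])

# The test indices are split in their original order (no shuffling)
test_indices = np.arange(len(y_test))
for data_indices in np.split(test_indices, num_partitions):
    partitions_test_x.append(x_test[data_indices])
    partitions_test_y.append(y_test[data_indices])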

I know this may not be the best approach, but it does work.
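
If the goal is disjoint partitions produced by scikit-learn rather than by manual index handling, one possible alternative (a sketch, not the method used in this answer) is to collect the held-out fold of each `KFold` iteration. The toy arrays `x_toy` and `y_toy` below are hypothetical stand-ins for the real data:

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data (assumption): 12 samples with 3 features each.
x_toy = np.arange(12 * 3).reshape(12, 3)
y_toy = np.arange(12)

kf = KFold(n_splits=3, shuffle=True, random_state=0)

partitions_x, partitions_y = [], []
for _, fold_index in kf.split(x_toy):
    # Each held-out fold is disjoint from the others, and together
    # the folds cover every sample exactly once.
    partitions_x.append(x_toy[fold_index])
    partitions_y.append(y_toy[fold_index])
```

Unlike `ShuffleSplit`, whose test sets may overlap between iterations, the `KFold` folds never share a sample, which matches the idea of cutting the dataset into separate parts.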

【Comments】: