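【Question title】: How to split datasets into specific numbers?
【Posted】: 2021-03-16 08:58:39
【Question description】:

I'm using the Keras MNIST dataset, which has a training set of 60k images and a test set of 10k images. My assignment asks me to further split the training set into 50k images for training and 10k for validation. I'm not sure how to approach this, since I have never had to split a dataset into specific sizes before. Here is my code so far: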

import numpy as np
import scipy
import matplotlib.pyplot as plt
from keras.datasets import mnist
from util import func_confusion_matrix

# load (downloaded if needed) the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# flatten each 28 by 28 image into a 784-pixel vector
pixel_count = x_train.shape[1] * x_train.shape[2]
x_train = x_train.reshape(x_train.shape[0], pixel_count).astype('float32')
x_test = x_test.reshape(x_test.shape[0], pixel_count).astype('float32')

# normalize inputs from gray scale of 0-255 to values between 0-1
x_train = x_train / 255
x_test = x_test / 255

【Question comments】:

    Tags: python tensorflow machine-learning keras dataset


    【Answer 1】:

    You can use the sklearn package directly:

    import numpy as np
    import scipy
    import matplotlib.pyplot as plt
    from keras.datasets import mnist
    from util import func_confusion_matrix
    
    # load (downloaded if needed) the MNIST dataset
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    
    from sklearn.model_selection import train_test_split
    x_train,x_val,y_train,y_val = train_test_split(x_train,y_train,test_size=0.1)
    
    # test_size=0.1 holds out 10% of the data (6,000 of the 60,000 training images) for validation.
    

    You now have all the data in the variables x_train, x_test, x_val and y_train, y_test, y_val.
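
For the exact 50k/10k split the question asks for, `train_test_split` also accepts an integer `test_size`, which it treats as an absolute number of samples rather than a fraction. The sketch below is only an illustration: `x_full` and `y_full` are hypothetical stand-in arrays with the same number of rows as the MNIST training set, so it runs without downloading anything, and `random_state` and `stratify` are optional extras that the answer above does not use.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the arrays returned by mnist.load_data():
# 60,000 rows of dummy features and fake labels cycling through 0-9.
x_full = np.zeros((60000, 8), dtype=np.float32)
y_full = np.arange(60000) % 10

x_train, x_val, y_train, y_val = train_test_split(
    x_full,
    y_full,
    test_size=10000,   # an integer means "hold out exactly this many samples"
    random_state=0,    # fixed seed so the split is reproducible
    stratify=y_full,   # optional: keep class proportions similar in both parts
)

print(x_train.shape, x_val.shape)  # (50000, 8) (10000, 8)
```

With the real data, the same call would take the `x_train` and `y_train` arrays from `mnist.load_data()` in place of the stand-ins.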

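    【Comments】:

    • So 10% of the 60k would go into the validation set?
    • Yes. You can also read the function definition here.
    【Answer 2】:

    In this specific case, MNIST comes pre-shuffled, so you can use indexing to select the last 10,000 observations of the training set as the validation set.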

    from tensorflow.keras.datasets import mnist
    
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    
    y_val = y_train[-10000:]
    x_val = x_train[-10000:]
    
    x_train = x_train[:-10000]
    y_train = y_train[:-10000]
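
The slicing above relies on the data already being shuffled. As a rough sketch for data that is not, the rows can be permuted first and then sliced the same way. `holdout_split` is a hypothetical helper written for this illustration, not part of Keras or NumPy, and the arrays below are stand-ins sorted by label rather than real MNIST images.

```python
import numpy as np


def holdout_split(x, y, n_val, shuffle=False, seed=0):
    """Hold out the last n_val samples as a validation set.

    With shuffle=True the rows are permuted first (same order for x and y),
    which matters when the data is sorted, for example by class label.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of samples")
    if not 0 < n_val < len(x):
        raise ValueError("n_val must be between 1 and len(x) - 1")
    if shuffle:
        order = np.random.default_rng(seed).permutation(len(x))
        x, y = x[order], y[order]
    return (x[:-n_val], y[:-n_val]), (x[-n_val:], y[-n_val:])


# Stand-in data sorted by label, i.e. NOT pre-shuffled.
labels = np.repeat(np.arange(10), 6000)   # 60,000 labels: 0,0,...,9,9
samples = np.arange(60000)                # sample ids stand in for images

(x_tr, y_tr), (x_va, y_va) = holdout_split(samples, labels, 10000, shuffle=True)
print(len(x_tr), len(x_va))  # 50000 10000
```

Without `shuffle=True`, the last 10,000 rows of this sorted stand-in data would contain only the labels 8 and 9, which is why the pre-shuffled property matters for the plain slicing approach.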
    

    【Comments】:
