【Question title】: Split dataset containing multiple labels
【Posted】: 2021-05-09 09:37:27
【Question】:

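I have a dataset with multiple labels, i.e. for each X I have 2 y values, and I need to split it into training and test sets.

I tried using the sklearn function train_test_split():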

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.randn(10)
y1 = np.random.randint(1,10,10)
y2 = np.random.randint(1,3,10)

X_train, X_test, [Y1_train, Y2_train], [Y1_test, Y2_test] = train_test_split(X, [y1, y2], test_size=0.4, random_state=42)

But I get an error message:

ValueError: Found input variables with inconsistent numbers of samples: [10, 2]

【Comments】:

    Tags: python numpy scikit-learn train-test-split


    【Solution 1】:

    This code should work for you.

    import numpy as np
    from sklearn.model_selection import train_test_split
    
    X = np.random.randn(10)
    y1 = np.random.randint(1,10,10)
    y2 = np.random.randint(1,3,10)
    y = [[y1[i],y2[i]] for i in range(len(y1))] 
    
    X_train, X_test, Y_train, Y_test  = train_test_split(X, y, test_size=0.4, random_state=42)
    

    It produces output like the following (the exact values depend on the random data):

    print(X_train)
    [ 0.42534237  1.35471168  0.00640736  1.34057234  0.50608562 -1.73341641]
    

    print(Y_train)
    [[3, 1], [7, 1], [6, 2], [4, 2], [6, 2], [2, 2]]
    

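    In your code, the label list [y1, y2] has shape (2, 10) when converted to an array, while the input array has shape (10,). train_test_split therefore sees 2 label samples against 10 input samples, which is what the error message reports.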

    print([y1,y2])
    [array([2, 3, 7, 6, 4, 9, 2, 3, 6, 6]), array([2, 2, 1, 2, 2, 2, 2, 1, 1, 2])]
    
    print(np.array([y1,y2]).shape)
    (2, 10)
    
    print(X.shape)
    (10,)
    

    What you want instead is a label array of shape (10, 2), with one row per sample:

    print(y)
    [[2, 2], [3, 2], [7, 1], [6, 2], [4, 2], [9, 2], [2, 2], [3, 1], [6, 1], [6, 2]]
    
    print(np.array(y).shape)
    (10, 2)
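
The same (10, 2) layout can also be built without a Python loop. The following is a minimal sketch, assuming the labels are NumPy arrays of equal length; it uses np.column_stack in place of the list comprehension above, and the seeded generator `rng` is only an illustrative choice so the data is reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative data: a seeded generator replaces the unseeded calls above.
rng = np.random.default_rng(42)
X = rng.standard_normal(10)
y1 = rng.integers(1, 10, size=10)
y2 = rng.integers(1, 3, size=10)

# Stack the two label vectors as columns: shape (10, 2), one row per sample.
y = np.column_stack([y1, y2])

X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)

# Recover the individual label vectors from the columns.
Y1_train, Y2_train = Y_train[:, 0], Y_train[:, 1]
Y1_test, Y2_test = Y_test[:, 0], Y_test[:, 1]
```

Because Y_train and Y_test are NumPy arrays here rather than nested lists, the per-label vectors can be sliced out by column after the split.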
    

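    The inputs and the labels must have the same number of samples, i.e. the same length along the first axis. Their remaining dimensions may differ, as with (10,) and (10, 2) here.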
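
An alternative that avoids combining the labels at all: train_test_split accepts any number of arrays with the same number of samples and returns a train/test pair for each, in the order given. A minimal sketch, reusing the variable names from the question:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.randn(10)
y1 = np.random.randint(1, 10, 10)
y2 = np.random.randint(1, 3, 10)

# Pass each label array as its own argument. All arrays are split with
# the same shuffled indices, so rows stay aligned across X, y1 and y2.
X_train, X_test, Y1_train, Y1_test, Y2_train, Y2_test = train_test_split(
    X, y1, y2, test_size=0.4, random_state=42
)

print(X_train.shape, Y1_train.shape, Y2_train.shape)
print(X_test.shape, Y1_test.shape, Y2_test.shape)
```

This yields the six variables the question was trying to unpack, without reshaping the labels first.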

    【Comments】:
