【Question title】: random_state and shuffle together
【Posted】: 2019-04-14 10:23:03
【Question】:

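I'm a bit confused about using random_state and shuffle together. I want to split my data without shuffling it. It seems that when I set shuffle to False, the number I choose for random_state doesn't matter: I get the same output (the split is identical for random_state 42, 2, 7, 17, and so on). Why?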

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, shuffle=False)

But if shuffle is True, I get different outputs (splits) for different random_state values, which makes sense.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

【Comments】:

    Tags: python scikit-learn shuffle


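    【Answer 1】:

    If you set shuffle to False, train_test_split simply splits your data in its original order: the first samples go to the training set and the remaining ones to the test set. The random_state parameter is therefore ignored entirely.

    Example: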

    from sklearn.model_selection import train_test_split

    X = list(range(50))  # create a list with the numbers 0 to 49
    y = X  # just for testing
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, shuffle=False)

    print(X_train)  # prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36]
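To check that random_state really has no effect when shuffle=False, the sketch below (assuming scikit-learn is installed) runs the same split with several arbitrary seeds and compares every result with plain list slicing. With 50 samples and test_size=0.25, the test set gets ceil(0.25 × 50) = 13 samples and the training set the remaining 37, which is why X_train above ends at 36.

```python
import math

from sklearn.model_selection import train_test_split

X = list(range(50))
y = X

# Reference split: with shuffle=False the first n_train samples form the
# training set and the rest form the test set, in their original order.
n_test = math.ceil(0.25 * len(X))  # 13
n_train = len(X) - n_test          # 37
expected = (X[:n_train], X[n_train:])

# The seed values are arbitrary; none of them should change the result.
splits = []
for seed in (2, 7, 17, 42, None):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, shuffle=False
    )
    splits.append((X_train, X_test))

print(all(split == expected for split in splits))  # prints True
```

In other words, if you never want shuffling, slicing X[:n_train] and X[n_train:] gives the same result as train_test_split(..., shuffle=False).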
    

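    As soon as you set shuffle to True (the default), random_state is used as the seed of the random number generator. As a result, your dataset is split randomly into training and test sets, and the same random_state always produces the same split.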
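The role of random_state as a seed can be illustrated with a minimal NumPy sketch. The helper seeded_split below is hypothetical and written only for illustration: it draws a permutation of the indices from a generator seeded with random_state and then slices that permutation into test and training indices. It shows the idea behind a shuffled split, but it is not scikit-learn's code and is not guaranteed to reproduce scikit-learn's exact output.

```python
import math

import numpy as np


def seeded_split(X, test_size, random_state):
    """Hypothetical helper: shuffle indices with a seeded RNG, then slice."""
    n_test = math.ceil(test_size * len(X))
    # The seed fixes the permutation, so the same random_state always
    # yields the same split.
    rng = np.random.RandomState(random_state)
    perm = rng.permutation(len(X))
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    return [X[i] for i in train_idx], [X[i] for i in test_idx]


X = list(range(50))
train_a, test_a = seeded_split(X, 0.25, random_state=42)
train_b, test_b = seeded_split(X, 0.25, random_state=42)
train_c, test_c = seeded_split(X, 0.25, random_state=44)

print(train_a == train_b)         # prints True: same seed, same split
print(len(train_a), len(test_a))  # prints 37 13
print(train_a == train_c)         # different seed: almost certainly False
```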

    Example with random_state=42:

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, shuffle=True)
    
    print(X_train)  # prints [8, 3, 6, 41, 46, 47, 15, 9, 16, 24, 34, 31, 0, 44, 27, 33, 5, 29, 11, 36, 1, 21, 2, 43, 35, 23, 40, 10, 22, 18, 49, 20, 7, 42, 14, 28, 38]
    

    Example with random_state=44:

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=44, shuffle=True)
    
    print(X_train)  # prints [13, 11, 2, 12, 34, 41, 30, 16, 39, 28, 24, 8, 18, 9, 4, 10, 0, 19, 21, 29, 14, 1, 48, 38, 7, 43, 25, 22, 23, 42, 46, 49, 32, 3, 45, 35, 20]
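Because the seed fixes the shuffle, repeating a call with the same random_state reproduces the split exactly. The sketch below (assuming scikit-learn is installed) checks that property, and also that the shuffled split still contains every sample exactly once. It deliberately does not hard-code the particular orderings printed above.

```python
from sklearn.model_selection import train_test_split

X = list(range(50))
y = X

# Two calls with the same seed produce an identical shuffle.
first = train_test_split(X, y, test_size=0.25, random_state=42, shuffle=True)
second = train_test_split(X, y, test_size=0.25, random_state=42, shuffle=True)
print(first == second)  # prints True

X_train, X_test, y_train, y_test = first
# Shuffling only reorders: together the two sets still cover 0..49 once each.
print(sorted(X_train + X_test) == X)  # prints True
# y is the same list as X and is split with the same indices.
print(X_train == y_train)  # prints True
```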
    

