Random state significance in sklearn: answers

【Question title】: Random state significance in sklearn
【Posted】: 2019-04-09 18:12:55
【Question】:

I am working with train_test_split in sklearn, but I cannot understand the random_state parameter. What exactly does it do, and why do we use it?

Please provide an example.

Thanks in advance.

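【Comments on the question】:

Possible duplicate of "random_state parameter in sklearn's train_test_split"
@Akshay The answer below is great. Also, if you are not familiar with the concept of (pseudo)random number generation, I suggest reading one of the Wikipedia articles on the topic.

Tags: python scikit-learn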

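【Answer 1】:

The random_state parameter of train_test_split lets you reproduce the same result every time you run the code.

The random state makes the split reproducible. Scikit-learn uses a random permutation to generate the split, and the value you pass as random_state is used as the seed for the random number generator. The random numbers are therefore generated in the same order on every run.

Without the random_state parameter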
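To make the idea of a seed concrete, here is a minimal sketch using only Python's standard random module. It is not scikit-learn's internal code; the helper name seeded_permutation is made up for this illustration. It only shows that a generator seeded with the same value produces the same permutation each time.

```python
import random


def seeded_permutation(n, seed):
    """Return a shuffled list of the indices 0..n-1, using a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent generator; the global random state is untouched
    indices = list(range(n))
    rng.shuffle(indices)
    return indices


perm_a = seeded_permutation(6, seed=42)
perm_b = seeded_permutation(6, seed=42)

# Same seed, same sequence of random numbers, same permutation.
assert perm_a == perm_b
# The result is still a permutation of all six indices.
assert sorted(perm_a) == [0, 1, 2, 3, 4, 5]
```

A split can then be read off such a permutation, for example by taking the first few indices as the test set and the rest as the training set.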

from sklearn.model_selection import train_test_split

a = [1,5,6,7,8,6]
b = [2,3,5,2,1,4]

x1, x2, y1, y2 = train_test_split(a,b,test_size=0.25)

print(x1)
# output: [1, 6, 8, 7]

## run the code again

x1, x2, y1, y2 = train_test_split(a,b,test_size=0.25)

print(x1)
# output: [6, 8, 6, 7]

The values change every time the code is run.

With the random_state parameter

x1, x2, y1, y2 = train_test_split(a,b,test_size=0.25, random_state=42)

print(x1)
# output: [6, 6, 8, 7]

## run the code again
x1, x2, y1, y2 = train_test_split(a,b,test_size=0.25, random_state=42)

print(x1)
# output: [6, 6, 8, 7]

As you can see, the same values are reproduced, and the same split is created every time the code is run.
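Rather than comparing printed output by eye, the same check can be written as assertions. This is a small sketch reusing the lists from above; the helper name split_with_seed is hypothetical and only wraps the call already shown.

```python
from sklearn.model_selection import train_test_split

a = [1, 5, 6, 7, 8, 6]
b = [2, 3, 5, 2, 1, 4]


def split_with_seed(seed):
    """Split `a` and `b` with a fixed seed; returns [a_train, a_test, b_train, b_test]."""
    return train_test_split(a, b, test_size=0.25, random_state=seed)


first = split_with_seed(42)
second = split_with_seed(42)

# Same seed: all four returned lists are identical across calls.
assert first == second

# Features and labels stay aligned: every training pair exists in the original data.
x_train, x_test, y_train, y_test = first
original_pairs = list(zip(a, b))
assert all(pair in original_pairs for pair in zip(x_train, y_train))
```

A different seed will generally give a different split, although with data this small two seeds can happen to produce the same one.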

【Comments】: