Why are serialized numpy random_state objects different when loaded? Answer

[Question title]: Why are serialized numpy random_state objects different when they are loaded?
[Posted]: 2018-10-06 11:44:06
[Question]:

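I'm trying to figure out why some cross-validation runs give different results even though they use a fixed set of indices, the same input data, the same random_state in sklearn, and the same LogisticRegression hyperparameters. My first thought was that the initial random_state might differ between runs. Then I noticed that when I pickle the random_state and compare the two objects directly, they are reported as different, yet the values returned by get_state() are the same. Why is this?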

import pickle

import numpy as np

random_state = np.random.RandomState(0)
print(random_state)
# <mtrand.RandomState object at 0x12424e480>

with open("./rs.pkl", "wb") as f:
    pickle.dump(random_state, f, protocol=pickle.HIGHEST_PROTOCOL)
with open("./rs.pkl", "rb") as f:
    random_state_copy = pickle.load(f)
    print(random_state_copy)
# <mtrand.RandomState object at 0x126465240>
print(random_state == random_state_copy)
# False
# get_state() contains an array, so compare the string forms of the two states
print(str(random_state.get_state()) == str(random_state_copy.get_state()))
# True

Versions:

numpy='1.13.3',

Python='3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 12:04:33) \n[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]'

[Comments]:

Bonus: any insight into why the LogisticRegression cross-validation results differ? I'm using the saga solver. Could that be the reason?
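It appears that numpy's RandomState class doesn't implement an equality test, so it gets the default behavior, which is based solely on object identity. Your unpickled object is definitely not the same object as the one you pickled, so the comparison returns False even if every detail of the two objects is equal. (My guess at the actual problem: your computation is somehow, some of the time, using the global (shared) random state instead of the one you supplied.)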
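The identity fallback described in the comment above can be reproduced with any class that does not define __eq__. A minimal sketch: Plain is a hypothetical class, and copy.deepcopy stands in for the pickle round trip only so the example does not depend on how the script is run (both produce a new, separate object).

```python
import copy


class Plain:
    """Hypothetical class that, like RandomState here, defines no __eq__."""

    def __init__(self, value):
        self.value = value


original = Plain(42)
duplicate = copy.deepcopy(original)  # a new object with equal contents

print(duplicate.value == original.value)  # True: the contents match
print(duplicate == original)              # False: default == compares identity
print(duplicate is original)              # False: two distinct objects
print(original == original)               # True: an object equals itself
```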
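Please show the complete code for the LR model and the cross-validation.
My mistake: the cross-validation metric was offset with respect to the logistic regression.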

Tags: python numpy random seed

[Answer 1]:

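The unpickled copy of the initial random state in your example does produce the same sequence of random numbers (tested on Python 3.6, numpy 1.15.4). As @jasonharper pointed out, RandomState probably doesn't implement an equality test: == returns False, but the two states behave identically.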
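To compare the states themselves rather than object identity, a more direct alternative to comparing str() forms is to compare the fields of get_state() one by one. A minimal sketch, assuming the legacy tuple that get_state() returns by default ('MT19937', key array, position, has_gauss, cached_gaussian); states_equal is a hypothetical helper, not part of numpy.

```python
import pickle

import numpy as np


def states_equal(rs_a, rs_b):
    """Hypothetical helper: compare two RandomState objects by internal state.

    Comparing the get_state() tuples directly with == does not work, because
    one field is an array, so each field is compared separately.
    """
    state_a, state_b = rs_a.get_state(), rs_b.get_state()
    if len(state_a) != len(state_b):
        return False
    for field_a, field_b in zip(state_a, state_b):
        if isinstance(field_a, np.ndarray):
            if not np.array_equal(field_a, field_b):
                return False
        elif field_a != field_b:
            return False
    return True


random_state = np.random.RandomState(0)
random_state_copy = pickle.loads(pickle.dumps(random_state))

print(random_state is random_state_copy)              # False: distinct objects
print(states_equal(random_state, random_state_copy))  # True: same internal state

# Drawing from one object advances only that object's state.
random_state.randint(0, 10, 5)
print(states_equal(random_state, random_state_copy))  # False
```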

Add the following snippet after the code you posted:

a = random_state.randint(0, 10, 5)
b = random_state_copy.randint(0, 10, 5)
print(a)
print(b)
print(a==b)

Output:

[5 0 3 3 7]
[5 0 3 3 7]
[ True  True  True  True  True]

So it is most likely not the RandomState that makes the runs differ; look for the cause elsewhere.
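One thing that may be worth checking, offered as an assumption rather than a diagnosis: if a single RandomState object is reused across several runs, each run starts from wherever the previous run left the state, so the runs see different random numbers even though the seed never changed. Creating a fresh, identically seeded RandomState per run makes every run start from the same state. A minimal numpy-only sketch; run_once is a hypothetical stand-in for one cross-validation run.

```python
import numpy as np


def run_once(rng):
    """Hypothetical stand-in for one run: draws a shuffled index order from rng."""
    return rng.permutation(20)


# Reusing one RandomState: the second run continues from the advanced state.
shared = np.random.RandomState(0)
first = run_once(shared)
second = run_once(shared)
print(np.array_equal(first, second))  # different draws from the same object

# A fresh RandomState per run, seeded identically, repeats the same draws.
fresh_first = run_once(np.random.RandomState(0))
fresh_second = run_once(np.random.RandomState(0))
print(np.array_equal(fresh_first, fresh_second))  # True
print(np.array_equal(first, fresh_first))         # True: same seed, first draw
```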
