Pandas replace does not work with different numpy-int types even when values are the same

【Question title】: Pandas replace does not work with different numpy-int types even when values are the same
【Posted】: 2022-01-11 12:14:12
【Question description】:

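I ran into a strange problem while debugging today. Long story short, I have a clustering model that produces a label for each observation. I want to relabel the clusters according to their cluster means, so that the cluster with the highest mean gets the largest cluster number, and so on.

The problem is that the keys of my mapping dictionary are of type np.int64, while my Series of labels is of type np.int32, and as a result the replacement does not happen (which I find very odd!).

See this example: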

import numpy as np
import pandas as pd

maps = pd.Series([np.int64(0), np.int64(2), np.int64(1)])  # index value should be mapped to Series value

# 0 0    0-->0
# 1 2    1-->2
# 2 1    2-->1

map_dict = {old: new for (old, new) in zip(maps.values, maps.index)}

map_dict == {0: 0, 2: 1, 1: 2}  # True

labs = pd.Series([1, 1, 1, 0, 0, 2, 2, 2]).astype(np.int32)  # simulate my list of cluster labels per observation

(labs.replace({0: 0, 2: 1, 1: 2}) == labs.replace(map_dict)).mean()  # 0.25

As you can see, map_dict == {0: 0, 2: 1, 1: 2} evaluates to True, so I assumed that using either of them would give the same result.
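As a side note on why that comparison is True: Python dict equality only relies on key hashes and on ==, not on the types of the keys. The minimal sketch below illustrates this; the names numpy_keyed and plain_keyed are made up for the illustration and are not from the question.

```python
import numpy as np

# Two dicts with the same content; only the key types differ.
numpy_keyed = {np.int64(0): 0, np.int64(2): 1, np.int64(1): 2}
plain_keyed = {0: 0, 2: 1, 1: 2}

# Dict equality needs matching hashes and == on keys and values, nothing more.
dicts_equal = numpy_keyed == plain_keyed
hashes_equal = all(hash(k) == hash(int(k)) for k in numpy_keyed)

# The key *types* are nevertheless different.
numpy_key_types = {type(k).__name__ for k in numpy_keyed}
plain_key_types = {type(k).__name__ for k in plain_keyed}

print(dicts_equal, hashes_equal, numpy_key_types, plain_key_types)
```

So an equality check between the two dicts cannot reveal a difference in key type; inspecting type(k) for each key is the more direct diagnostic.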

Moreover, this is strange:

np.int64(1) == np.int32(1)  # True
1 == np.int32(1)  # True

That is, it should not make any difference whether I use np.int64, np.int32, or plain int. Indeed, if I cast map_dict to int like this:

map_dict = {int(old): int(new) for (old, new) in zip(maps.values, maps.index)}  # cast to "int"

map_dict == {0: 0, 2: 1, 1: 2}  # True

labs = pd.Series([1, 1, 1, 0, 0, 2, 2, 2]).astype(np.int32)  # simulate my list of cluster labels per observation

(labs.replace({0: 0, 2: 1, 1: 2}) == labs.replace(map_dict)).mean()  # 1.0 - all of them get replaced correctly
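The int() cast above can be generalized a little. The sketch below assumes a hypothetical helper, to_builtin (not part of pandas or NumPy), that converts every NumPy scalar in a mapping to the corresponding Python scalar with .item() before the mapping is passed to replace.

```python
import numpy as np
import pandas as pd


def to_builtin(mapping):
    """Return a copy of `mapping` with NumPy scalars converted to Python scalars."""
    def conv(x):
        # np.generic is the base class of all NumPy scalar types.
        return x.item() if isinstance(x, np.generic) else x
    return {conv(k): conv(v) for k, v in mapping.items()}


maps = pd.Series([np.int64(0), np.int64(2), np.int64(1)])
map_dict = {old: new for old, new in zip(maps.values, maps.index)}  # np.int64 keys

clean = to_builtin(map_dict)  # plain int keys and values
labs = pd.Series([1, 1, 1, 0, 0, 2, 2, 2]).astype(np.int32)
result = labs.replace(clean)
print(result.tolist())
```

This keeps the workaround in one place and leaves non-NumPy keys untouched, which may be convenient when the mapping comes from several sources.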

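I know there is a technical difference between int64 and int32 (the number of bits), but I find the above strange: the two mapping dictionaries compare equal, yet for some reason we have to use exactly the same integer type when calling the replace method.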

Usually the integer type does not matter either (I have not run into a problem like this in Python before):

pd.Series([np.int32(1)]) == pd.Series([np.int64(1)]) #True
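For context, element-wise comparison of two integer Series with different widths works because the values are compared after promotion to a common dtype. The sketch below is only an illustration under that assumption; the names left, right and common are not from the question.

```python
import numpy as np
import pandas as pd

# Same values, different integer widths.
left = pd.Series([1, 0, 2], dtype=np.int32)
right = pd.Series([1, 0, 2], dtype=np.int64)

# NumPy's promotion rules give the dtype both sides can be represented in.
common = np.result_type(left.dtype, right.dtype)

# The comparison is element-wise and ignores the width difference.
all_equal = bool((left == right).all())

print(common, all_equal)
```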

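【Question comments】:

Hint: map_dict = maps.to_dict()
Oh.. yes. Thanks!
My bad. map_dict = maps.to_dict() does not produce the same output as your comprehension!!!
It does; only the order of "1" and "2" is reversed, but the (key, value) pairs are the same
Rolling.sum is (possibly) the same issue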

Tags: python pandas numpy integer

【Solution 1】:

The problem may be the type of the keys in map_dict and how the replace function handles them:

maps = pd.Series([np.int64(0),np.int64(2),np.int64(1)])

map_dict = {old:new for (old,new) in zip(maps.values,maps.index)}
for k in map_dict:
    print(type(k))

# <class 'numpy.int64'>
# <class 'numpy.int64'>
# <class 'numpy.int64'>

map_dict = maps.to_dict()
for k in map_dict:
    print(type(k))

# <class 'int'>
# <class 'int'>
# <class 'int'>

Using replace:

# with map_dict generated from the comprehension
>>> labs.replace(map_dict)
0    1
1    1
2    1
3    0
4    0
5    2
6    2
7    2
dtype: int32

# with map_dict generated from to_dict
>>> labs.replace(map_dict)
0    2
1    2
2    2
3    0
4    0
5    1
6    1
7    1
dtype: int64

However, when you use map instead of replace, it makes no difference how map_dict was built:

>>> labs.map(map_dict)
0    2
1    2
2    2
3    0
4    0
5    1
6    1
7    1
dtype: int64  # <- Note the cast from int32 to int64
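One caveat when swapping replace for map: with a dict, map returns NaN for any label that has no entry, whereas replace leaves such labels unchanged. The sketch below shows one possible way to handle that; the incomplete dict partial is a hypothetical example, not taken from the question.

```python
import numpy as np
import pandas as pd

labs = pd.Series([1, 1, 1, 0, 0, 2, 2, 2]).astype(np.int32)

# Hypothetical mapping with np.int64 keys; label 0 is deliberately left out.
partial = {np.int64(1): 2, np.int64(2): 1}

# Labels without an entry in the dict become NaN (and the dtype becomes float).
mapped = labs.map(partial)

# Fall back to the original label where nothing was mapped, then restore the dtype.
relabelled = mapped.fillna(labs).astype(labs.dtype)

print(relabelled.tolist())
```

If every label is guaranteed to be a key of the dict, as in the question, the fillna step is unnecessary.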

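【Comments】:

That is correct: it is the type difference, which is my whole problem and the issue I mentioned ;-) I just don't understand why it is a problem, given the example I provided. The fact that it works with map confuses me even more!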