【Question title】: Inconsistent behavior of apply with operator.itemgetter vs. applymap with operator.itemgetter
【Posted】: 2014-04-12 01:41:31
【Question】:

This is probably not the best way to arrange data in practice, but it makes a good example:

In [16]:
import operator
import pandas as pd
In [17]:
DF = pd.DataFrame({'Val1': [[2013, 37722.322], [1998, 32323.232]],
                   'Val2': [[2013, 37722.322], [1998, 32323.232]]})
In [18]:
print(DF)
                Val1               Val2
0  [2013, 37722.322]  [2013, 37722.322]
1  [1998, 32323.232]  [1998, 32323.232]

[2 rows x 2 columns]

apply gives the wrong result:

In [19]:
print(DF.apply(operator.itemgetter(-1), axis=1))
   Val1       Val2
0  2013  37722.322
1  1998  32323.232

[2 rows x 2 columns]

But applymap gives the correct result!

In [20]:
print(DF.applymap(operator.itemgetter(-1)))
        Val1       Val2
0  37722.322  37722.322
1  32323.232  32323.232

[2 rows x 2 columns]

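Why does this happen?

【Comments】:

apply is passed an entire row, which is a Series of two-element lists; the last list is returned and coerced into a Series. Embedding lists as elements is generally not a good idea.
Agreed, this is not a good way to store data in real life. It is very interesting that the first element gets assigned to Val1. Now I understand, thanks!

Tags: python pandas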

【Solution 1】:

It is easier to see what is happening with this DataFrame:

df = pd.DataFrame({'Val1': [[1, 2], [3, 4]],
                   'Val2': [[5, 6], [7, 8]]})

     Val1    Val2
0  [1, 2]  [5, 6]
1  [3, 4]  [7, 8]

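df.apply(operator.itemgetter(-1), axis=1) calls operator.itemgetter(-1) on each row.

For example, on the first row, operator.itemgetter(-1) returns the last item of the row, which is [5, 6]. Because this value is iterable and has as many elements as there are columns, its elements are assigned to the two columns Val1 and Val2. So the result is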

In [149]: df.apply(operator.itemgetter(-1), axis=1)
Out[149]: 
   Val1  Val2
0     5     6
1     7     8

In contrast, applymap operates on each cell of the DataFrame individually, so operator.itemgetter(-1) returns the last item of each cell.

In [150]: df.applymap(operator.itemgetter(-1))
Out[150]: 
   Val1  Val2
0     2     6
1     4     8
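
The difference can be modelled without pandas. The sketch below is a plain-Python illustration, not pandas code: each row is represented as an ordinary list of cells, and `rows`, `row_wise`, and `cell_wise` are names made up for this example. It only shows what `operator.itemgetter(-1)` receives in each case.

```python
import operator

# Same data as the small DataFrame above, one inner list per row:
# row 0 holds the Val1 and Val2 cells [1, 2] and [5, 6], and so on.
rows = [
    [[1, 2], [5, 6]],
    [[3, 4], [7, 8]],
]

last = operator.itemgetter(-1)

# Row-wise (like apply with axis=1): the getter sees the whole row,
# so "last item" means the last cell of that row.
row_wise = [last(row) for row in rows]

# Cell-wise (like applymap): the getter sees one cell at a time,
# so "last item" means the last element inside each cell.
cell_wise = [[last(cell) for cell in row] for row in rows]

print(row_wise)   # [[5, 6], [7, 8]]
print(cell_wise)  # [[2, 6], [4, 8]]
```

The row-wise results are themselves two-element lists, which is why the apply output above ends up with 5 and 6 under Val1 and Val2.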

【Solution 2】:

Just to add to what @unutbu and @jeff said, if you start with 3 columns instead:

In [26]:

print(DF)
                Val1               Val2               Val3
0  [2013, 37722.322]  [2014, 37722.322]  [2015, 37722.322]
1  [1997, 32323.232]  [1998, 32323.232]  [1999, 32323.232]

[2 rows x 3 columns]
In [27]:

print(DF.apply(operator.itemgetter(-1), axis=1))
0    [2015, 37722.322]
1    [1999, 32323.232]
dtype: object

The resulting list (length 2) cannot be coerced into a Series of length 3, so the result is now a Series of lists.
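
The rule described in this thread can be written down as a small plain-Python model. This is only a sketch of the behaviour observed above, not pandas' actual implementation; `spread_row_result` is a hypothetical helper, and newer pandas releases may treat list-like return values differently.

```python
def spread_row_result(result, columns):
    """Model how one row's list result is handled (hypothetical helper)."""
    if len(result) == len(columns):
        # Lengths match: each element lands under one column label.
        return dict(zip(columns, result))
    # Lengths differ: the list is kept whole as a single object.
    return result


# Two columns, list of length 2: spread across Val1 and Val2.
two_cols = spread_row_result([2013, 37722.322], ["Val1", "Val2"])

# Three columns, list of length 2: left as one list.
three_cols = spread_row_result([2015, 37722.322], ["Val1", "Val2", "Val3"])

print(two_cols)    # {'Val1': 2013, 'Val2': 37722.322}
print(three_cols)  # [2015, 37722.322]
```

In this model the two-column case reproduces the surprising output from the question, while the three-column case keeps each list intact, as in the Series of lists shown above.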
