Numpy.where raises KeyError with pandas columns

【Question title】: Numpy.where raises KeyError with pandas columns
【Posted】: 2021-10-13 10:30:24
【Question description】:

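I am working with incomplete data that is split across files with different data structures. So I wrote a script that uses np.where to check whether a key is among the column names and, if it is, to write that column into the df. I am using pandas with np.where, and it raises a KeyError. Example: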

df['col_result'] = np.where('col1' in df.columns, df['col1'], 'None')

KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2894             try:
-> 2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'col1'

【Question comments】:

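Welcome to Stack Overflow. Please take the time to read this post on how to provide a great pandas example, as well as how to provide a minimal, complete, and verifiable example, and revise your question accordingly. These tips on how to ask a good question may also be useful.
If 'col1' is not in df.columns, df['col1'] is still evaluated and therefore raises a KeyError. (More precisely, Python evaluates df['col1'] while building the argument list, before np.where ever gets to look at its first argument.)
Does my answer need any clarification? Let me know whether it works for you.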

Tags: python pandas dataframe numpy

【Solution 1】:

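Since you are testing whether col1 is among the columns of df, it seems that col1 is not present in this case. When that happens, writing df['col1'] anywhere in your code raises a KeyError for col1, and that includes writing it as an argument to np.where.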
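A minimal sketch of why the condition does not protect the lookup, assuming a hypothetical DataFrame that only has a column named col2. Python evaluates every argument before calling np.where, so the KeyError is raised even though the condition is False:

```python
import numpy as np
import pandas as pd

# Hypothetical frame that lacks 'col1' (illustration only)
df = pd.DataFrame({"col2": [1, 2, 3]})

# This is a single Python bool, not a per-row condition
condition = "col1" in df.columns

try:
    # All three arguments are evaluated before np.where runs,
    # so df["col1"] raises KeyError regardless of the condition.
    np.where(condition, df["col1"], "None")
    raised = False
except KeyError:
    raised = True

print(condition, raised)  # False True
```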

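When you want to assign a whole column to a new column, np.where() offers little benefit (it is mainly useful when the condition may or may not hold for each individual row). So you could change your code to a simple if-else statement, as follows: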

if 'col1' in df.columns:
    df['col_result'] = df['col1']
else:
    df['col_result'] = 'None'
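For contrast, here is a small sketch of the row-wise case that np.where is suited to, together with one possible shorthand for the whole-column fallback. The data and the names positive_only, other_result and missing_col are made up for illustration. DataFrame.get returns the given default instead of raising when the column is missing, so it behaves like the if-else above:

```python
import numpy as np
import pandas as pd

# Hypothetical example data
df = pd.DataFrame({"col1": [5, -2, 7]})

# Row-wise use of np.where: the condition is a boolean array,
# so each row picks its own value.
df["positive_only"] = np.where(df["col1"] > 0, df["col1"], 0)

# Whole-column fallback: DataFrame.get returns the default
# instead of raising KeyError when the column is missing.
df["col_result"] = df.get("col1", "None")
df["other_result"] = df.get("missing_col", "None")

print(df)
```

Whether the explicit if-else or df.get reads better is a matter of taste; the if-else makes the two branches more visible.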

【Comments】: