Using np.select to generate a conditional column based on data from multiple other columns

【Question title】: Using np.select to generate conditional column based off data from multiple other columns
【Posted】: 2019-12-17 15:23:59
【Question】:

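I am trying to add a new column to an existing DataFrame. The column is built from conditional statements whose inputs come from several other columns of the same DataFrame.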

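I am using np.select() because I read that it is the best approach when several columns feed into the conditions. However, when I run the code, the default value is filled in even for rows where a condition is met. Here is some sample code: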

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0,2, size=(20,3)), columns = list('ABC'))

choices = ['C Highest','B Highest','A Highest']
conditions = [
        (df['C'] is True), 
        (df['C'] is False & df['B'] is True),
        (df['A'] is True & df['C']is False & df['B'] is False)]

#conditions = [
#        (df['C'] == 1), 
#        (df['C'] == 0 & df['B'] == 1),
#        (df['A'] == 1 & df['C'] == 0 & df['B'] == 0)]

df['Highest Column'] = np.select(conditions, choices, default=np.nan)

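When I run the code above I get no error, but the Highest Column in the DataFrame is all NaN. It is as if the code runs but none of the conditions are met (even though they are true), so only the default value is filled in.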
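A likely explanation for the all-NaN result: `is` tests Python object identity, so `df['C'] is True` is evaluated once and yields the plain bool `False` rather than a per-row mask. Every condition handed to np.select is then `False`, and only the default is used. A minimal sketch, using a small hand-made frame (the data and the names `identity_check` and `elementwise_mask` are illustrative, not from the original post):

```python
import pandas as pd

# Small hand-made frame standing in for the random data in the question.
df = pd.DataFrame({"A": [1, 0, 1], "B": [0, 0, 1], "C": [0, 1, 0]})

# `is` compares object identity: a Series is never the singleton True,
# so this is a single Python bool, not a per-row mask.
identity_check = df["C"] is True

# `==` is overloaded by pandas and compares element by element.
elementwise_mask = df["C"] == 1

print(type(identity_check).__name__, identity_check)  # bool False
print(elementwise_mask.tolist())                      # [False, True, False]
```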

When I switch to the commented-out conditions (and comment out the previous conditions variable), I get "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."

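Obviously this data is random and abstracted from my use case, but the underlying code should be nearly identical. If column C contains a 1, the row should be labeled as column C in the Highest Column series. If column C is 0 but B has a 1, the highest should be column B, and so on.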

I know I could do this quickly in Excel, but I would rather learn how to do it in Python/pandas, so any advice is greatly appreciated!

【Comments】:

You forgot the parentheses in the commented-out conditions: (df['C'] == 0) & (df['B'] == 1),
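To spell out why the parentheses matter: `&` binds more tightly than `==`, so `df['C'] == 0 & df['B'] == 1` is parsed as the chained comparison `df['C'] == (0 & df['B']) == 1`. A chained comparison implies an `and`, which asks for the truth value of a whole Series and raises the ValueError. A minimal sketch on illustrative data (the frame and variable names are made up for the example):

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 0, 1], "B": [0, 0, 1], "C": [0, 1, 0]})

try:
    # Parsed as: df["C"] == (0 & df["B"]) == 1, i.e. a chained comparison
    # whose implicit `and` needs bool(Series).
    bad = df["C"] == 0 & df["B"] == 1
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Parenthesising each comparison gives two boolean Series combined row by row.
good = (df["C"] == 0) & (df["B"] == 1)

print(error_message is not None)  # True
print(good.tolist())              # [False, False, True]
```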

Tags: python pandas numpy

【Solution 1】:

Try:

choices = ['C Highest','B Highest','A Highest']
conditions = [
       (df['C'] == 1), 
       ((df['C'] == 0) & (df['B'] == 1)),
       ((df['A'] == 1) & (df['C'] == 0) & (df['B'] == 0))]

df['Highest Column'] = np.select(conditions, choices, default=np.nan)

# df.head()

    A   B   C   Highest Column
0   1   0   0   A Highest
1   0   0   1   C Highest
2   1   1   0   B Highest
3   1   0   1   C Highest
4   1   1   0   B Highest
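For a reproducible check, the same approach can be sketched on a fixed frame instead of random data. One assumption to flag: recent NumPy releases (2.x, to my knowledge) refuse to combine string choices with a float default such as np.nan and raise a TypeError, so this sketch passes a string placeholder as the default. The label 'None Highest' and the sample rows are made up for illustration.

```python
import numpy as np
import pandas as pd

# Fixed data instead of np.random so the result is reproducible.
df = pd.DataFrame({
    "A": [1, 0, 1, 1, 0],
    "B": [0, 0, 1, 0, 0],
    "C": [0, 1, 0, 1, 0],
})

choices = ["C Highest", "B Highest", "A Highest"]
conditions = [
    df["C"] == 1,
    (df["C"] == 0) & (df["B"] == 1),
    (df["A"] == 1) & (df["C"] == 0) & (df["B"] == 0),
]

# "None Highest" is a hypothetical placeholder for rows where A, B and C are all 0.
df["Highest Column"] = np.select(conditions, choices, default="None Highest")

print(df["Highest Column"].tolist())
# ['A Highest', 'C Highest', 'B Highest', 'C Highest', 'None Highest']
```

Because np.select takes the choice of the first condition that matches, the explicit `== 0` checks are not strictly needed here: the conditions `df["C"] == 1`, `df["B"] == 1`, `df["A"] == 1` in that order should label the rows the same way.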
