Conditions work separately but not together when filtering DataFrame rows [ValueError: The truth value of a Series is ambiguous] - Answers

【Question title】: Conditions work separately but not together in filtering dataframe rows [ValueError: The truth value of a Series is ambiguous]
【Posted】: 2023-04-02 02:55:02
【Question description】:

Once again I am trying to pick out certain rows by a time condition so that I can compute the mean/std for different time periods.

import pandas as pd

file = pd.read_csv('test/Res/1002', sep='\t', encoding='utf-8')

print(file['hm']==47)
print(file['hm']==48)
print(1<=file['hm']<=14)

I get correctly evaluated lists of True/False booleans. But with the line below I instead receive this -> ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

overn_std = file[(file['hm'] == 47)| (file['hm'] == 48) | (1 <= file['hm'] <= 14) ]

In the past I solved the same problem by replacing `or` with `|` in the condition. The dtype of the hm column is int.

【Question discussion】:

Tags: python pandas dataframe filter conditional-statements

【Solution 1】:

I think you need to split (1 <= file['hm'] <= 14) into 2 separate conditions chained by & for AND:

overn_std = file[(file['hm'] == 47) | 
                 (file['hm'] == 48) | 
                 ((file['hm'] >= 1) & (file['hm'] <= 14)) ]
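
For readers wondering why the chained form fails, here is a minimal sketch. It assumes a small made-up frame in place of the asker's CSV (the name `df` and its values are illustrative only). Python evaluates `1 <= s <= 14` as `(1 <= s) and (s <= 14)`, and `and` asks the first Series for a single truth value, which pandas refuses to give. The same thing happens with `or`, which is why `|` was needed before.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data; the values are made up.
df = pd.DataFrame({"hm": [0, 1, 14, 15, 47, 48]})

# Chained comparison: Python expands it with `and`,
# which calls bool() on the intermediate Series and raises.
try:
    mask = 1 <= df["hm"] <= 14
    chained_failed = False
except ValueError as exc:
    chained_failed = True
    print("ValueError:", exc)

# Element-wise version: two comparisons joined by &.
range_mask = (df["hm"] >= 1) & (df["hm"] <= 14)
print(range_mask.tolist())  # [False, True, True, False, False, False]
```

The parentheses around each comparison matter, because `&` and `|` bind more tightly than `>=` and `<=`.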

You can also create each mask separately:

m1 = (file['hm'] == 47)
m2 = (file['hm'] == 48)
m3 = (file['hm'] >= 1)
m4 = (file['hm'] <= 14)

overn_std = file[m1 | m2 | (m3 & m4 )]

Even better, use isin and between:

overn_std = file[(file['hm'].isin([47,48])) | (file['hm'].between(1,14)) ]

Example:

import pandas as pd

file = pd.DataFrame({
    'hm': range(50)
})

overn_std = file[(file['hm'].isin([47,48])) | (file['hm'].between(1,14)) ]
print(overn_std)
    hm
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
10  10
11  11
12  12
13  13
14  14
47  47
48  48
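
As a quick sanity check, the explicit form and the isin/between form can be compared on the same sample frame. This is only a sketch on the 0..49 example data above, not on the asker's file; the variable names `explicit`, `compact`, `same` and `selected` are made up for illustration.

```python
import pandas as pd

# Same sample frame as in the example: hm runs from 0 to 49.
file = pd.DataFrame({"hm": range(50)})

# Explicit comparisons joined with | and &.
explicit = file[(file["hm"] == 47) | (file["hm"] == 48)
                | ((file["hm"] >= 1) & (file["hm"] <= 14))]

# Compact form; between() includes both endpoints by default.
compact = file[file["hm"].isin([47, 48]) | file["hm"].between(1, 14)]

same = explicit.equals(compact)
selected = compact["hm"].tolist()
print(same)  # True
```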

【Discussion】:

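Thanks. One thing to note: if I save the conditional statements in separate variables (e.g. cond1, cond2) and then combine cond1 and cond2 inside the file[] index, it works (though I have not fully understood why yet).
@HSL - Yes, this approach is very readable when there are many complex conditions. But it is the same thing, just using variables such as m1, m2 instead of (file['hm'] == 47), (file['hm'] == 48).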
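
When the number of named masks grows, one possible way to combine them is to keep them in a list and fold it with `|`. This is a sketch under assumptions, not something proposed in the thread: the `masks` list, `any_mask` and `overn` are hypothetical names, and the data is the 0..49 sample frame rather than the asker's file.

```python
from functools import reduce
import operator

import pandas as pd

# Sample frame from the answer's example.
file = pd.DataFrame({"hm": range(50)})

# Each entry is a boolean Series aligned with file's index.
masks = [
    file["hm"] == 47,
    file["hm"] == 48,
    file["hm"].between(1, 14),
]

# Fold the list with element-wise OR: masks[0] | masks[1] | masks[2].
any_mask = reduce(operator.or_, masks)
overn = file[any_mask]
print(overn["hm"].tolist())
```

Swapping `operator.or_` for `operator.and_` would require every condition to hold instead of any one of them.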