Pandas: IndexingError: Unalignable boolean Series provided as indexer

【Question title】: Pandas: IndexingError: Unalignable boolean Series provided as indexer
【Posted】: 2018-01-03 07:24:04
【Question】:

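I'm trying to run what I thought was simple code to drop every column that contains only NaN, but I can't get it to work (axis = 1 works just fine for dropping rows):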

import pandas as pd
import numpy as np

df = pd.DataFrame({'a':[1,2,np.nan,np.nan], 'b':[4,np.nan,6,np.nan], 'c':[np.nan, 8,9,np.nan], 'd':[np.nan,np.nan,np.nan,np.nan]})

# This line raises the IndexingError shown below
df = df[df.notnull().any(axis = 0)]

print(df)

Full error:

raise IndexingError('Unalignable boolean Series provided as '
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match

Expected output:

     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN

【Question comments】:

Tags: python pandas

【Solution 1】:

You need loc, because you are filtering by columns:

print (df.notnull().any(axis = 0))
a     True
b     True
c     True
d    False
dtype: bool

df = df.loc[:, df.notnull().any(axis = 0)]
print (df)

     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN

Or filter the column labels first and then select with []:

print (df.columns[df.notnull().any(axis = 0)])
Index(['a', 'b', 'c'], dtype='object')

df = df[df.columns[df.notnull().any(axis = 0)]]
print (df)

     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN

Or use dropna with the parameter how='all' to remove only the columns that are filled entirely with NaNs:

print (df.dropna(axis=1, how='all'))
     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN
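
As a quick check, the three approaches above can be compared side by side. This is only a sketch on the question's DataFrame; the names mask, via_loc, via_labels and via_dropna are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan, np.nan],
                   'b': [4, np.nan, 6, np.nan],
                   'c': [np.nan, 8, 9, np.nan],
                   'd': [np.nan, np.nan, np.nan, np.nan]})

# True for every column that holds at least one non-NaN value
mask = df.notnull().any(axis=0)

via_loc = df.loc[:, mask]                   # boolean mask in the column slot of loc
via_labels = df[df.columns[mask]]           # turn the mask into labels, then use []
via_dropna = df.dropna(axis=1, how='all')   # let dropna find the all-NaN columns

print(via_loc.equals(via_labels) and via_loc.equals(via_dropna))
```

dropna is the shortest when the condition is "all NaN"; the mask-based forms are the ones to reach for when the column condition is something else.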

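【Comments】:

Aha, because df[] is looking for a row-based index, not a column-based one. Got it, thanks.
@pshep123 - Glad I could help!
This is counter-intuitive, because the simplest form of indexing a DataFrame is associative, i.e. selecting a column by one of its column headers: df['headername']
I was looking for how to use two conditions together (via '&') on columns with df.loc, and didn't know the ':,' in df.loc[:,] mattered. Thanks!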
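
For the last comment, here is a minimal sketch of two column conditions combined with '&' inside df.loc[:, ...]. It assumes the question's DataFrame; the second condition (a value present in the first row) and the names has_data and first_row_set are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan, np.nan],
                   'b': [4, np.nan, 6, np.nan],
                   'c': [np.nan, 8, 9, np.nan],
                   'd': [np.nan, np.nan, np.nan, np.nan]})

# Both masks are indexed by column label, so they go in the column slot of loc
has_data = df.notnull().any(axis=0)    # column has at least one non-NaN value
first_row_set = df.iloc[0].notnull()   # column has a value in the first row

# ':' selects every row; the combined mask selects the columns
selected = df.loc[:, has_data & first_row_set]
print(selected.columns.tolist())
```

Without the leading ':,' the mask would land in the row slot and be matched against the row labels instead.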

【Solution 2】:

You can use dropna with axis=1 and thresh=1:

In[19]:
df.dropna(axis=1, thresh=1)

Out[19]: 
     a    b    c
0  1.0  4.0  NaN
1  2.0  NaN  8.0
2  NaN  6.0  9.0
3  NaN  NaN  NaN

This drops any column that does not have at least 1 non-NaN value, which means every column that is all NaN gets dropped.
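
To make the thresh behaviour concrete, here is a small sketch on the question's DataFrame that counts the non-NaN values per column and tries two thresholds. The names counts, kept_1 and kept_3 are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan, np.nan],
                   'b': [4, np.nan, 6, np.nan],
                   'c': [np.nan, 8, 9, np.nan],
                   'd': [np.nan, np.nan, np.nan, np.nan]})

# Number of non-NaN values in each column; thresh is compared against this
counts = df.notnull().sum(axis=0)
print(counts)

kept_1 = df.dropna(axis=1, thresh=1)   # keep columns with at least 1 non-NaN value
kept_3 = df.dropna(axis=1, thresh=3)   # keep columns with at least 3 non-NaN values
print(list(kept_1.columns), list(kept_3.columns))
```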

The reason your attempt failed is that the boolean mask:

In[20]:
df.notnull().any(axis = 0)

Out[20]: 
a     True
b     True
c     True
d    False
dtype: bool

cannot be aligned on the index, which is what df[...] uses by default, because it is a boolean mask over the columns.
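
The mismatch can be seen by printing the two sets of labels involved. This is a sketch on the question's DataFrame; it catches a generic Exception and reports the class name, because the import location of IndexingError has differed between pandas versions.

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan, np.nan],
                   'b': [4, np.nan, 6, np.nan],
                   'c': [np.nan, 8, 9, np.nan],
                   'd': [np.nan, np.nan, np.nan, np.nan]})

mask = df.notnull().any(axis=0)

print(mask.index.tolist())   # labels of the mask: the column names
print(df.index.tolist())     # labels df[mask] tries to match: the row labels

error_name = None
with warnings.catch_warnings():
    # Some pandas versions first warn that the key will be reindexed
    warnings.simplefilter("ignore")
    try:
        df[mask]
    except Exception as exc:
        error_name = type(exc).__name__
print(error_name)

# Putting the same mask in the column slot of loc matches it against df.columns
print(df.loc[:, mask].columns.tolist())
```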

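【Comments】:

Thanks Ed - I didn't know about the thresh parameter. I just learned that you can use both axes at once to trim all empty rows and columns: df = df.dropna(axis = [0,1], how = 'all')
Yes, it's a very flexible and useful method.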
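
A note on the axis = [0,1] form in the comment above: as far as I know, newer pandas releases no longer accept a list for axis in dropna, so treat that call as version-dependent. A sketch that should give the same trimming with one call per axis (the name trimmed is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan, np.nan],
                   'b': [4, np.nan, 6, np.nan],
                   'c': [np.nan, 8, 9, np.nan],
                   'd': [np.nan, np.nan, np.nan, np.nan]})

# Drop all-NaN rows first, then all-NaN columns
trimmed = df.dropna(axis=0, how='all').dropna(axis=1, how='all')
print(trimmed)
```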

【Solution 3】:

I ended up here because I was trying to filter on the first 2 letters like this:

filtered = df[(df.Name[0:2] != 'xx')]

The fix was:

filtered = df[(df.Name.str[0:2] != 'xx')]
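
The difference between the two lines is what gets sliced. A small sketch with made-up data (the people frame and its values are hypothetical; only the Name column mirrors the snippet above):

```python
import pandas as pd

# Hypothetical data for illustration
people = pd.DataFrame({'Name': ['xxalice', 'bob', 'xxcarol', 'dave']})

first_two_rows = people.Name[0:2]        # slices the Series itself: a shorter Series
first_two_chars = people.Name.str[0:2]   # slices each string: one entry per row

print(len(first_two_rows), len(first_two_chars))

# The .str version yields a mask with the same index as the frame, so [] can align it
filtered = people[people.Name.str[0:2] != 'xx']
print(filtered)
```

The first form produces a mask shorter than the frame, which is why it cannot be aligned; people.Name.str.startswith('xx') is another way to build the same kind of full-length mask.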

【Comments】: