IndexingError: Unalignable boolean Series provided as indexer (answers)

[Question title]: IndexingError: Unalignable boolean Series provided as indexer
[Posted]: 2020-12-15 00:19:03
[Question]:

Suppose I have a dataset whose head looks like this:

https://gist.github.com/ahmadmustafaanis/9ba3b5ea25b46b2b87ab858dc57ec15d

Now I want to check that whenever a link in df['Link'] contains "edx" or "coursera", the name contains it as well.
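A minimal sketch of that whole check, using a few made-up rows and assuming the title column is called `Name` (the gist's actual column names are not reproduced here). It extracts which platform each link points at, then flags rows whose name does not mention that platform. Lowercasing the name before comparing is also an assumption, since titles may say "Coursera" or "edX" while the URLs are lowercase.

```python
import pandas as pd

# Hypothetical sample data; the column names "Name" and "Link" are assumptions.
df = pd.DataFrame({
    "Name": ["Coursera: Machine Learning", "Intro to Python", "edX CS50", "Data Viz"],
    "Link": [
        "https://www.coursera.org/learn/ml",
        "https://www.coursera.org/learn/py",
        "https://www.edx.org/course/cs50",
        None,  # a row without a link
    ],
})

# Which platform (if any) each link mentions; NaN where neither matches or Link is missing
platform = df["Link"].str.extract(r"(coursera|edx)", expand=False)

# Case-insensitive check: does the name mention the same platform as its link?
name_lower = df["Name"].str.lower()
name_matches = pd.Series(
    [isinstance(p, str) and p in n for p, n in zip(platform, name_lower)],
    index=df.index,
)

# Rows whose link points at coursera/edx but whose name does not mention it
mismatched = df[platform.notna() & ~name_matches]
print(mismatched)
```

Every Series here is derived from full columns of `df`, so they all share `df.index` and combine without any reindexing.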

First I need to find all the links that contain "edx" or "coursera". My logic is:

df['Link'][df['Link'].isnull()==False].apply(lambda a: True if 'coursera' in a else True if 'edx' in a else False)

This returns a boolean Series that is True for links containing Coursera or edX and False for the others.
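If at least one row has a missing `Link` (an assumption about the data, but a common one), the Series this expression produces is shorter than the DataFrame, because the null filter runs before `apply`. A small sketch with made-up rows makes the index mismatch visible:

```python
import pandas as pd

# Hypothetical data: row 1 has no link
df = pd.DataFrame({"Link": [
    "https://www.coursera.org/learn/ml",
    None,
    "https://www.edx.org/course/cs50",
    "https://example.com/x",
]})

# The question's expression: drop null links first, then apply the lambda
flags = df["Link"][df["Link"].isnull() == False].apply(
    lambda a: True if "coursera" in a else True if "edx" in a else False
)

print(flags.index.tolist())  # [0, 2, 3] (row 1 was removed by the null filter)
print(df.index.tolist())     # [0, 1, 2, 3]
```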

Now, if I try to use boolean indexing on the whole DataFrame by wrapping this code in df[mycode] or df.loc[mycode], I get a warning and an error.

df[df['Link'][df['Link'].isnull()==False].apply(lambda a: True if 'coursera' in a else True if 'edx' in a else False)]

The warning is:

<ipython-input-47-d903df486dc7>:1: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  df[df['Link'][df['Link'].isnull()==False].apply(lambda a: True if 'coursera' in a else True if 'edx' in a else False)]

The error message is:

IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
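A self-contained reproduction with made-up data, assuming a row with a missing link: pandas has to reindex the shorter mask to `df.index`, finds no value for the dropped row, and raises this error (`df.loc[flags]` goes through the same alignment check). One possible fix, shown here, is to reindex the mask yourself with `fill_value=False`, so rows without a link are simply excluded. The `lambda` is shortened with `or` for brevity; it computes the same flags as the original.

```python
import warnings

import pandas as pd

# Hypothetical data: row 1 has no link
df = pd.DataFrame({"Link": [
    "https://www.coursera.org/learn/ml",
    None,
    "https://www.edx.org/course/cs50",
]})
flags = df["Link"][df["Link"].notna()].apply(lambda a: "coursera" in a or "edx" in a)

err = None
try:
    with warnings.catch_warnings():
        # Silence the "Boolean Series key will be reindexed" UserWarning
        warnings.simplefilter("ignore", UserWarning)
        df[flags]
except Exception as exc:  # IndexingError: Unalignable boolean Series ...
    err = exc

# Fix: align the mask to the full index, treating missing links as False
aligned = flags.reindex(df.index, fill_value=False)
result = df[aligned]
print(result)
```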


Tags: python pandas dataframe

[Answer 1]:

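None of your lines fail for me. It does seem like a very convoluted way to filter a DataFrame, though. Just define a mask that is True for the rows you want, then use loc[mask]: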

import io

import pandas as pd
import requests

# Download the sample CSV from the question's gist
res = requests.get("https://gist.githubusercontent.com/ahmadmustafaanis/9ba3b5ea25b46b2b87ab858dc57ec15d/raw/53c5f357f2e9db0d37e420a9b18a60ac7a8bdfa6/test.csv")
df = pd.read_csv(io.StringIO(res.content.decode()))

# The question's two lines; neither raised an error here with this data
df['Link'][df['Link'].isnull()==False].apply(lambda a: True if 'coursera' in a else True if 'edx' in a else False)
df[df['Link'][df['Link'].isnull()==False].apply(lambda a: True if 'coursera' in a else True if 'edx' in a else False)]

# Simpler: a mask over the full column (na=False treats missing links as no match)
mask = df["Link"].str.contains("coursera", na=False) | df["Link"].str.contains("edx", na=False)
df.loc[mask]
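The same mask idea in a self-contained form, using made-up rows instead of downloading the gist. Passing the single regex `"coursera|edx"` to `str.contains` is an alternative to or-ing two calls, and `na=False` keeps missing links from becoming NaN in the mask. Because the mask is computed on the whole `Link` column, its index already matches `df.index`, so no reindexing (and no warning) occurs.

```python
import pandas as pd

# Hypothetical data: row 1 has no link, row 3 is neither coursera nor edx
df = pd.DataFrame({"Link": [
    "https://www.coursera.org/learn/ml",
    None,
    "https://www.edx.org/course/cs50",
    "https://example.com/x",
]})

# One regex alternation instead of two str.contains calls;
# na=False makes missing links count as "no match"
mask = df["Link"].str.contains("coursera|edx", na=False)
filtered = df.loc[mask]
print(filtered)
```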
