python: How to delete all items in a group if one matches a condition (string comparison)?

【Question title】: python: How to delete all items in a group if one matches a condition (string comparison)?
【Posted】: 2021-06-29 00:00:08
【Question】:

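Disclaimer: I tried doing this in SQL, but none of the answers I found or my own attempts worked, so I have been trying Python instead because it seems better suited.

I want to create a function that deletes all items in a group if any one of them matches a specific condition.

Specifically, I have a dataset "family", and I want to delete all members of a family if that family contains twins.

Part of the dataset looks like this: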

Subject ID	Mother_ID	Zygosity_SR
1001	2001	MZ
1002	2001	MZ
1003	2001	NotTwin
1004	2002	NotTwin
1005	2002	NotTwin

In this case, I want to delete the rows of every individual who shares a Mother_ID with a subject whose Zygosity_SR = MZ.

My resulting table would look like this:

Subject ID	Mother_ID	Zygosity_SR
1004	2002	NotTwin
1005	2002	NotTwin

Here is my Python code:

import pandas as pd

family = pd.read_excel('HCP database 97 excel vers.xlsx')
family_drop = family.groupby('Mother_ID').filter(lambda x: x['ZygositySR'].str.strip() == 'MZ' )
family_drop.reset_index(drop=True, inplace=True)
family_drop = family_drop[['Subject','Mother_ID']] 
print(family_drop)

I get this error:

TypeError: filter function returned a Series, but expected a scalar bool

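Any tips on how to fix this would be greatly appreciated. Thank you very much!

【Question comments】:

How do we identify twins? If values in Zygosity_SR are duplicated, do we treat that as a twin? Nothing in the raw data indicates that MZ rows are twins. To generalize the solution, please share additional information. Otherwise, it will be a solution tailored to the value MZ.

Tags: python pandas conditional-statements pandas-groupby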

【Solution 1】:

If you need to delete the rows of the MZ subjects and of everyone who shares their mother:

tdf = df[df['Zygosity_SR'] == 'MZ']   # rows of MZ
tset = set(tdf['Mother_ID'])          # set of MZ's Mother_ID
fdf = df[~df['Mother_ID'].isin(tset)] # rows with NotTwin Mother_ID

The condition ~df['Mother_ID'].isin(tset) filters out every row whose Mother_ID is in the set.

print(fdf)
   Subject ID  Mother_ID Zygosity_SR
3        1004       2002     NotTwin
4        1005       2002     NotTwin
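
The snippet above assumes a DataFrame named df already exists. As a minimal self-contained sketch, the same isin approach can be wrapped in a small function. The sample rows are copied from the question; the helper name drop_groups_containing and its parameters are hypothetical, and the str.strip() call is an assumption carried over from the question's code in case the Excel values have stray whitespace.

```python
import pandas as pd

# Sample rows copied from the question (the real data comes from an Excel file).
df = pd.DataFrame({
    "Subject ID": [1001, 1002, 1003, 1004, 1005],
    "Mother_ID": [2001, 2001, 2001, 2002, 2002],
    "Zygosity_SR": ["MZ", "MZ", "NotTwin", "NotTwin", "NotTwin"],
})


def drop_groups_containing(frame, value, group_col="Mother_ID", flag_col="Zygosity_SR"):
    """Hypothetical helper: drop every row whose group has at least one `value`."""
    # Group keys (mothers) that have at least one matching row.
    flagged = frame.loc[frame[flag_col].str.strip() == value, group_col].unique()
    # Keep only rows whose group key is not flagged.
    return frame[~frame[group_col].isin(flagged)]


fdf = drop_groups_containing(df, "MZ")
print(fdf)
```

Because this only builds a boolean mask, it avoids calling a Python function once per group, which may matter on larger tables.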

【Comments】:

【Solution 2】:

This SQL statement appears to do what you want (delete all subjects whose mother had twins).

-- "family" stands in for your actual table name
delete
from family
where mother_id in (
    select distinct mother_id
    from family
    where Zygosity_SR = 'MZ'
)
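
To try the statement without a database server, here is a rough sketch that runs an equivalent DELETE against an in-memory SQLite database through Python's sqlite3 module. The table name family and the lowercase column names are assumptions made for this illustration, and the string literal 'MZ' is quoted. Other database engines may restrict deleting from a table that the subquery also reads, so treat this as SQLite-specific.

```python
import sqlite3

# In-memory database; table and column names are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE family (subject_id INTEGER, mother_id INTEGER, zygosity_sr TEXT)"
)
conn.executemany(
    "INSERT INTO family VALUES (?, ?, ?)",
    [
        (1001, 2001, "MZ"),
        (1002, 2001, "MZ"),
        (1003, 2001, "NotTwin"),
        (1004, 2002, "NotTwin"),
        (1005, 2002, "NotTwin"),
    ],
)

# Delete every subject whose mother has at least one MZ child.
conn.execute(
    """
    DELETE FROM family
    WHERE mother_id IN (
        SELECT DISTINCT mother_id
        FROM family
        WHERE zygosity_sr = 'MZ'
    )
    """
)

remaining = conn.execute(
    "SELECT subject_id, mother_id, zygosity_sr FROM family ORDER BY subject_id"
).fetchall()
print(remaining)
conn.close()
```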

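【Comments】:

【Solution 3】:

DataFrameGroupBy.filter() expects the function to return a single boolean per group, which determines whether that group is kept. The lambda in the question returns a boolean Series (one value per row), which is what triggers the TypeError.

In this case, it looks like you are trying to keep the groups where "no Zygosity_SR value is MZ":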

family.groupby('Mother_ID').filter(
    lambda group: all(group.Zygosity_SR.str.strip() != 'MZ'))

#    Subject ID  Mother_ID Zygosity_SR
# 3        1004       2002     NotTwin
# 4        1005       2002     NotTwin
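
As a self-contained sketch of the same idea, the example below reduces the per-row comparison to one boolean per group with .any(), and also shows a transform-based variant that broadcasts the group-level flag back onto each row. The sample rows are copied from the question; the variable names kept_filter and kept_transform are only illustrative.

```python
import pandas as pd

# Sample rows copied from the question.
family = pd.DataFrame({
    "Subject ID": [1001, 1002, 1003, 1004, 1005],
    "Mother_ID": [2001, 2001, 2001, 2002, 2002],
    "Zygosity_SR": ["MZ", "MZ", "NotTwin", "NotTwin", "NotTwin"],
})

# Option A: filter() with a lambda that returns ONE boolean per group.
# .any() collapses the row-wise comparison into a scalar.
kept_filter = family.groupby("Mother_ID").filter(
    lambda group: not group["Zygosity_SR"].str.strip().eq("MZ").any()
)

# Option B: compute the row-wise flag once, then use transform("any") to mark
# every row of a family that has at least one MZ member.
is_mz = family["Zygosity_SR"].str.strip().eq("MZ")
family_has_mz = is_mz.groupby(family["Mother_ID"]).transform("any")
kept_transform = family[~family_has_mz]

print(kept_filter)
print(kept_transform)
```

Both options should keep the same rows; the transform variant avoids a Python-level call per group and keeps the mask around for reuse.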

【Comments】: