Select rows where the value of column A starts with the value of column B: answers

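【Question title】: Select rows where value of column A starts with value of column B
【Posted】: 2020-10-13 14:22:52
【Question description】:

I have a pandas DataFrame and want to select the rows where the value of one column starts with the value of another column. I tried the following: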

import pandas as pd

df = pd.DataFrame({'A': ['apple', 'xyz', 'aa'],
                   'B': ['app', 'b', 'aa']})

df_subset = df[df['A'].str.startswith(df['B'])]

But it raises the error below, and the solutions I found elsewhere did not help either.

KeyError: "None of [Float64Index([nan, nan, nan], dtype='float64')] are in the [columns]"

np.where(df['A'].str.startswith(df['B']), True, False), taken from another answer, also returns True for every row.

【Question discussion】:

Tags: python pandas string filter

【Solution 1】:

For a row-wise comparison, we can use DataFrame.apply:

m = df.apply(lambda x: x['A'].startswith(x['B']), axis=1)
df[m]

       A    B
0  apple  app
2     aa   aa

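Your code does not work because Series.str.startswith accepts a character sequence (a scalar string) as its pattern, whereas you are passing a pandas Series. Quoting the docs:

pat : str
Character sequence. Regular expressions are not accepted.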
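
To make the difference concrete, here is a minimal sketch using the DataFrame from the question. The names scalar_mask and row_mask are only illustrative: the first compares every element of A against one fixed prefix, while the second pairs each value of A with the value of B in the same row.

```python
import pandas as pd

df = pd.DataFrame({'A': ['apple', 'xyz', 'aa'],
                   'B': ['app', 'b', 'aa']})

# One scalar pattern, checked against every element of column A.
scalar_mask = df['A'].str.startswith('app')

# A different pattern per row: each row is passed to the lambda (axis=1).
row_mask = df.apply(lambda row: row['A'].startswith(row['B']), axis=1)

print(scalar_mask.tolist())  # [True, False, False]
print(row_mask.tolist())     # [True, False, True]
```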

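【Discussion】:

Great! I had also tried apply with a lambda, but could not get it to work; I was missing axis=1.
Yes, it can be confusing at first. The idea is that you want to apply your function to each row (that is, along the column axis) rather than to each column (along the index axis). axis='columns' works here as well.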
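
A small sketch of what the axis argument changes, in case it helps. The variable names are made up for illustration; the lambdas just return the name of whatever object apply hands them, which shows whether it is a column or a row.

```python
import pandas as pd

df = pd.DataFrame({'A': ['apple', 'xyz', 'aa'],
                   'B': ['app', 'b', 'aa']})

# Default axis=0: the function receives each column as a Series.
col_names = df.apply(lambda col: col.name)

# axis=1: the function receives each row as a Series indexed by column label.
row_labels = df.apply(lambda row: row.name, axis=1)

# axis=1 and axis='columns' are two spellings of the same thing.
mask_int = df.apply(lambda row: row['A'].startswith(row['B']), axis=1)
mask_str = df.apply(lambda row: row['A'].startswith(row['B']), axis='columns')

print(col_names.tolist())         # ['A', 'B']
print(row_labels.tolist())        # [0, 1, 2]
print(mask_int.equals(mask_str))  # True
```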

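【Solution 2】:

You may need a for loop (here, a list comprehension), because str.startswith does not support a row-wise comparison against another column.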

[x.startswith(y) for x , y in zip(df.A,df.B)]
Out[380]: [True, False, True]
df_sub=df[[x.startswith(y) for x , y in zip(df.A,df.B)]].copy()
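
If either column can contain missing values, the plain comprehension would fail on a non-string. The following is only a sketch of one way to guard against that: starts_with_rowwise is a hypothetical helper (not part of pandas), and the None in column A is sample data added for illustration. It treats any non-string pair as "no match".

```python
import pandas as pd

def starts_with_rowwise(values, prefixes):
    """Hypothetical helper: pairwise startswith, False for non-string pairs."""
    return [isinstance(v, str) and isinstance(p, str) and v.startswith(p)
            for v, p in zip(values, prefixes)]

# Sample data with a missing value in A (an assumption, not from the question).
df = pd.DataFrame({'A': ['apple', 'xyz', None],
                   'B': ['app', 'b', 'aa']})

mask = starts_with_rowwise(df['A'], df['B'])
df_sub = df[mask].copy()

print(mask)  # [True, False, False]
```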

【Discussion】:

【Solution 3】:

This can also be done without an explicit for loop:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': ['apple', 'xyz', 'aa'],
                   'B': ['app', 'b', 'aa']})

ufunc = np.frompyfunc(str.startswith, 2, 1)
idx = ufunc(df['A'], df['B'])
df[idx]

Out[22]: 
       A    B
0  apple  app
2     aa   aa
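
One thing to keep in mind is that np.frompyfunc produces object-dtype results. A variant that may be a little more explicit, shown here only as a sketch, passes plain NumPy object arrays to the ufunc and casts the result to bool before indexing. The name starts_with is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['apple', 'xyz', 'aa'],
                   'B': ['app', 'b', 'aa']})

# Wrap str.startswith as a ufunc taking 2 inputs and returning 1 output.
starts_with = np.frompyfunc(str.startswith, 2, 1)

# Work on object arrays of Python strings; the raw result has dtype object,
# so cast it to a proper boolean mask.
a = df['A'].to_numpy(dtype=object)
b = df['B'].to_numpy(dtype=object)
mask = starts_with(a, b).astype(bool)

subset = df[mask]
print(mask.tolist())  # [True, False, True]
```

Note that this still calls the Python function once per element, so it hides the loop rather than vectorizing it.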

【Discussion】: