Check if Pandas DF Column1 Contains (str) Column2 [duplicate]

【Question title】: Check if Pandas DF Column1 Contains (str) Column2 [duplicate]
【Posted】: 2020-07-10 20:57:54
【Question】:

I am trying to create a column in a Pandas DataFrame that shows whether the string in "Column1" contains the string in "Column2". A reproducible example is below:

import pandas as pd

# Have
df = pd.DataFrame({'col1': ['a', 'aa', 'b', 'bb', 'c', 'cc'],
                    'col2': ['a', 'b',  'c', 'd',  'e', 'c']})
# Want: Series of 'does col1 contain col2?'
want = pd.Series([True, False, False, False, False, True])

# Tried
tried = df.col1.str.contains(df.col2)  # TypeError

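The error occurs because str.contains expects a single string as its argument on the right-hand side above, not another pd.Series. But I am not sure how to work around this...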

【Question comments】:

Tags: python pandas string series

【Solution 1】:

Here is one row-by-row (looping) approach using pd.DataFrame.apply with a lambda function.

df = pd.DataFrame({'col1': ['a', 'aa', 'b', 'bb', 'c', 'cc'],
                    'col2': ['a', 'b',  'c', 'd',  'e', 'c']})

df['test'] = df.apply(lambda x: x['col2'] in x['col1'], axis=1)

Result:

  col1 col2   test
0    a    a   True
1   aa    b  False
2    b    c  False
3   bb    d  False
4    c    e  False
5   cc    c   True

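【Comments】:

【Solution 2】:

This is not as simple a problem as you might think, because it cannot reasonably be vectorized: each row needs its own substring test.

Your first choice should be a list comprehension.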

pd.Series([b in a for a, b in zip(df.col1, df.col2)])

0     True
1    False
2    False
3    False
4    False
5     True
dtype: bool
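
If the result is meant to be stored back on the frame, note that a pd.Series built from a plain list gets a fresh 0..n-1 index. The following is a minimal sketch, assuming the question's data but with a hypothetical non-default index (the labels and the column name `contains` are made up for illustration). It passes index=df.index so the rows stay aligned:

```python
import pandas as pd

# Same data as in the question, but with a non-default index (hypothetical labels).
df = pd.DataFrame({'col1': ['a', 'aa', 'b', 'bb', 'c', 'cc'],
                   'col2': ['a', 'b',  'c', 'd',  'e', 'c']},
                  index=list('uvwxyz'))

# Row-wise substring test; index=df.index keeps the result aligned with df.
contains = pd.Series([b in a for a, b in zip(df['col1'], df['col2'])],
                     index=df.index)
df['contains'] = contains
```

Assigning the bare list directly (df['contains'] = [...]) also works, since a list is assigned by position rather than by label.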

Your second choice is np.vectorize:

f = np.vectorize(lambda a, b: b in a)
pd.Series(f(df.col1, df.col2))

0     True
1    False
2    False
3    False
4    False
5     True
dtype: bool
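
For a self-contained version of the np.vectorize approach, the sketch below adds the imports and passes otypes=[bool], so the output dtype is fixed rather than inferred from the first call. The names `contains` and `result` are illustrative. Per the NumPy documentation, np.vectorize is provided mainly for convenience and is essentially a Python-level loop, so it is not vectorization in the performance sense.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['a', 'aa', 'b', 'bb', 'c', 'cc'],
                   'col2': ['a', 'b',  'c', 'd',  'e', 'c']})

# Wrap the scalar substring test; otypes=[bool] fixes the output dtype up front.
contains = np.vectorize(lambda a, b: b in a, otypes=[bool])

# Pass plain NumPy arrays and reattach the frame's index to the result.
result = pd.Series(contains(df['col1'].to_numpy(), df['col2'].to_numpy()),
                   index=df.index)
```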

Your last choice should be apply, which @jpp has already covered.
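
To check this ranking on your own data, a rough timing sketch like the following may help. The frame size (6,000 rows), the repeat count, and the helper names are arbitrary choices for illustration. Actual timings depend on the machine and the pandas version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Repeat the question's six rows to get a larger frame (size chosen arbitrarily).
df = pd.DataFrame({'col1': ['a', 'aa', 'b', 'bb', 'c', 'cc'] * 1000,
                   'col2': ['a', 'b',  'c', 'd',  'e', 'c'] * 1000})

def with_listcomp(frame):
    return [b in a for a, b in zip(frame['col1'], frame['col2'])]

def with_vectorize(frame):
    f = np.vectorize(lambda a, b: b in a, otypes=[bool])
    return f(frame['col1'].to_numpy(), frame['col2'].to_numpy()).tolist()

def with_apply(frame):
    return frame.apply(lambda row: row['col2'] in row['col1'], axis=1).tolist()

# All three approaches should agree before their speed is compared.
assert with_listcomp(df) == with_vectorize(df) == with_apply(df)

for func in (with_listcomp, with_vectorize, with_apply):
    seconds = timeit.timeit(lambda: func(df), number=3)
    print(f"{func.__name__}: {seconds:.4f} s for 3 runs")
```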

【Comments】: