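【Posted】: 2021-09-28 10:28:37
【Question】:
My question is very similar to "How to test if a string contains one of the substrings in a list, in pandas?", except that the list of substrings to check varies by observation and is stored in a list column. Is there a way to access that list in a vectorized way by referencing the series?
Example dataset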
import pandas as pd
df = pd.DataFrame([{'a': 'Bob Smith is great.', 'b': ['Smith', 'foo']},
                   {'a': 'The Sun is a mass of incandescent gas.', 'b': ['Jones', 'bar']}])
print(df)
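I want to generate a third column, "c", that equals 1 if any of the strings in "b" is a substring of "a" in the same row, and 0 otherwise. That is, in this case I would expect: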
                                        a             b  c
0                     Bob Smith is great.  [Smith, foo]  1
1  The Sun is a mass of incandescent gas.  [Jones, bar]  0
My attempt:
df['c'] = df.a.str.contains('|'.join(df.b)) # Does not work.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_4092606/761645043.py in <module>
----> 1 df['c'] = df.a.str.contains('|'.join(df.b)) # Does not work.
TypeError: sequence item 0: expected str instance, list found
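The error arises because `'|'.join(df.b)` iterates over the elements of `df.b`, and each element is itself a list rather than a string. For illustration only, here is a minimal sketch of one way to produce the desired column with a plain row-wise loop. It is not a truly vectorized solution, and it does literal substring matching with `in` rather than the regex matching that `str.contains` performs by default.

```python
import pandas as pd

df = pd.DataFrame([
    {'a': 'Bob Smith is great.', 'b': ['Smith', 'foo']},
    {'a': 'The Sun is a mass of incandescent gas.', 'b': ['Jones', 'bar']},
])

# Pair each string in "a" with the list in "b" from the same row,
# then flag the row with 1 if any candidate is a literal substring.
df['c'] = [
    int(any(sub in text for sub in subs))
    for text, subs in zip(df['a'], df['b'])
]

print(df['c'].tolist())  # [1, 0]
```

The loop still runs in Python for every row, so on large frames it may be no faster than `df.apply(..., axis=1)`. Its main benefit is that each row is checked against its own list.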
【Comments】:
Tags: python pandas string dataframe match