【Question title】: How to test if a string contains one of the substrings stored in a list column in pandas?
【Posted】: 2021-09-28 10:28:37
【Question】:

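My question is very similar to "How to test if a string contains one of the substrings in a list, in pandas?", except that the list of substrings to check varies by observation and is stored in a list column. Is there a way to access that list in a vectorized way by referencing the Series?

Example dataset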

import pandas as pd

df = pd.DataFrame([{'a': 'Bob Smith is great.', 'b': ['Smith', 'foo']},
                   {'a': 'The Sun is a mass of incandescent gas.', 'b': ['Jones', 'bar']}])
print(df)

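I want to generate a third column, "c", that equals 1 if any of the strings in "b" is a substring of "a" in the same row, and 0 otherwise. That is, in this case I expect: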

                                        a             b  c
0                     Bob Smith is great.  [Smith, foo]  1
1  The Sun is a mass of incandescent gas.  [Jones, bar]  0

My attempt:

df['c'] = df.a.str.contains('|'.join(df.b))  # Does not work.


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_4092606/761645043.py in <module>
----> 1 df['c'] = df.a.str.contains('|'.join(df.b))  # Does not work.

TypeError: sequence item 0: expected str instance, list found
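
The traceback comes from `'|'.join(df.b)` iterating over the Series itself, so every item handed to `join` is a whole list rather than a string. The sketch below reproduces that and shows that the join has to happen once per row instead; the `patterns` name is only illustrative, and `re.escape` is added on the assumption that the substrings should be matched literally rather than as regular expressions.

```python
import re
import pandas as pd

df = pd.DataFrame([{'a': 'Bob Smith is great.', 'b': ['Smith', 'foo']},
                   {'a': 'The Sun is a mass of incandescent gas.', 'b': ['Jones', 'bar']}])

# str.join iterates over the Series, so each item is a list, not a string.
try:
    '|'.join(df.b)
    join_error = None
except TypeError as exc:
    join_error = str(exc)

# One pattern per row; re.escape keeps characters such as '.' or '+' literal.
patterns = ['|'.join(map(re.escape, words)) for words in df.b]
print(join_error)
print(patterns)
```

Because each row ends up with its own pattern, a single call to `str.contains` with one shared pattern cannot express this comparison.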

【Comments】:

    Tags: python pandas string dataframe match


    【Solution 1】:

    You can simply use zip and a list comprehension:

    df['c'] = [int(any(w in a for w in b)) for a, b in zip(df.a, df.b)]
    
    df
    #                                        a             b  c
    #0                     Bob Smith is great.  [Smith, foo]  1
    #1  The Sun is a mass of incandescent gas.  [Jones, bar]  0
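
The comprehension above assumes every "a" is a string and every "b" is a list. As a hedged sketch, the same check can be wrapped in a small helper that returns 0 for rows where either value is missing; `contains_any` is a hypothetical name, and treating missing values as "no match" is an assumption rather than something the question specifies.

```python
import pandas as pd

def contains_any(text, substrings):
    """Return 1 if any item of `substrings` occurs in `text`, else 0."""
    # Assumption: rows with a missing string or a missing list count as no match.
    if not isinstance(text, str) or not isinstance(substrings, (list, tuple)):
        return 0
    return int(any(s in text for s in substrings))

df = pd.DataFrame({
    'a': ['Bob Smith is great.', 'The Sun is a mass of incandescent gas.', None],
    'b': [['Smith', 'foo'], ['Jones', 'bar'], ['x']],
})

# Same row-by-row pairing as the zip-based comprehension.
df['c'] = [contains_any(a, b) for a, b in zip(df['a'], df['b'])]
print(df)
```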
    

    If you don't care about case:

    df['c'] = [int(any(w.lower() in a for w in b)) for a, b in zip(df.a.str.lower(), df.b)]
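
A variant of the case-insensitive check, sketched with `str.casefold()` instead of `lower()`; casefolding is the more aggressive normalization Python offers for caseless comparison, which matters mainly for non-ASCII text. The helper name `contains_any_ci` and the modified sample row are illustrative only, and the result is cast to `int` so the column holds 1/0 as in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['bob SMITH is great.', 'The Sun is a mass of incandescent gas.'],
    'b': [['Smith', 'foo'], ['Jones', 'bar']],
})

def contains_any_ci(text, substrings):
    """Case-insensitive version: casefold both sides before comparing."""
    text = text.casefold()
    return int(any(s.casefold() in text for s in substrings))

df['c'] = [contains_any_ci(a, b) for a, b in zip(df['a'], df['b'])]
print(df)
```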
    

    【Discussion】:
