【Title】: Check if Multiple Strings Are Present in a DataFrame Column
【Posted】: 2021-10-18 09:35:37
【Question】:

I want to check whether all the items in a list are present in a column of my DataFrame.

The basics, plain and simple:

fruit = ['apple','banana']    # These items should all be in the column
fruit = ', '.join(fruit)      # I think this is where it goes wrong...

fruit_results = df['all_fruit'].str.contains(fruit) # Check if the column contains fruit
df_new = df[fruit_results]   # Filter so that we only keep the True rows

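This works, but not completely. It only matches when the items appear in this specific order, and I want it to work for any order: if a row in the column contains all the items in the list, I want to keep it; otherwise, drop it.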

df['all_fruit']

Apple, Banana             #Return! Because it contains apple and banana
Banana                    # Do not return 
Banana, Apple             #Return! Because it contains apple and banana    
Apple                     # Do not return
Apple, Banana, Peer       #Return! Because it contains apple and banana

Thanks a lot in advance!

【Comments】:

    Tags: python pandas string dataframe


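    【Solution 1】:

    Convert the values to lowercase, split them into lists, and test with issubset after converting fruit to a set (fruit here is the original list, not the joined string):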

    df1 = df[df.all_fruit.str.lower().str.split(', ').map(set(fruit).issubset)]
    print (df1)
                 all_fruit
    0        Apple, Banana
    2        Banana, Apple
    4  Apple, Banana, Peer
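
The attempt in the question joins the list into the single pattern 'apple, banana', so str.contains looks for that exact text and only finds it when the items appear in that order and case. A minimal, self-contained sketch of the set-based idea follows; the helper name contains_all is hypothetical, and stripping whitespace around each piece is an added assumption so that inconsistent spacing after commas does not matter.

```python
import pandas as pd

df = pd.DataFrame({'all_fruit': [
    'Apple, Banana',
    'Banana',
    'Banana, Apple',
    'Apple',
    'Apple, Banana, Peer',
]})
fruit = ['apple', 'banana']  # keep this as a list; do not join it into one string


def contains_all(cell, required):
    """Return True if every required item is one of the comma-separated items in cell."""
    # Normalise each piece: drop surrounding whitespace and ignore case
    items = {part.strip().lower() for part in cell.split(',')}
    return set(required).issubset(items)


mask = df['all_fruit'].map(lambda cell: contains_all(cell, fruit))
df_new = df[mask]
print(df_new)
```

Because the comparison is made on whole items rather than substrings, the order of the items in each row does not affect the result.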
    

    Alternatively, keep your str.contains approach and pass a list of boolean masks to np.logical_and.reduce:

    df1 = df[np.logical_and.reduce([df.all_fruit.str.contains(f, case=False) for f in fruit])]
    print (df1)
                 all_fruit
    0        Apple, Banana
    2        Banana, Apple
    4  Apple, Banana, Peer
    

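    【Comments】:

    • This is perfect. A quick one (sorry to bother you, I hope you can help): is it also possible to add a column to the original DF that shows False unless the match is True, in which case it shows True?
    • @Roverflow - For the first solution: df['test'] = ~df.all_fruit.str.lower().str.split(', ').map(set(fruit).issubset), and for the second solution: df['test'] = ~np.logical_and.reduce([df.all_fruit.str.contains(f, case=False) for f in fruit])
    • @Roverflow - So that means False, True, False, True, False?
    • Thanks a lot! The other way around... True, False, True, False, True :-)
    • @Roverflow - Then remove the ~, which inverts the mask.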
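
Putting the comment thread together, a sketch of adding the flag column without the inverting ~ might look like the following. The column name test comes from the comments; the sample DataFrame is the one used elsewhere in this thread.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'all_fruit': [
    'Apple, Banana',
    'Banana',
    'Banana, Apple',
    'Apple',
    'Apple, Banana, Peer',
]})
fruit = ['apple', 'banana']

# One case-insensitive mask per required item
masks = [df['all_fruit'].str.contains(f, case=False) for f in fruit]

# True only where every mask is True; no ~ here, so matching rows are flagged True
df['test'] = np.logical_and.reduce(masks)
print(df)
```

The original DataFrame keeps all of its rows; only the new boolean column records whether a row contains every item.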
    【Solution 2】:
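    import pandas as pd

    df = pd.DataFrame({'all_fruit': [
        'Apple, Banana',
        'Banana',
        'Banana, Apple',
        'Apple',
        'Apple, Banana, Peer',
    ]})
    fruit = ['apple','banana']
    # One case-insensitive boolean mask per required item
    have_fruits = [df.all_fruit.str.contains(f, case=False) for f in fruit]
    # Combine the masks with logical AND so a row must match every item
    indexes = pd.Series(True, index=df.index)
    for f in have_fruits:
        indexes = indexes & f
    df[indexes]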
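
The loop above folds the per-item masks together one at a time. As a hedged alternative, the same fold can be written with functools.reduce and operator.and_ from the standard library; this is a sketch of an equivalent formulation, not part of the original answer.

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame({'all_fruit': [
    'Apple, Banana',
    'Banana',
    'Banana, Apple',
    'Apple',
    'Apple, Banana, Peer',
]})
fruit = ['apple', 'banana']

# Build one case-insensitive mask per item, then AND them all together
masks = (df['all_fruit'].str.contains(f, case=False) for f in fruit)
all_present = reduce(operator.and_, masks)

result = df[all_present]
print(result)
```

Note that str.contains tests for substrings, so 'apple' would also match a value such as 'Pineapple'. If whole-item matching matters, splitting each row into items and comparing sets avoids that.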
    

    【Comments】:

      【Solution 3】:

      Try this code:

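      # Split on ', ' so that no piece keeps a leading space, then test for exact matches
      x = df['all_fruit'].str.split(', ', expand=True)
      # Keep rows where some cell equals 'Apple' and some cell equals 'Banana'
      print(df[x.eq('Apple').any(axis=1) & x.eq('Banana').any(axis=1)])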
      

      Output:

                   all_fruit
      0        Apple, Banana
      2        Banana, Apple
      4  Apple, Banana, Peer
      

      【Comments】:
