【Question Title】:For each row in a Pandas dataframe, check if the row contains a string from a list
【Posted】:2019-11-20 11:10:46
【Question】:

I have a given list of strings, like this:

List=['plastic', 'cardboard', 'wood']

My dataframe has a column of string dtype, like this:

Column=['beer plastic', 'water cardboard', 'eggs plastic', 'fruits wood']

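For each row in the column, I want to know whether the row contains one of the words from the list and, if so, keep only the text that precedes that word, like this: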

New_Column=['beer', 'water', 'eggs', 'fruits']

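Is there a way to do this systematically for every row of my dataframe (millions of rows)? Thanks.

P.S. I tried building a function that uses regex pattern matching, like this: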

pattern=re.compile('**Pattern to be defined to include element from list**')

def truncate(row, pattern):
    Column=row['Column']
    if bool(pattern.match(Column)):
        Column=Column.replace(**word from list**,"")
        return Column

df['New_column']=df.apply(truncate,axis=1, pattern=pattern)

【Comments】:

    Tags: python pandas


    【Solution 1】:
    ##df
    
          0
    0     beer plastic
    1  water cardboard
    2     eggs plastic
    3      fruits wood
    
    
    l=['plastic', 'cardboard', 'wood']
    


    Use str.findall (the pattern is a raw string so that \w and \s are not treated as string escapes):
    df[0].str.findall(r'\w+\s*(?=' + '|'.join(l) + ')').apply(lambda x: x[0].strip() if len(x) else 'NotFound')
    
    ##output
    
    0      beer
    1     water
    2      eggs
    3    fruits
    Name: 0, dtype: object
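
As a variation on the answer above, here is a minimal sketch using `Series.str.extract` with a lazy capture group. It differs from the `findall` version in two ways that are assumptions about what the asker wants: it keeps all of the text before the first listed word (not just the last word before it), and it leaves rows without a match unchanged instead of returning 'NotFound'. The names `words`, `alternation`, `pattern` and the extra row 'plain milk' are illustrative only.

```python
import re
import pandas as pd

words = ['plastic', 'cardboard', 'wood']
# 'plain milk' is an extra, hypothetical row with no listed word in it
df = pd.DataFrame({0: ['beer plastic', 'water cardboard', 'eggs plastic',
                       'fruits wood', 'plain milk']})

# Escape each word so regex metacharacters in the list are matched literally
alternation = '|'.join(re.escape(w) for w in words)
# Lazily capture everything before the first listed word, excluding
# the whitespace that separates it from that word
pattern = rf'^(.*?)\s*(?:{alternation})'

# expand=False returns a Series because the pattern has a single group
extracted = df[0].str.extract(pattern, expand=False)
# Rows without a match are NaN after extract; fall back to the original text
result = extracted.fillna(df[0])
print(result.tolist())
```

Because the whole column is processed by one vectorized string method, this avoids a Python-level `apply` over each row, which may matter with millions of rows.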
    

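    【Comments】:

      【Solution 2】:
      import pandas as pd
      ...
      for index in df.index:
          for word in List_name:
              value = df.at[index, 'Column_name']
              if word in value:
                  # Write through df.at: assigning to the row yielded by iterrows() may only change a copy.
                  # partition() keeps the text before the word (a trailing space remains; add .rstrip() to drop it).
                  df.at[index, 'Column_name'] = value.partition(word)[0]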
      

      If you want to run a working example:

      import pandas as pd
      
      List=['plastic', 'carboard', 'wood']
      df = pd.DataFrame([{'c1':"fun carboard", 'c2':"jolly plastic"}, {'c1':"meh wood",'c2':"aba"}, {'c1':"aaa",'c2':"bbb"}, {'c1':"old wood",'c2':"bbb"}])
      
      for index in df.index:
          for col in ['c1', 'c2']:
              for word in List:
                  value = df.at[index, col]
                  if word in value:
                      # Write the truncated text back into the dataframe itself
                      df.at[index, col] = value.partition(word)[0]
      df
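
For a large dataframe, the row-by-row loop above can be slow. A possible vectorized equivalent, sketched here under the assumption that the cells are single-line strings, removes everything from the first listed word onward with `Series.str.replace`. The names `words`, `pattern`, `cols` and `truncated` are illustrative; the list keeps the answer's spelling 'carboard' so that it matches the sample data.

```python
import re
import pandas as pd

words = ['plastic', 'carboard', 'wood']
df = pd.DataFrame([{'c1': "fun carboard", 'c2': "jolly plastic"},
                   {'c1': "meh wood", 'c2': "aba"},
                   {'c1': "aaa", 'c2': "bbb"},
                   {'c1': "old wood", 'c2': "bbb"}])

# Escape each word so it is matched literally, then join into one alternation
alternation = '|'.join(re.escape(w) for w in words)
# Match the first listed word and everything after it on the line
pattern = rf'(?:{alternation}).*'

cols = ['c1', 'c2']
# Apply the same replacement to each selected column
truncated = df[cols].apply(lambda s: s.str.replace(pattern, '', regex=True))
print(truncated)
```

As with `partition`, the space before the removed word is kept; chaining `.str.rstrip()` inside the lambda would drop it.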
      

      【Comments】:
