Question title: How to check if the first word of a DataFrame string column is present in a list in Python?
Posted: 2018-12-17 07:17:24
Question:

I have a DataFrame df_sentences and a list question_words as follows:

df_sentences:

sentence                         label
you will not forget this movie   0
will the novel ever die          1
why we drink alcohol             1
did trump win the election       1
ambiance is perfect              0


question_words = ['what', 'why', 'when', 'where', 'whose', 'which', 'whom', 'who', 'how', 
                         'do', 'are', 'will', 'did', 'will', 'am', 'are', 'was', 'were', 'can', 'has', 'have']

I want to check whether the first word of the sentence column is present in the list question_words, and store the result (1 or 0) in a new column ques_word.

Expected output:

sentence                         label  ques_word
you will not forget this movie   0      0
will the novel ever die          1      1
why we drink alcohol             1      1
did trump win the election       1      1
ambiance is perfect              0      0

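What I have tried so far is .str.contains('|'.join(question_words)).astype(int), but, as expected, it flags every sentence that contains any of the question_words as a substring anywhere, not only as the first word.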
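The attempt behaves this way because `Series.str.contains` runs a regex search on each string. An unanchored alternation therefore matches a question word anywhere in the sentence, even inside another word ('am' inside 'ambiance'). The sketch below uses only the standard `re` module on the sample sentences, and the variable names are illustrative. It shows that anchoring with `^` and adding a `\b` word boundary limits the match to a whole first word. Passing the anchored pattern string to `df_sentences['sentence'].str.contains(...)` should give the same 0/1 flags.

```python
import re

question_words = ['what', 'why', 'when', 'where', 'whose', 'which', 'whom', 'who', 'how',
                  'do', 'are', 'will', 'did', 'will', 'am', 'are', 'was', 'were', 'can', 'has', 'have']
sentences = [
    "you will not forget this movie",
    "will the novel ever die",
    "why we drink alcohol",
    "did trump win the election",
    "ambiance is perfect",
]

# Unanchored alternation, as in the attempt: search() finds a match anywhere.
unanchored = re.compile('|'.join(question_words))

# Anchored version: '^' pins the match to the start of the string, and '\b'
# requires a word boundary, so 'am' cannot match the start of 'ambiance'.
anchored = re.compile(r'^(?:' + '|'.join(map(re.escape, question_words)) + r')\b')

unanchored_flags = [int(bool(unanchored.search(s))) for s in sentences]
anchored_flags = [int(bool(anchored.search(s))) for s in sentences]

print(unanchored_flags)  # [1, 1, 1, 1, 1]
print(anchored_flags)    # [0, 1, 1, 1, 0]
```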

Discussion:

    Tags: python regex dataframe


    Answer 1:
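    # .str[0] takes the first word of every row; isin tests exact membership in the list
    df_sentences['ques_word'] = df_sentences['sentence'].str.split().str[0].isin(question_words).astype(int)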
    

    This should do the job.
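Below is a self-contained, runnable version of the first-word check on the sample data from the question. It assumes pandas is installed. Indexing the Series directly with `[0]`, as in `.str.split(" ")[0]`, selects the row labelled 0 (the whole word list of the first sentence) rather than the first word of each row. The element-wise accessor `.str[0]` is what picks the first word of every row.

```python
import pandas as pd

question_words = ['what', 'why', 'when', 'where', 'whose', 'which', 'whom', 'who', 'how',
                  'do', 'are', 'will', 'did', 'will', 'am', 'are', 'was', 'were', 'can', 'has', 'have']

df_sentences = pd.DataFrame({
    'sentence': [
        "you will not forget this movie",
        "will the novel ever die",
        "why we drink alcohol",
        "did trump win the election",
        "ambiance is perfect",
    ],
    'label': [0, 1, 1, 1, 0],
})

# .str.split() splits each sentence on whitespace; .str[0] then takes the
# first element of each resulting list, i.e. the first word of every row.
first_words = df_sentences['sentence'].str.split().str[0]

# isin does exact membership tests, so 'am' does not match 'ambiance'.
df_sentences['ques_word'] = first_words.isin(question_words).astype(int)

print(df_sentences)
```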

    Comments:

      Answer 2:

      If you want a fast solution, use a list comprehension.

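      q_set = set(question_words)  # a set gives constant-time membership tests
      # split(None, 1) splits only at the first run of whitespace, so [0] is the first word
      df['ques_word'] = [
          1 if w.split(None, 1)[0] in q_set else 0 for w in df.sentence
      ]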
      

      df
                               sentence  label  ques_word
      0  you will not forget this movie      0          0
      1         will the novel ever die      1          1
      2            why we drink alcohol      1          1
      3      did trump win the election      1          1
      4             ambiance is perfect      0          0
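This sketch shows a slightly more defensive variant of the list comprehension. `''.split(None, 1)` returns an empty list, so indexing it with `[0]` raises `IndexError` for empty or whitespace-only sentences. The helper below returns 0 in that case. It also lowercases the first word, which assumes that capitalized input such as "Why ..." should count as a question. `first_word_flag` is a hypothetical name and is not part of the original answer.

```python
question_words = ['what', 'why', 'when', 'where', 'whose', 'which', 'whom', 'who', 'how',
                  'do', 'are', 'will', 'did', 'will', 'am', 'are', 'was', 'were', 'can', 'has', 'have']


def first_word_flag(sentence, words):
    """Return 1 if the first whitespace-separated word of `sentence` is in `words`, else 0.

    Hypothetical helper: unlike `w.split(None, 1)[0]`, it does not raise
    IndexError on empty or whitespace-only strings, and it lowercases the
    first word (assumption: 'Why' should match 'why').
    """
    parts = sentence.split(None, 1)  # at most one split: [first_word, rest]
    if not parts:                    # '' or '   ' -> there is no first word
        return 0
    return int(parts[0].lower() in words)


q_set = {w.lower() for w in question_words}
sentences = ["Why we drink alcohol", "", "ambiance is perfect", "will the novel ever die"]
flags = [first_word_flag(s, q_set) for s in sentences]
print(flags)  # [1, 0, 0, 1]
```

On the DataFrame it would be used as `df['ques_word'] = [first_word_flag(w, q_set) for w in df.sentence]`.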
      

      Comments:

      • Thanks! Exactly what I was looking for.