Clear dataframe cell if it does not contain a word from list

【Question title】: Clear dataframe cell if it does not contain a word from list
【Posted】: 2021-12-16 01:43:40
【Question description】:

I have a dataframe with tokenized text, like this:

index  id    text1                   text2        
1      123   ['it', 'was', 'cold']   ['i', "wasn't", 'there']   
2      124                           ['hello', 'there'] 
3      125   ['the', 'heat']         ['the', 'cold']    
4      126                           ['the', 'heat']

I also have a list of weather words, e.g. lst = ['heat', 'cold', 'rain']

I want to clear every cell in the dataframe that does not contain a word from the list, so the dataframe would end up like this:

index  id    text1                   text2        
1      123   ['it', 'was', 'cold']   
2      124                            
3      125   ['the', 'heat']         ['the', 'cold']    
4      126                           ['the', 'heat']

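So far I have only found solutions that drop the entire row when no word is found, but I want the dataframe to stay intact, and in particular to keep the id column!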

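Another thing I would like to solve: it would be great to be able to label the columns. For that I could split the list into pos=[heat, sun] and neg=[cold, rain]. The output would then be: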

index  id    text1                  label1   text2           label2  
1      123   ['it', 'was', 'cold']  neg 
2      124                            
3      125   ['the', 'heat']        pos      ['the', 'cold'] neg
4      126                                   ['the', 'heat'] pos

Thanks in advance!

【Question discussion】:

Tags: python pandas list dataframe tokenize

【Solution 1】:

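lst = ['heat', 'cold', 'rain']
# Assumes every cell is a space-separated string, not a list of tokens.
# iloc[:, 2:] selects from the third column onward; if 'index' is the
# DataFrame index rather than a column, text1 sits at position 1 and is skipped.
df.iloc[:,2:] = df.iloc[:,2:].applymap(lambda x: x if set(x.split()).intersection(lst) else '')
print(df)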

Output:

index   id        text1     text2 text3
1      123  it was cold                
2      124                             
3      125     the heat  the cold      
4      126      my bike  the heat
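The one-liner above calls x.split(), so it expects string cells, whereas the question shows cells holding lists of tokens. A minimal sketch for the list case follows. It assumes empty cells are stored as empty lists and that the text columns are named text1 and text2. The sample frame and the helper keep_if_weather are made up for illustration.

```python
import pandas as pd

lst = ['heat', 'cold', 'rain']

# Hypothetical sample data mirroring the question; empty cells are empty lists.
df = pd.DataFrame({
    'id': [123, 124, 125, 126],
    'text1': [['it', 'was', 'cold'], [], ['the', 'heat'], []],
    'text2': [['i', "wasn't", 'there'], ['hello', 'there'],
              ['the', 'cold'], ['the', 'heat']],
})

weather = set(lst)
text_cols = ['text1', 'text2']  # selected by name, so 'id' is never touched


def keep_if_weather(tokens):
    """Return the token list unchanged if it shares a word with `weather`, else ''."""
    if isinstance(tokens, list) and weather.intersection(tokens):
        return tokens
    return ''


for col in text_cols:
    df[col] = df[col].map(keep_if_weather)

print(df)
```

Selecting the columns by name rather than by position avoids depending on whether index is a regular column or the DataFrame index.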

【Discussion】:

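Hi! Your solution doesn't fully solve my problem (some items were not removed, items without any overlap were kept, the wrong columns were used, etc.). I made some changes to the code and updated my question; could you help me with the updated question?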
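For the labelling part of the question, one possible approach is sketched below. It assumes the cells hold token lists (with '' for cleared cells) and uses the pos/neg split given in the question. The helper label_tokens and the sample frame are hypothetical. Giving pos priority when a cell contains words from both lists is an arbitrary choice, not something the question specifies.

```python
import pandas as pd

# Split of the weather words as proposed in the question.
pos = {'heat', 'sun'}
neg = {'cold', 'rain'}

# Hypothetical sample data: the already-cleared frame from the question.
df = pd.DataFrame({
    'id': [123, 124, 125, 126],
    'text1': [['it', 'was', 'cold'], '', ['the', 'heat'], ''],
    'text2': ['', '', ['the', 'cold'], ['the', 'heat']],
})


def label_tokens(tokens):
    """Return 'pos' or 'neg' for a token list, or '' when no listed word is present."""
    if not isinstance(tokens, list):
        return ''
    if pos.intersection(tokens):
        return 'pos'  # assumption: pos wins if both kinds of word appear
    if neg.intersection(tokens):
        return 'neg'
    return ''


for i, col in enumerate(['text1', 'text2'], start=1):
    # Insert each label column directly after the text column it describes.
    df.insert(df.columns.get_loc(col) + 1, f'label{i}', df[col].map(label_tokens))

print(df)
```

The resulting column order is id, text1, label1, text2, label2, matching the layout shown in the question.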