[Posted]: 2021-10-23 13:33:18
[Question]:
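I have a data frame df with a column of different names. I also have varying data frames, e.g. search_df or search_df1, that contain search words I would like to look for in the name column via regex.
If a word is found, it should be written to a new column, e.g. df_final$which_word_search_df.
If several words are found, I would like to paste the results together.
The result should look like df_final.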
# load packages
pacman::p_load(tidyverse)
# words I would like to search for
search_df <- data.frame(search_words = c("apple", "peach"))
search_df1 <- data.frame(search_words = c("strawberry", "peach", "banana"))
# data frame which is the basis for my search
df <- data.frame(name = c("apple123", "applepeach", "peachtime", "peachab", "bananarrr", "bananaxy"))
# how I expect the final result to look like
df_final <- data.frame(name = c("apple123", "applepeach", "peachtime", "peachab", "bananarrr", "bananaxy"),
                       which_word_search_df = c("apple", "apple; peach", "peach", "peach", NA, NA),
                       # "applepeach" also contains "peach", which is in search_df1
                       which_word_search_df1 = c(NA, "peach", "peach", "peach", "banana", "banana"))
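This is my current solution, but as you can see it is not dynamic: I type in every search word by hand instead of looping over all search words automatically.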
df_trial <- df %>%
  mutate(which_search_word_trial = ifelse(grepl("apple", name, ignore.case = TRUE), "apple", ""),
         which_search_word_trial = ifelse(grepl("peach", name, ignore.case = TRUE),
                                          paste(which_search_word_trial, "peach", sep = ";"), which_search_word_trial)
  )
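One way the manual version above could be made dynamic is to run grepl once per search word and then paste the matching words together row by row. The following is only a minimal sketch in base R, not a definitive implementation: find_words() is a hypothetical helper name, the "; " separator is taken from df_final, and rows without any match are assumed to be NA rather than "".

```r
# Same data as in the question
search_df <- data.frame(search_words = c("apple", "peach"))
df <- data.frame(name = c("apple123", "applepeach", "peachtime", "peachab", "bananarrr", "bananaxy"))

# find_words() is a hypothetical helper, not part of any package
find_words <- function(names, words, sep = "; ") {
  # one logical column per search word: does each name match it?
  hits <- vapply(words, function(w) grepl(w, names, ignore.case = TRUE),
                 logical(length(names)))
  # vapply returns a plain vector when there is only one name, so force a matrix
  hits <- matrix(hits, nrow = length(names))
  # paste the matching words per row; NA when nothing matched
  apply(hits, 1, function(row) {
    if (any(row)) paste(words[row], collapse = sep) else NA_character_
  })
}

df$which_word_search_df <- find_words(df$name, search_df$search_words)
```

Because the words are passed to grepl as patterns, any regex metacharacters in search_words are interpreted as regex, which matches the stated goal of searching via regex.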
The example I shared is only a minimal one. In the real use case, df will have ~200k rows and my search_df will have ~1k rows.
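At that size, a full names-by-words table of matches would hold roughly 200 million logical values, so it may be worth keeping only one result vector and appending to it word by word. The sketch below is one possible way to do that in base R; find_words_lean() is a hypothetical helper name, and whether this is fast enough for the real data is untested. Note that "applepeach" matches "peach" from search_df1, so the sketch reports "peach" for that row.

```r
# find_words_lean() is a hypothetical helper used only for this sketch
find_words_lean <- function(names, words, sep = "; ") {
  out <- rep(NA_character_, length(names))
  for (w in words) {
    hit <- grepl(w, names, ignore.case = TRUE)
    # first match for a row: store the word; later matches: append it
    out[hit] <- ifelse(is.na(out[hit]), w, paste(out[hit], w, sep = sep))
  }
  out
}

search_df1 <- data.frame(search_words = c("strawberry", "peach", "banana"))
df <- data.frame(name = c("apple123", "applepeach", "peachtime", "peachab", "bananarrr", "bananaxy"))

df$which_word_search_df1 <- find_words_lean(df$name, search_df1$search_words)
```

If the search terms are plain strings rather than patterns (an assumption about the data), lowercasing both sides and calling grepl with fixed = TRUE instead of ignore.case = TRUE may be faster, since fixed matching skips the regex engine.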
[Comments]: