【Posted】: 2021-02-02 13:25:19
【Question】:
I have this data frame (DF1):
structure(list(ID = 1:3, Text = c("there was not clostridium", "clostridium difficile positive", "test was OK")), class = "data.frame", row.names = c(NA, -3L))
ID TEXT
1 "there was not clostridium"
2 "clostridium difficile positive"
3 "test was OK"
and this data frame (DF2):
structure(list(ID = 1:3, Microorganisms = c("ESCHERICHIA COLI", "CLOSTRIDIUM DIFFICILE", "FUNGI")), class = "data.frame", row.names = c(NA, -3L))
ID Microorganisms
1 ESCHERICHIA COLI
2 CLOSTRIDIUM DIFFICILE
3 FUNGI
I want to use a regular expression to find the matches between DF1 and DF2 and put them in a new column, like this:
ID TEXT Microorganism
1 "there was not clostridium" CLOSTRIDIUM DIFFICILE
2 "clostridium difficile positive" CLOSTRIDIUM DIFFICILE
3 "test was OK" no
I tried something like this:
DF1 %>% mutate(Mikroorganism = ifelse(grepl(DF2$Microorganisms, TEXT), str_extract(TEXT, DF2$Microorganisms), "no"))
but it does not work.
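A short sketch of why the attempt above likely fails, using only base R and the values from the question (the variable names `texts`, `organisms`, `first_only` and so on are made up for illustration). `grepl()` uses only the first element when it is given a vector of patterns, and matching is case-sensitive by default, so the upper-case names never hit the lower-case text. Note also that the `dput()` output names the column `Text`, so referring to `TEXT` inside `mutate()` would fail on that data as well.

```r
texts <- c("there was not clostridium",
           "clostridium difficile positive",
           "test was OK")
organisms <- c("ESCHERICHIA COLI", "CLOSTRIDIUM DIFFICILE", "FUNGI")

# grepl() takes a single pattern: with a vector it warns and uses only
# the first element ("ESCHERICHIA COLI"), so nothing matches here.
first_only <- suppressWarnings(grepl(organisms, texts))

# Matching is case-sensitive by default, so the upper-case name does not
# match the lower-case text.
case_sensitive <- grepl("CLOSTRIDIUM DIFFICILE", texts)

# Ignoring case finds row 2, but still not row 1, which lacks "difficile".
case_insensitive <- grepl("CLOSTRIDIUM DIFFICILE", texts, ignore.case = TRUE)

print(first_only)
print(case_sensitive)
print(case_insensitive)
```

Even with the case fixed, matching the whole name cannot produce the desired result for row 1, because that text contains only one of the two words.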
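【Comments】:
- A simple regex will not work for your first row: it does not contain "difficile". Are you looking for a match on any word in DF2 rather than on the whole string?
- Yes, I want to match any word in DF2. Is that possible?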
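One possible way to match on any word, sketched in base R under a few assumptions: words are compared case-insensitively, a text counts as a match when it shares at least one whole word with a name in DF2, and the first matching name wins. The helper `match_organism` is hypothetical and not taken from the question. The sketch does not handle negation, so "there was not clostridium" is still reported as a match, which is what the desired output in the question shows.

```r
DF1 <- structure(list(ID = 1:3,
                      Text = c("there was not clostridium",
                               "clostridium difficile positive",
                               "test was OK")),
                 class = "data.frame", row.names = c(NA, -3L))
DF2 <- structure(list(ID = 1:3,
                      Microorganisms = c("ESCHERICHIA COLI",
                                         "CLOSTRIDIUM DIFFICILE",
                                         "FUNGI")),
                 class = "data.frame", row.names = c(NA, -3L))

# Hypothetical helper: return the first organism name that shares at
# least one whole word with `text`, or "no" if none does.
match_organism <- function(text, organisms) {
  # Upper-case the text and split it on anything that is not a letter
  text_words <- strsplit(toupper(text), "[^A-Z]+")[[1]]
  # For each organism name, check whether any of its words occurs in the text
  hit <- vapply(strsplit(organisms, " ", fixed = TRUE),
                function(w) any(w %in% text_words),
                logical(1))
  if (any(hit)) organisms[which(hit)[1]] else "no"
}

DF1$Microorganism <- vapply(DF1$Text, match_organism, character(1),
                            organisms = DF2$Microorganisms,
                            USE.NAMES = FALSE)
print(DF1)
```

Comparing whole words rather than substrings avoids accidental hits inside longer words. If several names in DF2 share a word, this sketch simply returns the first one, which may or may not be what is wanted.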
Tags: r matching string-matching