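Posted: 2016-05-15 23:55:02
Question:
I'm trying to add a new column based on another column, using pattern matching. I read this post, but didn't get the output I wanted.
I want to create a new column (SubOrder) based on the GreatGroup column. I tried the following: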
# One NA per row: nrow(), not length(), since length() of a data frame counts columns
SubOrder <- rep(NA_character_, nrow(myData))
# Match against the GreatGroup column, not the whole data frame
SubOrder[grepl("udults", myData$GreatGroup, ignore.case = TRUE)] <- "Udults"
SubOrder[grepl("aquults", myData$GreatGroup, ignore.case = TRUE)] <- "Aquults"
SubOrder[grepl("aqualfs", myData$GreatGroup, ignore.case = TRUE)] <- "aqualfs"
SubOrder[grepl("humods", myData$GreatGroup, ignore.case = TRUE)] <- "humods"
SubOrder[grepl("udalfs", myData$GreatGroup, ignore.case = TRUE)] <- "udalfs"
SubOrder[grepl("orthods", myData$GreatGroup, ignore.case = TRUE)] <- "orthods"
SubOrder[grepl("psamments", myData$GreatGroup, ignore.case = TRUE)] <- "psamments"
SubOrder[grepl("udepts", myData$GreatGroup, ignore.case = TRUE)] <- "udepts"
SubOrder[grepl("fluvents", myData$GreatGroup, ignore.case = TRUE)] <- "fluvents"
SubOrder[grepl("aquods", myData$GreatGroup, ignore.case = TRUE)] <- "aquods"
# Store the result as the new column
myData$SubOrder <- SubOrder
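For example, I look for "udults" anywhere inside a word such as Hapludults or Paleudults, and return just "udults".
Edit: for anyone following alistaire's comment, this is the vector of search patterns I would use: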
subOrderNames <- c("Udults", "Aquults", "Aqualfs", "Humods", "Udalfs", "Orthods", "Psamments", "Udepts", "fluvents")
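A minimal sketch of a vectorized alternative, assuming every suborder name appears as the ending of the great group name: collapse the pattern vector into one alternation regex, then extract the matched piece with base R `regexpr()` and `regmatches()`. The helper `extract_suborder` and the example names are hypothetical. The pattern vector adds `Aquods`, which the loop above also searches for, and non-matching names come back as `NA`.

```r
# Hypothetical helper: for each great group name, return the suborder it ends
# with (spelled as in `suborders`), or NA when no suborder matches.
extract_suborder <- function(great_group, suborders) {
  # One alternation regex, e.g. "(Udults|Aquults|...)$"; "$" anchors it to the end
  pattern <- paste0("(", paste(suborders, collapse = "|"), ")$")
  m <- regexpr(pattern, great_group, ignore.case = TRUE)
  out <- rep(NA_character_, length(great_group))
  # regmatches() returns only the successful matches, in input order
  found <- regmatches(great_group, m)
  # Map the matched text back to the canonical spelling in `suborders`
  out[m != -1] <- suborders[match(tolower(found), tolower(suborders))]
  out
}

subOrderNames <- c("Udults", "Aquults", "Aqualfs", "Humods", "Udalfs",
                   "Orthods", "Psamments", "Udepts", "Fluvents", "Aquods")
extract_suborder(c("Hapludults", "Paleudults", "Fragiaqualfs", "Haplorthods", "Argids"),
                 subOrderNames)
```

On the sample data this would be `myData$SubOrder <- extract_suborder(myData$GreatGroup, subOrderNames)`, giving "Udults" for every Hapludults row.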
Sample data below.
# Output of dput(head(test)); assigning it recreates the sample data
myData <- structure(list(1:6, SID = c(200502L, 200502L, 200502L, 200502L,
200502L, 200502L), Groupdepth = c(11L, 12L, 13L, 14L, 21L, 22L
), AWC0to10 = c(0.12, 0.12, 0.12, 0.12, 0.12, 0.12), AWC10to20 = c(0.12,
0.12, 0.12, 0.12, 0.12, 0.12), AWC20to50 = c(0.12, 0.12, 0.12,
0.12, 0.12, 0.12), AWC50to100 = c(0.15, 0.15, 0.15, 0.15, 0.15,
0.15), Db3rdbar0to10 = c(1.43, 1.43, 1.43, 1.43, 1.43, 1.43),
Db3rdbar10to20 = c(1.43, 1.43, 1.43, 1.43, 1.43, 1.43), Db3rdbar20to50 = c(1.43,
1.43, 1.43, 1.43, 1.43, 1.43), Db3rdbar50to100 = c(1.43,
1.43, 1.43, 1.43, 1.43, 1.43), HydrcRatngPP = c(0L, 0L, 0L,
0L, 0L, 0L), OrgMatter0to10 = c(1.25, 1.25, 1.25, 1.25, 1.25,
1.25), OrgMatter10to20 = c(1.25, 1.25, 1.25, 1.25, 1.25,
1.25), OrgMatter20to50 = c(1.02, 1.02, 1.02, 1.02, 1.02,
1.02), OrgMatter50to100 = c(0.12, 0.12, 0.12, 0.12, 0.12,
0.12), Clay0to10 = c(8, 8, 8, 8, 8, 8), Clay10to20 = c(8,
8, 8, 8, 8, 8), Clay20to50 = c(9.4, 9.4, 9.4, 9.4, 9.4, 9.4
), Clay50to100 = c(40, 40, 40, 40, 40, 40), Sand0to10 = c(85,
85, 85, 85, 85, 85), Sand10to20 = c(85, 85, 85, 85, 85, 85
), Sand20to50 = c(83, 83, 83, 83, 83, 83), Sand50to100 = c(45.8,
45.8, 45.8, 45.8, 45.8, 45.8), pHwater0to20 = c(6.3, 6.3,
6.3, 6.3, 6.3, 6.3), Ksat0to10 = c(23, 23, 23, 23, 23, 23
), Ksat10to20 = c(23, 23, 23, 23, 23, 23), Ksat20to50 = c(19.7333,
19.7333, 19.7333, 19.7333, 19.7333, 19.7333), Ksat50to100 = c(9,
9, 9, 9, 9, 9), TaxClName = c("Fine, mixed, semiactive, mesic Oxyaquic Hapludults",
"Fine, mixed, semiactive, mesic Oxyaquic Hapludults", "Fine, mixed, semiactive, mesic Oxyaquic Hapludults",
"Fine, mixed, semiactive, mesic Oxyaquic Hapludults", "Fine, mixed, semiactive, mesic Oxyaquic Hapludults",
"Fine, mixed, semiactive, mesic Oxyaquic Hapludults"), GreatGroup = c("Hapludults",
"Hapludults", "Hapludults", "Hapludults", "Hapludults", "Hapludults"
)), .Names = c("", "SID", "Groupdepth", "AWC0to10", "AWC10to20",
"AWC20to50", "AWC50to100", "Db3rdbar0to10", "Db3rdbar10to20",
"Db3rdbar20to50", "Db3rdbar50to100", "HydrcRatngPP", "OrgMatter0to10",
"OrgMatter10to20", "OrgMatter20to50", "OrgMatter50to100", "Clay0to10",
"Clay10to20", "Clay20to50", "Clay50to100", "Sand0to10", "Sand10to20",
"Sand20to50", "Sand50to100", "pHwater0to20", "Ksat0to10", "Ksat10to20",
"Ksat20to50", "Ksat50to100", "TaxClName", "GreatGroup"), class = c("tbl_df",
"data.frame"), row.names = c(NA, -6L))
Comments:
- To make your code DRYer, build a vector of your patterns (and of the replacements, if they differ), and use sapply to call grepl, gsub, or whatever you like.
- I tried something like: subOrderNames
- Use a for loop: pat <- c('udults', 'aquults', 'aqualfs', 'humods', 'udalfs', 'orthods', 'psamments', 'udepts', 'fluvents', 'aquods'); for(x in 1:length(pat)){SubOrder[grepl(pat[x], myData$GreatGroup, ignore.case = TRUE)] <- pat[x]}. If you need different replacements, create a second vector and use it in place of the second pat[x].
- Or, more directly: myData$SubOrder <- myData$GreatGroup; for(x in pat){myData$SubOrder <- gsub(paste0('.*', x, '.*'), x, myData$SubOrder, ignore.case = TRUE)}. With this approach, a row with no match keeps its GreatGroup value instead of becoming NA.
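A sketch of the sapply-style approach suggested in the first comment, using `vapply()`, the type-stable form of `sapply()`. It assumes that when several patterns match a name, the first one in `pat` should win. The helper `first_matching_pattern` and the example names are hypothetical; unlike the gsub loop, unmatched names become `NA`.

```r
# Hypothetical helper: for each name, return the first pattern in `pat` that it
# contains (case-insensitive), or NA when none match.
first_matching_pattern <- function(great_group, pat) {
  vapply(great_group, function(g) {
    # grepl() is called once per pattern for this single name
    hit <- which(vapply(pat, grepl, logical(1), x = g, ignore.case = TRUE))
    if (length(hit) == 0) NA_character_ else pat[hit[1]]
  }, character(1), USE.NAMES = FALSE)
}

pat <- c("Udults", "Aquults", "Aqualfs", "Humods", "Udalfs",
         "Orthods", "Psamments", "Udepts", "Fluvents", "Aquods")
first_matching_pattern(c("Hapludults", "Epiaquults", "Udifluvents", "Argids"), pat)
```

Applied to the sample data, `myData$SubOrder <- first_matching_pattern(myData$GreatGroup, pat)` adds the column in one step.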
Tags: r regex pattern-matching