【Question title】: Str_Extract Issues: Missing Patterns
【Posted】: 2019-06-25 03:20:41
【Question】:

I have a list of colors that I'm trying to extract from a sample dataset. It finds some of the colors but misses others.

color_list <- c("gray", "brown", "green", "plum", "mist", "forest", "sienna", "grape", "ruby", "emerald", "copper", 
                "silver", "gold", "blue")
str_extract(df, fixed(color_list, ignore_case = TRUE))
[1] "GRAY"   NA       NA       NA       NA       NA       NA       NA       NA       NA       NA       "silver" "GOLD"   "blue"  

But the first match should be "silver".
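A likely explanation for the output above (an inference from the output, not something confirmed in the thread): `str_extract()` is vectorised over both `string` and `pattern`, so passing all 14 colors as `pattern` pairs them with the input element-wise instead of treating them as alternatives. That is why the result has one element per color (14) in `color_list` order, rather than one element per row (13). A minimal sketch on one string from the sample data:

```r
library(stringr)

x <- "Tsilver flash mirror"

# A vector of patterns is paired element-wise with the input: the single
# string is recycled against each pattern, giving one result per pattern.
per_pattern <- str_extract(x, fixed(c("gold", "silver"), ignore_case = TRUE))
# per_pattern has two elements: NA (no "gold") and "silver".

# Collapsing the patterns into one alternation gives one result per string:
# the first color that appears in it.
per_string <- str_extract(x, regex("gold|silver", ignore_case = TRUE))
```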

Sample data:

structure(list(df = c("Tsilver flash mirror", "E:~ ADD FLASH FRONT MI", 
"E:~", "E##T Color: G 15#3; MC", "E:~ ## PLEASE USE 8 BA", "E:~ ## blue flash ##", 
"E:~ ## Silver Mirror #", "Ssilver mirror", "E:~ ## Treatment: Fee-", 
"E:~Further Instruction", "E:~ ## FORREST GRAY Xp", "ESILVER", 
"EGOLD")), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13"))

Is it also possible to do "fuzzy" matching with str_extract? Some of the colors in the data are misspelled.

【Question comments】:

  • str_extract_all(df, paste(color_list,collapse = "|"))
  • @M-M This code returns only 3 results, when there are at least 5 in this sample data.
  • That's because it is case-sensitive.
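Following up on the case-sensitivity comment, a sketch of the case-insensitive version: wrapping the combined alternation in `regex(..., ignore_case = TRUE)` lets `str_extract_all()` also match "SILVER", "GRAY" and "GOLD", and the matched text keeps its original case. It assumes the sample strings are held in a plain character vector (`text`, a name introduced here) rather than passed as the data frame itself.

```r
library(stringr)

color_list <- c("gray", "brown", "green", "plum", "mist", "forest", "sienna",
                "grape", "ruby", "emerald", "copper", "silver", "gold", "blue")

# The sample strings from the question, as a plain character vector.
text <- c("Tsilver flash mirror", "E:~ ADD FLASH FRONT MI", "E:~",
          "E##T Color: G 15#3; MC", "E:~ ## PLEASE USE 8 BA",
          "E:~ ## blue flash ##", "E:~ ## Silver Mirror #", "Ssilver mirror",
          "E:~ ## Treatment: Fee-", "E:~Further Instruction",
          "E:~ ## FORREST GRAY Xp", "ESILVER", "EGOLD")

# One alternation ("gray|brown|...|blue"), matched without regard to case.
pattern <- regex(paste(color_list, collapse = "|"), ignore_case = TRUE)

# A list with one character vector of matches per input string.
found <- str_extract_all(text, pattern)

# Total number of colors found across all rows.
sum(lengths(found))
```

Note that "FORREST" is still not matched as "forest", because the pattern is exact apart from case.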

Tags: r stringr


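【Solution 1】:

The following code returns the data frame with an added column, extract, holding the first color found in each row. I added tolower() to convert the sample text to lowercase before matching, because the pattern itself is case-sensitive. If you want "fuzzy" matching, you may want to look into regular expressions: https://stringr.tidyverse.org/articles/regular-expressions.html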

library(dplyr)    # for %>% and mutate()
library(stringr)  # for str_extract()

example <- structure(list(df = c("Tsilver flash mirror", "E:~ ADD FLASH FRONT MI", 
"E:~", "E##T Color: G 15#3; MC", "E:~ ## PLEASE USE 8 BA", "E:~ ## blue flash ##", 
"E:~ ## Silver Mirror #", "Ssilver mirror", "E:~ ## Treatment: Fee-", 
"E:~Further Instruction", "E:~ ## FORREST GRAY Xp", "ESILVER", 
"EGOLD")), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13"))

color_list <- c("gray", "brown", "green", "plum", "mist", "forest", "sienna", "grape", "ruby", "emerald", "copper", 
                "silver", "gold", "blue")

# Lowercase each row, then extract the first color from color_list found in it
example %>% 
  mutate(extract = str_extract(tolower(df), paste(color_list, collapse = "|")))
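On the "fuzzy" part of the question: as far as I know stringr has no approximate-matching function, but base R's `agrepl()` performs approximate substring matching based on edit distance. A minimal sketch, assuming a hypothetical helper `fuzzy_colors()` (not part of any package) that returns every color within `max_distance` edits of some substring of each input, ignoring case:

```r
color_list <- c("gray", "brown", "green", "plum", "mist", "forest", "sienna",
                "grape", "ruby", "emerald", "copper", "silver", "gold", "blue")

# Hypothetical helper: for each string in x, keep the colors that agrepl()
# finds within max_distance insertions/deletions/substitutions.
fuzzy_colors <- function(x, colors, max_distance = 1) {
  lapply(x, function(s) {
    keep <- vapply(colors, function(col) {
      agrepl(col, s, max.distance = max_distance, ignore.case = TRUE)
    }, logical(1))
    colors[keep]
  })
}

row <- "E:~ ## FORREST GRAY Xp"

# Exact (case-insensitive) matching misses the misspelled "FORREST" ...
exact <- grepl("forest", row, ignore.case = TRUE)

# ... but one edit is enough for approximate matching to find it.
hits <- fuzzy_colors(row, color_list)[[1]]
```

With short color names, even a single allowed edit can produce false positives, so `max_distance` (or using fuzzy matching only as a fallback when no exact match is found) would need tuning on the real data.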

【Comments】:
